A Technical Note: the highlighting scheme for mARkdownMSS is part of the new mARkdownSimple scheme, which will replace the initial mARkdown scheme. The scheme is activated by the magic value #OpenITI#.
The latest highlighting scheme for EditPad Pro is available here: https://github.com/OpenITI/mARkdown_scheme
The implementation of this scheme, currently in development, can be found here: https://github.com/OpenITI/mARkdownMSS; for examples of automatically generated editions, see https://openiti.github.io/mARkdownMSS/.
#=# for a diplomatic transcription
#~# for an edited version
#*# for a comment
#+# for an insertion into the text
For each line of manuscript text there must be two lines of transcription, which can be complemented by two optional lines: one for comments, footnotes, etc., and another for insertions of any kind.
#=# is the diplomatic transcription, which aims to represent the text as closely as possible to the witness; ideally, a transcriber should keep a separate record of how specific cases are handled (this record can be stored in the metadata header of the text file).
#~# is for an edited version with all the corrections that an editor deems relevant; this version should include structural and analytical tags.
#*# is for annotations and comments of any kind, which are linked to the text with A[nchor] tags:
#~# A33 this is the range of text to annotate A33
#*# A33 :: This is a text of a comment to the range marked with the anchor tag A33
#~# a comment will be for this word A33
#*# A33 :: there is really nothing special about this word
#+# is for any insertions that the editor deems necessary:
#=# The butcher, the baker, the A55 maker
#+# A55 :: candlestick
Multiple lines look as shown below. Note the empty line between two-line blocks.
#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version
Comment and insertion lines can be added either at the end of the document, or, perhaps more conveniently, right after the lines that they are connected to. Each one of these lines must be separated from any other line by an empty line. For example:
#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion
or like this:
#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion

#*# a comment

#+# an insertion
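The block layout above lends itself to simple automated checks. The following is a minimal Python sketch, not the project's actual tooling, assuming that blocks are separated by empty lines and that each line begins with one of the four tags; the names LINE_TYPES, split_blocks, classify, and check_pairs are hypothetical. It groups a file into blocks and flags any block where a #=# line is not immediately followed by a #~# line.

```python
import re

# Beginning-of-line tags described above (hypothetical mapping for illustration).
LINE_TYPES = {
    "#=#": "diplomatic",
    "#~#": "edited",
    "#*#": "comment",
    "#+#": "insertion",
}

def split_blocks(text):
    """Group non-empty lines into blocks separated by one or more empty lines."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", text.strip()):
        lines = [ln for ln in chunk.splitlines() if ln.strip()]
        if lines:
            blocks.append(lines)
    return blocks

def classify(line):
    """Return (line_type, content) for a tagged line, or (None, line) if untagged."""
    for tag, kind in LINE_TYPES.items():
        if line.startswith(tag):
            return kind, line[len(tag):].strip()
    return None, line

def check_pairs(text):
    """Return indices of blocks where a #=# line lacks a following #~# line."""
    problems = []
    for i, block in enumerate(split_blocks(text)):
        kinds = [classify(ln)[0] for ln in block]
        for j, kind in enumerate(kinds):
            if kind == "diplomatic" and (j + 1 >= len(kinds) or kinds[j + 1] != "edited"):
                problems.append(i)
                break
    return problems
```

Because the marginal-note tags #==# and #~~# do not start with #=# or #~#, they are simply reported as untagged by this sketch.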
A note: a file with mARkdown text will be automatically generated from the mARkdownMSS file; this file is then to be used for any kind of computational analysis. The two files will have different extensions: .mARkdown and .mARkdownMSS, respectively.
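The derivation of one file from the other could look roughly like the Python sketch below. It is a hypothetical simplification, not the generator script itself: it assumes that the mARkdown text consists of the #~# lines in document order, and it ignores the metadata header, the magic value, and marginal-note lines. The function names edited_text and markdown_path are illustrative only.

```python
from pathlib import Path

def edited_text(mss_text):
    """Collect the content of the edited (#~#) lines, in document order."""
    lines = []
    for line in mss_text.splitlines():
        if line.startswith("#~#"):
            lines.append(line[len("#~#"):].strip())
    return "\n".join(lines)

def markdown_path(mss_path):
    """Derive the .mARkdown file name from a .mARkdownMSS file name."""
    return Path(mss_path).with_suffix(".mARkdown")
```

For a file named book.mARkdownMSS, markdown_path returns book.mARkdown, so the two files sit side by side and differ only in their extension.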
For transcribing marginal notes, use the same approach, but with the following beginning-of-line tags: #==# and #~~# (i.e., the middle element is doubled).
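Anchor tags have the form A + digits. For the digits, you can use the number of the line; if a line needs more than one anchor, use lineNumber-1, lineNumber-2, etc. The key point is that these numbers are never repeated, so each anchor tag is unique. The comment or insertion is then linked to its anchor on a #*# or #+# line:
#*# A33 :: your comment
#+# A55 :: your insertion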
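Since every anchor must be unique and every anchor in the text needs a matching note, both conditions can be checked mechanically. The Python sketch below is hypothetical: it assumes anchors are written as A + digits with an optional -N suffix (for the lineNumber-1 style), and that note lines follow the pattern tag, anchor, " :: ", text. The names collect_notes and dangling_anchors are illustrative.

```python
import re
from collections import Counter

# Assumed anchor form: "A" + digits, optionally with a "-N" suffix.
ANCHOR = r"A\d+(?:-\d+)?"
NOTE_LINE = re.compile(rf"^#([*+])# ({ANCHOR}) :: (.*)$")

def collect_notes(text):
    """Map each anchor to (kind, note) and list anchors used by more than one note."""
    notes = {}
    counts = Counter()
    for line in text.splitlines():
        m = NOTE_LINE.match(line)
        if m:
            symbol, anchor, body = m.groups()
            counts[anchor] += 1
            notes[anchor] = ("comment" if symbol == "*" else "insertion", body)
    duplicates = sorted(a for a, n in counts.items() if n > 1)
    return notes, duplicates

def dangling_anchors(text, notes):
    """Anchors used in #=# / #~# lines that have no matching comment or insertion."""
    used = set()
    for line in text.splitlines():
        if line.startswith(("#=#", "#~#")):
            used.update(re.findall(rf"\b{ANCHOR}\b", line))
    return sorted(used - set(notes))
```

A range such as "A33 this is the range A33" counts as a single anchor here, because the same tag simply opens and closes the range.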
Structural tags include headers, beginnings of serial units (like biographies in biographical dictionaries, individual ḥadīṯs in ḥadīṯ collections, etc.), and paragraph breaks.
Since headers are likely to span multiple lines, the following tag should be used: HL_XX, where:
H is the indication of a header;
L is the level of the header;
XX is a unique number, for which, as with A[nchor] tags, you can use the number of the line in the mARkdown document.
For example, H1_22 Text of the header H1_22 is a header of level 1 that starts on line 22.
Serial units are tagged the same way as in mARkdown, but without $, using just English letters: BIO, BIO_MAN, BIO_WOM, etc. These tags are to be used when it is necessary to mark just the beginning of a serial unit; its end will be automatically determined by the beginning of the next serial unit or a header.
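Because the same HL_XX tag opens and closes a header, a header that runs over several manuscript lines can be recovered by joining the edited lines first. The Python sketch below assumes it receives the content of the #~# lines (without the prefix) and uses a regex backreference to match the closing tag; extract_headers is a hypothetical name.

```python
import re

# H + level, "_", unique number; the identical tag closes the header.
HEADER = re.compile(r"\bH(\d)_(\d+)\b(.*?)\bH\1_\2\b", re.DOTALL)

def extract_headers(edited_lines):
    """Return (level, number, text) for each header, even if split across lines."""
    joined = " ".join(line.strip() for line in edited_lines)
    return [(int(level), number, " ".join(body.split()))
            for level, number, body in HEADER.findall(joined)]
```

For example, the lines "some text H1_22 Text of" and "the header H1_22 more text" yield a single level-1 header numbered 22 with the text "Text of the header".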
Use BR (just these two capital letters) where you want a paragraph break. BR can also be used for poetry lines when they are not written in the verse-per-line format; in such cases, BR is inserted wherever a new line should start, and % is inserted between hemistiches (or at the beginning of a hemistich, if there is only one). If a SEP separator is used in the manuscript to separate hemistiches, please add it in the edited line of text after %.
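To illustrate how BR and % together describe verse layout, here is a small hypothetical Python sketch (split_verses is not part of any existing tool). It splits an edited passage into lines at each standalone BR and each line into hemistiches at %, dropping the empty piece produced by a leading %; any other tags, such as SEP, are left in place.

```python
import re

def split_verses(text):
    """Split an edited passage into lines (at BR) and hemistiches (at %)."""
    verses = []
    for line in re.split(r"\bBR\b", text):
        hemistiches = [h.strip() for h in line.split("%") if h.strip()]
        if hemistiches:
            verses.append(hemistiches)
    return verses
```

With this sketch, "prose before BR first a % first b BR % single" becomes three lines: a prose line, a verse with two hemistiches, and a verse with a single hemistich.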
Punctuation can be added to the edited version of the text: simply add it wherever you want it. Keep in mind that "punctuation" cannot contain alphanumeric characters; for such cases, use an A[nchor] tag with an insertion.
FolioV00F000A and FolioV00F000B are tags for folio numbers; the final A (recto/wajh) and B (verso/ẓahr) stand for the front and back pages, respectively, where:
V00 is the volume number; use V1 if your manuscript has only one volume, and use V01, V02, etc., if there are 10 or more volumes in the actual manuscript.
F000 is the folio number, whose length should match the length of the largest folio number in the manuscript. That is, if there are 500 folios, this part of the tag should look like F001, F002, … F020, etc.; if there are 80 folios, it can look like F01, F02, etc.
The V00 and F000 elements must be of the same length throughout the transcription.
The folio tag is placed in the mARkdownMSS document at the beginning of a manuscript folio: on a separate line (with an empty line before and an empty line after), right before the first line of text on that folio.
=== is used to indicate an unclear, illegible, or missing section in the original scan. Use sets of three equals signs to approximate the amount of illegible text. For example, if three words are illegible, insert: === === ===
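The padding rule for folio tags can be expressed directly in code. The Python sketch below is a hypothetical helper (folio_tag and consistent_widths are illustrative names): it zero-pads the volume and folio numbers to the number of digits in the largest volume and folio numbers, and checks that all tags in a list use the same widths.

```python
import re

FOLIO = re.compile(r"^FolioV(\d+)F(\d+)([AB])$")

def folio_tag(volume, folio, side, n_volumes, n_folios):
    """Build a FolioV..F..A/B tag, zero-padded to the largest volume/folio number."""
    if side not in ("A", "B"):
        raise ValueError("side must be 'A' (recto/wajh) or 'B' (verso/ẓahr)")
    v = str(volume).zfill(len(str(n_volumes)))
    f = str(folio).zfill(len(str(n_folios)))
    return f"FolioV{v}F{f}{side}"

def consistent_widths(tags):
    """True if all well-formed tags use the same number of V and F digits."""
    widths = {(len(m.group(1)), len(m.group(2)))
              for m in map(FOLIO.match, tags) if m}
    return len(widths) <= 1
```

For a one-volume manuscript with 500 folios, folio_tag(1, 1, "A", 1, 500) gives FolioV1F001A; for volume 2 of 12 with 80 folios, folio 20 verso becomes FolioV02F20B.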
You can tag the same types of entities that are included in mARkdown; however, since the structure of mARkdownMSS documents is different, the tags are also slightly different: 1) small letters are used instead of capitals (e.g., @p02 instead of @P02, as in mARkdown); 2) the entity will not be highlighted as in mARkdown (because of the mARkdownMSS structure, an entity may be split between multiple lines, which makes highlighting impossible).
@pYX tags (note the small letter p): Y is the length of a prefix in characters, while X is the length of the entity in words. For example, to tag a three-word name, put @p03 right in front of it: @p03 Muḥammad bn Aḥmad; if you are tagging the phrase wa-Muḥammad bn Aḥmad, the 0 in the tag should be changed to 1 to indicate wa-: @p13 wa-Muḥammad bn Aḥmad.
@tYX tags follow the same pattern.
Note: upon conversion to mARkdown, these tags (@p01 and @t01) will be converted into the standard ones (@P01 and @T01), and color highlighting will work in the mARkdown document.
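The conversion described in the note can be sketched as a single regex substitution. This Python example is hypothetical (ENTITY, to_markdown_tags, and entity_tags are illustrative names); it assumes Y and X are single digits, as in all the examples above.

```python
import re

# Lower-case entity tags: @p or @t, then Y (prefix length in characters)
# and X (entity length in words), each assumed to be a single digit.
ENTITY = re.compile(r"@([pt])(\d)(\d)")

def to_markdown_tags(text):
    """Upper-case @pYX / @tYX into the standard mARkdown @PYX / @TYX tags."""
    return ENTITY.sub(lambda m: "@" + m.group(1).upper() + m.group(2) + m.group(3), text)

def entity_tags(text):
    """List (kind, prefix_chars, words) for each lower-case entity tag."""
    return [(kind, int(y), int(x)) for kind, y, x in ENTITY.findall(text)]
```

Applied to "@p13 wa-Muḥammad bn Aḥmad", to_markdown_tags returns "@P13 wa-Muḥammad bn Aḥmad", and entity_tags reports a one-character prefix and a three-word entity.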
You can use separator tags to indicate in-text graphical elements (non-alphanumeric separators) in the text of your transcription. The structure of the tag is SEPX, where X is a number. Each separator tag must indicate one type of separator. For example, if you have circle separators and three-dot separators, you can use SEP1 to indicate one and SEP2 to indicate the other. This tag is to be used in the diplomatic transcription only.
For each separator tag there should be a corresponding image of the separator cropped from the original image. Ideally, the original resolution should be preserved. Each image must be named with the code used in the text (for example, if there are SEP1 and SEP2 in the text, there must be corresponding images SEP1.jpg and SEP2.jpg); this approach will allow us to use the original separators in the final version (images can be converted into black and white and "schematized" in order to fit better into the text).
Original Image | Schematized Image |
---|---|
[[SEP1.jpg]] | [[SEP1_BW.png]] |
[[SEP2.jpg]] | [[SEP2_BW.png]] |
[[SEP3.jpg]] | [[SEP3_BW.png]] |
[[SEP4.jpg]] | [[SEP4_BW.png]] |
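Since every SEPn code in the text needs a matching SEPn.jpg, a missing image is easy to catch before processing. The following Python sketch is hypothetical (missing_separator_images is an illustrative name) and assumes all separator images sit in one folder.

```python
import re
from pathlib import Path

SEP = re.compile(r"\bSEP(\d+)\b")

def missing_separator_images(text, image_dir):
    """Return SEP codes used in the text that have no SEPn.jpg in image_dir."""
    codes = sorted({f"SEP{n}" for n in SEP.findall(text)}, key=lambda c: int(c[3:]))
    folder = Path(image_dir)
    return [code for code in codes if not (folder / f"{code}.jpg").is_file()]
```

Codes are sorted numerically, so SEP2 is reported before SEP10.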
You can use a figure tag to indicate a graphical element such as a figure, miniature, or image. The tag has the following structure: FIG_LLRR_HI_NNN, where:
FIG is a figure indicator (does not change).
_LLRR is the horizontal dimension of the image (on a twelve-part scale), where LL is the left margin of the image and RR is the right one; the scale runs from 0 to 12 because 12 divides evenly into halves, thirds, and quarters: _0006 means that the image takes the left half of the page; _0012, the full width; _0612, the right half; _0004, the left third; _0812, the right third (see "Line scale examples (LLRR)" below for more details).
_HI is a number indicating the height of the image in lines.
_NNN is the unique number of the image (you can use the folio number plus a letter if there is more than one image on that folio, e.g., 1, 2, 3a, 3b, etc.).
Place this tag after the line of text where non-alphanumeric signs or figures appear, not in the text line itself. For each figure there should be a corresponding image (of the figure cropped from the original image). Ideally, the original resolution should be preserved. Each image must be named with the code used in the text (for example, if there are FIG1 and FIG2 in the text, there must be corresponding images FIG1.jpg and FIG2.jpg); this approach will allow us to use the original figures in the final version (images can be converted into black and white and "schematized" in order to blend better with the text).
Original Image | Schematized Image |
---|---|
[[FIG1.jpg]] | [[FIG1_BW.png]] |
[[FIG2.jpg]] | [[FIG2_BW.png]] |
[[FIG3.jpg]] | [[FIG3_BW.png]] |
Figure Tag Examples:
FIG_0012_05_10 is an image that takes the full width of the text block, is 5 lines in height, and appears on folio 10.
FIG_0006_10_100b is an image that takes the left half of the text block, is 10 lines in height, and is the second (b) image that appears on folio 100.
FIG_0612_2_5 is an image that takes the right half of the text block, is 2 lines in height, and appears on folio 5.
Line scale examples (LLRR):
0004: left third
texttexttexttexttexttex
texttexttexttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
texttexttexttexttexttex
texttexttexttexttexttex
0912: right quarter
texttexttexttexttexttex
texttexttexttexttexttex
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttexttexttex
texttexttexttexttexttex
0012: full width
texttexttexttexttexttex
texttexttexttexttexttex
X X X X X X X X X X X X
X X X X X X X X X X X X
X X X X X X X X X X X X
X X X X X X X X X X X X
texttexttexttexttexttex
texttexttexttexttexttex
0408: a third in the middle
texttexttexttexttexttex
texttexttexttexttexttex
texttex X X X X texttex
texttex X X X X texttex
texttex X X X X texttex
texttex X X X X texttex
texttexttexttexttexttex
texttexttexttexttexttex
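The FIG tag fields can be decoded mechanically, which also makes the width fractions from the examples above explicit. The Python sketch below is hypothetical (parse_fig is an illustrative name); it assumes LL and RR are always two digits, while HI and NNN may vary in length, as in FIG_0612_2_5 and FIG_0006_10_100b.

```python
import re
from fractions import Fraction

# FIG_LLRR_HI_NNN: LL/RR = left/right edge on a 0-12 scale,
# HI = height in lines, NNN = image id (e.g., folio number plus a letter).
FIG = re.compile(r"^FIG_(\d{2})(\d{2})_(\d+)_(\w+)$")

def parse_fig(tag):
    """Decode a FIG tag into its edges, width fraction, height, and id."""
    m = FIG.match(tag)
    if not m:
        raise ValueError(f"not a FIG tag: {tag!r}")
    left, right = int(m.group(1)), int(m.group(2))
    if not 0 <= left < right <= 12:
        raise ValueError(f"bad horizontal extent in {tag!r}")
    return {
        "left": left,
        "right": right,
        "width": Fraction(right - left, 12),  # share of the text block's width
        "height_lines": int(m.group(3)),
        "id": m.group(4),
    }
```

Using Fraction keeps the widths exact, so _0006 decodes to 1/2, _0004 to 1/3, and _0912 to 1/4 of the text block.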
There is a whole second set of issues that Maxim is going to go through and create a script to automatically fix. Most of these are issues or modifications that we thought needed to be made to make the OpenITI mARkdown schema clearer or more precise. Actually, working with you on this project really helped us think through several of these issues. Below are the issues that Maxim is going to write a script to automatically fix:
… diplomatic (#=#) and edited (#~#); reason: mARkdown can be automatically generated from the edited lines;
using English letters messes up the visual directionality (it works OK in EditPad Pro, but will not work in Kate, which we want to adopt because it works natively on Mac and Linux and is open source);
#mARkdownAUTOGENERATED###########, which will be regenerated every time the script is run; it can also be placed into a separate file, where only mARkdown is used.
1280CumarIbnSayyid.AyatTaylor.SCHAD2-ara1
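One way to make the autogenerated section safe to regenerate on every run is to treat the marker line as a boundary and replace everything after it. The Python sketch below is a hypothetical illustration, not the planned script: it assumes the marker string exactly as written above and that nothing hand-edited follows it. regenerate is an illustrative name.

```python
MARKER = "#mARkdownAUTOGENERATED###########"

def regenerate(document, generated):
    """Replace everything after MARKER with freshly generated mARkdown.

    If the marker is absent, it is appended together with the generated text.
    Running the function twice with the same input gives the same result.
    """
    head = document.partition(MARKER)[0]
    return head.rstrip("\n") + "\n\n" + MARKER + "\n" + generated.rstrip("\n") + "\n"
```

Because the old section is discarded wholesale, any manual edits made below the marker would be lost; corrections belong in the #~# lines above it.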