OpenITI mARkdownMSS: Basics of the Format

A technical note: the highlighting scheme for mARkdownMSS is part of the new mARkdownSimple scheme, which will replace the initial mARkdown scheme. The scheme is activated by the magic value #OpenITI#.

The latest highlighting scheme for EditPad Pro is available here: https://github.com/OpenITI/mARkdown_scheme

The implementation of this scheme in development can be found here: https://github.com/OpenITI/mARkdownMSS; for examples of automatically generated editions, see https://openiti.github.io/mARkdownMSS/.

Main Text


#=# for a diplomatic transcription
#~# for an edited version

#*# for a comment
#+# for an insertion into the text

For each line of manuscript (MSS) text there must be two lines of transcription. These can be complemented by additional lines: one for comments, footnotes, etc., and another for insertions of any kind.

  1. #=# is the diplomatic transcription, which aims to represent the text as closely as possible to the witness; ideally, the transcriber should keep a separate record of how specific cases are handled (this record can be stored in the metadata header of the text file)
  2. #~# is for an edited version with all the corrections that an editor deems relevant; this version should include structural and analytical tags
  3. #*# is for annotations or comments of any kind, which are attached to the text with an A[nchor] tag
    1. can be used to annotate a range of text or a specific location
    2. Range example:
      • #~# A33 this is the range of text to annotate A33
      • #*# A33 :: This is a text of a comment to the range marked with the anchor tag A33
    3. Specific location example:
      • #~# a comment will be for this word A33
      • #*# A33 :: there is really nothing special about this word
  4. #+# is for any kind of insertions that the editor deems necessary
    1. Example of an insertion:
      • #=# The butcher, the baker, the A55 maker
      • #+# A55 :: candlestick
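To make the anchor mechanics concrete, here is a minimal Python sketch (not the project's converter) that parses #*# and #+# lines, extracts the range a comment refers to, and splices insertions into a text line. It assumes that an anchor is always the letter A followed by digits, as in A33 and A55 above; all function names are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: an anchor is "A" followed by digits (A33, A55), and a note line
# has the shape "#*# A33 :: text" (comment) or "#+# A55 :: text" (insertion).
NOTE_LINE = re.compile(r"^#([*+])#\s+(A\d+)\s*::\s*(.*)$")
ANCHOR = re.compile(r"\bA\d+\b")


def parse_note(line: str) -> Tuple[str, str, str]:
    """Split a #*# or #+# line into (kind, anchor, text)."""
    m = NOTE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not an anchored note line: {line!r}")
    kind = "comment" if m.group(1) == "*" else "insertion"
    return kind, m.group(2), m.group(3).strip()


def annotated_range(text_line: str, anchor: str) -> Optional[str]:
    """Return the text between a pair of identical anchors.

    Returns None when the anchor occurs only once, i.e. it marks a single location.
    """
    parts = re.split(rf"\s*\b{re.escape(anchor)}\b\s*", text_line)
    return parts[1] if len(parts) == 3 else None


def apply_insertions(text_line: str, insertions: Dict[str, str]) -> str:
    """Replace every anchor that has an #+# line with the inserted text."""
    return ANCHOR.sub(lambda m: insertions.get(m.group(0), m.group(0)), text_line)


kind, anchor, text = parse_note("#+# A55 :: candlestick")
print(apply_insertions("The butcher, the baker, the A55 maker", {anchor: text}))
# prints: The butcher, the baker, the candlestick maker
```

Anchors without a matching #+# line are left in place, so a comment anchor such as A33 survives insertion processing untouched.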

Multiple lines look as shown below. Note the empty line between the two-line blocks.


#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

Comment and insertion lines can be added either at the end of the document, or, perhaps more conveniently, right after the lines that they are connected to. Each one of these lines must be separated from any other line by an empty line. For example:


#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion

or like this:


#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#=# for a diplomatic transcription
#~# for an edited version

#*# a comment

#+# an insertion

#*# a comment

#+# an insertion
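The layout rules above (a #=# line immediately followed by its #~# line, and every block, comment, or insertion separated from the rest by an empty line) lend themselves to a simple check. The following is a minimal Python sketch, not part of the official tooling; it also accepts the #==#/#~~# pair for marginal notes described in the next section, and the function names are hypothetical.

```python
import re
from typing import List, Optional

# Tag pairs that must appear together, in this order, as one block.
TEXT_PAIRS = [("#=#", "#~#"), ("#==#", "#~~#")]  # main text, marginal notes
NOTE_TAGS = ("#*#", "#+#")                          # comments, insertions


def line_tag(line: str) -> Optional[str]:
    """Return the beginning-of-line tag such as '#=#' or '#~~#', if any."""
    m = re.match(r"#[^#\s]+#", line)
    return m.group(0) if m else None


def validate_blocks(text: str) -> List[str]:
    """Report blocks that break the empty-line layout rules (empty list if none)."""
    problems = []
    # Blocks are separated by one or more empty lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    for number, block in enumerate(blocks, start=1):
        tags = tuple(line_tag(line) for line in block.strip().splitlines())
        if tags in TEXT_PAIRS:
            continue  # a transcription line followed by its edited version
        if len(tags) == 1 and tags[0] in NOTE_TAGS:
            continue  # a comment or insertion standing on its own
        problems.append(f"block {number}: unexpected tag sequence {tags}")
    return problems
```

Both placements shown above (notes right after their block, or collected after several blocks) pass this check, since it only looks at each block in isolation.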

A note: a file with mARkdown text will be generated automatically from the mARkdownMSS file; this generated file is then to be used for any kind of computational analysis. The two files will have different extensions: .mARkdown and .mARkdownMSS, respectively.

Marginal Notes

For transcribing marginal notes, use the same approach, but with the following beginning-of-line tags: #==# and #~~# (i.e., the middle character is doubled).

Using Anchor Tags

Structural Tags

Structural tags include headers, the beginnings of serial units (such as biographies in biographical dictionaries, individual ḥadīṯs in ḥadīṯ collections, etc.), and paragraph breaks.

Headers

Since headers are likely to span multiple lines, the following tag should be used:

Serial units

Serial-unit tags are the same as in mARkdown, but without the $ sign, using plain English letters only: BIO, BIO_MAN, BIO_WOM, etc. These tags mark only the beginning of a serial unit; its end is determined automatically by the beginning of the next serial unit or header.
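Since a serial unit simply runs until the next serial-unit tag, splitting the edited text into units is straightforward. The Python sketch below assumes that the #~# lines have already been joined into one string and that tags appear as separate words; it ignores headers, because the header tag is not specified above. The tag list is only a sample, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple


def split_serial_units(text: str,
                       unit_tags: Iterable[str] = ("BIO", "BIO_MAN", "BIO_WOM"),
                       ) -> List[Tuple[str, str]]:
    """Split edited text into (tag, body) pairs; each unit ends where the next tag begins.

    Any text before the first tag is ignored.
    """
    # Longest tags first, so a longer tag wins over a shorter one it starts with.
    alternatives = sorted((re.escape(t) for t in unit_tags), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b")
    matches = list(pattern.finditer(text))
    units = []
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(text)
        units.append((current.group(1), text[current.end():end].strip()))
    return units
```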

§-breaks

Poetry

Additional elements

Punctuation

Punctuation can be added to the edited version of the text: simply add it wherever you want it. Keep in mind that “punctuation” cannot contain alphanumeric characters; for additions that do, use an A[nchor] tag with an insertion (#+#) line.
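The rule that added punctuation must not contain letters or digits applies to every script, including Arabic letters and Arabic-Indic digits. A one-function Python sketch of the check (the function name is hypothetical):

```python
def is_valid_punctuation(mark: str) -> bool:
    """True if an added mark is non-empty and contains no letters or digits in any script."""
    # str.isalnum() is Unicode-aware: Arabic letters and Arabic-Indic digits count as alphanumeric.
    return bool(mark) and not any(ch.isalnum() for ch in mark)


print(is_valid_punctuation("\u060C"))  # Arabic comma: True
print(is_valid_punctuation("\u0648"))  # Arabic letter waw: False
```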

Folio Tags

Lacunae

=== is used to indicate an unclear, illegible, or missing section in the original scan. Insert one === for each illegible word, up to the approximate extent of the illegible text. For example, if three words are illegible, you should insert: === === ===
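Because each === stands for roughly one illegible word, generating and counting lacuna markers is mechanical. A minimal Python sketch with hypothetical helper names:

```python
import re


def lacuna(word_count: int) -> str:
    """Return the lacuna marker for roughly `word_count` illegible words."""
    if word_count < 1:
        raise ValueError("word_count must be at least 1")
    return " ".join(["==="] * word_count)


def count_lacuna_words(line: str) -> int:
    """Count the === markers in a line, i.e. the estimated number of missing words."""
    # The lookarounds stop longer runs such as ==== from being counted as ===.
    return len(re.findall(r"(?<!=)===(?!=)", line))
```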

Named Entities (persons, toponyms, etc.)

You can tag the same types of entities as in mARkdown. However, since mARkdownMSS documents are structured differently, the tags are also slightly different: 1) they use lowercase instead of uppercase letters (e.g., @p02 instead of mARkdown's @P02); 2) entities are not highlighted as in mARkdown, because in mARkdownMSS an entity may be split across multiple lines, which makes highlighting impossible.

Note: upon conversion to mARkdown, these tags (e.g., @p01 and @t01) will be converted into the standard ones (@P01 and @T01), and color highlighting will work in the mARkdown document.
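The conversion step can be sketched as follows. This is not the project's converter (see the mARkdownMSS repository for that); it merely assumes that the mARkdown text is built by joining the #~# lines with spaces, which also reunites an entity name split across two lines, and that an entity tag is @ followed by one letter and two digits, as in @p02. Anchors and other tags are left untouched, and the function names are hypothetical.

```python
import re

# Assumption: an entity tag is "@", one lowercase letter and two digits (e.g. @p02, @t01).
ENTITY_TAG = re.compile(r"@([a-z])(\d{2})(?!\d)")


def edited_text(mss: str) -> str:
    """Join the #~# (edited) lines of a mARkdownMSS document into one running text."""
    # Marginal #~~# lines are skipped, since "#~~#" does not start with "#~#".
    lines = [line[3:].strip() for line in mss.splitlines() if line.startswith("#~#")]
    return " ".join(lines)


def upgrade_entity_tags(text: str) -> str:
    """Turn mARkdownMSS entity tags (@p02) into mARkdown ones (@P02)."""
    return ENTITY_TAG.sub(lambda m: "@" + m.group(1).upper() + m.group(2), text)


sample = "#=# qala @p02 Zayd ibn\n#~# qala @p02 Zayd ibn\n\n#=# Amr fi @t01 Basra\n#~# Amr fi @t01 Basra"
print(upgrade_entity_tags(edited_text(sample)))
# prints: qala @P02 Zayd ibn Amr fi @T01 Basra
```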

Graphic Elements from MSS

In-Text Elements: Separator Tags

You can use separator tags to indicate in-text graphical elements (non-alphanumeric separators) in your transcription. The tag has the structure SEPX, where X is a number. Each separator tag must stand for one type of separator: for example, if there are circle separators and three-dot separators, you can use SEP1 for one and SEP2 for the other. This tag is to be used in the diplomatic transcription only.

For each separator tag there should be a corresponding image of a separator cropped from the original image. Ideally, the original resolution should be preserved. Each image must be named with the code used in the text (for example, if there are SEP1, SEP2 in the text, there must be corresponding images: SEP1.jpg, SEP2.jpg); this approach will allow us to use original separators in the final version (images can be converted into black and white and “schematized” in order to better fit into the text).
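Before publishing, one might check that every separator code used in the transcription has its image. A minimal Python sketch, assuming the image file names are available as a set and that only the .jpg originals are required; the function name is hypothetical.

```python
import re
from typing import List, Set


def missing_separator_images(transcription: str, image_files: Set[str]) -> List[str]:
    """List the SEPX.jpg files that the transcription needs but that are absent."""
    codes = set(re.findall(r"\bSEP\d+\b", transcription))
    # Sort numerically so that SEP2 comes before SEP10.
    needed = [f"{code}.jpg" for code in sorted(codes, key=lambda c: int(c[3:]))]
    return [name for name in needed if name not in image_files]


print(missing_separator_images("#=# abc SEP1 def SEP2 ghi SEP1", {"SEP1.jpg", "SEP1_BW.png"}))
# prints: ['SEP2.jpg']
```

In practice the set of file names could come from a folder listing such as `{p.name for p in Path(folder).iterdir()}` (with `from pathlib import Path`).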

Original image      Schematized image
[[SEP1.jpg]]        [[SEP1_BW.png]]
[[SEP2.jpg]]        [[SEP2_BW.png]]
[[SEP3.jpg]]        [[SEP3_BW.png]]
[[SEP4.jpg]]        [[SEP4_BW.png]]

Figures, Miniatures, Images

You can use a figure tag to indicate a graphical element such as a figure, miniature, or image. The tag has the following structure: FIG_LLRR_HI_NNN, where:

Place this tag after the line of text in which the figure or non-alphanumeric signs appear, not in the text line itself. For each figure there should be a corresponding image of the figure, cropped from the original image. Ideally, the original resolution should be preserved. Each image must be named with the code used in the text (for example, if there are FIG1 and FIG2 in the text, there must be corresponding images FIG1.jpg and FIG2.jpg); this approach will allow us to use the original figures in the final version (images can be converted into black and white and “schematized” in order to blend better with the text).

Original image      Schematized image
[[FIG1.jpg]]        [[FIG1_BW.png]]
[[FIG2.jpg]]        [[FIG2_BW.png]]
[[FIG3.jpg]]        [[FIG3_BW.png]]

Figure Tag Examples:

Line scale examples (LLRR):

  1. 0004: left third
texttexttexttexttexttex
texttexttexttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
X X X X texttexttexttex
texttexttexttexttexttex
texttexttexttexttexttex
  2. 0912: right quarter
texttexttexttexttexttex
texttexttexttexttexttex
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttextt X X X
texttexttexttexttexttex
texttexttexttexttexttex
  3. 0012: full width
texttexttexttexttexttex
texttexttexttexttexttex
X X X X X X X X X X X X
X X X X X X X X X X X X
X X X X X X X X X X X X
X X X X X X X X X X X X
texttexttexttexttexttex
texttexttexttexttexttex
  4. 0408: a third in the middle
texttexttexttexttexttex
texttexttexttexttexttex
texttex X X X X texttex
texttex X X X X texttex
texttex X X X X texttex
texttex X X X X texttex
texttexttexttexttexttex
texttexttexttexttexttex
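The four examples are consistent with reading LLRR as two two-digit positions, the left and right edges of the figure, on a scale of 12 units per line width (0004 spans 4/12, a third; 0912 spans 3/12, a quarter). That reading is inferred from the examples rather than stated, so the following Python sketch rests on an assumption; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple

LINE_UNITS = 12  # inferred from the examples: 0012 is full width


def parse_line_scale(llrr: str) -> Tuple[Fraction, Fraction]:
    """Convert an LLRR code into (left, right) edges as fractions of the line width."""
    if len(llrr) != 4 or not llrr.isdecimal():
        raise ValueError(f"expected four digits, got {llrr!r}")
    left, right = int(llrr[:2]), int(llrr[2:])
    if not 0 <= left < right <= LINE_UNITS:
        raise ValueError(f"edges out of range in {llrr!r}")
    return Fraction(left, LINE_UNITS), Fraction(right, LINE_UNITS)


left, right = parse_line_scale("0408")
print(left, right, right - left)
# prints: 1/3 2/3 1/3
```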

Issues

Notes

Up to date

There is a second set of issues that Maxim is going to go through, writing a script to fix them automatically. Most of these are modifications that we thought were needed to make the OpenITI mARkdown schema clearer or more precise; working with you on this project really helped us think through several of them. Below are the issues that Maxim's script will fix:

  1. We realized that mARkdown highlighters only work on non-vocalized text, so Maxim is going to automatically generate a non-vocalized version of your edited/corrected lines. This line will be marked with #m#.
  2. On a related note, for the sake of clarity, we are going to change the line tags to #d# (diplomatic transcription), #e# (edited/corrected transcription with vocalization), and #m# (edited/corrected transcription with mARkdown and without vocalization).
  3. The file names should be the same as the URI names in the metadata. Maxim is going to fix a few issues in the URIs you provided and then rename the files accordingly.
  4. For some reason there are a lot of extra spaces in the document, so Maxim is going to strip them out.
  5. The two additional transcriptions below the mARkdown documents are not necessary; we only included them in the sample file for demonstration's sake. Both renderings can easily be reproduced from the mARkdown text itself.
  6. Finally, Maxim is going to change the text anchor name from Q to A for clarity.
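The planned clean-up can be approximated in a few lines of Python. This is a sketch, not Maxim's script: it assumes that vocalization means the Arabic marks U+064B to U+0652 (tanwin, short vowels, shadda, sukun) plus the superscript alif U+0670, and that the old anchors were Q followed by digits; it leaves the conversion of mARkdown tags aside. All function names are hypothetical.

```python
import re
from typing import List

# Assumption: vocalization = Arabic marks U+064B..U+0652 and superscript alif U+0670.
VOWEL_MARKS = re.compile("[\u064B-\u0652\u0670]")
# Assumption: old anchors look like Q12; they become A12.
OLD_ANCHOR = re.compile(r"\bQ(\d+)\b")


def tidy(text: str) -> str:
    """Collapse runs of spaces and rename Q anchors to A anchors."""
    return OLD_ANCHOR.sub(r"A\1", " ".join(text.split()))


def devocalize(text: str) -> str:
    """Remove vowel signs (tanwin, short vowels, shadda, sukun, superscript alif)."""
    return VOWEL_MARKS.sub("", text)


def convert_block(diplomatic_line: str, edited_line: str) -> List[str]:
    """Turn an old #=# / #~# pair into the proposed #d# / #e# / #m# lines."""
    if not (diplomatic_line.startswith("#=#") and edited_line.startswith("#~#")):
        raise ValueError("expected a #=# line followed by a #~# line")
    diplomatic = tidy(diplomatic_line[3:])
    edited = tidy(edited_line[3:])
    return [f"#d# {diplomatic}", f"#e# {edited}", f"#m# {devocalize(edited)}"]
```

The #m# line is derived from the tidied #e# line, so the two never drift apart; only the vowel signs differ.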

Checklist

Issues

Inconsistencies