Text Encoding

TEI is designed for text encoding, and this page deals with text encoded in any type of record. It is not a prescriptive list of features that must be encoded when working with text; it simply lists the text-encoding possibilities employed in the project so far. It is always up to the encoder to decide which of these features to encode in a particular record.

While text can be contained in all kinds of records, large portions of text will typically be encoded in manuscript and work records. Manuscript records contain the transcription of the text of the catalogued manuscript, when available, whilst work records contain the edited text or its theoretical structure. Sometimes, however, a text does not clearly fall into one of these two groups, and it is not immediately clear whether it should be encoded in a manuscript or a work record. The encoder can then decide on a case-by-case basis (an issue can always be opened to discuss the best solution). It is always crucial to state the source of the text whenever it might not be completely obvious. Here are some guiding principles for this decision:

  1. The (diplomatic) transcription of a manuscript, which represents the text with all its features as it appears in that particular witness, belongs in the manuscript record. A manuscript record may contain more than one transcription of the same manuscript, taken from different sources.
  2. An edition of a work, i.e. a critical edition based on one or more manuscripts (transcriptions), belongs in the work record. It is structured as the text requires, in chapters, verses, etc., and may contain an apparatus criticus.
  3. If text is available that would belong in a manuscript record, but no record is planned for this manuscript or its encoding is not planned, the text may also be included in the relevant work record. It is particularly important to state the source of the text in these cases.
  4. The same applies to text from manuscripts that are being catalogued but may not be made available soon. In such cases it can be useful for everyone to make transcribed text, such as incipits and explicits, available in the work records.
  5. It is not necessary to copy a manuscript transcription into the work record simply to have it in both places (unless the transcription is the starting point for a critical edition which will be carried out in the work record). In the web application, the manuscript witnesses of a work can easily be retrieved.
  6. When manuscript text is included in a work record, judgment may be applied in deciding which markup of the text's features in the manuscript witness should be preserved in the work record.

In both manuscript and work records, text is entered in the <body> of a TEI file, inside a <div type="edition">. Each text part should be contained in its own <div>, and <div>s should always be properly nested one inside the other. Each of these takes @type="textpart" and a @subtype with the desired value. There should always also be an @n identifying the part (for example, chapter 3), as well as, in the case of manuscript transcriptions, a @corresp pointing to the appropriate <msPart> or <msItem>.
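A minimal skeleton of this nesting for a manuscript transcription might look as follows. It is a sketch only: the @subtype values, the @n numbers and the #ms_i1, #ms_i1.1 and #ms_i1.2 targets are hypothetical and not taken from a real record. Note that <ab> appears only in the innermost <div>s.

```xml
<body>
  <div type="edition" xml:lang="gez">
    <!-- outer text part: hypothetical book 1, pointing to a hypothetical msItem -->
    <div type="textpart" subtype="book" n="1" corresp="#ms_i1">
      <!-- lowest level of nesting: only these divs contain an <ab> -->
      <div type="textpart" subtype="chapter" n="1" corresp="#ms_i1.1">
        <ab>...</ab>
      </div>
      <div type="textpart" subtype="chapter" n="2" corresp="#ms_i1.2">
        <ab>...</ab>
      </div>
    </div>
  </div>
</body>
```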



<div type="textpart" subtype="chapter" n="7" corresp="#ms_i3" xml:id="chapter7" xml:lang="gez">
    <ab>
      <l n="1">በቀዳማዊ፡ ዐመተ፡ መንግሥቱ፡ ለበልጣሰር፡ ንጉሠ፡ ከለዴዎን፡ ሐለመ፡ ዳንኤል፡ ሕልመ፡ ወርእየ፡
        ርእሶ፡ በውስተ፡ ምስካቡ፡ ወጸሐፈ፡ ሕልሞ። </l>
      <l n="2">አነ፡ ዳንኤል፡ ሐለምኩ፡ ወርእኩ፡ አርባዕተ፡ ነፋሳተ፡
        ነፍኁ፡ ውስተ፡ ባሕር፡ ዐቢይ፡ </l>
      <l n="3">ወዐርጉ፡ አራዊት፡ አርባዕቱ፡ ዐበይት፡ እምባሕር፡ ኅቡረ፡ ወቀዳማዊ፡ </l>
      <l n="4">ከመ፡ አንበሳዊት፡ ወባቲ፡ ክንፈ፡ ወክንፊሃኒ፡ ከመዝ፡ ንስር፡ ወርኢኩዋ፡ ውስከ፡ ተመልኀ፡ ክንፊሃ፡
        ወተንሥአት፡ ወቆመት፡ በእግረ፡ ሰብእ፡ ዲበ፡ ምድር፡ ወልበ፡ ሰብእ፡ ተውህበ፡ ላቲ።</l>
    </ab>
  </div>

Example 1

Note that each word should be followed directly, without a space, by a word separator or punctuation mark.

<locus> should not be used in <div type="edition">. When transcribing the text of a manuscript in the manuscript record, the elements described below, such as <pb> or <cb>, should be used as far as possible to state the location of the text within the manuscript, whether the entire manuscript or only parts of it are transcribed. Sometimes the indications given by catalogues are too vague to allow for such precise encoding. In this case, the location of excerpts should be stated in a <note> in the corresponding <msItem>, referring to the excerpt by its @xml:id, as in the following example from BnF Éthiopien 117:


               <msItem xml:id="ms_i1.3">
                  <locus from="63rb" to="69vb"></locus>
                  <title type="complete" ref="LIT1954Mashaf#Treatise3" xml:lang="gez">ቀዳሚ፡ ነገር፡ በእንተ፡ ምስጢረ፡ መለኮት።</title>
                  <incipit xml:lang="gez">
                     ኢሀሎ፡ አብ፡ ቅድመ፡ ወልድ። ወኢሀሎ፡ ወልድ፡
                  </incipit>
                  <note>This treatise contains the excerpt <ref target="#CommentaryHexameron"></ref> on  <locus target="#64v"></locus>.</note>
               </msItem>
            

Example 2


               <div type="textpart" xml:lang="gez" corresp="#ms_i1.3" xml:id="CommentaryHexameron">
                  <ab>
                     ወእምድኅረ፡ ዝንቱ፡ ነገር፡ አውስአ፡ ብርሃናዊ፡ ወይቤሎ፡ ለአቡየ፡
                     በኀይለ፡ ሚካኤል። ስማዕ፡ እንግርከ፡ በም<supplied reason="undefined" resp="PRS10747Zotenbe">ሥ</supplied>ጢር፡
                     ፍካሬ፡ ፍጥረተ፡ ሰማይ፡ ወምድር።
                  </ab>
               </div>
            

Example 3

If the text contains biblical verses, use <l>, as in Example 1. <lb> can also be used to mark line breaks. Be careful: <ab> should only be used at the lowest level of the <div> nesting!
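As a sketch of <lb> inside a lowest-level <ab>, assuming line numbers are recorded in @n: the folio, the line numbers and the "..." text are placeholders, not taken from a real manuscript.

```xml
<div type="textpart" subtype="chapter" n="1">
  <ab>
    <pb n="1r"></pb>
    <!-- each <lb> marks the start of a new line in the manuscript -->
    <lb n="1"></lb>...
    <lb n="2"></lb>...
  </ab>
</div>
```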

Text Diplomatic Transcription features

You may already have a lot of text from a transcription which you want to encode further for specific features.

Please note that these elements can be used wherever there is text, for example in additions, incipits or explicits.

If any of the features listed here are unclear to you or use unfamiliar terminology, please open an issue or get in touch in some other way and ask for clarification.

Text Structure

Use an empty <pb> for each page break. It takes an @n with the folio number, e.g. <pb n="41r"/>. If there are columns, use the element <cb>, as in <cb n="a"/>.


<div type="textpart" subtype="folio" n="10">
<ab>
  <pb n="10r"></pb>
    <cb n="a"></cb>እንዘ፡ ይብል፡ ካልእ፡ ክብሩ፡ ለፀሐይ፡ ወካልእ፡ ክብሩ፡ ለወርኅ፡ ወካልእ፡ ክብሮሙ፡ ለከዋክብት፨ ወኮከብ፡ እምኮከብ፡ ይኄይስ፡ ክብሩ፨ ከማሁኬ፡ ልብሰቶሙ፡ ለቅዱሳን፡ ከመ፡ ኮከብኒ፡ ወከመ፡ ወርኅኒ፡ ቦሙ፡ ልብሰት፨ ወበአምሳለ፡ ፀሐይኒ፡ ቦሙ፡ ዑጻፌ፡ ወለዝንቱሰ፡ ብፁዕ፡ ወቅዱስ፡ ምክሀ፡ ኵልነ፨ ልብሰተ፡ ክብሩ፡ ኢኮነ፡ ከመ፡ ካልአን፡ ኢከመ፡ ፀሐይ፡ ወኢካዕበተ፡ እምፀሐይ፡ ይበርህ፡ ምስብዒተ፡ ስነ፡ አልባሲሁ፡ ዘተኣንመ፡ በእደ፡ ኬንያ፡ ዘአልቦ፡ ዘይትማሰሎ፡ እምነ፡ ኪነውት፨
    <cb n="b"></cb>ብፁዕኬ፡ ዘረከበ፡ ክፍለ፡ ምስለ፡ ብፁዕ፡ <hi rend="rubric">ላሊበላ፡</hi> ወዘተዓፅፈ፡ ልብሰተ፡ ዑጻፌሁ፨ ወዘሰ፡ ኢተመሰሎ፡ በምግባሩ፡ ኢይነሥአ፡ ለክብሩ፨ ወዘሰ፡ ይፈቅድ፡ ይባእ፡ ኀበ፡ ቦአ፡
....

</ab>
</div>
               

Example 4

Hand Shifts

Where a new hand begins, insert a <handShift> element with a @new attribute pointing to the @xml:id of the relevant <handNote>.

The following example comes from London, British Library, BL Oriental 719:


                  <div type="edition" subtype="book" n="1" corresp="#BLorient719">
<ab>
                    ...
                  <cb n="b"></cb>
                 ጌልጌላሁ፡ ዘነበልባል፡ ወሰረገላሁ፡ መርዕድ። ዘያረምማ፡ ለማዕበለ፡ ባሕር፡ ወያዝኅና፡ ለሞገደ። ዘኢየኃልቅ፡ ምስፍናሁ፡ ለትውልደ፡ ትውልድ። ወይትዌዳዕ፡
                 <pb n="3v"></pb>
                 <handShift new="#h2"></handShift>
                 ለዛቲ፡ መጽሐፍ፡ ዘአጸሐፋ፡ አባ፡ አምሐ፡ በምግባረ፡ ሠናይ፡ ፍጹም፡ በእንተ፡ ፍቅሩ፡ ላሊበላ፡ ጻድቅ፡ ዘተሰምየ፡ ገብረ፡ መስቀል፡ ወከመ፡ ይኲኖ፡ መርሃ፡ ለመንግሥት፡ ሰማያት፡ ወእግዚአብሔር፡ ይጽሐፍ፡ ስሞ፡ ምስለ፡ ላሊበላ፡ ወያብኦ፡ ኀበ፡ ቦአ፡ ርስቱ፡ በሣህሉ፡ ወበምሕረቱ፡ አሜን። ዘንተ፡ መጽሐፈ፡ ዋሀበ፡ ለመካን፡ ጎልጎታ። እመቦ፡ ዘሰረቆ፡ ወዘሄደ፡ ወዘተዓገሎ፡ ወዘእውፅኦ፡ እመካኑ፡ ወዘአውሐሶ፡ ለሰብእ። ውጉዘ፡ ይኩን። በአፈ፡ ነቢያት፡ ወሐዋርያት። በአፈ፡ ጻድቃን፡ ወሰማዕት። በአፈ፡ ሚካኤል፡ ወገብርኤል፡ ወበክኵሎሙ፡ ማኅበረ፡ መላእክት። ወበአፈ፡ ጳጳሳት፡ ወሊቃነ፡ ጳጳሳት። ወበአፈ፡ እግዚእትነ፡ ማርያም፡ ጻዋ/ሪ/ተ፡ መላኮት። በአፈ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ፡ በዝ፡ ዓለም፡ ወበዘ፡ ይመጽእ፡ ዓለም፡ አሜን
                 <add place="bottom">
                 <handShift new="#h8"></handShift>
                 ።=። ሐሑሐ ሀሁሃ መሙሚ ሠሡሢ
                 </add>
                 <pb n="4r"></pb>
                 <add place="top" hand="#h5">ገድለ፡ ላሊበላ፡ ዘቅዱስ፡ መድኃኔ፡ ዓለም፡</add>
                 <handShift new="#h2"></handShift>
                 ምኵናኑ፡ ለዘመደ፡ ዘመድ፨ ጥንተ፡ መዋዕሉ፡ ዘኢይትኋላቍ፡ ወስፍሐ፡ ሀለዎቱ፡ ዘኢይትኄለድ፨ ዘለሐኮ፡ ለአዳም፡ በአርአያሁ፡ ከመ፡ ይትገሃድ፨ ለዘ፡ ከመዝ፡ እግዚእ፡ እንዘ፡ እገኒ፡ ወእሰግድ፡ እነግር፡ ዜና፡ ገድለ፡ ጻማሁ፡ ለብእሲ፡ ብፁዕ፡ ወቅዱስ፡ ክቡር፡ ወርኡስ፡ ስቡሕ፡ ወውዱስ፨ ለባሴ፡ ንጹሕ፡ ዘኢለከፎ፡ ደነስ፨ ጸዋሬ፡ ንዴት፡
                 <cb n="a"></cb>
                 ....
                 </ab>
                 </div>

Example 5

Additions, notes, etc.

For text-related additions and marginal notes not recorded under <additions>, use <add>:


 <add place="top" hand="#h5">ገድለ፡ ላሊበላ፡ ዘቅዱስ፡ መድኃኔ፡ ዓለም፡</add>

Example 6

The @place attribute takes a fixed set of values, as the schema will tell you; please read them carefully when choosing.

If you know who added a specific text or letter, always point to the @xml:id of the corresponding <handNote> in <handDesc> with a @hand attribute.
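A sketch of how the two ends of this link fit together, assuming a hand with the hypothetical @xml:id "h5" has been described in the record; other attributes and the content of each <handNote> are omitted here.

```xml
<!-- in the physical description: each hand gets an @xml:id -->
<handDesc>
  <handNote xml:id="h1">...</handNote>
  <handNote xml:id="h5">...</handNote>
</handDesc>

<!-- in the text: @hand points back to the handNote with "#" + its @xml:id -->
<add place="top" hand="#h5">ገድለ፡ ላሊበላ፡ ዘቅዱስ፡ መድኃኔ፡ ዓለም፡</add>
```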

In text

You may need to encode the following features in excerpts or in full texts. Please see London, British Library, BL Oriental 719, or Oxford, Bodleian Library, Bodleian Aeth. e. 3, for examples that include some of this markup.

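For each feature, the following table gives a description (not the schema definition!), the tag, notes on usage where needed, and the current preview in the web application.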

red inks

<hi rend="rubric">

text red

superfluous text

<surplus>

The scribe writes, e.g., ክርስቶቶስ፡ instead of ክርስቶስ፡; you can encode this as ክርስቶ<surplus>ቶ</surplus>ስ፡

ክርስቶ{ቶ}ስ፡

later correction

<add> with @hand and @place

/text/

text deleted removing material and rewritten

<subst> with children <del rend="erasure"> and <add place="overstrike">

The top layer of the parchment was removed and the text was written again.

{text}

abbreviations

<expan> with children <abbr> and <ex>; <abbr> can also contain <am> if the abbreviation sign is not made up of one or more letters

The text contains an abbreviation of a name or a word

t(ext)

monograms

<expan> with children <abbr> and <ex>; <abbr> can also contain <am> if the abbreviation sign is not made up of one or more letters

The text contains an abbreviation of a name or a word

t(ext)

marginal signs

<emph>

[word]

a sign, pointing to the place where the text should be added

<gap reason="illegible" unit="chars" quantity="1">

You can vary @unit and @quantity. For characters, the preview shows a plus sign for each missing character

+

spaces left blank for rubrication, never filled

<space reason="rubrication" unit="chars" quantity="1">

(@quantity @unit left for rubrication)

underlined text

<hi rend="underline">

underlined

unclear letters or words

<unclear>

word/letter?

undecipherable words

<orig>

text grey

Text erroneously omitted from the surface by the scribe

<gap reason="omitted">

.....

Text omitted from the edition by the editor, for whatever reason (brevity, context, language, etc.)

<gap reason="ellipsis">

(...)

Lost text

<gap reason="lost">

Used to indicate a material loss, i.e. a portion of text that must be assumed for material reasons ("it was there!") but which is lost because the material itself is lost

[- ca. @quantity @unit -]

text supplied by editor

<supplied reason="lost">

Used to indicate portions of text that were restored on the basis of philological and textual considerations

[the supplied text]

text supplied by editor

<supplied reason="omitted">

Used to indicate portions of text that were supplied on the basis of philological and textual considerations

<the supplied text>

text supplied by editor

<supplied reason="undefined">

Please do not use this unless necessary; there are other, more precise options.

[the supplied text (?)]

text supplied by copyist in the manuscript

<add place="inline"> with a child <supplied reason="omitted">

See above for <supplied> and below for <add>.

<the omitted text> with a tooltip giving info on the position

text deleted by scribe in the manuscript

<del>, optionally with @rend and @cause

correction by editor

<choice> with children <sic> and <corr>

This prints the corrected version in bold, with a tooltip reporting the original version and who made the correction; clicking on the corrected form toggles to the other.

ancient correction

                        <choice>
                        <orig>ቦ</orig>
                        <corr>በ</corr>
                     </choice>
                     

Example 7

This prints the corrected version in {}.

written upside down

<hi rend="reversed">

letters in a ligature

<hi rend="ligature">

written in negative writing

<hi rend="negative">

text erased (scratched with a knife or similar)

<del rend="erasure">

〚deleted text (when still legible)〛

text crossed out with red or black ink, a line or dashes

<del rend="strikethrough">

text marked with dots above or below the line

<del rend="expunctuated">

text deleted or washed out, the material is untouched

<del rend="effaced">

text added above

<add place="above">

All of these also have a popup indicating where the text was added.

text added below the line

<add place="below">

text added at the bottom

<add place="bottom">

text added within the body of the text

<add place="inline">

text added interlinearly

<add place="interlinear">

text added at the left

<add place="left">

text added in the margin

<add place="margin">

text added mixed

<add place="mixed">

text added on the opposite, i.e. facing, page

<add place="opposite">

text added on the other side of the leaf

<add place="overleaf">

text written over previously deleted text

<add place="overstrike">

text added at the right

<add place="right">

text added at the top

<add place="top">

text added at an unspecified place

<add place="unspecified">

line take-up

<seg rend="above">

line take-down

<seg rend="below">

TEI is made for this, and there are many other features which can be encoded even better and more precisely.
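As a further sketch, here are the gap, space, supplied and abbreviation features from the table combined in one <ab>. All text content is placeholder, and the @quantity values are arbitrary examples rather than recommended values.

```xml
<ab>
  <!-- one illegible character -->
  <gap reason="illegible" unit="chars" quantity="1"></gap>
  <!-- text lost together with the material; quantity is an estimate -->
  <gap reason="lost" unit="chars" quantity="5"></gap>
  <!-- text restored by the editor where the material is lost -->
  <supplied reason="lost">...</supplied>
  <!-- space left for rubrication and never filled -->
  <space reason="rubrication" unit="chars" quantity="3"></space>
  <!-- abbreviation: <abbr> holds what is written, <ex> the expanded letters -->
  <expan><abbr>...</abbr><ex>...</ex></expan>
  <!-- omitted text supplied by the copyist within the line -->
  <add place="inline"><supplied reason="omitted">...</supplied></add>
</ab>
```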

When the record does not yet contain any text transcription, but information about some of the phenomena listed above is available, this should be recorded temporarily in a <note> in the respective <msItem>, with a comment for future changes:


                 <note>This prayer is written in negative writing. </note> 
               

Example 8

Note, however, that information about rubrication is handled differently: it is entered in <handNote> (see Hands Description).

Partial rubrication, where only parts of an element (typically numerals and punctuation marks) are rubricated, can be marked up by pointing to a <rendition> element:


                  <rendition xml:id="partialRubric">Red ink is used only partially (e.g.!!!)</rendition>
               

Example 9

This can then be referred to with @rendition when encoding a portion of text, as in the following example.


                  <hi rend="rubric" rendition="#partialRubric">፨</hi>
               

Example 10

Where possible, it is even more informative to relate this also to a specific <handNote> with @hand.


                  <hi rend="rubric" rendition="#partialRubric" hand="#h1">፨</hi>
                  

Example 11
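The examples above do not show where the <rendition> element itself is declared. In standard TEI, <rendition> elements live in <tagsDecl> inside <encodingDesc> in the <teiHeader>; the sketch below assumes the record follows that placement, and the comment stands in for the rest of the header.

```xml
<teiHeader>
  <!-- fileDesc and the other header parts come first -->
  <encodingDesc>
    <tagsDecl>
      <!-- declared once; referenced in the text as rendition="#partialRubric" -->
      <rendition xml:id="partialRubric">Red ink is used only partially</rendition>
    </tagsDecl>
  </encodingDesc>
</teiHeader>
```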

Translations

Alongside <div type='edition'> there can be a <div type='translation'>, a feature inherited from the main structural <div>s of an edition in the EpiDoc Guidelines. You can add it next to your <div type='edition'> in any manuscript or textual unit file where you want to provide a translation. To align parts of the translation with parts of the text (edition or transcription), you can use standard linking practices, that is, point to the correct @xml:id.

The following example is taken from History of the Episcopate of Alexandria.


      <div type="edition" xml:lang="gez" xml:id="etiopica">
         <div type="textpart" subtype="chapter" n="30" xml:id="chapter30Eth">
            <ab>
               ወእንዘ፡ ያንጐርጐሩ፡ ብዙኃን፡ በእንተ፡
               ...
            </ab>
         </div>
      </div>
      <div type="translation" xml:lang="en" xml:id="translationEthio" corresp="#etiopica">
         <div type="textpart" subtype="chapter" n="30" xml:id="transchapter30Eth">
            <ab>While many murmured for the fact that the appointment
               ...
            </ab>
         </div>
      </div>
   

Example 12

You can of course also add @corresp to the text parts if you so wish.
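Building on Example 12, a text part of the translation could point to the corresponding chapter of the edition like this. This is a sketch: only the @corresp on the inner <div> differs from the example above.

```xml
<div type="translation" xml:lang="en" xml:id="translationEthio" corresp="#etiopica">
  <!-- @corresp links this chapter to the chapter of the edition it translates -->
  <div type="textpart" subtype="chapter" n="30" xml:id="transchapter30Eth" corresp="#chapter30Eth">
    <ab>While many murmured for the fact that the appointment
      ...
    </ab>
  </div>
</div>
```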

As long as you give each translation an @xml:id and, if possible, link it to the edition on which it is based, you can add as many translations as you want, just as you can add as many editions or transcriptions as you want. Please see the referencing page for how to point to a specific translation.

Usually, if you have done a lot of encoding in the text you are translating, you will neither want nor need to replicate it in the translation. You may, however, wish to mark up features of the translation itself, such as text that is omitted in, or only implied by, the original but is supplied in the translation. This is especially important if the translation is itself a historical work that made notable choices.

For example, you could use <supplied> with @reason 'subaudible' or 'omitted', the same values that EpiDoc uses.
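A minimal sketch of <supplied> inside a translation, reusing the @xml:id values from Example 12; the English wording and the "..." placeholders are illustrative only and do not translate a real passage.

```xml
<div type="translation" xml:lang="en" xml:id="translationEthio" corresp="#etiopica">
  <div type="textpart" subtype="chapter" n="30" xml:id="transchapter30Eth">
    <ab>
      <!-- words needed in English but not expressed in the original -->
      ... <supplied reason="subaudible">...</supplied> ...
      <!-- text missing from the original but restored in the translation -->
      ... <supplied reason="omitted">...</supplied> ...
    </ab>
  </div>
</div>
```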

Revisions of this page

  • Pietro Maria Liuzzo on 2018-04-30: first version of guidelines from Wiki
  • Dorothea Reule on 2018-12-03: Added paragraph on locus indications of excerpts
  • Dorothea Reule on 2019-12-09: Added line take-up/down information, small edits for readability
  • Dorothea Reule on 2020-11-20: Added explanations on text in work and manuscript records from Pietro Liuzzo's and Daria Elagina's remarks in https://github.com/BetaMasaheft/Documentation/issues/1570