Corpora

The encoding practices described here are part of the cooperation with the Ethiopian Manuscripts Archives Project.

Encoding a corpus of documents requires following a set of specific practices.

You need a corpus file that contains only the metadata of the corpus of documents, as in Les archives royales du Maṣḥāfa Ṭefut à la cour de Gondar.

Documents are encoded as additions and follow all the guidelines for additions, with a few further requirements.

Each document needs a <relation> pointing back to the corpus file, as in the following example:

<listRelation>
  <relation active="BNFabb152#a1" name="saws:formsPartOf" passive="corpus3"></relation>
</listRelation>
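Whether a document points back to its corpus can be checked mechanically. The following is a minimal sketch in Python, not part of the project's tooling: the function name `corpus_links` is hypothetical, and matching elements by local name (ignoring any namespace) is an assumption made so that the same check works on the bare snippet above and on a namespaced TEI file.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def corpus_links(xml_text, corpus_id):
    """Return the @active values of relations attaching a document to corpus_id.

    Hypothetical helper: it only looks for <relation> elements whose @name is
    'saws:formsPartOf' and whose @passive equals the given corpus ID.
    """
    root = ET.fromstring(xml_text)
    return [
        el.get("active")
        for el in root.iter()
        if local(el.tag) == "relation"
        and el.get("name") == "saws:formsPartOf"
        and el.get("passive") == corpus_id
    ]


# The relation from the example above.
sample = """<listRelation>
  <relation active="BNFabb152#a1" name="saws:formsPartOf" passive="corpus3"></relation>
</listRelation>"""

print(corpus_links(sample, "corpus3"))  # ['BNFabb152#a1']
```

An empty result for the expected corpus ID would indicate that the <relation> is missing or points to a different corpus.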

The text of the document can be annotated as usual with keywords and named entities, but it should additionally use <seg type='interpretation'> with @ana pointing to the ID of one of the interpretation keywords in the taxonomy.

<q xml:lang="gez">
  <seg ana="#invocation" type="interpretation">በአኰቴተ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ፡</seg>
  <seg ana="#provision" type="interpretation">አነ፡ ሠራዕኩ፡ ወአውገዝኩ፡ ለሰብአ፡ ገዳም፡ ዘዋልድባ</seg>፡
  <seg ana="#suscription" type="interpretation">አነ፡ ንጉሥ፡ ገላውዴዎስ፡ ዘተሰመይኩ፡ አጽናፍ፡ ሰገድ፡</seg>
  <seg ana="#clauses" type="interpretation">ከመ፡ ኢይቅረብ፡ ፩፡ ዘኢነበረ፡ ውስተ፡ ገዳም፡ በአፈ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ፡ ወበአፈ፡ ፲ወ፭፡ ነቢያት፡ ወ፲ወ፪፡ ሐዋርያት፡ ወበአፈ፡ እግዝእትነ፡ ማርያም፡ ወላዲተ፡ አምላክ፡</seg>
  <seg ana="#clauses" type="interpretation">ከመ፡ ኢይባዕ፡ ጕልት፡ ውጉዘ፡ ይኩን፡ በሥልጣኖሙ፡ ለጴጥሮስ፡ ወጳውሎስ፡</seg>
  <seg ana="#clauses" type="interpretation">ወከመ፡ ኢይፍሐቅ፡ ዘንተ፡ ዘተጽሕፈ፡ ኀብ፡ ዝንቱ፡ መጽሐፈ፡ መንግሥት።።።</seg>
</q>

This segmentation can be applied to the translation or to the text in fidal, but it is preferable to apply it to the original.

This markup enables the corpus view (e.g. Archives d'Aksum), which colors the different parts of each document, and makes it possible to compare diplomatic patterns.
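To illustrate what comparing diplomatic patterns can mean in practice, the sketch below reads the <seg type='interpretation'> elements of two documents and reduces each to its sequence of interpretation keywords. It is only an illustration under stated assumptions: the helper names `diplomatic_pattern` and `collapse` are hypothetical, the two sample documents are invented placeholders (the first only mirrors the keyword order of the example above, the second is made up for contrast), and this is not how the corpus view itself is implemented.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def diplomatic_pattern(xml_text):
    """List the interpretation keywords of a document in reading order."""
    root = ET.fromstring(xml_text)
    return [
        el.get("ana", "").lstrip("#")  # '#invocation' -> 'invocation'
        for el in root.iter()
        if local(el.tag) == "seg" and el.get("type") == "interpretation"
    ]


def collapse(pattern):
    """Merge consecutive repeats, e.g. three 'clauses' in a row become one."""
    return [keyword for keyword, _ in groupby(pattern)]


# Placeholder documents: the segment text is dummy content, not real data.
doc_a = """<q xml:lang="gez">
  <seg ana="#invocation" type="interpretation">a1</seg>
  <seg ana="#provision" type="interpretation">a2</seg>
  <seg ana="#suscription" type="interpretation">a3</seg>
  <seg ana="#clauses" type="interpretation">a4</seg>
  <seg ana="#clauses" type="interpretation">a5</seg>
  <seg ana="#clauses" type="interpretation">a6</seg>
</q>"""

doc_b = """<q xml:lang="gez">
  <seg ana="#invocation" type="interpretation">b1</seg>
  <seg ana="#suscription" type="interpretation">b2</seg>
  <seg ana="#provision" type="interpretation">b3</seg>
  <seg ana="#clauses" type="interpretation">b4</seg>
</q>"""

pattern_a = collapse(diplomatic_pattern(doc_a))
pattern_b = collapse(diplomatic_pattern(doc_b))

print(pattern_a)  # ['invocation', 'provision', 'suscription', 'clauses']
print(pattern_b)  # ['invocation', 'suscription', 'provision', 'clauses']
print(set(pattern_a) == set(pattern_b))  # True: same parts are present
print(pattern_a == pattern_b)            # False: the parts appear in a different order
```

Collapsing consecutive repeats is a design choice of this sketch: it treats a run of several clauses as a single part, so that two documents differing only in the number of clauses show the same pattern.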

Revisions of this page

  • Pietro Maria Liuzzo on 2018-04-30: first version of guidelines from Wiki