Canonicalized TEI

A post-processed version of the TEI encoded according to these guidelines

The TEI described in these guidelines is highly customized by means of an ODD (One Document Does it all). Some of these customizations, although documented in the ODD and in the schema derived from it, can make the data less interchangeable.

The Beta maṣāḥǝft project maintains a script, the post.xsl transformation, which makes the data more canonical and which others can also use to the same effect. It relies on a list of the project's editors and on a canonicalized taxonomy.

We can start from the latter. The taxonomy maintained with the data has the following format.


              <category>
                  <desc>Special Manuscripts</desc>
                  <category>
                      <catDesc>GoldenGospel</catDesc>
                  </category>
                  <category>
                      <catDesc>miniatureCollection</catDesc>
                  </category>
              </category>

Example 1

This is not what the TEI Guidelines describe. It is a much reduced version, without @xml:id, in which the string actually used for reference is the content of <catDesc>, which corresponds to the name of the TEI file in the database for that concept. The canonicalized taxonomy instead looks like the following example.


            <category>
                <desc>Special Manuscripts</desc>
                <category xml:id="GoldenGospel" corresp="https://betamasaheft.eu/authority-files/GoldenGospel/main">
                    <catDesc>Golden Gospel</catDesc>
                </category>
                <category xml:id="miniatureCollection" corresp="https://betamasaheft.eu/authority-files/miniatureCollection/main">
                    <catDesc>Miniature Collection</catDesc>
                </category>
            </category>

Example 2

Here the values of <catDesc> are moved to an @xml:id, and the content of the element is replaced with the <title> of the corresponding file. Additionally, a @corresp is provided, with the URL of the landing page for that concept in the app.
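The restructuring of Example 1 into Example 2 can be sketched in Python with the standard library's ElementTree. This is a minimal illustration, not the post.xsl logic: the titles dictionary is a hypothetical stand-in for reading the <title> of each authority file, the authority-file URL pattern is copied from the example above, and element names are matched without the TEI namespace for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# URL pattern copied from the canonicalized example above.
AUTHORITY_URL = "https://betamasaheft.eu/authority-files/{}/main"


def canonicalize_taxonomy(taxonomy: ET.Element, titles: Dict[str, str]) -> None:
    """Rewrite categories in place: catDesc -> @xml:id, title -> catDesc, add @corresp."""
    for category in taxonomy.iter("category"):
        cat_desc = category.find("catDesc")  # direct child only
        if cat_desc is None or not (cat_desc.text or "").strip():
            continue  # grouping categories carry a <desc> instead
        concept_id = cat_desc.text.strip()
        category.set(XML_ID, concept_id)
        category.set("corresp", AUTHORITY_URL.format(concept_id))
        # Hypothetical lookup standing in for the <title> of the authority file;
        # falls back to the ID when no title is known.
        cat_desc.text = titles.get(concept_id, concept_id)
```

Serializing the result with ET.tostring writes the @xml:id attributes with the reserved xml: prefix.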

Once the canonicalized taxonomy is included in the post-processed TEI, the file can point to these values internally. For example, <term[@key]>, which in the edited file takes its values from a list in the schema reproduced from the taxonomy shown above, can be changed to <term[@ana]>, whose value is a fragment URI (e.g. #GoldenGospel).
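A sketch of the @key to @ana rewrite, again with ElementTree and without the TEI namespace. The known_ids set is assumed to hold the @xml:id values of the included taxonomy. Keys not found there are left untouched, and an existing @ana is extended rather than overwritten, since @ana can hold several pointers.

```python
import xml.etree.ElementTree as ET
from typing import Set


def keys_to_ana(root: ET.Element, known_ids: Set[str]) -> None:
    """Turn <term key="X"> into <term ana="#X"> when X is a category @xml:id."""
    for term in root.iter("term"):
        key = term.get("key")
        if key not in known_ids:
            continue  # unknown or missing key: leave the element as it is
        existing = term.get("ana")
        # @ana holds a list of pointers, so append rather than overwrite.
        term.set("ana", f"{existing} #{key}" if existing else f"#{key}")
        del term.attrib["key"]
```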

In Beta maṣāḥǝft both versions can be viewed and downloaded. Requesting the file ID with the .xml extension (e.g. https://betamasaheft.eu/BNFet32.xml) returns the file encoded according to these guidelines. Prepending /tei/ to the path (e.g. https://betamasaheft.eu/tei/BNFet32.xml) returns the same file after it has been transformed with post.xsl.
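The two URL patterns can be wrapped in small helpers. This sketch assumes the app keeps serving files at these paths; fetch is a thin wrapper over urllib, not part of any Beta maṣāḥǝft API.

```python
from urllib.request import urlopen

BASE = "https://betamasaheft.eu/"


def edited_url(file_id: str) -> str:
    """The file as encoded according to these guidelines."""
    return f"{BASE}{file_id}.xml"


def canonical_url(file_id: str) -> str:
    """The same file after transformation with post.xsl."""
    return f"{BASE}tei/{file_id}.xml"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download a resource; this performs a real network request."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example, fetch(canonical_url("BNFet32")) would return the post-processed TEI of BNFet32 as bytes.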

To continue with the example above, the edited file does not contain the <taxonomy> at all, because its values are replicated in the schema as the allowed values of @key on <term> and of other elements and attributes. The post-processed file, however, includes the entire taxonomy in the canonicalized form described above.

Another important difference concerns URIs. In the edited TEI file we want them to be as short as possible, so for Beta maṣāḥǝft identifiers we use the plain ID of the file. This ID is to be interpreted against the @xml:base attribute, which contains the base URL of the app; this allows a processor to build the full URI of the corresponding web resource, which can in turn be used to retrieve any of the content types provided (HTML, XML, or an RDF representation or graph centred on the resource). In the post-processed version of the TEI, @xml:base is not needed, because all these pointers are spelt out in full.
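Resolving bare IDs against @xml:base can be approximated with urljoin. In this sketch the attribute list is an illustrative subset, only an @xml:base on the root element is considered, and prefixed pointers, full URIs and internal fragments (#...) are deliberately left alone, since they need different handling.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
# Illustrative subset of pointer attributes, not an exhaustive list.
POINTER_ATTRS = ("ref", "corresp", "target", "active", "passive")


def is_bare_id(pointer: str) -> bool:
    """Bare IDs have no scheme or prefix (no ':') and are not #fragments."""
    return ":" not in pointer and not pointer.startswith("#")


def spell_out_pointers(root: ET.Element) -> None:
    """Resolve bare IDs against the root @xml:base, then drop @xml:base."""
    base = root.get(XML_BASE)
    if not base:
        return
    for el in root.iter():
        for attr in POINTER_ATTRS:
            value = el.get(attr)
            if value is None:
                continue
            parts = [urljoin(base, p) if is_bare_id(p) else p for p in value.split()]
            el.set(attr, " ".join(parts))
    del root.attrib[XML_BASE]
```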

Similarly, URIs of external resources are entered in the edited XML using prefixes. These are documented in a <listPrefixDef>, which is likewise not physically present in each file but is included from a shared fragment.


  <listPrefixDef>
    <prefixDef ident="bm" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://www.zotero.org/groups/358366/ethiostudies/items/tag/bm:$1">
    </prefixDef>
    <prefixDef ident="betmas" matchPattern="([a-zA-Z0-9\.\-]+)" replacementPattern="https://betamasaheft.eu/$1">
    </prefixDef>
    <prefixDef ident="iha" matchPattern="([a-zA-Z0-9\.\-]+)" replacementPattern="http://islhornafr.eu//$1">
    </prefixDef>
    <prefixDef ident="ethiocal" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://raw.githubusercontent.com/BetaMasaheft/BetMas/master/BetMas/calendars/ethiopian.xml#$1">
    </prefixDef>
    <prefixDef ident="pleiades" matchPattern="(\d{5,8})" replacementPattern="https://pleiades.stoa.org/places/$1">
    </prefixDef>
    <prefixDef ident="sdc" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://w3id.org/sdc/ontology#$1">
    </prefixDef>
    <prefixDef ident="wd" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://www.wikidata.org/entity/$1">
    </prefixDef>
    <prefixDef ident="snap" matchPattern="([a-zA-Z]+)" replacementPattern="http://data.snapdrgn.net/ontology/snap#$1">
    </prefixDef>
    <prefixDef ident="saws" matchPattern="([a-zA-Z]+)" replacementPattern="http://purl.org/saws/ontology#$1">
    </prefixDef>
    <prefixDef ident="skos" matchPattern="([a-zA-Z]+)" replacementPattern="http://www.w3.org/2004/02/skos/core#$1">
    </prefixDef>
    <prefixDef ident="gn" matchPattern="([a-zA-Z0-9]+)" replacementPattern="http://www.geonames.org/ontology#$1">
    </prefixDef>
    <prefixDef ident="dcterms" matchPattern="([a-zA-Z]+)" replacementPattern="http://purl.org/dc/terms/$1">
    </prefixDef>
    <prefixDef ident="dc" matchPattern="([a-zA-Z]+)" replacementPattern="http://purl.org/dc/terms/$1">
    </prefixDef>
    <prefixDef ident="lawd" matchPattern="([a-zA-Z]+)" replacementPattern="http://lawd.info/ontology/$1">
    </prefixDef>
    <prefixDef ident="syriaca" matchPattern="([a-zA-Z\-]+)" replacementPattern="http://syriaca.org/documentation/relations.html#$1">
    </prefixDef>
    <prefixDef ident="agrelon" matchPattern="([a-zA-Z]+)" replacementPattern="http://d-nb.info/standards/elementset/agrelon.owl#$1">
    </prefixDef>
    <prefixDef ident="rel" matchPattern="([a-zA-Z]+)" replacementPattern="http://purl.org/vocab/relationship/$1">
    </prefixDef>
    <prefixDef ident="em" matchPattern="(\d+)" replacementPattern="https://www.eagle-network.eu/voc/material/lod/$1">
    </prefixDef>
    <prefixDef ident="eo" matchPattern="(\d+)" replacementPattern="https://www.eagle-network.eu/voc/objtyp/lod/$1">
    </prefixDef>
    <prefixDef ident="ew" matchPattern="(\d+)" replacementPattern="https://www.eagle-network.eu/voc/writing/lod/$1">
    </prefixDef>
    <prefixDef ident="ic" matchPattern="([a-zA-Z0-9]+)" replacementPattern="http://iconclass.org/$1">
    </prefixDef>
    <prefixDef ident="ecrm" matchPattern="([a-zA-Z0-9]+)" replacementPattern="http://erlangen-crm.org/current/$1">
    </prefixDef>
    <prefixDef ident="foaf" matchPattern="([a-zA-Z0-9]+)" replacementPattern="http://xmlns.com/foaf/0.1/$1">
    </prefixDef>
  </listPrefixDef>

Example 3

The included TEI fragment contains statements like the following.


  <prefixDef ident="wd" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://www.wikidata.org/entity/$1">
    </prefixDef>

Example 4

This information, which is edited once for all files, is used in the post-processing to reconstruct the full URI from a prefixed pointer. Consider the following relation in the edited file.


   <relation name="skos:broadMatch" active="NAR0001gwelt" passive="betmas:LandGrant"></relation>

Example 5

In the post-processed file, the same relation becomes:


   <relation name="skos:broadMatch" ref="http://www.w3.org/2004/02/skos/core#broadMatch" active="https://betamasaheft.eu/NAR0001gwelt" passive="https://betamasaheft.eu/LandGrant"></relation>

Example 6
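The expansion from Example 5 to Example 6 can be sketched as follows. The three prefix definitions are copied from Example 3; using re.fullmatch is a simplification that requires each matchPattern to cover the whole part after the prefix; the base URL for bare IDs such as NAR0001gwelt is assumed to be https://betamasaheft.eu/. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Three entries copied from the listPrefixDef in Example 3.
LIST_PREFIX_DEF = r"""<listPrefixDef>
  <prefixDef ident="betmas" matchPattern="([a-zA-Z0-9\.\-]+)" replacementPattern="https://betamasaheft.eu/$1"/>
  <prefixDef ident="skos" matchPattern="([a-zA-Z]+)" replacementPattern="http://www.w3.org/2004/02/skos/core#$1"/>
  <prefixDef ident="wd" matchPattern="([a-zA-Z0-9]+)" replacementPattern="https://www.wikidata.org/entity/$1"/>
</listPrefixDef>"""
BASE = "https://betamasaheft.eu/"  # assumed base for bare IDs


def load_prefixes(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each prefix ident to its (matchPattern, replacementPattern)."""
    root = ET.fromstring(xml_text)
    return {
        p.get("ident"): (p.get("matchPattern"), p.get("replacementPattern"))
        for p in root.iter("prefixDef")
    }


PREFIXES = load_prefixes(LIST_PREFIX_DEF)


def expand(pointer: str) -> str:
    """Expand 'prefix:local' via its prefixDef, or a bare ID against BASE."""
    if pointer.startswith(("http://", "https://", "#")):
        return pointer  # already absolute, or an internal fragment
    prefix, sep, local = pointer.partition(":")
    if not sep:
        return BASE + pointer  # bare Beta maṣāḥǝft ID
    if prefix not in PREFIXES:
        return pointer  # unknown prefix: leave untouched
    match_pattern, replacement = PREFIXES[prefix]
    m = re.fullmatch(match_pattern, local)
    if m is None:
        raise ValueError(f"{pointer!r} does not match {match_pattern!r}")
    # Replace $1, $2, ... in the replacementPattern with the captured groups.
    return re.sub(r"\$(\d)", lambda g: m.group(int(g.group(1))), replacement)


def canonicalize_relation(rel: ET.Element) -> None:
    """Add @ref for a prefixed @name and expand @active/@passive pointers."""
    name = rel.get("name")
    if name and ":" in name:
        rel.set("ref", expand(name))
    for attr in ("active", "passive"):
        value = rel.get(attr)
        if value:
            rel.set(attr, " ".join(expand(p) for p in value.split()))
```

As in Example 6, @name keeps its prefixed form and the expanded URI goes into the new @ref.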

Since every pointer is now spelt out in full, the <prefixDef> elements are no longer needed and are removed from the post-processed file.

The values of @who in <change> and of @calendar in <date> are likewise governed by lists of values in the schema. For @who, the more canonical form is a pointer to the @xml:id of an <editor> or <respStmt>, and the post-processing transforms the values accordingly. Similarly, for @calendar a list of <calendar> elements with @xml:id is added, and the attribute values are turned into references to them by prepending a #.
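A sketch of the @who and @calendar rewrite. The values PL and ethiopian used in the check below are hypothetical examples. The sketch assumes that matching <editor> or <respStmt> elements with those @xml:id values are supplied from the project's list of editors, and it fills each generated <calendar> with a placeholder <p>.

```python
import xml.etree.ElementTree as ET
from typing import Set

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def to_fragments(value: str) -> str:
    """'PL ES' -> '#PL #ES'; values already starting with '#' are kept."""
    return " ".join(v if v.startswith("#") else "#" + v for v in value.split())


def canonicalize_who_and_calendar(root: ET.Element) -> Set[str]:
    """Rewrite change/@who and date/@calendar; return the calendar IDs used."""
    calendars: Set[str] = set()
    for change in root.iter("change"):
        who = change.get("who")
        if who:
            change.set("who", to_fragments(who))
    for date in root.iter("date"):
        calendar = date.get("calendar")
        if calendar:
            calendars.update(v.lstrip("#") for v in calendar.split())
            date.set("calendar", to_fragments(calendar))
    return calendars


def build_calendar_desc(calendars: Set[str]) -> ET.Element:
    """Create <calendarDesc> with one <calendar xml:id="..."> per calendar."""
    desc = ET.Element("calendarDesc")
    for name in sorted(calendars):
        calendar = ET.SubElement(desc, "calendar", {XML_ID: name})
        ET.SubElement(calendar, "p").text = name  # placeholder description
    return desc
```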

Post-processing also populates empty pointing elements, so that, for example, a repository reference also receives text content.


  <repository ref="INS0344AksumS"></repository>

Example 7

will become


  <repository ref="https://betamasaheft.eu/INS0344AksumS">ʾAksum Ṣǝyon</repository>

Example 8
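Populating an empty <repository> as in Examples 7 and 8 can be sketched like this. The names dictionary is a hypothetical stand-in for looking up the name in the referenced institution record, the base URL is assumed, and elements that already have content are not touched.

```python
import xml.etree.ElementTree as ET
from typing import Dict

BASE = "https://betamasaheft.eu/"  # assumed @xml:base of the app


def populate_repositories(root: ET.Element, names: Dict[str, str]) -> None:
    """Spell out <repository>/@ref and fill in the name when the element is empty."""
    for repo in root.iter("repository"):
        ref = repo.get("ref")
        if not ref:
            continue
        if ":" not in ref:
            repo.set("ref", BASE + ref)  # bare ID -> full URI
        record_id = ref.rsplit("/", 1)[-1]
        is_empty = not (repo.text or "").strip() and len(repo) == 0
        # names is a hypothetical lookup standing in for the referenced record.
        if is_empty and record_id in names:
            repo.text = names[record_id]
```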

Similarly, a <bibl> that in the edited TEI contains only a <ptr> with @target is expanded in the canonical version to include the full TEI rendering of the Zotero record it points to.


  <bibl>
<ptr target="bm:Rueppell1840Reise"></ptr>
</bibl>

Example 9

will be seen in the post-processed TEI as


  <bibl corresp="http://zotero.org/groups/358366/items/BRDJ9JFF" type="book">
<title level="m">Reise in Abyssinien</title>
<author>
<forename>Eduard</forename>
<surname>Rüppell</surname>
</author>
<pubPlace>Frankfurt am Main</pubPlace>
<publisher>
gedruckt auf Kosten des Verfassers, und in Commission bei Siegmund Schmerber
</publisher>
<date>1840</date>
<biblScope unit="volume">II</biblScope>
<note type="url">https://archive.org/details/reiseinabyssinie02rupp</note>
</bibl>

Example 10
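One possible way to obtain a TEI rendering of a Zotero record is the Zotero Web API, which can filter a group's items by tag and export them with format=tei. Whether post.xsl works this way is not stated here, so treat the following as an assumption-laden sketch. It detects a <bibl> holding nothing but a bm: pointer, builds the query URL, and swaps in an already rendered <bibl> while keeping the element in place. The conversion of the API response into a <bibl> like Example 10 is not shown.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

ZOTERO_GROUP = "358366"  # group ID used in the bm prefixDef


def zotero_tei_url(tag: str) -> str:
    """Zotero Web API query for the items with this tag, exported as TEI."""
    query = urlencode({"tag": tag, "format": "tei"})
    return f"https://api.zotero.org/groups/{ZOTERO_GROUP}/items?{query}"


def bm_tag(bibl: ET.Element) -> Optional[str]:
    """Return the bm: tag if <bibl> holds nothing but one <ptr target="bm:..."/>."""
    children = list(bibl)
    if len(children) != 1 or children[0].tag != "ptr" or (bibl.text or "").strip():
        return None
    target = children[0].get("target", "")
    return target if target.startswith("bm:") else None


def replace_bibl(bibl: ET.Element, rendered: ET.Element) -> None:
    """Give bibl the attributes and content of rendered, keeping its position."""
    tail = bibl.tail  # text after </bibl> belongs to the parent, keep it
    bibl.clear()
    bibl.attrib.update(rendered.attrib)
    bibl.text = rendered.text
    bibl.extend(list(rendered))
    bibl.tail = tail
```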

<locus> is also given text content when it has none and only its attributes are present.

<locus from="1r" to="4v"></locus>

Example 11

becomes

<locus from="1r" to="4v">ff. 1r-4v </locus>

Example 12
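Generating the missing text can be sketched as below. Only the "ff. from-to" pattern is attested in Example 12; the "f." form for a single folio is an assumption, and @target is ignored.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def locus_label(start: Optional[str], end: Optional[str]) -> str:
    """'1r', '4v' -> 'ff. 1r-4v'; a single folio gets 'f.' (assumed convention)."""
    if start and end and start != end:
        return f"ff. {start}-{end}"
    folio = start or end
    return f"f. {folio}" if folio else ""


def populate_loci(root: ET.Element) -> None:
    """Add text to every <locus> that has attributes but no content."""
    for locus in root.iter("locus"):
        if (locus.text or "").strip() or len(locus):
            continue  # already has content
        label = locus_label(locus.get("from"), locus.get("to"))
        if label:
            locus.text = label
```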

Revisions of this page

Pietro Maria Liuzzo on 2019-07-03: first version of guidelines from Wiki