Transcriptions with Transkribus
Generating a transcription
Even if you have already worked through each page of your manuscript to produce a transcription, doing it again with Transkribus and checking the result has many advantages, chiefly the alignment of the text regions and lines on the base image with the transcription. If you do not know this tool, please refer to its guidelines and documentation.
The first step you will run in Transkribus is a layout analysis. Once that process has completed, please make sure to go through each image and fix it, paying particular attention to openings which are badly folded in the middle or areas of excessive match. Red ink, for example, is quite often not picked up by this step. Columns sometimes have to be split manually, and marginal notes and other paratextual elements selected separately so that they do not interfere with the text flow.
A transcription model has been trained with materials provided by all our project's collaborators. It can be used either to train more specific models or to transcribe images of written artefacts directly.
You can use this model by picking it from those available in the Handwritten Text Recognition step. Once recognition has finished, the aligned transcription will be loaded below the images. Up to this point everything stays in Transkribus, but it can now be moved to the project data and made available to everyone.
You may wish to do some fixing of this transcription in Transkribus before exporting. This is especially beneficial if you are using your transcription. Transkribus, as an expert tool, allows you to keep track of what you are transcribing line by line, and thus helps the accuracy of your work.
Downloading TEI from Transkribus
Now that your images are transcribed, you can download the work you have done using the export functionality of Transkribus.
To download a TEI-encoded version of this transcription, please select the option to export TEI (I use this client-side), leave the default options for Zones selected, and additionally select the options to use bounding box coordinates and to use Line Breaks (<lb>). Then click OK.
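As a rough orientation, an export made with these options has approximately the following shape. This is only a sketch: the file name, coordinates, identifiers and text are invented placeholders, and the exact attributes produced by your version of Transkribus may differ.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted in this sketch -->
  <facsimile>
    <!-- one surface per image; all values below are placeholders -->
    <surface xml:id="facs_1" ulx="0" uly="0" lrx="2000" lry="3000">
      <graphic url="image_0001.jpg"/>
      <!-- a text region, described by its bounding box -->
      <zone xml:id="facs_1_r1" ulx="150" uly="200" lrx="950" lry="2800">
        <!-- a line inside that region -->
        <zone xml:id="facs_1_r1l1" ulx="160" uly="210" lrx="940" lry="290"/>
      </zone>
    </surface>
  </facsimile>
  <text>
    <body>
      <!-- one page break per image, pointing back to its surface -->
      <pb facs="#facs_1"/>
      <p facs="#facs_1_r1">
        <lb facs="#facs_1_r1l1"/>first transcribed line
      </p>
    </body>
  </text>
</TEI>
```

The point to notice is the pairing of each `<lb>` and text block with a `<zone>` through `@facs`, which is what carries the alignment between image and text.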
The TEI file you will get contains your whole aligned transcription, with the regions of the image linked to the text. It necessarily follows the structure of the images, however: if you transcribed images of openings, for example, you will have a page break for each image. This TEI is thus not ready to be copied and pasted into a BM Manuscript record. The next section details some tools available to make it work.
You can proceed to the next step directly if you used images from the project. If not, the images also need to be delivered with the transcription and stored in BM, because the transcription is aligned to those files.
Generating a transcription for BM
You can of course fix everything by hand, or by using regex in your preferred editor. We also have, however, an XSLT transformation called transkribus2BM.xsl, which restructures the TEI to fit the project's requirements. This transformation, which you can run with your engine or software of choice, e.g. in OxygenXML Editor, takes your Transkribus-exported TEI as input and needs to be given some parameters: the total number of foliated leaves; the number of protection leaves at the beginning, if these are part of your image set; and the type of images, whether single sides or openings. The assumption is that your set of images will be tidy in this respect.
The result of this transformation is ready to be pasted into the correct Manuscript record, but you may still want to fix some of the structure, or move around text regions which contain, for example, additions or other types of content, like legends of decorations, or extras.
When adding this transcription to a record, please add a paragraph (<p>) in <sourceDesc> stating that the transcription was done with Transkribus, and add a <change> element detailing the transcription work carried out (corrected, not corrected, precision declared by the model, who has done the alignment and when, etc.).
It will also be useful to other users to add a <note> about the status of the transcription.
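A minimal sketch of these three additions could look as follows. The wording, the editor ID "XX" and the date are placeholders, and the placement is an assumption: since TEI does not allow a <p> next to <msDesc> inside the same <sourceDesc>, the paragraph is shown here in a <sourceDesc> of its own, and the <note> is shown at the top of the edition <div>. Please validate against the project schema.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <!-- titleStmt, publicationStmt and the sourceDesc holding msDesc stay unchanged -->
      <sourceDesc>
        <p>The transcription in this record was produced with Transkribus
          and is aligned to the images.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <!-- "XX" and the date are placeholders for the editor ID and the day of the work -->
      <change who="XX" when="2020-09-15">Added transcription from Transkribus;
        layout regions checked by hand, text not yet corrected.</change>
    </revisionDesc>
  </teiHeader>
  <text>
    <body>
      <div type="edition">
        <note>Automatic transcription, not yet corrected.</note>
        <!-- transcription produced by the transformation goes here -->
      </div>
    </body>
  </text>
</TEI>
```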
IMPORTANT: if you are using <xi:include> for your transcription and the main manuscript record already contains a <text> element, you must (1) remove the <text> and <body> tags in the transkribusText.xml file, and (2) insert the relevant <xi:include> element inside the <text> element of your main file.
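For step (2), the main file might then look roughly like this. It is a sketch only: it assumes that transkribusText.xml sits in the same folder as the main record, so that a bare relative @href is enough; adjust the path to your own layout.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- teiHeader of the manuscript record omitted in this sketch -->
  <text>
    <!-- the transcription file is pulled in here when XInclude is resolved;
         the relative path is an assumption -->
    <xi:include href="transkribusText.xml"/>
  </text>
</TEI>
```

Note that the XInclude namespace has to be declared, here on the root element, for the `xi:` prefix to be recognised.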
Behaviour of the application
This transcription, once added via a normal PR process, will become part of the database. It will be searchable, navigable and citable, and will hopefully offer great help to researchers.
If a transcription or parts of it are available, the text of <incipit> and <explicit> does not need to be copied over, nor does the text of additions: the locus reference will be enough to move to the appropriate piece of text.
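As an illustration, a content item and an addition that rely only on their locus references might be encoded along these lines. The identifiers and folio values are invented, and the use of @from, @to and @target on <locus> is assumed to follow the project's usual conventions for referencing parts of the manuscript.

```xml
<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="ms">
  <!-- msIdentifier and the other parts of the description omitted in this sketch -->
  <msContents>
    <msItem xml:id="ms_i1">
      <!-- no incipit or explicit text copied: the range points into the transcription -->
      <locus from="3ra" to="45vb"/>
    </msItem>
  </msContents>
  <physDesc>
    <additions>
      <list>
        <item xml:id="a1">
          <!-- the text of the addition is read from the transcription at this folio -->
          <locus target="#45vb"/>
          <desc>Short description of the addition.</desc>
        </item>
      </list>
    </additions>
  </physDesc>
</msDesc>
```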
Revisions of this page
- Pietro Maria Liuzzo on 2020-09-15: first version draft