Catalogue Workflow

A Digital-first Collaborative Catalogue of Manuscripts

This page lists one possible workflow for producing a printed manuscript catalogue working from and with the digital research environment Beta maṣāḥǝft. Currently, Beta maṣāḥǝft has no employee responsible for giving technical support to such a project.

This is a strategic page, very similar to the one about the workflow for a digital-first collaborative critical edition. It is not meant to be comprehensive or tutorial-like, but to give you the right pointers in an organized fashion.

There are many ways to produce a digital-first catalogue or handlist which can then be the basis for a paper-based product. Some tools are listed here and can be found in the training materials. Many more exist, based on XML, LaTeX, etc. One example is Nosnitsin and Reule 2021, entirely based on data edited in the online research environment Beta maṣāḥǝft.

Each workflow comes with its requirements and learning curve. The best one is the one which works best for you and your needs. This page describes one possibility which benefits at several stages from the commonly edited data in Beta maṣāḥǝft (BM) and contributes to it in several citable and version-aware ways. It hasn't yet been used for a real project.

In this page, we will

  1. go through the hypothetical phases of the work involved for preparing a catalogue of manuscripts,
  2. link to the available resources and pages of documentation (these do not need to be used in the exact ways described),
  3. try to clarify what can and what cannot be supported in such a process.

Taking a few steps towards a deeply collaborative way of working benefits you, the quality of your work and the direct impact of this work on others.

The digital-first workflow presented here is based on the principle of separation of concerns, which is typical of the world of coding and programming. This methodological approach involves separating as many different levels of concern as is possible and useful. Instead of trying to do everything with one tool and one method, each separate part of the work is done with a specifically designed method.

The most common and basic separation of concerns is the separation between the concern about the semantic annotation of the text and that of its rendering on the printed page. Instead of saying, "I make a footnote to indicate the list of variants for this passage, and I put it here because it looks nice", we turn it around and think first "I mark the variants as variants.", and then in a separate step, "I want the variants to be listed in footnotes.", and in yet another independently-documented step "I want the footnotes to be arranged in this way because it looks nice."
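
The three steps above can be sketched in a few lines of Python. This is only an illustration, not how the Make PDF package works: the sample sentence, the witness sigla and the function names extract_variants and render_footnotes are all invented for the example. Only the TEI elements app, lem and rdg come from the TEI critical apparatus module.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}

# Step 1: "I mark the variants as variants." (hypothetical sample text)
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">In the beginning '
    '<app><lem wit="#A">was</lem><rdg wit="#B">is</rdg></app>'
    ' the word.</p>'
)


def extract_variants(xml_text):
    """Step 2: "I want the variants to be listed in footnotes."

    Returns the running text with numbered note markers and the
    list of (lemma, [(witness, reading), ...]) that the notes carry.
    """
    root = ET.fromstring(xml_text)
    body = [root.text or ""]
    variants = []
    for child in root:
        if child.tag == f"{{{TEI_NS}}}app":
            lem = child.find("tei:lem", NS)
            lemma = (lem.text or "") if lem is not None else ""
            readings = [
                (rdg.get("wit") or "", rdg.text or "")
                for rdg in child.findall("tei:rdg", NS)
            ]
            variants.append((lemma, readings))
            body.append(f"{lemma}[{len(variants)}]")
        else:
            # Any other element is passed through as plain text.
            body.append("".join(child.itertext()))
        body.append(child.tail or "")
    return "".join(body), variants


def render_footnotes(variants, separator="; "):
    """Step 3: "I want the footnotes to be arranged in this way."

    Only this function knows about layout; changing it never touches
    the encoding or the decision to use footnotes.
    """
    lines = []
    for number, (lemma, readings) in enumerate(variants, start=1):
        listed = separator.join(
            f"{text} {wit.lstrip('#')}" for wit, text in readings
        )
        lines.append(f"{number}. {lemma}] {listed}")
    return "\n".join(lines)


text, variants = extract_variants(SAMPLE)
print(text)
print(render_footnotes(variants))
```

Because each step lives in its own function, the layout of the notes can be changed (for example a different separator) without re-encoding a single variant.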

Separating the steps also allows for much more control and, consequently, consistency. Mastering this concept is not easy, because it requires a postponement of satisfaction, and thus a higher perceived risk. This page will try to demonstrate the falsity of this widespread perception with evidence at each step, and serve as a reassurance that it is indeed the opposite which is true.

Identification of requirements

To start off on the right foot, one should be quite clear from the outset about the desired final output. In most cases, this will involve the requirements of the editors of the article or book and the structure of the entry for one manuscript. Typically, none of these requirements should prevent any of the following steps; that is to say, good solutions can be found to the excuse "the editor does not let me".

My suggestion: make a list or one example, as complete as possible, of the input (XML) and the output (your entry in the catalogue). Ideally, this should happen before the start of the project or as early in the project as possible.

Benefits to you

If you know what your desired finished product is from the beginning, everything will be easier. For example, if you already know for which entities you want to include indices, you will be able to mark up all concerned terms from the beginning without having to search for them or manually compile the indices at the end.
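
To illustrate why early markup pays off, here is a minimal sketch, assuming that persons are marked with a TEI persName element carrying a ref attribute. The two records, their identifiers (MS001, PRS0001, etc.) and the function name build_index are hypothetical and only stand in for real manuscript records; this is not the code of the Make PDF package.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily shortened manuscript records keyed by identifier.
RECORDS = {
    "MS001": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Copied by '
        '<persName ref="PRS0001">a scribe</persName> for '
        '<persName ref="PRS0002">a donor</persName>.</p></body></text></TEI>'
    ),
    "MS002": (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Owned by '
        '<persName ref="PRS0002">a donor</persName>, later given away by '
        '<persName ref="PRS0002">the same donor</persName>.</p></body></text></TEI>'
    ),
}


def build_index(records, element="persName"):
    """Map each referenced entity to the sorted list of records citing it."""
    index = defaultdict(set)
    for ms_id, xml_text in records.items():
        root = ET.fromstring(xml_text)
        for node in root.iterfind(f".//tei:{element}", NS):
            ref = node.get("ref")
            if ref:  # unmarked or unreferenced names cannot be indexed
                index[ref].add(ms_id)
    return {ref: sorted(ids) for ref, ids in sorted(index.items())}


person_index = build_index(RECORDS)
for ref, manuscripts in person_index.items():
    print(ref, ", ".join(manuscripts))
```

The index is recomputed from the markup at every run, so a name added to a record later appears in it without any manual compilation.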

Start from the output...

In the GitHub Make PDF repository (Beta Masaheft) you can find all the details of a customizable package that lets you produce a PDF output from XML records. It is the same one used in the Catalogue of the manuscripts of Dayr as Suryan.

The package is easily customizable with the settings file, not only for the layout features, but also for the parts, their order, and the contents. The main functionalities are documented here and the typesetting is done in Oxygen XML Editor using XSL-FO and Apache FOP.

To use it, please fork (not branch) the repository to your own GitHub account. If you open that package in Oxygen, open driver.xml and click play, the package will use BM to fetch resources from the research environment and print your book.

Benefits to you

The script will rely on your work file, which is included locally, and can therefore be only on your branch and not yet public. It will use the data stored in BM as XML and the DTS API to fetch information to compile your book. For example,

  1. it will always consistently print the parts that you want in the order that you chose,
  2. it will collect bibliography and print it in a specific section using the HLZ styles and the EthioStudies library pulling citations from all relevant files,
  3. it will format the text benefiting from the encoding,
  4. it will collect all named entities and print the indexes (also selected in settings),
  5. it will list plates, if you so desire.

Of course, as you change your data in the manuscript records, running the transformation will update the output.

All can be customized, entirely changed, adapted, features can be added, etc. The script is only one file, which you can read yourself or ask somebody who can read it for you.

Fine touches can be done in the XSL-FO, as this is also simply an XML file.

Benefits to the community

Anyone can follow the same steps, or some of them, to reproduce your work. Anyone will be able to find the exact point and layer of the information which can be criticized (or not), if they want to.

Encode, print, check, report, correct, repeat

You might work with manuscripts already encoded in BM or with new descriptions you carried out for your project. In the latter case, consider as a first step having a unified type of information, the TEI used by the project. Describing the manuscripts and linking the images and the text in the research environment will thus allow you to benefit from the existing information. Catalogue descriptions can be as long or short as desired.

Add the manuscripts, add new information, change it: simply encode following these guidelines and look at the output. The indexes will update at every run. Chances are that the output will not look as you want. If the change you need is not one of the customizable settings, you can either try to read the XQuery or simply ask somebody who knows it to help you. If you keep your project in GitHub, help will come as easily as a commit to your repository.

The more clearly you explain what the source is and what output you expect, the easier the fix.

Deposit Your Dataset

Working with the data in the research environment will provide you with stable citable references for each committed and merged version of your files. But what about the entire group of resources you have used? To make the set really reproducible and your collected resources citable as a set, you may deposit them in Zenodo or another similar repository.
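
One way to document exactly which files make up the deposited set is a small manifest listing each file with its size and checksum. The sketch below is only a suggestion under stated assumptions: the function name build_manifest, the manifest layout and the sample file MS001.xml are invented for illustration, and neither Zenodo nor BM requires this format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(folder):
    """List every file under folder with its size and SHA-256 checksum."""
    base = Path(folder)
    entries = []
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        data = path.read_bytes()
        entries.append(
            {
                "path": path.relative_to(base).as_posix(),
                "bytes": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
        )
    return entries


# Demonstration on a throwaway folder holding one hypothetical record.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "MS001.xml").write_text("<TEI/>", encoding="utf-8")
    manifest = build_manifest(tmp)

print(json.dumps(manifest, indent=2))
```

Stored next to the deposited files, such a manifest lets anyone check that the set they downloaded is the one that was used.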

Benefits to you

Your complete dataset will be citable as a group of resources. You may also submit a data paper to describe your dataset.

Copyright of remixed content

As every single page of our website states, as well as each file in the Beta maṣāḥǝft dataset, the copyright of the data belongs to the Akademie der Wissenschaften in Hamburg, Hiob-Ludolf-Zentrum für Äthiopistik. Sharing and remixing are permitted under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Therefore, books created by remixing and reusing this data (catalogues, editions, etc.) should have the same copyright, or a formulation equally or more permissive, adequately negotiated with the publisher.

Publication products generated from BM data must respect this copyright and should contain appropriate attribution to all contributors. The above PDF producing script does some of this, but not all that can be done!

Once all is ok and the book/article is ready, the licensing of the data, with the agreement of the above copyright owners, can be upgraded for commercial use as well, so that the publisher can print and sell (CC-BY-SA-NC to CC-BY-SA). This is possible and positive, as it allows even more reuse. A deposit of the source data as described above can be made and a DOI of the data deposit added to the book content.

Copyright of the book/article, product of the remix of data, can be silently or formally passed to the editor. The statement in the book, in respect to the original copyright, should state (C) editor name… CC-BY-SA, or an equally permissive licence, as the data on which it is based requires. Editors can commercialise it and sell the book respecting this copyright.

As soon as possible and following agreement with the editor, the data can additionally be made Open Access. The DOI of the open access book is added to the deposit of the data with the proper relation so that the two DOIs know about each other.

This example workflow yields the following distinct products, each of which is separately citable and adds to the others over time:

  1. publication of the source data NC until publication CC-BY-SA-NC (C) Akademie+HLCEES + Open Access and versioned in GitHub
  2. publication of the source data after publication CC-BY-SA (C) Akademie+HLCEES + Open Access and versioned in GitHub
  3. HTML versions of the data throughout CC-BY-SA-NC (C) Akademie+HLCEES + Openly available, with independently generated Permanent Identifiers for reference and stable URIs for data relations.
  4. Several additional data formats remixes with CC-BY-SA-NC (C) Akademie+HLCEES + Openly available
  5. publication of the dataset for the publication CC-BY-SA (C) Akademie+HLCEES + Open Access for the dataset with DOI
  6. the print publication CC-BY-SA (C) Editor
  7. the digital publication CC-BY-SA (C) Editor + Open Access for book with DOI.
  8. An additional version of the DOI of the data deposit, once this is updated with its relation to the DOI of the book.

To the knowledge of the author of this page, this flow is consistent licensing-wise, protects the contents from commercialisation until necessary, allows sales, has no impact on time or costs, is ready for open access, is quite simple, and enforces collaboration. It is therefore a win, win, win and win again, with no side effects!

Revisions of this page

  • Pietro Maria Liuzzo on 2021-11-18: first version based on Dorothea, Denis, Mersha's works
  • Dorothea Reule on 2021-12-06: corrected typos, reformulated slightly
