Part of file for epitome 142
id:142
title:Observationes de arte grammatica
%%
t1:Georgius ſymler Vuimpinenſis natione theuthonic <lb/> grâmatices obseruationes côpilauit
la:Georgius Symler Vuimpinensis natione teutonica grammatices obseruationes compilauit
es:Georg Symler, natural de Wimpfen, de la nación teutona, compiló unas 'Observaciones de gramática',
%%
t1:quib cuncta <lb/> fere grâmaticalia fundamenta potius agregata quâ <lb/> digesta uidentur
la:quibus cuncta fere grammaticalia fundamenta potius aggregata quam digesta uidentur,
es:en las que prácticamente todos los fundamentos gramaticales parecen entremezclados, antes que estructurados,
%%
t1:Vnde pro uectiorib potius quam <lb/> nouis tyrunculis illud lectitand censuerit
la:unde pro uectioribus potius quam nouis tirunculis illud lectitandum censuerit
es:de lo que habría aconsejado, más en favor de los más avanzados que de los principiantes jóvenes, leerlo repetidas veces,
%%
Informal skeleton overview:
- book [root element]
  - section [section 1 - ordered list - section = epitome]
    - head [header - metadata - full list below]
      - id [section identifier = epitome number]
      - title [section title = epitome title]
    - body [sets of aligned segments]
      - align [multilingual aligned segments - align 1 - number implicit by the order]
        - t0 [segment - clean or consolidated texts of t1 and t2]
        - t1 [segment - transliteration from source 1 - there might be only one source]
        - t2 [segment - transliteration from source 2 - up to N]
        - es [segment - Spanish translation]
        - note-t0 [note - for segment 0]
        - note-t1 [note - for segment 1]
        - note-t2 [note - for segment 2 - up to N]
        - note-es [note - for segment es]
      - align [multilingual aligned segments - align 2 - up to N]
  - section [section N]
XML skeleton examples: book - section
It is an implementation of the ADT in which each section is stored in a separate section file, with filenames derived from the id. In the case of El Libro, a section corresponds to an epitome; for example, the filename for epitome 142 is 142.txt. There are 1877 epitomes.
Each file is a plain text file in record-jar format: a series of records separated by %% at the beginning of a line, where each line is a key-value pair. The first record is the head and the remaining records form the body, where each record is an align: a set of aligned multilingual parallel segments (§10) with their corresponding notes. See also a full example.
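As a rough illustration of the format described above, the following Python sketch splits the text of a section file into records and key-value pairs. The function name `parse_record_jar` and the handling of blank or malformed lines are assumptions made for illustration, not part of the specification. The key is assumed to end at the first colon on the line, as in the file excerpt for epitome 142.

```python
def parse_record_jar(text):
    """Split record-jar text into records.

    Each record is a list of (key, value) pairs, kept in file order.
    """
    records = [[]]
    for line in text.splitlines():
        # A line starting with %% closes the current record.
        if line.startswith("%%"):
            records.append([])
            continue
        # Assumption: blank lines are ignored.
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons (e.g. URIs).
        key, sep, value = line.partition(":")
        if not sep:
            continue  # assumption: lines without a colon are skipped
        records[-1].append((key.strip(), value.strip()))
    # Drop empty records, e.g. the one after a trailing %% line.
    return [record for record in records if record]


sample = """id:142
title:Observationes de arte grammatica
%%
t1:quib cuncta
la:quibus cuncta
es:en las que
%%
"""

records = parse_record_jar(sample)
head = dict(records[0])   # first record: the head
body = records[1:]        # remaining records: one align each
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves the order of the segments within an align.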
The rationale is to lower IT technical barriers: transcribers should focus on transcription and not worry about IT aspects, all of which are taken care of. A plain text editor is sufficient, and there is no need to install additional programs. When saving the file, select the character encoding UTF-8 without BOM; this is often the default, though some editors might not offer these options.
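For tooling that reads these files, a byte order mark left by an editor would otherwise end up in front of the first key when the bytes are decoded naively. The sketch below shows one defensive way to decode a section file; the helper names `strip_bom` and `decode_section` are hypothetical and only illustrate the idea.

```python
import codecs


def strip_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark, if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def decode_section(raw: bytes) -> str:
    """Decode the bytes of a section file as UTF-8, tolerating a BOM."""
    return strip_bom(raw).decode("utf-8")
```

When writing files from Python, `open(path, "w", encoding="utf-8")` produces UTF-8 without a BOM.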
The head is the first record in the file and contains the header elements. Unused elements might be left blank or removed, and other elements might be added as needed. Note:
and uriprint
Element name | Content | Example | Description |
id | string | 142 | Epitome number |
regb | Registrum B number | 1386 | Catálogo Concordado |
regb-uri | id parameter in the query part of the Permalink | 091E8E601D10651B670F9F0 | Catálogo Concordado |
status | valid status | FINAL | status values |
date | YYMMDD[-HHMM] | 240305 | Timestamp of the last update |
ustc | USTC number | 689207 | Universal Short Title Catalogue (USTC) |
mei | URI MEI | | URI to the Material Evidence in Incunabula (MEI) |
lic | | CC BY-SA | Creative Commons Attribution-ShareAlike; BY-SA is the preferred license, others might apply |
ipr-head | name | John Doe | Intellectual Property Rights of the head |
ipr-body | name1,nameN | John Doe | Intellectual Property Rights of the body. Singular includes plural: the owner(s) of the transcription and translation, if the "sumista" (author) does not hold the IPR due to payment or other reasons |
curator | name1|nameN | John Doe | IT curator, ordered |
sum-t1 | name1|nameN | John Doe | Transcribers are called sumistas in honor of the original sumistas. Ordered. The first one is also the translator, unless indicated otherwise in a "sum-LC", and also the IPR owner, unless otherwise indicated in ipr-body. It might be anonymous or empty |
title | string | Marie Virginis corona | Title of book |
title-uri | URI | | URI to the book |
inc | string | Signum magnum | included; additional works included in the book |
lang | LC | la | Language; main language of the book |
materia | string | 184 | Reference to the Libro de las Materias, the number in a box next to some epitome number |
author | name1|nameN | Georgius Simler Vuimpinensis | Author(s) of the book |
author-uri | URI1|URIn | | URI for the author(s) |
tran | name1|nameN | Lorenzo Valla | Translator(s) |
tran-uri | URI1|URIn | | URI for the translator(s) |
print | name1|nameN | Iodocus Badius | Printer(s)/editor(s) |
print-uri | URI1|URIn | | URI for the printer(s)/editor(s) |
per | name1|nameN | Bonifacii de Ceva | Any other person related to the book |
per-uri | URI | | URI for the person(s) |
note-head | string | Lorem ipsum dolor sit amet | Header notes |
c-pfrom | number | 9 | From page number in the Copenhagen manuscript (LE-C) - empty means not in LE-C |
c-pto | number | 9 | To page number in the digitalised LE-C |
c-rfrom | rvreference | 3r | From page reference in recto-verso notation in digitalised LE-C |
c-ro | rvreference | 3r | To page reference in recto-verso notation in digitalised LE-C |
c-size | number | 0.25 | Size of the epitome in number of pages, mostly calculated programmatically - 0 means less than a page |
s-pfrom | string | | Not empty means in the Sevilla manuscript (LE-S); more related fields might be added later |
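To illustrate how two of the content types in the table above might be checked, here is a minimal Python sketch for the `date` format YYMMDD[-HHMM] and for the name1|nameN lists. The helper names `parse_date` and `split_names` are hypothetical. Reading the two-digit year as 20YY (for years up to 68) is an assumption that follows from Python's `%y` directive, not from the specification.

```python
import re
from datetime import datetime

# YYMMDD, optionally followed by -HHMM
DATE_RE = re.compile(r"\d{6}(-\d{4})?")


def parse_date(value):
    """Parse a head 'date' value such as 240305 or 240305-1430."""
    if not DATE_RE.fullmatch(value):
        raise ValueError(f"not in YYMMDD[-HHMM] format: {value!r}")
    fmt = "%y%m%d-%H%M" if "-" in value else "%y%m%d"
    return datetime.strptime(value, fmt)


def split_names(value):
    """Split a name1|nameN value into an ordered list of names."""
    return [name.strip() for name in value.split("|") if name.strip()]
```

An empty value yields an empty list from `split_names`, which matches the rule that unused elements might be left blank.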
The rest of the records are aligns: aligned multilingual parallel segments that should be as small as possible.
Two-character keys are reserved for segments containing transcriptions and language data.
The transcription is "translated" into the clean version of the same language; as most of the texts are in Medieval Latin with abbreviations, they are transformed into Classical Latin.
Element name | Example | Description |
t0 | Lorem ipsum | consolidated t1-tN sources; it might be absent; if only one source, it might be used for cleaned-up text; if present, translate all LC from this text |
t1 | Lorem ipsum | text of source 1 |
tn | Lorem ipsum | text of source N (1, 2, 3...); there might be only one source |
LC | Lorem ipsum | translation into language LC; this element might be repeated with different LC, for example la |