This project begins with a transliteration of the Voynich Manuscript and converts that transliteration into a TEI-compliant XML file. The goal is not to produce a new transcription of the manuscript, but to formalize an existing transliteration in a machine-readable scholarly format that supports analysis, preservation, and reuse.
I started by searching for any transliteration file of the Voynich Manuscript that I could find. After looking at the files on the https://www.voynich.nu/data/ site and reading through its README, I decided that the ZL3b-n.txt file would be the best choice, as it is a complete transliteration. It was last updated May 5, 2025, and uploaded to the website on June 4, 2025. The file is in IVTFF 2.0 format, which stands for Intermediate Voynich MS Transliteration File Format. This file is also in the public domain.
Here is a small sample of the formatting (this is not a direct copy of the file, just an example of its structure):
<f1r> <! $Q=A $P=A $F=a $B=1 $I=T $L=A $H=1 $C=1 $X=V>
# page 1
# text only
# Currier's Language A, hand 1
#
<f1r.1,@P0> <%>fachys.ykal.ar.ataiin.shol.shory.[cth:oto]res.y.kor.sholdy<!@254;>
<f1v.2,+P0> yteey.char.or.ochy<->dcho.lkody.okodar.chody
#
#
The document "IVTFF – Intermediate Voynich MS Transliteration File Format" thoroughly explains the file format and its use. I read it over and decided how each IVTFF construct would map to TEI.
The following shows how I identified each construct and what I changed it to.
<f1r> → <surface n="f1r">
<! > → <note type="inline">
# → <note>
<f1r.1,@P0> → <line n="1" rendition="#At #P0">
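To make the last mapping concrete, here is a minimal Python sketch of how a locus such as <f1r.1,@P0> could be turned into the <line> start tag. It is only an illustration, not the method used in this project (the actual conversion is done with the Invisible XML grammar further down). The names LOCUS_SYMBOLS, LOCUS_RE, and locus_to_line_tag are hypothetical, and the symbol-to-code table is copied from the at/ad/as/... rules of that grammar.

```python
import re

# Hypothetical lookup table: IVTFF locus symbol -> rendition code.
# The pairs mirror the at/ad/as/aq/an/am/al/ax rules of the grammar.
LOCUS_SYMBOLS = {
    "@": "#At", "+": "#Ad", "*": "#As", "=": "#Aq",
    "&": "#An", "~": "#Am", "/": "#Al", "!": "#Ax",
}

# <page.line,symbol + two-character locus type>, e.g. <f1r.1,@P0>
LOCUS_RE = re.compile(
    r"<(f\d+[rv]\d*|fRos)\.(\d+),([@+*=&~/!])([A-Z][01a-z])>"
)


def locus_to_line_tag(locus: str) -> str:
    """Return the <line ...> start tag for one IVTFF locus header."""
    match = LOCUS_RE.fullmatch(locus)
    if match is None:
        raise ValueError(f"not a recognised locus: {locus!r}")
    _page, line_no, symbol, locus_type = match.groups()
    rendition = f"{LOCUS_SYMBOLS[symbol]} #{locus_type}"
    return f'<line n="{line_no}" rendition="{rendition}">'


if __name__ == "__main__":
    print(locus_to_line_tag("<f1r.1,@P0>"))
```

The page name is parsed but dropped here, because the page is already recorded on the enclosing <surface>.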
The IVTFF file uses high-ASCII extensions, meaning that a symbol such as þ shows up as @254;. We don't want the numeric code; we want the Unicode character itself.
The easiest way to change all of these was a Python script that replaces each code with its Unicode character, which you can look at here.
Here is a portion of the Python file:
def replace_symbols(input_file, output_file):
    """Replace IVTFF high-ASCII codes such as @254; with Unicode characters."""
    replacements = {
        '@128;': '€',
        '@130;': '‚',
        '@131;': 'ƒ',
        '@132;': '„',
        # Continues on.
    }

    # Read the whole transliteration file into memory.
    with open(input_file, "r", encoding="utf-8") as f:
        content = f.read()

    # Swap every code for its Unicode character.
    for old, new in replacements.items():
        content = content.replace(old, new)

    # Write the updated text to a new file, leaving the original untouched.
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)


if __name__ == "__main__":
    input_path = "../ixml/ZL3b-n.txt"
    output_path = "../ixml/ZL3b-n_updated.txt"
    replace_symbols(input_path, output_path)
    print("Replacement complete.")
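As a possible alternative to listing every code by hand, the same replacement could be sketched with a single regular expression. This is not the script used in the project. It assumes that the full table follows the Windows-1252 code page, which the four entries shown above are consistent with but do not prove, and the names HIGH_ASCII and decode_high_ascii are hypothetical.

```python
import re

# Matches codes of the form @NNN; (three digits), e.g. @254;
HIGH_ASCII = re.compile(r"@(\d{3});")


def decode_high_ascii(text: str) -> str:
    """Replace @NNN; codes (128-255) with the Windows-1252 character.

    Assumption: the author's table follows Windows-1252. Codes that the
    code page leaves undefined, or that fall outside 128-255, are kept
    unchanged so nothing is silently lost.
    """
    def _substitute(match: re.Match) -> str:
        code = int(match.group(1))
        if not 128 <= code <= 255:
            return match.group(0)
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)

    return HIGH_ASCII.sub(_substitute, text)


if __name__ == "__main__":
    print(decode_high_ascii("kor.sholdy<!@254;>"))
```

Because the pattern requires three digits followed by a semicolon, locus headers such as <f1r.1,@P0> are left alone. The explicit dictionary has the advantage that every replacement is visible and can be checked against the IVTFF documentation by eye.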
Now the output from the example should look like this:
<f1r> <! $Q=A $P=A $F=a $B=1 $I=T $L=A $H=1 $C=1 $X=V>
# page 1
# text only
# Currier's Language A, hand 1
#
<f1r.1,@P0> <%>fachys.ykal.ar.ataiin.shol.shory.[cth:oto]res.y.kor.sholdy<!þ>
<f1v.2,+P0> yteey.char.or.ochy<->dcho.lkody.okodar.chody
#
#
Something I considered after creating this file was doing the replacement in the Invisible XML grammar instead, so that the process has fewer steps. I am on the fence about whether this would be beneficial, as it would add about 200 more lines to the IXML grammar, so it may be better to keep the two steps separate. More on this in the next section.
Because the IVTFF format has a regular structure, I opted to use Invisible XML to transform it into an XML file. Originally, I attempted to use a Python script to do this, but it proved to be much more complicated than an Invisible XML grammar.
To get Invisible XML to work, I had to write the following grammar. It parses the IVTFF formatting and serializes it as XML. For example, the grammar takes <-> and turns it into <figure/>.
sourceDoc: droppedPrelude?, surface+.
-droppedPrelude: firstNote, note, note, dumbHash.
firstNote: -"#", -noteBody, -newline.
surface: (surfaceN; surfaceNros), extraInfo, comment, newline, (note; dumbHash; line)*.
@surfaceNros: lab, "f", "R", "o", "s", rab.
@surfaceN: lab, "f", [N]+, ["r";"v"], [N]*, rab.
-extraInfo: (space; AnyNotSpace)*.
note: -"#", space, noteBody, newline.
-noteBody: noteChar*.
-noteChar: ~[#d;#a].
line: (lineN; lineNros), rendition, lineBody, newline.
@lineNros: -lab, -"f", -"R", -"o", -"s", -".", [N]+, -",".
@lineN: -lab, -"f", -[N]+, -["r";"v"], -[N]*, -".", [N]+, -",".
@rendition: (at;ad;as;aq;an;am;al;ax), +"#", ["A"-"Z"], ["0";"1";"a"-"z"], -rab.
-lineBody: (space; comment; figure; milestone-start; milestone-end; ligature; choice; unclear; plainChar; dumbHand)+.
comment: -"<", -"!", commentChar*, -">".
-commentChar: ~[#d;#a;">"].
figure: -"<", (-"-"; -"~"), -">".
ligature: -"{", plainChar+, -"}".
milestone-start: -"<", -"%", -">".
milestone-end: -"<", -"$", -">".
unclear: "?".
choice: -"[", unclearAlt?, ( -":", unclearAlt?)+, -"]".
unclearAlt: (ligature; choiceChar)+.
-choiceChar: ~[#d;#a;#20;"<";"[";"]";":";"{";"}"].
-dumbHand: -"<", -"@", -"H", -"=", -[N], -">".
-at: -"@",+"#At ".
-ad: -"+",+"#Ad ".
-as: -"*",+"#As ".
-aq: -"=",+"#Aq ".
-an: -"&",+"#An ".
-am: -"~",+"#Am ".
-al: -"/",+"#Al ".
-ax: -"!",+"#Ax ".
-lab: -["<"].
-rab: -[">"].
-space: -[#20].
-keepSpace: [#20].
-newline: -#d?, -#a.
-AnyNotSpace: ~[#d;#a;#20].
-NotSpace: ~[#d;#a;#20;"<"].
-plainChar: ~[#d;#a;#20;"<";"{";"}";"[";"]";":";"?"].
-dumbHash: -"#", -newline.
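To show what two of these rules do to the text of a line, here is a small Python sketch that imitates the figure and choice rules with regular expressions. It is an illustration only: the project uses the grammar above, not this code. The sketch ignores ligatures ({...}) and every other rule, and the names FIGURE_RE, CHOICE_RE, and convert_line_body are hypothetical.

```python
import re
from xml.sax.saxutils import escape

# figure: <-> or <~> becomes an empty <figure/> element.
FIGURE_RE = re.compile(r"<[-~]>")
# choice: [a:b:...] with at least one colon between the brackets.
CHOICE_RE = re.compile(r"\[([^\[\]]*:[^\[\]]*)\]")


def convert_line_body(text: str) -> str:
    """Imitate the figure and choice rules on one line of IVTFF text."""
    def _choice(match: re.Match) -> str:
        # Empty alternatives produce no element, as in the grammar,
        # where each unclearAlt is optional.
        alternatives = [alt for alt in match.group(1).split(":") if alt]
        inner = "".join(
            f"<unclearAlt>{escape(alt)}</unclearAlt>" for alt in alternatives
        )
        return f"<choice>{inner}</choice>"

    text = FIGURE_RE.sub("<figure/>", text)
    return CHOICE_RE.sub(_choice, text)


if __name__ == "__main__":
    print(convert_line_body("shory.[cth:oto]res.y"))
    print(convert_line_body("ochy<->dcho"))
```

Even this two-rule version needs care about ordering and nesting, which gives a sense of why a full Python converter grew more complicated than the grammar.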
Now the example file should look like this:
<sourceDoc>
<surface surfaceN="f1r">
<comment> $Q=A $P=A $F=a $B=1 $I=T $L=A $H=1 $C=1 $X=V</comment>
<note>page 1</note>
<note>text only</note>
<note>Currier's Language A, hand 1</note>
<line lineN="1" rendition="#At #P0">
<milestone-start/>fachys.ykal.ar.ataiin.shol.shory.<choice>
<unclearAlt>cth</unclearAlt>
<unclearAlt>oto</unclearAlt>
</choice>res.y.kor.sholdy<comment>þ</comment>
</line>
<line lineN="2" rendition="#Ad #P0">yteey.char.or.ochy<figure/>dcho.lkody.okodar.chody</line>
</surface>
</sourceDoc>
To make this a proper TEI file, I ran the output of the Invisible XML grammar through an XSLT stylesheet, which renames the elements and attributes to their TEI equivalents. For example, the XSLT changes <comment> to <note type="outline">, and lineN="1" to n="1".
You can view the whole file here.
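For readers who do not use XSLT, the kind of renaming described above can be sketched in Python with the standard library's xml.etree.ElementTree. This is an analogue for illustration, not the stylesheet itself. It covers only the renames mentioned or shown in this write-up, and the function name rename_to_tei is hypothetical.

```python
import xml.etree.ElementTree as ET


def rename_to_tei(root: ET.Element) -> ET.Element:
    """Rename IXML output names to the TEI names used in this write-up."""
    for elem in root.iter():
        if elem.tag == "comment":
            # <comment> becomes <note type="outline">
            elem.tag = "note"
            elem.set("type", "outline")
        elif elem.tag == "surface" and "surfaceN" in elem.attrib:
            # surfaceN="f1r" becomes n="f1r"
            elem.set("n", elem.attrib.pop("surfaceN"))
        elif elem.tag == "line" and "lineN" in elem.attrib:
            # lineN="1" becomes n="1"
            elem.set("n", elem.attrib.pop("lineN"))
        elif elem.tag == "milestone-start":
            # <milestone-start/> becomes <milestone unit="block" type="start"/>
            elem.tag = "milestone"
            elem.set("unit", "block")
            elem.set("type", "start")
    return root


if __name__ == "__main__":
    sample = (
        '<sourceDoc><surface surfaceN="f1r"><comment>c</comment>'
        '<line lineN="1" rendition="#At #P0"><milestone-start/>fachys'
        "</line></surface></sourceDoc>"
    )
    root = rename_to_tei(ET.fromstring(sample))
    print(ET.tostring(root, encoding="unicode"))
```

Attribute order in the serialized output may differ from the XSLT result, which does not matter to an XML parser.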
Now the output should look like this:
<sourceDoc>
<surface n="f1r">
<note type="outline"> $Q=A $P=A $F=a $B=1 $I=T $L=A $H=1 $C=1 $X=V</note>
<note>page 1</note>
<note>text only</note>
<note>Currier's Language A, hand 1</note>
<line n="1" rendition="#At #P0">
<milestone unit="block" type="start"/>fachys.ykal.ar.ataiin.shol.shory.<choice>
<unclearAlt>cth</unclearAlt>
<unclearAlt>oto</unclearAlt>
</choice>res.y.kor.sholdy<note type="outline">þ</note>
</line>
<line n="2" rendition="#Ad #P0">yteey.char.or.ochy<figure/>dcho.lkody.okodar.chody</line>
</surface>
</sourceDoc>
This file uses the Eva font, which was designed by Gabriel Landini. I should note that commercial use of this font is frowned upon, and that it should only be used privately. The font was updated in 2025 and can be downloaded here.