Towards open, multi-source, and multi-authors digital scholarly editions. The Ampère project.
Christine Blondel (CNRS, Centre Alexandre Koyré, Paris) and Marco Segala (University of L’Aquila)
The object of this talk is to present the contents, aims, and challenges of the edition of André-Marie Ampère’s writings within the Ampère project (www.ampere.cnrs.fr).
The Ampère project is both a digital scholarly edition of André-Marie Ampère’s publications, manuscripts, and correspondence and a website devoted to the history of electricity and magnetism – providing primary and secondary sources, multimedia documents with videos of historical experiments. At the moment the former is under substantial renovation through BaseX and Synopsis – in collaboration with the Atelier des Humanités Numériques of the Ecole Normale Supérieure at Lyon. We intend to present a beta version at the Dixit Convention.
Ampère’s corpus consists of 150 publications, mainly in the domains of mathematics, physics, and philosophy of science; 1300 letters (from and to Ampère); and 53000 pages of manuscripts covering all scientific disciplines (mathematics, physics, chemistry, astronomy, natural history), the human sciences (psychology, philosophy, linguistics), and personal writings (such as autobiography, personal journals, poetry). All the publications and letters and a selection of the manuscripts have been encoded according to the TEI and indexed, and will be annotated using the Pundit software.
We intend both to describe choices, challenges, and difficulties encountered during the process of encoding and indexing and to discuss the importance of implementing the digital edition with annotation software.
Choosing TEI for the encoded transcriptions was motivated not only by the need to adhere to established standards in digital scholarly editions. As Ampère’s corpus is composed of three kinds of texts (publications, manuscripts, and correspondence), TEI-based transcriptions make it easy both to establish connections and to support interrelated research among the three types of documents (the new website will provide a faceted search engine). We have opted for light encoding and the production of XML files that will make Ampère’s writings available for web edition, annotation, and further initiatives. We have enriched the transcriptions with Mathematical Markup Language (MathML) to give full searchability in the several texts containing substantial mathematical content.
The idea of enriching the website with annotation software arose from the need to offer not only a scholarly edition but also a research tool. Annotation is the first step in scholarly research that aims to analyse, compare, and connect different texts. This is why we have chosen software that supports annotation by different users (collaborative annotation): in our view, a digital scholarly edition must provide something more and something new compared with a printed scholarly edition.
We will conclude our talk with some reflections upon the comparison of digital and printed scholarly editions.
A new system for collaborative online creation of Scholarly Editions in digital form
Barbara Bordalejo (KU Leuven) and Peter Robinson (University of Saskatchewan)
This paper will outline the principles behind the Textual Communities editing environment, and demonstrate its implementation and use in several major projects.
Textual Communities differs from other online editorial systems in its explicit support for the two aspects of text: text as document and text as communicative act (corresponding broadly to the Text Encoding Initiative “<sourceDoc>” and “<text>” elements). Thus, it supports page-based transcription while understanding that text may flow across pages. Accordingly, text may be located according to its position on the page (so enabling fine-grained text and image linking), while also structured according to its intellectual content (as prose paragraphs within chapters, or verse lines within poems).
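The double view of text described above can be illustrated with a small data-model sketch. It is not the Textual Communities implementation: the class `Segment`, its field names, and the sample text are hypothetical and serve only to show how one stretch of text can carry both a document address (page and line) and an intellectual address (chapter and paragraph).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of transcribed text with two independent addresses."""
    text: str
    page: str   # document view: the page the segment sits on
    line: int   # document view: its position on that page
    unit: str   # communicative view: the paragraph or verse line it belongs to


# Hypothetical transcription, listed in reading order.
segments = [
    Segment("The first paragraph begins on this page", "1r", 1, "ch1/p1"),
    Segment("and continues overleaf.", "1v", 1, "ch1/p1"),
    Segment("A second paragraph follows.", "1v", 2, "ch1/p2"),
]


def by_page(segs):
    """Text as document: group the segments by page, ordered by line."""
    pages = defaultdict(list)
    for seg in sorted(segs, key=lambda s: (s.page, s.line)):
        pages[seg.page].append(seg.text)
    return dict(pages)


def by_unit(segs):
    """Text as communicative act: join the segments of each unit across pages."""
    units = defaultdict(list)
    for seg in segs:  # the input is assumed to be in reading order
        units[seg.unit].append(seg.text)
    return {unit: " ".join(parts) for unit, parts in units.items()}


print(by_page(segments)["1v"])
print(by_unit(segments)["ch1/p1"])
```

The page view supports text and image linking, while the unit view reassembles a paragraph that flows across a page break.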
Textual Communities also includes tools for managing large-scale collaborative projects inside a social media environment (wikis, blogs, bulletin boards and chat). Editors may create editorial communities, populate them with document images and transcripts, and invite collaborators to contribute to the project as co-leaders, transcribers or collators. Transcribers may then be assigned texts to transcribe (a page or pages of manuscripts or books) and submit them for approval when finished; the leader may then approve and commit each transcript for further processing (collation and analysis) and final publication. Transcripts of different versions of the same text may be collated using the CollateX system, augmented by tools for regularization and collation adjustment. Textual Communities also offers a complete API so that materials created within the environment may be exported and used within webpages, either as XML or as JSON arrays. To enable re-use, materials created within Textual Communities are by default open access, licensed either as CC-BY (Creative Commons attribution) or CC-BY-SA (Creative Commons attribution share-alike).
As well as offering examples of the projects currently using Textual Communities, the talk will reflect on some of the implications of the project. An explicit motivation for the conception of Textual Communities is that software should serve editors, and not the other way about. To put this another way: software to help editors should be built according to a clearly articulated and valid editorial model. This paper will reflect on how far Textual Communities achieves this, and what remains to be done.
The role of digital scholarly editors in the design of components for cooperative philology
Federico Boschetti, Riccardo Del Gratta, Angelo Del Grosso
This contribution focuses on the role of the digital scholarly editor in the continuous process of analysis, development and evaluation of libraries of components for cooperative philology. Following a general trend, developers in the digital humanities are progressively shifting from a project-driven approach to a community-driven paradigm. This shift is prompted by the increasing aggregation of scholars in communities of practice that express common requirements and share best practices.
In most cases, service providers are responding to these needs by offering web services quickly developed by taking into account the specific functionality that they expose or, worse, by wrapping legacy code. Although a pipeline of web services devoted to linguistic analysis and collaborative annotation provides many advantages in terms of flexibility, we are concerned about the impact of its main drawbacks and are therefore studying alternative or complementary solutions for our domain.
Maintainability, performance and atomicity are the principal issues in which we are interested. In a chain of web services, the overall system depends on the status of the single nodes, and medium or small projects are not always able to guarantee the necessary level of redundancy or caching strategies. Performance is affected by the trade-off among challenging conditions (e.g. memory resources, computational overload, bandwidth). Atomicity influences the reusability and the extension of services (e.g. from many points of view, Latin metrical analysis is very similar to ancient Greek metrical analysis, but a web service that atomically provides the former could be totally unusable for the latter).
At the Cooperative Philology Lab (Institute of Computational Linguistics “A. Zampolli”, CNR, Pisa) we try to address these issues by designing and developing a library of components for the domain of scholarly editing. A library can be installed locally or remotely and it provides multiple choices for maintenance and performance tuning. But above all a library of components provides the building blocks to shape local or remote services at the adequate level of atomicity, in order to ensure reusability and extendibility.
The role of the digital scholarly editors with whom we have collaborated in pilot and funded projects at the CNR-ILC is crucial, because they provide the necessary use cases that we are generalizing for the design of our library. During the workshop, we would like to stress the importance of a new generation of digital scholars who are not only creators of digital resources and consumers of computational tools or web infrastructures, but also actors in the analysis of requirements and in the evaluation of the libraries of components devoted to their activities.
Bozzi, “Computer-assisted scholarly editing of manuscript sources,” in New publication cultures in the humanities: exploring the paradigm shift, Davidhazi, Ed. Amsterdam: Amsterdam University Press, 2014, pp. 99-115. [Online]. Available: http://www.oapen.org/record/515678
McGann, “From text to work: Digital tools and the emergence of the social text,” Variants: The Journal of the European Society for Textual Scholarship, vol. 4, pp. 225-240, 2005.
Robinson, “Towards a scholarly editing system for the next decades,” in Sanskrit Computational Linguistics, ser. Lecture Notes in Computer Science, G. Huet, A. Kulkarni, and P. Scharf, Eds. Springer Berlin Heidelberg, 2009, vol. 5402, pp. 346-357. [Online]. Available: http://dx.doi.org/10.1007/978-3-642-00155-0_18
Robinson, “Towards a theory of digital editions,” Variants, no. 10, pp. 105-131, 2013.
Siemens, M. Timney, C. Leitch, C. Koolen, A. Garnett et al., “Toward modeling the social edition: An approach to understanding the electronic scholarly edition in the context of new and emerging social media,” Literary and Linguistic Computing, vol. 27, no. 4, pp. 445-461, 2012.
Inventorying, transcribing, collating: basic components of a virtual platform for scholarly editing, developed for the Historical-Critical Schnitzler Edition
The Trier Center for Digital Humanities is currently involved in the creation of a number of digital editions, whether by preparing and providing data or by developing software solutions. In this context, the Arthur Schnitzler Edition [Arthur Schnitzler: Digitale historisch-kritische Edition (Werke 1905–1931)] holds a special position because of its complexity and scope. A digital platform for scholarly editing is being created for it, which is to cover all steps, from the first assessment and inventory of textual witnesses, through their transcription, to the comparison of the resulting texts. It will be reusable, as a whole or in parts, by other editions. The technical infrastructure is provided by FuD, whose database environment serves as a juncture for specific modules, which are presented briefly below.
The research network and database system (Forschungsnetzwerk und Datenbanksystem) FuD forms the technical backbone of the platform, offering a set of features for decentralized collaborative work. On the one hand it allows the inventory (metadata capture, grouping) and commentary (creation of indexes) of the material, on the other hand it provides a database environment which manages all created content and thus represents the intersection between the individual modules. Although these have XML interfaces and can be run independently, the access to a central database facilitates collaboration and has advantages when dealing with concurrent hierarchies and structures transcending document borders.
Transcription of textual witnesses: Transcribo
The transcriptions are established in Transcribo, a graphical editor developed to meet the exact needs of the project. Technically, it had to be able to communicate with FuD, to handle large image files without delay and to support an extensive set of specific annotations. The user interface sets the digital facsimile (generally the scanned physical witness) at the centre. This appears in double form, always providing the original for close examination while all processing steps take place on a slightly attenuated duplicate. This arrangement accommodates the use of multiple monitors and above all saves time-consuming jumping back and forth between image and editor window. Thus, field markings in rectangular or polygonal shape can be set topographically exactly around the graphic unit to be transcribed, and the transcribed text can then be entered directly. For the processing of typescripts an additional OCR with options for image enhancement (such as contrast and color adjustment) is integrated, providing a raw transcription as a basis for further editing.
More fundamentally, each transcribed element can be provided with a comment and each relevant philological or genetic phenomenon can be annotated in a uniform way. Here, a context menu is used with a set of options which is project-specific, with capacity for expansion. It will be adjusted as necessary throughout the project’s lifecycle, according to the requirements of the textual basis.
Additionally, we are developing a graphic environment for textual comparison, as existing solutions had proven to be inadequate for a number of reasons. Firstly, the variation between the textual witnesses can at times be quite significant, with the challenge being to spot the rare similarities rather than recording differences. Secondly, it is often necessary to compare a large number of witnesses simultaneously. Finally, we want to visualise the results in different contexts, with varying granularity. The comparison process is therefore divided into two phases: at first, larger linguistic or structural units (depending on the text: sentences, paragraphs, speeches, scenes…) are matched, regardless of their position in the overall text. This is then checked by a philologist, corrected if necessary, and then handed on for detailed comparison.
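The two-phase comparison described above can be sketched in a few lines. This is an illustration only and not the code of the Schnitzler platform: it uses the `difflib` module from the Python standard library as a stand-in for whatever similarity measure the project uses, and the function names, the threshold of 0.6, and the sample sentences are assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_units(units_a, units_b, threshold=0.6):
    """Phase 1: pair each unit of witness A with its most similar unit in
    witness B, regardless of where that unit stands in the overall text."""
    if not units_b:
        return []
    pairs = []
    for i, unit_a in enumerate(units_a):
        scored = [(SequenceMatcher(None, unit_a, unit_b).ratio(), j)
                  for j, unit_b in enumerate(units_b)]
        best_ratio, best_j = max(scored)
        if best_ratio >= threshold:  # below the threshold: leave unmatched
            pairs.append((i, best_j))
    return pairs


def compare_words(unit_a, unit_b):
    """Phase 2: detailed, word-level comparison of two matched units."""
    words_a, words_b = unit_a.split(), unit_b.split()
    opcodes = SequenceMatcher(None, words_a, words_b).get_opcodes()
    return [(tag, words_a[i1:i2], words_b[j1:j2])
            for tag, i1, i2, j1, j2 in opcodes if tag != "equal"]


# Hypothetical witnesses in which the sentences appear in a different order.
witness_a = ["He opened the door slowly.", "The night was cold."]
witness_b = ["The night was very cold.", "Nobody spoke.", "He opened the door."]

pairs = match_units(witness_a, witness_b)
# In the workflow described above, a philologist would check and correct
# these pairs before they are handed on to the detailed comparison.
for i, j in pairs:
    print(witness_a[i], "|", witness_b[j], "|", compare_words(witness_a[i], witness_b[j]))
```

Separating the phases means that the expensive detailed comparison only runs on units that have already been judged to correspond.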
Outlook: material collection and genetic pathways
The platform is to be completed by an environment for ordering textual witnesses and defining genetic pathways. This is still in the design phase and should in due course make it possible to associate textual witnesses with the respective works or versions via drag and drop and to define sequences from various perspectives (genesis of a version, absolute chronological order, interdependence of versions …).
correspSearch: A Web Service to Connect Diverse Scholarly Editions of Letters
Stefan Dumont, Berlin-Brandenburg Academy of Sciences and Humanities, email@example.com
Letters are an important historical source. First, they may contain comments from contemporaries on a wide range of topics, events, and issues. Second, letters offer insights into connections and networks between correspondence partners. Questions therefore arise which can only be answered across the borders of scholarly letter editions, because these editions usually focus on partial correspondences (those of a certain person, or the correspondence between two specific persons). Answering them requires time-consuming searches across various letter editions. This has been a well-known problem for quite some time. It has led Wolfgang Bunzel, who works in the field of Romanticism research, to request:
“the creation of a decentralized, preferably open digital platform, based on HTML/XML and operating with minimal TEI standards, which is extensible in different directions and allows for existing web portals and websites to contribute at the lowest possible cost. This doesn’t request some kind of super structure which covers the entire amount of letters from the Romantic era (which could not be estimated exactly, anyway) but rather an intelligent linking system, which associates existing documents with one another. The creation of such nexus will naturally lead to research options reaching from searches for persons and places to specific keyword-based searches […]”
With “correspSearch” (http://correspSearch.bbaw.de) this paper will present a web service which takes a step in this direction by aggregating metadata of letters from various (digital or printed) scholarly editions and providing them collectively via open interfaces. Each project can provide its metadata in a freely licensed TEI XML file available online, which conforms to the Correspondence Metadata Interchange (CMI) format. The CMI format was developed by the TEI Correspondence SIG and is based mainly on the new TEI element correspDesc, but in a restricted and reduced form. To identify persons and places, IDs from authority files are used (e.g. VIAF, GND etc.). The web service collects these TEI XML files automatically and periodically and offers all researchers a central web interface to search for letters in diverse scholarly editions and repositories. Furthermore, via an Application Programming Interface (API) the gathered data can also be queried and retrieved by other web applications, e.g. digital scholarly editions. Thus, researchers can explore letters in diverse scholarly editions as parts of larger correspondence networks.
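The aggregation idea can be sketched with the Python standard library. The two fragments below are hypothetical and heavily abbreviated (a real CMI file carries a full TEI header), and the example.org URIs merely stand in for authority-file IDs such as GND or VIAF. The function names are illustrative; this is neither the correspSearch code nor its API.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, abbreviated metadata as two different editions might publish it.
EDITION_A = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>
  <correspDesc ref="https://example.org/edition-a/letter/1">
    <correspAction type="sent">
      <persName ref="https://example.org/authority/person/1">Writer One</persName>
      <placeName ref="https://example.org/authority/place/1">Berlin</placeName>
      <date when="1810-05-01"/>
    </correspAction>
    <correspAction type="received">
      <persName ref="https://example.org/authority/person/2">Writer Two</persName>
    </correspAction>
  </correspDesc>
</profileDesc></teiHeader></TEI>"""

EDITION_B = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><profileDesc>
  <correspDesc ref="https://example.org/edition-b/letter/7">
    <correspAction type="sent">
      <persName ref="https://example.org/authority/person/2">Writer Two</persName>
      <date when="1811-02-10"/>
    </correspAction>
    <correspAction type="received">
      <persName ref="https://example.org/authority/person/3">Writer Three</persName>
    </correspAction>
  </correspDesc>
</profileDesc></teiHeader></TEI>"""


def read_letters(xml_text):
    """Turn every correspDesc element into a small dictionary."""
    records = []
    for desc in ET.fromstring(xml_text).iterfind(".//tei:correspDesc", TEI_NS):
        record = {"letter": desc.get("ref")}
        for action in desc.iterfind("tei:correspAction", TEI_NS):
            person = action.find("tei:persName", TEI_NS)
            place = action.find("tei:placeName", TEI_NS)
            date = action.find("tei:date", TEI_NS)
            record[action.get("type")] = {
                "person": person.get("ref") if person is not None else None,
                "place": place.get("ref") if place is not None else None,
                "date": date.get("when") if date is not None else None,
            }
        records.append(record)
    return records


def letters_involving(person_ref, records):
    """Find letters sent or received by one person, across all editions."""
    hits = []
    for record in records:
        people = {record.get(kind, {}).get("person") for kind in ("sent", "received")}
        if person_ref in people:
            hits.append(record["letter"])
    return hits


records = read_letters(EDITION_A) + read_letters(EDITION_B)
print(letters_involving("https://example.org/authority/person/2", records))
```

Because both editions identify the second correspondent by the same authority ID, the query finds that person's letters in both files without any name matching.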
Wolfgang Bunzel: Briefnetzwerke der Romantik. Theorie – Praxis – Edition. In: Anne Bohnenkamp und Elke Richter (Ed.): Brief-Edition im digitalen Zeitalter (=Beihefte zu editio Bd. 34) Berlin/Boston 2013. p. 109-131, here p. 117. Please note that this is my own translation into English.
ediarum – A Digital Work Environment for Scholarly Editions
Martin Fechner, TELOTA, Berlin-Brandenburg Academy of Sciences and Humanities, firstname.lastname@example.org
Experience shows that although using TEI-XML encoding for the digital transcription and annotation of manuscripts can improve research in edition projects, the readiness to implement it relies greatly on the user-friendliness of the entry interface. From the perspective of a researcher, working directly in XML simply doesn’t compare to the ease of programs like MS Word. A new software solution must therefore offer at least the same amount of editorial comfort as such programs. Ideally, it would also encompass the complete life-cycle of an edition: from the first phases of transcription to the final publication. TELOTA has developed such a software solution, “ediarum”, which can be adapted to the needs of different research projects.
The central software component of the new digital work environment “ediarum” is Oxygen XML Author. The researcher does not edit the XML code directly, but instead works in a user-friendly Author mode. Additionally, a toolbar is provided with which the researcher can enter markup, including complex markup, with the push of a button. In this way text phenomena such as deletions or additions, or editorial commentary, are easily inserted. Person and place names can also be recorded with their appropriate TEI markup, and in addition they can be simultaneously linked to the corresponding index.
All documents are stored in the open source database eXistdb. With an “ediarum” package for eXistdb it is easy to set up a new database, and with the schema documentation component one can adapt and document the TEI subschema in use.
Besides creating a digital work environment, a website can also be built based on eXistdb, XQuery, and XSLT. Through the website the researchers can easily page through or search the current data inventory.
Through the integration of ConTeXt into “ediarum”, a further publication type, a print edition, can automatically be generated as a PDF from the TEI XML documents. The layout and format imitate previously printed volumes of critical editions and can be configured. In this way the different apparatuses appear as footnotes that refer to the main text with the help of line numbers and lemmata. The print edition can also provide the suitable index for each transcription and resolve any cross-references between manuscripts.
This work environment has been in use since 2012 by the staff of different research projects at different institutions for their daily work. For each project the TEI XML schemata and main functions were customized to the different manuscript types and needs.
TEI/XML is a set of guidelines and a data format maintained by the Text Encoding Initiative (TEI) for creating digital editions. It has become an important tool for creating digital editions and as such plays a major role within the Dixit project. The creation of transcriptions using TEI/XML is a well-covered subject. Numerous workshops have trained participants to model and encode data. Furthermore, well-known and mature tools, such as the Oxygen XML editor, are available to aid this process.
How to easily query and publish the data contained in TEI/XML files is a more open question. The currently proposed solutions are XSLT for publishing and XQuery (for example with the ExistDB XML database) for querying.
For people not well versed in programming, learning XSLT and XQuery can be quite difficult. Programming itself is not easy to learn, but specific aspects of XSLT and XQuery further increase the difficulty. XSLT and XQuery are functional programming languages rather than imperative ones (as, for example, C, Python and Ruby are). The code of imperative programming languages consists of commands that the computer executes in order, which is easier for a beginner to understand. Furthermore, XSLT files are themselves XML files, which are rather verbose; the plain text files used by other programming languages are more compact, which makes it easier to grasp the goal of the code.
This presentation will propose the use of the Python programming language, as an alternative to XSLT and XQuery, to allow editors to query, manipulate and publish the data contained in TEI/XML files.
Python is an imperative and dynamically typed programming language. These aspects make it relatively easy to learn. It has gained considerable popularity in recent years within the digital humanities.
Through this presentation the presenter hopes to inspire attendees to learn programming in general and Python in particular or otherwise generate debate about the different approaches.
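To give an impression of what such a Python approach might look like, here is a minimal sketch that uses only the `xml.etree.ElementTree` module from the standard library. The sample document, its `ref` values and the variable names are invented for the illustration; the abstract does not say which Python library the presenter has in mind.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A hypothetical, heavily abbreviated TEI transcription.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Letter from <persName ref="#p1">Anna</persName> to <persName ref="#p2">Bert</persName>.</p>
    <p><persName ref="#p1">Anna</persName> writes again from <placeName>Leiden</placeName>.</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)

# Query: how often is each person mentioned?
mentions = Counter(el.get("ref") for el in root.iterfind(".//tei:persName", TEI_NS))

# Publish: the plain reading text of each paragraph, with whitespace normalised.
paragraphs = [" ".join("".join(p.itertext()).split())
              for p in root.iterfind(".//tei:p", TEI_NS)]

for ref, count in mentions.most_common():
    print(ref, count)
for paragraph in paragraphs:
    print(paragraph)
```

Each step is an ordinary statement executed in order, which is the property of imperative code that the abstract argues is easier for beginners to follow.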
The tranScriptorium project (2013-2015), funded by the EU Seventh Framework Programme, seeks to deliver solutions for the indexing, search, and full transcription of historical handwritten manuscript images, using modern Handwritten Text Recognition (HTR) technology. This paper aims to introduce two mutually-beneficial tools developed for this project: Transkribus and TSX.
Transkribus is a transcription and recognition platform integrating all core tools developed in tranScriptorium, namely Layout Analysis (LA) and Handwritten Text Recognition (HTR). With Transkribus users are not only able to generate thorough transcriptions which are directly linked to the relevant manuscript images, to collaborate with other researchers and to export their transcriptions in several standard formats, such as TEI, RTF, PDF but also to train HTR models and to recognize handwritten and printed documents. The platform addresses humanities scholars, archives and libraries, as well as computer scientists and volunteers to contribute and share resources.
TSX is a lightweight, browser-based transcription platform, designed to allow scholars and archives to easily set up discrete crowdsourced transcription projects. In practice, a document is uploaded to the Transkribus server, where it is subjected to line segmentation and the HTR engine, and then exposed to the crowd via TSX for transcription and encoding.
Volunteer transcribers benefit from TSX’s intuitive interface, and the HTR technology allows them to work with the manuscripts in three interconnected ways. They can transcribe and encode the manuscript with no assistance from the HTR engine; request a full HTR engine-produced transcript of a given manuscript image and correct it; or use interactive transcription to request suggestions for a subsequent word or words. TSX thus seeks to support users of all levels, and with varying amounts of free time, with the aim of producing transcriptions which are saved on the Transkribus server.
This paper will discuss the functionality of the two tools, and how Transkribus and TSX can meet the various needs of scholars, archives, and volunteer transcribers, while bringing cutting-edge HTR technology to each user community.
The best digital editions of correspondence available today—the Van Gogh Letters Project, Mapping the Republic of Letters, Electronic Enlightenment—are true flagships for the current state of the art in digital humanities. But I would like to draw attention to a different type of project, more modest but at the same time more accessible to non-DH specialists who may contemplate digital source publication: an Omeka-based, almost DIY document collection that is constantly expanding and has evolving metadata that fits imperfectly within the Dublin Core element set. My presentation will reflect on the advantages and disadvantages of omeka.net and it will offer a historian’s “lay” perspective on the constraints and rewards of using controlled vocabularies, CSV imports, and Dublin Core elements for research and teaching purposes.
Our document collection (Vincentian Missionaries in Seventeenth-Century Europe and Africa: A Digital Edition of Sources) actually began in an XML and TEI environment, which was handled for us by the enthusiastic team of the now-closed Digital Humanities Observatory (Dublin, Ireland). When our initial funding period ended, the digital edition—which was only an optional component of a larger research project led by Dr Alison Forrestal at the National University of Ireland, Galway—was far from completed. Two years later, we decided to revive it with the help of a new sponsoring institution (DePaul University, Chicago), which had a keen interest in our topic but could only offer limited funding. Their Digital Services staff recommended moving our data to omeka.net because of the user-friendliness and export capabilities of that platform. Omeka.net indeed offers limited customization and visualization options, but it is a growing platform with an increasing number of plugins; most importantly, it does not require heavy technical assistance. I liked the arrangement because handling the collection was not my main activity and Omeka offered the flexibility and freedom of a true passion project—one on which I worked with only one student assistant, sometimes less than one day a week. And yet the collection grew exponentially. Two years ago, the online edition had 120 items and no attached transcriptions. It now has 630 items, 573 attached transcriptions, increasingly complex metadata, and it is still expanding.
From manuscript to digital edition: The challenges of editing Middle English alchemical texts
Sara Norja, email@example.com
Alchemy, later considered a pseudo-science, was one of the first experimental sciences in the Middle Ages and influenced the development of chemistry. A multitude of English medieval alchemical manuscript texts survive, written in both Latin and the vernacular. However, the uncharted material vastly outnumbers the texts edited so far. Indeed, according to Peter J. Grund (2013: 428), the editing of alchemical manuscript texts can be called “the final frontier” in Middle English (ME) textual editing. There are currently no digital editions of ME alchemical texts, although one is under preparation (Grund 2006). Indeed, there is to my knowledge only one digital edition of alchemical texts from any period: Isaac Newton’s alchemical manuscript material (Newman 2005).
In order for this branch of ME scientific writing to be used by e.g. historical linguists, more alchemical texts need to be edited – preferably in a digital form compatible with corpus search tools. This paper will discuss the challenges presented by ME alchemical texts and the ways in which a digital edition can address those challenges. A group of previously unedited and unresearched alchemical manuscript texts will act as a case study. These 10 texts date from the 15th–17th centuries and have been (falsely) attributed to the scholar Roger Bacon (c. 1214–1292?). My doctoral dissertation will include a digital scholarly edition of these Pseudo-Bacon texts, building on the standards for digital editing of ME texts proposed by Marttila (2014).
ME alchemical texts present several challenges to the editor. Perhaps the chief among these is that due to their fluid nature, with scribes combining and recombining sections from various sources, it is often difficult to actually define what a certain ‘text’ is. This is a challenge if one is considering a printed edition where it is not feasible to attempt a documentary record of all the possible variations. However, a digital edition can solve this problem: because of the lack of issues such as printing costs, a digital edition of alchemical texts can provide all the versions of a text and represent their interrelations in a flexible manner. In addition, digital editions can easily provide multiple representations of alchemical texts with varying degrees of normalisation, thus catering to audiences both scholarly and popular.
In this paper I will discuss the issue of textual fluidity with regard to digital editing of alchemical texts, as well as other issues which can be resolved, or at least made less complex, with digital means.
Grund, Peter J. 2013. ‘Editing alchemical texts in Middle English: The final frontier?’. In Vincent Gillespie and Anne Hudson (eds.), Probable Truth: Editing Medieval Texts from Britain in the Twenty-First Century. Turnhout, Belgium: Brepols, 427–42.
Marttila, Ville. 2014. ‘Creating Digital Editions for Corpus Linguistics: The case of Potage Dyvers, a family of six Middle English recipe collections’. PhD dissertation. University of Helsinki, Department of Modern Languages. [http://urn.fi/URN:ISBN:978-951-51-0060-3, accessed 26 June 2015]
The Eep Talstra Centre for Bible and Computer (ETCBC) specializes in the study of the Hebrew Bible. Its research themes are:
linguistic variation in the historical corpus;
identification of the linguistic system of Hebrew and the way it is used in narrative, discursive and poetic genres;
solving interpretation riddles using thorough data analysis.
To this end, the ETCBC has created a text database annotated with linguistic information, which harbours decades of encoding work.
Recently, the CLARIN-NL project SHEBANQ has made this work available online and has produced several tools to work with this database. We found the Linguistic Annotation Framework (ISO, Laurent Romary and Nancy Ide) a well-suited formalism on which to base our curation efforts and for which to develop tools, and so we did: LAF-Fabric is a new Python tool for handling big LAF resources, and SHEBANQ is a web tool that can save textual queries as annotations.
Our approach deviates from the more usual workflow that encodes manuscripts resulting in XML documents, marked up according to the TEI guidelines. In our case, the text does not have a single plain text representation, but several. The linguistic information is not organized in a single hierarchy, but in several ones. At the same time, the text portion of the corpus is not growing, while the annotations do grow with every research project that is carried out. All these factors nudged us to adopt a radical standoff approach.
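The standoff idea can be made concrete with a toy sketch. It is a simplification for illustration, not the LAF data model and not the LAF-Fabric API: the class `Annotation`, the layer names, and the English sample words are all hypothetical. The point is that the word list never changes, while annotations in independent layers refer to it by position and may overlap freely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A standoff annotation: it points at the text instead of wrapping it."""
    layer: str   # the hierarchy or annotation package it belongs to
    label: str
    start: int   # index of the first word (inclusive)
    end: int     # index after the last word (exclusive)


# The fixed text portion of the corpus (hypothetical sample).
WORDS = ("the", "king", "spoke", "and", "the", "people", "listened", "quietly")

# Two hierarchies that do not nest: clause c2 crosses the verse boundary.
annotations = [
    Annotation("verse", "v1", 0, 4),
    Annotation("verse", "v2", 4, 8),
    Annotation("clause", "c1", 0, 3),
    Annotation("clause", "c2", 3, 8),
]

# A later research project adds its own package without touching the text.
annotations.append(Annotation("discourse", "narration", 0, 8))


def text_of(annotation):
    """Recover the words an annotation refers to."""
    return " ".join(WORDS[annotation.start:annotation.end])


def layers_at(position, anns):
    """List, per layer, the annotation that covers a given word position."""
    return {a.layer: a.label for a in anns if a.start <= position < a.end}


print(text_of(annotations[3]))
print(layers_at(3, annotations))
```

With inline markup the overlapping clause and verse would have to be forced into a single tree; with standoff references each layer stays independent and new layers can be added at any time.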
In our presentation, we will show in an interactive way how we are using LAF-Fabric to analyse our data, to create new visualizations, and to add new annotation packages. We will also demonstrate the SHEBANQ website, which helps researchers to discover, create, share, publish and comment on textual queries on the corpus. Queries represent value in that (a) it requires considerable skill to design them, and (b) they point to interesting properties of the text that are not easily visible to the unaided eye, even if you are a Hebrew scholar.
Because we have archived our data for Open Access and developed our tools as Open Source, every part of the analysis with these tools is reproducible. We can also play nice with other tools, and as an example we will show how SHEBANQ interlinks with Bible Online Learner, an educational tool based on the same data and queries.
Developing a Scholarly Collation Editor for New Testament Manuscripts
Catherine Smith, Institute for Textual Scholarship and Electronic Editing, University of Birmingham
This paper will introduce the collation editor developed as part of the Workspace for Collaborative Editing funded by the AHRC (UK) and DFG (Germany) between 2010 and 2013 (http://www.itsee.birmingham.ac.uk/). The goal of the Workspace project was the creation of an online platform to support the production of the Editio Critica Maior of the Greek New Testament by teams based in Birmingham, Münster and Wuppertal and other international collaborators. The Workspace project uses XML and JSON as its data formats and includes a WYSIWYG editor for transcribing manuscripts (produced by the Trier Center for Digital Humanities) and a collation editor for creating the Greek apparatus.
After introducing the Workspace project as a whole, the paper will focus on the collation editor. The collation editor provides an interface to the CollateX engine developed by the INTEREDITION project (http://www.interedition.eu/ and http://collatex.net/). This software performs one of the most mechanical and error-prone tasks in an edition, namely the comparison of all witnesses in each variation unit to build up a critical apparatus. Each file is aligned using an algorithm taking into account not just spelling variations, additions, omissions and substitutions, but also transpositions within each block of text. However, the output still requires considerable input from scholars in order to clean up the raw data for publication as a critical apparatus. The first stage of the collation editor is regularisation (the elimination of insignificant variations such as spelling errors), which works interactively with CollateX. The user is able to make regularisations and then recollate the data; this can often improve the alignment of the witnesses. The second stage involves setting the length of each variation unit, as well as correcting any misaligned text. It also enables users to make overlapping variants for data which is best displayed as two units of different lengths. The interfaces have been developed using the redips drag-and-drop library (https://github.com/dbunic/REDIPS_drag), which enables users to interact directly with the data.
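The regularise-then-recollate cycle can be sketched in a few lines of Python. This is an illustration only: it uses the standard library's `difflib.SequenceMatcher` as a stand-in aligner, not CollateX, and it does not detect transpositions. The function names, the rule dictionary and the English toy tokens are assumptions made for the example.

```python
from difflib import SequenceMatcher


def regularise(tokens, rules):
    """Replace each token by its regularised form where a rule exists."""
    return [rules.get(token, token) for token in tokens]


def collate(base, witness, rules=None):
    """Align two token lists and return the places where they differ.

    Alignment runs on the regularised tokens, but the original readings
    are reported, so an editor still sees what each witness has.
    """
    rules = rules or {}
    matcher = SequenceMatcher(
        a=regularise(base, rules),
        b=regularise(witness, rules),
        autojunk=False,
    )
    units = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            units.append((tag, base[i1:i2], witness[j1:j2]))
    return units


# Invented toy witnesses; real input would be Greek manuscript text.
base = "the quick brown fox".split()
witness = "the quik brown dog".split()

raw = collate(base, witness)
# A regularisation made by the user: treat "quik" as a spelling of "quick".
cleaned = collate(base, witness, rules={"quik": "quick"})

print(raw)
print(cleaned)
```

After the rule is applied, only the substantive variant (`fox` against `dog`) remains, which mirrors how regularisation removes insignificant variation before the apparatus is built.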
The Greek New Testament provides a very specific use-case, with a large amount of data already created and highly developed editorial principles. In addition, ongoing work by existing editorial teams offers the opportunity for immediate testing in real-life situations. Developing in these circumstances can be a challenge, with the evolution of guidelines, changes of editorial practice and ‘creeping featurism’. The paper will conclude by reflecting on the development process from a programmer’s perspective outlining the challenges faced and lessons learned along the way.
The technical architecture of the Carl Maria von Weber Digital Edition
Carl Maria von Weber (1786–1826) was a German composer who is nowadays mainly known for his best-selling opera “Der Freischütz”, although he composed more than 150 works in total (ranging from operas to piano pieces) and had a substantial influence on the history of musical composition throughout the 19th century. Besides his musical œuvre he wrote several articles for music journals and even sketched a novel, which unfortunately remained unfinished. In addition to these works intended for the public, the Carl-Maria-von-Weber-Gesamtausgabe (WeGA) aims to publish Weber’s private correspondence and diaries by the 200th anniversary of his death in 2026. The WeGA was originally a ‘traditional’ scholarly project funded by the German Academy of Sciences and Humanities, but in recent years it has turned more and more into a ‘modern’ digital edition venture in the fields of both music and text editing.
Every primary textual object (e.g. diary entry, letter, article) as well as every secondary one (e.g. prosopographies, commentaries, descriptions of works) is encoded in conformance with the current TEI P5 guidelines; in addition, the encoding of metadata for musical works follows the guidelines of the Music Encoding Initiative. Schemas have been developed and documented for every type of text using ODD, the TEI’s meta language. Besides the textual features, the overall encoding focus lies on the markup of persons, places, works and roles (e.g. in an opera or play).
We regard the TEI files (currently more than 30,000) as the edition proper and try to ensure data integrity through customized TEI schemas and automated tests. Furthermore, these TEI files are kept in a private Subversion repository, which facilitates collaborative work among staff members and the traceability of changes. Finally, a Redmine project management system helps us plan and monitor issues, providing additional collaborative features such as wikis, bulletin boards, calendars, etc.
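As an illustration of the kind of automated test that can complement schema validation, the following Python sketch checks referential integrity: every person reference in a document should point to a known person record. It is not the WeGA test suite. The use of a `key` attribute on `persName`, the identifier format and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The TEI namespace, in the {uri}name form that ElementTree expects.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def dangling_person_keys(document_xml, known_ids):
    """Return the person keys used in a document that have no record."""
    root = ET.fromstring(document_xml)
    used = {
        element.get("key")
        for element in root.iter(f"{TEI_NS}persName")
        if element.get("key")
    }
    return sorted(used - set(known_ids))


# Invented sample document and identifiers, for illustration only.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p>Letter to '
    '<persName key="P0001">N. N.</persName> and '
    '<persName key="P0099">X. Y.</persName>.</p></body></text></TEI>'
)

missing = dangling_person_keys(SAMPLE, known_ids={"P0001"})
print(missing)
```

A check of this kind catches errors that a schema alone cannot express, since a schema can require that a key is present but not that the record it names actually exists elsewhere in the collection.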
This talk aims to give a thorough insight into the workflows and technical architecture of the WeGA. This includes the whole ‘life cycle’ from data capture and conversion (to TEI-XML) to the publication of this data as HTML and possibly other serializations. Alternative approaches to the aforementioned issues will be discussed where appropriate, as well as shortcomings of the presented solutions.
The WeGA ODD files are available at https://github.com/Edirom/WeGA-ODD.
Editorial Tools and their Development as a Mode of Mediated Interaction
Tuomo Toljamo, King’s College London
Encouraged by the Convention’s exploratory aims, this talk poses the following question: Is there something to be gained from thinking about the production and sharing of editorial tools as mediated interaction—as communication mediated by the tools themselves?
Shifting the focus from technical merit and capabilities, this talk considers the production and sharing of tools as a mode of interaction: here, editorial tools themselves are seen as complex products used to interact, to communicate, to collaborate with others. To approach the topic and to illuminate the current situation, the talk first recalls oft-stated issues regarding the use and production of tools and then considers some of the existing proposals to address them. Next, it attempts to provide some means for thinking about editorial tools in terms of material interaction and applies these means to a case example. Finally, the talk concludes by returning to the initial question.
With this question, the talk attempts to explore whether this perspective could help us to see some of the issues with editorial tools and their production in a new light, uncover new ones, or give rise to different kinds of solution proposals. Overall, the talk relates to such issues as challenges in cross-disciplinary collaboration and communication as well as to discussions of responsible models of tool production.
The research is carried out in the Digital Scholarly Editions Initial Training Network (DiXiT), which is funded under Marie Curie Actions within the European Commission’s 7th Framework Programme (Grant Agreement no. 317436).
TEI Simple Processing Model. Abstraction layer for XML processing
The Guidelines of the Text Encoding Initiative Consortium (TEI) have been used across numerous disciplines, producing huge numbers of TEI collections. These digital texts are most often transformed for display as websites and camera-ready copies. While the TEI Consortium provides XSLT stylesheets for transformation to and from many formats, there is little standardization and no prescriptive approach across projects towards processing TEI documents.
The TEI Simple project aims to close that gap with its Simple Processing Model (SPM), providing the baseline rules for processing TEI into various publication formats, while offering the possibility of building customized processing models within the TEI Simple infrastructure. For the first time in the history of the TEI there exists a sound recommendation for a default processing scheme, which should significantly lower the barriers for entry-level TEI users and enable better integration with editing and publication tools.
Possibly of even greater significance is the layer of abstraction provided by the TEI SPM, which separates high-level editorial decisions about processing from low-level, output-format-specific intricacies and final rendition choices. The SPM aims to offer maximum expressivity to the editor while encapsulating the implementation details in the TEI Simple function library. A limited fluency in XPath and CSS should be enough to tailor the default model to a specific user’s needs in the majority of cases, significantly reducing the time, cost and level of technical expertise required for TEI Simple projects.
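The separation of layers can be pictured with a small Python sketch. It is not the TEI Simple function library and does not use the SPM's actual notation: the behaviour names, the `MODEL` mapping and the `render` function are hypothetical, elements are matched by name only rather than by XPath, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from html import escape

# Editorial layer: which abstract behaviour applies to which element.
MODEL = {"head": "heading", "p": "paragraph", "hi": "inline"}

# Implementation layer for one output format (HTML). Another format
# would supply its own dictionary with the same behaviour names.
HTML_BEHAVIOURS = {
    "heading": lambda content: f"<h1>{content}</h1>",
    "paragraph": lambda content: f"<p>{content}</p>",
    "inline": lambda content: f"<em>{content}</em>",
}


def render(element, model, behaviours):
    """Render an element by applying the behaviour the model assigns it."""
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(render(child, model, behaviours))
        parts.append(escape(child.tail or ""))
    content = "".join(parts)
    behaviour = model.get(element.tag)
    if behaviour is None:
        # Elements without a rule simply pass their content through.
        return content
    return behaviours[behaviour](content)


# Invented sample input, for illustration only.
SOURCE = "<div><head>Title</head><p>Some <hi>marked</hi> text.</p></div>"

html = render(ET.fromstring(SOURCE), MODEL, HTML_BEHAVIOURS)
print(html)
```

In this picture an editor would only ever edit `MODEL`, for example to treat `hi` as a heading, while the format-specific functions stay untouched, which is the division of labour the abstraction layer is meant to achieve.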
This presentation aims to explain both the theoretical foundations of the TEI SPM and the practical aspects of using it with a collection of TEI documents that can be converted to TEI Simple. It is hoped that editors, curators and archivists, as well as developers dealing with TEI, will benefit from employing the TEI SPM in their workflows.