Pundit and Muruca are open source software developed by Net7 srl. The open source semantic annotation tool Pundit (www.thepund.it) is Net7’s main product for the Digital Humanities. The main idea behind semantic annotation is to enable users not only to comment on, bookmark or tag web pages, but also to create semantically structured data while annotating. The ability to express semantically typed relations among resources, relying on ontologies and specific vocabularies, not only enables users to express unambiguous and precise semantics, but also, more interestingly, fosters the reuse of such knowledge within other web applications.
Pundit allows annotators to include machine-readable semantics in their annotations, by setting up links to the web of data and by collaboratively building a knowledge graph that connects and contextualizes (unstructured) web content. It enables this structured knowledge to emerge in online communities so that a variety of applications can exploit it by, for example, providing a powerful semantic search, building innovative ad-hoc data visualizations, or improving the way scholars explore the web. Muruca (www.muruca.org) is a semantic framework for building Digital Libraries which is used by several projects (for a list, see: http://muruca.netseven.it/muruca-is-used-by/).
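The typed relations described above can be pictured as subject-predicate-object statements. The following is a minimal sketch under stated assumptions, not Pundit's actual data model or API: the `Annotation` class, the `find_by_object` helper, the annotator names and the example.org fragment URIs are all hypothetical, and the vocabulary and DBpedia URIs are used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One semantic annotation: a typed link from a text fragment to a resource."""
    subject: str    # the annotated fragment (hypothetical URI)
    predicate: str  # a typed relation taken from a shared vocabulary
    obj: str        # a resource in the web of data
    annotator: str  # who made the annotation (hypothetical user name)


def find_by_object(annotations, obj):
    """Return every annotation that points at the given resource."""
    return [a for a in annotations if a.obj == obj]


# Illustrative data only: two annotators link different fragments to the same resource.
REFERENCES = "http://purl.org/dc/terms/references"
BASEL = "http://dbpedia.org/resource/Basel"
FLORENCE = "http://dbpedia.org/resource/Florence"

annotations = [
    Annotation("http://example.org/letter-1#fragment-3", REFERENCES, BASEL, "alice"),
    Annotation("http://example.org/letter-2#fragment-1", REFERENCES, BASEL, "bob"),
    Annotation("http://example.org/letter-2#fragment-7", REFERENCES, FLORENCE, "bob"),
]

# Because the object is a shared identifier rather than free text,
# annotations by different users can be gathered with a simple lookup.
about_basel = find_by_object(annotations, BASEL)
for a in about_basel:
    print(a.annotator, "linked", a.subject, "to", a.obj)
```

Because both annotators point at the same identifier rather than typing a free-text tag, their statements can be combined without any string matching, which is the kind of reuse the text refers to.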
The workshop will be organized into several parts. First, we will introduce Pundit's main features and show how it works in practice in some projects that use Pundit to annotate texts and images. We will give a detailed demonstration of the integration of Pundit and Muruca by showing the full editorial workflow (backend) and the content presentation (frontend) in Burckhardtsource.org (www.burckhardtsource.org).
In the second part of the workshop, attendees will be introduced to semantic annotation through some basic and advanced exercises.
Finally, the annotations made by students will be shown through a few visualizations. We will then discuss the results and distribute a short survey to gather students' impressions and feedback.
Attendees are required to bring a laptop with Google Chrome installed. It would be helpful if attendees suggested a couple of Digital Libraries (with English texts) to be used for the exercises. We will then choose one of them.
The TEI infrastructure for the encoding of humanities texts is rich and mature. It contains extensive guidelines, stylesheets for display and conversion, schemas, a customisation mechanism for schemas, an active mailing list, a membership-based organisation and a large community. There is a TEI way of doing text encoding that seems to provide most of the things an encoder needs. Still, only a step away, often within the same organisations, one finds people from other communities (libraries, linguistics, cultural heritage) who take a quite different approach to these same issues. This is especially true in the less used or later developed sections of the TEI Guidelines, such as taxonomies (infrequently used) or the facsimile element (new). Standards that can be seen as in some way complementary to TEI or competitors of TEI include Dublin Core, SKOS, METS/MODS, OAC, FRBR, and CIDOC CRM.
Given that digital projects often need to serve the needs of multiple communities, the question arises how to handle situations where multiple standards might seem appropriate. Do we, say, embed Dublin Core in TEI? Do we generate DC from TEI? Do we maintain separate files? Do we choose a single format as the master format, from which to generate all others?
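To make one of these options concrete, the sketch below treats TEI as the master format and generates a few Dublin Core elements from the TEI header. It is a minimal illustration under stated assumptions, not a recommended crosswalk: the sample document, the `MAPPING` table, the `tei_to_dc` function and the `record` wrapper element are all hypothetical, and a real mapping would have to cover far more of the header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample document, reduced to a bare header.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample letter</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Example Press</publisher>
      </publicationStmt>
      <sourceDesc><p>Born digital.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""

# Assumed, deliberately small mapping: DC element name -> path in the TEI header.
FILE_DESC = "tei:teiHeader/tei:fileDesc"
MAPPING = {
    "title": FILE_DESC + "/tei:titleStmt/tei:title",
    "creator": FILE_DESC + "/tei:titleStmt/tei:author",
    "publisher": FILE_DESC + "/tei:publicationStmt/tei:publisher",
}


def tei_to_dc(tei_xml):
    """Build a simple Dublin Core record from the header of a TEI document."""
    root = ET.fromstring(tei_xml)
    record = ET.Element("record")  # hypothetical wrapper element
    for dc_name, path in MAPPING.items():
        for node in root.findall(path, {"tei": TEI_NS}):
            text = "".join(node.itertext()).strip()
            if text:
                element = ET.SubElement(record, "{%s}%s" % (DC_NS, dc_name))
                element.text = text
    return record


dc_record = tei_to_dc(SAMPLE_TEI)
print(ET.tostring(dc_record, encoding="unicode"))
```

The design choice illustrated here is one-directional derivation: the Dublin Core record is regenerated whenever the TEI changes, so there is only one file to maintain by hand. The other options raised above (embedding DC in TEI, or maintaining separate files) trade that simplicity for richer or independently edited metadata.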
This workshop will look at these and similar questions. We look specifically at the standards proposed by the Open Annotation Collaboration (OAC) and their application in Shared Canvas and its successor IIIF, as well as at the CIDOC CRM ontology for the cultural heritage domain. Øyvind Eide (Passau University) and Stefanie Gehrke (Biblissima project) will introduce us to these standards and their application in a TEI context.
In the morning, we will look at OAC/Shared Canvas. After an introduction by Stefanie Gehrke, we will discuss a number of practical cases (perhaps in breakout sessions). In the afternoon, we will use a similar model for CIDOC CRM and ontologies, introduced by Øyvind Eide. We will round off with a discussion about general implications.
Participants are welcome to introduce cases for discussion from their own projects or projects at their institutions. In this case, please contact the workshop organisers, preferably before September 1.
The maximum number of participants is 12. DiXiT fellows take precedence. Others are admitted in order of registration.
About the speakers:
Øyvind Eide holds a PhD in Digital Humanities from King’s College London (2012). He was an employee in various positions at The University of Oslo from 1995 to 2013. Since October 2013 he has been a Lecturer and research associate with the Chair of Digital Humanities at The University of Passau. Eide is a member of the board of The European Association for Digital Humanities (EADH). He is also one of the two founding conveners of the Ontologies SIG of the Text Encoding Initiative. His research interests are focused on conceptual modelling of cultural heritage information as a tool for critical engagement with media differences, especially the relationships between texts and maps as media of communication.
Stefanie Gehrke has been metadata coordinator of the équipex Biblissima since 2013. Biblissima is an online digital library that provides easy and coordinated access to a huge and complex mass of documentation on manuscripts and early printed books, the texts contained therein, their circulation and their readers, from the 8th to the 18th centuries (digitisations of early documents, documentary databases, editions, as well as tools to understand these documents and to produce new data). Biblissima’s prototype on illuminated manuscripts is built using existing standards like Shared Canvas.
Having received diplomas in Scandinavian language and literature and in theology, Gehrke worked from 2008 to 2012 at the manuscript department of the Herzog August Bibliothek in Wolfenbüttel, Germany. From 2010 to 2012, she participated in the project Europeana Regia, and in the second half of the project she was responsible for WP 3 (Metadata).