Coptic Scriptorium

Coptic SCRIPTORIUM (Sahidic Corpus Research: Internet Platform for Interdisciplinary multilayer Methods) is a collaborative, digital project created by Caroline T. Schroeder (University of the Pacific) and Amir Zeldes (Georgetown University). The team is constantly growing.

Coptic SCRIPTORIUM provides a platform for interdisciplinary and computational research on texts in the Coptic language, particularly the Sahidic dialect. As an open-source, open-access initiative, the SCRIPTORIUM technologies and corpus function as a collaborative environment for digital research by any scholar working in Coptic.

We hope SCRIPTORIUM will serve as a model for future digital humanities projects utilizing historical corpora or corpora in languages outside of the Indo-European and Semitic language families.

The latest release notes and news about the project are on C. Schroeder's blog. A video introduction to the project, including how to use ANNIS, is also available.

Please read our Frequently Asked Questions for more information on the project, methodologies, and terminology.

We hosted a workshop on digital research and scholarship in Coptic at Humboldt University on May 14, 2013. The program and presentations are available.

You can also fork us on GitHub.

Corpora

The corpora below offer some examples of mark-up for diplomatic transcription and normalization. Most data is available in TEI XML, PAULA XML and relANNIS for use with the ANNIS corpus search software. Links are provided to search the corpus online in ANNIS. Individual documents can also be viewed in HTML for reading purposes in either diplomatic or normalized transcriptions with English translations. [For more information on TEI, PAULA, and ANNIS, check out our FAQ.]

All corpus data generated by the SCRIPTORIUM project is licensed under the Creative Commons Attribution 3.0 Unported License unless otherwise indicated.

Searching the Corpora: Example Queries

The search and visualization tool ANNIS is the most powerful way to use the texts for research purposes. The sample queries below demonstrate some of the kinds of searches you can construct. ANNIS queries are written in the ANNIS Query Language (AQL) and may include regular expressions. If you are familiar with ANNIS or regular expressions, jump right in. If not, you may wish to try some of the sample queries and then substitute terms or search parameters to adapt them to your needs and learn the system. After clicking on the magnifying glass, you will be taken to a new page with the ANNIS query and results. The query appears in the box on the upper left, the corpus or corpora you are searching are selected on the lower left, and your search results appear in the panel on the right.

    • Search for Greek verbs in multiple corpora:
      pos="V" & source_lang="Greek" & #1 _=_ #2
    • Search for focalizing converters in Besa's letters:
      pos="CFOC"
    • Look for locational expressions in the Apophthegmata Patrum corpus:
      entity="place"
    • Find some mentions of the following terms of kinship in the translation of Abraham Our Father:
      translation=/.*([Mm]other|[Bb]rother|[Ff]ather|[Ss]ister|[Ss]on|[Dd]aughter).*/
    • Search for lines ending with a letter written in small print in Besa's letters:
      hi_rend=/.*small.*/ & lb_n & #1 _r_ #2
    • See how many lines of Abraham Our Father don't come from the manuscript MONB.YA:
      lb & meta::msName!="MONB.YA"
    • Find words with the morpheme ⲙⲛⲧ- in Besa's letters and Shenoute's Acephalous Work 22:
      morph="ⲙⲛⲧ"
    • Find common nouns referring back to proper names in the Apophthegmata Patrum corpus:
      pos="N" & pos="NPROP" & entity & entity & #3 ->coref[type=/diff|appos/] #4 & #3 _r_ #1 & #4 _r_ #2
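
The regular-expression queries above can be tried out offline before running them in ANNIS. The following Python sketch is only an illustration. It assumes that ANNIS compares a regular expression against the whole annotation value (which would explain why the kinship pattern is wrapped in .* on both sides) and mirrors that behaviour with re.fullmatch. The helper names and the English sample strings are hypothetical and are not taken from the corpora.

```python
import re

# Pattern copied from the kinship sample query (the part between the slashes).
KINSHIP = r".*([Mm]other|[Bb]rother|[Ff]ather|[Ss]ister|[Ss]on|[Dd]aughter).*"


def matches_annotation(pattern, value):
    """True if the pattern matches the entire value (assumed ANNIS behaviour)."""
    return re.fullmatch(pattern, value) is not None


def translation_query(terms):
    """Hypothetical helper: build a translation query for other lowercase terms."""
    alternatives = "|".join(f"[{t[0].upper()}{t[0]}]{t[1:]}" for t in terms)
    return f"translation=/.*({alternatives}).*/"


# Made-up translation strings, for illustration only.
samples = ["Abraham our father", "the words of the elders", "every person"]
for text in samples:
    print(text, "->", matches_annotation(KINSHIP, text))

# Without the surrounding .* the pattern must cover the whole value.
print(matches_annotation(r"[Ff]ather", "Abraham our father"))

# Substituting other search terms into the same query shape.
print(translation_query(["monk", "elder"]))
```

Note that "every person" matches because it contains the letters "son": a substring pattern of this kind can return more than the intended kinship terms, so results are worth checking by eye.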

Acephalous Work 22 by Shenoute

Abraham Our Father by Shenoute

Letters of Besa

Apophthegmata Patrum

Bible: Gospel of Mark

Note: This corpus is derived from the Sahidica New Testament, which was released by Warren Wells and made available for free electronic distribution for academic use only. It is not licensed CC-BY; see the Sahidica licensing information.

Tools

Some of the tools below use a Sahidic Coptic lexicon based on data kindly provided by Prof. Tito Orlandi and the CMCL project. When using the part-of-speech tagging models or the tokenization script and its lexicon, please make sure to refer back to the CMCL project.

Part-of-Speech Tagging

Converters


Acknowledgments

The project is supported by the National Endowment for the Humanities Office of Digital Humanities and Division of Preservation and Access, the University of the Pacific, Georgetown University, and Humboldt University.

Page last updated 5 September 2014