Datalinks Wiki
Open-Content Text Corpus

Type

Dataset

Link

https://sourceforge.net/projects/octc/

Source

Ckan.net

The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment information. The project is intended as an open platform for academic and research projects of various kinds (tool-, markup-, or language-documentation-oriented), for collaboration on multilingual corpus encoding in general, and for applying the TEI Guidelines to that purpose in particular. ("TEI" stands for the Text Encoding Initiative; homepage: http://www.tei-c.org/.)
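As a rough illustration of what working with TEI P5 XML can look like, the sketch below reads a title and paragraphs from a small TEI document using only the Python standard library. The sample document, the `SAMPLE` constant, and the `read_tei` helper are hypothetical and invented for this example; they are not taken from the OCTC, whose actual file layout may differ. The only fixed detail is the TEI P5 namespace, `http://www.tei-c.org/ns/1.0`.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this namespace, so lookups must be namespace-aware.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI P5 document (not an actual OCTC file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample text</title></titleStmt>
      <publicationStmt><p>Hypothetical sample, not from the OCTC.</p></publicationStmt>
      <sourceDesc><p>Invented for illustration.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text xml:lang="en">
    <body>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </body>
  </text>
</TEI>"""


def read_tei(xml_string):
    """Return (title, list of paragraph strings) from a TEI P5 document."""
    root = ET.fromstring(xml_string)
    # The title sits in the header under fileDesc/titleStmt.
    title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        namespaces=TEI_NS,
    )
    # Collect the text of each body paragraph, including any nested markup.
    paragraphs = [
        "".join(p.itertext()).strip()
        for p in root.iterfind("tei:text/tei:body/tei:p", TEI_NS)
    ]
    return title, paragraphs


if __name__ == "__main__":
    title, paragraphs = read_tei(SAMPLE)
    print(title)
    for paragraph in paragraphs:
        print(paragraph)
```

For a real subcorpus file, the same function could be given the file's contents as a string, assuming the file follows the header and body structure shown in the sample.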