Work Package 2: Linguistics

This subproject addresses questions about language use and development on the island of Sicily in antiquity: how did the use of individual languages change over time, and in what ways did the different languages in use on the island, in diverse contexts and on diverse materials, interact with and affect each other? The exploration will focus first on the relationship between Greek and Latin on the island over 1,000 years, since the epigraphic evidence is richest for these two languages. No less important, however, are the interactions between Phoenician, Greek and the indigenous languages (Elymian and Sikel) in the archaic and classical periods (7th-4th centuries BCE). A comprehensive analysis of the earliest uses of the written word on the island will constitute a unique study of a formative period in Mediterranean history, of the regional development of early Greek, and of linguistic interaction in a colonial environment. Detailed study of linguistic practice over time and space through the primary evidence will reveal much about the social registers of Greek and Latin on the island in the face of Greek colonisation, Roman conquest, Roman colonisation, and the influence of Christianity, and will in turn have significant implications for the core study of epigraphic culture and the socio-cultural history of the island.

Bilingual inscription advertising production of inscriptions, probably from Palermo, C1 CE (ISic000470, Palermo Museum, photo J. Prag)

Previous study has been hampered by the very incomplete dataset available for analysis. Consequently, this subproject will take the new corpus of all Sicilian texts as its starting point and transform it into a tool for systematic, computational linguistic analysis, as the basis for a new and wide-ranging study of the linguistic history of the island. Such a study will go far beyond existing work, not simply in the systematic nature of its coverage, but in its cross-linguistic range and its temporal coverage from the Archaic period to late Antiquity. This will be achieved by extending the mark-up of the texts in the TEI corpus through a systematic programme of four steps: tokenization of sentences and words, part-of-speech tagging of the individual words, lemmatization of the individual words, and lastly syntactic analysis.

A wide range of Natural Language Processing tools already exists to support this work, principally developed for literary texts. A pipeline of such tools, applicable to multiple ancient languages, has been consolidated in the Classical Languages Toolkit (Burns 2019). Current work on the PapyGreek project (PI Marja Vierros, on the Advisory Board for this project) is developing the Sematia platform to enable the application of these tools to the more complex corpus of Greek papyrological texts, which are also encoded in EpiDoc (Vierros 2018; Vierros and Henriksson 2017). This provides the model and the tools for the creation of linguistic layers from the EpiDoc corpus of I.Sicily. Work is also underway in the LiLa project to consolidate and unify the tools specifically for Latin (Francesco Mambrini, a researcher on the LiLa project, is on the Advisory Board for this project). Additional support on the Advisory Board is provided by Professor Wolfgang De Melo (University of Oxford), an expert in Latin historical linguistics, and Dr Alex Mullen, PI on the LatinNow project, which studies the sociolinguistics of the northwestern Roman empire through epigraphic evidence.
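As a rough illustration of what the first three annotation layers (tokenization, part-of-speech tagging, lemmatization) could look like once written back into TEI mark-up, the sketch below wraps each word of a short Latin sample in a `<w>` element carrying the standard TEI `lemma` and `pos` attributes. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the function name `annotate`, the toy lexicon, the sample phrase, and the tag labels are all hypothetical, and a real workflow would take lemmas and tags from tools such as those named above rather than from a hand-written table.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical lookup table standing in for a real lemmatizer and POS tagger.
# The entries and tag labels are illustrative assumptions only.
TOY_LEXICON = {
    "tituli": ("titulus", "NOUN"),
    "heic": ("hic", "ADV"),
    "ordinantur": ("ordino", "VERB"),
    "et": ("et", "CCONJ"),
    "sculpuntur": ("sculpo", "VERB"),
}


def annotate(text):
    """Return a TEI <ab> element with one <s> per sentence and one <w> per word."""
    ab = ET.Element("ab")
    # Naive sentence tokenization: split on full stops and semicolons.
    sentences = [s.strip() for s in re.split(r"[.;]", text) if s.strip()]
    for s_index, sentence in enumerate(sentences, start=1):
        s_el = ET.SubElement(ab, "s", {"n": str(s_index)})
        # Naive word tokenization: split on whitespace.
        for w_index, form in enumerate(sentence.split(), start=1):
            w_el = ET.SubElement(s_el, "w", {"n": str(w_index)})
            w_el.text = form
            entry = TOY_LEXICON.get(form.lower())
            if entry is not None:
                lemma, pos = entry
                w_el.set("lemma", lemma)
                w_el.set("pos", pos)
            # Words missing from the lexicon are left without lemma/pos.
    return ab


if __name__ == "__main__":
    sample = "tituli heic ordinantur et sculpuntur"
    print(ET.tostring(annotate(sample), encoding="unicode"))
```

Words not found in the lookup table are still tokenized but receive no `lemma` or `pos` attribute, so gaps in the annotation stay visible instead of being filled with guesses. Syntactic analysis, the fourth layer, is not covered by this sketch.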

This subproject will be led by a post-doctoral researcher in historical linguistics, appointed for three years (years 2-4 of the Crossreads project).