Building corpora and breaking down boundaries

The blog of the Crossreads project based at the Faculty of Classics, University of Oxford 2020-2025. We will be adding regular updates on our research and publications.



The idea of collecting inscriptions for the study of the ancient (Greco-Roman) world has a long history. Every collection makes new things possible; but every collection also brings its own limits. It is worth briefly reflecting on the history of such collections for Sicily in order to illustrate why we think it is worth doing this again, now, and in a digital format.


The collecting of inscriptions effectively began in the fifteenth century, with the first large-scale collections published in the early sixteenth century. Much of this work focused on Rome and the Roman empire, culminating in the Flemish scholar Jan Gruter’s monumental Inscriptiones antiquae totius orbis romani of 1602/1603. Gruter’s work (a huge collaborative effort to which many contributed) developed more fully the classification of inscriptions which went on to provide the basis for the classifications adopted in the major corpora of the nineteenth century. The same volume, through the efforts of Joseph Scaliger, demonstrated the value of indices for such a work (Scaliger observed that ‘the index really is the soul to this body [corpus]’). For an excellent exploration of the developing study of epigraphy as a source for ancient history in the Renaissance (and the source of the quotation from Scaliger, on p.150), see William Stenhouse’s Reading Inscriptions and Writing Ancient History (2005).


Sicilian material also seems first to have been recorded and collected in the sixteenth century. Greek and Latin inscriptions are cited in several sixteenth-century histories of the island, particularly the work of Tommaso Fazello (de rebus Siculis, 1558) and Octavius Caietanus (1566-1620) (Isagoge ad historiam sacram Siculam, published posthumously in 1707).


The first true corpus for Sicily was compiled by a young German scholar, Georg Walther (Gualtherus), and published in Messana in 1624 (actually early 1625; an abortive and very rare first edition was published in Palermo in 1624). Gualtherus collected c.357 texts from across the island. He organised his collection geographically, by city, rather than typologically; it had no table of contents, but was supplemented by two somewhat idiosyncratic indices.

[Figure: distribution of inscriptions in Gualtherus 1624]

Gualtherus’ work was followed in 1769 (2nd edition 1784) by the corpus of Gabriele Lancillotto Castelli, principe di Torremuzza, his Siciliae et objacentium insularum veterum inscriptionum nova collectio. Castelli more than doubled the number of inscriptions recorded (c.729), but he now divided them up by category rather than origin, after the fashion of scholars such as Gruter, and included a similar set of basic indices focused upon names, institutions, and ‘res notabiliae’.


Both Gualtherus and Castelli attempted to include all known inscriptions, although they concentrated primarily on material on stone, and increasingly included only Greek or Latin texts. By the time of the third major attempt to collect together the inscriptions from the island, in the late 19th century, Greek and Latin were separated out fully, with Greek inscriptions gathered by Georg Kaibel in Inscriptiones Graecae Siciliae et Italiae (1890, volume XIV in the Berlin Inscriptiones Graecae project) and Latin by Theodor Mommsen in Corpus Inscriptionum Latinarum, vol. 10 part 2 (1883). Between them, Kaibel and Mommsen recorded c.1,177 texts (not counting the instrumenta), organised geographically but also ordered typologically within each locale, and supplemented by indices.

Since then, a number of individual museum catalogues have been published (separate Greek and Latin catalogues for Palermo and Termini; combined ones for Messina and Catania), along with various thematic collections (the Archaic Greek inscriptions, divided geographically; selections of Greek inscriptions for the study of Sicilian Greek). There has, however, been no further attempt to unite the material, even in a single language (for a more detailed history of the Sicilian corpus tradition, see the paper by Stefania De Vido).


The lack of an up-to-date corpus is a problem for any attempt to study the history, culture and languages of ancient Sicily (remember that the motivation for the original collection of inscriptions was precisely to facilitate such work). Most obviously, there is a lot of new material that has been found since 1890, and not all of it has been published in any form.

However, before rushing to create a new corpus, it is worth thinking about the limitations which each of these previous corpora created: by their very format and their organisation, every corpus both creates and constrains. The geographical organisation of the work of Gualtherus, Mommsen and Kaibel allows one quite easily to compare the local distribution of inscriptions (see the maps above); but Castelli organised his material by category alone and his indices did not include provenance. Since we have not yet systematically included reference to Castelli’s corpus in I.Sicily, we cannot yet generate a comparable map – a reflection of the fact that it would be a long and tedious task for anyone to put that information together from his work. But even geographical organisation entails decisions, more or less conscious, which in turn carry implications and potentially shape future narratives: Mommsen began CIL X.2 from Messina, in a Romano-centric perspective, approaching the provincia from Rome; Kaibel began IG XIV with Syracuse, a Helleno-centric view, prioritising what was seen as the most important Greek settlement. Gualtherus by contrast began his first edition with Palermo, his second with Messina, in an effort to please his civic patrons. Every such choice inevitably shapes the use which readers make of the work.


More obvious, perhaps, are the divisions created by choices to focus on individual languages. The prioritisation of Greek and Latin over, for instance, Oscan and Punic is very familiar, but particularly problematic on an island as multilingual as Sicily.

Gualtherus included inscriptions in Oscan and Arabic from Messina, but the most recent corpus of inscriptions from Messina explicitly limits itself to Greek and Latin. The third-century BC Oscan material is, however, directly contemporary with the use of Greek and Latin on the island and cannot be divorced from its context (in this case, the occupation of Messana by Campanian mercenaries). In similar fashion, the Punic inscriptions of western Sicily have always been treated wholly separately from the contemporary Greek epigraphy (even though there are, e.g., Greek funerary inscriptions in the necropolis of Punic Motya).


Punic abecedary (alphabet), Selinunte, C4 BCE (Palermo Museum, photo J. Prag)

This separation of languages becomes even more problematic when the extensive and contemporary Greek and Latin texts are treated in distinct projects (IG and CIL, or the individual museum catalogues which separate material within collections as well as in time and space). Scholars such as Kalle Korhonen have begun to explore the richness of the linguistic interactions in Roman Sicily, but such work is constantly hampered by the separation of the evidence. The falseness of the separation is vividly illustrated by bilinguals such as the famous stonecutter’s advertising sign from Palermo.

Every corpus entails choices and limits. As Scaliger emphasised, indices are therefore fundamental to the true value of such a corpus, even if those can only overcome some of the limitations created by the internal organisation. The I.Sicily corpus will bring its own limits (at the time of writing, it only includes inscriptions on stone, for example). However, the possibilities of digital cataloguing and publication greatly reduce those limits and enable the elimination of many of these boundaries.


At the most basic level, we aim to be as inclusive of material, language, and period as possible. More importantly, however, our ambition is that the interface for the user should be as unprescriptive and as flexible as possible. Each user will have different interests and different questions. To that end, we aim to make as many different facets of the data available as possible, so that it is up to the user whether they approach the material by period, place, language, typology, content, material form, publication, etc. Of course, such choices are ultimately still limited by the data that we include and encode – but in a collaborative, open access project of this sort, such limitations will hopefully inspire new projects and collaborations to break down those boundaries too. One such community project is the ongoing work to develop agreed data standards across the wider epigraphic community (a focus of work at the recent V and Linked Pasts VI), which will make it ever easier to break down not just the boundaries within Sicilian epigraphic culture, but also those of the island itself.
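To make the idea of facets a little more concrete, here is a minimal sketch in Python of what it means to let the user, rather than the editor, choose the organising principle. It is purely illustrative: the four records are invented placeholders, and the field names (place, language, type) and the helpers filter_records and facet_counts are assumptions made for this example, not the actual I.Sicily data model or interface.

```python
from collections import Counter

# Invented placeholder records. The field names are assumptions for this
# sketch and do not reflect the real I.Sicily schema.
RECORDS = [
    {"id": "EX-1", "place": "Messina", "language": "Oscan", "type": "dedication"},
    {"id": "EX-2", "place": "Messina", "language": "Greek", "type": "funerary"},
    {"id": "EX-3", "place": "Palermo", "language": "Latin", "type": "funerary"},
    {"id": "EX-4", "place": "Palermo", "language": "Punic", "type": "dedication"},
]


def filter_records(records, **facets):
    """Return the records that match every requested facet value."""
    return [
        r for r in records
        if all(r.get(key) == value for key, value in facets.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of a single facet."""
    return Counter(r[facet] for r in records)


# A 'geographical' reading of the corpus: everything from one city.
print([r["id"] for r in filter_records(RECORDS, place="Messina")])

# A 'typological' reading of the same records, then regrouped by place.
print(facet_counts(filter_records(RECORDS, type="funerary"), "place"))

# A 'linguistic' overview across the whole collection.
print(facet_counts(RECORDS, "language"))
```

The same small set of records can be read geographically, typologically or by language without any of those views being built into the order of the collection itself, which is exactly the choice that a printed corpus has to make once and for all.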