Internships

The Mind-Bending Grammars project combines biggish data analysis with historical and theoretical linguistics. The project team welcomes interns from:

In addition to the internship work itself, interns will experience what it is like to be part of a prestigious team-based research project.

Building a classifier with a GUI training interface to label cleft sentences (a complex type of syntactic structure)
- The starting point is existing training data provided by an in-house researcher
- The MAI intern combines a similarity assessment algorithm (to be selected at the start of the internship), syntactic parsing, and user feedback to increase accuracy
- Acquiring familiarity with English historical text corpora and how to process them (e.g., spelling normalization)
Building a genre classifier for EMMA, a large corpus of historical texts, starting from a gold standard. Targets are:
- Automatic assignment of the most probable genre to untagged texts
- A top 3 of the most probable genres for in-between cases
- Automatic identification of parts of texts (and their boundaries) that represent different genres (e.g., a biography may contain narrative prose, letters, and diary fragments)
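As a rough illustration of the first two targets, the sketch below turns per-genre probabilities into either a single label or a top 3. It assumes some classifier has already produced the probabilities. The function name `rank_genres`, the `margin` threshold, and the genre labels are hypothetical and not part of the project.

```python
from typing import Dict, List, Tuple


def rank_genres(
    probs: Dict[str, float], margin: float = 0.2, k: int = 3
) -> List[Tuple[str, float]]:
    """Return one genre when it clearly wins, otherwise the k most probable.

    `margin` is an assumed cut-off: if the best genre leads the runner-up
    by at least this much, the text is treated as a clear case.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2 or ranked[0][1] - ranked[1][1] >= margin:
        return ranked[:1]  # clear case: most probable genre only
    return ranked[:k]      # in-between case: top k candidates


# Hypothetical classifier outputs for two untagged texts.
clear_case = rank_genres({"sermon": 0.7, "letter": 0.2, "diary": 0.1})
mixed_case = rank_genres(
    {"biography": 0.35, "letter": 0.33, "diary": 0.2, "drama": 0.12}
)
```

How large the margin should be is an open question that would have to be settled against the gold standard.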
Refining a within-text language identifier to identify foreign language passages in English historical texts
- Refining an existing algorithm for the detection of multi-word passages in Latin/French
- This includes training the algorithm on a contemporary corpus of Latin and French texts
- Including other foreign languages in the algorithm (e.g., Welsh)
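To give an idea of what detecting multi-word foreign passages can involve, here is a minimal sketch. It is not the existing algorithm mentioned above. It flags stretches of text in which function words of one language occur close together. The word lists are tiny hand-written stand-ins for what would be learned from a Latin or French corpus, and the names `FUNCTION_WORDS`, `find_foreign_passages`, `min_hits`, and `max_gap` are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

# Illustrative function-word lists only; a real system would derive
# these (or richer features) from training corpora.
FUNCTION_WORDS: Dict[str, Set[str]] = {
    "latin": {"et", "est", "non", "ad", "cum", "quod", "qui", "sed", "ut"},
    "french": {"le", "la", "les", "et", "est", "de", "que", "ne", "pas"},
}


def find_foreign_passages(
    text: str, min_hits: int = 2, max_gap: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (language, start, end) token spans, with end exclusive.

    A span runs from the first to the last function-word hit in a run
    where neighbouring hits are separated by at most `max_gap` other
    tokens, and the run contains at least `min_hits` hits.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())  # Unicode letters only
    spans: List[Tuple[str, int, int]] = []
    for lang, words in FUNCTION_WORDS.items():
        hits = [i for i, tok in enumerate(tokens) if tok in words]
        run: List[int] = []
        for i in hits:
            if run and i - run[-1] > max_gap + 1:
                if len(run) >= min_hits:
                    spans.append((lang, run[0], run[-1] + 1))
                run = []
            run.append(i)
        if len(run) >= min_hits:
            spans.append((lang, run[0], run[-1] + 1))
    return sorted(spans, key=lambda span: span[1])


sample = "As Seneca wrote, non est ad astra mollis e terris via, so it is."
passages = find_foreign_passages(sample)
```

In this sample only the core "non est ad" is flagged, so the rest of the quotation is missed. Words shared between the lists (such as "et" and "est") can also trigger both languages. Shortcomings of this kind are the sort of thing a refined algorithm would need to address.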
We are open to other topics related to the automatic enrichment and annotation of corpus data.

Interns will be introduced to constructionist corpus linguistics, including
- Annotation of one of the case studies
- Support in the compilation of the corpus