Internships
The Mind-Bending Grammars project combines biggish data analysis with historical and theoretical linguistics. The project team welcomes interns from:
- MA in Artificial Intelligence (MAI)
- MA in Linguistics or a related field (MAL)
Apart from the content of the internship itself, interns will experience what it is like to be part of a prestigious, team-based research project.
Topics for MAI students
- Building a classifier, with a GUI training interface, to identify cleft sentences (a complex type of syntactic structure)
- The starting point is existing training data provided by an in-house researcher
- The MAI intern combines a similarity assessment algorithm (to be selected at the start of the internship), syntactic parsing, and user feedback to increase accuracy
- Gaining familiarity with historical English text corpora and how to process them (e.g., spelling normalization)
- Building a genre classifier for EMMA, a large corpus of historical texts, starting from a gold standard. The targets are:
- Automatic assignment of the most probable genre to untagged texts
- For in-between cases, the three most probable genres
- Automatic identification of the parts of a text (and their boundaries) that represent different genres (e.g., a biography may contain narrative prose, letters, and diary fragments)
- Refining a within-text language identifier that detects foreign-language passages in historical English texts
- Refining an existing algorithm for detecting multi-word passages in Latin and French
- This includes training the algorithm on a contemporary corpus of Latin and French texts
- Extending the algorithm to other languages (e.g., Welsh)
- We are also open to other topics related to the automatic enrichment and annotation of corpus data
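As a rough illustration of the first two genre-classifier targets, here is a hypothetical sketch, not the project's method. It turns a classifier's per-genre probabilities into either a single genre or a top 3. It assumes that the classifier outputs one probability per genre. It also assumes that an "in-between case" is a text whose most probable genre falls below a confidence threshold. The function name `assign_genres`, the 0.6 threshold and the genre labels are all illustrative assumptions.

```python
# Hypothetical sketch: map a genre classifier's per-genre probabilities to
# either one genre (confident case) or the top-k genres (in-between case).
# The threshold, k and genre labels are illustrative assumptions.

def assign_genres(probs: dict[str, float], threshold: float = 0.6, k: int = 3) -> list[str]:
    """Return [best genre] if its probability >= threshold, else the k most probable genres."""
    if not probs:
        return []
    # Rank genre labels by probability, highest first (ties keep input order).
    ranked = sorted(probs, key=probs.get, reverse=True)
    if probs[ranked[0]] >= threshold:
        return ranked[:1]  # confident: assign the single most probable genre
    return ranked[:k]  # in-between: report the top-k candidates


if __name__ == "__main__":
    print(assign_genres({"letter": 0.8, "sermon": 0.1, "diary": 0.1}))  # ['letter']
    print(assign_genres({"letter": 0.4, "diary": 0.35, "narrative": 0.15, "sermon": 0.1}))  # ['letter', 'diary', 'narrative']
```

A fixed probability threshold is the simplest way to define in-between cases. An alternative would be to use the margin between the first and second most probable genres.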
Topics for MAL students
- Interns will be introduced to constructionist corpus linguistics, including:
- Annotation for one of the case studies
- Support with compiling the corpus