Ongoing and past projects | Centre for Computational Linguistics, Psycholinguistics, and Sociolinguistics

Ongoing projects

Premodern Antwerp Spelling Traditions Analysed (PASTA). A diachronic graphematic analysis of Antwerp 'schepenbrieven' (13th-16th century). 01/04/2026 - 31/03/2030

Abstract

PASTA (Premodern Antwerp Spelling Traditions Analysed) examines how late medieval and early modern Antwerp scribes encoded speech in writing, which phonetic and other principles motivated those practices, and how the implementation of these principles evolved over several centuries. Spelling systems are governed by the interacting principles of pronunciation, uniformity, analogy, etymology and graphotactics. For Dutch, there is still no large-scale, diachronic, token-based study of a single urban writing centre that jointly analyses these principles and the hierarchy of phonetic features expressed in spelling. PASTA fills this gap with a four-century corpus of Antwerp 'schepenbrieven' (aldermen's charters, 13th–16th c.), combining existing materials with circa 200 newly transcribed charters from the Felixarchief, all uniformly lemmatised and PoS-tagged. A dedicated grapheme-phoneme correspondence layer, implemented in an open-source Python module, supports quantitative and qualitative analyses of feature consistency through time. By comparing Antwerp outcomes with later norms, the project reconstructs an "alternative history" of Dutch orthography.

Researcher(s)

Promoter: De Wulf Chris
Co-promoter: Pijpops Dirk

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

How our own lexical biases are determined by speakers of other language varieties. Exemplar-driven vs. index-driven lectal contamination. 01/01/2026 - 31/12/2029

Abstract

Mechanisms of language variation and change often take as their starting point lexical biases in grammatical variation, i.e. the finding that particular words lead speakers to prefer one construction over another while forming utterances. For example, a frequent word that is biased towards a construction may 'rub off' its meaning onto the construction itself. What is unclear, however, is how such lexical biases develop in the first place. To understand this, the project introduces two mechanisms that can create such lexical biases, viz. exemplar-driven and index-driven lectal contamination. Both mechanisms start from language contact between two varieties of the same language, but differ in how such contact leads to lexical biases within the varieties. Exemplar-driven contamination relies on the cognitive storage of exemplars, while index-driven contamination assumes that the words and constructions act as social indices. A pilot study focusing on nominal morphological variation has already been completed, with positive results. The project will conduct three more corpus-based case studies that test the effect of both mechanisms among other types of variation. Next, I will build an agent-based simulation of each mechanism. This will allow us to validate both mechanisms in silico and derive exact theoretical predictions for each mechanism. Finally, these predictions will be put to the test through corpus research and a forced-choice and receptive experiment.

Researcher(s)

Promoter: Pijpops Dirk

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

How to exaggerate speech sounds when speaking to your child? Acoustic cue-weighting in production and perception of infant-directed speech. 01/11/2025 - 31/10/2028

Abstract

The proposed research project investigates how infants learn speech sound categories from the linguistic input they receive: Infant-Directed Speech (IDS). Previous research has shown that speech sounds are best distinguished by a combination of primary, essential cues and secondary, yet informative, cues. However, research on IDS and early language acquisition has largely overlooked the role of these secondary cues. Therefore, our understanding of IDS and its role in language acquisition remains overly simplified. To address this lacuna, the present project aims to develop a more comprehensive understanding of how speech sounds are phonetically contrasted through multiple cues in IDS, whether this facilitates speech sound categorization, and how this affects language acquisition. For this purpose, we will combine large-scale corpus studies of spontaneous speech production with perception experiments. Specifically, we will (1) investigate, through the acoustic analysis of a corpus of parent-infant interactions, whether parents exaggerate a single fundamental cue or a combination of cues to help children learn phonological categories, (2) test in the same corpus whether variations in caregivers' cue-weighting can (partially) explain individual differences in the infants' own cue-weighting in production, and (3) examine, by means of targeted perception experiments, how IDS cue-weighting shapes phoneme categorization, bridging the caregiver input and infant output.

Researcher(s)

Promoter: Bernolet Sarah
Fellow: Genette Jérémy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Emotions without borders: Studying the verbalization and automatic detection of emotions across languages. 01/05/2025 - 30/04/2029

Abstract

Emotions have attracted a lot of attention in psychology, socio- and psycholinguistics and communication science, and over the past decade also in the fields of computational linguistics and natural language processing. In the latter fields, the term emotion detection is used to refer to the task of automatically identifying fine-grained emotions in texts. Research on emotion detection has mainly focused on English, but with the emergence of (multilingual) large language models, the interest in multilingual approaches to emotion detection has increased. Meanwhile, state-of-the-art research in psychology has developed new theories about emotion, claiming that emotions are not universal, neither in conceptualization nor in expression. This might have consequences for how multilingual emotion detection models work. Therefore, it is crucial to investigate differences in emotional language use across languages. Most studies that deal with the cultural component of emotions are limited to studying the translatability of emotion words, or focus on very specific cases and language pairs. In this research project, we will transcend the word level and go beyond the comparison of language pairs by comparing emotion verbalization across ten languages, using methods from computational linguistics. Moreover, we will investigate how state-of-the-art emotion detection models deal with cross-lingual differences in emotion verbalization.

Researcher(s)

Promoter: De Bruyne Luna
Fellow: Rognan Hannah

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Webcare through the eyes of the bystander: A cross-linguistic comparison of pragmatic-rhetorical features in hotel review-response interactions. 01/01/2025 - 31/12/2028

Abstract

Webcare, as a manifestation of digital reputation management, has become ubiquitous within the tourism industry. The significance of this online customer service communication, accessible to all, cannot be overstated. It demonstrates a commitment to guest satisfaction, thereby positively influencing the hotel's image. Consequently, it can sway prospective clients who, as bystanders, peruse this communication and subsequently opt for a specific hotel. Although recent studies suggest that guest reviews and hotel responses are influenced by cultural factors, cross-cultural analyses of hotel interactions remain scarce and limited in terms of the languages and cultures investigated. Therefore, the objective of this project is to conduct a cross-linguistic study of a multilingual corpus consisting of 80,000 hotel reviews and their corresponding responses in German, French, English (UK/US), Italian, Dutch, and Spanish (ES/MX). Specifically, this project aims to explore the cross-linguistic characteristics of hotel interactions in L1. It seeks to identify which of these characteristics are perceived as positive or negative by the bystander, who is ultimately the intended audience for these responses. The knowledge gained from this foundational research will inform the fields of pragmatics and marketing communication and present opportunities for the development of generative AI systems that can automatically craft responses tailored to the linguistic and cultural context.

Researcher(s)

Promoter: De Bruyne Luna
Co-promoter: Boone Griet
Co-promoter: Raedts Mariet

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Classification of online multimodal data. 01/01/2025 - 31/12/2026

Abstract

In an increasingly digital world, we are confronted with an information supply that is not only increasing in scale, but is also becoming increasingly diverse in form. This often involves multimodal data, in which text, audio, image and video go hand in hand. The dominance of multimodal data in the online sphere offers both interesting opportunities and challenges in research disciplines such as computational linguistics and natural language processing (NLP). On the one hand, the diversity of multimodal data allows us to obtain a more complete picture of human communication (for example on social media, forums or on online news platforms), including relatively new forms of communication such as memes, vlogs and podcasts. On the other hand, robust automatic classification methods are required to allow large-scale and efficient analysis of this type of data. Computational linguistics and NLP mainly focus on the automatic processing of text. An important research domain within these disciplines is text classification, with typical examples being sentiment analysis (in which labels such as 'positive', 'negative' or emotion categories are assigned to texts), hate speech detection, topic classification or the detection of fake news. In many cases, only unimodal (text-only) data is collected, which means that a significant part of the data available online is not used. In many other cases, only the textual part in multimodal data (for example, transcribed text in audio/video, or text without images on social media) is included for the automatic analysis. This leads to potentially important information from other modalities being ignored, or having to be manually analysed on a much smaller scale. However, recent developments in machine learning show that integrating multiple modalities can significantly increase the accuracy of classification systems. 
For example, in sentiment analysis on social media, it can be crucial to include not only text, but also images in the analysis, or in the case of speech/video analysis, intonation might play an important role. For this postdoc challenge we therefore encourage candidates to prepare a proposal that focuses on research into innovative and robust methodologies for the classification of online multimodal data. The ultimate goal is to enable a more holistic understanding of communication in the digital world, which can lead to improved insights into social interactions and online communication. This research can also contribute to the development of advanced tools for monitoring online content and fostering a healthier digital communication environment.

Researcher(s)

Promoter: De Bruyne Luna
Co-promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

An agentic codified framework for lower hallucination and better explainability in Legal question answering. 01/01/2025 - 31/12/2026

Abstract

In this project, we aim to overcome hallucinations in LLM legal assistants by developing an agentic codified solution for legal question-answering (QA), which incorporates rule-based and (programming) code-based representations of law, as well as distributed reasoning and verification. The trustworthy legal research assistant will transform the daily search routine of legal professionals by speeding up the process and enhancing their efficiency. Through a conversational interface, a legal professional will be able to communicate with our legal assistant in natural language to build a full argument for a given case, which can itself be uploaded to the chat interface to add more context. The assistant answers each query, providing the cited sources which can also be consulted or summarized.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: De Raedt Sylvie
Fellow: Lotfi Ehsan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The digital literacy of seniors: The appropriation of interactive social media writing at an older age. 01/10/2024 - 30/09/2028

Abstract

Social media such as Facebook and WhatsApp have ceased to be communication channels that are mainly claimed by youth. Still, scientific research on the (socio)linguistic characteristics of social media language focuses almost exclusively on younger generations. The current proposal wants to be a game changer in that respect: we set out to investigate whether and how seniors adopt/adapt the widely acclaimed conventions and typical features of social media writing as identified in previous, mainly adolescent-focused, research. How do seniors reconcile their firmly entrenched writing habits with the potential of a 'new' genre? Are they more inclined to follow social media conventions in intergenerational conversations with digital natives than in conversations with peers? The research design includes both spontaneous and experimental language data: we will analyze spontaneous WhatsApp conversations of seniors within associations and clubs and compare those with available WhatsApp and Messenger conversations produced by adolescents, and we will set up WhatsApp conversation experiments with individual seniors, to find out to what extent certain features can be elicited from them. Seniors' perceptions of and attitudes towards social media writing will be included too. Not only will this study inform us about linguistic flexibility at an older age, it should also do justice to the agency of older generations in new media and challenge the benchmarks of social media writing that are all too often taken for granted. In the end, this should lead to a more inclusive approach to digital literacy.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Bernolet Sarah
Co-promoter: De Wit Astrid
Fellow: Baert Lara

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Direct or indirect objects? The optional use of the preposition aan 'to' among two-place verbs. 01/05/2024 - 30/04/2028

Abstract

Dutch features two-place verbs with both direct and indirect objects where the use of the preposition aan 'to' is optional, e.g. respectively 'ik bouw (aan) een konijnenverblijf' ('I'm building (to) a rabbit shelter') and 'het contract ontglipte (aan) ons bedrijf' ('the contract slipped away from our company'). There are also a number of verbs for which the status of the object is unclear, e.g. 'hij gehoorzaamt (aan) de heilige wet' ('he obeys the holy law'). This project aims to find out (i) when and why language users choose to employ or omit the preposition aan 'to' among these verbs, and (ii) which objects behave more like direct or indirect objects. The answer to the first question will also inform the second: for which verbs does the alternation behave more like the dative alternation (e.g. 'Sophia geeft (aan) hem een dikke knuffel' 'Sophia gives him a big hug') or more like the transitive-prepositional alternation (e.g. 'Frederik zoekt (naar) zijn vrachtwagen' 'Frederik is searching for his lorry')? These questions will be dealt with through in-depth corpus research and experiments.

Researcher(s)

Promoter: Pijpops Dirk
Fellow: Van Herpe Alexander

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Evaluating and incorporating common sense in large language models to improve implicit language understanding. 01/11/2023 - 31/10/2027

Abstract

Despite impressive progress in Natural Language Processing (NLP) applications, Natural Language Understanding (NLU), i.e., extracting semantic and discourse representations from text, remains an elusive task, leading to errors. One of the issues current NLP methods have not been able to solve yet involves implicit language use. When dealing with implicitness, i.e., conveying meaning without explicit expression, in which the intention of the speaker is deduced from indirect cues, Machine Learning (ML) models used in NLP are required to look beyond superficial textual patterns and rely on more profound world knowledge and reasoning abilities. To tackle such complex tasks, researchers have invested in incorporating common sense (CS) in ML models. Incorporating CS, while being a challenging task, is essential to interpret implicit language. However, there is no consensus in the existing literature on the extent to which this knowledge is present in large language models. Therefore, our main goal is to evaluate and improve the incorporation of CS in these models. As a practical application of this framework, we focus on sarcasm detection, which currently suffers from a lack of CS reasoning abilities. Moreover, since sarcasm also has implications for hate speech detection, we will explore using CS for sarcasm in hate speech, which has not been researched yet. Lastly, we will extrapolate our method to Dutch, to verify that our approach of CS reasoning can be generalized over languages.

Researcher(s)

Promoter: Daelemans Walter
Fellow: Gevers Ine

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The Features of Fuzzy Representations: How do language learners specify foreign sounds? 01/10/2023 - 30/09/2028

Abstract

When language learners speak their foreign language, their accent usually betrays them as being non-native, even at a very high level of proficiency. The sounds of a language are stored in phonological representations defining lexical contrasts, such as the distinction between the words "bad" and "bed". Languages differ in the phonological representations speakers have, for instance in terms of phoneme inventory (i.e., the available sounds of the language) and phonemic contrasts. Although second language (L2) speakers may have difficulty perceiving or producing non-native sounds, they do develop distinct phonological representations of new sounds and sound contrasts. However, the exact nature of these L2 phonological representations is as yet unclear. They may include "fuzzy" representations of L2 sounds, in which learners re-apply features from representations in their first language (L1) in a different context, or leave certain features of the L2 sounds unspecified. The current project aims to study the nature of these fuzzy phonological L2 representations. We will conduct a series of experiments testing L2 speakers of English with Dutch as their L1 on L2 sounds that differ between the languages. We contrast different L2 sounds to test whether their phonological features match or mismatch. We aim to investigate for which non-native features L2 speakers store a specified phonological representation, to gain a better understanding of the causes of a foreign speech accent.

Researcher(s)

Promoter: Bernolet Sarah
Fellow: van Lieburg Rianne

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Antwerp Text Mining Centre (TEXTUA). 01/01/2022 - 31/12/2026

Abstract

Most knowledge is stored in unstructured data like text, which must be structured before it can be mined. The need and opportunities for this automatic text analysis have considerably increased recently with developments in Artificial Intelligence, not only in the humanities and social sciences, but also in the exact and medical sciences. The mission of the ATMC is to provide scalable solutions to researchers from any scientific discipline who want to analyze and use large amounts of textual data. Text data should be seen here in a broad sense, including automatically transcribed speech, written text in images, and images automatically described in text. ATMC bundles the unique existing expertise in digital text analysis at the University of Antwerp, with special emphasis on explainable AI, and will provide the capacity to support the growing number of interdisciplinary queries that reach us today.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Kestemont Mike
Co-promoter: Martens David
Co-promoter: Oramas Mogrovejo José Antonio

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Past projects

Neuralex – supercharging legal search with ChatGPT-like technologies. 01/09/2023 - 31/08/2024

Abstract

Supercharging legal search with ChatGPT-like technologies. Legal professionals spend time searching for legal information: the faster they can get an answer, the better. We will develop an application like ChatGPT but targeted at Belgian legal professionals. Whereas current technologies provide a list of potentially relevant documents, we provide the user with an automated legal assistant that answers questions directly.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Multilingual self speech audiometry (MuLiSSA). 01/04/2023 - 30/06/2025

Abstract

Disabling hearing loss (DHL) is one of the most prevalent public health issues. WHO estimates that by 2050 one out of every ten people (700 million) will be affected by DHL, putting enormous pressure on global health care systems. Speech audiometry is the core clinical test to diagnose and treat DHL. Currently, speech audiometry can only be performed in a sound booth in the presence of a qualified clinician, an audiologist, who performs the scoring. The condition for correct scoring is that the audiologist is able to understand the language of the speech test material. This project aims at taking speech audiometry out of the booth, into the waiting room and the home environment, at least for people using a modern hearing device with wireless audio streaming capability. A solution for the multi-language setting is the other important goal of this project. A first project objective is to build remote speech testing capability and demonstrate its feasibility at TRL6 level. The "boothless testing" demonstrator will cover existing speech testing materials in multiple languages. The technology blocks relating to self-testing need to be matured from TRL3 onwards. A closed set testing user interface will be compared to open set testing. Furthermore, the effects of the wireless link technology limitations will be fully characterized in terms of their impact on speech testing outcomes. The frequency and impact of cognitive distractions and background sounds will be considered as well, especially for the home environment. Criteria for the TRL6 self-testing toll gate are good test-retest reliability and correspondence to in-booth testing (r > 0.9). A second objective is to extract more information out of speech audiometry tests. Current tests only provide an average recognition score. We want to introduce more precision, measuring as well which phoneme errors an individual patient is making. This requires the alteration of speech testing materials.
Currently, test lists are too short and do not adequately cover the phonemes of the language. To develop the concept, the project will start from first principles, theoretical phonetic studies and existing clinical data. As part of the objective, the project will explore how clinicians can use this information to optimize device fitting or hearing rehabilitation. We will first pilot this approach in Dutch. The goal is to reach a proof-of-concept (TRL3) maturity. The third objective is to investigate the feasibility of rolling out the precision audiometry concept in many languages. Current speech tests lack comparability across languages, which hampers consolidation of hearing outcomes across multiple language areas. The current speech audiometry test approach requires clinicians to master the language, as they have to manually score the responses given by the patient to the played audio material. The lack of comparability and the requirement for the clinician to master the speech test language are blockers for testing individual patients in their native language.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Context-Aware Fashion Recommendations through Image Processing and Conversational Language Technology. 01/05/2022 - 30/09/2023

Abstract

In an era of fast-changing trends, people struggle to create a wardrobe that fits their lifestyle and needs. With a lot of choices, it takes time and effort to find out what clothes to wear and, consequently, what clothes to buy or throw away. Professional stylists can offer support in tackling this challenge, but their services are not affordable for most customers. In addition, many existing mobile applications, which are affordable, rely on human efforts to construct such a wardrobe. Recommendation systems can fill this market niche; however, such systems usually are not able to explain their recommendations. Users want to know why a given recommendation is made. Hence, explainable recommendations are in high demand, because explainability improves the transparency, persuasiveness, effectiveness, and trustworthiness of recommendations, as well as users' satisfaction with them. In this project, we aim to develop a context-aware, multi-lingual, and multi-modal recommendation system for fashion compatibility of clothes enhanced with explanation functionality. This recommendation system will be the core of a mobile application complemented by a live chat interaction with customers via a built-in chatbot, which will provide daily fashion advice based on the customers' current wardrobe, weather, and schedule. In the follow-up project, we aim to develop the proposed mobile application to deliver the technology to the B2C market. Next, we will launch a spin-off company to constantly improve our recommendation system and implement new features in our application.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Kestemont Mike

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Scientific Chair "International Francqui Professor 2021-2022" (Prof. dr. Bruce Connell). 01/10/2021 - 30/09/2022

Abstract

This International Chair focuses on language endangerment and the phonetic description of speech sounds which occur in the languages of the world. It will be investigated which phonetic dimensions are frequent in languages of the world and which typical patterns of sounds can be distinguished.

Researcher(s)

Promoter: Verhoeven Jo

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Intelligent document management through automatic topic discovery. 01/08/2021 - 31/07/2025

Abstract

In this project, we will develop technology for the automatic extraction of topics from documents (topic modeling) to enhance Textgain's intelligent document management software product (Ocelot). For the company, this will lead to attracting new customers and diversification of income streams. For research, this project will investigate and develop innovative techniques and methods for topic modeling using zero-shot learning and contextual embeddings.

Researcher(s)

Promoter: Daelemans Walter
Fellow: Kosar Andriy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Demonstrating EOSC Value through Cross-domain Research Science Projects. 01/04/2021 - 30/09/2023

Abstract

In this work package within the EOSC project, we work on the development and empirical validation of an ontology of COVID-related topics from parliamentary and social media data. The research involves topic modeling, ontology extraction and use, and domain adaptation for the analysis of parliamentary data (as collected by CLARIN) and of social media data (from Twitter) and their connections.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The development of fundamental frequency in babbles and early words of typically developing children and children with hearing impairment: the case of intrinsic vowel pitch. 01/01/2021 - 31/12/2024

Abstract

In all languages of the world high vowels (such as /i/ in 'key' and /u/ in 'who') are pronounced with a higher pitch than low vowels (such as /a/ in 'far'). This phenomenon is known as 'intrinsic vowel pitch'. In the past, this phenomenon has been explained in two ways. On the one hand, intrinsic vowel pitch has to do with the operation of the speech organs: during the articulation of /i/ and /u/ the tongue is lifted far forward in the mouth. This tension pulls on the larynx and this stretches the vocal folds so that a higher pitch is obtained. In vowels like /a/ the vocal folds are not stretched to the same degree so that a lower tone is heard. On the other hand, this phenomenon supports the intentions of speakers who aim to make vowels sound as different as possible from each other in order to speak clearly. Scientists do not agree on which explanation is correct, but they do agree on the following: if the first explanation is correct then intrinsic vowel pitch is expected to occur in the babble of deaf babies. Remarkably, this has never been systematically investigated in a large-scale study and this is precisely what this project aims to investigate.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Verhoeven Jo

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

European Language Equality (ELE). 01/01/2021 - 30/06/2022

Abstract

Twenty-four official languages and more than 60 regional and minority languages constitute the fabric of the EU's linguistic landscape. However, language barriers still hamper communication and the free flow of information across the EU. Multilingualism is a key cultural cornerstone of Europe and signifies what it means to be and to feel European. Many studies and resolutions, as noted in the recent EP resolution "Language equality in the digital age", have found a striking imbalance in terms of support through language technologies and issue a call to action. This project answers this call and lays the foundations for a strategic agenda and roadmap for making digital language equality a reality in Europe by 2030. The primary goal of ELE is to prepare the European Language Equality Programme, in the form of a strategic research, innovation and implementation agenda and a roadmap for achieving full digital language equality in Europe by 2030. This programme will be prepared jointly with the whole European Language Technology, Computational Linguistics and language-centric AI community, as well as with representatives of relevant initiatives and associations, language communities and RML groups. The consortium includes all relevant scientific and industrial stakeholders from all Member States and Associated Countries and engages them in the process. The whole community is included in the project through external consultation sessions. The project plan is fully optimised towards this key goal of preparing the strategic agenda and roadmap and of involving the whole European LT community. Ensuring appropriate technology support for all European languages will create jobs, growth and opportunities in the digital single market. Equally crucial, overcoming language barriers in the digital environment is essential for an inclusive society and for providing unity in diversity for many years to come. The ELE project provides a roadmap and framework to achieve this.

Project type(s)

Research Project

Exploring the limits of language non-selectivity: How do multilinguals process non-native cognates and interlingual homographs in sentences? 01/11/2020 - 31/10/2024

Abstract

When a bi- or multilingual reads a word in one language, does he/she automatically activate lexical representations from all his/her languages? This is what the widely accepted language non-selective account suggests, but the studies that support this hypothesis suffer from various methodological pitfalls. In a multilingual context like Flanders, it is especially important to investigate multilingual language processing in a systematic and thorough manner, to achieve an accurate understanding of how multilinguals process languages in their everyday lives. By doing so, we can find a reliable answer to the question whether they access their mental lexicon in a language non-selective or a language selective manner. The vast majority of studies that have examined bi-/multilingual language processing have used cognates (words that exist in two languages with the same meaning, e.g., "water") or interlingual homographs (IHs; words with two different meanings in two different languages, e.g., "fee" is Dutch for "fairy") that exist in the native language (L1). However, it is exactly this native language that is supposedly qualitatively different from any other language that a multilingual knows, and, hence, using words that occur in L1 may yield results that are not representative of lexical processing in general. Besides, studies often show words in isolation, which is not how we normally read. The present proposal circumvents these issues by embedding L2-L3 cognates and IHs in sentences.

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Broekhuis Lisan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Dialect Syntax Revisited 01/01/2020 - 31/12/2024

Abstract

The scientific research network Re-Examining Dialect Syntax (REEDS-network) brings together linguistic researchers from Flanders, Europe and the US from different empirical and theoretical backgrounds and with complementary expertise, in an attempt to arrive at a deeper, more rounded and better grounded understanding of dialect syntax in particular and language variation in general.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

European Language Grid. 12/12/2019 - 30/06/2022

Abstract

ELG will strengthen the commercial European Language Technology landscape by establishing a pan-European marketplace. CLiPS is the National Competence Centre (NCC) for Belgium. ELG has set up 32 NCCs to establish a strong European network. They will act as regional bridges to the project. The NCCs will support ELG in collecting regional information about companies, research centres, resources, services and projects. They will organise regional ELG workshops, promote ELG in their area and establish bridges to funding agencies.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Artificial intelligence for creative language use. 01/12/2019 - 30/11/2021

Abstract

Recent progress in Natural Language Processing (NLP) has resulted in reliable pattern matching techniques (mostly based on deep neural networks) for many NLP tasks (text to speech, speech to text, text generation, text translation, multimodality, text analysis, …). The creative use of language (e.g. in advertising slogans, song texts, humor, irony, metaphor, …) has remained out of reach of current approaches. We will investigate how the improved state of the art in 'literal' language processing can push the design of creative language processing systems. Valorisation roadmap: The research addresses two types of users and applications: (i) professional writers who will be able to use tools to generate ideas and concepts (puns, jokes, titles, short texts with metaphors) and (ii) language enthusiasts who will be provided with tools that can boost their output by producing examples and ideas. Approach: 1. Development of proofs of concept of domain-dependent creative writing; 2. Design of applications in copy-writing; 3. Design of applications in entertainment writing.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Accommodation and non-accommodation in adolescents' informal online writing: Social determiners and linguistic effects. 01/10/2019 - 30/09/2022

Abstract

The proposed study will analyze how teenagers adapt their informal online writing to their conversation partner, and by which social and contextual factors this process of accommodation is influenced. Since linguistic accommodation remains largely un(der)explored for social media writing, the project fills a gap. It will investigate the impact of multiple aspects of adolescents' socio-demographic profile and their interaction on a wide range of linguistic and pragmatic features. We will examine whether divergent patterns of linguistic adjustment can be observed for teenagers with distinct socio-demographic profiles, and which language features appear to be most or least affected. A major distinction will be made between analyses of robust intergroup accommodation and in-depth diachronic analyses of accommodation between particular individuals. This unique design might lead to challenging sociolinguistic findings with respect to the profile of (non-)accommodators. While it will primarily increase our understanding of the social, linguistic and pragmatic parameters that govern accommodative language behavior, it may in the end also open up a unique perspective on language change. Moreover, on a more general, theoretical level, this project aims to accurately delimit the concept of accommodation, in order to answer the fundamental question of whether we can unambiguously discriminate between true accommodation and other instances of linguistic adaptation.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Daelemans Walter
Fellow: Hilte Lisa

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

FWO Sabbatical 2019-2020 (Steven Gillis). 01/10/2019 - 30/09/2020

Abstract

The aims of the planned research are: 1. Study of the speech and language development in congenitally deaf children with a cochlear implant: preparation of a state-of-the-art review of the recent literature, including our own empirical findings; 2. Study of speech and language development in congenitally deaf children with an auditory brainstem implant: analysis of a recently collected longitudinal corpus; 3. Preparation of the longitudinal and cross-sectional corpora collected by our research group over the last 40 years: integration into TalkBank.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Language development after pediatric brainstem implantation. 01/04/2019 - 30/03/2020

Abstract

This project aims to examine the oral language development of congenitally deaf children with auditory brainstem implants (ABI). Thus far, only a handful of studies have examined the oral language skills of children and adolescents with ABI, without going into linguistic detail. In this project, the development of their speech production will be investigated in linguistic detail and compared to that of children with typical hearing and another group of congenitally deaf children, viz. children with cochlear implants. The outcomes of this project will be crucial on different levels. First, they are theoretically important to further our understanding of the role of auditory input and brain stimulation in language development. Second, results can guide speech and language therapy for these children with auditory brainstem implants, since the current therapy is entirely based on that for children with cochlear implants, without any linguistic comparisons between both groups of children. Finally, the resulting information is crucial, e.g. for parents, to determine whether the benefit of ABI implantation outweighs the surgical risk.

Researcher(s)

Promoter: Faes Jolien

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The linguistic landscape of hate speech on social media. 01/01/2019 - 31/12/2022

Abstract

Hate speech online is a widespread social phenomenon that frequently receives a lot of media attention. We are interested in the language that is being used to express hate in social media, specifically hate against migrants and LGBT people. After gathering enough examples from public Facebook pages, we will develop methods to automatically analyze the language in these texts. The analysis will be on different levels. Some simple forms of analysis include counting words, looking at spelling mistakes, and investigating grammatical aspects. In the more complex analysis we will examine the use of metaphors, the context of the hate speech and how the hate speech can be implicit in the text, rather than overtly present. Apart from the linguistic description of this phenomenon, we strive to build systems that can automatically recognize hate speech in social media text. The project is in cooperation with research groups in Slovenia and targets Dutch, Slovene, and English.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The development and representation of Dutch syntax in learners of Dutch as a foreign language and learners of Dutch as a second native language. 01/01/2019 - 31/12/2022

Abstract

In these days of mass migration, many people learn a brand new language at a later age. This is not easy: Languages have both similarities and differences in the sentence structures with which they express particular meanings. For instance, the Dutch and French active sentences are similar in both languages (Le chat chasse la souris - De kat jaagt op de muis [The cat chases the mouse]), but Dutch has three different forms for the full passive sentence, whereas French has only one (La souris est chassée par le chat). How do learners deal with this? Previous research suggests that bilinguals share information about sentence structure across their languages, whenever these structures are similar enough. We proposed a developmental model for second language syntax in which learners go through 5 consecutive learning stages before they share syntax between languages. The goal of this project is to test and refine that theory. We will investigate the syntactic representations in different speakers of Dutch: 1) Flemish students with Dutch as their only native language; 2) Arabic-Dutch simultaneous bilinguals; 3) Walloon students who learned Dutch at the age of 10; 4) first generation immigrants learning Dutch as a second Indo-European language. This will provide valuable information on the learning trajectory for Dutch syntax (with its possible problems) and on the influence of native language syntax on the development and the final representation of Dutch syntax.

Researcher(s)

Promoter: Bernolet Sarah

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The development of Dutch syntax in learners of Dutch as a foreign language: effects of immersion, language background and training by means of syntactic priming. 01/10/2018 - 30/09/2022

Abstract

Background: In these days of mass migration, many people learn a brand new language at a later age. This is not easy: Languages have both similarities and differences in the sentence structures with which they express particular meanings. For instance, the Dutch and French active sentences are similar in both languages (Le chat chasse la souris - De kat jaagt op de muis [The cat chases the mouse]), but Dutch has three different forms for the full passive sentence, whereas French has only one (La souris est chassée par le chat). How do learners deal with this? Aims: Previous research suggests that bilinguals share information about sentence structure across their languages, whenever these structures are similar enough. Hartsuiker and Bernolet (2017) proposed a developmental model for second language syntax in which learners go through several consecutive learning stages before they share syntax between languages. The challenging aspect is our goal to test that theory in ecologically valid settings. More specifically, we investigate the influence of immersion in the L2 and of knowledge of related languages on the development and the representation of Dutch syntax in students who learn Dutch as a foreign language. Additionally, we investigate whether and how syntactic priming experiments can aid the development of native-like production preferences in Dutch as an L2. Methodology: All studies in the project use syntactic priming as a tool (Branigan & Pickering, 2017): all sentences that need to be produced or comprehended are preceded by a prime sentence with the same or a competing syntactic structure. If a prime structure is represented in memory, it will influence the production and the comprehension of the upcoming sentence, within and across languages.
We will investigate the syntactic representations in different speakers of Dutch: 1) Flemish students with Dutch as their only native language; 2) Walloon students who learned Dutch at the age of 10; 3) first generation immigrants learning Dutch as their first or second Indo-European language. The first production study compares groups 1 and 2. We investigate the representation of Dutch syntactic structures that lack a similar counterpart in the learners' native language (French) and we compare the production preferences of Walloon learners of Dutch living in an immersion context with the preferences of learners living in a monolingual French context. The second study investigates how we can boost the production of Dutch syntactic structures that are dispreferred due to the influence of a native language. Study 3 is a longitudinal study that explores the differences between the learning trajectories for Dutch syntax in native Arabic speakers who learn Dutch as their first or second Indo-European language (after English). Impact: By documenting the different stages in L2 syntactic development with actual learner data, this project will have a strong impact on both the psychology of language and on second language acquisition research. Additionally, this project will provide valuable information on the learning trajectory for Dutch syntax, more specifically on the influence of native language syntax, and on the effects of immersion, knowledge of related languages and specific training on the development and the final representation of Dutch syntax. Hence, the project outcome will be relevant to teachers and trainers of Dutch as a foreign language.

Researcher(s)

Promoter: Bernolet Sarah
Fellow: Sijyeniyo Edwige

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The role of semantics in modeling the bilingual mental lexicon. 01/10/2018 - 18/06/2020

Abstract

Bilinguals, people who simultaneously know and use two or more languages, are an interesting source of clues for discovering the internal make-up of our language system. Specifically, it is interesting how bilinguals are able to reliably access the right words in the right language without making mistakes, even though languages contain significant amounts of overlap in terms of semantics, orthography and phonology. In computational psycholinguistics, we model phenomena such as word retrieval via computer models. Despite the fact that we do not have access to the actual word store embedded in our mind, modeling can provide us with clues as to how it is organized, more particularly, by constructing models that can simulate key findings in psycholinguistic experiments. Having said that, current models for bilingual word reading can account for most of the facts, but largely underspecify a crucial component of our day-to-day word retrieval: meaning. Moreover, and related to this shortcoming, most models of word access have only modeled words in isolation. In reality, however, words are always embedded in sentences and larger linguistic and non-linguistic contexts, which also influence the way we access our words. By creating models of sentence processing, we can make sure that meaning has a more central role in our models, and thereby give new explanations for several phenomena in bilingual word processing.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Sandra Dominiek
Fellow: Tulkens Stéphan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Sabbatical Leave Project, 2018-2019 01/10/2018 - 30/09/2019

Abstract

Two sub-projects are addressed: in stylometry, methodological issues are addressed, especially related to personality prediction from text: feature optimization, data acquisition and quality, model selection, and especially explanation of trained machine learning models. In machine learning for natural language, approaches are investigated on how to combine knowledge and reasoning with the currently predominant deep learning "black boxes".

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Solving Combinatorial and Probabilistic Problems in Natural Language. 01/01/2018 - 31/12/2021

Abstract

This project aims to develop a fully automated approach to solving exercises about combinatorics and probability that can be found in introductory textbooks on discrete mathematics. The ability to solve such problems is an important cognitive and intellectual skill, as it is evaluated as part of academic admission tests such as SAT, GMAT and GRE. The combinatorics and probability questions will be formulated in natural language and the task will be to automatically answer these questions. We shall develop a two-step approach for tackling this task. In the first step, a question formulated in natural language will be analysed and transformed into a high-level model specified in a declarative language. In the second step, the high-level model will be solved using the inference mechanisms of the declarative modeling language. The language and its solvers will be based on principles of probabilistic programming, an increasingly popular programming paradigm. While the immediate goal is to solve textbook exercises, the long-term goal is to contribute to the automation of probabilistic and combinatorial problem solving and to enable the modeling and programming of such problems in natural language, two goals that are highly relevant to cognitive computing and artificial intelligence.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Errors outside the lab: the interaction of psycholinguistic and sociolinguistic variables in the production of verb spelling errors in informal computer-mediated communication. 01/01/2018 - 31/12/2021

Abstract

We will investigate how social and mental processes interact in the production of spelling errors in informal computer-mediated communication (CMC). Unlike many CMC-studies, the research will not focus on prototypical CMC-features, but on unintentional spelling deviations in verb forms whose pronunciation corresponds to two spelling forms (homophones). We will study an extensive corpus of informal CMC produced by Flemish adolescents. The correct rendering of verb homophones presupposes the time-consuming application of grammatically informed spelling rules. Psycholinguistic findings show that, when working memory runs out of resources, the higher-frequency homophone can cause intrusion errors. While we expect social variables to affect (1) the NUMBER of spelling errors, we assume that they are less likely to affect (2) the PATTERN of these errors. Hypothesis (1) is inspired by sociolinguistic findings on gender and age differences with respect to norm sensitivity. Norm sensitivity should affect working memory (conscious processing); hence, only error rates. We will also include the youngsters' educational track. Hypothesis (2) is related to the online writing process, which triggers speedy interaction. We will investigate whether the CMC-context leads to the same intrusion errors that writers find so hard to control under time pressure. This interdisciplinary approach should lead to innovative contributions to psycholinguistics, sociolinguistics and CMC-studies.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

A longitudinal approach to phonetic enhancement in infant directed speech: normally hearing infants and hearing-impaired infants with a cochlear implant. 01/01/2018 - 31/12/2021

Abstract

The aim of the present project is to investigate Infant Directed Speech (IDS). Since the pathbreaking work of i.a. Snow & Ferguson (1977) a consensus has grown that IDS exhibits particular characteristics that distinguish it from Adult Directed Speech (ADS). A case in point is the production of vowels: in IDS vowels are produced more "clearly" than in ADS, as can be inferred from the larger vowel space in IDS (Kuhl 2000). This "received wisdom" has recently been fundamentally questioned. For instance, Martin et al. (2015) conclude their study of Japanese IDS and ADS: "Mothers speak less clearly to infants than to adults." We want to further investigate this contradiction by replicating the findings reported in the literature using a large database of Dutch IDS and ADS, and by systematically scrutinizing two variables that have been largely neglected up till now: 1. longitudinal development: how does IDS change relative to chronological age and, more importantly, "linguistic age" as represented by a.o. the child's evolving cumulative vocabulary and utterance length? 2. characteristics of the child as interlocutor: does speech directed to a child with normal hearing (NH) differ from speech directed to a deaf child with a cochlear implant (CI)?

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Auditory brainstem implantation and language development 01/10/2017 - 30/09/2020

Abstract

This project aims to investigate the oral language development of congenitally hearing-impaired children with an auditory brainstem implant (ABI). ABI is a relatively new development to restore the hearing of children with a severe-to-profound hearing loss due to i.a. the absence of the auditory nerve. The speech perception outcomes of children with ABI have been investigated, but detailed linguistically underpinned studies of their speech production are virtually lacking. The goal of the present research project is to provide a first linguistically motivated description of the lexical and phonological development of children with ABI. Their development will be evaluated against the background of the acquisition process of normally hearing children and that of severe-to-profound hearing-impaired children who received a cochlear implant. The focus is on the longitudinal development of the word productions of children with ABI. First, we investigate their cumulative vocabularies and the balance between their spoken and signed words (lexical development). Second, their word productions are analysed from a phonological perspective: in what order are segments acquired and what phonological regularities account for that order and (possible) deviations from that order? Which segmental substitution and deletion patterns occur? What is the consistency and variability of their productions and how does the accuracy of their word productions develop relative to the adult target forms?

Researcher(s)

Promoter: Gillis Steven
Fellow: Faes Jolien

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Identifiability and intelligibility of the speech of hearing impaired children using a cochlear implant 01/10/2017 - 30/09/2019

Abstract

Until recently children who were born "deaf" remained "deaf", and thus were unable to acquire spoken language. Fortunately nowadays deaf children with a cochlear deficit can be helped with a surgical intervention: they receive a cochlear implant (CI) very early in life so that they can "hear", i.e., can experience sound sensations. The first concern that the parents of these children phrase, is: "will my child hear with an implant?" The answer is definitely positive. The second question usually is: "will my child speak and sound like a normal hearing (NH) child of the same age?" This question remains unanswered. We want to address this issue from two perspectives: the identifiability and intelligibility of CI children. Recent findings indicate that the speech of 6- to 7-year-old CI users deviates from that of NH peers in particular fine details. But are those details that we can measure also detectable by the human ear? Are they sufficient to reliably identify CI children's speech? This will be investigated by having people listen to recordings of speech of CI children, children with an acoustic hearing aid (HA), and NH children. A second main research question concerns the intelligibility of CI children's speech. When the children enter mainstream primary school, it is quintessential to know if they are intelligible for people not familiar with them. In this project we will assess their intelligibility using different methodologies.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Kloots Hanne
Fellow: Boonen Nathalie

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Optimization of the adaptability of clinical information extraction systems: deep learning and use of feedback propagation techniques. 01/09/2017 - 31/08/2021

Abstract

Large amounts of unstructured medical data (for example clinical notes) are today available, which offers opportunities for optimization of healthcare quality and patient security. Although Natural Language Processing technology already offers great tools and solutions to automate the processing of medical documents, performance of this technology often decreases with changes of the extraction context (medical specialty, hospital, physician's writing style). This project will study the possibility of a scalable NLP engine able to adapt to such new contexts. To reach this goal, we will explore and combine approaches based on deep neural networks, the human-in-the-loop paradigm and persistent learning. The project is a collaboration with LynxCare Clinical Informatics, a medical IT company focusing on promoting access to medical information and reducing administrative costs in hospitals.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The role of semantics in modeling the bilingual mental lexicon. 01/10/2016 - 30/09/2018

Abstract

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Sandra Dominiek
Fellow: Tulkens Stéphan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Deep linguistic features for computational stylometry. 01/10/2016 - 30/09/2018

Abstract

The goal of stylometry is to understand and model how variations in writing style are related to (properties of) the author of a text. This research provides insight into how psychological and sociological properties of the author such as age, gender, region, personality, and others, are reflected in his or her idiolect. Such models can also be used to predict these author properties on the basis of text analysis. Applications range from literary studies to forensic science.

Researcher(s)

Promoter: Daelemans Walter
Fellow: Verhoeven Ben

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

ACCUMULATE: Acquiring crucial medical information using language technology. 01/01/2016 - 30/06/2020

Abstract

The ACCUMULATE project will automatically recognise crucial information in the free text of clinical reports written in English and Dutch by designing, developing and evaluating advanced language technology (LT) for deep semantic processing of the texts that are often morpho-syntactically not well-formed. An additional focus is on easy portability of the technology across domains and languages and on the use of visualisation techniques.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

An acoustic analysis of lexical stress and rhythm in early speech interactions of Dutch children and their primary caretakers: A longitudinal study. 01/10/2015 - 30/09/2018

Abstract

The main objective of this study is to investigate the acquisition of "lexical" stress and rhythm in the period when children produce canonical babbling and their first identifiable words. A good understanding of these phenomena in children's speech is of prime importance because it has been shown that prosody plays a cardinal role in children's language acquisition.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Verhoeven Jo
Fellow: De Clerck Ilke

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Identifiability and intelligibility of the speech of hearing impaired children using a cochlear implant. 01/10/2015 - 30/09/2017

Abstract

Until recently children who were born "deaf" remained "deaf", and thus were unable to acquire spoken language. Fortunately nowadays deaf children with a cochlear deficit can be helped with a surgical intervention: they receive a cochlear implant (CI) very early in life so that they can "hear", i.e., can experience sound sensations. The first concern that the parents of these children phrase, is: "will my child hear with an implant?" The answer is definitely positive. The second question usually is: "will my child speak and sound like a normal hearing (NH) child of the same age?" This question remains unanswered. We want to address this issue from two perspectives: the identifiability and intelligibility of CI children. (1) Identifiability: Recent findings indicate that the speech of 6- to 7-year-old CI users deviates from that of NH peers in particular fine details. But are those details that we can measure also detectable by the human ear? Are they sufficient to reliably identify CI children's speech? This will be investigated by having people listen to recordings of speech of CI children, children with an acoustic hearing aid (HA), and NH children. (2) Intelligibility: A second main research question concerns the intelligibility of CI children's speech. When the children enter mainstream primary school, it is quintessential to know if they are intelligible for people not familiar with them. In this project we will assess their intelligibility using different methodologies.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Kloots Hanne
Fellow: Boonen Nathalie

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Francqui Chair 2015-2016 Prof. Peter Mariën. 01/10/2015 - 30/09/2016

Abstract

Proposed by the University, the Francqui Foundation each year awards two Francqui Chairs at the UAntwerp. These are intended to enable the invitation of a professor from another Belgian University or from abroad for a series of ten lessons. The Francqui Foundation pays the fee for these ten lessons directly to the holder of a Francqui Chair.

Researcher(s)

Promoter: Verhoeven Jo

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Digital Humanities Flanders. 01/01/2015 - 31/12/2019

Abstract

This is a fundamental research project financed by the Research Foundation – Flanders (FWO). The project was subsidized after selection by the FWO-expert panel. Its aim is to initiate cooperation between research groups.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The interaction of gender and social class in Flemish online teenage talk. 01/01/2015 - 31/12/2018

Abstract

Social class differences in teenage speech remain largely unexplored, while gender has been focused on in quite a lot of sociolinguistic research on adolescent peer group language. The interest in gender differences has also pervaded the research on informal computer-mediated communication (CMC) and more specifically on the online writing practices of adolescents in chat or texting media, but then again, the link with social class is generally absent. Yet some studies (though not on CMC) suggest that gender differences manifest themselves in different ways in different social class groups. The present research is a first attempt to fill this gap, by focusing on the interaction between social class and gender in Flemish chat language produced by adolescents with a low versus a high level of education.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Deep linguistic features for computational stylometry. 01/10/2014 - 30/09/2016

Abstract

Researcher(s)

Promoter: Daelemans Walter
Fellow: Verhoeven Ben

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Text analytics web services for profiling and opinion mining. 01/02/2014 - 31/01/2015

Abstract

Our aim is to implement commercial web services for automatic opinion detection and author profiling (age, gender, personality, education, dialect) in text. In this project we will develop the core technology: data mining and annotation, machine learning and setting up the server. In a follow-up project we will then launch a spin-off company. This kind of language technology is useful for a wide range of big data applications, and does not yet exist for Dutch, and only in part for English.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Fine-Grained Sentiment and Opinion Mining of Political Social Network Messages. 01/02/2014 - 31/12/2014

Abstract

This project aims to develop an annotated corpus for the purpose of fine-grained sentiment and opinion mining of social network messages. As a case study, we will monitor messages on politics in the run-up to the 2014 Belgian elections. We will annotate not only the sentiment expressed in the message in a more robust way, but also mark information on the opinion holder, the object of the opinion and the features of the object.

Researcher(s)

Promoter: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Cognitive control in the lexical processing of interlingual and intralingual homographs. 01/01/2014 - 31/12/2017

Abstract

The research project has two major objectives: (1) an in-depth study of cognitive control in the process of visual word recognition; (2) the integration of research on intralingual and interlingual lexical processing.

Researcher(s)

Promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Bootstrapping operations in language acquisition: a computational psycholinguistic approach. 01/01/2014 - 31/12/2017

Abstract

The acquisition of abstract linguistic categories is investigated. Computational models of bootstrapping operations are constructed in order to investigate how knowledge from one domain can be instrumental in acquiring knowledge of another domain. In our simulations the language addressed to very young children is used in an attempt to elucidate how grammatical categories and grammatical gender are acquired given a combination of distributional, phonological and morphological bootstrapping.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Stress and Rhythm in Early Speech Productions of Hearing and Congenitally Deaf Children with a Cochlear Implant: A Longitudinal Study. 01/11/2013 - 31/10/2017

Abstract

Newborn babies have been shown to be sensitive to the speech melody of the language that they hear: they recognise the word stress patterns of their mother's language, and they are sensitive to the rhythm of that language (for instance, babies can distinguish what has been called the 'Morse Code' rhythm of Germanic languages and the 'Machine Gun' rhythm of Romance languages). Thus, already in the first year of life, infants seem to know a lot about how their ambient language sounds. Nevertheless, it is not known when and how they use this knowledge in their own speech production. This project investigates infants' babbling (adult-sounding syllable sequences) and their early word productions in the first two years of life. The main research question is: when and how do they produce stress (the relative prominence of syllables) and when do we find evidence that they adopt the speech rhythm of the ambient language? This is investigated by means of an acoustic analysis of children's speech and an analysis of the speech of their primary caretakers, which will represent the adult target model. A second aim is to investigate whether congenitally hearing-impaired children who received a cochlear implant very early in life show similar acoustic correlates of stress marking in their speech and display rhythmicity similar to that of their hearing peers.

Researcher(s)

Promoter: Gillis Steven
Fellow: Pettinato Michèle

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Evaluation of tools within the SUCCEED project. 25/10/2013 - 24/10/2014

Abstract

This project represents a formal service agreement between UA and the University of Alicante. UA provides the University of Alicante with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

What masked words can do: Preactivation or Retrospective recruitment? 01/10/2013 - 30/09/2015

Abstract

The purpose of the present research proposal is to find out whether Bodner & Masson's view can be upheld. The general rationale that will guide all experiments is the question whether masked priming effects activate episodic memory traces when access to lexical memory (the mental lexicon) is sufficient for performing the experimental task. This general question will be approached in two ways: (i) can masked primes access episodic traces that were created in a training phase prior to the experiment and (ii) do masked primes themselves leave episodic traces?

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Van Abbenyen Lien

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

An acoustic analysis of lexical stress and rhythm in early speech interactions of Dutch children and their primary caretakers: a longitudinal study. 01/10/2013 - 30/09/2015

Abstract

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Verhoeven Jo
Fellow: De Clerck Ilke

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Speech accuracy in young children: hearing and hearing impaired toddlers with a cochlear implant. 01/01/2013 - 31/12/2016

Abstract

The aim of the current project is to investigate early sound development in two populations differing in access to spoken language: children with normal hearing (NH) and congenitally deaf children with "received hearing" due to cochlear implantation (CI) at an early age. In comparing speech accuracy of these two groups with "different degrees of hearing", we aim to gain a better insight into the role of the auditory perception system in language development.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Automatic Monitoring for Cyberspace Applications (AMiCA). 01/01/2013 - 31/12/2016

Abstract

This project represents a research agreement between UA and IWT. UA provides IWT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://www.amicaproject.be

Project type(s)

Research Project

Digital Archive of Belgian Neo-Avant-garde Periodicals (DABNAP). 01/01/2013 - 31/12/2014

Abstract

Post-war artists' periodicals are a prime example of the neo-avant-garde DIY ethos, and simultaneously constitute a crucial source of information about this movement. This project aims to digitize a substantial and representative corpus of Belgian neo-avant-garde periodicals. Subsequently, innovative language processing tools will be applied in order to extract and visualize the network of artists who were behind the periodicals.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://dighum.uantwerpen.be/dabnap/about.html

Project type(s)

Research Project

Audio-description in Dutch: A corpus-based study into the linguistic features of a new, multimodal text type. 01/10/2012 - 30/09/2016

Abstract

The project presented here is a corpus-based study of the linguistic features of a new, multimodal text type within Audiovisual Translation (AVT): Audio-description (AD) for the blind and visually impaired. The aim of this interdisciplinary project is to describe the lexico-grammatical features of AD-scripts and examine the role they play in the specific communicative function of the text. The objective is to explore one of the key issues in AD research: How are images put into words and what are the implications for the language use in AD? A recent pilot study confirmed the hypothesis that the language of AD contains distinctive grammatical (morpho-syntactic) and lexical features and that these specific patterns can be identified by corpus analysis. Firstly, the current project aims to develop an extensive and varied text corpus of AD scripts of Dutch audio-described films and series. Secondly, this text corpus will provide the basis for quantitative linguistic research, aiming to identify the prominent lexico-grammatical features of the text type. Finally, the quantitative analysis will be combined with a qualitative analysis of the (communicative) function of these features. In this last stage, special attention must be paid to the multimodal nature of the text type, since the AD-script only makes sense in combination with the dialogues, music and sound effects of the original film or series with which it forms a coherent whole. A qualitative analysis of the (communicative) function of the features will explore the unique interaction between the language of AD and the other channels of the audiovisual text. Ultimately, the project's ambition is to conduct an extensive linguistic audience-design-oriented analysis of the AD-discourse.
This will allow us to identify the features that characterise the AD text type, will clarify how these linguistic and stylistic features are used to ensure maximum communicative efficiency, and how these features are related to the function and multimodal character of AD. The project presented here is a pioneer in the field: AD has become an international research topic recently but for Flanders and the Netherlands no study of AD is available yet. In addition, it can offer the basis for future application-oriented studies. AD in Flanders is in its infancy (public broadcaster VRT only started with its first audio-described series in January 2012). In brief, basic research projects like the one presented here support the development of a local AD tradition in Flanders that meets international quality standards.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Remael Aline
Fellow: Reviers Nina

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Antwerp Yiddish Noun Plurals (AYNP). 01/09/2012 - 31/08/2013

Abstract

The project will explore structure and acquisition in contemporary Yiddish used by the Jewish Ultra-Orthodox community in Antwerp, Belgium. This community lives in a unique multilingual situation that includes three main languages: Yiddish and Dutch, two living languages competing as native tongues, and Loshn Koydesh (Classical Hebrew), which is restricted to prayer and not used for daily communication. Our window onto native Antwerp Yiddish is the system of noun plurals, whereby a singular noun takes on a plural suffix. The aim of the project is two-fold: first, to describe the system of noun plurals as it is currently used by adults, taking into account the intensive contact with Dutch, and second, to understand how this system is acquired by children from the same community.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: van der Auwera Johan
Fellow: Abugov Netta

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Automatic Compound Processing. 01/07/2012 - 31/12/2013

Abstract

The central problem to be addressed in this project concerns a multidisciplinary investigation into the sharing of knowledge and resources between closely related languages, specifically relating to the automatic processing of compounds. We will explore the possibility of creating new knowledge about closely related languages, and of efficiently developing additional, more advanced resources for (a) compound segmentation; and (b) the semantic analysis of compounds.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Morphosyntactic language skills in deaf children with cochlear implant: a cross-linguistic study on Dutch and German (MORLAS). 01/07/2012 - 30/09/2013

Abstract

The proposed project will investigate the speech and language skills of children with a cochlear implant at the onset of their school career. The project will focus on these children's achievements in a major aspect of language, its morphosyntax.

Researcher(s)

Promoter: Gillis Steven
Fellow: Laaha Sabine

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Abstract rules or statistical learning? The impact of lexical and sublexical homophony in spelling and reading homophonous verb forms. 01/01/2012 - 31/12/2015

Abstract

Homophone intrusions in the spelling of regularly inflected Dutch verb forms are used to address a central question in psycholinguistics – and cognitive science in general: do people rely on symbolic mental rules or on a knowledge base that captures the co-occurrence probabilities in the learning domain (statistical learning)? Earlier findings in our research group indicated an effect of homophone dominance in the pattern of intrusion errors when spelling homophonic verb forms: such errors occur more often when the target is the lower-frequency homophone and the intruder the higher-frequency homophone. This is compatible with a statistical learning view but cannot reject a rule-based account enriched with a frequency-sensitive mechanism. To disentangle the two accounts we will compare error patterns in the lexical and sublexical domains. An effect of homophone dominance at the sublexical level cannot be explained by a rule model. Errors in the lexical and sublexical domains will be studied in spelling and reading tasks. Finally, we will attempt to simulate the experimental patterns with two types of computational models: a symbolic model, using morphemes and rules, and a memory-based model, storing whole word forms only and using a similarity metric that can 'discover' patterns in its memory store. Together, the experimental and simulation data should enable us to formulate an answer to the question about mental rules.
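
The contrast between the two model types can be illustrated with a minimal sketch. It is not the project's implementation: the exemplar store, its frequency counts, the rough pronunciation code "wOrt" and all function names are hypothetical, and the overlap metric with a k-nearest-neighbour vote stands in for whatever similarity metric the actual memory-based model uses.

```python
from collections import Counter

# Hypothetical exemplar store of whole word forms:
# (rough pronunciation, grammatical person, stored spelling).
# The 3rd-person form "wordt" is deliberately stored more often than "word".
MEMORY = [
    ("wOrt", 3, "wordt"),
    ("wOrt", 3, "wordt"),
    ("wOrt", 3, "wordt"),
    ("wOrt", 1, "word"),
]


def rule_based_spelling(stem: str, person: int) -> str:
    """Symbolic route: 1st person singular = stem; otherwise stem + t."""
    if person == 1 or stem.endswith("t"):
        return stem
    return stem + "t"


def overlap(a, b) -> int:
    """Similarity metric: number of matching feature values."""
    return sum(x == y for x, y in zip(a, b))


def memory_based_spelling(pronunciation: str, person: int, k: int = 3) -> str:
    """Memory route: majority vote among the k most similar stored forms."""
    query = (pronunciation, person)
    ranked = sorted(MEMORY, key=lambda ex: overlap(query, ex[:2]), reverse=True)
    votes = Counter(spelling for _, _, spelling in ranked[:k])
    return votes.most_common(1)[0][0]
```

With this toy store, the rule route spells the 1st-person target as "word", while the memory route returns "wordt" because the higher-frequency homophone outvotes the single matching exemplar. This is the kind of intrusion pattern that homophone dominance describes.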

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

What masked words can do: Preactivation or Retrospective recruitment? 01/10/2011 - 30/09/2013

Abstract

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Van Abbenyen Lien

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Understandable Dutch: the accessibility of the language of the news for different audiences. 14/09/2011 - 31/12/2012

Abstract

This project represents a formal research agreement between UA and VRT. UA provides VRT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Promoter: Vandekerckhove Reinhild
Co-promoter: Cuvelier Pol

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Child-directed speech and language development: hearing children of different SES backgrounds and deaf children with a cochlear implant. 01/01/2011 - 31/12/2014

Abstract

In this project we want to test the hypothesis that this relative poverty of the input is already manifest during the prelinguistic and early linguistic stages of language acquisition: particular aspects of the input make the discrimination of sounds more difficult, and make the segmentation of speech into sounds, and words, and phrases much more difficult.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

ALADIN - Adaptation and Learning for Assistive Domestic Vocal Interfaces. 01/01/2011 - 31/12/2014

Abstract

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

OPTI-FOX - Optimization of the automated fitting to outcomes expert with language-independent hearing-in-noise test battery and electro-acoustical test box for cochlear implant users. 01/11/2010 - 31/10/2012

Abstract

The main objectives of the research project are (i) to turn an existing theoretical automated fitting model into a clinical application by means of various techniques from statistics, machine learning and optimisation; (ii) to develop an evaluation tool to measure functional hearing capacities, in casu the ability to understand speech-in-noise, representative for day-to-day listening situations.

Researcher(s)

Promoter: Coene Martine
Promoter: Dirckx Joris
Co-promoter: Dirckx Joris

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

AMICA - Automatic monitoring for cyberspace applications. 01/10/2010 - 30/09/2011

Abstract

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Statistical Relational Learning of Natural Language. 01/01/2010 - 31/12/2013

Abstract

This project investigates how techniques of statistical relational learning can be used for natural language processing. The focus will be on challenging natural language processing tasks, such as semantic role labeling, where syntactic and semantic dependencies, structured and unstructured data, local and global models, and probabilistic and logical information must be combined with one another. As for statistical relational learning, the emphasis will lie on probabilistic extensions of the programming language Prolog. The project aims not only at obtaining improved natural language processing techniques but also at better algorithms and systems for statistical relational learning.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

A Safer Internet: (Semi)automatically Recognizing Internet Paedophilia in Multilingual Online Social Networks. 01/01/2010 - 31/12/2013

Abstract

In this project we propose, on the one hand, a methodology to (semi)automate the manual control of peer-to-peer networks and, on the other hand, a methodology for the automatic extraction and analysis of stylistic characteristics (associated with personality, age group and deceptive language usage), which we want to apply to both individual internet paedophiles and groups of paedophiles in chat rooms.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Van Vaerenbergh Leona
Fellow: Peersman Claudia

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://www.clips.uantwerpen.be/projects/daphne

Project type(s)

Research Project

Adolescent chat language in Flanders: the language geography of Flemish (sub)standardization processes against the background of the international chat scene. 01/01/2010 - 31/12/2013

Abstract

RESEARCH QUESTIONS:
- To what extent do Flemish teenagers integrate morpho-syntactic and phonological features of the Brabantic regiolect in their chat language: what are the relative frequency scores of the Brabantic regiolect variants, the non-Brabantic regiolect variants and the Standard Dutch variants for several variables?
- What is the impact of the independent variable 'hometown'? Is there a correlation between the relative representation of Brabantic regiolect features and the region where the chatters come from? To what extent do teenagers from the provinces of West-Flanders, East-Flanders and Limburg integrate morpho-syntactic and phonological features from the Brabantic regiolect in their chat language? In other words: to what extent do the data reveal an expansion of Brabantic features?
- What is the impact of local versus supraregional communication? Do teenagers who do not live in the Brabantic dialect area use more morpho-syntactic and phonological Brabantic features in 'interregional' than in 'intraregional' or local chat communication? Do the answers to the previous questions confirm the hypothesis that the linguistic situation in Flanders is marked by an autonomous informal standardisation process which is marked by a generalization/increasing use of the Brabantic 'tussentaal' (regiolect, intermediate language)? What are the implications of this study with respect to the relevance/applicability of chat data for the study of language variation and language change in progress?
- What is the pragmatic function of several varieties? What is the position and function of English in the linguistic repertoire of Flemish teenagers? How does the chat language of Flemish teenagers connect with the international chat culture? What kind of appropriation (localization processes) can be discriminated?
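
The relative frequency scores mentioned in the first research question could be computed along the following lines. This is a sketch under stated assumptions: the token list, the variant category labels and the function name are hypothetical and do not come from the project's data.

```python
from collections import Counter

# Hypothetical annotated chat tokens: (chatter's province, variant category).
TOKENS = [
    ("West-Flanders", "brabantic"),
    ("West-Flanders", "standard"),
    ("West-Flanders", "non_brabantic"),
    ("West-Flanders", "brabantic"),
    ("Limburg", "standard"),
    ("Limburg", "brabantic"),
]


def relative_frequencies(tokens, province):
    """Share of each variant category among the tokens from one province."""
    counts = Counter(variant for prov, variant in tokens if prov == province)
    total = sum(counts.values())
    if total == 0:
        return {}  # no tokens observed for this province
    return {variant: n / total for variant, n in counts.items()}
```

Comparing the resulting proportions across provinces is one simple way to approach the question about the impact of 'hometown'.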

Researcher(s)

Promoter: Vandekerckhove Reinhild

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Interpersonal communication training of natural language interaction with autonomous virtual characters (deLearyous). 01/01/2010 - 31/12/2012

Abstract

The goal of the deLearyous project is to develop an interactive serious 3D-game for training interpersonal communication skills in a professional context, e.g., employer-employee or customer-seller relations. The game allows trainees to interact with virtual autonomous characters who react in a realistic and expressive way to the input of the trainee. In this way, the trainee can exercise different behavioural patterns and roles in a safe virtual environment. The role of CLiPS in the project is to develop algorithms and methods for emotion analysis of text, topic detection in text, and dialogue management.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Project TST Tools for Dutch as Web services in a Workflow (TTNWW). 01/01/2010 - 30/09/2012

Abstract

This project represents a formal research agreement between UA and the Flemish Public Service. UA provides the Flemish Public Service with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

A web service for stylometry and readability research for the Dutch language (STYLENE). 01/01/2010 - 31/12/2011

Abstract

The goal of this project is to implement a robust, modular system for stylometry and readability research on the basis of existing techniques for automatic text analysis and machine learning, and the development of a web service that allows researchers in the humanities and social sciences to analyze texts with this system. In this way, the project will make available to researchers recent advances in research on the computational modeling of style and readability.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The computational learnability of morphologically complex languages. 01/10/2009 - 30/09/2012

Abstract

Goals of the project: Traditional spell checkers make use of an extensive word list. If a word does not occur in this list, it is marked as a spelling error. More recent systems (e.g. Németh 2009) approach the problem of spell checking for agglutinating languages from a different angle: a word is considered a spelling error if it cannot be generated by an underlying morphological model of the language. In this project, we investigate how such a spell checker can be used as a tool in the automatic induction of a morphotactic system for Swahili.
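
The difference between the two spell-checking strategies can be sketched as follows. This is a toy illustration only: the word list, the small inventories of prefixes, tense markers and roots, and the function names are hypothetical, and the slot grammar is far simpler than real Swahili morphotactics or the system cited above.

```python
# Toy contrast between list-based and model-based spell checking.
# Hypothetical, heavily simplified fragment of Swahili verb morphology:
# subject prefix + tense marker + verb root + final vowel.

WORD_LIST = {"ninasoma", "anasoma"}  # hypothetical lexicon of full word forms

SUBJECT_PREFIXES = ("ni", "u", "a", "tu", "wa")
TENSE_MARKERS = ("na", "li", "ta")
VERB_ROOTS = ("som", "pik", "andik")
FINAL_VOWEL = "a"


def in_word_list(word: str) -> bool:
    """List-based check: a word is accepted only if it is listed."""
    return word in WORD_LIST


def generated_by_model(word: str) -> bool:
    """Model-based check: a word is accepted if the slot grammar generates it."""
    for subject in SUBJECT_PREFIXES:
        if not word.startswith(subject):
            continue
        rest = word[len(subject):]
        for tense in TENSE_MARKERS:
            if not rest.startswith(tense):
                continue
            stem = rest[len(tense):]
            if stem.endswith(FINAL_VOWEL) and stem[:-1] in VERB_ROOTS:
                return True
    return False
```

In this sketch the form "tulipika" is rejected by the word list because it is not listed, but accepted by the model because it can be built from listed morphemes. This shows why a generative model suits an agglutinating language, where listing every word form is impractical.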

Researcher(s)

Promoter: Daelemans Walter
Fellow: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Exemplar-based models of human sentence comprehension. 01/10/2009 - 30/09/2011

Abstract

This is a fundamental research project financed by the Research Foundation - Flanders (FWO). The project was subsidized after selection by the FWO-expert panel.

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Daelemans Walter
Fellow: Vandekerckhove Bram

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Towards a synthesis of knowledge based and data-based methods in computer linguistics. 01/10/2009 - 30/09/2010

Abstract

Hybrid systems for natural language processing that combine deep analysis, based on linguistic insight, with inductive data-oriented methods can significantly improve the accuracy and applicability of computational linguistics. There are, however, many different ways in which this kind of hybridisation can be achieved. In this project, I will look at cognitive science as a source of inspiration for new hybrid approaches. This work will build on earlier work on memory-based language processing as a cognitively relevant model.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Machine learning for data mining and its applications. 01/01/2009 - 31/12/2013

Abstract

The research community aims to strengthen and coordinate Flemish research on machine learning for data mining in general, and on important applications such as bioinformatics and text mining in particular. Flemish participants: Computational Modeling Lab (VUB), CNTS (UA), ESAT-SISTA (KU Leuven), DTAI (KU Leuven), ISLab (UA).

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Language processing. 01/01/2009 - 31/12/2013

Abstract

This is a fundamental research project financed by the Research Foundation - Flanders (FWO). The project was subsidized after selection by the FWO-expert panel.

Researcher(s)

Promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The source of masked priming effects: lexical or episodic memory? 01/01/2009 - 31/12/2012

Abstract

Masked priming is a technique in which a word is presented so briefly that it cannot be consciously perceived, while at the same time it has an effect on the processing speed of a subsequently presented word. For this reason the technique is often used to investigate the nature of memory structures and processes underlying word recognition. However, recently the lexical nature of these masked priming effects has been called into question by Bodner and Masson (2003, 2004, 2006). Do these effects reflect the structure of the mental lexicon or do they reflect residual activation in episodic memory, where personal experiences are stored? A series of experiments is planned to investigate whether a lexical interpretation of the effect can be defended. Given the popularity of the technique the results of this research can have far-reaching consequences with respect to the theory formation on the mental lexicon.

Researcher(s)

Promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://www.cpl.ua.ac.be/

Project type(s)

Research Project

Artificial Creativity in visual communication and arts: an algorithm for inventive and evolving development of concepts and visualization of data. 01/10/2008 - 30/09/2012

Abstract

Using common techniques in Artificial Intelligence, a software algorithm is developed that summarizes, interprets and processes textual content (or data sets). In an attempt to simulate human creativity, the key concepts in this content are interrelated and recombined into creative and innovative graphical solutions and visualizations. The visual output evolves as the source data changes and expands.

Researcher(s)

Promoter: Daelemans Walter
Fellow: De Smedt Tom

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Priming effects and dialect acquisition: rule-based versus exemplar-based accounts. 01/10/2008 - 27/08/2012

Abstract

This is a fundamental research project financed by the Research Foundation - Flanders (FWO). The project was subsidized after selection by the FWO-expert panel.

Researcher(s)

Promoter: Gillis Steven
Fellow: Rys Kathy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Sabbatical leave. 01/10/2008 - 30/09/2009

Abstract

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

FlaReNet: Fostering Language resources Network. 01/09/2008 - 01/09/2011

Abstract

International cooperation and re-creation of a community are the most important drivers for a coherent evolution of the Language Resource (LR) area in the next years. FlaReNet will be a European forum to facilitate interaction among LR stakeholders. Its structure considers that LRs present various dimensions and must be approached from many perspectives: technical, but also organisational, economic, legal, political. The Network also addresses multicultural and multilingual aspects, which are essential when facing access to and use of digital content in today's Europe.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

DUAL-PRO. Dual electric-acoustic speech processor with linguistic assessment tools for deaf individuals with residual low frequency hearing. 01/07/2008 - 30/06/2010

Abstract

To date, individuals with sensori-neural hearing loss may benefit from either acoustic stimulation (classical hearing aids) or electric stimulation (cochlear implants). Classical hearing aids are best suited for moderate and severe hearing losses and cochlear implants for profound losses. Cochlear implants enable profoundly deaf patients to reach high levels of speech intelligibility, but they are inadequate for the perception of music. The reason for this is that implants are conceived to code for the mid and high frequencies of sound ("spectral coding") since speech information is mainly contained in these frequencies. Implants do not perform well in the coding of low frequencies ("temporal coding"). These frequencies contain mainly information related to tonality, musicality, timbre, etc. Hearing aids perform much better in the temporal coding of low frequencies. Since most profoundly deaf persons have profound losses in the mid and high frequencies while they often have residual hearing in the low frequencies, the combination of the spectral coding of a cochlear implant with the temporal coding of a hearing aid seems promising in improving the auditory performance of implant-wearers. In addition, temporal information seems of specific importance for the linguistic development of young children, and it is anticipated that improving low-frequency perception may significantly enhance their linguistic capacities, thus decreasing their handicap and increasing the probability of mainstream integration. Main objectives of the proposed project: (i) to optimise deaf patients' hearing experience by developing a new hearing device which combines both types of stimulation in the same ear; (ii) to develop a test battery for prosody reception, i.e. the perception of language rhythm and melody; and (iii) to use this new prosody test battery as a quality measure for the current generation of cochlear implants and classical hearing aids, as well as for the newly developed hybrid electric-acoustic prototype.

Researcher(s)

Promoter: Coene Martine
Co-promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

NEON: subtitling in Dutch. 01/06/2008 - 31/05/2009

Abstract

In this project, CNTS develops a system for automatic subtitling on the basis of the output of speech recognition. Such a system allows the simplification and shortening of sentences when needed without making them ungrammatical and without losing their essential meaning. As a methodology, a combination of rule-based and statistical techniques was chosen. In the project, we cooperate with, among others, Belgian and Dutch television and the speech recognition research group of the University of Leuven.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

An evaluation corpus for automatic summarization. 01/01/2008 - 30/04/2009

Abstract

In this BOF project we aim to create an evaluation and development corpus for automatic summarization for Dutch. Automatically generated summaries can help to search and present large amounts of information. An important aspect in the development of automatic summarizers is the use of evaluation methods. We will construct an evaluation corpus consisting of 200 texts and at least 5 different summaries per text.

Researcher(s)

Promoter: Hendrickx Iris

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Adolescent chat language in Flanders: the language geography of Flemish (sub)standardization processes. 01/01/2008 - 31/12/2008

Abstract

Research questions: (1) To what extent are morpho-syntactic Brabantic regiolectal features integrated in the chat language of Flemish adolescents? (2) What is the impact of the regional background of the chatters? (3) Do the analyses confirm the hypothesis that Flanders is subject to an autonomous informal standardisation process characterized by an increasing supraregional use of Brabantic regiolect? (4) Does chat language offer new possibilities and challenges for the study of language variation and language change in progress?

Researcher(s)

Promoter: Vandekerckhove Reinhild

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Mind your Syntax. Oral language development and development of Theory of Mind in deaf children with a cochlear implant. 01/12/2007 - 30/11/2010

Abstract

Research in the field of developmental cognitive neuroscience has shown that audition and language exhibit plasticity, i.e. the ability to modify pre-existing neural synaptic connections dedicated to particular cognitive systems, depending on the quantity and quality of the environmental stimuli during a specific developmental stage. However, there is very little consensus in the literature with respect to the precise limits of these windows of opportunity. In this project we will tackle the issue of plasticity of the auditory system and its effect on language and general cognitive development. Two main hypotheses will be tested: (i) the development of sensory, language and higher cognitive systems is triggered by qualitatively and quantitatively sufficient stimuli within a well-determined time window; and (ii) language plays a crucial role in higher cognitive development, more particularly in Theory of Mind development. These hypotheses will be tested on populations of children who have been deprived of sound due to congenital deafness. Comparative cohort studies of oral Dutch deaf children who have received cochlear implants at different ages will enable us to answer the central question of this project, namely whether cochlear implantation early in life leads to better auditory perception, providing the redundancy necessary for incidental language learning and higher cognitive development.

Researcher(s)

Promoter: Gillis Steven
Fellow: Coene Martine

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Publication of the monograph "Vocaalreductie in het Standaardnederlands in Vlaanderen en Nederland" (vowel reduction in Standard Dutch in Flanders and the Netherlands). 09/10/2007 - 31/12/2007

Abstract

Researcher(s)

Promoter: Kloots Hanne

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

What verbs want: an exemplar-based model of human sentence processing. 01/10/2007 - 30/09/2009

Abstract

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Vandekerckhove Bram

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Prize of the Research Council 2007. 01/08/2007 - 31/08/2007

Abstract

Researcher(s)

Promoter: Schauwers Karen

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Text Mining on heterogeneous knowledge bases. An application to optimised discovery of disease relevant genetic variants. 01/07/2007 - 30/06/2011

Abstract

The project proposes a methodology for text mining with heterogeneous information sources and its application to molecular genetics/genomics and knowledge management. State of the art text analysis and graph-based data mining techniques will be extended to make the methodology possible, and the methodology will be applied in a biomedical application (ranking of candidate disease-causing genes) and a knowledge management application (person profiling from www information).

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Del-Favero Jurgen
Co-promoter: De Rijk Peter
Co-promoter: Paredaens Jan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://www.biograph.be/project/project

Project type(s)

Research Project

Computational Techniques for Stylometry for Dutch. 01/01/2007 - 31/12/2010

Abstract

In this project we investigate a methodology for the automatic extraction and analysis of style that we want to apply to both individual authors (authorship attribution, both fiction and non-fiction) and groups of authors (extraction of stylistic characteristics associated with gender and age). This methodology covers several aspects: (1) Automatic linguistic analysis of documents by means of available text analysis tools on the level of morphological structure, part of speech, global syntactic structures and semantic roles (subject, object, temporal, location) for the construction of potentially relevant stylistic characteristics. (2) Unsupervised and supervised learning techniques for selecting characteristics with high information value and constructing a model of authorial style. (3) Evaluation of these models by (a) comparison with stylistic analyses in linguistics and literary science and (b) empirical testing of the predictive power of the models.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Unlocking the teachers' room. Archiving and making available a collection of spoken Standard Dutch, produced by teachers of Dutch. 01/01/2007 - 31/12/2008

Abstract

The aim of this project consists in systematically archiving and making available 200 hours of spoken Standard Dutch, produced by 160 Flemish and Dutch teachers of Dutch. The speech collection concerned is highly valuable. With respect to the composition of the corpus several social and linguistic variables were taken into account. Furthermore, the recordings are of high (stereo) quality. Therefore this corpus can be used for phonetic, phonological as well as for sociolinguistic purposes.

Researcher(s)

Promoter: Kloots Hanne

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The Sawa Corpus - a parallel corpus "English - Kiswahili". 01/01/2007 - 31/12/2008

Abstract

This project aims to develop an aligned parallel corpus for the language pair English - Kiswahili by means of semi-automatic annotation. This alignment not only facilitates research on statistical machine translation, but also enables projection of annotation between the two languages. In this project we investigate how dependency analyses can be projected from a source language (English) onto a target language (Kiswahili).

Researcher(s)

Promoter: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Gravital: parsing and problem-solving in natural language as an engine for generating visual communication and art. 01/01/2007 - 31/12/2008

Abstract

The project addresses the application of natural language parsing and problem solving as tools for the generation of visual communication and art. Within the context of the NodeBox application, we will adapt the MBSP shallow parser to the domain of design and visual communication and help integrate it into the NodeBox application.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The influence of hearing on the early lexical development of deaf children with and without cochlear implants. 01/01/2007 - 31/10/2007

Abstract

In congenitally deaf children with cochlear implants (CI), early language is acquired in two modalities, with both spoken words and signs; deaf children without CI normally acquire their language monolingually, namely by signs. By studying the early lexical acquisition of both groups longitudinally and by comparing the results with those of normally hearing children, this study will answer the question whether children with CI acquire both modalities simultaneously, with one modality influencing the other, or follow two separate developmental paths, one for each modality.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The acquisition of Romanian morphosyntax. 01/01/2007 - 31/10/2007

Abstract

Drawing on the first longitudinal corpus of child Romanian, the research project aims at a systematic analysis of the morphosyntactic development of Romanian monolingual children. Organized around the acquisition of the main functional domains within the clause, the research focusses on the relationship between the features of functional categories and syntactic operations.

Researcher(s)

Promoter: Coene Martine
Fellow: Avram Larisa

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Linguistic description of resource-scarce languages using machine learning techniques. 01/10/2006 - 30/09/2009

Abstract

Linguistically annotated corpora are an important tool in the development of Natural Language Processing (NLP) applications. For commercially interesting languages, these corpora can be used to induce accurate and robust NLP tools to process new data. If no such corpora exist, which is by definition the case for resource-scarce languages, the traditional data-driven algorithms are largely useless. This project investigates the automated linguistic description of minority languages on the basis of alternative classification techniques. The algorithms researched in this project avoid the need for annotated data in the target language by automatically inducing a classification, either on the basis of free text (technique: "unsupervised learning") or by using existing annotated corpora in another language (technique: "knowledge transfer"). The methodology proposed in this project allows for a hitherto largely unexplored systematic comparison and evaluation of these techniques.

Researcher(s)

Promoter: Daelemans Walter
Fellow: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Conceptual viewpoints: Elements of a cognitive account of English tense. 01/10/2006 - 31/03/2008

Abstract

The main objective of this project is to provide an abstract and comprehensive account of English tense, on the basis of cognitive mechanisms which may be independently motivated. The empirical work on tense, aspect, and modal markers in English will serve to inform this account, which aims at a level of explicitness deemed necessary for the purpose of modeling a language's tense system.

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Brisard Frank

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

DAESO - Detecting and exploiting semantic overlap. 01/06/2006 - 31/05/2009

Abstract

The well-known fact that similar information can be expressed in many different ways is one of the major challenges in building robust NLP applications. It is commonly assumed that such applications can be improved with knowledge of how natural language expressions relate to each other, for instance in terms of paraphrases (same semantic content, different wording) or entailments (one expression implied by the other). DAESO investigates the detection of semantic overlap between Dutch sentences and the exploitation of this knowledge in a range of NLP applications. For this purpose, tools will be developed for the automatic alignment and classification of semantic relations (between words, phrases and sentences) for Dutch, as well as for a Dutch text-to-text generation application which fuses related sentences into a single grammatical sentence, which may be a generalization, a specification or a reformulation of the input sentences. To facilitate development and testing of these tools, an annotated monolingual Dutch parallel/comparable corpus of 1M words will be developed, consisting of pairs of texts that express comparable information. The utility of the resources and tools will be demonstrated in the context of three applications: (1) question-answering systems (improved recall, more complete answers), (2) information extraction (improved recall), and (3) summarization (beyond extraction: sentence compression, sentence fusion, anaphora resolution).

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project website

http://daeso.uvt.nl/

Project type(s)

Research Project

Literacy development in bilingual children: Evidence from French-English and French-Dutch Immersion programs. 01/06/2006 - 31/05/2008

Abstract

Researcher(s)

Promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

APRO: Annotation of gender agreement and of the pleonastic versus anaphoric use of pronouns in Dutch. 01/03/2006 - 31/12/2007

Abstract

This project focuses on two problems in Dutch pronominal anaphora resolution: the detection of the pleonastic and anaphoric use of pronouns and the detection of pronouns referring to the linguistic gender of their antecedent. This project aims at the annotation of Dutch text material with regard to these two specific problems. This newly annotated text material will be integrated in an existing system for reference resolution in Dutch.

Researcher(s)

Promoter: Hoste Veronique

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Computational Linguistics and Language and Speech Technology. 01/01/2006 - 31/12/2010

Abstract

CLIF is the Flemish organization for computational linguistics, language technology and speech technology. The goal of the association is to stimulate research cooperation among the groups and the development of tools and resources that individual participating groups cannot develop on their own.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Speech and language acquisition in Dutch speaking children with different degrees of hearing: Hearing children and deaf children with a cochlear implant. 01/01/2006 - 31/12/2009

Abstract

The aim of this project is to investigate segmental, intrasyllabic and intersyllabic co-occurrence patterns in prelexical babbling, and the acquisition of phonological segments and patterns in the early lexical period. Longitudinal data of deaf children with a cochlear implant (implanted in the first/second year of life) will be compared with those of a hearing, age-matched cohort in order to establish whether they develop language in the same sequence and according to the same patterns as hearing children, and whether the delay that older implanted children show in reaching language acquisition milestones still exists for the very early implanted children.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Govaerts Paul

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Acoustic phonetic analysis of the speech of very young children with a cochlear implant. 01/10/2005 - 30/09/2009

Abstract

The aim of this project is to investigate acoustic-phonetic characteristics of the speech of young congenitally deaf children who received a cochlear implant in their first year of life. In particular the acoustic characteristics of their babbling will be investigated in order to detect discrepancies with the babbling of hearing infants. In addition we will analyze spontaneous speech of these children at the age of six, and investigate whether it displays the typical characteristics of "deaf speech", and we will try to relate these characteristics to the infants' vocalizations in their first year of life.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Govaerts Paul

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Variation in the pronunciation of Standard Dutch: schwa epenthesis in Flanders and The Netherlands. 01/10/2005 - 30/09/2008

Abstract

Researcher(s)

Promoter: Gillis Steven
Fellow: Kloots Hanne

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Electropalatographic investigation of articulatory settings in geographically determined language varieties of Dutch. 01/10/2005 - 31/12/2007

Abstract

Researcher(s)

Promoter: Verhoeven Jo

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Coreference Resolution for Extracting Answers. (STEVIN - COREA) 01/05/2005 - 31/10/2007

Abstract

Coreference resolution is a key ingredient for the automatic interpretation of text. It has been studied mainly from a linguistic perspective, with an emphasis on establishing potential antecedents for pronouns. Practical applications, such as Information Extraction (IE), summarization and Question Answering (QA), require accurate identification of coreference relations between noun phrases in general. Computational systems for assigning such relations automatically require the availability of a sufficient amount of annotated data for training and testing. For Dutch, annotated data is scarce and coreference resolution systems are lacking. In COREA, a robust system for assigning such relations automatically will be developed, and we will investigate the effect of making coreference relations explicit on the accuracy of systems for IE and QA.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Phonological segmentation processes in English- and Dutch-speaking kindergartners and beginning readers. The role of language and phonetic factors. 01/05/2005 - 31/12/2006

Abstract

This study examines how English- and Dutch-speaking prereaders segment speech at an unconscious level, more specifically which cohesion patterns they prefer to others. Important variables are language, phonetic characteristics of segments, letter knowledge and literacy. A further aim is to investigate whether individual differences in implicit segmentation processes are also reflected in the children's later early reading development.

Researcher(s)

Promoter: Geudens Astrid

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Morphosyntactic annotation of three Dutch child language databases. 01/05/2005 - 31/12/2006

Abstract

The goal of this project is to add morphosyntactic annotation to three child language databases: the Maarten database, the CLPF database and the CI database. We will add a high-quality morphological coding, we will apply a consistent interpretation and annotation of filler syllables, and we will indicate all base NPs in the databases.

Researcher(s)

Promoter: Taelman Helena

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Exploitation of CGN annotation for portability to new information sources. 01/05/2005 - 31/12/2006

Abstract

Researcher(s)

Promoter: De Pauw Guy

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Lexical and morphosyntactic development in young children with a cochlear implant : A crosslinguistic study of Dutch and Hebrew. 01/01/2005 - 31/12/2008

Abstract

The first aim of this project is to study patterns of productive spoken language acquisition in children who received a CI early in their second year of life. The children's language acquisition will be compared with that of a matched group of normally hearing (NH) children. The second aim of the project is to study language acquisition crosslinguistically: the language acquisition of children acquiring Dutch will be compared with that of children acquiring Hebrew as their native language. In the language-specific part of the project as well as in the crosslinguistic part, we will focus on the following aspects: (i) the study of early lexical and morphosyntactic development of children with a cochlear implant (implantation age: between 1;0 and 1;06); (ii) comparison of CI children with normally hearing children of the same age/level of language acquisition; (iii) comparison of CI children's and NH children's development in two typologically different languages, viz. Dutch and Hebrew, which enables the testing of specific hypotheses concerning the mechanisms of language acquisition.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The purpose and desirability of Dutch-Dutch subtitling of tv programmes in Flanders: an audience-focused investigation. 01/01/2005 - 31/12/2006

Abstract

This research project investigates a new development on Flemish television, i.e. the increasing occurrence of Dutch-Dutch subtitled programmes. It aims to investigate the desirability of this trend with respect to the way in which Flemish viewers experience their linguistic identities, that is, which 'Dutch' or 'Flemish' they consider to be their mother tongue, which variants are readily understood (and which are not), and which are experienced as 'foreign'.

Researcher(s)

Promoter: De Houwer Annick
Co-promoter: Vandekerckhove Reinhild

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The link between implicit segmentation patterns and the development of explicit segmentation, reading, and writing skills. 01/10/2004 - 20/11/2007

Abstract

The longitudinal study examines how prereaders segment speech at an unconscious (implicit) and intentional (explicit) level and investigates whether individual differences in the early, implicit segmentation process are also reflected in the children's later development of explicit segmentation skills, early reading, and writing.

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Geudens Astrid

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Syntactic aspects of the impaired acquisition of determiners. 01/10/2004 - 30/09/2007

Abstract

The project aims to study the developmental pattern of early morphosyntax in three groups of language-impaired children (children with SLI, children with classical hearing aids (HA), and children with a cochlear implant (CI)) and to verify whether the results are related to the clinical characteristics of the children. We focus on one particular aspect of nominal syntax, i.e. the acquisition of determiners in SLI, HI and CI-children compared to a control group of normally developing hearing children. The following research questions will be addressed: (i) in which way does the acquisition of the determiner system in SLI-children differ from that in children with normal language development: is there a temporary or permanent delay in the projection of a syntactic D-level and if so, what is the cause of the delay?; (ii) does the syntactic development of CI-children surpass that of children who use conventional HA (cf. Van den Broek 1998 contra Geers 2003 for speech perception and production)?; (iii) does the syntax of very early implanted CI-children develop at pace with that of a normal hearing control group or are there similarities with other language impairments which typically show grammatical deficits?; (iv) which factors positively influence the acquisition of determiner syntax in CI-children?; (v) from a theory-internal point of view: is neurological maturation responsible for the projection of a D-position in syntax? Is it input-sensitive and therefore positively influenced by an increase in auditory perception?

Researcher(s)

Promoter: Gillis Steven
Fellow: Coene Martine

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

A constructivist analysis of 'fillers' in Dutch child language. 01/10/2004 - 30/09/2007

Abstract

Young children often insert 'fillers' in their first multiword utterances: vocalizations that do not correspond to conventional words. For instance, it is hard to determine the meaning of the syllables [m] and [ə] in utterance (a). Fillers often have the shape of a syllabic nasal or a schwa, as in utterances (a) and (b). But sometimes they consist of several syllables, as in utterance (c). (a) [m] pick [ə] flowers (English-learning boy, age 1;6; from Peters and Menn, 1993); (b) [ə] oiseau [ə] vole (French-learning girl, age 1; from Veneziano and Sinclair, 2000); (c) [lala] open door (English-learning girl, age 1;10; from Feldman and Menn, 2003). Fillers typically occur at positions that are occupied by function morphemes in the adult language (like articles or pronouns). They are instantiations of an important language learning mechanism that has only recently been recognized as such: 'form-driven' learning. 'Form-driven' learning entails that the child first acquires the form, and fully grasps the meaning and function of this form only later on. In other words, the child has discovered sound material at particular positions in the input, but has not yet analyzed the form and the function of this material accurately. Nevertheless, the child tries to integrate these elements in her own utterances. Little by little the child discovers the full distribution, function and shape of what turn out to be function morphemes. This learning mechanism contrasts with function-driven acquisition, as proposed by nativist theories: morphosyntactic acquisition is interpreted as a self-unfolding plan of morphosyntactic functions that need to be filled with lexical material. Until now, fillers in Dutch child language have not yet been studied (except in the limited analysis of Wijnen et al., 1994). The aim of this research project is to investigate the role of fillers in the acquisition of Dutch, and to analyze the mechanism of 'form-driven' learning from a constructivist perspective on language acquisition.

Researcher(s)

Promoter: Gillis Steven
Fellow: Taelman Helena

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Semi-supervised learning of Information Extraction. 01/10/2004 - 31/12/2005

Abstract

Information Extraction (IE) is concerned with extracting relevant data from a collection of structured or semi-structured documents. Current systems are trained using annotated corpora that are expensive and difficult to obtain in real-life applications. Therefore, in this project we want to focus on the development of IE systems using semi-supervised learning, a technique that makes use of a large collection of unannotated, easily available data.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Situational Factors in Producing Inflected wordforms: a Psycholinguistic and Computational Approach. 01/01/2004 - 31/12/2007

Abstract

The production of inflected word forms like plurals or past tenses is traditionally assumed to be a process that relies primarily on morphological, phonological and syntactic characteristics of the base form. Although descriptive grammars also mention metalinguistic factors in this context, they receive no attention in recent influential models of language production such as Steven Pinker's 1999 Words and Rules theory. However, in a recent experiment, we demonstrated that Dutch speakers do rely on metalinguistic information when producing plurals for Dutch pseudowords. Not only do these results undermine Pinker's assumption that Dutch has two default plurals that are applied solely on the basis of phonological information, but they also question whether models that have a rule-based component are essentially capable of capturing metalinguistic information.

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Daelemans Walter
Co-promoter: Martensen Heike

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Database of 14th century non literary Dutch texts. Construction and linguistic exploration. 01/01/2004 - 31/12/2007

Abstract

Researcher(s)

Promoter: Gillis Steven
Co-promoter: De Schutter Georges

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

PASCAL - Pattern Analysis, Statistical Modelling and Computational Learning. 01/12/2003 - 29/02/2008

Abstract

Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL). The objective of this FP6 network of excellence is to build a Europe-wide Distributed Institute which will pioneer principled methods of pattern analysis, statistical modelling and computational learning as core enabling technologies for multimodal interfaces that are capable of natural and seamless interaction with and among human users. The role of CNTS in the network is the application of machine learning techniques to problems in natural language processing.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Time and subjectivity: a cognitive and comparative inquiry into the conceptual status of aspect and tense categories in grammar 01/10/2003 - 30/09/2006

Abstract

This study investigates the relation between categories of tense (and certain manifestations of grammatical aspect) on the one hand, and differences in the degree of 'subjectification' marking their semantic pole on the other. Besides their grammatical status as grounding predications, these categories are subject to additional processes of subjectification, operating on the products of grammaticalization and thus transcending the transformation of a lexical into a grammatical predication. They give rise to the development of nonreferential meanings for items containing distinct elements of temporal reference in their prototypical uses. The present study therefore concentrates on usage types that are removed from the description of objective relations in time, moving towards the expression of subjective concerns. It is anticipated that (clausal) grounding predications demonstrate subtle internal as well as external distinctions in subjectivity, and thus in semantic status. Despite the clear focus on English in the case studies that are proposed, these remarks should be construed as holding universally.

Researcher(s)

Promoter: Sandra Dominiek
Fellow: Brisard Frank

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Semantic activation in reading interlingual homographs. 01/10/2003 - 31/12/2005

Abstract

Dutch-English bilinguals are tested in English experiments to investigate to what extent they suppress their knowledge of Dutch while reading in English. The critical items are Dutch/English homographs (e.g. <step> meaning 'scooter' in Dutch) that are employed in a semantic priming paradigm to test whether their Dutch meaning is activated automatically. The participants are tested on single words and on complete sentences to study the effect of sentence context on lexical processing. Primes are presented visually or auditorily. Control experiments are conducted with English monolinguals.

Researcher(s)

Promoter: Martensen Heike

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Reduction Phenomena in present-day Standard Dutch in Flanders and the Netherlands. 01/10/2003 - 30/09/2005

Abstract

The aim of this project is the study of reduction phenomena in spontaneous (= non-read) Standard Dutch. Reduction is studied in mono-, bi- and trisyllabic words, especially in pronouns, suffixes and loan words. We use speech that has already been collected, digitized and transcribed for the Corpus Gesproken Nederlands (Spoken Dutch Corpus), and as a part of the VNC-project Variation in the pronunciation of Standard Dutch. The VNC-speech consists of interviews with teachers of Dutch. From the Corpus Gesproken Nederlands, three components are selected: speeches, (non-read) lectures and lessons from high school teachers (except for Dutch lessons). These three types of spontaneous speech are fully comparable: it is non-broadcast speech, produced by one speaker before an audience. A more specific aim of this project is to verify the claim that the pronunciation of highly educated speakers without linguistic training differs from the pronunciation of teachers of Dutch, who are often considered to be prototypical speakers of Standard Dutch. This project links up with the renewed interest in standard language, where variation patterns in Standard Dutch in Flanders and the Netherlands are studied from a perspective of convergence and divergence. This study is also in line with international research on variation in standard languages, e.g. in German (e.g. Germany, Austria, Switzerland) and in French (e.g. France, Canada, Belgium).

Researcher(s)

Promoter: Gillis Steven
Co-promoter: De Schutter Georges
Fellow: Kloots Hanne

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The use of very large textcorpora in the automatic discovery of structure in natural language. 01/10/2003 - 28/02/2005

Abstract

Large repositories of language samples exist today. Some examples are the text on the internet, and texts and dictionaries in many languages. However, these corpora are not always used when examining language hypotheses, or fundamental language questions. This gap is in the process of being filled, and this research hopes to be part of this development. The general aim is to arrive at a better use of existing language technologies in order to test specific hypotheses about the structure and function of language and about language change and typology.

Researcher(s)

Promoter: Daelemans Walter
Fellow: De Meulder Fien

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The relevance of an onset-rime structure in implicit and explicit phonological awareness: A cross-linguistic study with English and Dutch speaking preschoolers and beginning readers. 01/03/2003 - 31/12/2005

Abstract

This study examines whether onset and rime are units in the child's developing phonological awareness. The onset-rime hypothesis is widely accepted but mainly based on English research. Recent experiments in Dutch failed to support this hypothesis. To find out whether language differences account for this dissociation, a systematic cross-linguistic comparison will be conducted with English and Dutch preschoolers and first-graders. Tasks tapping into implicit and explicit phonological awareness will be used (e.g., recall task versus segmentation task).

Researcher(s)

Promoter: Martensen Heike
Co-promoter: Daems Frans
Co-promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Are morphological representations in the mental lexicon modality-specific or modality-independent? An approach through masked cross-modal priming. 01/01/2003 - 31/12/2006

Abstract

The purpose of the current project proposal is to build on the existing knowledge from cross-modality effects in written and spoken word processing on the one hand and the priming literature on the other hand. We can take a step forward if we can remove the shortcomings of intra-modal priming. Indeed, in the case of visual-visual priming we cannot really address the issue of cross-modality integration, as the phonological information is activated by the visually presented prime and not by an auditory stimulus presented to the participant. Using a different technique would allow us to better address the integration of information that is originally associated with different modalities (i.e., at stimulus input).

Researcher(s)

Promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Biological Text Mining (BioMinT). 01/01/2003 - 31/03/2006

Abstract

The goal of the BioMinT project is to develop a generic text mining tool that (1) interprets diverse types of query, (2) retrieves relevant documents from the biological literature, (3) extracts the required information, and (4) outputs the result as a database slot filler or as a structured report. The consortium consists of biologists (University of Manchester, Swiss Institute of Bioinformatics) and data/text mining groups (CNTS Antwerp, PharmaDM, Austrian Research Institute for AI, University of Geneva AI Lab).

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Semi-supervised learning of Information Extraction. 01/01/2003 - 30/09/2004

Abstract

Researcher(s)

Promoter: Tjong-Kim-Sang Erik
Co-promoter: Daelemans Walter
Co-promoter: Paredaens Jan

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

FLaVoR : Flexible Large Vocabulary Recognition : Incorporating linguistic knowledge sources through a modular recogniser architecture. 01/10/2002 - 30/09/2006

Abstract

In this project we investigate whether the 'all-in-one' strategy currently used in speech recognizers, in which task-specific, syntactic, and lexical knowledge are fused into a single model based on simple formalisms, can be replaced by a modular architecture in which, apart from acoustic-phonetic and intonational features, generic and domain-specific linguistic information sources can also be used.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Functions of audiovisual prosody. 01/10/2002 - 30/09/2005

Abstract

This research proposal is concerned with a functional approach to verbal and visual prosody in spoken conversations. The problem to be addressed in the project is the combined use of specific auditory cues (such as intonation, tempo, voice quality and pausing) and specific visual cues (such as facial expressions and specific body gestures) for marking different dialogue phenomena. First, we will explore how audiovisual prosody can be exploited to highlight the information status of words. Then, we will investigate how it can be used to signal whether or not the process of information exchange in a dialogue is going well. Next, we will explore how it can support the turn-taking mechanism in spontaneous interactions. Finally, we will see to what extent audiovisual prosody may reflect speakers' emotions and attitudes. The results of these different substudies will be integrated in one coherent, functional model of audiovisual prosody. All the questions will be tackled from the point of view of both the speaker and the listener, and from a crosslinguistic perspective. Insight into functional aspects of audiovisual prosody is relevant from both a theoretical and an applied perspective. First, it is remarkable to observe that this important communicative device is still largely unexplored. Knowledge about how audiovisual prosody works may yield new insights into how people mark important words, deixis, turn-taking, discourse structure, etc. and, more generally, into how languages can differ in the way they signal linguistic and paralinguistic phenomena. Second, there is an increasing interest in computer interfaces that rely on what is termed 'embodied conversational agents', i.e., specific software components that appear to users as animated characters. To make these agents 'believable' and 'communicative', it is important to know in full detail how specific auditory and visual parameters contribute to speech communication.

Researcher(s)

Promoter: Daelemans Walter
Fellow: Swerts Marc

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Children's acquisition of phonotactic and prosodic knowledge: an empirist, inductive alternative for current nativist, deductive approaches. 01/10/2002 - 30/09/2004

Abstract

Optimality Theory (OT) is the central paradigm in current theorizing about phonological acquisition. OT is a deductive model: (a priori) linguistic knowledge is represented in the child's linguistic (grammatical) competence. In this project we explore an empiricist, inductive alternative to this approach. An empiricist, inductive model is defined as a model in which the mental lexicon is central in acquisition. Linguistic knowledge is collected and stored in the lexicon. The contrast between grammatical system and lexicon will be developed according to four core dimensions: (1) rules versus analogy, (2) stages versus lexical diffusion, (3) minimal versus maximal role for input, and (4) competence versus processing. We focus on the acquisition of phonotactic and prosodic knowledge, because these two areas are often presented as examples of deductive acquisition.

Researcher(s)

Promoter: Gillis Steven
Fellow: Taelman Helena

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Multilingual subtitling of multimedia content (MUSA). 01/09/2002 - 28/02/2005

Abstract

MUSA aims at the creation of a multimodal multilingual system that converts audio streams into text transcriptions, translates the transcriptions into other languages and then generates subtitles from these translated transcriptions. MUSA will operate in English, French and Greek. A state-of-the-art Speech Recognition system will be enhanced and improved to meet the project settings. An innovative Machine Translation scenario will be designed that combines a Machine Translation engine with a Translation Memory and a Term Substitution module. The Antwerp group will be involved in sentence condensing for subtitle generation, performed by an automatic analysis of the linguistic structure of the sentence.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Machine learning for data mining and its applications. 01/01/2002 - 31/12/2006

Abstract

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Tonal dialects in Dutch : Structure, Perception and Function. 01/01/2002 - 31/12/2005

Abstract

This project investigates the phonetic and phonological nature of the Limburgian lexical tone distinction, its perceptibility, and its functioning in the interpretation of the information structure. Its aim is threefold. First, a broader database, both phonological and phonetic, will be created by investigating two dialects in the Belgian province of Limburg, to complement existing Dutch data. Second, the variability in the phonetic salience of the tone contrast will be related to each dialect's geographical proximity to nontonal dialect areas, to further our understanding of the nature of dialect contact and the erosion of the tone contrast during phonological change. Third, the extent to which the expression of the tone contrast depends on the expression of focus is to be investigated in two groups of dialects, a northern and a southern group, in order to establish the nature of the interaction between lexical tone distinctions and the possible expression of focus structure.

Researcher(s)

Promoter: Swerts Marc

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

The interaction between phonology and orthography in the process of visual word recognition: does dependency cause unity? 01/01/2002 - 31/12/2005

Abstract

In most languages with an alphabetical writing system, the pronunciation of a word is not simply the sum of the pronunciations of all its letters. There are several cases where the pronunciation of one letter is determined by another letter. Compare, e.g., the Dutch words MOOT-MOET-MORT, in which the pronunciation of the letter O is determined by the following letter. Languages do differ in the extent to which letters depend on each other for pronunciation. This research project aims to establish how such interdependencies between letters with respect to their pronunciation affect the processes in word recognition. The innovative strength of this approach is that it places the dissociation of rime effects in English and Dutch in a broader perspective. If one letter's pronunciation is determined by another letter, how does this affect word recognition? The onset-rime effects are only one specific manifestation of this more general question.

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Martensen Heike

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Semaduct : combining deductive and inductive techniques for lexical semantics. 01/01/2002 - 31/12/2005

Abstract

The goal of the project is to confront and integrate deductive and inductive approaches to computational linguistics in the area of lexical semantics. Subprojects include the combination of supervised and unsupervised machine learning methods for semantic knowledge acquisition and disambiguation, the incorporation of linguistic semantic knowledge in inductive approaches, and the refinement of existing semantic tag sets with machine learning techniques.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

OntoBasis: Extraction of ontologies from text. 01/01/2002 - 31/12/2005

Abstract

The main goal of CNTS for this project is the application and adaptation of shallow parsing technology for (i) extraction of lexons (ontological relations) from unstructured and semi-structured sources, (ii) evaluation of ontologies, and (iii) adaptation of ontologies (e.g. WordNet) to specific domains. A secondary goal is to investigate the use of ontologies to improve text analysis using shallow parsing.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Incremental semantic processing of sentences: how do we arrive at specific interpretations? 01/10/2001 - 30/09/2004

Abstract

The goal of this proposal is to link notions from my own psycholinguistic research in semantic processing with the most recent linguistic theories in generative semantics. Eye-tracking experiments will be conducted that investigate linguistic principles that have been proposed to describe how enriched semantic interpretations are generated. This way, the Underspecification Model that I proposed for the processing of figurative language can be extended and refined. The ultimate aim is to arrive at a more general model of the on-line, incremental semantic processing of written texts.

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Daems Frans
Fellow: Frisson Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Psycholinguistics: Processing and Acquisition Processes of Reading and Spelling 01/01/2001 - 31/12/2005

Abstract

The purpose of this Scientific Research Community is to integrate the Flemish, Dutch, and international expertise in the study of (i) the acquisition of reading and spelling and (ii) the on-line processes in experienced readers and spellers. The central focus is the study of the reading and spelling of words (written word recognition and production), more particularly, the role of phonology and morphology and the importance of the way in which the spelling of the language represents these linguistic dimensions. Concrete goals are: the realisation of joint empirical work by several subteams of the Research Community (experiments, corpus analyses, simulation studies), more particularly within a cross-linguistic perspective, the exchange of expertise in the form of people and tools, and the organisation of workshops and one international conference.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Psycholinguistics: processing and acquisition aspects of reading and spelling. 01/01/2001 - 31/12/2005

Abstract

The purpose of this scientific research network is to integrate the Flemish, Dutch, and international expertise in the study of (i) the acquisition of reading and spelling and (ii) the on-line processes in experienced readers and spellers. The central focus is the study of the reading and spelling of words (written word recognition and production), more particularly, the role of phonology and morphology and the importance of the way in which the spelling of the language represents these linguistic dimensions. Concrete goals are: the realisation of joint empirical work by several sub-teams of the research network (experiments, corpus analyses, simulation studies), more particularly within a cross-linguistic perspective, the exchange of expertise in the form of people and tools, and the organisation of workshops and one international conference.

Researcher(s)

Promoter: Sandra Dominiek
Co-promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Text Analysis and Machine Learning for Prosody. 01/01/2001 - 31/12/2004

Abstract

The aim of the project is to perform empirical investigations to determine whether adequate prosody can be generated on the basis of two methods that have recently shown success in other language processing domains: (a) robust analysis of text by analyses and metrics from information retrieval and information extraction, and (b) advanced machine learning systems and meta learners.

Researcher(s)

Promoter: Daelemans Walter
Co-promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Language acquisition by children with cochlear implants: A longitudinal investigation 01/01/2001 - 31/12/2004

Abstract

In this project we study the auditory development and the speech and language acquisition of congenitally deaf children with a cochlear implant (CI) implanted during their second year of life. Our aim is to systematically investigate the effect of the CI on different aspects of language and speech development:
- the effect of a CI on the auditory level;
- the effect of a CI on the articulatory level (the speech);
- the effect of a CI on language acquisition and communicative development.
In essence, we want to investigate how access to the auditory information evolves and what impact that access to spoken language has on the child's own spontaneous speech and language. The scientific aims of the research proposal are (i) descriptive and (ii) fundamental.
(i) Descriptive: a longitudinal description of the auditory development and the speech, language and communicative development after a CI. On the basis of this description we will be able to provide an answer to the following questions: Does language acquisition after a CI proceed in a qualitatively and/or quantitatively similar fashion as that in normal hearing babies? What is the level of spoken language development in CI-babies, as compared to normal hearing babies? Is there a qualitative and/or quantitative difference in the auditory development and the speech and language development between babies, depending on the age at which they receive a CI?
(ii) Fundamental psycholinguistic aims:
- Study of the perception of segmental and suprasegmental characteristics of speech in relation to its production.
- Study of the phonological development on the segmental and suprasegmental level, focussing on the evolution of truncation patterns.
- Study of the lexical and morphosyntactic acquisition, focussing on the evolution of 'function words' or closed class words with respect to open class words, an opposition related to perceptual salience.
- Study of communicative development, focussing on (1) the use and place of speech versus (conventional) signs, (2) the use of interactional means (attention seeking/fixing/...), (3) the magnitude and use of types of interaction turns by child and adult conversation partner.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

A computational psycholinguistic model of language acquisition. 01/10/2000 - 30/09/2012

Abstract

This project aims at developing a computational psycholinguistic model of children's primary language acquisition. Ultimately the model is meant to provide a computational psycholinguistic account of acquisition in a data-driven way, incorporating the structural aspects of input, the child's 'intake' of the input and the self-organizing mechanisms of the learner. The term 'computational psycholinguistic' is not only meant as a characterization of the type of theory to be developed, it also defines the methodology to be adopted: the acquisition of particular linguistic domains will be studied from a psycholinguistic perspective, viz. the investigation of child language corpora (and experimental testing of hypotheses), and from a computational perspective, viz. the use of artificial learning algorithms in simulations. Both methodologies will be implemented in an integrated fashion so as to maximize mutual informativeness and theoretical relevance. The relationship between the psycholinguistic and the computational perspective is twofold: (i) The articulation of a model of children's language acquisition in which structural aspects of the input language and the self-organizing mechanisms of the learner are related, acts as the unifying framework. (ii) Particular aspects of the acquisition of the phonology, lexicon and morphosyntax of Dutch will be studied both from a psycholinguistic and a computational perspective. Corpora will be used as primary data in psycholinguistic analyses and they will be used as input material for the artificial language learners. The performance of the latter can be evaluated using the actual acquisition patterns of the children studied.

Researcher(s)

Promoter: Gillis Steven
Fellow: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Atranos: automatic transcription and normalisation of speech 01/10/2000 - 30/09/2004

Abstract

The project aims at contributing to the development of better products for the automatic verbatim transcription of speech, and for the conversion of these transcriptions to a form that is better adapted to the needs of the end-user. One application which will be studied as a case study is the generation of subtitles for the benefit of hearing-impaired people. CNTS will investigate learning techniques for the transcription of out-of-vocabulary items, and statistical techniques for aligning and predicting subtitle text from transcriptions.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Scientific research Community for Computational Linguistics and Language and Speech Technology 01/01/2000 - 31/12/2004

Abstract

The goal of this scientific research community (CLIF, Computational Linguistics in Flanders) is to bring together the academic research expertise on language and speech technology for Dutch present in Flanders. CLIF will promote and facilitate fundamental, multidisciplinary, and application-oriented research in this area and provide advice to users of language and speech technology.

Researcher(s)

Promoter: Daelemans Walter

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Corpus Spoken Dutch - Flemish part. 01/06/1998 - 30/11/2003

Abstract

The Dutch-Flemish project 'Corpus Spoken Dutch' aims at collecting 10 million spoken words of present-day (standard) Dutch. This corpus will have important technological applications since it will play an essential role in the development of automatic speech recognition, and in this way it will prove to be invaluable in safeguarding the position of Dutch as a (minority) language in multilingual Europe. The corpus will be important for other disciplines as well: lexicography, teaching, children's speech and language development, sociolinguistics, psycholinguistics, phonetics and phonology and conversational analysis.

Researcher(s)

Promoter: Gillis Steven

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project

Computational psycholinguistics: natural and artificial language acquisition and processing. 01/01/1998 - 31/12/2003

Abstract

The issue of abstract representations in the domains of language acquisition and adult language processing is addressed in this project. Is it possible to learn a subdomain of language without prior linguistic knowledge in this domain? Can one achieve the final learning stage (adult performance) without developing abstract representations? A new methodology will be used to study these questions. The research will explicitly combine the techniques that are used in three separate disciplines: language acquisition research, psycholinguistics, and artificial intelligence. Whereas the former two take the real language learner/user as their object of study, the latter studies the artificial language learner/user. Thus far, artificial learning models have always been used to simulate effects observed in actual language use. Whereas simulation reveals the computational power of the learning system and suggests interesting hypotheses on the real language learner/user, it does not falsify hypotheses generated in, for instance, psycholinguistic work. In our research we want to use artificial language learners/users in a radically different way. Apart from having them simulate effects from real language use, we want to isolate factors that affect the models' behaviour and then study the effects of these same factors in psycholinguistic experiments and in language acquisition data. In case of a different outcome, the effects observed in real language users can then be used to adapt the architecture of the artificial learning model and see whether its performance can eventually be matched to that of the language user. This method of relating the results from acquisition and psycholinguistic research to computational work and vice versa is essentially a heuristic for discovering properties of the representational architecture for language in the real language learner/user.
This basic issue, and the methodology to study it, will be approached in two linguistic domains: phonology and inflectional morphology. In phonology, the linguistic representation of stress patterns, phonotactic restrictions, and syllable structure will be studied. In morphology, irregularity effects in past tense formation in Dutch will be used to study the issue of the single-route versus dual-route architecture (i.e., rules for regular forms, a lexicon for the irregular ones). A study of the factors causing interference errors in the spelling of (highly regular) past tense forms in Dutch (regular forms affecting other regulars) will shed light on the issue.

Researcher(s)

Promoter: Gillis Steven
Co-promoter: Daelemans Walter
Co-promoter: De Schutter Georges
Co-promoter: Sandra Dominiek

Research team(s)

Centre for Computational Linguistics, Psycholinguistics and Sociolinguistics (CLiPS)

Project type(s)

Research Project