Ongoing projects

Bioinformatics Solutions For the Comprehensive Study of the Human Immunopeptidome. 01/01/2025 - 31/12/2028

Abstract

The adaptive immune system detects infected or malignant cells by recognizing peptides bound to major histocompatibility complex (MHC) molecules. This recognition induces an immune response, in which antibodies are produced or the infected or abnormal cells are attacked directly to eliminate the threat. Mass spectrometry-based immunopeptidomics is a key approach to understand the adaptive immune system by identifying and characterizing peptides presented on MHC molecules. However, there is a lack of optimized bioinformatics tools for immunopeptidomics data analysis, resulting in very low spectrum annotation rates and missed insights into the immune system. To overcome this challenge, we will develop a powerful de novo immunopeptide sequencing solution using deep learning to uncover more biological knowledge from immunopeptidomics data. We will apply this tool to study the presence of aberrant peptides, e.g. due to errors in translation or transcriptional splicing, and non-human peptides, originating from pathogens and other organisms, in the human immunopeptidome. These innovations have the potential to unlock new biological and biomedical insights into the adaptive immune system that will catalyze the development of novel immunotherapies and vaccines.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Artificial Intelligence to Uncover Patterns in Mass Spectrometry Data Across Repositories. 01/01/2025 - 31/12/2028

Abstract

The relentless growth of data in the life sciences, notably in small molecule mass spectrometry (MS), presents a unique opportunity for groundbreaking discoveries. This project will introduce powerful artificial intelligence (AI) techniques to transcend traditional analysis paradigms that treat datasets in isolation, integrating fragmented data from large public databases to reveal insights that individual studies alone cannot uncover. At its core, our aim is to innovate by shifting from analyzing individual MS experiments to a comprehensive analysis across large repositories. This paradigm shift will unlock the untapped potential of public MS data, interpreting new observations within the context of the extensive molecular diversity documented in data repositories. To achieve this goal, we will develop AI-driven tools for simulating spectral libraries and incorporating statistical confidence in molecular identification. Additionally, we will employ multimodal representation learning techniques to bridge the gap between spectra and molecules on a repository scale. Standing at the intersection of AI, machine learning, and computational MS, our objective is to provide an integrated analysis of complex molecular data. This will pave the way for transformative advances across various scientific domains in the life sciences, including metabolomics, drug discovery, and environmental sciences, revolutionizing the approach to molecular discovery in the era of big data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

The Live Mouse Tracker (LMT) as a versatile drug screening platform for rare neurological diseases. 01/01/2025 - 31/12/2025

Abstract

Establishing effective therapies for rare neurodevelopmental diseases remains one of the greatest challenges in molecular medicine. Although advances in next-generation sequencing technologies have led to the discovery of hundreds of novel genetic syndromes over the past decade, the development of individualized therapies continues to lag behind. Each rare disorder, while affecting a small group, contributes to a global burden estimated to impact over 300 million individuals. The complexity arises from the fact that these disorders, often caused by mutations in different genes, affect multiple cellular pathways, generating an overwhelming volume of data that must be analyzed to inform therapeutic strategies. Current drug interventions have seen limited success in translating promising preclinical findings into patient-ready treatments. The rapid rise of AI technologies, however, has the potential to transform this landscape. AI-driven algorithms are increasingly capable of navigating vast biomedical datasets, revealing drug candidates for rare diseases at an unprecedented pace. Many start-ups are already capitalizing on this potential, generating a flood of drug candidates for preclinical evaluation. However, this surge in candidate therapies has shifted the bottleneck from drug discovery to preclinical testing. Traditional murine test batteries are labor-intensive, expensive, and time-consuming, necessitating a standardized, scalable, and efficient platform to meet the growing demand for drug screening. We propose the development and commercialization of our Live Mouse Tracker (LMT) platform, a cutting-edge tool designed to address this critical need. The LMT system automates behavioral analysis and can track up to 39 different behaviors in groups of mice over 24-hour periods. This high-throughput capability provides a rapid and comprehensive assessment of drug efficacy in preclinical models.
Our initial validation will focus on fragile X syndrome, a widely studied neurodevelopmental disorder for which no effective treatment currently exists. By evaluating drugs that target multiple affected pathways simultaneously, we aim to pioneer a new approach to rare disease therapy development. During this project, we will validate the robustness of the LMT platform and extend it into a fully integrated service, as well as explore collaboration with other university partners to offer comprehensive preclinical drug testing solutions. This service platform has the potential to revolutionize the drug development pipeline, ensuring that AI-generated candidate drugs can be rapidly and reliably assessed, accelerating the path from bench to bedside. Through this initiative, we aim to bridge the gap between drug discovery and therapeutic application, bringing hope to millions of patients with rare neurological diseases.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Fundamentals of Multi-armed Bandit Algorithms for Recommender Systems. 01/11/2024 - 31/10/2026

Abstract

Recommender systems powered by multi-armed bandit algorithms are becoming increasingly prevalent. However, their evaluation methods often don't accurately reflect the dynamic nature of real-world recommendation scenarios and neglect aspects other than predictive accuracy such as fairness, diversity, and long-term user satisfaction. This hinders the development of algorithms that perform well in practice, affecting various sectors such as news, e-commerce and streaming services. In this project we address these critical issues by creating a comprehensive framework for evaluating bandit algorithms across diverse recommendation scenarios and developing novel algorithms to address current shortcomings. We propose to assess the state-of-the-art bandit algorithms, identify their limitations, and introduce new methods focusing on non-stationarity, slate recommendations, and ethical concerns. The project will leverage real-world datasets and a novel simulation environment to ensure realistic and reproducible results. Our research will provide foundational tools and insights, enabling the development of more effective and responsible recommender systems that improve user engagement and satisfaction.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Deep Learning for Comprehensive Small Molecule Discovery From Untargeted Mass Spectrometry Data. 01/10/2024 - 30/09/2027

Abstract

Although small molecule mass spectrometry (MS) is a vital tool in various life sciences domains, its potential is hindered by the low annotation rate of MS/MS spectra, limiting our ability to uncover critical biological insights. This research project aims to revolutionize small molecule MS by harnessing the power of deep learning and multimodal integration to overcome this challenge. I will develop several complementary deep learning strategies for small molecule identification. First, I will develop a learned spectrum similarity score for the discovery of structurally related analogs. Second, I will use generative AI techniques to simulate comprehensive spectral libraries. Third, I will develop a solution for de novo molecule identification directly from MS/MS spectra, reducing the reliance on spectral libraries and expanding the range of discoverable molecules. Furthermore, I will introduce a holistic approach to MS by integrating three disparate data sources—MS/MS spectra, molecular structures, and natural language descriptions—into a shared latent space using multimodal representation learning. This paradigm shift will allow for direct linking of MS/MS observations to molecular structures and expert knowledge, enabling semantic search and retrieval of molecular information. Moreover, I will employ explainable AI techniques to interpret model decisions and provide insights into MS experimentation patterns.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Pioneering portable pathogen monitoring: LeapSEQ's lean and adaptive sequencing solutions. 01/10/2024 - 30/09/2026

Abstract

The main goal of this VLAIO Innovation Mandate is to create LeapSEQ, an innovative software platform designed for the effective and real-time surveillance of viruses and bacterial antimicrobial resistance (AMR) in wastewater through the use of portable nanopore technology.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining approaches for monitoring and decision support of photobioreactor processes. 01/06/2024 - 31/05/2025

Abstract

This PhD project, a collaboration between the University (Adrem Data Lab) and Proviron NV, is part of the European Training Network initiative "DigitAlgaesation," focusing on the innovative production of microalgae using a proprietary photobioreactor system developed by Proviron. The research aims to develop and deploy computational data-driven decision support solutions to optimize photobioreactor processes through novel data mining and machine learning approaches. In the initial phase, the project successfully established an integrative data mining platform that aggregates heterogeneous monitoring data to predict photobioreactor process outcomes accurately. This comprehensive platform facilitated the development of both supervised regression and classification models. These models are designed to be simple, transferable, and highly accurate in predicting culture evolution from real-time monitoring data. The subsequent phase extended the predictive models to encompass various growth conditions, integrating them as usable features for predictive analysis. This involved optimizing data integration procedures and employing an LSTM model for biomass concentration prediction and growth rate analysis. Having completed the development of the data mining platform and predictive models, the focus of the fourth year will shift towards the deployment and evaluation of these models. Concurrently, efforts will be directed towards developing an economically viable photo spectrometer sensor aimed at monitoring algae growth and predicting biomass with high precision. This innovative sensor will be integrated into the existing data-driven framework to further refine the predictive accuracy of the photobioreactor processes. The student will dedicate efforts to compile and publish scholarly papers detailing the advancements made in machine learning models for algae growth prediction and the development of the novel photo spectrometer sensor.
These publications will contribute to the academic and scientific discourse surrounding algae production technologies and predictive modeling. Finally, the culmination of this research will be the compilation of a thesis that encapsulates the development, challenges, and breakthroughs of this multi-faceted project, showcasing the practical application of machine learning and sensor technology in optimizing microalgae production.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Ethical advisory of the ERC Project BIGDATPOL: towards an evidence-based model for big data policing: Evaluating the statistical-methodological, criminological and legal and ethical conditions. 06/05/2024 - 31/08/2028

Abstract

I am a member of the ethical advisory board (EAB) of the ERC Project BIGDATPOL: towards an evidence-based model for big data policing: Evaluating the statistical-methodological, criminological and legal and ethical conditions. The task of the EAB is to advise the project on ethical issues. My expertise concerns in particular the ethical issues resulting from the use or development of artificial intelligence.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

De novo mass spectrometry peptide sequencing with a transformer large language model. 01/05/2024 - 30/04/2025

Abstract

The primary challenge in proteomics is identifying amino acid sequences from tandem mass spectra, which traditionally has been achieved using sequence database searching. As this method is limited to known protein sequences, de novo peptide sequencing presents an interesting alternative for the discovery of unexpected peptides. Casanovo is a state-of-the-art tool for de novo peptide sequencing, harnessing technologies similar to those underpinning large language models to translate mass spectra into amino acid sequences. The goal of this project is to enhance Casanovo and make it the preferred solution for de novo peptide sequencing. This will be achieved by compiling an extensive training dataset from diverse biological samples and mass spectrometry instruments and scaling up Casanovo's neural network to increase its learning capacity. Additionally, we will create a tailored model for the analysis of immunopeptidomics data by fine-tuning Casanovo's capabilities. Finally, we will develop a user-friendly web interface, making Casanovo accessible to a broad range of researchers and overcoming hardware limitations through cloud computing.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Bioinformatics network for proteomics and mass spectrometry. 01/01/2024 - 31/12/2028

Abstract

Proteomics, the study of proteins and their functions, is a critical area in biology and medicine. With mass spectrometry (MS), researchers can analyze large amounts of proteomics samples, leading to valuable insights into complex biological processes. MS datasets require specialized data analysis techniques, which has led to the development of several powerful bioinformatics tools and pipelines for mass spectrometry-based proteomics. Nevertheless, the increasingly large volume and complex nature of MS-based proteomics data pose significant challenges that hinder progress in the field. To address these, there is a need for an open and collaborative approach to science. We have identified four key challenges that we will address through this Scientific Research Network (SRN):

  • Highly performant bioinformatics tools: As proteomics datasets grow in size, computational bottlenecks arise. Through this SRN, we will foster the development of highly performant and interoperable bioinformatics tools and workflows to process these datasets efficiently, enabling faster and more transparent analyses.
  • Machine learning integration: While machine learning holds great promise for proteomics data analysis, integrating it into practical workflows remains complex. Our SRN will work to bridge this gap, making machine learning techniques more accessible and seamlessly integrated into routine analyses.
  • Effective benchmarking: The diversity of analysis approaches makes it challenging to compare methods effectively. Our objective is to establish standardized benchmarking methods that allow researchers to systematically evaluate and improve their analysis pipelines.
  • Community building and educational resources: Proteomics data analysis requires specialized knowledge that is continuously evolving, making it difficult for young scientists and data science experts to enter the field. Our proposed SRN aims to build a supportive community for early-career researchers and create high-quality educational resources that facilitate the learning curve and provide accessible pathways for newcomers.

With three research units in Flanders that are global leaders in MS-based proteomics, this SRN will make Flanders a focal point in the field of proteomics bioinformatics. Our collaboration with international partners will further enhance the visibility of Flemish research and contribute to a competitive position in the international research landscape, making the region attractive for ambitious and talented young researchers to work in. The six partnering research units have strong ties with the proteomics bioinformatics community within Europe and beyond, which we aim to maximally exploit to achieve our long-term goals. Indeed, instead of tackling these challenges alone, each of the six research units intends to take up a leading role in the wider research community to reach our objectives. Through this SRN, we will formalize the existing connections between the six partners and provide a clear collaborative vision and structure to drive progress and effectively mobilize the wider research community. The scope of our goals underscores the necessity of a community-scale effort. All six partners have taken up central roles in existing initiatives, such as the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI), the ELIXIR Life Science Infrastructure, and the Computational Mass Spectrometry (CompMS) interest group of the International Society for Computational Biology (ISCB), providing the critical mass of researchers required to achieve our goals.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Reference data-driven metabolomics to study the molecular composition of South African foods. 01/01/2024 - 31/12/2026

Abstract

Understanding the molecular composition of food is essential for studying its impact on human health. We have recently developed a new approach called reference data-driven metabolomics, which can perform diet readouts from untargeted metabolomics data. However, this approach currently lacks diverse and geographically representative reference data. To address this, we will expand our reference food molecular database to include indigenous and locally cultivated foods from South Africa, a region with rich cultural and culinary traditions and nutritional diversity, analyze their molecular composition using mass spectrometry, and integrate the data into the Global FoodOmics reference database. Additionally, we will develop user-friendly bioinformatics tools that simplify the data analysis process, making reference data-driven metabolomics accessible to researchers with diverse backgrounds, and study the molecular composition of indigenous South African foods. Through collaboration between South African universities and the University of Antwerp, we will combine expertise in analytical chemistry, bioinformatics, nutrition, and agricultural sciences to advance metabolomics research, expand scientific knowledge of South African diets, and provide evidence-based insights for improving nutrition and health in South African populations.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Interpretable rule-based recommender systems. 01/11/2023 - 31/10/2026

Abstract

Recommender systems help users identify the most relevant items from a huge catalogue. In recent independent evaluation studies of recommender systems, baseline association rule models are competitive with more complex state-of-the-art methods. Moreover, rule-based recommender algorithms have several exciting properties, such as the potential to be interpretable, the ability to identify local patterns and the support of context-aware predictions. First, we survey various existing recommendation algorithms with different biases and prediction strategies and evaluate them independently. Besides accuracy, we evaluate coverage and diversity and analyse the structure of the resulting rule models, which are essential towards understanding interpretability. Second, we propose to bridge the gap between recommender systems and recent multi-label classification based on learning an optimal set of rules w.r.t. a custom loss function. We study if a decision-theoretic framework can guarantee the identification of the optimal rules for recommender systems under a loss function combining accuracy, complexity and diversity. We account for characteristics unique to recommender datasets, such as skewed distribution, implicit feedback and scale. Finally, we adopt new rule-based algorithms that are interpretable and more accurate. We apply them for healthcare recommendations to improve intensive care unit monitoring and online bandit learning for large-scale websites for e-commerce and news.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

RAPTOR: a novel multi-view integrative framework to identify key features of T cell based immunology. 01/11/2023 - 31/10/2025

Abstract

Disentangling the complex reticular components of the immune system represents an obstacle to a deep understanding of the interactions underpinning immune responses. The solution to this challenge likely lies in a systemic design that can illuminate emerging unique and shared biomarkers. However, thus far, the potential of fully integrated immunological data has remained largely untapped. By leveraging an unprecedentedly large cohort for the field, we aim to bridge the gap and build a framework for multi-view biological data fusion with a focus on the often overlooked T cell layer. The views of each cohort will be combined into a latent space. We will group individuals based on emerging new patterns, validate previously published biomarkers, deconvolute group parameters and perform response phenotyping. We will then overlay the T-cell receptor level on this space in an innovative integration to focus on the cell-mediated response. Informed by the discovered features, the T cell analysis will then be driven by epitope and disease specificity and compounded by a longitudinal aspect, to guide the development of the framework's modules. We foresee that this novel framework has great potential for transversal applicability within bioinformatics, biomedical and pharmaceutical companies. Specifically, we anticipate this framework could spark a paradigm shift towards more informed, holistic therapeutic design.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Exploring Unlearning Methods to Ensure the Privacy, Security, and Usability of Recommender Systems. 01/11/2023 - 31/10/2025

Abstract

Machine learning algorithms have proven highly effective in analyzing large amounts of data and identifying complex patterns and relationships. One application of machine learning that has received significant attention in recent years is recommender systems, which are algorithms that analyze user behavior and other data to suggest items or content that a user may be interested in. However, these systems may unintentionally retain sensitive, outdated, or faulty information, posing a risk to user privacy, system security, and usability. In this research proposal, we aim to address this challenge by investigating methods for machine "unlearning", which would allow information to be efficiently "forgotten" or "unlearned" from machine learning models. The main objective of this proposal is to develop the foundation for future machine unlearning methods. We first evaluate current unlearning methods and explore novel adversarial attacks on these methods' verifiability, efficiency, and accuracy to gain new insights and develop the theory of unlearning. Using our gathered insights, we seek to create novel unlearning methods that are verifiable, efficient, and don't lead to unnecessary accuracy degradation. Through this research, we seek to make significant contributions to the theoretical foundations of machine unlearning while also developing unlearning methods that can be applied to real-world problems.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Computational mass spectrometry and artificial intelligence to unravel the immunopeptidome. 01/10/2023 - 30/09/2027

Abstract

The adaptive immune system is a crucial component of the immune response, providing specific defense against a wide range of pathogens and contributing to the development of immunological memory. Immunopeptidomics is a rapidly evolving field that uses mass spectrometry-based approaches to identify and quantify immunopeptides, which play a vital role in the recognition and elimination of infected or malignant cells by T cells. However, the annotation rate of immunopeptides from mass spectrometry data is currently severely limited, resulting in a significant loss of biological information. To overcome this challenge, we will develop specialized bioinformatics tools for analyzing mass spectrometry immunopeptidomics data. Specifically, we will develop an efficient and sensitive open modification search engine to identify immunopeptides that have undergone post-translational modifications. Furthermore, we will develop a deep learning-based de novo peptide sequencing approach optimized for the analysis of immunopeptidomics data. The tools developed in this project have the potential to significantly expand the amount of biological information that can be obtained from immunopeptidomics experiments, leading to transformational breakthroughs in the field.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Enabling mobile and data-driven pathogen monitoring through a paired nanopore squiggle–genome sequence database. 01/05/2023 - 31/12/2024

Abstract

Infectious disease monitoring is a global need, and the threat of existing and emerging pathogens poses a major challenge to public health. Nanopore sequencing is a revolutionary technology that enables portable sequencing and has shown its merit in the COVID-19 pandemic. This technology could enable existing laboratories that have no or limited infectious disease surveillance capacity to 'leapfrog' to sequencing-based pathogen monitoring. However, this potential hinges on the ability to operate in resource-limited settings, which is, to date, hindered by data storage and processing needs. The raw data, referred to as 'squiggles,' requires significant storage space, and decoding it to DNA sequences requires graphical processing units (GPUs) that consume significant amounts of power. In this pandemic preparedness proof-of-concept project, we will build on advances from our IOF-SBO funded project LeapSEQ to remove significant hurdles to enable mobile and data-driven pathogen monitoring. These hurdles include: (1) a need for scalable storage solutions for squiggle data, (2) the lack of available pathogen data, and (3) a need for improved computational solutions for interacting with squiggle data. We will tackle these problems by engineering and populating a proof-of-concept paired nanopore squiggle–genome sequence database using our portable LeapSEQ lab and by developing efficient data-driven algorithms for rapid pathogen monitoring. We will develop this database with strategic partners at ITM and UA and further explore LeapSEQ valorization potential in the context of global pathogen monitoring.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

ELIXIR Infrastructure for Data and Services to strengthen Life-Sciences Research in Flanders. 01/01/2023 - 31/12/2026

Abstract

Life-science is a data science; it relies on the generation, sharing and integrated analysis of vast quantities of digital data. ELIXIR is a European Research Infrastructure that brings together international resources in life-sciences to form a single infrastructure enabling scientists to find and share data, exchange expertise and access advanced tools and large scale computational facilities, across borders and disciplines. The Belgian ELIXIR Node offers a portfolio of services in data management and analysis to help researchers adopt best practices of Open Science and perform their research efficiently. We bring together expertise in Flanders in human health and plant sciences, focusing on federated learning, and enabling data integration and interpretation. A new priority area is the establishment of a sensitive data infrastructure in Belgium. We also provide training for researchers and developers. Our mission is to ensure that researchers in Flanders and Belgium can focus on their research question, rather than on technical details of data, interoperability, compute resources, etc. by providing tailored solutions based on an interoperable infrastructure across Europe.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Artificial intelligence-powered knowledge base of the observed molecular universe. 01/12/2022 - 30/11/2027

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding into the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Bioinformatics and machine learning for large-scale metabolomics data analysis. 01/12/2022 - 30/11/2026

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding into the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Sub‐quadratic graph neural networks: finding a good tradeoff between efficiency and expressivity. 01/10/2022 - 30/09/2026

Abstract

This project situates itself in the area of graph learning, an increasingly popular area in machine learning, and focusses on the development of a theoretical framework for designing and analyzing expressive, yet efficient, graph neural networks. In spite of advances in hardware, when designing graph neural networks one has to take efficiency into consideration. This implies, for example, that most graph neural networks use update functions that require a linear amount of computation. A consequence is that such networks can only learn simple functions. Although more advanced graph neural networks have been proposed, which can learn more complex functions, their applicability is limited. This is due to the fact that quadratic (or more) computation is needed, which is out of reach of large graph datasets. In this project, we aim to understand what graph neural networks can achieve *in-between* this linear and quadratic cost. We propose to formalize, study and analyze sub-quadratic graph neural networks. Such networks are still feasible (less than quadratic) and still powerful (more than what linear networks can achieve). Furthermore, a number of very recent graph neural networks fall into this sub-quadratic category. Apart from developing a mathematical framework for sub-quadratic graph neural networks, we also study their capabilities, both from a theoretical and practical point of view.
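
The gap between linear and quadratic cost described above can be made concrete with a toy example. The sketch below is only an illustration under our own assumptions, not the project's framework: `message_passing_step`, `pairwise_step` and the three-node path graph are hypothetical. The first function does one unit of work per edge; the second keeps one value per ordered pair of nodes, so its cost grows with the square of the node count.

```python
from itertools import product


def message_passing_step(features, edges):
    """Sum-aggregate neighbour features: one unit of work per edge."""
    updated = dict(features)  # every node starts from its own value
    for u, v in edges:  # undirected edge: both endpoints receive a message
        updated[u] += features[v]
        updated[v] += features[u]
    return updated


def pairwise_step(features):
    """Compute one value per ordered node pair: work grows with n squared."""
    nodes = sorted(features)
    return {(u, v): features[u] * features[v] for u, v in product(nodes, repeat=2)}


# Hypothetical toy graph: the path 0 - 1 - 2 with scalar node features.
features = {0: 1.0, 1: 2.0, 2: 3.0}
edges = [(0, 1), (1, 2)]

node_update = message_passing_step(features, edges)  # 3 values, 2 edges visited
pair_update = pairwise_step(features)                # 9 values, one per node pair
```

On sparse graphs the number of edges is far smaller than the number of node pairs, which is why the pairwise variant becomes infeasible for large datasets.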

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Serendipity Engine: towards surprising and interesting urban experiences. 01/10/2022 - 30/09/2026

Abstract

Concerns exist regarding the controlling and restricting nature of today's recommender systems. The trend is towards serving predictable, popular and homogeneous content, which is often referred to as "filter bubbles". In an urban context, this means that people are no longer exposed to the diversity of cities and their inhabitants, which has negative consequences for the open and democratic character of the city. This is a timely issue that needs urgent attention and there is a societal call for a transition towards applications that promote serendipity. However, what is missing today is a clear understanding of the meaning and value of serendipity in urban environments, and how this can be engendered in digital applications. In this project, we will develop such an understanding and identify the potential role of governing organisations in introducing serendipity to urban information systems. Additionally, the project will investigate how developers can design for serendipity. This will be studied on the level of data, algorithms and design. This approach is inspired by the theory of affordances and the findings that (digital) environments can be designed to afford serendipity. The affordances (in terms of data, algorithms and design) will be designed, developed and validated using Living Lab methodologies in three urban pilot scenarios. To support this Living Lab approach, a novel research methodology will be developed to study users' experienced serendipity.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Antwerp core facility for bioinformatics (BIOMINA). 01/01/2022 - 31/12/2026

Abstract

High-throughput bio-analytical instruments generate an immense data flow. Translating these data into interpretable insights about the underlying processes of life and disease is increasingly dependent on bioinformatics techniques. Since 2012, BIOMINA (biomedical informatics network Antwerpen) has brought together bioinformatics expertise, scattered over life science and computer science labs in our university, in an informal network. With this proposal we wish to transform this network, with its expertise and widely used infrastructure, into a BIOMINA core facility that can deliver a professional bioinformatics service. The mission is to 1) build a sustainable support, training, and collaboration model; 2) increase bioinformatics capacity to meet growing demands; and 3) build a strong bioinformatics community. It is proposed by complementary PIs in the field, to translate the available bioinformatics strengths to support biomedical, clinical, biological, and bioengineering labs within the University and external clients in hospitals and industry.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Vector embeddings as database views. 01/01/2022 - 31/12/2025

Abstract

Over the past decade, vector embedding methods have been developed as a means of enabling machine learning over structured data such as graphs or, more generally, relational databases. While the empirical effectiveness of vector embeddings for focused learning tasks and application domains is well-researched, exactly what information of the structured data is encoded in embeddings is less understood. In this project, we postulate that by looking at embeddings through the lens of database research, we can gain more insight in what information embeddings contain. Concretely, we propose to design query languages in which vector embeddings can naturally be expressed. In this setting, questions concerning the kind of information that is encoded in the embedded vectors can naturally be phrased as a query rewriting using views problem, which we will study. Furthermore, by taking into account structural properties of embedding queries, we open the door to a transfer of methods in databases to vector embeddings, and back. In particular, database methods for incremental query evaluation and query sampling can be applied for the efficient learning of embedding parameters, while, conversely, embeddings can be exploited for database indexing.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Precision Medicine Technologies (PreMeT). 01/01/2021 - 31/12/2026

Abstract

Precision medicine is an approach to tailor healthcare individually, on the basis of the genes, lifestyle and environment of an individual. It is based on technologies that allow clinicians to predict more accurately which treatment and prevention strategies for a given disease will work in which group of affected individuals. Key drivers for precision medicine are advances in technology, such as the next generation sequencing technology in genomics, the increasing availability of health data and the growth of data sciences and artificial intelligence. In these domains, 6 strong research teams of the UAntwerpen are now joining forces to translate their research and offer a technology platform for precision medicine (PreMeT) towards industry, hospitals, research institutes and society. The mission of PreMeT is to enable precision medicine through an integrated approach of genomics and big data analysis.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Declarative Languages and Artificial Intelligence. 01/01/2021 - 31/12/2025

Abstract

A network to foster the cooperation between research groups with an interest in the use of declarative methods, promoting international cooperation and stimulating the Flemish groups in maintaining the high quality of their research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

MIMICRY - Modulating Immunity and the Microbiome for effective CRC. 01/01/2021 - 31/12/2024

Abstract

A fundamentally important biological question is how our bodies maintain a critical balance between inflammation and immune tolerance, and how this may be modified or evaded by cancers. The human colon, a tissue where many inflammatory diseases and cancers arise, performs this balancing act basally in the presence of dietary antigens and the normal microbiome. Within this homeostatic state, colorectal polyps and colorectal cancer (CRC) arise and can evade clearance by the immune system despite treatment by immune checkpoint inhibitors. We hypothesize that these pre-malignant lesions subvert the default tolerogenic state of the colon and induce additional immunosuppressive mechanisms. Deciphering the complex interaction between the epithelium, immune system and microbiome requires a talented group of researchers with complementary expertise. The unique composition of this 'MIMICRY' iBOF consortium aims to combine human samples, state-of-the-art immunology, novel tools, and in vivo mouse models to study the multi-factorial aspects of colorectal cancers. These will help develop novel immunotherapeutic strategies.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A framework to deduce the convoluted repertoire and epitope hierarchy of human T cell responses in visceral leishmaniasis: patient meets in silico. 01/11/2020 - 31/10/2025

Abstract

Visceral leishmaniasis (VL) is one of the most severe parasitic infectious diseases with 0.4 million cases annually. There are currently no vaccines for VL, although there is evidence of acquired T cell-mediated immunity and resistance to reinfection. Indeed, VL vaccine development is severely hampered by the absence of a good animal model and the multitude of possible Leishmania antigens that remain uncharacterized because of the low-throughput screening methodologies currently applied. As such, there is a complete lack of insight into epitope reactivity, epitope dominance hierarchy and antigenic variation. In this project, we aim to break this status quo by implementing a patient-centered framework integrated with in silico epitope prediction tools and in vitro immunopeptidomics that can comprehensively deduce and confirm the Leishmania epitope hierarchy in patients. Additionally, we will phenotype and monitor the human Leishmania-specific T-cell response and repertoire during the complete course of infection using single-cell RNAseq, single-cell TCRseq and CITE-seq. These recent, state-of-the-art tools allow unprecedented resolution by providing exhaustive, timely and high-throughput immune profiling. We believe that this framework can be directly applied to develop diagnostic tools and to expedite vaccine development against Leishmania, and can serve as a proof of concept for similar complex eukaryotic pathogens.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Tracing ions in mass spectra to identify small molecules (TractION). 01/11/2017 - 31/12/2024

Abstract

Currently, data analysis and interpretation is the most time-consuming step in structural elucidation of small molecules. It still requires a lot of manual intervention by highly trained MS experts. Moreover, the manual nature of this step makes it vulnerable to human errors. The goal of this project is to reduce the current bottleneck of data interpretation by the evaluation and development of an automatic identification pipeline. This pipeline is based on advanced spectral libraries together with adapted search algorithms and state-of-the-art pattern mining technology.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Past projects

First steps in mapping and characterizing Leishmania-specific T cells in lesions of Ethiopian patients with cutaneous leishmaniasis. 15/02/2024 - 14/11/2024

Abstract

Leishmaniasis is a severe global disease with no licensed vaccine to date. The disease outcome is driven by the complex interplay between the Leishmania parasite and the host immune response. For example, the diverse clinical presentations of cutaneous leishmaniasis (CL) have been characterized by either a protective, anergic or pathogenic T cell response. However, due to the size and complexity of the parasite, it is still unknown how these different T cell responses arise and drive the disease course. T cells are activated when their T cell receptor (TCR) recognizes a Leishmania antigen presented by MHC (aMHC) molecules on antigen-presenting cells (APCs). We hypothesize that diverse Leishmania-specific aMHC-TCR interactions primarily underlie the different T cell and disease phenotypes. This is in line with our recent immunopeptidomics data that demonstrated the presence of diverse Leishmania antigens across different CL presentations. In this jPPP, we aim to trial and showcase our pipeline to identify these Leishmania antigen-specific T cells in lesion biopsies, and use a novel approach to characterize their protective/detrimental function in a spatially resolved manner. The acquired data will significantly increase the feasibility score in upcoming grant applications, in which we aim to elevate and expand the project's scope by generating the first Leishmania T-cell epitope map and determining how it drives the complete disease spectrum of leishmaniasis, eventually providing candidate antigens for vaccine development and diagnostic assays.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

FWO Sabbatical Leave 2023-2024 (Prof. F. Geerts). 01/08/2023 - 31/01/2024

Abstract

In this research project we delve deeper into the connections between graph neural networks and database query languages. Indeed, it has recently been shown that most graph neural network architectures can be viewed as some query in a query language with aggregation. As a consequence, results on the expressive power of these query languages naturally transfer to results on the expressive power of graph neural networks. This bridge between database theory and graph learning opens up many interesting avenues for further research and the transfer of techniques between these two areas. We here highlight two such avenues. The first relates to the question whether recent advances in query processing (in particular worst-case optimal join algorithms) can be leveraged to improve the efficiency of learning graph neural networks. The second relates to extending graph neural networks over other domains than the reals such that they can naturally perform computations over, say, Booleans, semirings or other algebraic structures. This would substantially increase their applicability. Using the connection to database query languages, where such generalised semantics have been studied in depth, we aim to obtain a detailed picture of how algebraic properties of the underlying domain influence the expressive power of graph neural networks. Viewing graph neural networks from such a computational perspective is currently high on the agenda in the context of neural algorithmic reasoning.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

BOF Sabbatical 2023 - F. Geerts 2023 - Deepen the connection between graph learning methods and database theory. 01/08/2023 - 31/01/2024

Abstract

Investigation of the exchange of techniques between database theory and graph learning. The focus will be on the characterization of the expressive power of graph learning in terms of logic-based equivalences.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Identification of novel anti-leukemic T cell receptors for development of cell therapies using patient blood samples and cutting-edge computational modeling. 07/09/2022 - 31/10/2023

Abstract

The aim of this project is to develop a robust workflow and to identify promising T-cell receptors (TCRs) for the development of T cell-based immunotherapies, focusing on the leukemia-associated antigen Wilms' tumor-1 (WT1). For this, a unique collection of blood samples is available from acute myeloid leukemia (AML) patients in the context of our academic clinical trials investigating WT1-loaded dendritic cell (DC) vaccination, a cellular immunotherapy designed to activate WT1-specific T cells. By combining specialized cell sorting techniques with in-house developed bioinformatic tools, single-cell TCR and RNA sequencing will be integrated with cutting-edge computational models to link the specificity and transcriptomic profile of these T cells with patients' clinical responses.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Predicting and modeling of vaccine-induced immune response with immunoinformatics and immunosequencing. 04/07/2022 - 31/12/2022

Abstract

High throughput sequencing allows characterization of the human immune system, but the resulting data cannot simply be translated into clinical insights. We have therefore developed artificial intelligence models that can translate T-cell receptor and gene expression data into useful insights about an individual's immune status. For example, we have developed the first online platform to predict the epitopes of T cells. We have demonstrated the power of this for predicting and modeling vaccination-induced immune responses.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Single cell T-cell receptor and Expression Grouped Ontologies to develop a data-driven tool to identify T-cell immunity groups within a micro-environment and characterize the complex interplay of lymphocyte subtypes contributing to an immune response. 01/07/2022 - 31/12/2023

Abstract

The human immune system, and in turn its response to pathogens, cancer, and other diseases, is governed by a complex interaction between different immune cell types. T cells play a major role in the immune defense as they initiate specific elimination pathways for cancer and infected cells. The rise of single-cell sequencing has opened new doors to study the functional T-cell repertoire by mapping the T-cell subtypes in immune repertoires. However, interpretation of single-cell data is sorely lacking and no methods are available to link T-cell functionalities with their actual pathogenic targets. This project aims to leverage single-cell data richness to visualize key interplays between T-cell subtypes, as well as other immune cells. To this end, the team will develop a user-friendly tool to functionally annotate groups of disease-relevant T cells within single-cell data that are both clinically relevant and directly actionable. Following identification of the key active T cells, integration with single-cell transcriptomics of other immune cell types will support the explanation of their activity. This will aid understanding of the functional immune compartment in different pathologies including cancer, infections and autoimmune disorders.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Elucidating the link between HLA-presentome and different clinical presentations of cutaneous leishmaniasis with Nanopore HLA genotyping. 01/04/2022 - 31/03/2023

Abstract

Infection with Leishmania parasites can lead to a wide spectrum of clinical manifestations. These range from diverse cutaneous presentations to a deadly systemic visceral disease, each being associated with infection by a specific set of Leishmania species. As of yet, knowledge on the host-pathogen interactions underpinning this diverse clinical spectrum is scarce. In our recent work, we have demonstrated a link between HLA genotype and susceptibility to the development of leishmaniasis, leading us to hypothesize that differential T cell antigen presentation shaped by HLA diversity may be a key driver of the development of different clinical presentations of leishmaniasis. In this project, we aim to uncover whether and how differential antigen presentation and HLA diversity are associated with the different cutaneous presentations. We will do so by combining Oxford Nanopore sequencing for HLA genotyping with state-of-the-art in silico antigen presentation predictions. Samples will be derived from Ethiopian patients infected with L. aethiopica, as this species can cause all different forms of cutaneous leishmaniasis.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

LeapSEQ: Lean data processing solutions for adaptive and portable genome sequencing, applied to infectious disease monitoring. 01/10/2021 - 30/09/2023

Abstract

Infectious diseases are becoming an increasing challenge to public health worldwide with urbanization, increased travel, climate change, habitat destruction, and deforestation fuelling local outbreaks and global spread. Metagenomic sequencing provides an attractive solution to identify all genomic material present in a patient sample without prior knowledge of the target. While metagenomic sequencing thus far relied on large, expensive and operationally demanding DNA sequencers reserved for expert labs, the recent introduction of USB-stick sized nanopore sequencing devices offers an attractive portable and affordable solution for metagenomic sequencing in low-cost settings around the world. However, in the context of pathogen detection, this technology still suffers from major roadblocks in data interpretation. In this strategic basic research project, we aim to remove significant roadblocks that stand between nanopore sequencing and its implementation for portable pathogen detection, characterisation and monitoring. These roadblocks include: (1) the reliance on expert bioinformatics skills to convert the sequencer data into interpretable results; (2) the lack of real-time interaction with the ongoing sequencing process; and (3) the selectivity challenge of detecting low-abundance pathogens within highly abundant host DNA. We will tackle these problems by implementing a Lean and Adaptive bioinformatics solution for Portable Sequencing ("LeapSEQ") based on in-house developed data processing techniques. We will optimise and validate this tool with highly relevant infectious disease use cases together with strategic partners of ITM and UA and explore its valorisation potential in the context of global pathogen identification.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Methodological developments to map the myelin-specific T-cell receptor sequence space. 01/04/2021 - 31/03/2022

Abstract

T-cell immunity is a key player in the development and progression of multiple sclerosis (MS). The T-cell receptor (TCR) expressed on the cell surface of T-cells is one of the primary determinants of self versus non-self. The composition of the TCR sequence space targeting MS-involved antigens, such as myelin, is currently poorly characterized. We hypothesize that the TCR space holds invaluable knowledge which may unlock the molecular mechanisms of self-antigen reactivity. To this end, we aim to map the MS antigen-specific TCR space with state-of-the-art immunoinformatics data mining models.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Contextual anomaly detection for complex industrial assets (CONSCIOUS). 01/01/2021 - 30/06/2023

Abstract

CONSCIOUS (Contextual aNomaly deteCtIon for cOmplex indUstrial aSsets) focusses on context-aware anomaly detection in industrial machines and processes. In these complex environments, anomaly detection remains a major challenge caused by the highly dynamic conditions in which these assets operate. The overall objective is to research effective solutions to achieve a more accurate, robust, timely and interpretable anomaly detection in complex, heterogeneous data from industrial assets by accounting for confounding contextual factors. The results will be validated on multiple real-world use cases in different domains. In this project, Sirris will collaborate with Skyline Communications, Duracell Batteries, I-care, Yazzoom, KU Leuven and University of Antwerp.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Qualitative Evaluation of Machine Learning Models. 01/11/2020 - 31/10/2024

Abstract

A common and recently widely accepted problem in the field of machine learning is the black box nature of many algorithms. In practice, machine learning algorithms are typically being viewed in terms of their inputs and outputs, but without any knowledge of their internal workings. Perhaps the most notorious examples in this context are artificial neural networks and deep learning techniques, but they are certainly not the only techniques that suffer from this problem. Matrix factorisation models for recommendation systems, for example, suffer from the same lack of interpretability. Our research focuses on applying and adapting pattern mining techniques to gain meaningful insights into big data algorithms by analyzing them in terms of both their input and output, also allowing us to compare different algorithms and discover the hidden biases that lead to those differences.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Machine learning framework for T-cell receptor repertoire-based viral diagnostics. 01/11/2020 - 31/10/2024

Abstract

Current standards of viral diagnostics rely on in-vitro methods detecting either genome or proteins of a pathogen or host antibodies against pathogenic antigens. As a result, multiple assays are required when a sample is screened for several viruses, making the process time-demanding and cost-ineffective. Moreover, some of the methods fail in the case of acute and latent infections. With this FWO-SB project, I will investigate the potential of T cell receptor (TCR) repertoires to overcome this shortcoming and introduce a new approach for the simultaneous diagnosis of multiple viral infections. To discover the TCR signatures that differ between infected and uninfected individuals, I will search for pathogen-associated patterns in TCR repertoires by applying state-of-the-art immunoinformatics and machine learning methods. The obtained results will be used to build a classification model that utilizes the TCR repertoire to predict whether an individual is virus-positive or virus-negative. The insights from this project will broaden our understanding of pathogen-induced TCR repertoire changes and serve as a foundation for the development of a computational diagnostic framework. This will have a high impact on the broad field of diagnostics as the TCR repertoire is playing an important role in various non-infectious diseases, such as cancer and autoimmune diseases.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Approaching multiple sclerosis from a computational perspective through bioinformatic analysis of the T-cell repertoire. 01/11/2020 - 31/10/2024

Abstract

Recent developments in the field of sequencing technology allow for the characterization of adaptive immune receptor repertoires with unprecedented detail. T-cell receptor (TCR) sequencing holds tremendous promise for understanding the involvement and dynamics of adaptive immune components in autoimmune disorders. As the field is rapidly evolving from pre-processing of TCR-seq data to functional analysis of adaptive immune repertoires, new opportunities emerge for the development of comprehensive approaches for the post-analysis of immune receptor profiles. These approaches can offer comprehensive solutions to address clinical questions in the research on autoimmune disorders. An important example is multiple sclerosis (MS), a neuroinflammatory disease of the central nervous system, for which very little is known about the specific T-cell clones involved in its pathogenesis. By analysing the adaptive immune repertoire of MS patients, we postulate it is possible to uncover key drivers of the MS disease process. The identified T-cell clones will present themselves as highly specific biomarkers and therapeutic targets. This translational research project will lead to novel approaches for the identification of condition-associated T-cell clones, to new monitoring tools to evaluate the efficacy of MS-therapies and to a model to predict the disease course of MS.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Diagnosis through Sorted Immune Repertoires (DiagnoSIR). 20/10/2020 - 19/07/2021

Abstract

Infectious disease laboratory diagnostic testing is still based on targeted test methods (Ag detection, PCR, ELISA, agglutination, ELISPOT, etc.). However, rapid evolutions in sequencing applications might soon dramatically change our diagnostic algorithms. For instance, metagenomic sequencing is an untargeted diagnostic tool for direct detection of (in theory any) infectious pathogen without prior assumptions on the causative agent. However, acute infectious pathogens rapidly disappear from the infected individual (causing diagnostic methods based on direct pathogen detection to fail), leaving behind their immune imprint (primed B and T cells). We here wish to demonstrate that immune repertoire sequencing (a cutting-edge sequencing tool that allows high-throughput mapping of B and T cell receptor variable domains) focused on recently activated immune cells is an indirect untargeted diagnostic tool for acute infectious pathogen detection. This method could therefore be an alternative to current indirect targeted assays (serology and T cell assays). To prove this concept, we will exploit recently collected acute COVID-19 patient samples.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Interpretable Qualitative Evaluation for Online Recommender Systems. 01/10/2020 - 30/09/2021

Abstract

Individuals often rely on recommendations provided by others in making routine, daily decisions. Algorithms, mimicking this behaviour, are vital to the success of e-commerce services. However, a remaining open question is why algorithms make these recommendations. This is problematic given that the most accurate machine learning algorithms are black-box models, and we have a dynamic environment where possibly multiple models are deployed and periodically re-trained. Since any organisation requires human oversight and decision-making, there is a need for insight into user behaviour and interactions with recommendations made by black-box machine learning algorithms. Traditionally, two recommender systems are compared based on a single metric, such as click-through-rate after an A/B test. We will assess the performance of online recommender systems qualitatively by uncovering patterns that are characteristic for the differences in targeted users and items. We propose to adopt interpretable machine learning, where the goal is to produce explanations that can be used to guide processes of human understanding and decisions. We propose to mine interpretable association rules and generate, possibly grouped, counterfactual explanations of why recommender system A performs better (or worse) than recommender system B.
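
As a minimal sketch of what an interpretable association rule looks like in this setting, the example below scores a rule by its support and confidence, the two standard measures. Everything in it is a hypothetical assumption: the four interaction records, the attribute names such as `clicked_A`, and the helper functions are made up for illustration and are not the project's data or method.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical interaction log: one set of attributes per served recommendation.
transactions = [
    {"mobile", "new_user", "clicked_A"},
    {"mobile", "clicked_A"},
    {"desktop", "new_user"},
    {"mobile", "new_user"},
]

# Rule "mobile => clicked_A": how often does it apply, and how often does it hold?
rule_support = support(transactions, {"mobile", "clicked_A"})
rule_confidence = confidence(transactions, {"mobile"}, {"clicked_A"})
```

A rule of this kind reads as a plain statement ("mobile users tend to click recommendations from system A"), which is what makes it usable for human oversight.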

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Unraveling the post-transcriptional regulatory chain in Leishmania by multi-omic integration. 01/01/2020 - 31/12/2022

Abstract

Trypanosomatids are protozoan parasites which have evolved a gene expression system that is remarkably different from other Eukaryotes. Instead of being individually controlled by transcription factors, Trypanosomatid genes are transcribed constitutively in long arrays of tens to hundreds of functionally unrelated genes. In this study, we aim to understand how Trypanosomatids, despite this constitutive transcription system, can generate and regulate the major diversity in transcript and protein levels that is typically observed during their life cycle. Using Leishmania donovani as a model system, we will carry out the first deep characterization of transcript isoforms (using long-read PacBio sequencing) and their degree of translation ('translatome'), during the parasite's life cycle. Using state-of-the-art pattern mining and machine learning approaches we will then identify mRNA sequence and structural patterns that play a role in modulating transcript stability and/or their translation efficiency. Finally, we will generate an integrated, systems biology model of protein production and its post-transcriptional regulation in Leishmania, validated by previously collected multi-omic data. The study will lead to novel insights in the post-transcriptional regulatory chain of Trypanosomatids, which remains poorly understood to this date.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Transferable deep learning for sequence based prediction of molecular interactions. 01/10/2019 - 30/09/2023

Abstract

Machine learning can be used to elucidate the presence or absence of interactions. In particular for life science research, the prediction of molecular interactions that underlie the mechanics of cells, pathogens and the immune system is a problem of great relevance. Here we aim to establish a fundamentally new technology that can predict unknown interaction graphs with models trained on the vast amount of molecular interaction data that is nowadays available thanks to high-throughput experimental techniques. This will be accomplished using a machine learning workflow that can learn the patterns in molecular sequences that underlie interactions. We will tackle this problem in a generalizable way using the latest generation of neural network approaches by establishing a generic encoding for molecular sequences that can be readily translated to various biological problems. This encoding will be fed into an advanced deep neural network to model general molecular interactions, which can then be fine-tuned to highly specific use cases. The features that underlie the successful network will then be translated into novel visualisations to allow interpretation by biologists. We will assess the performance of this framework using both computationally simulated and real-life experimental sequence and interaction data from a diverse range of relevant use cases.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Francqui Chair 2019-2020 Prof. Luc De Raedt (KULeuven). 01/10/2019 - 30/09/2020

Abstract

Proposed by the University, the Francqui Foundation each year awards two Francqui Chairs at the UAntwerp. These are intended to enable the invitation of a professor from another Belgian University or from abroad for a series of ten lessons. The Francqui Foundation pays the fee for these ten lessons directly to the holder of a Francqui Chair.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Biomina infrastructure. 01/07/2019 - 31/12/2019

Abstract

To cope with the enormous amount of data nowadays generated in the life sciences, and the associated need for bioinformatics support, the biomina consortium was established in Antwerpen (biomina = biomedical informatics network Antwerpen), a collaborative network that unites life scientists and data scientists in different faculties around biocomputing. From this initiative, technical and administrative support of bioinformatics initiatives at UA is consolidated.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

iNNOCENS: data driven clinical decision support for improved neonatal care. 01/05/2019 - 30/04/2020

Abstract

Analysis of patient-related vital parameters generated continuously in a neonatal intensive care department offers the opportunity to develop computational models that can predict care-related complications. This project aims to develop a machine learning model that can predict acquired brain injury of prematurity. The model can then be implemented to generate bedside visualizations in the context of a self-learning digital early warning system.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Pattern based recommender systems. 01/04/2019 - 31/03/2023

Abstract

The goal of this project is to develop and study new algorithms for recommender systems that can be used in the Froomle platform. The focus of this research will be more specifically towards new methods that make use of recent developments in pattern mining.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

T Cell Receptor sequence mining platform MinTR. 01/04/2019 - 31/03/2020

Abstract

The T-cell repertoire is a key player in the adaptive immune system and is thus important in infectious disease defense, vaccine development, auto-immune disorders and oncology immunotherapies. T-cell receptor sequencing allows characterization of a full repertoire with a single experiment; however, the data this generates cannot be readily translated into medical action. With artificial intelligence models we can translate T-cell receptor sequencing data into actionable insight into the immune status of an individual.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Elucidating the role of alternative trans-splicing in the mRNA abundance regulation of Leishmania. 01/04/2019 - 30/03/2020

Abstract

Leishmania is a genus of protozoan parasites that cause the disease leishmaniasis in humans and a wide array of vertebrate animals. The parasite exhibits a remarkable gene expression system where genes lack individual RNA polymerase II promoters and are therefore not individually controllable by transcription factors. Instead, genes are transcribed constitutively in long polycistronic units of functionally unrelated genes and co-transcriptionally processed to individual mRNAs per gene during a process called 'trans-splicing'. During trans-splicing, mRNAs receive a fixed 39 nucleotide sequence at their 5' end called 'spliced-leader'. The location where this spliced-leader is added is variable, resulting in different possible transcript lengths for a single gene (alternative trans-splicing). The abundance of mRNA per gene appears to be regulated entirely post-transcriptionally; however, it is currently unclear how this occurs. This project aims to determine the role of alternative trans-splicing in the mRNA abundance regulation of Leishmania. As this process determines the length of the transcript, we hypothesise that it affects the abundance of a transcript by altering its stability and/or included regulatory motifs. For the first time, we will make use of long-read mRNA sequencing (PacBio) to study the changes in transcript repertoires during different life stages of Leishmania donovani. Additionally, we aim to identify the motifs and/or RNA structural patterns which regulate the location and usage frequency of alternative trans-splicing and polyadenylation sites. This will be investigated using state-of-the-art pattern finding and classification approaches.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Unlocking the TCR repertoire for personalized cancer immunotherapies. 01/01/2019 - 07/10/2023

Abstract

Cancer is one of the leading causes of death worldwide. Over the past decades, new therapies have been developed that target the patients' immune system to mount an antitumor response. The efficacy of these immunotherapies has already been demonstrated in various clinical trials. Nevertheless, these therapies show a large variation in their individual responses as some patients respond well to the therapy, while others do not. In this project, we will investigate the differences between the T cell receptor (TCR) repertoires of responders and non-responders as a possible marker for immunotherapy responsiveness. We will apply state-of-the-art data mining methods and newly developed immunoinformatics tools to uncover those features that make a patient a clinical responder or non-responder. This will reveal the underlying mechanism of DC-based vaccine responsiveness. This can potentially accelerate general health care in terms of personalized medicine and will save costs.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Personalised search. 01/01/2019 - 31/12/2022

Abstract

It is our ambition to fundamentally move forward the state of the art in personalised search - with a focus on e-commerce, e-news and video - by studying and developing new personalised search algorithms taking into account both the searched keywords and the full picture of the user's on-site and in-store (offline) behaviour. We will address the following research questions. First, in the context of personalised search, how can we measure and evaluate success? Personalised search is a relatively young research domain and as such there is not yet a standardised framework or benchmark dataset for evaluating performance, as there is in learning-to-rank or recommender systems. It is our goal to develop such a standardised framework and create a benchmark dataset that can be used across experiments. Additionally, given this project's unique position on the border between research and industry, we can not only measure the performance of the algorithms offline, but also online, with Froomle's existing clients. It is our expectation that clients in different industries will have different measures of success, e.g. clients in media may want to keep users engaged, whereas clients in retail might want to shorten the path to a purchase. Hence, we aim to identify these KPIs and lay down a framework for evaluation for each. Concretely, our goal is to do a live test in retail, in video and in news, evaluating the results with the KPIs developed specifically for the corresponding domain. Second, how can personal and search relevance be combined to determine an optimal ranking of items personalised to the individual? In order to provide the user with relevant search results ranked to their personal tastes, one needs to establish a means of combining (at least) two measures of relevance: relevance to the query and relevance to the person. Both measures can again be composites of multiple "features", e.g. pageviews, purchases, etc. for personal relevance and query match-score, authority and recency for search relevance. Here, we aim to identify which features can be relevant in delivering an optimal personalised search experience, e.g. pageviews and recency, but not authority and purchases. Then, we address the problem of combining these scores. This problem is anything but trivial and a static combination of personal and search relevance does not suffice. To solve this problem, we will develop at least one ranking algorithm that can transform multiple inputs into an optimal ranking, personalised to the individual. This requires that we define at least one new learning objective that takes into account this personal aspect of the optimal ranking. Furthermore, we will measure the corresponding performance improvement on at least one live application according to the principles and methodology derived under research question 1. Third, can we build an integrated ranking solution that approaches the problem of personalised search as a problem of optimally inferring the user's intent, rather than a problem of optimally combining the user's query with their historical behaviour? From this follows the final research question. Rather than optimally combining query-based relevance with behaviour-based relevance, can we instead approach search as a recommendation problem, where a search query is merely an extra tool in our tool belt that will help us determine the user's current intent? Our goal is to develop at least one such algorithm and measure the corresponding performance improvement on at least one live application. Developing these new algorithms for personalised search and a framework for evaluation will allow Froomle to add personalised search to their current offering of advanced recommender systems. This will be an important step in bridging the gap between the giants of technology and other, traditionally offline businesses with a focus on e-commerce, e-news and video.
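
As an illustration of the second research question, the following minimal Python sketch shows the kind of static combination of search relevance and personal relevance that the abstract describes as insufficient. The function name, the item identifiers, the scores and the single fixed weight are all hypothetical and chosen only for illustration; this is not Froomle's algorithm.

```python
def rank_items(query_scores, personal_scores, weight=0.5):
    """Rank items by a fixed weighted sum of two relevance scores.

    weight = 1.0 ranks on query relevance only, weight = 0.0 on
    personal relevance only. Items without a personal score get 0.0.
    """
    combined = {
        item: weight * query_scores[item]
        + (1 - weight) * personal_scores.get(item, 0.0)
        for item in query_scores
    }
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical scores in [0, 1] for three items matching one query.
query_scores = {"shoes_red": 0.9, "shoes_blue": 0.8, "socks": 0.3}
personal_scores = {"shoes_red": 0.1, "shoes_blue": 0.9, "socks": 0.6}

ranking = rank_items(query_scores, personal_scores, weight=0.5)
query_only = rank_items(query_scores, personal_scores, weight=1.0)
```

With an equal weighting the user's history moves "shoes_blue" above the best query match, while a weight of 1.0 reproduces the unpersonalised ranking. A single fixed weight applies the same trade-off to every user and query, which is the limitation a learned ranking objective would be meant to address.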


Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A study of the Plasmodium vivax reticulocyte invasion pathways and ligand candidates, with special attention to the promising PvTRAg and PvRBP multigenic families. 01/01/2019 - 31/12/2022

Abstract

Plasmodium vivax is one of the 5 species causing malaria in humans, and the leading cause of malaria outside Africa. A key step in P. vivax infection is the invasion of reticulocytes (young red blood cells) by the parasite. This invasion is made possible through several interactions between host receptors (reticulocyte membrane) and parasite ligands. While these interactions are well studied for P. falciparum, they remain elusive (and are not comparable) in P. vivax, due to the inability to maintain long-term cultures. However, identifying parasite ligands and characterising the pathways used by the parasite to enter reticulocytes is essential for drug and vaccine development, and is the question that lies at the core of this project. In order to achieve P. vivax elimination, a better understanding of the ligands involved in invasion is necessary. We hypothesize that alternate pathways are used by P. vivax to invade reticulocytes, and that the PvTRAg and PvRBP multigenic families contain important invasion ligands. Therefore, we will carry out the first study integrating newly characterized P. vivax invasion phenotypes with transcriptomic and (epi-)genomic data in field isolates. As such, we expect to advance the knowledge on the role and regulation of the PvTRAg and PvRBP families in invasion and to discover new potential ligands. Candidate target ligands will be validated by ex vivo invasion assays, and will finally help us to identify the most suitable drug and vaccine candidates.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Coordination Biomina support. 01/01/2019 - 31/12/2021

Abstract

To cope with the enormous amount of data nowadays generated in the life sciences, and the associated need for bioinformatics support, the biomina consortium was established in Antwerpen (biomina = biomedical informatics network Antwerpen), a collaborative network that unites life scientists and data scientists in different faculties around biocomputing. From this initiative, technical and administrative support of bioinformatics initiatives at UA is consolidated.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A multi-omic approach to characterize gene dosage compensation in Leishmania. 01/10/2018 - 30/09/2021

Abstract

Leishmania is a protozoan parasite with a remarkable tolerance for aneuploidy, while this phenomenon is often deleterious in other organisms. The result of aneuploidy is that all genes of an affected chromosome have an altered gene dosage (i.e. more or fewer copies) compared to the euploid situation. In Leishmania, we have previously shown that the majority of transcripts and proteins follow dosage changes in the same in vitro condition, while for the remaining products dosage compensation occurs by an unknown mechanism. This project investigates (i) whether dosage compensation occurs by alterations of transcript stability, translation efficiency and/or protein stability, driven by specific transcript and protein biomolecular features and (ii) whether dosage compensation regulation is modulated during the life cycle. As such, we will determine the relative contribution of each regulation layer to the overall compensation and establish a conceptual model of dosage compensation in Trypanosomatids. This is the first integrated multi-omic study of dosage compensation in Leishmania, and also in Trypanosomatids in general. The study will lead to novel insights into how this compensation is regulated in aneuploid cells, and investigate whether it has a life-stage specific component. These fundamental mechanisms are still incompletely understood in all eukaryotes, and through this study we believe it is possible to gain insights into potentially hitherto unrevealed regulatory mechanisms in eukaryotes.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Mining multi-omics interaction data to reveal the determinants and evolution of host-pathogen disease susceptibility. 01/10/2018 - 30/09/2020

Abstract

The relationship between pathogens and their host is often complex and their evolutionary arms race intricate. Subclinical infections are a common occurrence; host organisms are infected by a normally disease-inducing pathogen, but no symptoms are displayed. This allows pathogens to establish natural reservoirs of asymptomatic carriers that can aid in their transmission to those hosts that are susceptible to the disease. The goal of this fundamental research project is to gain understanding of the general molecular mechanisms that underlie why some animal species - or even some individuals - remain mostly asymptomatic following infection with specific pathogens, while others progress into symptomatic disease. To this end, a large collection of pathogen-host interaction networks will be established for both symptomatic and asymptomatic hosts. State-of-the-art data mining methods will then be applied to discover rules and patterns in the interaction network that are associated with disease susceptibility. Finally, these patterns will be filtered and validated using integrated multi-level 'omics information derived from both the pathogen and the host species. The results of this project will lead to both novel methodology to tackle previously uncharacterised host-pathogen interactions and deliver fundamental new insights in the biological drivers of disease susceptibility.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Reliable on-the-fly prediction of future events in data streams. 01/04/2018 - 31/03/2019

Abstract

In this project we tackle the problem of pattern mining in data streams. First, we aim to develop algorithms that can discover rich pattern types, such as episodes, in data streams. Second, we plan to examine how we can use the discovered patterns in order to make reliable predictions of future events in the stream.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Establishing a computational classification framework for tumour-specific T-cells. 01/04/2018 - 31/03/2019

Abstract

This project aims to construct a computational framework to predict which T-cells can react to a tumour-associated epitope. Key problems that will be investigated are the optimal feature representation as well as the most performant classification strategy. As a proof-of-concept, we will apply the framework on a unique dataset generated by combining a tetramer assay with single cell sequencing.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Systems biological analysis of niche adaptation in resistant and virulent Salmonella pathogens. 01/01/2018 - 31/12/2021

Abstract

The foodborne pathogen Salmonella poses a significant threat to human health worldwide. This is further complicated by the emergent spread of antibiotic resistant strains. Salmonella serotypes and subtypes can have different niches, from a broad range to a very specific niche, e.g. humans. Such bacteria can become very efficient in infecting humans and will contribute even more to the spread of antibiotic resistance. To combat the emergent spread of multiresistant bacteria, molecular monitoring of bacterial strains that show increased adaptation towards the human host, combined with high resistance and virulence, is vital. While researchers can relatively accurately predict alarming resistant and virulent phenotypes based on whole genome sequencing data, niche adaptation prediction techniques are lagging behind. I will solve these problems by (i) analysing niche adaptation from a broad perspective and (ii) implementing cutting-edge computational technologies to predict niche adaptation in Salmonella. This methodology will be built and tested on a model Salmonella serotype, Salmonella Concord. Salmonella Concord is intrinsically a highly virulent and resistant serotype, and shows geographical restriction (the Horn of Africa). It has been reported in Belgium through adopted children, mainly from Ethiopia. Insights from my research will empower health care innovations, and the predictive model will significantly improve risk assessment of pathogenic bacteria.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Mining and Exploiting Interaction Patterns in Networks. 01/01/2018 - 31/12/2021

Abstract

Most works in network analysis concentrate on static graphs and find patterns such as which are the most influential nodes in the network. Very few existing methods are able to deal with repeated interactions between nodes in a network. The main goal of the project is hence to fill this gap by developing methods to identify patterns in interactions between network nodes. These interaction patterns could characterize information propagation in social networks, or money streams in financial transaction networks. We consider three orthogonal dimensions. The first one is the pattern type. We consider, among others, temporal paths, information cascade trees and cycles. To guide our choice of which patterns to study, we get inspiration from three real-world cases: two interaction networks with payment data, one for which the task is marketing related, and one for default prediction, and one social network with an application in microfinance. The second dimension is how to use the query pattern: exhaustively find all occurrences of the patterns, or as a participation query that finds nodes that participate more often in a pattern of interest. Finally, the third dimension concerns the computational model: offline, one-pass, or streaming. It is important to scale up to large interaction networks. In summary, the novelty of our proposal lies in the combination of streaming techniques, pattern mining, and social network analysis, validated on three real-world cases.
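
To illustrate one of the pattern types mentioned above, the following minimal Python sketch computes temporal paths in an interaction network: a path only counts if its interactions occur in non-decreasing time order. The function name, the (source, target, time) tuple layout and the toy payment data are assumptions made for illustration, timestamps along a path are assumed to be distinct, and this is not the project's own method.

```python
def earliest_arrival(interactions, source, start_time=0):
    """Return the earliest time each node can be reached from `source`.

    `interactions` is an iterable of (sender, receiver, time) tuples.
    Interactions are processed in time order, so the first time a node
    is reached is also its earliest arrival time.
    """
    arrival = {source: start_time}
    for sender, receiver, time in sorted(interactions, key=lambda e: e[2]):
        # The sender must already have been reached before this interaction.
        if sender in arrival and arrival[sender] <= time:
            if receiver not in arrival:
                arrival[receiver] = time
    return arrival


# Hypothetical payment interactions: (payer, payee, timestamp).
payments = [
    ("a", "b", 1),
    ("b", "c", 2),
    ("c", "d", 1),  # happens before money from "a" can reach "c"
    ("a", "d", 5),
]

reached = earliest_arrival(payments, source="a")
```

In the static graph, "d" looks reachable through "c", but the payment from "c" to "d" happens before anything from "a" arrives at "c", so "d" is only reached through the direct interaction at time 5. Once the interactions are sorted, the loop is a single pass over them in time order, which is in the spirit of the one-pass computational model mentioned in the abstract.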

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

An interdisciplinary study on the role of the HLA genes and T-cell diversity as risk factors for herpes zoster. 01/01/2018 - 31/12/2021

Abstract

Chickenpox is a consequence of primary infection with varicella-zoster virus (VZV). Afterwards, VZV remains latent in neural ganglia until symptomatic reactivation called herpes zoster (HZ, shingles). In this project, we will first develop a novel computational framework that will allow us to estimate the probability that a pathogen-derived antigen is adequately recognised by the major histocompatibility complexes (MHC) encoded by HLA genes. Antigen binding by MHC molecules is a necessary step prior to recognition (and further management) of infected cells. Next, we will obtain HLA data from 150 HZ patients and 150 matched controls. This will allow us to estimate whether and which HLA A/B/C genes are enriched or depleted in HZ patients. Our computational framework will allow us to estimate which VZV proteins are most likely of importance in controlling VZV. We will assess whether the HLA data is readily translated into the diversity of the T-cell receptor (TCR) against VZV, and against which of the most important VZV proteins. Finally, we will differentiate blood-derived induced pluripotent stem cells (iPSC) into neuronal cells, infect these neuronal cells with VZV and study whether depletion of VZV-specific T-cells affects VZV proliferation, thereby confirming our earlier obtained HLA-TCR predictions.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Rapid vaccine development through immunoinformatics and immunosequencing. 01/01/2018 - 31/10/2020

Abstract

Vaccines are used to stimulate the immune system in its defense against pathogens and cancer. Vaccine development involves extensive clinical trials that study the changes in antibodies and immune cells in response to the vaccine to determine their efficacy and safety. This is often an extensive and costly process, with a high failure rate. This project aims to develop a computational framework for use within vaccine clinical trials to make the process more efficient, more rapid and more accurate. The basis of this framework is the new immunological and molecular insights that have been gained through the advent of immunosequencing and immune-informatics technologies, and it builds further upon a successful collaboration between immunologists and data scientists.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Bequest Rosa Blanckaert - Robert Oppenheimer Prize 2017. 01/12/2017 - 31/12/2018

Abstract

Dr. Pieter Meysman has been active at the Biodata Mining research lab under the guidance of Prof. Kris Laukens since the 1st of January 2013 as post-doctoral researcher. His research focus is on the application of state-of-the-art computer science techniques in the area of 'data mining' on biomedical data, under the header of bioinformatics. In his first four years at the University of Antwerp, he has published 31 scientific papers, including eight as first author and three as last author.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Foundations of Recommender Systems. 01/10/2017 - 30/09/2021

Abstract

Recommender systems are algorithms that are best known for their applications in e-commerce. Given a specific customer and a large number of products, they automatically find the most relevant products to that specific customer. However, their relevance goes well beyond this. They can also recommend genes responsible for diseases, words relevant to documents, tags relevant to a photo, courses of interest to a student, etc. The existing research on recommender systems is almost fully determined by the datasets that are (publicly) available. Therefore, the following fundamental question remains largely unstudied: "Given two datasets, how can we determine which of the two has the highest quality for generating recommendations?" Furthermore, the cornerstone of recommender systems research is the evaluation of the recommendations that are made by the recommender system. Most existing research relies upon historical datasets for assessing the quality of recommendations. There is however no convincing evidence that the performance of recommendations on historical datasets is a good proxy for their performance in real-life settings. Hence, a second fundamental question also remains largely unstudied: "How does the real-life performance of recommender systems correlate with measures that can be computed on historical data?" By means of this project proposal, we set out to answer these two questions, which are foundational to recommender systems research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Intelligent quality control for mass spectrometry-based proteomics. 01/10/2017 - 31/07/2021

Abstract

As mass spectrometry proteomics has matured over the past few years, a growing emphasis has been placed on quality control (QC), which is becoming a crucial factor to endorse the generated experimental results. Mass spectrometry is a highly complex technique, and because its results can be subject to significant variability, suitable QC is necessary to model the influence of this variability on experimental results. Nevertheless, extensive quality control procedures are currently lacking due to the absence of QC information alongside the experimental data and the high degree of difficulty in interpreting this complex information. For mass spectrometry proteomics to mature, a systematic approach to quality control is essential. To this end, we will first provide the technical infrastructure to generate QC metrics as an integral element of a mass spectrometry experiment. We will develop the qcML standard file format for mass spectrometry QC data and we will establish procedures to include detailed QC data alongside all data submissions to PRIDE, a leading public repository for proteomics data. Second, we will use this newly generated wealth of QC data to develop advanced machine learning techniques to uncover novel knowledge on the performance of a mass spectrometry experiment. This will make it possible to improve the experimental set-up, optimize the spectral acquisition, and increase the confidence in the generated results, massively empowering biological mass spectrometry.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining continuous speech: Modeling infant speech acquisition by extracting building blocks and patterns in spoken language. 01/10/2017 - 30/09/2019

Abstract

Complex use of language, and in particular speech, is one of the defining characteristics of humans, setting us apart from animals. In the last few decades, speech recognition has found many applications and is now, for example, a standard feature on modern smartphones. However, the flexible and powerful learning capacities of human infants have still not been equalled by any machine. Young children find a way to make sense of all the speech they hear and generalize it in a way that the patterns in the speech sounds can be disentangled, understood and repeated. In a separate line of research, the field of machine learning and data mining, algorithms have been developed to discover patterns in data. The information that can be extracted from all the available data has become an important aspect of business, if we look at video recommendation systems or the financial sector. The idea of my research is to develop and study techniques inspired by these data mining algorithms, in order to extract patterns from speech. The inherent difficulties of continuous and noisy speech have to be overcome, as it cannot just be processed in the same way as discrete and exact data. After adapting these methods and applying them to speech, I will use them in the scientific research on the building blocks of speech, evaluating their relevance and validity. Furthermore, using these, I will investigate what aspects of speech children need, and subsequently use, to learn about these building blocks.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Guiding networked societies, linking data science and modelling. 01/01/2017 - 31/12/2021

Abstract

Networks of interconnected autonomous computing entities more and more support our society, interacting and influencing each other in complex and unforeseen ways. Examples are smart grids, intelligent traffic lights, logistics and voluntary peer-to-peer clouds as well as socio-technical systems or more generally the Internet of Things. Understanding the characteristics and dynamics of these systems both at the local and global scale is crucial in order to be able to guide such systems to desirable states. The partners participating in this WOG proposal each study crucial features of such complex systems, or they are experts in related fields that offer complementary techniques to analyze the massive data that is generated by them. Bringing these orthogonal fields of expertise together in a scientific research community, promises to give great opportunity for cross-fertilization and the development of novel analysis and control techniques.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Establishment of Belgian Elixir Node. 01/01/2017 - 31/12/2019

Abstract

Consolidation of VariantDB as a collaborative variant interpretation platform within ELIXIR. Given the rapid implementation of next-generation sequencing in various domains, we believe that one of the major bottlenecks will become the interpretation of the resulting data. We are convinced that a structural solution should support distributed big data storage, coupled to centralized and intelligent querying. Today, due to dispersed data, investigators resort to multiple databases and ad hoc communication with collaborators to assess variant pathogenicity. Considering today's challenges, we aim at providing an integrated platform offering researchers intelligent decision support and seamless collaboration options. First, phenotypic information is coupled to interpretation and ranking of individual variants in the context of a single sample. Second, we integrate the ELIXIR service NGS-Logistics, to enable platform-wide analysis of variant prevalence. Third, we provide automatic selection of similar patients and matched control cohorts from the available samples, to perform valid enrichment analysis. Within ELIXIR, NGS-Logistics already adheres to the philosophy of distributed storage and centralized analysis, on the level of variant calling. By implementing the features proposed above, VariantDB could complement this service at the level of variant interpretation. As a service, it will be an asset in both routine and research applications. Finally, the proposed platform is made available to all institutions, bringing new collaboration opportunities to ELIXIR partners.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Efficient mining for unexpected patterns in complex biological data. 01/10/2016 - 30/09/2020

Abstract

Over the last decade, the life sciences have become increasingly overwhelmed and driven by large amounts of complex data. Thanks to disruptive new technologies, the speed at which the biomolecules (such as DNA, metabolites or proteins) of a living system can be analyzed has for several years been increasing faster than the capacity of computer processors and hard drives. This trend means that "traditional techniques" to analyze and interpret biomolecular data are becoming less suitable in the current era. Indeed, extracting relevant knowledge from these data relies on a range of dedicated "big data" techniques, falling under the terms "data mining" and "machine learning". This project addresses "pattern mining", a specific class of techniques that is very relevant for the life sciences. Pattern mining allows for the discovery of previously unseen, interesting patterns in complex data. Traditionally, frequent pattern mining deals with finding the most frequent sets or "combinations" of items in a dataset. There are, however, major problems with such pattern lists, which we will address in this project. First, these pattern lists are often huge, and no domain expert is typically able to investigate and try to interpret every pattern in a pattern mining result list. Second, many of the patterns in such a list are not interesting for the domain expert, for example because they are trivial. In this project, we develop a generic formal and statistically sound framework to re-define pattern interestingness given the specific life science context. After defining novel pattern interestingness criteria, we will develop efficient algorithms to mine such patterns. The algorithms will be validated on toy datasets and gold standard data.
Finally, we will apply these methods to extract novel knowledge from large-scale microbial gene expression compendia, a huge set of human genome sequences and drug-compound interaction networks, with the goal of generating fundamentally new biological or biomedical insights.
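
As an illustration of the traditional frequent pattern mining mentioned above, the following minimal sketch enumerates all itemsets that occur in at least a given number of transactions, using a level-wise (Apriori-style) search. It is not the project's own method: the function name `frequent_itemsets`, the `min_support` parameter and the toy transactions are assumptions chosen for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs
    in at least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while candidates:
        # Count in how many transactions each candidate is contained.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-candidates and keep only
        # those whose k-subsets are all frequent (support is anti-monotone).
        k += 1
        joined = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


# Toy data (illustrative only): five transactions over items a, b, c.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
patterns = frequent_itemsets(toy, min_support=3)
for itemset, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Even on this tiny dataset every single item and every pair is reported as frequent, which hints at the problem described in the abstract: result lists grow quickly and contain many patterns that are trivial for a domain expert.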

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Mining multi-omics interaction data to reveal the determinants and evolution of host-pathogen disease susceptibility. 01/10/2016 - 30/09/2018

Abstract

The relationship between pathogens and their host is often complex and their evolutionary arms race intricate. Subclinical infections are a common occurrence; host organisms are infected by a normally disease-inducing pathogen, but no symptoms are displayed. This allows pathogens to establish natural reservoirs of asymptomatic carriers that can aid in their transmission to those hosts that are susceptible to the disease. The goal of this fundamental research project is to gain understanding of the general molecular mechanisms that underlie why some animal species - or even some individuals - remain mostly asymptomatic following infection with specific pathogens, while others progress into symptomatic disease. To this end, a large collection of pathogen-host interaction networks will be established for both symptomatic and asymptomatic hosts. State-of-the-art data mining methods will then be applied to discover rules and patterns in the interaction network that are associated with disease susceptibility. Finally, these patterns will be filtered and validated using integrated multi-level 'omics information derived from both the pathogen and the host species. The results of this project will both yield novel methodology to tackle previously uncharacterised host-pathogen interactions and deliver fundamental new insights into the biological drivers of disease susceptibility.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Validation of the ADReM personalization algorithms as the basis for a spin-off. 01/05/2016 - 30/04/2017

Abstract

Personalization technology allows a company to present to every individual customer a personalized selection of relevant products out of a huge catalog. As a result of its research activities in the field of recommender systems, the ADReM research group has gained knowledge and expertise about personalization technology. This project aims to start up a spin-off company to valorize this knowledge and expertise.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Development of immunoinformatics tools for the discovery of T-cell epitope recognition rules. 01/02/2016 - 31/01/2020

Abstract

Herpes viruses are ubiquitous in human society and cause several common diseases, such as cold sores (Herpes simplex) and chickenpox (Varicella). The eight species of herpes viruses known to primarily infect humans are all clinically relevant and of these, five are known to be extremely widespread amongst humans with seroprevalence rates as high as 90%. Not all individuals are equally susceptible to equivalent viral pathogens. After infection, some individuals do not become symptomatic, while others experience a high severity of the disease with serious complications. For example, a relatively benign disease such as chickenpox can become life-threatening in a small set of individuals. These differences in disease susceptibility are likely to be caused in part by the variation in the human immune system, but remain largely unknown to date. A key step in the activation of the adaptive immune system is the presentation of viral epitopes, usually peptides (p), by the major histocompatibility complex (MHC) present on antigen presenting cells (APC) and the recognition of this complex by a T-cell receptor (TCR). There exist many allelic variants of the genes coding for the MHC within the population and each variant has a different propensity to bind immunogenic (viral) peptides. This variability in the MHC alleles is one of the underlying factors that leads to differences in disease susceptibility. Previous research has demonstrated that high-accuracy models can be established for the affinity of the MHC molecules for the presentation of peptides, based on machine learning methods. The resulting affinity prediction models have made it possible to assess the affinity for almost all human MHC alleles for any given peptide. However, the MHC recognition variability is only part of the story, as each individual has a unique repertoire of T-cells with a large diversity of TCR variants.
The variability in TCR epitope recognition is also an important factor in differences between individual immune responses. Unfortunately, few TCR recognition models exist and they are all very limited in scope and accuracy. Therefore, the scope of this project is to develop, evaluate and apply state-of-the-art computational approaches to enable the interpretation of complex MHC-p-TCR interaction data and to elucidate the patterns that govern this system. Within this scope, a key point of interest will be the modelling of the molecular interaction between the MHC complex, encoded by its corresponding HLA allele, the antigen-specific TCR and the peptide antigen itself. Ultimately, this will result in the development of computational tools capable of predicting personalized immune responses to Herpes viruses and the efficacy of vaccine-induced viral protection.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Declarative methods in computer science. 01/01/2016 - 31/12/2020

Abstract

To cope with the need to build increasingly large and complex software systems, there is a growing demand for declarative approaches which abstract away unnecessary details and focus on the functionality of the systems. The network wants to further promote the development of such approaches which emerge from work in databases, functional and logic programming.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Updates and Provenance in Data and Knowledge Bases. 01/01/2016 - 31/12/2019

Abstract

This project is concerned with systems that store, manage, restructure, and provide access to data and knowledge. A classical such system is an enterprise database system. More recent systems, however, may consist of cooperating applications that are distributed over a network. Even the entire World Wide Web is being envisaged more and more as a global data and knowledge base. While a lot of research and development has already been devoted to making data and knowledge bases efficient and accessible, only recently has attention shifted to sharing, exchanging, annotating, updating, and transforming data and knowledge. When this happens, it is important to know what has changed, why it was changed, and how. This new type of data is called provenance data. Current systems can be enriched so that provenance data can be managed in unison with ordinary data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Finding the cause of leishmaniasis relapse after treatment with miltefosine using untargeted proteomics. 26/11/2015 - 31/12/2016

Abstract

The protozoan parasite Leishmania donovani is responsible for the disease visceral leishmaniasis (VL) in the Indian Subcontinent. Each year, an estimated 200 000-400 000 people contract VL, which is almost always fatal if left untreated. Sodium stibogluconate (SSG) has been used for decades for the treatment of leishmaniasis, but is now being replaced by miltefosine (MIL) and amphotericin B due to toxicity and widespread drug resistance. However, recent reports indicate a significant decrease in the efficacy of MIL with 20% of the patients relapsing within 12 months after treatment. Remarkably, and in contrast with SSG resistance, this relapse could not be related to reinfection, drug quality, drug exposure, or drug-resistant parasites, which poses major questions about the cause of this treatment relapse. In a previous study we showed that parasites isolated from MIL relapse patients did have a different phenotype compared to the MIL cure Leishmania donovani. However, it is not clear what the molecular basis of this difference is, if it is causal or not, and if other mechanisms could be involved. Therefore, the goal of this study is to find which molecular features are causing the observed leishmaniasis relapse after MIL treatment and the related increase in infectivity. Untargeted 'omics studies are particularly suited for this task, since in this case, there is no prior knowledge of which mechanisms could be involved. Out of all functional levels (genome, transcriptome, proteome and metabolome) the proteome is the level that translates genomic variety into metabolic and functional changes. Therefore, this study will characterize the proteomic differences between MIL cure and MIL relapse Leishmania isolates in order to find out what is causing this relapse.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Hypermodelling strategies on multi-stream time-series data for operational optimization (HYMOP). 01/10/2015 - 30/09/2019

Abstract

HYMOP aims at tackling a collective challenge put forward by a number of Flemish lead user companies: optimizing the operation and maintenance of a fleet of industrial machines. Realizing innovative modeling and data processing/analysis techniques that are able to cope with large amounts of complex data in real-time will allow these lead users, 12 of which are brought together in our Industrial Advisory Committee, to exploit the huge potential and currently underexplored opportunity enabled by ever more sensorized and connected industrial equipment.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining continuous speech: Modeling infant speech acquisition by extracting building blocks and patterns in spoken language. 01/10/2015 - 30/09/2017

Abstract

Complex use of language, and in particular speech, is one of the defining characteristics of humans, setting us apart from animals. In the last few decades, speech recognition has found many applications and is now, for example, a standard feature on modern smartphones. However, the flexible and powerful learning capacities of human infants have still not been equalled by any machine. Young children find a way to make sense of all the speech they hear and generalize it in a way that the patterns in the speech sounds can be disentangled, understood and repeated. In a separate line of research, the field of machine learning and data mining, algorithms have been developed to discover patterns in data. The information that can be extracted from all the available data has become an important aspect of business, if we look at video recommendation systems or the financial sector. The idea of my research is to develop and study techniques inspired by these data mining algorithms, in order to extract patterns from speech. The inherent difficulties of continuous and noisy speech have to be overcome, as it cannot just be processed in the same way as discrete and exact data. After adapting these methods and applying them to speech, I will use them in the scientific research on the building blocks of speech, evaluating their relevance and validity. Furthermore, using these, I will investigate what aspects of speech children need, and subsequently use, to learn about these building blocks.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Research in the field of pattern mining. 01/10/2015 - 30/09/2016

Abstract

This project represents a research contract awarded by the University of Antwerp. The supervisor provides the University of Antwerp with the research mentioned in the title of the project under the conditions stipulated by the university.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Integration of pattern mining in an industrial environment. 01/07/2015 - 30/06/2016

Abstract

In this project our aim is to further develop MIME, an interactive tool for pattern analysis that has been developed at the University of Antwerp, and to make it suitable for industrial use. This will enable companies to find interesting and actionable patterns in their data much more easily and quickly, and to use them for important operational decision making.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Identification of the HLA-dependent susceptibility to Herpes Zoster in the Belgian population. 01/02/2015 - 31/12/2015

Abstract

Herpes zoster (shingles) is caused by the varicella zoster virus and is responsible for a substantial decrease in the quality of life, especially among the elderly. In this project we intend to identify the HLA alleles that increase or decrease the chance of herpes zoster in the Belgian population. These results will be combined with computational models to uncover the link between viral peptide affinity and disease susceptibility.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Accelerating Inference in Probabilistic Programs and Databases. 01/01/2015 - 31/12/2018

Abstract

The main objective of this project is to develop a unifying framework for accelerating probabilistic querying in both probabilistic databases (PDB) and probabilistic programming (PP). The project is based on the observation that for several of these particular types of queries, algorithms for query answering as well as theoretical insights have often been studied in only one of the two areas, in isolation from the other. Within the intended context of this project, our goal is to generalize and adapt these results for use in the other area and to obtain a more principled understanding of their underlying issues and commonalities.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Interactome of living cells in contact with nanoparticles. 01/05/2014 - 20/08/2014

Abstract

The interdisciplinary PhD project aims to investigate the behavior and fate of engineered nanomaterials in contact with biological fluids and living cells, and consequent biological signalling responses. Emphasis is put on signalling mechanisms in immune relevant cells, underlying possible immunomodulatory effects. The nature of what is adsorbed on the nanoparticle surface is connected to the observed interactions and outcomes on cells. Both the adsorbed biomolecule corona, as well as biomarker expression signatures and pathways will be investigated and combined via a comprehensive -omics analysis and bioinformatics approach. By modulating the nanoparticles' surface properties, e.g. through functionalization, avoidance of particular mechanisms leading to undesired outcome will be realized.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Computational models for big data algorithms. 01/03/2014 - 31/07/2015

Abstract

A central theme in computer science is the design of efficient algorithms. However, recent experiments show that many standard algorithms degrade significantly in the presence of big data. This is particularly true when evaluating classes of queries in the context of databases. Unfortunately, existing theoretical tools for analyzing algorithms cannot tell whether or not an algorithm will be feasible on big data. Indeed, algorithms that are considered to be tractable in the classical sense are not tractable anymore when big data is concerned. This calls for a revisit of classical complexity theoretical notions. The development of a formal foundation and an accompanying computational complexity to study tractability in the context of big data is the main goal of this project.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data fusion and structured input and output Machine Learning techniques for automated clinical coding. 01/01/2014 - 31/12/2017

Abstract

This project will improve the state of the art in automated clinical coding by analyzing heterogeneous data sources and defining them in a semantic structure and by developing novel data fusion and machine learning techniques for structured input and output.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A Scalable, Distributed Infrastructure for Probabilistic Databases. 01/11/2013 - 30/04/2017

Abstract

Probabilistic databases lie at the intersection of databases and probabilistic graphical models. Our past work in this field started at Stanford University more than 6 years ago with the development of the Trio probabilistic database system. Still today, probabilistic databases are an emerging field of research with many interesting and as yet unexplored aspects. With this proposal, we motivate the exploration of a new, distributed and scalable infrastructure for probabilistic databases. Rather than building a full-fledged database engine from scratch, we propose to investigate specifically how existing approaches (including our own prior work) can be adapted to a distributed setting in order to accelerate both the data management and the probabilistic inference via parallel query evaluations for an SQL-like environment. Currently, there exists no distributed probabilistic database system. Machine Learning approaches, on the one hand, have previously investigated distributed probabilistic inference but do not support SQL. Current distributed database engines, on the other hand, do not handle probabilistic inference or any form of uncertain data management. With this project, we aim to fill this gap between Databases and Machine Learning approaches, which so far has not been investigated in the literature. We believe that the proposed topic provides a number of intriguing and challenging aspects for a PhD thesis, both from a theoretical and from a systems-engineering perspective.
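
To make the notion of probabilistic inference over uncertain data more concrete, the sketch below computes the probability that a simple existence query has at least one answer. It assumes a tuple-independent model, in which every row carries its own probability of being present and rows are independent of each other; this model, the function name `prob_exists` and the toy rows are illustrative assumptions and are not taken from the project description or from the Trio system.

```python
from math import prod


def prob_exists(rows, predicate):
    """Probability that at least one row satisfying `predicate` is present.

    `rows` is a list of (record, probability) pairs. Rows are ASSUMED to be
    independent, so P(at least one match) = 1 - product of (1 - p) over matches.
    """
    return 1.0 - prod(1.0 - p for record, p in rows if predicate(record))


# Hypothetical uncertain table: each record is present with the given probability.
people = [
    ({"name": "Ann", "city": "Antwerp"}, 0.5),
    ({"name": "Bob", "city": "Antwerp"}, 0.5),
    ({"name": "Cy", "city": "Ghent"}, 0.9),
]

# Roughly: SELECT EXISTS(SELECT * FROM people WHERE city = 'Antwerp')
p = prob_exists(people, lambda r: r["city"] == "Antwerp")
print(p)
```

Under the independence assumption this particular query is cheap to evaluate; queries with joins generally require more involved inference, which is where acceleration through parallel evaluation becomes relevant.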

Researcher(s)

  • Promoter: Geerts Floris
  • Promoter: Theobald Martin
  • Fellow: Blanco Hernan

Research team(s)

Project type(s)

  • Research Project

Exascience Life Pharma. 01/07/2013 - 31/12/2015

Abstract

This project represents a formal research agreement between UA on the one hand and Janssen on the other. UA provides Janssen with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Fraud Detection using data mining. 17/05/2013 - 30/09/2013

Abstract

Patterns are looked for in fraud data with the use of data mining techniques. Specifically tailored big data mining techniques will be validated on the obtained anonymized transactional data and fraud labels.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

An integrated informatics platform for mass spectrometry-based protein assays (InSPECtor). 01/03/2013 - 28/02/2017

Abstract

This project represents a research agreement between the UA on the one hand and IWT on the other. UA provides IWT with the research results mentioned in the title of the project under the conditions stipulated in this contract.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Querying distributed dynamic data collections. 01/01/2013 - 31/12/2016

Abstract

The aim of this proposal is to study and develop techniques for querying such dynamic distributed data collections. Our approach is based on three pillars: (1) the study of navigational query languages for linked data; (2) the study of distributed computing methods for distributed query evaluation; and, (3) the use of provenance as a mechanism for monitoring alterations to data sources.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Probabilistic data cleaning. 01/01/2013 - 31/12/2016

Abstract

The goal of this project is to study and develop probabilistic data cleaning techniques. Data cleaning refers to the process of detecting and repairing errors, duplicates and anomalies in data. In response to the large amounts of "dirty" data in today's digital society, the data quality problem is enjoying a lot of interest from various disciplines in computer science.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Evolving graph patterns. 01/01/2013 - 31/12/2016

Abstract

The goals of this project are first addressed from a theoretical perspective. Furthermore, techniques are studied using both synthetic and real experimental data. The concept of evolving graph patterns is relevant for a large series of application domains. However, we will particularly validate our approaches with bioinformatics applications, for which the extraction of this new pattern type is highly interesting.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Development of an automated software platform to support and improve the quality of clinical coding. 01/01/2013 - 31/12/2013

Abstract

The goal of this project is to develop algorithms and software to improve the quality of the clinical coding process in hospitals, and to design a valorization plan. The algorithms will automatically identify coding anomalies and suggest codes using state-of-the-art machine learning techniques. We will define a business development plan, attract potential customers, and aim to attract follow-up funding.

Researcher(s)

  • Promoter: Van den Bulcke Tim
  • Co-promoter: Goethals Bart
  • Co-promoter: Luyckx Kim
  • Co-promoter: Luyten Leon
  • Co-promoter: Smets Koen

Research team(s)

Project type(s)

  • Research Project

Verifiable Outlier Mining. 01/10/2012 - 30/09/2015

Abstract

In a nutshell, the aim of this project is to provide easy-to-understand descriptions that assist humans in manual outlier verification. We propose a novel research direction called "verifiable outlier mining" tackling open challenges in the automatic extraction of outlier descriptions. In our example, descriptions could indicate how one patient is deviating from others. Such descriptions will include the relevant attributes (e.g., "age" and "skin humidity" in our example), but also regular objects as witnesses from which the outlier is deviating. To accomplish this, the main topic addressed by this proposal is the statistically founded and scalable selection of attribute combinations highlighting the outlierness of an object.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Instant Interactive Data Exploration. 01/01/2012 - 31/12/2015

Abstract

Today, we can easily store massive amounts of information, but we lack the means to exploratively analyse databases of this scale. That is, currently, there is no technology that allows one to 'wander' around the data and make discoveries by following intuition or simple serendipity. While standard data mining is aimed at finding highly interesting results, it is typically computationally demanding and time consuming, and hence not suited for exploring large databases. To address this problem, we propose to study instant, interactive and adaptive data mining as a new data mining paradigm. Our goal is to study methods that give high-quality (possibly approximate) results instantly, presented understandably, interactively and adaptively, so as to allow the user to steer the method rapidly to the most informative areas in the database.

Researcher(s)

  • Promoter: Goethals Bart
  • Co-promoter: Tatti Nikolaj
  • Co-promoter: Vreeken Jilles

Research team(s)

Project type(s)

  • Research Project

BIOMINA: Pattern for the life sciences. 01/01/2012 - 30/06/2015

Abstract

Biomina (Biomedical Informatics Expertise Centre Antwerpen) is an interdisciplinary research collaboration between UA and UZA. It aims at the development of innovative techniques for the analysis and interpretation of heterogeneous biomedical data. Biomina operates on the integration point of clinical data and 'omics data (genome, proteome, transcriptome, ...). Structuring, integration and analysis of these data is the core activity. As a centralized expertise center and research platform, it enables systems biology and translational systems medicine research.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A principled approach for improving data quality: bridging theory and practice. 01/09/2011 - 31/08/2021

Abstract

The improvement of the quality of data has been recognised as the number one challenge for data management. The need for effective methods to detect errors in data, to identify objects from unreliable data sources, and to repair the errors is evident. Indeed, there is an increasing demand for data quality tools in the current digital society and industries in particular, to add accuracy and value to business processes. To accommodate for those needs, further fundamental research in data quality is required and its practical potential is to be realised. More specifically, building upon previous research, a uniform data quality dependency framework is to be developed to improve data quality in a variety of application domains.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

A principled approach for improving data quality: bridging theory and practice. 01/09/2011 - 31/08/2014

Abstract

The improvement of the quality of data has been recognised as the number one challenge for data management. The need for effective methods to detect errors in data, to identify objects from unreliable data sources, and to repair the errors is evident. Indeed, there is an increasing demand for data quality tools in the current digital society and industries in particular, to add accuracy and value to business processes. To accommodate for those needs, further fundamental research in data quality is required and its practical potential is to be realised. More specifically, building upon previous research, a uniform data quality dependency framework is to be developed to improve data quality in a variety of application domains.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Data mining for privacy in social networks. 01/01/2011 - 31/12/2014

Abstract

This is a fundamental research project financed by the Research Foundation - Flanders (FWO). The project was subsidized after selection by the FWO-expert panel.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Reasoning with Incomplete RDF Databases on the Semantic Web. 01/01/2011 - 31/12/2012

Abstract

In this project we propose to address the problem of representing and reasoning with incomplete information in RDF for the semantic web vision. This is an unexplored but very relevant problem for both the database and semantic web research communities. We intend to divide our research program into three autonomous but connected modules, each with its own outcomes. First, we study the semantics of incomplete RDF databases; second, we study the problem of query answering and its complexity; and third, we propose efficient mechanisms and methods to retrieve answers from incomplete RDF documents. We believe that by addressing these issues we can make a significant contribution towards fulfilling Tim Berners-Lee's semantic web vision. The funding solicited will be primarily used for visiting renowned research centers and/or for inviting distinguished professors to collaborate on our research. Finally, the funding provided by the BOF 'Klein Project' would enable the promoter to strengthen the international appeal and visibility of the Universiteit Antwerpen, and of the ADREM group specifically.

Researcher(s)

  • Promoter: Cortés Calabuig Álvaro

Research team(s)

Project type(s)

  • Research Project

Finding characteristic pattern sets through compression. 01/10/2010 - 30/09/2013

Abstract

I propose to study the foundations of using compression as a means for finding compact, characteristic sets of patterns. Of particular interest is the investigation of the possibility of finding such patterns directly from data, and moreover, studying how recent insights in Minimum Description Length theory and Statistics can enhance the discovery of these patterns, and vice-versa. The ultimate goal of this project is to develop the theory that allows us to mine highly useful patterns directly from any database.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Database Summarization. 01/01/2010 - 31/12/2011

Abstract

In this research we aim to find ways of summarizing a database by using the patterns that occur within it. Employing state-of-the-art data mining techniques, the goal is to retrieve a concise subset of all patterns that characterizes the data as well as possible.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Connecting Prior Knowledge of Data and Significant Pattern Discovery. 01/01/2010 - 31/12/2011

Abstract

Pattern mining is a subfield of data mining in which small, interesting pieces of knowledge are extracted from data. The design and analysis of state-of-the-art pattern mining algorithms are based on worst-case scenarios. However, in practice many datasets have known characteristics that can enable more efficient algorithms and a sharper analysis. The goal of the project is to study how prior knowledge of data can be infused into pattern mining.

Researcher(s)

  • Promoter: Tatti Nikolaj

Research team(s)

Project type(s)

  • Research Project

Compressing data to find its interesting patterns. 01/01/2010 - 31/12/2011

Abstract

We propose to develop and study general techniques for using compression as a means for finding compact, characteristic sets of patterns. Such pattern sets should contain only high quality patterns that are of direct interest to the user and her application. As such, the project constitutes an attempt to answer one of the most important open research questions of the field of pattern mining: how to find the most interesting and useful patterns?

Researcher(s)

  • Promoter: Vreeken Jilles

Research team(s)

Project type(s)

  • Research Project

Theoretical Foundations of Finding Significant Patterns in Data Mining. 01/10/2009 - 30/09/2012

Abstract

The umbrella topic of the research is the theoretical foundations of pattern mining in binary data. The specific directions of the research include:

  • Axiomatic approach for defining a measure of significance of itemsets: our goal is to study whether this tradeoff holds in general. Our hypothesis is that the property required by the APRIORI algorithm poses strong conditions on how and what prior information can be used.

  • Pattern mining for datasets with a specific form: our goal is to study how such specific prior knowledge can be infused into pattern mining, both in defining a significance measure and in deriving efficient algorithms.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building. 01/10/2009 - 30/09/2011

Abstract

In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers and flow meters). However, many characteristics cannot be obtained through online equipment and require time-consuming and computationally expensive analysis. For this reason, models are typically used to predict the results of such an analysis from the process variables; the analysis is then used to confirm the model. Models are sometimes also used to predict the output of online hardware analysers, which may fail due to corrosion or drift from their calibration point.

In this project we address a number of issues related to the construction of models using Support Vector Machines (SVMs). Our interest in SVMs has several reasons:

  • SVMs are well known to handle high-dimensional data without suffering from the curse of dimensionality.
  • The use of kernels enables nonlinear modelling.
  • SVMs can be made insensitive to noise and outliers.
  • The ability of SVMs to identify "unusual" data points makes them useful for detecting outliers and anomalies.

The issues we aim to address in this project are the following.

I. Feature selection and incorporation of prior knowledge. The aim is to investigate whether similar results can be obtained for Support Vector Regression and how well the technique applies to single-class problems.

II. Adaptive model building. Techniques that can handle the adaptivity of the inferential sensor at all levels, and especially when the mathematical model needs to be partially rebuilt, are still in their infancy and are the second topic of this research project.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Finding Characteristic Pattern Sets through Compression. 01/10/2009 - 30/09/2010

Abstract

Most pattern discovery algorithms easily generate very large numbers of patterns, making results impossible to understand and hard to use. In this project, we propose to develop and study general techniques for using compression as a means of finding compact, characteristic sets of patterns. Such pattern sets should contain only high-quality patterns that are of direct interest to the user and her application.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Principles of Pattern Set Mining for structured data. 01/07/2009 - 31/12/2013

Abstract

In this project, we propose to develop and study general techniques for mining sets of patterns directly. Such pattern sets should contain only high-quality patterns that are of direct interest to the user and her application. By developing pattern set mining techniques, we hope to lift pattern mining from the local to the global level, which in turn should contribute to a better understanding of the role of pattern mining techniques in data mining and machine learning.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Intelligent analysis and data-mining of mass spectrometry-based proteome data. 01/07/2009 - 30/06/2013

Abstract

Mass spectrometry (MS) is a powerful analytical technique for elucidating the structure of molecules such as proteins. Until now, a significant fraction of the data coming from MS analysis has remained uninterpretable. This project applies state-of-the-art data mining techniques to a large set of mass spectra in order to find new relevant patterns that may point towards unknown structural modifications.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Machine learning for data mining and its applications. 01/01/2009 - 31/12/2013

Abstract

The research community aims to strengthen and coordinate Flemish research on machine learning for data mining in general, and on important applications such as bioinformatics and text mining in particular. Flemish participants: Computational Modeling Lab (VUB), CNTS (UA), ESAT-SISTA (KU Leuven), DTAI (KU Leuven), ADReM (UA).

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Principles of Pattern Set Mining. 01/01/2009 - 31/12/2012

Abstract

The overall goals of this project are 1) to establish a general computational framework for pattern set mining, 2) to study the computational properties of different types of selection predicates, 3) to develop algorithms and systems for dealing with pattern set mining, 4) to investigate how principles of constraint programming apply to pattern set mining, 5) to evaluate pattern set mining techniques on standard data mining and machine learning tasks, both conceptually and experimentally, and 6) to study representational and application aspects of pattern set mining.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Database cleaning. 01/01/2009 - 29/02/2012

Abstract

The goal of the project is to develop new database techniques to support the cleaning of data, metadata and data transformations. In this context, cleaning is to be understood as the identification and correction of incompleteness, inconsistencies, inaccuracies and errors.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

DB-QueriDO: Study of the use of database techniques in storing and querying distributed Semantic Web data. 01/01/2008 - 31/12/2011

Abstract

The specific research questions the project aims to resolve are: 1. How can we achieve efficient reasoning support in case of distribution? 2. What is an efficient way of modularizing and distributing ontology-based data?

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Database Summarization. 01/01/2008 - 31/12/2009

Abstract

In this research we aim to find ways of summarizing a database by using the patterns that occur within it. Employing state-of-the-art data mining techniques, we aim to retrieve a concise subset of all patterns that characterizes the data as well as possible.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Analysis of high-throughput data by means of support vector machines and kernel-based techniques: feature selection and adaptive model building. 01/10/2007 - 18/01/2010

Abstract

In many real-life applications, information gathered from measurements is essential to ensure the quality of products and to enable control of a production process. These measurements are typically obtained from online hardware analysers (e.g. thermometers and flow meters). However, many characteristics cannot be obtained through online equipment and require time-consuming and computationally expensive analysis. For this reason, models are typically used to predict the results of such an analysis from the process variables; the analysis is then used to confirm the model. Models are sometimes also used to predict the output of online hardware analysers, which may fail due to corrosion or drift from their calibration point.

In this project we address a number of issues related to the construction of models using Support Vector Machines (SVMs). Our interest in SVMs has several reasons:

  • SVMs are well known to handle high-dimensional data without suffering from the curse of dimensionality.
  • The use of kernels enables nonlinear modelling.
  • SVMs can be made insensitive to noise and outliers.
  • The ability of SVMs to identify "unusual" data points makes them useful for detecting outliers and anomalies.

The issues we aim to address in this project are the following.

I. Feature selection and incorporation of prior knowledge. The aim is to investigate whether similar results can be obtained for Support Vector Regression and how well the technique applies to single-class problems.

II. Adaptive model building. Techniques that can handle the adaptivity of the inferential sensor at all levels, and especially when the mathematical model needs to be partially rebuilt, are still in their infancy and are the second topic of this research project.

Researcher(s)

  • Promoter: Goethals Bart
  • Promoter: Verdonk Brigitte
  • Fellow: Smets Koen

Research team(s)

Project type(s)

  • Research Project

Mining Relational Databases. 01/01/2007 - 31/12/2008

Abstract

Finding patterns in arbitrary relational databases remains an interesting problem for which only very few efficient techniques exist. We study a framework in which pairs of queries over the data are used as patterns, and consider the problem of finding interesting associations between them. More specifically, we investigate small subclasses of conjunctive queries that still allow interesting patterns to be found efficiently.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Normalization of XQuery optimization. 01/10/2006 - 31/03/2007

Abstract

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Foundations of inductive databases for data mining. 01/01/2006 - 31/12/2009

Abstract

In this project, we study the realization of an inductive database model. The most important steps in the realization of such a model are: a) a uniform representation of patterns and data; b) a query language for querying the data and the patterns; c) the integration of existing optimization techniques into the physical layer.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Updates for virtual XML views. 01/01/2006 - 31/12/2007

Abstract

The integration of different kinds of data is an important issue in the world of Content Management Systems, since one wishes to query all these (heterogeneous) data in a uniform way. The focus of our research is on integrating relational data and XML by generating XML views for relational databases. Not only querying, but also updating the relational database through these XML views will be possible. However, updating the underlying tables of a relational database through an (XML) view can cause problems. Therefore we will investigate which updates are possible (and which are not), and we will also try to convert the (relational) schema to a schema for the XML view.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

The use of statistical for XML-optimization. 01/01/2006 - 31/12/2007

Abstract

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

XML Pattern Mining. 01/10/2005 - 30/09/2006

Abstract

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

IQ - Inductive queries for mining patterns and models. 01/09/2005 - 31/08/2008

Abstract

Given the present distinct lack of a generally accepted framework for data mining, the quest for such a framework is a major research priority. The most promising approach to this task is taken by inductive databases (IDBs), which contain not only data, but also patterns. Patterns can be either local patterns, such as frequent itemsets, which are of descriptive nature, or global models, such as decision trees, which are of predictive nature. In an IDB, inductive queries can be used to generate (mine), manipulate, and apply patterns. The IDB framework is appealing as a theory for data mining, because it employs declarative queries instead of ad hoc procedural constructs. Declarative queries are often formulated using constraints and inductive querying is closely related to constraint-based data mining. The IDB framework is also appealing for data mining applications, as it supports the process of knowledge discovery in databases (KDD): the results of one (inductive) query can be used as input for another and nontrivial multi-step KDD scenarios can be supported, rather than just single data mining operations.

The state of the art in IDBs is that there exist various effective approaches to constraint-based mining (inductive querying) of local patterns, such as frequent itemsets and sequences, most of which work in isolation. The proposed project aims to significantly advance the state of the art by developing the theory of and practical approaches to inductive querying (constraint-based mining) of global models, as well as approaches to answering complex inductive queries that involve both local patterns and global models. Based on these, showcase applications/IDBs in the area of bioinformatics will be developed, where users will be able to query data about drug activity, gene expression, gene function and protein sequences, as well as frequent patterns (e.g., subsequences in proteins) and predictive models (e.g., for drug activity or gene function).

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Complete and heuristic methods for guaranteeing privacy in data mining. 01/01/2005 - 31/12/2007

Abstract

The aim of data mining is to find useful information, such as trends and patterns, in large databases. These databases often contain confidential or personal information. Therefore, it is important to assess to what degree the application of data mining techniques can harm the privacy of individuals. In this project, we want to develop methods that assess the degree of disclosure of private information by a data mining operation. Since complete methods are likely to have too high a complexity, we will also pay attention to incomplete, heuristic methods.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Semi-supervised learning of Information Extraction. 01/10/2004 - 31/12/2005

Abstract

Information Extraction (IE) is concerned with extracting relevant data from a collection of structured or semi-structured documents. Current systems are trained using annotated corpora that are expensive and difficult to obtain in real-life applications. Therefore, in this project we focus on the development of IE systems using semi-supervised learning, a technique that makes use of a large collection of unannotated, easily available data.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Update and Query Languages for Semistructured Data. 01/01/2004 - 31/12/2007

Abstract

In this project we consider languages for querying and updating semi-structured data, such as XML data. These languages are investigated for their theoretical and practical properties, such as expressive power and suitability for query optimisation. If a database server for semi-structured data has to support such languages efficiently, special techniques in the area of locking and indexing are necessary. Therefore we also investigate to what extent existing techniques can be adapted for this and, where necessary, whether new, more suitable techniques can be developed.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Foundations of new developments in database systems. 01/01/2004 - 31/12/2007

Abstract

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Updates for XML views of relational databases. 01/01/2004 - 31/12/2005

Abstract

The integration of different kinds of data is an important issue in the world of Content Management Systems, since one wishes to query all these (heterogeneous) data in a uniform way. The focus of our research is on integrating relational data and XML by generating XML views for relational databases. Not only querying, but also updating the relational database through these XML views will be possible. However, updating the underlying tables of a relational database through an (XML) view can cause problems. Therefore we will investigate which updates are possible (and which are not), and we will also try to convert the (relational) schema to a schema for the XML view.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

The use of statistical for XML-optimization. 01/01/2004 - 31/12/2005

Abstract

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Foundations of Databases for Bioinformatics. 01/01/2004 - 31/12/2005

Abstract

Large sums have been invested in gathering information about genomes, genes, proteins and other molecular characteristics of various organisms. One strongly hopes that it will be useful in describing the way cells function, explaining phylogenetic relations between various species, and designing new pharmaceuticals and therapies to cure presently incurable diseases. This precious data is, however, stored in databases which do not follow any widely accepted design principles, do not offer any standardized query languages and, last but not least, do not give any chance of interoperation. The truth is that there are no widely accepted design principles and no standardized query languages for the databases of bioinformatics. Moreover, databases often do not allow querying all the data they store. A paradigmatic example is the COG database http://www.ncbi.nlm.nih.gov/COG/. COGs are Clusters of Orthologous Genes, where each of the clusters is a set of sequences of homologous proteins from currently 73 different organisms. For each of the over 3300 COGs a phylogenetic tree has been reconstructed, based on similarity analyses of the proteins in that cluster. The trees contain a vast amount of derived information, which has been determined once and stored in the database. However, surprisingly enough, the database of COGs does not include any mechanism allowing one to pose queries referring to the information stored in the trees, e.g., "find pairs of organisms whose proteins are siblings in at least two trees and are at least three tree branches apart in at least two others". It seems self-evident that the family of trees over all COGs should have the status of a materialized view, as often found in classical databases, and that the user should be allowed to pose queries referring to those trees. The same situation is found in many other databases.
It seems therefore unquestionable that a remedy is strongly needed to make all that is really known in molecular biology fully accessible.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Database support for interactive data mining. 01/10/2003 - 30/09/2006

Abstract

This project aims at a systematic study of the possibilities and problems for a database system for data mining. The development of a database system for data mining brings up a lot of fundamental questions. How will we represent the data? In which way can we integrate the data mining algorithms in query languages? How can we optimize the queries? A theoretical and fundamental approach to these questions is the central theme in this project.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Development of ad hoc software and hardware for the mentally retarded. 01/01/2003 - 31/12/2004

Abstract

In the international arena increased efforts are being made to give people with mental retardation access to modern technologies. The three centers involved have been collaborating since 1990 with regard to the design and adaptation of software and hardware for this target group. This fieldwork requires scientific support: a database of available products will be developed and the centers' extensive expertise will allow them to fill major gaps on the supply side.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Integration of new database models and techniques. 01/01/1998 - 31/12/2003

Abstract

A recent development in database research, the shift of focus from general-purpose database models to special-purpose models which capture more of the semantics of the data, has revealed the existence of a number of concepts common to these new models. The aim of this project is to identify a general and theoretical framework for further research in modern database applications, by integrating recent findings in three of these new applications: spatial databases, text-based databases and OLAP systems.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project

Declarative methods in computer science. 01/01/1996 - 31/12/2015

Abstract

To cope with the need to build increasingly large and complex software systems, there is a growing demand for declarative approaches which abstract away unnecessary details and focus on the functionality of the systems. The network wants to further promote the development of such approaches which emerge from work in databases, functional and logic programming.

Researcher(s)

Research team(s)

Project type(s)

  • Research Project