Research | Wout Bittremieux

Research team

ADReM Data Lab (ADReM)

Expertise

Dr. Bittremieux's research deals with developing advanced machine learning techniques to uncover novel knowledge from mass spectrometry-based proteomics and metabolomics data. While his current research mainly focuses on how deep learning can be used to analyze mass spectrometry data, he is interested in a wide variety of bioinformatics problems. An important part of his work involves developing insights and computational approaches for quality control in biological mass spectrometry.

Exposomics: A holistic approach to assess environmental exposures and their impact on endocrine and metabolic disorders (EXPOSOME 2.0). 01/01/2026 - 31/12/2031

Abstract

Background: The exposome encompasses the totality of environmental exposures of an individual or organism throughout life (including exposure to chemicals, diet, lifestyle, climate factors, and stress), and how these exposures impact biology (e.g., metabolites and hormones) and health. In particular, exposure to endocrine disrupting chemicals (EDCs), including metabolic disrupting chemicals (MDCs), has been linked to a broad range of non-communicable diseases and environmental health effects. Workflows for gathering and interpreting exposome data are still in development and currently focus on elucidating physiological pathways that link exposure to adverse effects. Ultimately, this will lead to a holistic understanding of how exposures interact with the phenotype to cause adverse health outcomes with potentially large societal, economic, and ecological costs.

Aims: We will use innovative approaches to decipher the human exposome from early life through adulthood and its association with endocrine and metabolic alterations (leading to disorders such as liver diseases, metabolic syndrome, diabetes, and obesity), as well as effects on other important physiological processes mostly driven by endocrine and metabolic signaling.

Researcher(s)

Promoter: Covaci Adrian
Co-promoter: Bervoets Lieven
Co-promoter: Bittremieux Wout
Co-promoter: De Boeck Gudrun
Co-promoter: Hermans Nina
Co-promoter: Jorens Philippe
Co-promoter: Knapen Dries
Co-promoter: Leroy Jo
Co-promoter: van Nuijs Alexander

Research team(s)

Toxicological Centre

Project type(s)

Research Project

Bioinformatics Solutions For the Comprehensive Study of the Human Immunopeptidome. 01/01/2025 - 31/12/2028

Abstract

The adaptive immune system detects and responds to infected or malignant cells by recognizing peptides bound to major histocompatibility complex (MHC) molecules. This recognition induces an immune response that eliminates the threat by producing antibodies or directly attacking the infected or abnormal cells. Mass spectrometry-based immunopeptidomics is a key approach to understanding the adaptive immune system by identifying and characterizing peptides presented on MHC molecules. However, there is a lack of optimized bioinformatics tools for immunopeptidomics data analysis, resulting in very low spectrum annotation rates and missed insights into the immune system. To overcome this challenge, we will develop a powerful de novo immunopeptide sequencing solution using deep learning to extract more biological knowledge from immunopeptidomics data. We will apply this tool to study the presence of aberrant peptides, e.g., due to errors in translation or transcriptional splicing, and non-human peptides, originating from pathogens and other organisms, in the human immunopeptidome. These innovations have the potential to unlock new biological and biomedical insights into the adaptive immune system that will catalyze the development of novel immunotherapies and vaccines.

Researcher(s)

Promoter: Bittremieux Wout
Co-promoter: Meysman Pieter

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Artificial Intelligence to Uncover Patterns in Mass Spectrometry Data Across Repositories. 01/01/2025 - 31/12/2028

Abstract

The relentless growth of data in the life sciences, notably in small molecule mass spectrometry (MS), presents a unique opportunity for groundbreaking discoveries. This project will introduce powerful artificial intelligence (AI) techniques to transcend traditional analysis paradigms that treat datasets in isolation, integrating fragmented data from large public databases to reveal insights that individual studies alone cannot uncover. At its core, our aim is to innovate by shifting from analyzing individual MS experiments to a comprehensive analysis across large repositories. This paradigm shift will unlock the untapped potential of public MS data, interpreting new observations within the context of the extensive molecular diversity documented in data repositories. To achieve this goal, we will develop AI-driven tools for simulating spectral libraries and incorporating statistical confidence in molecular identification. Additionally, we will employ multimodal representation learning techniques to bridge the gap between spectra and molecules on a repository scale. Standing at the intersection of AI, machine learning, and computational MS, our objective is to provide an integrated analysis of complex molecular data. This will pave the way for transformative advances across various scientific domains in the life sciences, including metabolomics, drug discovery, and environmental sciences, revolutionizing the approach to molecular discovery in the era of big data.

Researcher(s)

Promoter: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

The Live Mouse Tracker (LMT) as a versatile drug screening platform for rare neurological diseases. 01/01/2025 - 31/12/2025

Abstract

Establishing effective therapies for rare neurodevelopmental diseases remains one of the greatest challenges in molecular medicine. Although advances in next-generation sequencing technologies have led to the discovery of hundreds of novel genetic syndromes over the past decade, the development of individualized therapies continues to lag behind. Each rare disorder, while affecting a small group, contributes to a global burden estimated to impact over 300 million individuals. The complexity arises from the fact that these disorders, often caused by mutations in different genes, affect multiple cellular pathways, generating an overwhelming volume of data that must be analyzed to inform therapeutic strategies. Current drug interventions have seen limited success in translating promising preclinical findings into patient-ready treatments. The rapid rise of AI technologies, however, has the potential to transform this landscape. AI-driven algorithms are increasingly capable of navigating vast biomedical datasets, revealing drug candidates for rare diseases at an unprecedented pace. Many start-ups are already capitalizing on this potential, generating a flood of drug candidates for preclinical evaluation. However, this surge in candidate therapies has shifted the bottleneck from drug discovery to preclinical testing. Traditional murine test batteries are labor-intensive, expensive, and time-consuming, necessitating a standardized, scalable, and efficient platform to meet the growing demand for drug screening. We propose the development and commercialization of our Live Mouse Tracker (LMT) platform, a cutting-edge tool designed to address this critical need. The LMT system automates behavioral analysis and can track up to 39 different behaviors in groups of mice over 24-hour periods. This high-throughput capability provides a rapid and comprehensive assessment of drug efficacy in preclinical models.
Our initial validation will focus on fragile X syndrome, a widely studied neurodevelopmental disorder for which no effective treatment currently exists. By evaluating drugs that target multiple affected pathways simultaneously, we aim to pioneer a new approach to rare disease therapy development. During this project, we will validate the robustness of the LMT platform and extend it into a fully integrated service, as well as explore collaboration with other university partners to offer comprehensive preclinical drug testing solutions. This service platform has the potential to revolutionize the drug development pipeline, ensuring that AI-generated candidate drugs can be rapidly and reliably assessed, accelerating the path from bench to bedside. Through this initiative, we aim to bridge the gap between drug discovery and therapeutic application, bringing hope to millions of patients with rare neurological diseases.

Researcher(s)

Promoter: Bittremieux Wout
Co-promoter: Annear Dale
Co-promoter: Kooy Frank

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Deep Learning for Comprehensive Small Molecule Discovery From Untargeted Mass Spectrometry Data. 01/10/2024 - 30/09/2027

Abstract

Although small molecule mass spectrometry (MS) is a vital tool in various life sciences domains, its potential is hindered by the low annotation rate of MS/MS spectra, limiting our ability to uncover critical biological insights. This research project aims to revolutionize small molecule MS by harnessing the power of deep learning and multimodal integration to overcome this challenge. I will develop several complementary deep learning strategies for small molecule identification. First, I will develop a learned spectrum similarity score for the discovery of structurally related analogs. Second, I will use generative AI techniques to simulate comprehensive spectral libraries. Third, I will develop a solution for de novo molecule identification directly from MS/MS spectra, reducing the reliance on spectral libraries and expanding the range of discoverable molecules. Furthermore, I will introduce a holistic approach to MS by integrating three disparate data sources—MS/MS spectra, molecular structures, and natural language descriptions—into a shared latent space using multimodal representation learning. This paradigm shift will allow for direct linking of MS/MS observations to molecular structures and expert knowledge, enabling semantic search and retrieval of molecular information. Moreover, I will employ explainable AI techniques to interpret model decisions and provide insights into MS experimentation patterns.

Researcher(s)

Promoter: Bittremieux Wout
Co-promoter: Laukens Kris
Fellow: Piedrahita Giraldo Juan Sebastian

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

De novo mass spectrometry peptide sequencing with a transformer large language model. 01/05/2024 - 30/04/2025

Abstract

The primary challenge in proteomics is identifying amino acid sequences from tandem mass spectra, which traditionally has been achieved using sequence database searching. As this method is limited to known protein sequences, de novo peptide sequencing presents an interesting alternative for the discovery of unexpected peptides. Casanovo is a state-of-the-art tool for de novo peptide sequencing, harnessing technologies similar to those underpinning large language models to translate mass spectra into amino acid sequences. The goal of this project is to enhance Casanovo and make it the preferred solution for de novo peptide sequencing. This will be achieved by compiling an extensive training dataset from diverse biological samples and mass spectrometry instruments and scaling up Casanovo's neural network to increase its learning capacity. Additionally, we will create a tailored model for the analysis of immunopeptidomics data by fine-tuning Casanovo's capabilities. Finally, we will develop a user-friendly web interface, making Casanovo accessible to a broad range of researchers and overcoming hardware limitations through cloud computing.

Researcher(s)

Promoter: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Bioinformatics network for proteomics and mass spectrometry. 01/01/2024 - 31/12/2028

Abstract

Proteomics, the study of proteins and their functions, is a critical area in biology and medicine. With mass spectrometry (MS), researchers can analyze large amounts of proteomics samples, leading to valuable insights into complex biological processes. MS datasets require specialized data analysis techniques, which has led to the development of several powerful bioinformatics tools and pipelines for mass spectrometry-based proteomics. Nevertheless, the increasingly large volume and complex nature of MS-based proteomics data pose significant challenges that hinder progress in the field. To address these, there is a need for an open and collaborative approach to science. We have identified four key challenges that we will address through this Scientific Research Network (SRN):
- Highly performant bioinformatics tools: As proteomics datasets grow in size, computational bottlenecks arise. Through this SRN, we will foster the development of highly performant and interoperable bioinformatics tools and workflows to process these datasets efficiently, enabling faster and more transparent analyses.
- Machine learning integration: While machine learning holds great promise for proteomics data analysis, integrating it into practical workflows remains complex. Our SRN will work to bridge this gap, making machine learning techniques more accessible and seamlessly integrated into routine analyses.
- Effective benchmarking: The diversity of analysis approaches makes it challenging to compare methods effectively. Our objective is to establish standardized benchmarking methods that allow researchers to systematically evaluate and improve their analysis pipelines.
- Community building and educational resources: Proteomics data analysis requires specialized knowledge that is continuously evolving, making it difficult for young scientists and data science experts to enter the field. Our proposed SRN aims to build a supportive community for early-career researchers and create high-quality educational resources that ease the learning curve and provide accessible pathways for newcomers.

With three research units in Flanders that are global leaders in MS-based proteomics, this SRN will make Flanders a focal point in the field of proteomics bioinformatics. Our collaboration with international partners will further enhance the visibility of Flemish research and contribute to a competitive position in the international research landscape, making the region attractive for ambitious and talented young researchers to work in. The six partnering research units have strong ties with the proteomics bioinformatics community within Europe and beyond, which we aim to maximally exploit to achieve our long-term goals. Indeed, instead of tackling these challenges alone, each of the six research units intends to take up a leading role in the wider research community to reach our objectives. Through this SRN, we will formalize the existing connections between the six partners and provide a clear collaborative vision and structure to drive progress and effectively mobilize the wider research community. The scope of our goals underscores the necessity of a community-scale effort. All six partners have taken up central roles in existing initiatives, such as the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI), the ELIXIR Life Science Infrastructure, and the Computational Mass Spectrometry (CompMS) interest group of the International Society for Computational Biology (ISCB), providing the critical mass of researchers required to achieve our goals.

Researcher(s)

Promoter: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Reference data-driven metabolomics to study the molecular composition of South African foods. 01/01/2024 - 31/12/2026

Abstract

Understanding the molecular composition of food is essential for studying its impact on human health. We have recently developed a new approach called reference data-driven metabolomics, which can perform diet readouts from untargeted metabolomics data. However, this approach currently lacks diverse and geographically representative reference data. To address this, we will expand our reference food molecular database to include indigenous and locally cultivated foods from South Africa, a region with rich cultural and culinary traditions and nutritional diversity, analyze their molecular composition using mass spectrometry, and integrate the data into the Global FoodOmics reference database. Additionally, we will develop user-friendly bioinformatics tools that simplify the data analysis process, making reference data-driven metabolomics accessible to researchers with diverse backgrounds, and study the molecular composition of indigenous South African foods. Through collaboration between South African universities and the University of Antwerp, we will combine expertise in analytical chemistry, bioinformatics, nutrition, and agricultural sciences to advance metabolomics research, expand scientific knowledge of South African diets, and provide evidence-based insights for improving nutrition and health in South African populations.

Researcher(s)

Promoter: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Computational mass spectrometry and artificial intelligence to unravel the immunopeptidome. 01/10/2023 - 30/09/2027

Abstract

The adaptive immune system is a crucial component of the immune response, providing specific defense against a wide range of pathogens and contributing to the development of immunological memory. Immunopeptidomics is a rapidly evolving field that uses mass spectrometry-based approaches to identify and quantify immunopeptides, which play a vital role in the recognition and elimination of infected or malignant cells by T cells. However, the annotation rate of immunopeptides from mass spectrometry data is currently severely limited, resulting in a significant loss of biological information. To overcome this challenge, we will develop specialized bioinformatics tools for analyzing mass spectrometry immunopeptidomics data. Specifically, we will develop an efficient and sensitive open modification search engine to identify immunopeptides that have undergone post-translational modifications. Furthermore, we will develop a deep learning-based de novo peptide sequencing approach optimized for the analysis of immunopeptidomics data. The tools developed in this project have the potential to significantly expand the amount of biological information that can be obtained from immunopeptidomics experiments, leading to transformational breakthroughs in the field.

Researcher(s)

Promoter: Bittremieux Wout
Fellow: Pominova Marina

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Artificial intelligence-powered knowledge base of the observed molecular universe. 01/12/2022 - 30/11/2027

Abstract

Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.

Researcher(s)

Promoter: Bittremieux Wout
Fellow: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Bioinformatics and machine learning for large-scale metabolomics data analysis. 01/12/2022 - 30/11/2026

Abstract

Researcher(s)

Promoter: Bittremieux Wout
Fellow: Heirman Janne

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Precision Medicine Technologies (PreMeT). 01/01/2021 - 31/12/2026

Abstract

Precision medicine is an approach to tailor healthcare individually, on the basis of the genes, lifestyle, and environment of an individual. It is based on technologies that allow clinicians to predict more accurately which treatment and prevention strategies for a given disease will work in which group of affected individuals. Key drivers for precision medicine are advances in technology, such as next-generation sequencing technology in genomics, the increasing availability of health data, and the growth of data science and artificial intelligence. In these domains, six strong research teams at UAntwerpen are now joining forces to translate their research and offer a technology platform for precision medicine (PreMeT) to industry, hospitals, research institutes, and society. The mission of PreMeT is to enable precision medicine through an integrated approach of genomics and big data analysis.

Researcher(s)

Promoter: Laukens Kris
Co-promoter: Bittremieux Wout
Co-promoter: Kooy Frank
Co-promoter: Loeys Bart
Co-promoter: Meester Josephina
Co-promoter: Meysman Pieter
Co-promoter: Mortier Geert
Co-promoter: Op de Beeck Ken
Co-promoter: Van Camp Guy
Co-promoter: Van Hul Wim
Co-promoter: Verstraeten Aline
Fellow: Bosschaerts Tom
Fellow: Gauglitz Julia

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Enabling mobile and data-driven pathogen monitoring through a paired nanopore squiggle–genome sequence database. 01/05/2023 - 31/12/2024

Abstract

Infectious disease monitoring is a global need, and the threat of existing and emerging pathogens poses a major challenge to public health. Nanopore sequencing is a revolutionary technology that enables portable sequencing and has shown its merit during the COVID-19 pandemic. This technology could enable existing laboratories that have no or limited infectious disease surveillance capacity to 'leapfrog' to sequencing-based pathogen monitoring. However, this potential hinges on the ability to operate in resource-limited settings, which is, to date, hindered by data storage and processing needs. The raw data, referred to as 'squiggles,' require significant storage space, and decoding them into DNA sequences requires graphics processing units (GPUs) that consume significant amounts of power. In this pandemic preparedness proof-of-concept project, we will build on advances from our IOF-SBO funded project LeapSEQ to remove significant hurdles to enable mobile and data-driven pathogen monitoring. These hurdles include: (1) a need for scalable storage solutions for squiggle data, (2) the lack of available pathogen data, and (3) a need for improved computational solutions for interacting with squiggle data. We will tackle these problems by engineering and populating a proof-of-concept paired nanopore squiggle–genome sequence database using our portable LeapSEQ lab and by developing efficient data-driven algorithms for rapid pathogen monitoring. We will develop this database with strategic partners at ITM and UA and further explore LeapSEQ valorization potential in the context of global pathogen monitoring.

Researcher(s)

Promoter: Laukens Kris
Co-promoter: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Transferable deep learning for sequence based prediction of molecular interactions. 01/10/2019 - 30/09/2023

Abstract

Machine learning can be used to elucidate the presence or absence of interactions. In particular for life science research, the prediction of molecular interactions that underlie the mechanics of cells, pathogens, and the immune system is a problem of great relevance. Here we aim to establish a fundamentally new technology that can predict unknown interaction graphs with models trained on the vast amount of molecular interaction data that is nowadays available thanks to high-throughput experimental techniques. This will be accomplished using a machine learning workflow that can learn the patterns in molecular sequences that underlie interactions. We will tackle this problem in a generalizable way using the latest generation of neural network approaches by establishing a generic encoding for molecular sequences that can be readily translated to various biological problems. This encoding will be fed into an advanced deep neural network to model general molecular interactions, which can then be fine-tuned to highly specific use cases. The features that underlie the successful network will then be translated into novel visualisations to allow interpretation by biologists. We will assess the performance of this framework using both computationally simulated and real-life experimental sequence and interaction data from a diverse range of relevant use cases.

Researcher(s)

Promoter: Laukens Kris
Co-promoter: Bittremieux Wout
Co-promoter: Meysman Pieter
Fellow: Postovskaya Anna

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project

Intelligent quality control for mass spectrometry-based proteomics. 01/10/2017 - 31/07/2021

Abstract

As mass spectrometry proteomics has matured over the past few years, a growing emphasis has been placed on quality control (QC), which is becoming a crucial factor to endorse the generated experimental results. Mass spectrometry is a highly complex technique, and because its results can be subject to significant variability, suitable QC is necessary to model the influence of this variability on experimental results. Nevertheless, extensive quality control procedures are currently lacking due to the absence of QC information alongside the experimental data and the high degree of difficulty in interpreting this complex information. For mass spectrometry proteomics to mature, a systematic approach to quality control is essential. To this end, we will first provide the technical infrastructure to generate QC metrics as an integral element of a mass spectrometry experiment. We will develop the qcML standard file format for mass spectrometry QC data, and we will establish procedures to include detailed QC data alongside all data submissions to PRIDE, a leading public repository for proteomics data. Second, we will use this newly generated wealth of QC data to develop advanced machine learning techniques to uncover novel knowledge on the performance of a mass spectrometry experiment. This will make it possible to improve the experimental set-up, optimize the spectral acquisition, and increase the confidence in the generated results, massively empowering biological mass spectrometry.

Researcher(s)

Promoter: Laukens Kris
Fellow: Bittremieux Wout

Research team(s)

ADReM Data Lab (ADReM)

Project type(s)

Research Project