Expertise
Dr. Bittremieux's research deals with developing advanced machine learning techniques to uncover novel knowledge from mass spectrometry-based proteomics and metabolomics data. While his current research mainly focuses on how deep learning can be used to analyze mass spectrometry data, he is interested in a wide variety of bioinformatics problems. An important part of his work involves developing insights and computational approaches for quality control in biological mass spectrometry.
Exposomics: A holistic approach to assess environmental exposures and their impact on endocrine and metabolic disorders (EXPOSOME 2.0).
Abstract
Background: The exposome encompasses the totality of environmental exposures of an individual or organism throughout life (including exposure to chemicals, diet, lifestyle, climate factors, and stress), and how these exposures impact biology (e.g., metabolites and hormones) and health. In particular, exposure to endocrine disrupting chemicals (EDCs), including metabolic disrupting chemicals (MDCs), has been linked to a broad range of non-communicable diseases and environmental health effects. Workflows for gathering and interpreting exposome data are still in development and are currently focusing on elucidating physiological pathways that link exposure to adverse effects. Ultimately, this will lead to a holistic understanding of how exposures interact with the phenotype to cause adverse health outcomes with potentially large societal, economic, and ecological costs.
Aims: We will use innovative approaches to decipher the human exposome from early life to adulthood and its association with endocrine and metabolic alterations (leading to disorders such as liver diseases, metabolic syndrome, diabetes, and obesity), as well as effects on other important physiological processes mostly driven by endocrine and metabolic signaling.
Researcher(s)
- Promoter: Covaci Adrian
- Co-promoter: Bervoets Lieven
- Co-promoter: Bittremieux Wout
- Co-promoter: De Boeck Gudrun
- Co-promoter: Hermans Nina
- Co-promoter: Jorens Philippe
- Co-promoter: Knapen Dries
- Co-promoter: Leroy Jo
- Co-promoter: van Nuijs Alexander
Project type(s)
- Research Project
Deep Learning for Comprehensive Small Molecule Discovery From Untargeted Mass Spectrometry Data.
Abstract
Although small molecule mass spectrometry (MS) is a vital tool in various life sciences domains, its potential is hindered by the low annotation rate of MS/MS spectra, limiting our ability to uncover critical biological insights. This research project aims to revolutionize small molecule MS by harnessing the power of deep learning and multimodal integration to overcome this challenge. I will develop several complementary deep learning strategies for small molecule identification. First, I will develop a learned spectrum similarity score for the discovery of structurally related analogs. Second, I will use generative AI techniques to simulate comprehensive spectral libraries. Third, I will develop a solution for de novo molecule identification directly from MS/MS spectra, reducing the reliance on spectral libraries and expanding the range of discoverable molecules. Furthermore, I will introduce a holistic approach to MS by integrating three disparate data sources (MS/MS spectra, molecular structures, and natural language descriptions) into a shared latent space using multimodal representation learning. This paradigm shift will allow for direct linking of MS/MS observations to molecular structures and expert knowledge, enabling semantic search and retrieval of molecular information. Moreover, I will employ explainable AI techniques to interpret model decisions and provide insights into MS experimentation patterns.
Researcher(s)
- Promoter: Bittremieux Wout
- Co-promoter: Laukens Kris
- Fellow: Piedrahita Giraldo Juan Sebastian
Project type(s)
- Research Project
De novo mass spectrometry peptide sequencing with a transformer large language model.
Abstract
The primary challenge in proteomics is identifying amino acid sequences from tandem mass spectra, which traditionally has been achieved using sequence database searching. As this method is limited to known protein sequences, de novo peptide sequencing presents an interesting alternative for the discovery of unexpected peptides. Casanovo is a state-of-the-art tool for de novo peptide sequencing, harnessing technologies similar to those underpinning large language models to translate mass spectra into amino acid sequences. The goal of this project is to enhance Casanovo and make it the preferred solution for de novo peptide sequencing. This will be achieved by compiling an extensive training dataset from diverse biological samples and mass spectrometry instruments and scaling up Casanovo's neural network to increase its learning capacity. Additionally, we will create a tailored model for the analysis of immunopeptidomics data by fine-tuning Casanovo's capabilities. Finally, we will develop a user-friendly web interface, making Casanovo accessible to a broad range of researchers and overcoming hardware limitations through cloud computing.
Researcher(s)
- Promoter: Bittremieux Wout
Project type(s)
- Research Project
Bioinformatics network for proteomics and mass spectrometry
Abstract
Proteomics, the study of proteins and their functions, is a critical area in biology and medicine. With mass spectrometry (MS), researchers can analyze large amounts of proteomics samples, leading to valuable insights into complex biological processes. MS datasets require specialized data analysis techniques, which has led to the development of several powerful bioinformatics tools and pipelines for mass spectrometry-based proteomics. Nevertheless, the increasingly large volume and complex nature of MS-based proteomics data pose significant challenges that hinder progress in the field. To address these, there is a need for an open and collaborative approach to science. We have identified four key challenges that we will address through this Scientific Research Network (SRN):
- Highly performant bioinformatics tools: As proteomics datasets grow in size, computational bottlenecks arise. Through this SRN, we will foster the development of highly performant and interoperable bioinformatics tools and workflows to process these datasets efficiently, enabling faster and more transparent analyses.
- Machine learning integration: While machine learning holds great promise for proteomics data analysis, integrating it into practical workflows remains complex. Our SRN will work to bridge this gap, making machine learning techniques more accessible and seamlessly integrated into routine analyses.
- Effective benchmarking: The diversity of analysis approaches makes it challenging to compare methods effectively. Our objective is to establish standardized benchmarking methods that allow researchers to systematically evaluate and improve their analysis pipelines.
- Community building and educational resources: Proteomics data analysis requires specialized knowledge that is continuously evolving, making it difficult for young scientists and data science experts to enter the field. Our proposed SRN aims to build a supportive community for early-career researchers and create high-quality educational resources that ease the learning curve and provide accessible pathways for newcomers.
With three research units in Flanders that are global leaders in MS-based proteomics, this SRN will make Flanders a focal point in the field of proteomics bioinformatics. Our collaboration with international partners will further enhance the visibility of Flemish research and contribute to a competitive position in the international research landscape, making the region attractive for ambitious and talented young researchers to work in. The six partnering research units have strong ties with the proteomics bioinformatics community within Europe and beyond, which we aim to maximally exploit to achieve our long-term goals. Indeed, instead of tackling these challenges alone, each of the six research units intends to take up a leading role in the wider research community to reach our objectives. Through this SRN, we will formalize the existing connections between the six partners and provide a clear collaborative vision and structure to drive progress and effectively mobilize the wider research community. The scope of our goals underscores the necessity of a community-scale effort. All six partners have taken up central roles in existing initiatives, such as the European Bioinformatics Community for Mass Spectrometry (EuBIC-MS), the Human Proteome Organization's Proteomics Standards Initiative (HUPO-PSI), the ELIXIR Life Science Infrastructure, and the Computational Mass Spectrometry (CompMS) interest group of the International Society for Computational Biology (ISCB), providing the critical mass of researchers required to achieve our goals.
Researcher(s)
- Promoter: Bittremieux Wout
Project type(s)
- Research Project
Reference data-driven metabolomics to study the molecular composition of South African foods.
Abstract
Understanding the molecular composition of food is essential for studying its impact on human health. We have recently developed a new approach called reference data-driven metabolomics, which can perform diet readouts from untargeted metabolomics data. However, this approach currently lacks diverse and geographically representative reference data. To address this, we will expand our reference food molecular database to include indigenous and locally cultivated foods from South Africa, a region with rich cultural and culinary traditions and nutritional diversity, analyze their molecular composition using mass spectrometry, and integrate the data into the Global FoodOmics reference database. Additionally, we will develop user-friendly bioinformatics tools that simplify the data analysis process, making reference data-driven metabolomics accessible to researchers with diverse backgrounds, and study the molecular composition of indigenous South African foods. Through collaboration between South African universities and the University of Antwerp, we will combine expertise in analytical chemistry, bioinformatics, nutrition, and agricultural sciences to advance metabolomics research, expand scientific knowledge of South African diets, and provide evidence-based insights for improving nutrition and health in South African populations.
Researcher(s)
- Promoter: Bittremieux Wout
Project type(s)
- Research Project
Computational mass spectrometry and artificial intelligence to unravel the immunopeptidome.
Abstract
The adaptive immune system is a crucial component of the immune response, providing specific defense against a wide range of pathogens and contributing to the development of immunological memory. Immunopeptidomics is a rapidly evolving field that uses mass spectrometry-based approaches to identify and quantify immunopeptides, which play a vital role in the recognition and elimination of infected or malignant cells by T cells. However, the annotation rate of immunopeptides from mass spectrometry data is currently severely limited, resulting in a significant loss of biological information. To overcome this challenge, we will develop specialized bioinformatics tools for analyzing mass spectrometry immunopeptidomics data. Specifically, we will develop an efficient and sensitive open modification search engine to identify immunopeptides that have undergone post-translational modifications. Furthermore, we will develop a deep learning-based de novo peptide sequencing approach optimized for the analysis of immunopeptidomics data. The tools developed in this project have the potential to significantly expand the amount of biological information that can be obtained from immunopeptidomics experiments, leading to transformational breakthroughs in the field.
Researcher(s)
- Promoter: Bittremieux Wout
- Fellow: Pominova Marina
Project type(s)
- Research Project
Enabling mobile and data-driven pathogen monitoring through a paired nanopore squiggle–genome sequence database.
Abstract
Infectious disease monitoring is a global need, and the threat of existing and emerging pathogens poses a major challenge to public health. Nanopore sequencing is a revolutionary technology that enables portable sequencing and has shown its merit in the COVID-19 pandemic. This technology could enable existing laboratories that have no or limited infectious disease surveillance capacity to 'leapfrog' to sequencing-based pathogen monitoring. However, this potential hinges on the ability to operate in resource-limited settings, which is, to date, hindered by data storage and processing needs. The raw data, referred to as 'squiggles,' requires significant storage space, and decoding it to DNA sequences requires graphics processing units (GPUs) that consume significant amounts of power. In this pandemic preparedness proof-of-concept project, we will build on advances from our IOF-SBO funded project LeapSEQ to remove significant hurdles to enable mobile and data-driven pathogen monitoring. These hurdles include: (1) a need for scalable storage solutions for squiggle data, (2) the lack of available pathogen data, and (3) a need for improved computational solutions for interacting with squiggle data. We will tackle these problems by engineering and populating a proof-of-concept paired nanopore squiggle–genome sequence database using our portable LeapSEQ lab and by developing efficient data-driven algorithms for rapid pathogen monitoring. We will develop this database with strategic partners at ITM and UA and further explore LeapSEQ valorization potential in the context of global pathogen monitoring.
Researcher(s)
- Promoter: Laukens Kris
- Co-promoter: Bittremieux Wout
Project type(s)
- Research Project
Artificial intelligence-powered knowledge base of the observed molecular universe.
Abstract
Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.
Researcher(s)
- Promoter: Bittremieux Wout
- Fellow: Bittremieux Wout
Project type(s)
- Research Project
Bioinformatics and machine learning for large-scale metabolomics data analysis.
Abstract
Despite recent breakthroughs in artificial intelligence (AI) that have led to disruptive advances across many scientific domains, there are still challenges in adopting state-of-the-art AI techniques in the life sciences. Notably, analysis of small molecule untargeted mass spectrometry (MS) data is still based on expert knowledge and manually compiled rules, and each experiment is analyzed in isolation without taking into account prior knowledge. Instead, this project will develop more powerful approaches in which untargeted MS data is interpreted within the context of the vast background of previously generated, publicly available data. The research hypothesis driving the proposed project is that advanced AI techniques can uncover hidden knowledge from large amounts of open MS data in public repositories to gain a deeper understanding of the molecular composition of complex biological samples. We will develop machine learning solutions to explore the observed molecular universe and build a comprehensive small molecule knowledge base. These ambitious goals build on our unique expertise in both AI and MS to create next-generation, data-driven software solutions for molecular discovery from untargeted MS data.
Researcher(s)
- Promoter: Bittremieux Wout
- Fellow: Heirman Janne
Project type(s)
- Research Project
Precision Medicine Technologies (PreMeT)
Abstract
Precision medicine is an approach that tailors healthcare to the individual on the basis of their genes, lifestyle, and environment. It is based on technologies that allow clinicians to predict more accurately which treatment and prevention strategies for a given disease will work in which group of affected individuals. Key drivers for precision medicine are advances in technology, such as next-generation sequencing technology in genomics, the increasing availability of health data, and the growth of data sciences and artificial intelligence. In these domains, six strong research teams at UAntwerpen are now joining forces to translate their research and offer a technology platform for precision medicine (PreMeT) to industry, hospitals, research institutes, and society. The mission of PreMeT is to enable precision medicine through an integrated approach of genomics and big data analysis.
Researcher(s)
- Promoter: Laukens Kris
- Co-promoter: Bittremieux Wout
- Co-promoter: Kooy Frank
- Co-promoter: Loeys Bart
- Co-promoter: Meester Josephina
- Co-promoter: Meysman Pieter
- Co-promoter: Mortier Geert
- Co-promoter: Op de Beeck Ken
- Co-promoter: Van Camp Guy
- Co-promoter: Van Hul Wim
- Co-promoter: Verstraeten Aline
- Fellow: Bosschaerts Tom
- Fellow: Gauglitz Julia
Project type(s)
- Research Project
Transferable deep learning for sequence based prediction of molecular interactions.
Abstract
Machine learning can be used to elucidate the presence or absence of interactions. In particular for life science research, the prediction of molecular interactions that underlie the mechanics of cells, pathogens and the immune system is a problem of great relevance. Here we aim to establish a fundamentally new technology that can predict unknown interaction graphs with models trained on the vast amount of molecular interaction data that is nowadays available thanks to high-throughput experimental techniques. This will be accomplished using a machine learning workflow that can learn the patterns in molecular sequences that underlie interactions. We will tackle this problem in a generalizable way using the latest generation of neural network approaches by establishing a generic encoding for molecular sequences that can be readily translated to various biological problems. This encoding will be fed into an advanced deep neural network to model general molecular interactions, which can then be fine-tuned to highly specific use cases. The features that underlie the successful network will then be translated into novel visualisations to allow interpretation by biologists. We will assess the performance of this framework using both computationally simulated and real-life experimental sequence and interaction data from a diverse range of relevant use cases.
Researcher(s)
- Promoter: Laukens Kris
- Co-promoter: Bittremieux Wout
- Co-promoter: Meysman Pieter
- Fellow: Postovskaya Anna
Project type(s)
- Research Project
Intelligent quality control for mass spectrometry-based proteomics
Abstract
As mass spectrometry proteomics has matured over the past few years, a growing emphasis has been placed on quality control (QC), which is becoming a crucial factor to endorse the generated experimental results. Mass spectrometry is a highly complex technique, and because its results can be subject to significant variability, suitable QC is necessary to model the influence of this variability on experimental results. Nevertheless, extensive quality control procedures are currently lacking due to the absence of QC information alongside the experimental data and the high degree of difficulty in interpreting this complex information. For mass spectrometry proteomics to mature, a systematic approach to quality control is essential. To this end, we will first provide the technical infrastructure to generate QC metrics as an integral element of a mass spectrometry experiment. We will develop the qcML standard file format for mass spectrometry QC data, and we will establish procedures to include detailed QC data alongside all data submissions to PRIDE, a leading public repository for proteomics data. Second, we will use this newly generated wealth of QC data to develop advanced machine learning techniques to uncover novel knowledge on the performance of a mass spectrometry experiment. This will make it possible to improve the experimental set-up, optimize the spectral acquisition, and increase the confidence in the generated results, massively empowering biological mass spectrometry.
Researcher(s)
- Promoter: Laukens Kris
- Fellow: Bittremieux Wout
Project type(s)
- Research Project