Research Jakob Raymaekers | Jakob Raymaekers

Research team

Applied mathematics

Expertise

My expertise: multivariate statistics, robust statistics, anomaly detection, clustering, visualization, and statistical machine learning.

Cellwise robust regression. 01/06/2025 - 31/05/2029

Abstract

Over the last two decades, the availability of data has grown exponentially due to technological advances such as cheaper and bigger storage and a growth of information-collecting devices. As a result, datasets have increased tremendously in size, often containing millions of observations and variables. This development has created new challenges for the fields of statistics and machine learning, which aim to analyze these large datasets in an efficient and comprehensive way. In this project we focus on regression analysis, one of the most popular tools for modelling a response variable as a function of a number of predictor variables. A major difficulty in regression analysis is that the quality of the data is generally unknown. In particular, the data may contain anomalies, measurement error and other types of contamination. Ignoring this fact can have disastrous effects on the results of any method for data analysis. On the other hand, detecting contamination is very difficult, and even more so as the size of the dataset increases. This motivates a need for regression methodology that is robust to data contamination, so that reliable results can be obtained even when the dataset is contaminated. Traditionally, robust statistics considered "casewise" contamination, which appears at the level of the observation. This means that an observation is either contaminated or completely free from contamination. More recently, it has been put forward that "cellwise" contamination, at the level of the individual cell, is a more appropriate assumption in the context of big data. A cellwise contamination model implies that for a given observation, certain variables may be reliable whereas others may not be. The challenge thus becomes to identify the uncontaminated data cells and use those for the estimation, while limiting the influence of the contaminated ones.
While several proposals have been made for regression under cellwise contamination, the whole line of research lacks direction and general foundations. For casewise contamination, general frameworks for the development of robust estimators exist, and they include tools for analyzing their statistical and computational properties. The lack of cellwise counterparts to these frameworks makes the problem of cellwise contamination in general poorly understood. This proposal bridges knowledge from robust statistics, machine learning and optimization and builds on my very recent work on robust covariance estimation to fundamentally tackle the problem of cellwise outliers in regression. The project starts off by creating a clear overview of the state-of-the-art through a benchmark study and a summary of the existing theory. It will then investigate a general framework for cellwise robust linear regression, derive the properties of the framework and design efficient optimization strategies. It allows for extensions in the direction of regularized estimation and nonlinear modelling. In addition to the development of methodology, the project aims to assess the gravity of cellwise contamination in practical challenges by collaborating with experts on macro-economic time series modelling and drug development. Given the ubiquity of regression analysis, the anticipated results imply a broad potential impact, reaching far outside of the foundational disciplines of statistics and computer science, in disciplines including epidemiology, omics, physics, chemometrics, and economic policy.
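The difference between casewise and cellwise contamination described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not part of the project itself: the function names, the shift size, and the sample dimensions are illustrative assumptions. It contaminates whole rows in one case and independent cells in the other, then compares the share of observations that contain at least one contaminated cell. Under independent cellwise contamination with per-cell probability eps and p variables, that share is 1 - (1 - eps)^p, which is about 92% for eps = 0.05 and p = 50, even though only 5% of the cells are affected.

```python
import numpy as np


def contaminate_casewise(X, eps, rng, shift=10.0):
    """Shift every cell of a randomly chosen fraction eps of the rows.

    Returns the contaminated copy and a boolean mask of contaminated cells.
    (Hypothetical helper for illustration only.)
    """
    X = X.copy()
    bad_rows = rng.random(X.shape[0]) < eps
    X[bad_rows, :] += shift
    mask = np.zeros(X.shape, dtype=bool)
    mask[bad_rows, :] = True
    return X, mask


def contaminate_cellwise(X, eps, rng, shift=10.0):
    """Shift each cell independently with probability eps.

    Returns the contaminated copy and a boolean mask of contaminated cells.
    (Hypothetical helper for illustration only.)
    """
    X = X.copy()
    mask = rng.random(X.shape) < eps
    X[mask] += shift
    return X, mask


def fraction_affected_rows(mask):
    """Share of rows that contain at least one contaminated cell."""
    return mask.any(axis=1).mean()


rng = np.random.default_rng(0)
n, p, eps = 2000, 50, 0.05  # illustrative sizes, not taken from the project
X = rng.standard_normal((n, p))

_, case_mask = contaminate_casewise(X, eps, rng)
_, cell_mask = contaminate_cellwise(X, eps, rng)

# Theoretical share of affected rows under independent cellwise contamination.
expected_cellwise = 1 - (1 - eps) ** p

print("casewise, affected rows:", fraction_affected_rows(case_mask))
print("cellwise, affected rows:", fraction_affected_rows(cell_mask))
print("cellwise, theoretical  :", expected_cellwise)
```

In this setting a casewise method that discards or downweights entire observations would have almost no clean rows left to work with, which is one way to see why the cellwise setting calls for estimators that operate on individual cells.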

Researcher(s)

Promoter: Raymaekers Jakob

Research team(s)

Applied mathematics

Project type(s)

Research Project

Towards Modular Proactive Process Control in Chemical production. 01/01/2025 - 31/12/2026

Abstract

The project aims to develop and implement generally applicable, proactive process control strategies that enhance efficiency, flexibility, and precision in chemical production processes. It consists of four specific research objectives, each offering a dual advantage by delivering both academic insights and practical business outcomes.
1. Overcoming residence time and non-linearity: improving predictive modelling in chemical production by addressing non-linearity, interdependencies, and time delays. Advanced techniques such as NMPC, VTR frameworks and composite optimisation algorithms will be employed, ensuring robust system performance under varying conditions.
2. Product quality as objective function: creating predictive models that blend scientific principles, such as chemical reaction laws and physical constraints, with data-driven techniques to precisely control chemical production processes, minimising off-spec products and reducing resource consumption. This involves integrating time-specific process data with machine learning algorithms, ensuring models are both accurate and interpretable, ultimately enhancing process efficiency and economic performance.
3. From reactive to proactive process control: implementing a robust prescriptive modelling framework for chemical production processes, integrating real-time monitoring, actionable setpoint recommendations, and automated model drift detection with adaptive learning capabilities to continuously optimise process outcomes in practice.
4. Experimental proof of concept for optimisation in industry: ensuring that the methodology of objective functions remains applicable to the variety of challenges and environments within the chemical industry. By integrating the developed data analysis tools and prescriptive models, process engineers and operators will make more informed, real-time decisions, optimising energy use, product quality, and output.
These scalable solutions will be tested across multiple plants, delivering benefits such as reduced costs, improved environmental compliance, and increased production efficiency, thereby strengthening BASF's competitive position in the global chemical industry.

Researcher(s)

Promoter: Verdonck Tim
Co-promoter: Raymaekers Jakob

Research team(s)

Applied mathematics

Project type(s)

Research Project

Statistical learning under cellwise contamination. 01/01/2024 - 31/12/2028

Abstract

This proposal bridges knowledge from robust statistics, machine learning and optimization and builds on my very recent work on robust covariance estimation to introduce general frameworks for unsupervised and supervised statistical learning under cellwise contamination. The project considers covariance estimation, principal component analysis, linear regression and logistic regression. It allows for extensions in the direction of regularized estimation and nonlinear modeling using kernels. In addition to the development of methodology, the project aims to assess the gravity of cellwise contamination in practical challenges by collaborating with experts on macro-economic time series modeling and drug development.

Researcher(s)

Promoter: Raymaekers Jakob
Fellow: Raymaekers Jakob

Research team(s)

Applied mathematics

Project type(s)

Research Project

Robust Directed Acyclic Graph Learning for Causal Modeling. 01/11/2022 - 31/10/2026

Abstract

Due to technological advances, the available amount of data has increased exponentially over the last decade. The field of data science (DS) has followed this growth as it provides an indispensable tool for translating data into insight and knowledge. Whereas DS was traditionally concerned with learning associations in data, it has become clear in recent times that causal relations often provide a deeper understanding of the data and a stronger tool in many practical applications. One of the established approaches to causal modeling is to use a directed acyclic graph (DAG) to represent the causal relations. These DAGs have to be learned from observed data. Many of the state-of-the-art techniques for DAG learning are very sensitive to anomalies and yield unreliable results in their presence. We aim to develop methods for DAG learning that remain efficient and reliable under contamination of the data. The project starts by building a solid foundation for the concepts of robustness in DAG learning. Building upon these foundations, we will then develop a general robust DAG learning methodology. The project envisions three different but complementary approaches to the development of robust DAG learning methods. The developed methodology will be evaluated theoretically and empirically, and tested in a variety of real-world cases.

Researcher(s)

Promoter: Verdonck Tim
Co-promoter: Latré Steven
Co-promoter: Raymaekers Jakob
Fellow: Leyder Sarah

Research team(s)

Applied mathematics

Project type(s)

Research Project