Publications

Automating Data Lineage and Pipeline Extraction • Proceedings of the VLDB Endowment • August 2024 • PDF

Abstract: Jupyter Notebooks are widespread in modern data science environments. They allow data professionals to create models, analyze data, and build data pipelines. With an increasing focus on research areas such as explainability and fairness in machine learning, there is a need to understand the relationship between the data and the model in ad-hoc project setups. This doctoral research aims to automate the extraction of pipelines from Jupyter Notebooks and the derivation of data lineage from those pipelines without executing the notebook. The goal is to develop a set of tools that identify all datasets, transformations, models, and columns used for model training inside a notebook, without human intervention or execution of these pipelines.
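The abstract's central idea, recovering lineage by reading notebook code instead of running it, can be illustrated with Python's standard `ast` module. This is a minimal sketch under stated assumptions, not the method from the paper or from APEX-DAG. The function `extract_lineage` and the name sets `LOADERS` and `TRAINERS` are hypothetical heuristics. It also assumes Python 3.9+, where `ast.Subscript.slice` holds the index expression directly.

```python
import ast

# Hypothetical heuristics: method names treated as dataset loaders / model training calls.
LOADERS = {"read_csv", "read_parquet", "read_json"}
TRAINERS = {"fit"}


def extract_lineage(source: str) -> dict:
    """Statically scan notebook code (cells joined into one string) without executing it."""
    tree = ast.parse(source)  # parsing only; no imports or cells are run
    datasets, models, columns = [], [], set()
    for node in ast.walk(tree):
        # Method calls such as pd.read_csv("file.csv") or clf.fit(X, y)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            name = node.func.attr
            if name in LOADERS and node.args and isinstance(node.args[0], ast.Constant):
                datasets.append(node.args[0].value)
            elif name in TRAINERS and isinstance(node.func.value, ast.Name):
                models.append(node.func.value.id)
        # Column accesses such as df["churn"] or df[["age", "income"]] (Python 3.9+ AST layout)
        if isinstance(node, ast.Subscript):
            key = node.slice
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                columns.add(key.value)
            elif isinstance(key, ast.List):
                columns.update(
                    e.value for e in key.elts
                    if isinstance(e, ast.Constant) and isinstance(e.value, str)
                )
    return {"datasets": datasets, "models": models, "columns": sorted(columns)}


# Usage: three hypothetical notebook cells; pandas and scikit-learn need not be installed.
cells = [
    "import pandas as pd\nfrom sklearn.linear_model import LogisticRegression",
    'df = pd.read_csv("customers.csv")\nX = df[["age", "income"]]\ny = df["churn"]',
    "clf = LogisticRegression()\nclf.fit(X, y)",
]
result = extract_lineage("\n".join(cells))
print(result)
# → {'datasets': ['customers.csv'], 'models': ['clf'], 'columns': ['age', 'churn', 'income']}
```

Pattern matching on syntax like this is brittle. For example, it counts any string subscript as a column, so it also picks up dictionary access. This is one reason the research direction pairs static analysis with dataflow analysis and learned models rather than fixed name lists.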

Certificates

Professional Data Engineer

Google Cloud • May 2025 • Credly

Professional Cloud Architect

Google Cloud • March 2025 • Credly

Professional Machine Learning Engineer

Google Cloud • December 2024 • Credly

Projects

APEX-DAG: Automating Pipeline EXtraction with Dataflow, Static Code Analysis, and Graph Attention Networks

January 2025 • GitHub

CLEAN-SSM: CLEANing data lakes in a Self-Supervised Manner

September 2023 • GitHub

EP Calendar: React Native Calendar Library

February 2021

Better Habits: React Native Habit Tracker

March 2020

Social Clouds: Self-Hosted Social Network Software

March 2017

Contact