Ehud Karavani

Modified

November 7, 2025

Highlights

Healthcare machine learning researcher with 8+ years of experience
EB1-A visa approved (US)
1st author Cell paper, 2019
Creator of causallib - an open-source Python package for causal inference.
750+ stars and 100+ forks on GitHub.
Received an IBM Research Accomplishment award (2023)
Co-inventor on 3 US patents
PyData conference speaker, lecturer, and podcast interviewee
Causal inference, machine learning, deep learning, statistics, data viz, Python

About me

Highly skilled in causal inference, machine learning, (Bayesian) statistics, and data visualization. An applied researcher and data scientist, I split my time between building reusable tools for research and putting them to use. I advocate Clean Code practices for research code and strongly prefer eclectic, collaborative environments.

Experience

2017 – present

Research Staff Member
Causal Machine Learning for Healthcare and Life Science, IBM-Research, Israel

Creator of `causallib` – a one-stop-shop open-source Python package for flexible causal inference modeling
Received an IBM Research Accomplishment award (2023)

Client project leader from start to finish: translating clinicians’ vague questions into concrete statistical hypotheses, answering them, and communicating the findings

Led and designed a drug repurposing software asset, applying high-throughput causal inference to observational healthcare data
Managing a team of 5 researchers. Leading the scientific pipeline, system design, and visualization app
Generating 100s of hypotheses in minutes
Serving 4 external engagements with top pharma clients, bringing millions in revenue

Individual Contributor (IC)
Causal inference consultant for projects in the US, UK, France, Japan, Kenya, South Africa, and Switzerland
Led global strategy at IBM Research for causality in drug discovery
Oversaw research of adaptive experimentation using Bayesian inference
Mentored 10+ students and interns
Onboarding lead for 10+ researchers
Published 10+ papers; co-inventor on 3 US patents

2024:
Deconfounding transformer-based large language models for biological sequences, overcoming batch effects across and within diverse data sources
GLM-ification of deep learning models, bringing established biostatistics concepts into transformer-based deep learning models

2025:
Quantum Advantage task force member: developing and testing quantum algorithms for combinatorial optimization problems

2022

Principal Statistician
Laboratory for Gait & Neurodynamics, Ichilov Hospital

Bayesian hierarchical/multilevel models and causal inference for gait analysis in multiple sclerosis patients
Bayesian multilevel hurdle models of repeated patient measurements
Formal causal inference with DAGs for minimizing inessential tests, saving clinicians over 3 hours of unnecessary tests per patient

2016 – 2017

Teaching Assistant
The School of Computer Science, Hebrew University

Introduction to Data Science
Workshop in Computational Bioskills

2015 – 2016

Research Associate / Computational Biologist
Institute for Medical Research Israel-Canada, Hadassah Hospital

Developing novel methodologies for finding high-resolution protein-RNA interactions using high-volume RNAseq data

Education

2016 – 2019

M.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel

Thesis: quantifying the utility of embryo selection using genomic prediction of traits
published in Cell

Predicting physical traits from DNA (GWAS) using classical, machine learning, and deep learning methods
Pioneering the study of the effects of prediction-based embryo selection in IVF

2013 – 2016

B.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel

Dean’s List of Academic Excellence (2016)
Research scholarship from IMRIC (2016)

Bachelor’s thesis published in Nucleic Acids Research

Community

PyData speaker
Causal Bandits podcast interviewee
DataNights causality series lecturer
Recurring DataHack mentor and judge
Co-organized the 2018 Atlantic Causal Inference Conference Data Challenge

Skills

Programming skills	Python & its scientific stack (fluent): Pandas, Polars, DuckDB, NumPy, scikit-learn, PyTorch (Lightning), Statsmodels, Bambi, PyMC, ArviZ, Seaborn (objects), Matplotlib, Altair, Streamlit, cvxpy, Pydantic, Hydra…; R (when needed); SQL (basic); Git + GitHub; Continuous development (Travis, GitHub Actions); Linux and remote development (Cloud/AWS + JupyterLab / VS Code)
Languages	Fluent English; native Hebrew
General	Data enthusiast; musician 🎸, hiker / backpacker 🏔️; friendly 🙂

Awards

2023	IBM-Research Accomplishment	For my work on causallib and research engagement with the Cleveland Clinic Foundation
2019	Best of RSNA	For the paper "Predicting Breast Cancer by Applying Deep Learning to Linked Health Records and Mammograms", published in Radiology
2019	Best Talk: Israeli Population Genetics Meeting	For the paper "Screening Human Embryos for Polygenic Traits has Limited Utility"
2019	Featured Theory of the issue (Cell)	For the paper "Screening Human Embryos for Polygenic Traits has Limited Utility"
2016	Dean’s List of Academic Excellence

Publications

Date	Title	Venue	DOI
2026	Critical evaluation of real-world evidence of repurposable medicines in the Alzheimer’s disease drug development pipeline using a target trial emulation	Alzheimer’s & Dementia	https://doi.org/10.1002/trc2.70193
2025	Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation	AISTATS	https://proceedings.mlr.press/v258/ter-minassian25a.html
2025	Peri-operative anti-inflammatory drug use and seizure recurrence after resective epilepsy surgery: Target trials emulation	iScience	https://doi.org/10.1016/j.isci.2025.112124
2024	Single-microglia transcriptomic transition network-based prediction and real-world patient data validation identifies ketorolac as a repurposable drug for Alzheimer’s disease	Alzheimer’s & Dementia	https://doi.org/10.1002/alz.14373
2024	Using Causal Inference to Investigate Contraceptive Discontinuation in Sub-Saharan Africa	International Joint Conference on Artificial Intelligence (IJCAI)	https://doi.org/10.24963/ijcai.2024/792
2024	Improving Inverse Probability Weighting by Post-calibrating Its Propensity Scores	Epidemiology	https://doi.org/10.1097/ede.0000000000001733
2023	Causalvis: Visualizations for Causal Inference	CHI: Conference on Human Factors in Computing Systems	https://doi.org/10.1145/3544548.3581236
2023	FairPRS: adjusting for admixed populations in polygenic risk scores using invariant risk minimization	Pacific Symposium on Biocomputing	https://doi.org/10.1142/9789811270611_0019
2021	Trends in clinical characteristics and associations of severe non-respiratory events related to SARS-CoV-2	medRxiv	https://doi.org/10.1101/2021.03.24.21251900
2019	Screening human embryos for polygenic traits has limited utility	Cell	https://doi.org/10.1016/j.cell.2019.10.033
2019	A discriminative approach for finding and characterizing positivity violations using decision trees	arXiv	https://doi.org/10.48550/arXiv.1907.08127
2019	Predicting breast cancer by applying deep learning to linked health records and mammograms	Radiology	https://doi.org/10.1148/radiol.2019182622
2019	An evaluation toolkit to guide model selection and cohort definition in causal inference	arXiv	https://doi.org/10.48550/arXiv.1906.00442
2019	Comment: causal inference competitions: where should we aim?	Statistical Science	https://doi.org/10.1214/18-STS679
2018	In vivo cleavage rules and target repertoire of RNase III in Escherichia coli	Nucleic Acids Research	https://doi.org/10.1093/nar/gky684
2018	Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis	arXiv	https://doi.org/10.48550/arXiv.1802.05046

This list may go out of date. Please see my Google Scholar page for the most up-to-date information.