Ehud Karavani

Modified

March 13, 2026

Highlights

  • Machine learning researcher and causality expert with 9 years of experience in healthcare

    • Applied researcher and data scientist working solo, as tech-lead, or as squad leader
  • EB1-A visa approved (US)

  • 1st author Cell paper and co-inventor on 3 US patents

  • Excellent Python coder

    • Creator of causallib - an open-source Python package for causal inference.
      800+ stars and 100+ forks on GitHub.

    • Developed high-throughput frameworks for quasi-experiments
      from statistical engines to dashboards exploring results and supporting decisions

  • Received an IBM Research Accomplishment award (2023)

  • Communicator: PyData conference speaker, lecturer, and podcast interviewee

Causal inference, machine learning, deep learning, statistics, data viz, communication, Python

About me

I bridge the gap between rigorous statistical learning and robust software engineering. I specialize in automating analyses into scalable frameworks. Whether it’s architecting the backend of a statistical engine or designing visualization-heavy dashboards, I close the loop between complex modeling, reusable tools and actionable stakeholder insights.

As a project leader, I translate vague business/research questions into concrete hypotheses, manage agile milestones, collaborate with international peers, and navigate the nuances between technical and non-technical communication.

A T-shaped integrative thinker. I thrive on cross-pollinating fields: I have adapted biostatistics concepts to tailor modern transformer-based deep-learning architectures for biological data, applied machine learning theories of invariance to promote fairness in genetic risk scores, and used formal causal graphs to deconfound learning from multiple sources and drastically improve model generalization. I’m also a passionate advocate of bringing engineering practices into research: academic workflows as git workflows, test-driven modeling, and Clean Code for research code.

Experience

2017 – present

Staff Machine Learning Researcher
Causal Machine Learning for Healthcare and Life Science, IBM-Research

  • Creator of causallib – a one-stop-shop open-source Python package for flexible causal inference modeling.

    • Received an IBM Research Accomplishment award (2023)
  • Client project leader from start to finish: eliciting information from domain experts, translating vague clinicians’ questions into concrete statistical hypotheses, answering them, and communicating the findings

  • Led, designed, and engineered a reusable framework for drug discovery, applying high-throughput causal inference to observational healthcare data

    • Managing a team of 5 researchers.
      Leading the scientific pipeline, system design, and visualization app

    • Generating 100s of hypotheses in minutes

    • Serving 4 external engagements with top pharma clients, bringing millions in revenue

  • Individual Contributor (IC)
    Causal inference consultant for projects in the US, UK, France, Japan, Kenya, South Africa, and Switzerland

  • Led global strategy at IBM Research for causality in drug discovery

    • Steered research agenda and technical focus areas, reporting directly to Research VPs
    • Oversaw research of subgroup discovery for adaptive experimentation using Bayesian inference
  • Mentored 10+ students and interns
    Onboarding lead for 10+ researchers

    • Teaching academics how to apply software development fundamentals to research, delivering maintainable production-grade research code
  • Published 10+ papers and co-invented 3 US patents

2024:

  • “GLM-ification” of deep learning models, bringing established concepts from biostatistics into transformer-based deep-learning LLM-like models, tailoring them for biology.

    • Implemented concepts from generalized linear (mixed) models (e.g., zero-inflated negative-binomial regression or random effects) in PyTorch
  • Deconfounded learning from multiple fragmented sources, improving model generalization

    • Identified and quantified data confounding bias (batch-like effects) leading to poor generalization in clients

    • Deconfounded learning using approaches from domain adaptation, invariant risk minimization, and conditional autoencoders, drastically improving model generalization

2025:

  • Quantum Advantage task force member: developing and testing quantum algorithms for combinatorial optimization problems
    • Applied AI and analytical approaches to improve quantum algorithms, such as finding better initial parameters, reducing variational optimization rounds and hardware usage, and saving costs
2022

Principal Statistician
Laboratory for Gait & Neurodynamics, Ichilov Hospital

  • Lead statistician in a clinical study
  • Bayesian hierarchical/multilevel models and causal inference for gait analysis in multiple sclerosis patients
    • Bayesian multilevel models for hurdle models of repeated patients’ measurements
    • Formal causal inference with DAGs for minimizing inessential tests, saving clinicians over 3 hours of unnecessary tests per patient
2016 – 2017

Teaching Assistant
The School of Computer Science, Hebrew University of Jerusalem

  • Introduction to Data Science

  • Workshop in Computational Bioskills

2015 – 2016

Research Associate / Computational Biologist
Institute for Medical Research Israel-Canada, Hadassah Hospital

  • Developed novel methodologies for finding high-resolution protein-RNA interactions using high-volume RNAseq data

Education

2016 – 2019

M.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel

Thesis: quantifying the utility of embryo selection using genomic prediction of traits
published in Cell

  • Predicting physical traits from DNA (GWAS) using classical, machine learning, and deep learning methods

  • Pioneering the study of the effects of prediction-based embryo selection in IVF

2013 – 2016

B.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel

  • Dean’s List of Academic Excellence (2016)
  • Research scholarship from IMRIC (2016)

Bachelor’s thesis published in Nucleic Acids Research

Skills

Programming skills
  • Python & its scientific and ML stack (fluent)

    • Pandas, Polars, DuckDB, Ibis, NumPy, Scikit-Learn, PyGAM, Statsmodels, PyTorch (Lightning), PyMC, Bambi, Arviz, Matplotlib, Seaborn (objects), Altair, Streamlit, cvxpy, Pydantic, Hydra, Ray…
  • R (when needed)

  • SQL (but Ibis when possible)

  • Git + GitHub

  • Continuous integration (Travis, GitHub Actions)

  • Linux and remote development (Cloud/AWS + Jupyter lab / VS Code)

  • Jupyter, Quarto, LaTeX, Typst

Technical skills
  • Causal Inference

  • Machine Learning and Deep Learning

  • Statistics and Bayesian Inference

  • Data Visualization

  • Verbal & written communication

  • Programming, software engineering and development

Languages
  • Fluent English

  • Native Hebrew

General
  • Data enthusiast

  • Musician 🎸, hiker / backpacker 🏔️

  • Friendly 🙂

Awards

2023

IBM-Research Accomplishment

For my work on causallib and research engagement with the Cleveland Clinic Foundation

2019

Best of RSNA

For the paper Predicting Breast Cancer by Applying Deep Learning to Linked Health Records and Mammograms, published in Radiology

2019

Best Talk: Israeli Population Genetics Meeting

For the paper Screening Human Embryos for Polygenic Traits has Limited Utility

2019

Featured Theory of the issue (Cell)

For the paper Screening Human Embryos for Polygenic Traits has Limited Utility

2016

Dean’s List of Academic Excellence

Publications

Date | Title | Venue | DOI
2026 | Network-based prediction and real-world patient data observation identify doxycycline as a repurposable drug in Alzheimer’s disease | Neurotherapeutics | https://doi.org/10.1016/j.neurot.2026.e00836
2026 | Critical evaluation of real-world evidence of repurposable medicines in the Alzheimer’s disease drug development pipeline using a target trial emulation | Alzheimer’s & Dementia | https://doi.org/10.1002/trc2.70193
2025 | Hierarchical Bias-Driven Stratification for Interpretable Causal Effect Estimation | AISTATS | https://proceedings.mlr.press/v258/ter-minassian25a.html
2025 | Peri-operative anti-inflammatory drug use and seizure recurrence after resective epilepsy surgery: Target trials emulation | iScience | https://doi.org/10.1016/j.isci.2025.112124
2024 | Single-microglia transcriptomic transition network-based prediction and real-world patient data validation identifies ketorolac as a repurposable drug for Alzheimer’s disease | Alzheimer’s & Dementia | https://doi.org/10.1002/alz.14373
2024 | Using Causal Inference to Investigate Contraceptive Discontinuation in Sub-Saharan Africa | International Joint Conference on Artificial Intelligence (IJCAI) | https://doi.org/10.24963/ijcai.2024/792
2024 | Improving Inverse Probability Weighting by Post-calibrating Its Propensity Scores | Epidemiology | https://doi.org/10.1097/ede.0000000000001733
2023 | Causalvis: Visualizations for Causal Inference | CHI: Conference on Human Factors in Computing Systems | https://doi.org/10.1145/3544548.3581236
2023 | FairPRS: adjusting for admixed populations in polygenic risk scores using invariant risk minimization | Pacific Symposium on Biocomputing | https://doi.org/10.1142/9789811270611_0019
2021 | Trends in clinical characteristics and associations of severe non-respiratory events related to SARS-CoV-2 | MedRxiv | https://doi.org/10.1101/2021.03.24.21251900
2019 | Screening human embryos for polygenic traits has limited utility | Cell | https://doi.org/10.1016/j.cell.2019.10.033
2019 | A discriminative approach for finding and characterizing positivity violations using decision trees | Arxiv | https://doi.org/10.48550/arXiv.1907.08127
2019 | Predicting breast cancer by applying deep learning to linked health records and mammograms | Radiology | https://doi.org/10.1148/radiol.2019182622
2019 | An evaluation toolkit to guide model selection and cohort definition in causal inference | Arxiv | https://doi.org/10.48550/arXiv.1906.00442
2019 | Comment: causal inference competitions: where should we aim? | Statistical Science | https://doi.org/10.1214/18-STS679
2018 | In vivo cleavage rules and target repertoire of RNase III in Escherichia coli | Nucleic Acids Research | https://doi.org/10.1093/nar/gky684
2018 | Benchmarking Framework for Performance-Evaluation of Causal Inference Analysis | Arxiv | https://doi.org/10.48550/arXiv.1802.05046

This list may go out of date. Please see my Google Scholar page for the most up-to-date information.