Email LinkedIn Scholar Medium GitHub
Highlights
Machine learning researcher and causality expert with 9 years of experience in healthcare
- Applied researcher and data scientist working solo, as tech-lead, or as squad leader
EB1-A visa approved (US)
1st author Cell paper and co-inventor on 3 US patents
Excellent Python coder
Creator of causallib - an open-source Python package for causal inference.
800+ stars and 100+ forks on GitHub.
Developed high-throughput frameworks for quasi-experiments
from statistical engines to dashboards exploring results and supporting decisions
Received an IBM Research Accomplishment award (2023)
Communicator: PyData conference speaker, lecturer, and podcast interviewee
Causal inference, machine learning, deep learning, statistics, data viz, communication, Python
About me
I bridge the gap between rigorous statistical learning and robust software engineering. I specialize in automating analyses into scalable frameworks. Whether it’s architecting the backend of a statistical engine or designing visualization-heavy dashboards, I close the loop between complex modeling, reusable tools and actionable stakeholder insights.
As a project leader, I translate vague business/research questions into concrete hypotheses, manage agile milestones, collaborate with international peers, and navigate the nuances between technical and non-technical communication.
A T-shaped integrative thinker. I thrive on cross-pollinating fields: I have adapted biostatistics concepts to tailor modern transformer-based deep-learning architectures for biological data, applied machine learning theories of invariance to promote fairness in genetic risk scores, and used formal causal graphs to deconfound learning from multiple sources, drastically improving model generalization. I’m also a passionate advocate of synthesizing engineering practices into research practices: academic workflows as git workflows, test-driven modeling, and Clean Code for research code.
Experience
| 2017 – present |
Staff Machine Learning Researcher
Causal Machine Learning for Healthcare and Life Science, IBM-Research
Creator of causallib – a one-stop-shop open-source Python package for flexible causal inference modeling.
- Received an IBM Research Accomplishment award (2023)
Client project leader from start to finish: eliciting information from domain experts, translating vague clinicians’ questions into concrete statistical hypotheses, answering them, and communicating the findings
Led, designed, and engineered a reusable framework for drug discovery, applying high-throughput causal inference to observational healthcare data
Managing a team of 5 researchers.
Leading the scientific pipeline, system design, and visualization app
Generating 100s of hypotheses in minutes
Serving 4 external engagements with top pharma clients, bringing millions in revenue
Individual Contributor (IC)
Causal inference consultant for projects in the US, UK, France, Japan, Kenya, South Africa, and Switzerland
Led global strategy at IBM Research for causality in drug discovery
- Steered research agenda and technical focus areas, reporting directly to Research VPs
- Oversaw research of subgroup discovery for adaptive experimentation using Bayesian inference
Mentored 10+ students and interns
Onboarding lead for 10+ researchers
- Teaching academics how to apply software development fundamentals to research, delivering maintainable production-grade research code
Published 10+ papers and co-invented 3 US patents
2024:
“GLM-ification” of deep learning models, bringing established concepts from biostatistics into transformer-based deep-learning LLM-like models, tailoring them for biology.
- Implemented concepts from generalized linear (mixed) models (e.g., zero-inflated negative-binomial regression or random effects) in PyTorch
Deconfounded learning from multiple fragmented sources, improving model generalization
Identified and quantified data confounding bias (batch-like effects) leading to poor generalization in clients
Deconfounded learning using approaches from domain adaptation, invariant risk minimization, and conditional autoencoders, drastically improving model generalization
2025:
- Quantum Advantage task force member: developing and testing quantum algorithms for combinatorial optimization problems
- Applied AI and analytical approaches to improve quantum algorithms, such as finding better initial parameters, reducing variational optimization rounds and hardware usage, and saving costs
|
| 2022 |
Principal Statistician
Laboratory for Gait & Neurodynamics, Ichilov Hospital
- Lead statistician in a clinical study
- Bayesian hierarchical/multilevel models and causal inference for gait analysis in multiple sclerosis patients
- Bayesian multilevel hurdle models for repeated patient measurements
- Formal causal inference with DAGs to eliminate inessential tests, saving clinicians over 3 hours of unnecessary testing per patient.
|
| 2016 – 2017 |
Teaching Assistant
The School of Computer Science, Hebrew University of Jerusalem
|
| 2015 – 2016 |
Research Associate / Computational Biologist
Institute for Medical Research Israel-Canada, Hadassah Hospital
- Developed novel methodologies for finding high-resolution protein-RNA interactions using high-volume RNAseq data
|
Education
| 2016 – 2019 |
M.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel
Thesis: quantifying the utility of embryo selection using genomic prediction of traits
published in Cell
Predicting physical traits from DNA (GWAS) using classical, machine learning, and deep learning methods
Pioneering the study of the effects of prediction-based embryo selection in IVF
|
| 2013 – 2016 |
B.Sc. in Computer Science and Computational Biology
Faculty of Science, the Hebrew University of Jerusalem, Israel
- Dean’s List of Academic Excellence (2016)
- Research scholarship from IMRIC (2016)
Bachelor’s thesis published in Nucleic Acids Research |
Skills
| Programming skills |
Python & its scientific and ML stack (fluent)
- Pandas, Polars, DuckDB, Ibis, NumPy, Scikit-Learn, PyGAM, Statsmodels, PyTorch (Lightning), PyMC, Bambi, ArviZ, Matplotlib, Seaborn (objects), Altair, Streamlit, cvxpy, Pydantic, Hydra, Ray…
R (when needed)
SQL (but Ibis when possible)
Git + GitHub
Continuous integration (Travis CI, GitHub Actions)
Linux and remote development (Cloud/AWS + Jupyter lab / VS Code)
Jupyter, Quarto, LaTeX, Typst
|
| Technical skills |
Causal Inference
Machine Learning and Deep Learning
Statistics and Bayesian Inference
Data Visualization
Verbal & written communication
Programming, software engineering and development
|
| Languages |
Fluent English
Native Hebrew
|
Awards
| 2023 |
IBM-Research Accomplishment
For my work on causallib and research engagement with the Cleveland Clinic Foundation |
| 2019 |
Best of RSNA
For the paper Predicting Breast Cancer by Applying Deep Learning to Linked Health Records and Mammograms, published in Radiology |
| 2019 |
Best Talk: Israeli Population Genetics Meeting
For the paper Screening Human Embryos for Polygenic Traits has Limited Utility |
| 2019 |
Featured Theory of the issue (Cell)
For the paper Screening Human Embryos for Polygenic Traits has Limited Utility |
| 2016 |
Dean’s List of Academic Excellence |
Publications
May go out of date. Please see my Google Scholar page for the most up-to-date information.