PyHealth — Open-Source Healthcare AI Toolkit

A Python toolkit for clinical deep learning — unifying datasets, tasks, and models across electronic health records, physiological signals, and medical imaging.

pip install pyhealth

Less memory than pandarallel

5-Stage Pipeline in <15 Lines

The same pattern works for any task — swap the dataset and task class to move between mortality, readmission, drug recommendation, or imaging.

Mortality prediction on MIMIC-III

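from pyhealth.datasets import MIMIC3Dataset, split_by_patient, get_dataloader
from pyhealth.tasks import MortalityPredictionMIMIC3
from pyhealth.models import Transformer
from pyhealth.trainer import Trainer

if __name__ == "__main__":
    # 1. Dataset: load the MIMIC-III tables used by the task
    dataset = MIMIC3Dataset(root="data/", tables=["DIAGNOSES_ICD", "PROCEDURES_ICD"])

    # 2. Task: convert patient records into labeled mortality samples
    samples = dataset.set_task(MortalityPredictionMIMIC3())

    # 3. Split by patient (no patient appears in more than one split) and build loaders
    train_ds, val_ds, test_ds = split_by_patient(samples, [0.8, 0.1, 0.1])
    train_loader = get_dataloader(train_ds, batch_size=32, shuffle=True)
    val_loader = get_dataloader(val_ds, batch_size=32, shuffle=False)
    test_loader = get_dataloader(test_ds, batch_size=32, shuffle=False)

    # 4. Model: built from the task's sample dataset
    model = Transformer(dataset=samples)

    # 5. Train while monitoring validation PR-AUC, then evaluate on the test split
    trainer = Trainer(model=model)
    trainer.train(train_loader, val_loader, epochs=50, monitor="pr_auc")
    print(trainer.evaluate(test_loader))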
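
The quickstart splits by patient rather than by sample, so visits from the same patient never appear in both training and test data. The sketch below illustrates that idea in plain Python. It is not how `split_by_patient` is actually implemented; the `split_by_patient_sketch` name and the assumption that each sample dict carries a `"patient_id"` key are ours.

```python
import random
from collections import defaultdict


def split_by_patient_sketch(samples, ratios, seed=0):
    """Assign whole patients (not individual samples) to train/val/test.

    Hypothetical helper: `samples` is a list of dicts with a "patient_id" key,
    `ratios` gives the train/val/test fractions. All of a patient's samples
    land in exactly one split.
    """
    by_patient = defaultdict(list)
    for sample in samples:
        by_patient[sample["patient_id"]].append(sample)

    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)  # seeded for reproducibility

    n = len(patient_ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    groups = (
        patient_ids[:n_train],
        patient_ids[n_train:n_train + n_val],
        patient_ids[n_train + n_val:],  # remainder goes to test
    )
    return tuple([s for pid in group for s in by_patient[pid]] for group in groups)


# Ten hypothetical patients with two visits each.
samples = [{"patient_id": f"p{i}", "visit_id": f"v{i}-{j}"} for i in range(10) for j in range(2)]
train, val, test = split_by_patient_sketch(samples, [0.8, 0.1, 0.1])
print(len(train), len(val), len(test))  # 16 2 2
```

Splitting at the patient level matters because a model that has seen earlier visits of a test patient can look better than it would on genuinely unseen patients.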

Built for Scale — Efficient by Design

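Benchmarked on MIMIC-IV with 4 parallel workers. PyHealth 2.0 uses a memory-mapped architecture that adapts to your hardware: whether you have 2 cores or 64, it scales without manual memory management, so you can focus on the ML rather than the infrastructure. Across the three tasks below, it uses roughly 8 to 9.5 times less peak memory than PyHealth 1.16, at the cost of somewhat longer wall time on length of stay (0.69h vs 0.64h) and in-hospital mortality (1.28h vs 0.88h), and it is both faster and lighter than MEDS on every task. Pandas values are shown at 1 worker only (†) because it could not scale beyond single-threaded execution.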
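
Memory mapping lets a program address a file on disk as if it were in memory, with the operating system loading pages on demand as they are touched. The sketch below shows the general idea with Python's standard `mmap` and `struct` modules: features are stored as fixed-width float32 rows, and a single row is decoded only when requested. It is a generic illustration, not PyHealth 2.0's storage format; the file layout and the `write_rows`/`read_row` helpers are hypothetical.

```python
import mmap
import os
import struct
import tempfile

ROW_FORMAT = "<{}f"  # little-endian float32 values, one fixed-width row per sample


def write_rows(path, rows, width):
    """Write fixed-width float32 feature rows to a flat binary file."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(ROW_FORMAT.format(width), *row))


def read_row(path, index, width):
    """Memory-map the file and decode only the bytes of one row.

    Only the slice for `index` is copied out of the mapping, so the whole
    file is never materialized as Python objects.
    """
    fmt = ROW_FORMAT.format(width)
    row_bytes = struct.calcsize(fmt)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * row_bytes
            return list(struct.unpack(fmt, mm[start:start + row_bytes]))


rows = [[0.5, 1.0, 2.0], [3.0, 4.5, 6.0], [7.0, 8.0, 9.25]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.bin")
    write_rows(path, rows, width=3)
    print(read_row(path, 1, width=3))  # [3.0, 4.5, 6.0]
```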

Wall time (hours) and peak memory (GB) on MIMIC-IV; lower is better for both.

Task                       Method          Wall time (h)   Peak memory (GB)
Drug Recommendation        PyHealth 2.0    0.23            8.9
Drug Recommendation        MEDS            2.91            61.7
Drug Recommendation        Pandas †        1.31            10.4
Drug Recommendation        PyHealth 1.16   0.27            83.2
Length of Stay Prediction  PyHealth 2.0    0.69            8.5
Length of Stay Prediction  MEDS            3.32            61.8
Length of Stay Prediction  Pandas †        2.85            11.4
Length of Stay Prediction  PyHealth 1.16   0.64            80.9
In-Hospital Mortality      PyHealth 2.0    1.28            23.4
In-Hospital Mortality      MEDS            2.96            60.0
In-Hospital Mortality      Pandas †        26.03           49.2
In-Hospital Mortality      PyHealth 1.16   0.88            187.2

† Pandas was measured at 1 worker because it could not run beyond single-threaded execution; all other methods ran at 4 workers on MIMIC-IV. Full benchmark in the PyHealth 2.0 paper.
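
The memory gap is easiest to read as a ratio. The snippet below recomputes, from the peak-memory figures above, how many times more memory MEDS and PyHealth 1.16 use than PyHealth 2.0 on each task. Pandas is left out because its numbers come from a single worker and are not directly comparable; the `memory_ratios` helper is ours.

```python
# Peak memory (GB) on MIMIC-IV at 4 workers, copied from the benchmark above.
PEAK_MEMORY_GB = {
    "Drug Recommendation": {"PyHealth 2.0": 8.9, "MEDS": 61.7, "PyHealth 1.16": 83.2},
    "Length of Stay Prediction": {"PyHealth 2.0": 8.5, "MEDS": 61.8, "PyHealth 1.16": 80.9},
    "In-Hospital Mortality": {"PyHealth 2.0": 23.4, "MEDS": 60.0, "PyHealth 1.16": 187.2},
}


def memory_ratios(results, baseline="PyHealth 2.0"):
    """Per task, how many times more peak memory each method uses than the baseline."""
    ratios = {}
    for task, by_method in results.items():
        base = by_method[baseline]
        ratios[task] = {
            method: round(gb / base, 1)
            for method, gb in by_method.items()
            if method != baseline
        }
    return ratios


for task, by_method in memory_ratios(PEAK_MEMORY_GB).items():
    print(task, by_method)
# Drug Recommendation {'MEDS': 6.9, 'PyHealth 1.16': 9.3}
# Length of Stay Prediction {'MEDS': 7.3, 'PyHealth 1.16': 9.5}
# In-Hospital Mortality {'MEDS': 2.6, 'PyHealth 1.16': 8.0}
```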

How PyHealth Standardizes Clinical AI Development

A unified API from raw clinical data to trustworthy, interpretable models — follow the pipeline step by step.

Data Foundation

pyhealth.data

Core patient and clinical event data structures.

pyhealth.datasets

Loaders for MIMIC-III/IV, eICU, OMOP, and 17+ more.

pyhealth.tasks

Standardized clinical prediction task definitions.
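
To make the division between data structures and tasks concrete, here is a toy, dependency-free sketch of what a task does: it takes one patient's events and emits a list of labeled sample dicts. The `Event`, `Patient`, and `ToyMortalityTask` classes are hypothetical stand-ins, loosely modeled on the `task_name`/`input_schema`/`output_schema` shape of PyHealth 2.0 tasks as we understand it; consult the API reference for the real `pyhealth.data` and `pyhealth.tasks` classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """A single coded clinical event (toy stand-in, not pyhealth.data.Event)."""
    visit_id: str
    table: str  # source table, e.g. "diagnoses_icd" or "procedures_icd"
    code: str


@dataclass
class Patient:
    """A patient's events plus a per-visit outcome (toy stand-in)."""
    patient_id: str
    events: List[Event] = field(default_factory=list)
    died_in_visit: Dict[str, bool] = field(default_factory=dict)


class ToyMortalityTask:
    """Hypothetical task: one binary-labeled sample per visit that has codes."""
    task_name = "toy_mortality"
    input_schema = {"conditions": "sequence", "procedures": "sequence"}
    output_schema = {"mortality": "binary"}

    def __call__(self, patient: Patient) -> List[dict]:
        samples = []
        for visit_id, died in patient.died_in_visit.items():
            events = [e for e in patient.events if e.visit_id == visit_id]
            conditions = [e.code for e in events if e.table == "diagnoses_icd"]
            procedures = [e.code for e in events if e.table == "procedures_icd"]
            if not conditions and not procedures:
                continue  # drop visits with no usable features
            samples.append({
                "patient_id": patient.patient_id,
                "visit_id": visit_id,
                "conditions": conditions,
                "procedures": procedures,
                "mortality": int(died),
            })
        return samples


patient = Patient(
    patient_id="p1",
    events=[
        Event("v1", "diagnoses_icd", "4019"),
        Event("v1", "procedures_icd", "3893"),
        Event("v2", "diagnoses_icd", "0389"),
    ],
    died_in_visit={"v1": False, "v2": True, "v3": False},
)
samples = ToyMortalityTask()(patient)
print([(s["visit_id"], s["mortality"]) for s in samples])  # [('v1', 0), ('v2', 1)]
```

Keeping the task separate from the data loader is what lets the same patient records feed mortality, readmission, or other tasks without reloading.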

Modeling Pipeline

pyhealth.processors

Feature extraction, encoding, and transformation.

pyhealth.models

39+ clinical ML models across EHR, signals, imaging, and generation.

pyhealth.trainer

Unified training loop with checkpointing and early stopping.
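
Early stopping on a monitored metric (such as the `monitor="pr_auc"` argument in the quickstart) usually follows this pattern: track the best validation score, save a checkpoint whenever it improves, and stop after a fixed number of epochs without improvement. The sketch below replays a list of precomputed scores; the `early_stopping` function and its `patience` parameter are illustrative and do not describe the internals of `pyhealth.trainer.Trainer`.

```python
from typing import Iterable, Tuple


def early_stopping(val_scores: Iterable[float], patience: int = 3) -> Tuple[int, int]:
    """Replay per-epoch validation scores (higher is better).

    Returns (best_epoch, last_epoch_run). Training stops once `patience`
    consecutive epochs fail to improve on the best score so far.
    """
    best_score = float("-inf")
    best_epoch = -1
    last_epoch = -1
    stale_epochs = 0
    for epoch, score in enumerate(val_scores):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
            stale_epochs = 0
            # A real trainer would save a "best" checkpoint of the model here.
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break  # later epochs are never run
    return best_epoch, last_epoch


# Hypothetical validation PR-AUC per epoch; the 0.50 at epoch 6 is never reached.
print(early_stopping([0.30, 0.35, 0.41, 0.40, 0.39, 0.41, 0.50], patience=3))  # (2, 5)
```

The example also shows the trade-off in choosing `patience`: too small a value can stop training just before a late improvement.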

Evaluation & Trust

pyhealth.metrics

Clinical evaluation metrics and benchmarking utilities.

pyhealth.interpret

Explainability methods for transparent clinical AI.

pyhealth.calib

Conformal prediction for rigorous uncertainty quantification.
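
Conformal prediction turns a classifier's probabilities into prediction sets with a coverage guarantee: if calibration and test data are exchangeable, the true label falls inside the set with probability at least 1 - alpha. The sketch below is the standard split-conformal recipe for classification, using 1 minus the true-class probability as the nonconformity score on a held-out calibration set. It is a generic illustration rather than the `pyhealth.calib` API; the helper names and toy probabilities are hypothetical.

```python
import math
from typing import List, Sequence


def conformal_threshold(cal_probs: Sequence[Sequence[float]],
                        cal_labels: Sequence[int], alpha: float) -> float:
    """Split-conformal threshold from a calibration set.

    Nonconformity score = 1 - probability assigned to the true label.
    The threshold is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label must be included
    return scores[k - 1]


def prediction_set(probs: Sequence[float], threshold: float) -> List[int]:
    """All labels whose nonconformity score does not exceed the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy binary calibration set: [P(survive), P(die)] per patient, with true labels.
cal_probs = [[0.95, 0.05], [0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8],
             [0.7, 0.3], [0.4, 0.6], [0.5, 0.5], [0.6, 0.4], [0.3, 0.7]]
cal_labels = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)  # target 80% coverage
print(round(q, 2))                      # 0.6
print(prediction_set([0.1, 0.9], q))    # [1]
print(prediction_set([0.45, 0.55], q))  # [0, 1]
```

A confident prediction yields a single label, while an uncertain one yields both, so the size of the set communicates how much the model's output should be trusted for that patient.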

Resources

Everything you need to get started or get involved.

Documentation

Full API reference, tutorials, quickstart guides, and worked examples for all supported tasks.

Read the docs

GitHub Repository

Browse the source code, open issues, submit pull requests, and track ongoing development.

View on GitHub

Discord Community

Join 400+ members to ask questions, share research, and collaborate on healthcare AI projects.

Join the server

Research Initiative

A structured program of 10+ researchers — including healthcare domain experts — building and publishing open-source, reproducible clinical AI research on top of PyHealth. Apply to join and collaborate with clinicians, data scientists, and AI researchers advancing trustworthy healthcare AI.

Apply to join