causalrl
Causal intervention-selection and causal-RL research tools. causalrl provides graph
algorithms for causal bandits, demonstration environments and agents, and explicit-latent
structural causal models with see (L1), do (L2), and counterfactual (L3) queries,
organised around the 9-task taxonomy of causal RL.
It is built for honesty about scope: identification routines return None (or raise with
a witnessing hedge) outside their supported class rather than guessing, and agents are marked
as benchmark/demo rather than production. Read Guarantees & Scope before
relying on any causal claim.
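The gap between a see (L1) query and a do (L2) query can be checked by hand on a tiny explicit-latent model. The sketch below is a hypothetical toy SCM written in plain Python; the mechanisms, the probabilities, and the helper names `see` and `do` are assumptions made for illustration and are not causalrl's API.

```python
# Hypothetical toy SCM (illustration only, not part of causalrl):
#   U ~ Bernoulli(0.5)                 latent confounder
#   X := U                             observed treatment copies the latent
#   Y ~ Bernoulli(0.2 + 0.3*X + 0.4*U)
P_U = {0: 0.5, 1: 0.5}


def p_y1(x, u):
    """P(Y=1 | X=x, U=u) under the assumed mechanism."""
    return 0.2 + 0.3 * x + 0.4 * u


def see(x):
    """L1 query P(Y=1 | X=x): observing X=x also reveals U, because X := U."""
    consistent = [u for u in P_U if u == x]
    mass = sum(P_U[u] for u in consistent)
    return sum(P_U[u] * p_y1(x, u) for u in consistent) / mass


def do(x):
    """L2 query P(Y=1 | do(X=x)): X is set from outside, so U keeps its prior."""
    return sum(P_U[u] * p_y1(x, u) for u in P_U)


observational_contrast = see(1) - see(0)   # mixes the effect of X with that of U
interventional_contrast = do(1) - do(0)    # isolates the effect of X
print(round(observational_contrast, 3), round(interventional_contrast, 3))
```

In this toy model the observational contrast is 0.7 while the interventional contrast is 0.3: conditioning on X silently conditions on U as well, which is the kind of error the scope guarantees above are meant to guard against.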
Install
uv pip install -e . # graph, POMIS, tabular agents/environments
uv pip install -e ".[torch]" # SCM sampling, neural mechanisms, Torch-backed demos
The core graph, POMIS, tabular-agent, and tabular-environment surfaces do not require PyTorch; SCM sampling, neural mechanisms, and structural-bandit environments do.
Quickstart
A causal agent that conditions on its "intuition" beats a confounding-naive agent on the Multi-Armed Bandit with Unobserved Confounders (MABUC), even though both arms have identical interventional means.
from causalrl.agents.bandits import CausalThompsonSampling
from causalrl.envs.suite.mabuc import MABUCEnv

env = MABUCEnv(seed=1)
agent = CausalThompsonSampling(n_arms=2, n_contexts=2, seed=0)

obs, _ = env.reset(seed=1)
for _ in range(8000):
    action = agent.act(obs)                # choose an arm given the current observation
    _, reward, _, _, _ = env.step(action)  # only the reward is used from the step result
    agent.update(obs, action, reward)
    obs, _ = env.reset()                   # each round starts from a fresh observation

# CausalThompsonSampling -> ~0.75 reward/step; a confounding-naive baseline is stuck near 0.50.
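How two arms with identical interventional means can still leave room for a better policy is easy to verify with a few lines of arithmetic. The payout table and intent distribution below are illustrative assumptions chosen to give a 0.75 versus 0.50 gap; they are not read from `MABUCEnv`, whose actual parameters may differ.

```python
import numpy as np

# Assumed numbers for illustration only (not taken from MABUCEnv).
# payout[i, a] = P(reward = 1 | intent = i, arm = a)
payout = np.array([[0.75, 0.25],
                   [0.25, 0.75]])
p_intent = np.array([0.5, 0.5])  # how often each intuition occurs

# Interventional mean of each arm: average its payout over the intent distribution.
interventional_means = p_intent @ payout

# Best policy that ignores intent: commit to the arm with the highest interventional mean.
naive_value = float(interventional_means.max())

# Best policy that conditions on intent: pick the best arm separately in each context.
intent_aware_value = float(p_intent @ payout.max(axis=1))

print(interventional_means, naive_value, intent_aware_value)
```

Under these assumed numbers both arms have an interventional mean of 0.5, so no intent-blind policy can exceed 0.5 per step, while choosing the best arm within each intent reaches 0.75.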
What it does
| Task (taxonomy) | Capability | Key entry points |
|---|---|---|
| Decision under confounding | Counterfactual Thompson sampling on the MABUC | CausalThompsonSampling |
| 1 — Offline→online | Learn from confounded logs via causal bounds | UCDTR, DOVI, DeepDeconfoundedQ |
| 2 — Where to intervene | POMIS / MIS, incl. non-manipulable variables | pomis, minimal_intervention_sets |
| 3 — Counterfactual policy | Act on E[Y_do(a) \| intent] | CounterfactualOptimalPolicy |
| 4 — Transportability | Recover effects across domains | transport_formula, transported_effect |
| 5 — Causal discovery | PC / FCI structure learning | discover, CPDAG |
| 6 — Causal imitation | Imitability + confounded cloning | is_imitable, CausalImitator |
| 7 — Causal curriculum | Prerequisite-ordered skill learning | causal_curriculum |
| 8 — Reward shaping | Policy-invariant causal potentials | causal_potential, q_learning |
| 9 — Causal games | Influence diagrams + equilibria | pure_nash_equilibria, CausalGame |
| Identification | Complete ID / gID / sID / mz, partial-ID & sensitivity bounds | identify_effect, manski_bounds, ipw_sensitivity_bounds |
A runnable example for every row is in the Tour by Task.
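As a flavour of the partial-identification row, the following is a minimal sketch of the classic no-assumption (Manski) bounds on an interventional mean for a bounded outcome. The function name `manski_bounds_sketch` and its signature are hypothetical; the library's `manski_bounds` may take different arguments and return a different type.

```python
import numpy as np


def manski_bounds_sketch(x, y, arm, y_min=0.0, y_max=1.0):
    """No-assumption bounds on E[Y | do(X=arm)] from observational samples.

    Hypothetical helper for illustration; not causalrl's manski_bounds.
    Units that took `arm` contribute their observed outcome; for all other
    units the unseen outcome is replaced by its worst or best possible value.
    """
    x = np.asarray(x)
    y = np.asarray(y, dtype=float)
    took_arm = x == arm
    p_arm = took_arm.mean()                      # estimate of P(X = arm)
    observed_part = y[took_arm].sum() / len(y)   # estimate of E[Y * 1{X = arm}]
    lower = observed_part + y_min * (1.0 - p_arm)
    upper = observed_part + y_max * (1.0 - p_arm)
    return float(lower), float(upper)


# Four logged rounds: two pulled arm 1 (both rewarded), two pulled arm 0.
x_log = [0, 0, 1, 1]
y_log = [1, 0, 1, 1]
lower, upper = manski_bounds_sketch(x_log, y_log, arm=1)
print(lower, upper)
```

The width of the interval equals `(y_max - y_min) * P(X != arm)`, so the bounds tighten only as the logged policy plays the arm of interest more often; here they are 0.5 and 1.0.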
Start here
- Tour by Task — one runnable example per capability.
- Tutorials — end-to-end notebooks across the full stack.
- Guarantees & Scope — what each method does and does not promise.
- Reproducible Benchmarks — the maintained, multi-seed demonstrations.
- API Reference — stable entry points.
- Citing causalrl.