API Reference
Graphs
causalrl.scm.graph.CausalGraph
A causal graph: a DAG over observed variables plus bidirected edges that denote unobserved confounders (an ADMG).
Source code in src/causalrl/scm/graph.py
directed_edges
property
The directed edges as (parent, child) pairs.
bidirected_edges
property
The bidirected (latent-confounding) edges.
has_bidirected_edges()
has_incident_bidirected_edges(node)
c_components()
Connected components of the bidirected graph (isolated nodes are singletons).
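As a rough illustration of what a c-component is, the sketch below computes connected components over bidirected edges only. It is a standalone example, not the library's implementation: the function signature (a node list plus a list of bidirected pairs) is an assumption made for the example.

```python
from collections import defaultdict


def c_components(nodes, bidirected_edges):
    """Connected components of the bidirected graph; isolated nodes are singletons."""
    adjacency = defaultdict(set)
    for a, b in bidirected_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search restricted to bidirected edges.
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        components.append(frozenset(component))
    return components


# Hypothetical graph: X <-> Y is the only bidirected edge.
components = c_components(["X", "Z", "Y", "W"], [("X", "Y")])
print(components)
```

Directed edges play no role here: X and Y share a component because of the latent confounder, while Z and W each form their own.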
remove_incoming_edges(node)
Return a copy with all directed edges into node removed (graph mutilation).
Source code in src/causalrl/scm/graph.py
ancestors(nodes)
Ancestors of nodes via directed edges, INCLUDING the inputs (the inclusive An(·)
convention used by the identification/POMIS literature).
Source code in src/causalrl/scm/graph.py
descendants(nodes)
Strict descendants of nodes via directed edges (excludes the inputs).
Source code in src/causalrl/scm/graph.py
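The two conventions (inclusive ancestors, strict descendants) are easy to mix up, so here is a minimal standalone sketch of both. It works on a plain list of (parent, child) pairs; the function signatures are assumptions for illustration and are not the CausalGraph methods themselves.

```python
def ancestors(directed_edges, nodes):
    """Inclusive ancestors: the inputs plus every node with a directed path into them."""
    parents = {}
    for p, c in directed_edges:
        parents.setdefault(c, set()).add(p)
    result, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def descendants(directed_edges, nodes):
    """Strict descendants: nodes reachable via directed edges, with the inputs excluded."""
    children = {}
    for p, c in directed_edges:
        children.setdefault(p, set()).add(c)
    reached, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        for c in children.get(v, ()):
            if c not in reached:
                reached.add(c)
                stack.append(c)
    return reached - set(nodes)


edges = [("X", "Z"), ("Z", "Y")]  # hypothetical chain X -> Z -> Y
print(ancestors(edges, {"Z"}))    # contains Z itself
print(descendants(edges, {"Z"}))  # does not contain Z
```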
induced_subgraph(nodes)
Subgraph on nodes: keep directed/bidirected edges with both endpoints in nodes.
Source code in src/causalrl/scm/graph.py
do_mutilate(intervened)
ADMG mutilation for do(intervened): drop incoming directed edges to each
intervened node AND every bidirected edge incident to an intervened node
(intervention severs latent confounding into the set). Distinct from
remove_incoming_edges, which keeps bidirected edges.
Source code in src/causalrl/scm/graph.py
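The difference between the two mutilation operations can be sketched on plain edge lists. This is a standalone illustration under assumed signatures (edge lists in, edge lists out), not the library's code.

```python
def remove_incoming_edges(directed, bidirected, node):
    """Drop directed edges into node; bidirected edges are kept untouched."""
    return [(p, c) for p, c in directed if c != node], list(bidirected)


def do_mutilate(directed, bidirected, intervened):
    """Drop directed edges into, and bidirected edges incident to, intervened nodes."""
    intervened = set(intervened)
    kept_directed = [(p, c) for p, c in directed if c not in intervened]
    kept_bidirected = [
        (a, b) for a, b in bidirected if a not in intervened and b not in intervened
    ]
    return kept_directed, kept_bidirected


# Hypothetical graph: Z -> X -> Y with latent confounding X <-> Y.
directed = [("Z", "X"), ("X", "Y")]
bidirected = [("X", "Y")]
print(remove_incoming_edges(directed, bidirected, "X"))  # X <-> Y survives
print(do_mutilate(directed, bidirected, {"X"}))          # X <-> Y is severed
```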
latent_projection(keep)
Latent projection onto keep: marginalize out every node not in keep, adding a
directed edge for each directed path through removed nodes and a bidirected edge for
each confounding path through removed nodes (the Tian-Pearl / Verma projection).
Directed Vi -> Vj when a directed path from Vi to Vj has all interior
nodes removed; bidirected Vi <-> Vj when a removed common cause (a marginalized
node, or the latent behind a bidirected edge) reaches both through removed interiors.
Removing a collider induces no confounding (its parents are never in its reached set).
Source code in src/causalrl/scm/graph.py
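The projection rules described above can be sketched as follows. This is a simplified standalone reading of the description, assuming an acyclic input given as edge lists; the function name and return shape (a set of directed pairs and a set of frozenset pairs) are choices made for the example, not the library's API.

```python
from itertools import combinations


def latent_projection(directed, bidirected, keep):
    keep = set(keep)
    children = {}
    for p, c in directed:
        children.setdefault(p, set()).add(c)

    def reach(start):
        """Kept nodes reachable from start along directed paths with removed interiors."""
        found, stack, visited = set(), [start], {start}
        while stack:
            v = stack.pop()
            for c in children.get(v, ()):
                if c in keep:
                    found.add(c)      # stop here: a kept node cannot be an interior
                elif c not in visited:
                    visited.add(c)
                    stack.append(c)
        return found

    nodes = keep | {n for e in directed for n in e} | {n for e in bidirected for n in e}
    new_directed = {(v, w) for v in keep for w in reach(v)}
    new_bidirected = set()
    # A removed node confounds every pair of kept nodes it reaches.
    for removed in nodes - keep:
        new_bidirected |= {frozenset(p) for p in combinations(sorted(reach(removed)), 2)}
    # The latent behind a <-> b reaches each endpoint, or what a removed endpoint reaches.
    for a, b in bidirected:
        front = set()
        for end in (a, b):
            front |= {end} if end in keep else reach(end)
        new_bidirected |= {frozenset(p) for p in combinations(sorted(front), 2)}
    return new_directed, new_bidirected


# Hypothetical graph: X -> M -> Y with a common cause U of X and Y; project onto {X, Y}.
proj = latent_projection([("X", "M"), ("M", "Y"), ("U", "X"), ("U", "Y")], [], {"X", "Y"})
# Removing a collider C (X -> C <- Y) induces no edge between X and Y.
collider = latent_projection([("X", "C"), ("Y", "C")], [], {"X", "Y"})
print(proj, collider)
```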
Intervention Sets
causalrl.identification.intervention_sets.pomis(graph, reward, manipulable=None)
All POMISs for reward: a deduplicated, canonically sorted list of frozensets.
frozenset() (the observational regime) appears when it is possibly optimal. When
manipulable is given, only those variables may be intervened on: by r40's Theorem 4
this is the unconstrained POMIS of the latent projection onto manipulable | {reward}.
Source code in src/causalrl/identification/intervention_sets.py
causalrl.identification.intervention_sets.minimal_intervention_sets(graph, reward, manipulable=None)
All MISs for reward: a deduplicated, canonically sorted list of frozensets.
When manipulable is given, the result is filtered to sets that avoid the
non-manipulable variables (r40: a constrained MIS is just the filtered unconstrained MIS).
Source code in src/causalrl/identification/intervention_sets.py
causalrl.identification.intervention_sets.requires_experiment(graph, treatment, outcome)
Whether learning P(outcome | do(treatment)) requires experimentation (Task 2, "when").
Returns True exactly when the effect is not identifiable from observational (L1) data,
so an online experiment (an L2 intervention) is necessary, and False when offline data
already suffices. This is the "when to intervene" companion to POMIS's "where": :func:pomis
narrows which intervention sets could be optimal, while this decides whether you must
intervene at all. Delegates to the complete ID algorithm
(:func:causalrl.identification.id_algorithm.is_identifiable_effect).
Source code in src/causalrl/identification/intervention_sets.py
Structural Causal Models
causalrl.scm.scm.StructuralCausalModel
Executable explicit-latent DAG SCM supporting L1/L2/L3 queries.
Bidirected-edge ADMGs are accepted by :class:CausalGraph for analytical graph
algorithms, but they are not executable SCMs: shared latent causes must be represented
as explicit parent nodes with their own mechanism and exogenous distribution.
Source code in src/causalrl/scm/scm.py
do(interventions)
Layer 2: return the mutilated SCM under do(interventions). Original is unchanged.
Source code in src/causalrl/scm/scm.py
abduct(evidence=None, *, known=None, n=20000, seed=None, atol=1e-06)
Layer 3, step 1 — infer the exogenous posterior given evidence/known noise.
known pins supplied exogenous values exactly (the exact, continuous path: no
rejection). Remaining exogenous variables are sampled; if evidence is given they are
rejection-filtered so the factual evaluation matches evidence within atol.
Returns an :class:ExogenousPosterior; call its predict(do=...).
Source code in src/causalrl/scm/scm.py
counterfactual(evidence, interventions, n, *, seed=None, atol=1e-06)
Layer 3: abduction-action-prediction. Sugar over :meth:abduct + predict.
Source code in src/causalrl/scm/scm.py
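To make abduction-action-prediction concrete, here is a toy rejection-sampling version on a hand-written two-variable SCM. Everything in it (the mechanisms X = U_X and Y = X xor U_Y, the noise probabilities, the function names) is an assumption for the example and does not use StructuralCausalModel.

```python
import random


def simulate(u, do=None):
    """Evaluate the toy SCM for one exogenous draw, optionally under an intervention."""
    do = do or {}
    x = do.get("X", u["U_X"])
    y = do.get("Y", x ^ u["U_Y"])
    return {"X": x, "Y": y}


def counterfactual_mean(evidence, interventions, outcome, n, seed=None):
    rng = random.Random(seed)
    retained = []
    for _ in range(n):
        u = {"U_X": int(rng.random() < 0.5), "U_Y": int(rng.random() < 0.3)}
        # Abduction: keep only exogenous draws whose factual world matches the evidence.
        factual = simulate(u)
        if all(factual[k] == v for k, v in evidence.items()):
            # Action + prediction: re-evaluate the same unit under the intervention.
            retained.append(simulate(u, do=interventions)[outcome])
    if not retained:
        raise ValueError("no sampled unit is consistent with the evidence")
    return sum(retained) / len(retained)


# Observed X=1, Y=1 implies U_Y=0, so Y would have been 0 had X been 0.
cf = counterfactual_mean({"X": 1, "Y": 1}, {"X": 0}, "Y", n=2000, seed=0)
# With empty evidence the same routine gives the interventional mean E[Y_{do(X=0)}].
interventional = counterfactual_mean({}, {"X": 0}, "Y", n=20000, seed=0)
print(cf, interventional)
```

Rejection filtering only works for discrete evidence with non-negligible probability; the known argument described above exists for the exact, continuous case.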
see(n, *, seed=None)
Assumption-Aware Agents
causalrl.agents.dovi.DOVI
Bases: Agent
Deconfounded Optimistic Value Iteration (Wang, Yang, Wang 2021), tabular.
Finite-horizon backward induction (LSVI form) whose optimism is capped by the Manski
upper causal bound on each (state, action)'s immediate reward, deconfounding the value
estimate. With horizon H:
r̃(s,a) = min( mean_online(s,a) + ucb_bonus(s,a), hi_Manski(s,a) )
Q_h(s,a) = r̃(s,a) + Σ_s' P̂(s'|s,a) · V_{h+1}(s')
V_h(s) = max_a Q_h(s,a), V_{H+1} ≡ 0, h = H, H-1, …, 1
P̂ is the empirical next-state distribution pooled from offline logs and online steps;
transitions that ended an episode (done) carry zero future value (the absorbing
terminal). At H = 1 the backup reduces exactly to v0.2's immediate Manski ceiling, so
the horizon-1 DTR result is preserved.
Bound validity: the backup is a certified optimistic bound on the return only when the
dynamics do not depend on the hidden confounder (true of SequentialDTREnv, whose
transitions are U-independent). On ConfoundedGridworld the behavior policy steers
around the hidden hazard, so P̂ is confounded and the backup is heuristic value
propagation, not a certified bound.
Source code in src/causalrl/agents/dovi.py
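The backup equations above can be sketched as a small tabular routine. This is a hedged standalone illustration: the function name, the dictionary-based tables, and the example numbers are assumptions, and the confidence bonus and Manski ceiling are taken as given inputs rather than computed.

```python
def capped_backward_induction(mean_online, ucb_bonus, hi_manski, p_hat, horizon):
    """Return V_1 per state. Tables are keyed by (state, action);
    p_hat[s, a] maps next states to empirical probabilities."""
    states = sorted({s for s, _ in mean_online})
    actions = sorted({a for _, a in mean_online})
    v_next = {s: 0.0 for s in states}  # V_{H+1} = 0
    for _ in range(horizon):           # h = H, H-1, ..., 1
        v_now = {}
        for s in states:
            q_values = []
            for a in actions:
                # Optimism capped by the upper causal bound on the immediate reward.
                r_tilde = min(mean_online[s, a] + ucb_bonus[s, a], hi_manski[s, a])
                future = sum(p * v_next[s2] for s2, p in p_hat[s, a].items())
                q_values.append(r_tilde + future)
            v_now[s] = max(q_values)
        v_next = v_now
    return v_next


# Hypothetical single-state problem with two actions.
mean_online = {(0, 0): 0.5, (0, 1): 0.2}
ucb_bonus = {(0, 0): 1.0, (0, 1): 0.1}
hi_manski = {(0, 0): 0.6, (0, 1): 0.9}
p_hat = {(0, 0): {0: 1.0}, (0, 1): {0: 1.0}}
v_h1 = capped_backward_induction(mean_online, ucb_bonus, hi_manski, p_hat, horizon=1)
v_h2 = capped_backward_induction(mean_online, ucb_bonus, hi_manski, p_hat, horizon=2)
print(v_h1, v_h2)
```

For action 0 the uncapped optimistic reward would be 1.5, but the ceiling clips it to 0.6. In this representation, a transition that ended an episode can be modeled by leaving its probability mass out of p_hat[s, a], so it contributes zero future value.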
is_certified
property
Whether value propagation is within the documented causal guarantee.
causalrl.agents.scbandit.POMISThompsonSampling
Bases: _ArmSubsetThompsonSampling
Thompson sampling over only the arms whose intervened-variable set is a POMIS.
manipulable (the variables you may intervene on) must be given explicitly; non-manipulable
variables are handled by the POMIS engine via latent projection (r40). When every non-reward
variable is manipulable this matches the unconstrained POMIS.
Source code in src/causalrl/agents/scbandit.py
Benchmark Reports
causalrl.eval.benchmark.BenchmarkEstimate
dataclass
A per-seed benchmark measurement with simple descriptive uncertainty.
Source code in src/causalrl/eval/benchmark.py
causalrl.eval.benchmark.run_confounded_chain_benchmark(*, seeds=(0, 1, 2, 3, 4), n_steps=8000, tail_window=2000, n_mc=2000)
Report POMIS, brute-force, and fixed-set behavior on the confounded-chain demo.
Source code in src/causalrl/eval/benchmark.py
causalrl.eval.benchmark.run_frontdoor_benchmark(*, seeds=(0, 1, 2, 3, 4), n_steps=30000, tail_window=10000, n_mc=20000)
Report manipulability-aware and naive-filter behavior on the front-door demo.
Source code in src/causalrl/eval/benchmark.py
Counterfactual Decision-Making (L3 / ETT)
causalrl.identification.counterfactual.counterfactual_expectation(scm, *, outcome, intervention, evidence, n=20000, seed=None)
Return E[ outcome_{do(intervention)} | evidence ] (a Layer-3 counterfactual mean).
Wraps :meth:StructuralCausalModel.counterfactual (abduction-action-prediction) and averages
the outcome over the retained, evidence-consistent units. With empty evidence this reduces
to the interventional mean E[outcome_{do(intervention)}].
Source code in src/causalrl/identification/counterfactual.py
causalrl.identification.counterfactual.effect_of_treatment_on_treated(scm, *, treatment, outcome, treated, control, n=20000, seed=None)
Effect of Treatment on the Treated: E[Y_{treated} - Y_{control} | treatment = treated].
The treatment effect among the subpopulation that actually received treated (Pearl,
Causality §8.2.1). Under confounding this differs from the average treatment effect. Both
potential outcomes use the same seed (common random numbers), so they are evaluated on the
same abducted units and the difference is matched and low-variance.
Source code in src/causalrl/identification/counterfactual.py
causalrl.agents.counterfactual.CounterfactualOptimalPolicy
Bases: Agent
Plays argmax_a E[Y_{do(action_node=a)} | intent] from a known SCM.
The Layer-3 oracle: it precomputes the Regret Decision Criterion table once at construction and
then acts greedily on the observed intent. update is a no-op — the model is known, so there
is nothing to learn online. The computed table is exposed as decision_table for inspection.
Source code in src/causalrl/agents/counterfactual.py
update(observation, action, reward)
Transportability
causalrl.identification.transport.SelectionDiagram
dataclass
A causal graph plus the variables whose mechanism differs between source and target.
Each selection variable carries an implicit selection node S -> variable (Bareinboim &
Pearl). selection_variables must be a subset of graph.nodes.
Source code in src/causalrl/identification/transport.py
causalrl.identification.transport.transport_formula(diagram, *, treatment, outcome, max_adjustment_size=3)
Return the transport formula for P*(outcome | do(treatment)), or None if it is not
provably transportable within the supported class (direct / S-admissible adjustment).
Source code in src/causalrl/identification/transport.py
causalrl.identification.transport.is_transportable(diagram, *, treatment, outcome)
Whether the target effect is provably transportable (see :func:transport_formula).
Source code in src/causalrl/identification/transport.py
causalrl.identification.transport.transported_effect(formula, *, treatment, treated_value, outcome, source, target, n=40000, seed=None)
Compute E*[outcome | do(treatment=treated_value)] via the transport formula.
direct: the source interventional mean transfers unchanged. adjustment: reweight the
source conditionals E[outcome | treatment, z] by the target covariate marginal P*(z)
(discrete z assumed; the demo uses binary covariates). Strata absent from the source sample
are skipped.
Source code in src/causalrl/identification/transport.py
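The adjustment case amounts to the reweighting sum_z E[outcome | treatment, z] · P*(z). A minimal standalone sketch, assuming source data as (treatment, z, outcome) tuples and target covariates as a plain list (both formats are invented for the example):

```python
from collections import defaultdict


def transported_mean(source_rows, target_z, treated_value):
    """Reweight source conditionals E[Y | X=treated_value, Z=z] by the target marginal P*(z)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, z, y in source_rows:
        if x == treated_value:
            sums[z] += y
            counts[z] += 1

    total = 0.0
    for z in set(target_z):
        if counts[z] == 0:
            continue  # stratum absent from the source sample: skipped
        p_star = target_z.count(z) / len(target_z)
        total += (sums[z] / counts[z]) * p_star
    return total


# Hypothetical data: E[Y | X=1, Z=0] = 0.5 and E[Y | X=1, Z=1] = 1.0 in the source;
# the target population has P*(Z=1) = 0.75.
source_rows = [(1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1), (0, 0, 0)]
target_z = [1, 1, 1, 0]
effect = transported_mean(source_rows, target_z, treated_value=1)
print(effect)  # 0.5 * 0.25 + 1.0 * 0.75
```

Skipping absent strata means the weights no longer sum to one when the source sample misses part of the target's support, so the estimate is biased toward zero in that case.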
causalrl.identification.transport.is_backdoor_admissible(graph, treatment, outcome, z)
Back-door criterion: z contains no descendant of treatment and blocks every back-door path
(treatment ⊥ outcome | z in the graph with treatment's outgoing edges removed).
Source code in src/causalrl/identification/transport.py
General Transportability (sID)
causalrl.identification.transport.transport_estimand(diagram, *, treatment, outcome)
The general (sID) transport estimand for P*(outcome | do(treatment)) over diagram.
A :class:SelectionDiagram adapter over
:func:causalrl.identification.id_algorithm.identify_transport: each target c-factor is taken
from the source if its mechanism is invariant, else identified from the target. Raises
:class:~causalrl.exceptions.NotIdentifiableError when not transportable. Generalizes the
direct / S-admissible-adjustment :func:transport_formula (which returns a readable closed form
for those two cases).
Source code in src/causalrl/identification/transport.py
causalrl.identification.id_algorithm.identify_transport(graph, treatment, outcome, selection)
Return an :class:Estimand for the target effect P*(outcome | do(treatment)) across a
selection diagram, or raise :class:NotIdentifiableError.
selection lists the variables whose mechanism differs between source and target (each
carries an implicit selection node). The target effect is decomposed into c-factors; a c-factor
with no selection-marked variable is invariant and taken from the source distribution, while
one that contains a selection-marked variable is identified from the target's observational
distribution. With an empty selection this reduces to source identification (the ID
algorithm).
This is the sound c-factor-routing transportability algorithm; it subsumes direct and S-admissible-adjustment transportability. It is not the complete sID (a c-factor identifiable only by combining source and target is reported as non-transportable rather than guessed).
Faithful to J. Pearl, E. Bareinboim, Transportability of Causal and Statistical Relations, AAAI 2011 / External Validity, Statistical Science 2014, via Tian's c-factor decomposition.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.is_transportable_effect(graph, treatment, outcome, selection)
Whether the target effect P*(outcome | do(treatment)) is transportable (see
:func:identify_transport).
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.estimate_transported_effect(graph, treatment, outcome, selection, source_data, target_data, *, do)
Estimate the target effect P*(outcome | do(treatment = do)) from source + target data.
Identifies the transport estimand (:func:identify_transport), then evaluates it with source
c-factors read from source_data and target c-factors from target_data (both discrete
integer columns over graph.nodes). Returns the outcome distribution keyed in
sorted(outcome) order.
Source code in src/causalrl/identification/id_algorithm.py
Multiple Domains And Experiments (mz / meta)
causalrl.identification.id_algorithm.Domain
dataclass
A source domain relative to the target: which mechanisms differ, and what data it offers.
selection lists the variables whose mechanism differs from the target (each carries an
implicit selection node S -> v); a c-factor transfers from this domain only if it touches no
selection-marked variable. experiments lists intervention targets available here (each a set
of variables that can be randomized). The target domain is always implicitly available
observationally as a fallback, so it need not be listed.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.identify_transport_general(target_graph, treatment, outcome, domains)
Return an :class:Estimand for the target effect P*(outcome | do(treatment)) combining
one or more source domains with the target's own data, or raise
:class:NotIdentifiableError.
Generalizes :func:identify_transport to multiple domains (meta-transportability) and to
surrogate experiments (mz-transportability): each target c-factor is taken from any domain whose
mechanism for it is invariant and that can identify it from its observational or experimental
data, with the target as the fallback. With a single observational source it coincides with
:func:identify_transport; with no selection and no experiments it reduces to the ID algorithm.
Faithful to E. Bareinboim & J. Pearl, A General Algorithm for Deciding Transportability of Experimental Results (Journal of Causal Inference 2013) and Meta-Transportability of Causal Effects: A Formal Approach (AISTATS 2013), unified via the surrogate-experiment view of S. Lee, J. Correa & E. Bareinboim (UAI 2019). No external code is ported.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.is_transportable_general(target_graph, treatment, outcome, domains)
Whether P*(outcome | do(treatment)) is transportable from domains (see
:func:identify_transport_general).
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.estimate_transport_general(target_graph, treatment, outcome, domains, *, domain_data, experiment_data=None, do)
Estimate the target effect P*(outcome | do(treatment = do)) across domains.
domain_data maps each domain name (including "target") to an observational dataset over
target_graph.nodes; experiment_data maps (domain, frozenset(target)) to a randomized
experimental dataset. Identifies the estimand via :func:identify_transport_general, evaluates
it against the supplied data, and returns the outcome distribution keyed in sorted(outcome)
order.
Source code in src/causalrl/identification/id_algorithm.py
General Identification (ID Algorithm)
causalrl.identification.id_algorithm.identify_effect(graph, treatment, outcome)
Return a do-free :class:Estimand for P(outcome | do(treatment)), or raise.
Runs the ID algorithm. Raises :class:NotIdentifiableError (with the witnessing hedge attached
as .witness) when the effect is not non-parametrically identifiable, and
:class:CausalGraphError for malformed inputs (unknown nodes, empty outcome, or a treatment and
outcome that overlap).
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.is_identifiable_effect(graph, treatment, outcome)
Whether P(outcome | do(treatment)) is identifiable from observational data.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.estimate_effect(graph, treatment, outcome, data, *, do)
Estimate P(outcome | do(treatment = do)) from observational data via the ID estimand.
Identifies the effect (raising if it cannot), then evaluates the resulting estimand on the
empirical joint of data (discrete integer columns over graph.nodes) at the intervention
do. Returns the outcome distribution as {assignment: probability} with assignments keyed
in sorted(outcome) order.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.Estimand
A do-free expression for an interventional distribution over the observational law P(V).
Render it with :meth:render; evaluate it on data with :func:estimate_effect.
Source code in src/causalrl/identification/id_algorithm.py
From Surrogate Experiments (gID)
causalrl.identification.id_algorithm.identify_effect_with_experiments(graph, treatment, outcome, experiments)
Return an :class:Estimand for P(outcome | do(treatment)) using surrogate experiments.
This is general identification (gID): it runs the ID recursion but, where observational data
hits a hedge, it tries to obtain the needed c-factor from one of the available experiments
(each a set of variables you can intervene on; observational data is always available too).
Raises :class:NotIdentifiableError if no combination of observational data and experiments
identifies the effect.
Faithful to S. Lee, J. Correa, E. Bareinboim, General Identifiability with Arbitrary Surrogate Experiments, UAI 2019, building on Tian & Pearl's c-factor identification. No code is ported.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.is_gid_identifiable(graph, treatment, outcome, experiments)
Whether P(outcome | do(treatment)) is identifiable from data plus those experiments.
Source code in src/causalrl/identification/id_algorithm.py
causalrl.identification.id_algorithm.estimate_effect_with_experiments(graph, treatment, outcome, data, experiments_data, *, do)
Estimate P(outcome | do(treatment = do)) from observational data plus experiments.
experiments_data maps each available intervention target (a frozenset of variables) to a
dataset drawn from do(target) over graph.nodes. Identifies the effect via gID
(:func:identify_effect_with_experiments), then evaluates the estimand against the empirical
observational joint and each experiment's empirical c-factor. Returns the outcome distribution
keyed in sorted(outcome) order.
Source code in src/causalrl/identification/id_algorithm.py
Causal Discovery
causalrl.discovery.discover(data, variables, *, threshold=0.01, max_conditioning_size=3)
Discover the CPDAG over variables from data via the PC algorithm.
threshold is the conditional-mutual-information cutoff below which two variables are judged
independent; max_conditioning_size caps the separating-set search.
Source code in src/causalrl/discovery.py
causalrl.discovery.discover_interventional(observational, interventions, variables, *, threshold=0.01, shift_threshold=0.05, max_conditioning_size=3)
Discover the interventional essential graph from observational and experimental data.
Runs the observational PC algorithm (:func:discover), then orients the edges incident to each
intervention target by the invariance principle: under a perfect intervention do(T) a
child of T shifts its marginal, while a parent (a non-descendant) stays invariant. Each
incident edge T - B is oriented T -> B if B shifts and B -> T if not; Meek's
rules R1-R3 then propagate the new orientations, so the result refines the observational CPDAG
toward the true DAG as more targets are experimented on.
interventions maps each intervened target T to a dataset drawn from do(T) (a perfect
intervention covering every variable in variables). shift_threshold is the
total-variation cutoff above which an endpoint's marginal is judged to have shifted.
Source code in src/causalrl/discovery.py
causalrl.discovery.discover_latent(data, variables, *, threshold=0.01, max_conditioning_size=3)
Discover a PAG over variables from data via the FCI algorithm (allows latents).
Unlike :func:discover, FCI does not assume causal sufficiency: it learns the PC skeleton, then
refines it with the Possible-D-SEP step (sound under latent confounders), re-orients unshielded
colliders, and applies the complete orientation rules R1-R10 (Zhang 2008 — sound and complete
for latent confounders and selection bias). The result is a :class:PAG: a <-> b
witnesses a latent confounder; a circle endpoint is undetermined by the equivalence class.
threshold and max_conditioning_size mirror :func:discover.
Source code in src/causalrl/discovery.py
causalrl.discovery.CPDAG
dataclass
A completed partially directed acyclic graph (a Markov equivalence class).
Source code in src/causalrl/discovery.py
to_causal_graph()
Convert to a :class:CausalGraph; raises if any edge is still unoriented.
Source code in src/causalrl/discovery.py
causalrl.discovery.PAG
dataclass
A partial ancestral graph: the complete Markov-equivalence class of MAGs (the FCI output).
marks[(a, b)] is the mark on the b end of edge a—b — a circle o (undetermined by
the equivalence class), an arrowhead >, or a tail -. An edge exists iff both (a, b)
and (b, a) are present. a -> b is tail-at-a / arrow-at-b; a <-> b
(arrowheads at both ends) witnesses a latent confounder; a o-o b is fully unoriented.
Source code in src/causalrl/discovery.py
is_directed(a, b)
is_bidirected(a, b)
Whether a <-> b (arrowheads at both ends — a latent confounder).
edges()
(a, b, mark_at_a, mark_at_b) for each edge, with a < b.
Source code in src/causalrl/discovery.py
causalrl.discovery.conditional_mutual_information(data, x, y, z)
Empirical I(X; Y | Z) in nats (discrete columns; 0 iff X ⊥ Y | Z).
Source code in src/causalrl/discovery.py
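A plug-in estimate of I(X; Y | Z) from counts can be sketched as below. The row format (a list of dicts) and the argument convention (z as a tuple of column names) are assumptions for this standalone example; the library's data format may differ.

```python
import math
from collections import Counter


def conditional_mutual_information(rows, x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats from discrete rows (dicts)."""
    def z_key(row):
        return tuple(row[c] for c in z)

    n = len(rows)
    n_xyz = Counter((r[x], r[y], z_key(r)) for r in rows)
    n_xz = Counter((r[x], z_key(r)) for r in rows)
    n_yz = Counter((r[y], z_key(r)) for r in rows)
    n_z = Counter(z_key(r) for r in rows)

    total = 0.0
    for (xv, yv, zv), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ], written with raw counts.
        total += (c / n) * math.log(c * n_z[zv] / (n_xz[xv, zv] * n_yz[yv, zv]))
    return total


# Hypothetical data in which X and Y are both copies of Z.
rows = [{"X": 0, "Y": 0, "Z": 0}, {"X": 1, "Y": 1, "Z": 1}]
marginal = conditional_mutual_information(rows, "X", "Y", ())       # dependent: log 2
conditional = conditional_mutual_information(rows, "X", "Y", ("Z",))  # independent given Z
print(marginal, conditional)
```

On finite samples the plug-in estimate is rarely exactly zero even under true independence, which is why the discovery functions above compare it against a threshold.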
Causal Imitation Learning
causalrl.imitation.is_imitable(graph, *, action, outcome, observable)
Whether the expert is imitable: an observed back-door-admissible set exists.
Source code in src/causalrl/imitation.py
causalrl.imitation.imitation_backdoor_set(graph, *, action, outcome, observable, max_size=3)
Smallest observed back-door-admissible set for action -> outcome, or None.
A set Z ⊆ observable \ {action, outcome} is admissible when it contains no descendant of
action and blocks every back-door path (action ⊥ outcome | Z with action's outgoing
edges removed). Cloning P(action | Z) then reproduces the expert's outcome distribution.
Source code in src/causalrl/imitation.py
causalrl.imitation.CausalImitator
Bases: Agent
Clones P(A | Z) for a back-door-admissible observed set Z (the adjustment set).
Conditioning the cloned policy on Z reproduces the confounding the expert responded to, so
deployment matches the expert's reward. Unseen Z values fall back to the marginal P(A).
Source code in src/causalrl/imitation.py
causalrl.imitation.BehavioralCloning
Bases: Agent
Naive imitator: clones the marginal action distribution P(A), ignoring covariates.
Source code in src/causalrl/imitation.py
Causal Curriculum Learning
causalrl.curriculum.causal_curriculum(graph, goal=None)
A curriculum (skill order) respecting the causal structure: a topological order in which
every parent (prerequisite) precedes its children. With goal, restrict to the goal and its
ancestors — the skills the goal depends on — still in topological order.
Source code in src/causalrl/curriculum.py
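A curriculum of this kind is a topological sort, optionally restricted to the goal's ancestors. The sketch below uses the standard-library graphlib module (Python 3.9+) on a plain parents mapping; the mapping format and the skill names are assumptions for the example, not the CausalGraph interface.

```python
from graphlib import TopologicalSorter


def causal_curriculum(parents, goal=None):
    """parents maps each skill to the set of its prerequisite skills."""
    if goal is None:
        restricted = parents
    else:
        # Collect the goal and all of its ancestors.
        needed, stack = {goal}, [goal]
        while stack:
            for p in parents.get(stack.pop(), ()):
                if p not in needed:
                    needed.add(p)
                    stack.append(p)
        restricted = {s: set(parents.get(s, ())) & needed for s in needed}
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(restricted).static_order())


# Hypothetical skill graph: stand -> walk -> run, with swim unrelated.
parents = {"stand": set(), "walk": {"stand"}, "run": {"walk"}, "swim": set()}
to_run = causal_curriculum(parents, goal="run")
full = causal_curriculum(parents)
print(to_run, full)
```

The goal-restricted order omits swim because the goal does not depend on it.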
causalrl.curriculum.curriculum_q_learning(tasks, *, episodes_per_task, alpha=0.5, epsilon=0.1, seed=None)
Learn the target policy by Q-learning through a curriculum of subtasks, easiest first.
tasks is ordered from the simplest subtask to the target (the last element); all tasks share
the same state and action spaces. The Q-table is carried forward between stages (warm-start
transfer), so value learned on the easy subtasks bootstraps the harder ones. This is the causal
curriculum applied to RL: order subgoals by prerequisite structure (see
:func:causal_curriculum), then train in that order to reach a target policy that flat learning
on the sparse target alone struggles to find. Returns the greedy policy on the target task.
Faithful to Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum Learning, ICML 2009;
the Q-learning update matches :func:causalrl.shaping.q_learning. No external code is ported.
Source code in src/causalrl/curriculum.py
causalrl.curriculum.is_valid_curriculum(graph, order)
Whether order respects prerequisites: each skill appears after every parent of it that is
also in order.
Source code in src/causalrl/curriculum.py
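The validity check can be expressed in a few lines. This standalone sketch assumes a parents mapping (skill to prerequisite set) instead of a CausalGraph; prerequisites that are not part of the order are ignored, as the description states.

```python
def is_valid_curriculum(parents, order):
    """True when each skill appears after every one of its parents that is also in order."""
    position = {skill: i for i, skill in enumerate(order)}
    return all(
        position[p] < position[s]
        for s in order
        for p in parents.get(s, ())
        if p in position
    )


# Hypothetical skill graph: stand -> walk -> run.
parents = {"walk": {"stand"}, "run": {"walk"}}
print(is_valid_curriculum(parents, ["stand", "walk", "run"]))  # True
print(is_valid_curriculum(parents, ["walk", "stand"]))         # False
```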
causalrl.curriculum.PrerequisiteLearner
Causally-gated skill acquisition: walking the curriculum left to right, a skill is mastered iff all of its parents (prerequisites) are already mastered. Deterministic and order-faithful, so the effect of the ordering is exactly readable from the mastered set.
Source code in src/causalrl/curriculum.py
train(curriculum)
Process the curriculum once and return the set of mastered skills.
Source code in src/causalrl/curriculum.py
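The gating rule can be illustrated with a single pass over the curriculum. This is a standalone sketch with an assumed parents mapping and a free function in place of the class; it shows how the same skills yield different mastered sets under different orderings.

```python
def train(parents, curriculum):
    """One left-to-right pass: a skill is mastered iff all its parents are already mastered."""
    mastered = set()
    for skill in curriculum:
        if all(p in mastered for p in parents.get(skill, ())):
            mastered.add(skill)
    return mastered


# Hypothetical skill graph: stand -> walk -> run.
parents = {"walk": {"stand"}, "run": {"walk"}}
good = train(parents, ["stand", "walk", "run"])  # prerequisites first
bad = train(parents, ["run", "walk", "stand"])   # reversed order
print(good, bad)
```

In the reversed order only stand is mastered, because run and walk are attempted before their prerequisites.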
Causal Reward Shaping
causalrl.shaping.apply_potential_shaping(mdp, potential)
Return a new MDP with rewards[s, a] += gamma * Phi(s') - Phi(s) (policy-invariant).
Source code in src/causalrl/shaping.py
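The shaping rule is a one-line transformation of the reward table. The sketch below applies it to a deterministic MDP stored as dictionaries keyed by (state, action); this layout, the explicit gamma argument, and the example chain are assumptions for illustration and do not reproduce the TabularMDP type.

```python
def apply_potential_shaping(transitions, rewards, potential, gamma):
    """Return new rewards r + gamma * Phi(s') - Phi(s); inputs are left unchanged."""
    return {
        (s, a): r + gamma * potential[transitions[s, a]] - potential[s]
        for (s, a), r in rewards.items()
    }


# Hypothetical chain 0 -> 1 -> 2, where state 2 is terminal with potential 0.
transitions = {(0, "go"): 1, (1, "go"): 2}
rewards = {(0, "go"): 0.0, (1, "go"): 1.0}
potential = {0: 0.5, 1: 1.0, 2: 0.0}
shaped = apply_potential_shaping(transitions, rewards, potential, gamma=1.0)
print(shaped)
```

Along the trajectory the shaping terms telescope: the shaped return is the original return (1.0) minus Phi of the start state (0.5), a constant offset per start state, which is why the optimal policy is unchanged.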
causalrl.shaping.causal_potential(mdp)
The ideal (causal) shaping potential: V* with terminal states pinned to 0 (the
condition under which potential shaping is policy-invariant for episodic tasks).
Source code in src/causalrl/shaping.py
causalrl.shaping.value_iteration(mdp, *, tol=1e-09, max_iter=1000)
Return (V*, greedy optimal policy) for the deterministic MDP.
Source code in src/causalrl/shaping.py
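For a deterministic MDP the Bellman backup needs no expectation over next states. A minimal standalone sketch follows, assuming dictionary tables keyed by (state, action), an explicit terminal set, and at least one action in every non-terminal state; none of these names come from the library.

```python
def value_iteration(transitions, rewards, terminal, gamma, tol=1e-9, max_iter=1000):
    """Return (V*, greedy policy) for a deterministic MDP given as dict tables."""
    states = {s for s, _ in transitions} | set(transitions.values())
    v = {s: 0.0 for s in states}  # terminal states stay pinned at 0

    def q(s, a):
        return rewards[s, a] + gamma * v[transitions[s, a]]

    for _ in range(max_iter):
        delta = 0.0
        for s in states - terminal:
            best = max(q(s, a) for (s2, a) in transitions if s2 == s)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break

    policy = {
        s: max((a for (s2, a) in transitions if s2 == s), key=lambda a: q(s, a))
        for s in states - terminal
    }
    return v, policy


# Hypothetical MDP: from 0, "slow" leads to 1 (reward 0) and "jump" ends at 2 (reward 0.5);
# from 1, "go" ends at 2 with reward 1.
transitions = {(0, "slow"): 1, (0, "jump"): 2, (1, "go"): 2}
rewards = {(0, "slow"): 0.0, (0, "jump"): 0.5, (1, "go"): 1.0}
values, policy = value_iteration(transitions, rewards, terminal={2}, gamma=0.9)
print(values, policy)
```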
causalrl.shaping.q_learning(mdp, *, episodes, potential=None, alpha=0.5, epsilon=0.1, max_steps=None, seed=None)
Tabular epsilon-greedy Q-learning from state 0. With potential given, the reward is
shaped online by gamma * Phi(s') - Phi(s). Returns the greedy policy.
Source code in src/causalrl/shaping.py
causalrl.shaping.TabularMDP
dataclass
A finite deterministic MDP: transitions/rewards are keyed by (state, action).
Source code in src/causalrl/shaping.py
Causal Game Theory
causalrl.games.CausalGame
dataclass
A finite game as a causal influence diagram.
utilities[agent] maps each joint action profile (a tuple in agents order) to agent's
payoff; graph is the (M)ACID with a decision and utility node per agent.
Source code in src/causalrl/games.py
causalrl.games.pure_nash_equilibria(game)
¶
All pure-strategy Nash equilibria, by enumeration over the finite joint action space.
Source code in src/causalrl/games.py
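Enumeration over the joint action space checks, for each profile, that no agent gains by a unilateral deviation. The sketch below is standalone: it takes per-agent action lists and per-agent payoff dictionaries (an assumed layout loosely mirroring the utilities description above) and does not use CausalGame.

```python
from itertools import product


def pure_nash_equilibria(actions, utilities):
    """actions[i] lists agent i's actions; utilities[i][profile] is agent i's payoff."""
    equilibria = []
    for profile in product(*actions):
        stable = True
        for i, own_actions in enumerate(actions):
            for alt in own_actions:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                if utilities[i][deviation] > utilities[i][profile]:
                    stable = False  # agent i strictly gains by switching to alt
        if stable:
            equilibria.append(profile)
    return equilibria


# Prisoner's dilemma: mutual defection is the unique pure equilibrium.
acts = [["C", "D"], ["C", "D"]]
pd = [
    {("C", "C"): -1, ("C", "D"): -3, ("D", "C"): 0, ("D", "D"): -2},
    {("C", "C"): -1, ("C", "D"): 0, ("D", "C"): -3, ("D", "D"): -2},
]
# Matching pennies has no pure equilibrium.
heads_tails = [["H", "T"], ["H", "T"]]
row = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
pennies = [row, {profile: -payoff for profile, payoff in row.items()}]
print(pure_nash_equilibria(acts, pd), pure_nash_equilibria(heads_tails, pennies))
```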
causalrl.games.mixed_nash_equilibria(game)
All mixed-strategy Nash equilibria (pure and properly mixed), by support enumeration.
Two-player games are solved exactly over rational arithmetic (:class:fractions.Fraction),
so symmetric games yield exact mixes (matching pennies gives 0.5/0.5). Games with three
or more agents are solved by support enumeration with a numerical (Newton) solve of the
multilinear indifference system; every returned profile is then verified to be an ε-Nash
equilibrium (no agent can gain more than 1e-6 by deviating to a pure action). Each
equilibrium maps an agent to {action: probability} with off-support actions at zero.
Assumes a non-degenerate game; degenerate games may admit a continuum of equilibria, of which
only support-extreme points are enumerated, and the numerical solver may miss a support whose
system is ill-conditioned. Raises :class:CausalGraphError for fewer than two agents (use
:func:pure_nash_equilibria for pure equilibria of any game).
Faithful to the support-enumeration method (R. Porter, E. Nudelman, Y. Shoham, Simple Search Methods for Finding a Nash Equilibrium, Games and Economic Behavior 2008; B. von Stengel, Computing Equilibria for Two-Person Games, Handbook of Game Theory 2002). No code is ported.
Source code in src/causalrl/games.py
causalrl.games.best_response(game, agent, profile)
The actions maximizing agent's payoff given the other agents' actions in profile.
Source code in src/causalrl/games.py
causalrl.games.is_nash_equilibrium(game, profile)
Whether every agent's action in profile is a best response to the others.
Causal Gymnasium Wrapper
causalrl.envs.wrapper.CausalEnvWrapper
Bases: Wrapper[Any, Any, Any, Any]
A Gymnasium wrapper that exposes the wrapped environment's causal structure.
Parameters
env:
Any gymnasium.Env. When the env carries a non-None .scm attribute and
reward_node is provided (and valid), the full causal interface is enabled.
When env.scm is None or reward_node is None, construction still
succeeds but the causal interface is disabled (pass-through mode).
reward_node:
The SCM variable name that represents the reward / return signal. Optional —
None disables the causal interface. If the env has a live SCM and
reward_node is supplied but not present in the graph, a ValueError is raised.
Attributes
reward_node:
The SCM node treated as the reward/return, or None when not set.
has_causal_interface:
True iff the wrapped env exposes a non-None SCM and reward_node is set
and present in the graph. All causal methods require this to be True.
reward_parents:
Direct SCM parents of reward_node in topological order. Requires
has_causal_interface.
scm:
The underlying :class:~causalrl.scm.scm.StructuralCausalModel (live reference to
the wrapped env's .scm), or None when not available.
active_interventions:
The currently stored intervention mapping, or None if no persistent
intervention is active.
Source code in src/causalrl/envs/wrapper.py
has_causal_interface
property
¶
True iff the causal interface is fully operational.
Requires the wrapped env to expose a non-None .scm and reward_node
to be set and present in the graph.
scm
property
¶
The underlying SCM (live reference to the wrapped env's .scm), or None.
reward_parents
property
¶
Direct SCM parents of reward_node in graph-topological order.
These are the variables that causally determine the immediate reward signal. Pass
their names as the factor_nodes argument of
:func:~causalrl.agents.factored_advantage.factored_advantage.
Raises¶
CausalInterfaceUnavailableError
When has_causal_interface is False.
active_interventions
property
¶
The currently stored intervention mapping, or None if none is active.
do(interventions)
¶
Return a new SCM mutilated by do(interventions).
The running environment's SCM is NOT modified. This is a pure causal-graph query suitable for off-policy reasoning or shaping.
Parameters¶
interventions:
Mapping {node_name: value} passed to
:meth:~causalrl.scm.scm.StructuralCausalModel.do.
Returns¶
StructuralCausalModel
The mutilated SCM under the specified do-intervention.
Raises¶
CausalInterfaceUnavailableError
When has_causal_interface is False.
Source code in src/causalrl/envs/wrapper.py
intervene(node, value)
¶
Convenience wrapper: do({node: value}).
Parameters¶
node:
The SCM variable to intervene on.
value:
The value to pin it to (scalar, sequence, or Tensor).
Returns¶
StructuralCausalModel
The mutilated SCM under do({node: value}).
Raises¶
CausalInterfaceUnavailableError
When has_causal_interface is False.
Source code in src/causalrl/envs/wrapper.py
set_intervention(interventions)
¶
Store a persistent intervention that affects subsequent reset and step.
After this call, every reset() and step() swaps the wrapped env's .scm
to the pre-computed mutilated SCM for the duration of the call, then restores the
original SCM in a finally block. Precomputed baselines stored on the env
(e.g. arm_values) are not recomputed.
Parameters¶
interventions:
Mapping {node_name: value} to pin persistently.
Raises¶
CausalInterfaceUnavailableError
When has_causal_interface is False.
Source code in src/causalrl/envs/wrapper.py
clear_intervention()
¶
Remove the persistent intervention; subsequent calls use the unintervened SCM.
reset(*, seed=None, options=None)
¶
Reset the wrapped environment, forwarding seed and options.
When a persistent intervention is active (set_intervention was called), the
wrapped env's .scm is temporarily replaced with the mutilated SCM for the
duration of this call, then restored unconditionally.
Source code in src/causalrl/envs/wrapper.py
step(action)
¶
Step the wrapped environment.
When a persistent intervention is active (set_intervention was called), the
wrapped env's .scm is temporarily replaced with the mutilated SCM for the
duration of this call, then restored unconditionally.
Source code in src/causalrl/envs/wrapper.py
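The swap-then-restore behaviour described for reset and step can be sketched with a toy environment. Everything here is hypothetical (`ToyEnv`, `SwapWrapperSketch`, and the string stand-ins for SCM objects); it only illustrates why a finally block leaves the wrapped env's .scm untouched even when the call raises.

```python
class ToyEnv:
    """Stand-in for a wrapped env; only the `.scm` attribute matters here."""

    def __init__(self, scm):
        self.scm = scm
        self.seen = []  # which SCM was live during each step call

    def step(self, action):
        self.seen.append(self.scm)
        if action == "boom":
            raise RuntimeError("env failure")
        return self.scm


class SwapWrapperSketch:
    """Hypothetical wrapper showing the swap-then-restore pattern."""

    def __init__(self, env):
        self.env = env
        self._mutilated = None  # pre-computed SCM under the stored intervention

    def set_intervention(self, mutilated_scm):
        self._mutilated = mutilated_scm

    def clear_intervention(self):
        self._mutilated = None

    def step(self, action):
        if self._mutilated is None:
            return self.env.step(action)
        original = self.env.scm
        self.env.scm = self._mutilated
        try:
            return self.env.step(action)
        finally:
            # Runs even if the wrapped step raises.
            self.env.scm = original


env = ToyEnv(scm="observational")
wrapper = SwapWrapperSketch(env)
wrapper.set_intervention("mutilated")
during = wrapper.step("ok")      # the env saw the mutilated SCM
after = env.scm                  # the original SCM is back
try:
    wrapper.step("boom")
except RuntimeError:
    pass
after_error = env.scm            # restored despite the exception
```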
Causal Graph-Factored Advantage (CGFA)¶
causalrl.agents.factored_advantage.factored_advantage(factor_values, baselines, *, config=None, aggregation='sum', weights=None)
¶
Compute causal graph-factored advantages from per-factor value estimates.
Implements the SCM-aligned critic target from CGFA-PPO (arXiv:2605.06066, §3.2).
Given K causal parent factors of the return, and for each rollout step a vector of
per-factor value estimates V_1, …, V_K and a scalar baseline b, the per-factor
advantage is A_i = V_i - b and the combined advantage is their (weighted) sum or
mean.
When K = 1 (single factor) the output reduces exactly to the standard advantage
A = V - b, so this is a strict generalisation of the scalar advantage.
Parameters¶
factor_values:
Array of shape (T, K) where T is the number of rollout steps and K is
the number of causal factors (SCM parents of the return). Each column [:,i]
is the critic's value estimate for factor i.
baselines:
Array of shape (T,) — the shared scalar baseline for each step (typically the
current value-function estimate V(s_t)).
config:
A :class:FactoredAdvantageConfig that carries factor_nodes, aggregation,
and optional weights. When provided, aggregation and weights keyword
arguments are ignored (config takes precedence).
aggregation:
Used when config is None. "sum" (default) or "mean".
weights:
Used when config is None. Per-factor weights, shape (K,). None
means uniform unit weights.
Returns¶
NDArray[np.float64]
Shape (T,) — the combined causal-graph-factored advantage for each step.
Raises¶
ValueError
If factor_values is not 2-D, if baselines length does not match T, or
if weights shape does not match K.
Examples¶
Single-factor (reduces to standard advantage):
>>> import numpy as np
>>> from causalrl.agents.factored_advantage import factored_advantage
>>> V = np.array([[2.0], [3.0], [1.0]])  # (T=3, K=1)
>>> b = np.array([1.5, 2.5, 0.5])
>>> factored_advantage(V, b)
array([0.5, 0.5, 0.5])
Two-factor sum (CGFA-PPO with two SCM parents of the return):
>>> V2 = np.array([[2.0, 1.0], [3.0, 0.5]])  # (T=2, K=2)
>>> b2 = np.array([1.5, 2.0])
>>> factored_advantage(V2, b2)  # A_i = V_i - b; sum over i
array([ 0. , -0.5])
References¶
- Cristiano da Costa Cunha, Ajmal Mian, Tim French, and Wei Liu (2026). "Causal Reinforcement Learning for Complex Card Games: A Magic: The Gathering Benchmark." arXiv:2605.06066.
Source code in src/causalrl/agents/factored_advantage.py
causalrl.agents.factored_advantage.FactoredAdvantageConfig
dataclass
¶
Configuration bundle for :func:factored_advantage.
Parameters¶
factor_nodes:
Ordered list of SCM parent-node names whose per-factor value estimates are passed to
:func:factored_advantage. The order must match the columns of the
factor_values array.
aggregation:
How to combine per-factor advantages into the final scalar advantage.
* ``"sum"`` (default): ``A = Σ_i A_i`` — the formulation in arXiv:2605.06066.
* ``"mean"``: ``A = mean_i A_i`` — normalises when the number of factors varies.
weights:
Optional per-factor weights w_i (length must match factor_nodes when
provided). The combined advantage becomes A = Σ_i w_i * A_i for
aggregation="sum" or the weighted mean for aggregation="mean". None
means uniform unit weights.
Source code in src/causalrl/agents/factored_advantage.py
weights_array
property
¶
The validated per-factor weight vector (uniform unit weights when unset).
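A dependency-free sketch of how aggregation and weights interact, assuming plain Python lists in place of arrays. The function name is hypothetical, and the normalisation used for the weighted mean (dividing by the sum of the weights) is an assumption of this sketch, not something the reference above states.

```python
def factored_advantage_sketch(factor_values, baselines, weights=None, aggregation="sum"):
    """Combine per-factor advantages A_i = V_i - b for each rollout step."""
    n_factors = len(factor_values[0])
    w = [1.0] * n_factors if weights is None else list(weights)
    if len(w) != n_factors:
        raise ValueError("weights length must match the number of factors")
    if len(baselines) != len(factor_values):
        raise ValueError("baselines length must match the number of steps")
    out = []
    for row, b in zip(factor_values, baselines):
        total = sum(w_i * (v_i - b) for w_i, v_i in zip(w, row))
        # Assumption: the (weighted) mean divides by the sum of the weights.
        out.append(total if aggregation == "sum" else total / sum(w))
    return out


V2 = [[2.0, 1.0], [3.0, 0.5]]  # (T=2, K=2)
b2 = [1.5, 2.0]
summed = factored_advantage_sketch(V2, b2)                        # [0.0, -0.5]
averaged = factored_advantage_sketch(V2, b2, aggregation="mean")  # [0.0, -0.25]
weighted = factored_advantage_sketch(V2, b2, weights=[2.0, 1.0])  # [0.5, 0.5]
```

With unit weights, "mean" is "sum" divided by K, so the two aggregations differ only in scale when K is fixed.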
Gymnasium Env Registration¶
causalrl.envs.registration.register_envs()
¶
Register causalrl demo environments in the Gymnasium registry.
Calling this function more than once in the same process is safe (idempotent).
After calling this function (or importing causalrl), you can use::
import gymnasium
import causalrl # triggers register_envs()
env = gymnasium.make("causalrl/StructuralCausalBandit-v0")
vec = gymnasium.make_vec("causalrl/StructuralCausalBandit-v0", num_envs=2)
Source code in src/causalrl/envs/registration.py
Exceptions¶
causalrl.exceptions.CausalRLError
¶
causalrl.exceptions.CausalInterfaceUnavailableError
¶
Bases: CausalRLError
The causal interface is not available on this wrapper.
Raised when a method that requires a live SCM and a named reward node is called
on a :class:~causalrl.envs.wrapper.CausalEnvWrapper that was constructed without
them (e.g. wrapping a :class:~causalrl.envs.base.ConfoundedMDP that carries
scm=None, or without passing a reward_node).
Source code in src/causalrl/exceptions.py
causalrl.exceptions.NotIdentifiableError
¶
Bases: CausalRLError
A causal query is not identifiable from the available data.
Source code in src/causalrl/exceptions.py
causalrl.exceptions.CausalGraphError
¶
Bases: CausalRLError
Invalid graph operation (unknown node, cycle, malformed edge).
causalrl.exceptions.RealizabilityError
¶
Bases: CausalRLError
A counterfactual query cannot be realized from the given evidence.
causalrl.exceptions.UnverifiedAssumptionError
¶
Bases: CausalRLError
A method's claimed guarantee requires an assumption the caller has not declared.
Partial-Identification And OPE Bounds¶
causalrl.identification.bounds.manski_bounds(data, *, treatment, outcome, action, outcome_range=(0.0, 1.0))
¶
Sharp no-assumptions bounds on E[outcome | do(treatment = action)] (Manski 1990).
From observational data (integer treatment column, numeric outcome in
outcome_range): the units that took action contribute their observed mean, while
the rest are bounded only by the outcome range. With p = P(treatment = action) and
m = E[outcome | treatment = action] the bounds are
[m*p + y_min*(1-p), m*p + y_max*(1-p)] — sharp, collapsing to a point when every unit took
action. The observational counterpart of :func:causal_q_bounds.
Source code in src/causalrl/identification/bounds.py
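The bound formula can be checked by hand with a minimal sketch. It assumes parallel Python lists for the treatment and outcome columns rather than the library's data argument; `manski_bounds_sketch` is a hypothetical name.

```python
def manski_bounds_sketch(treatments, outcomes, action, outcome_range=(0.0, 1.0)):
    """No-assumptions bounds [m*p + y_min*(1-p), m*p + y_max*(1-p)]."""
    y_min, y_max = outcome_range
    took = [y for t, y in zip(treatments, outcomes) if t == action]
    p = len(took) / len(treatments)           # P(treatment = action)
    m = sum(took) / len(took) if took else 0.0  # E[outcome | treatment = action]
    return (m * p + y_min * (1.0 - p), m * p + y_max * (1.0 - p))


treatments = [1, 0, 0, 0]
outcomes = [0.5, 1.0, 0.0, 1.0]

# p = 0.25 and m = 0.5, so the unit-range bounds are (0.125, 0.875).
unit_range = manski_bounds_sketch(treatments, outcomes, action=1)

# A wider outcome range only moves the part contributed by the untreated units.
wide_range = manski_bounds_sketch(treatments, outcomes, action=1, outcome_range=(-1.0, 1.0))

# When every unit took the action, the interval collapses to the observed mean.
point = manski_bounds_sketch([1, 1], [1.0, 0.5], action=1)
```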
causalrl.identification.bounds.ipw_sensitivity_bounds(outcomes, propensities, *, gamma)
¶
Marginal-sensitivity-model bounds on the treated counterfactual mean E[Y(1)].
outcomes and propensities are the treated units' outcomes Y_i and nominal
propensities e(Z_i) = P(treated | Z_i) (what an unconfounded model fits). Under Tan's
marginal sensitivity model the true inverse-propensity weight lies within an odds-ratio factor
gamma >= 1 of the nominal, giving a_i in [1 + (1/g)(1/e_i - 1), 1 + g(1/e_i - 1)]; the
bounds are the extreme stabilized (Hájek) weighted means over that range. At gamma = 1 the
interval collapses to the IPW point estimate; it widens monotonically with gamma and
contains E[Y(1)] whenever the true confounding odds ratio is at most gamma.
Faithful to Z. Tan, A Distributional Approach for Causal Inference Using Propensity Scores (JASA 2006) and Q. Zhao, D. Small, B. Bhattacharya, Sensitivity Analysis for Inverse Probability Weighting Estimators via the Percentile Bootstrap (JRSS-B 2019). No code is ported.
Source code in src/causalrl/identification/bounds.py
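One way to compute the extreme Hájek means is to note that the optimum of a weighted mean over box-constrained weights puts the upper weight on one side of an outcome threshold and the lower weight on the other. The sketch below enumerates every threshold. It is an illustration under that observation, with a hypothetical function name, and is not claimed to match the library's solver.

```python
def msm_bounds_sketch(outcomes, propensities, gamma):
    """Extreme self-normalised weighted means over the MSM weight box."""
    odds = [1.0 / e - 1.0 for e in propensities]
    lo_w = [1.0 + o / gamma for o in odds]  # 1 + (1/g)(1/e_i - 1)
    hi_w = [1.0 + o * gamma for o in odds]  # 1 + g(1/e_i - 1)
    order = sorted(range(len(outcomes)), key=lambda i: outcomes[i])
    n = len(outcomes)

    def weighted_mean(k, heavy_on_small):
        # heavy_on_small: the k smallest outcomes take the upper weight;
        # otherwise every outcome ranked k or later takes the upper weight.
        num = den = 0.0
        for rank, i in enumerate(order):
            heavy = rank < k if heavy_on_small else rank >= k
            w = hi_w[i] if heavy else lo_w[i]
            num += w * outcomes[i]
            den += w
        return num / den

    lower = min(weighted_mean(k, True) for k in range(n + 1))
    upper = max(weighted_mean(k, False) for k in range(n + 1))
    return lower, upper


y = [1.0, 0.0]
e = [0.5, 0.5]
point = msm_bounds_sketch(y, e, gamma=1.0)  # both ends equal the IPW point, 0.5
band = msm_bounds_sketch(y, e, gamma=2.0)   # roughly (1/3, 2/3)
```

The enumeration costs O(n²) as written; prefix sums over the sorted order would bring it to O(n log n).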
causalrl.identification.bounds.causal_q_bounds(dataset, state, action, *, require_identified=False)
¶
Manski natural bounds on E[return | do(action), state] from confounded logs.
For a return in [0, 1] with empirical mean m = E[R|s,a] and propensity p = P(a|s):
lower = m * p, upper = m * p + (1 - p).
A never-logged action (p = 0) yields the vacuous [0, 1] — not identifiable from the
logs alone. With require_identified=True, a vacuous bound raises NotIdentifiableError
carrying (state, action) as the witness.
Source code in src/causalrl/identification/bounds.py
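A sketch of the vacuous case and the require_identified switch. The log format (a list of `(state, action, return)` tuples) and the exception class are assumptions made for this example; the library's dataset type and NotIdentifiableError are not reproduced here.

```python
class NotIdentifiableSketch(Exception):
    """Stand-in for the library's NotIdentifiableError."""


def causal_q_bounds_sketch(logs, state, action, require_identified=False):
    """Natural bounds (m*p, m*p + (1 - p)) for a return in [0, 1]."""
    in_state = [(a, r) for s, a, r in logs if s == state]
    taken = [r for a, r in in_state if a == action]
    p = len(taken) / len(in_state) if in_state else 0.0  # P(a | s)
    if p == 0.0:
        # Never-logged action: the bound is vacuous.
        if require_identified:
            raise NotIdentifiableSketch((state, action))
        return (0.0, 1.0)
    m = sum(taken) / len(taken)  # E[R | s, a]
    return (m * p, m * p + (1.0 - p))


logs = [("s0", 0, 1.0), ("s0", 0, 0.5), ("s0", 1, 0.0), ("s0", 1, 1.0)]
logged = causal_q_bounds_sketch(logs, "s0", 0)    # (0.375, 0.875)
unlogged = causal_q_bounds_sketch(logs, "s0", 2)  # (0.0, 1.0)
```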
causalrl.identification.bounds.msm_policy_value_bounds(outcomes, logging_propensities, target_propensities, *, gamma)
¶
Marginal-sensitivity-model bounds on an off-policy value V(pi_t) = E[(pi_t/e0) Y].
Self-normalised (Hájek) off-policy value of a target policy pi_t estimated from logs of a
logging policy with nominal propensities e0(a|x) = P(a | x) (a valid probability in
(0, 1]). outcomes are the logged rewards Y_i; target_propensities are
pi_t(a_i | x_i) at the logged action. Under Tan's marginal sensitivity model the true
logging propensity deviates from nominal by an odds-ratio at most gamma >= 1, so the true
inverse weight 1/ẽ0 lies in [1 + odds/gamma, 1 + odds*gamma] with odds = (1-e0)/e0;
each unit's contribution weight is pi_t(a_i|x_i) * (1/ẽ0). The bounds are the extreme
stabilised weighted means of Y over those per-unit weight ranges.
Reduces to :func:ipw_sensitivity_bounds when pi_t is constant across the logged actions
(the treated / uniform-target mean — the constant cancels in the self-normalised ratio), and
collapses to the self-normalised IPS point at gamma = 1. The off-policy generalisation of
Tan's MSM in the spirit of N. Kallus & A. Zhou, Confounding-Robust Policy Evaluation in
Infinite-Horizon Reinforcement Learning (NeurIPS 2020). The caller supplies pi_t and the
nominal e0; no code is ported.
Source code in src/causalrl/identification/bounds.py
causalrl.identification.bounds.msm_contribution_bounds(outcomes, logging_propensities, target_propensities_on, target_propensities_off, *, gamma)
¶
Marginal-sensitivity-model bounds on a contribution V(pi_on) - V(pi_off).
The off-policy value DIFFERENCE between two target rules, estimated from confounded logs
under Tan's marginal sensitivity model — e.g. a per-agent credit or per-factor contribution
E[Y_{do(F=1)}] - E[Y_{do(F=0)}]. Each arm is bounded by :func:msm_policy_value_bounds
(target_propensities_on = pi_on(a_i | x_i) at the logged action, ..._off likewise,
shared nominal e0) and the contribution interval is the difference
[ on.lower - off.upper , on.upper - off.lower ].
Always valid (it contains the true difference for any targets); sharp when the two
target supports are disjoint — e.g. the deterministic one-hot arms 1{F=1} / 1{F=0}
that partition the logged units, so the two arms' weight perturbations are independent — and
conservative otherwise. Collapses to the difference of the two self-normalised IPS points at
gamma = 1 and widens monotonically with gamma.
Source code in src/causalrl/identification/bounds.py
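The interval arithmetic for the contribution is small enough to show directly. `IntervalSketch` is a hypothetical stand-in for the library's Interval, and the per-arm bounds are made-up numbers rather than outputs of msm_policy_value_bounds.

```python
from typing import NamedTuple


class IntervalSketch(NamedTuple):
    lower: float
    upper: float


def contribution_interval(on, off):
    """Worst-case difference of two intervals: [on.lower - off.upper, on.upper - off.lower]."""
    return IntervalSketch(on.lower - off.upper, on.upper - off.lower)


on_arm = IntervalSketch(0.5, 0.75)   # hypothetical bounds on V(pi_on)
off_arm = IntervalSketch(0.25, 0.5)  # hypothetical bounds on V(pi_off)
contribution = contribution_interval(on_arm, off_arm)

# The sign of the contribution is settled only if zero lies strictly outside the interval.
sign_settled = not (contribution.lower <= 0.0 <= contribution.upper)
```

Here the interval is `(0.0, 0.5)`: it touches zero, so at this sensitivity level the sign is not settled.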
causalrl.identification.bounds.msm_per_step_bounds(rewards_by_step, propensities_by_step, *, gamma)
¶
Per-step marginal-sensitivity-model bounds on a cumulative (summed) reward.
Each element of rewards_by_step / propensities_by_step is one time step's
per-unit rewards r_t and nominal propensities e_t; the cumulative-reward MSM
bound is the sum over steps of the per-step :func:ipw_sensitivity_bounds. This is the
additive (per-step) cumulative-reward MSM: each step is bounded independently under the
sensitivity model and the bounds add, which is tight for sparse / per-step rewards.
Reusable kernel of the per-step cumulative-reward MSM used for confounded multi-step OPE (Bennett & Kallus, Efficient and Sharp OPE in Robust MDPs, NeurIPS 2024). The experiment supplies the per-step nominal propensities; no code is ported.
Source code in src/causalrl/identification/bounds.py
causalrl.identification.bounds.msm_stratified_bounds(values, propensities, strata, target_weights, *, gamma)
¶
Stratified marginal-sensitivity-model bounds: Σ_s w_s · MSM_s.
Compute the MSM bound within each stratum (units sharing a strata label) and combine
them with target_weights (the target stratum marginal w_s, e.g. a uniform initial
state distribution). Strata absent from the data contribute nothing. Because conditioning
removes between-stratum weight variation, the stratified bound is never wider than the
pooled :func:ipw_sensitivity_bounds and never under-covers (THEORY.md, Prop 1).
The reusable kernel of the stratified cumulative-reward MSM; the experiment supplies the stratum labels (e.g. initial state) and nominal propensities.
Source code in src/causalrl/identification/bounds.py
Decision Certificates¶
The decision stack — certify whether a confounded / off-policy decision ("is the treated arm
better than the control arm?") is robust to hidden confounding, cheapest layer first.
certify_decision is the one-call front door over the layers below.
causalrl.identification.decision.certify_decision(outcomes, treated, *, confounder_bins=None, mi_cap=None, propensities=None, gamma_max=10.0)
¶
Certify whether a binary decision from confounded logs is robust to hidden confounding.
outcomes are logged rewards Y_i; treated is the binary arm indicator (1 = the
arm under test, 0 = baseline). The naive decision is the sign of the logged contrast
E[Y | F=1] - E[Y | F=0]. Supply at least one evidence source — they compose:
- confounder_bins (a measured hidden variable Z) or mi_cap (a structural cap on the information channel MI(I; Z)) runs the sign-robustness layer (:func:pivotality_certificate): certifies iff no hidden confounder consistent with that information can flip the contrast's sign. One-sided: failure to certify is not evidence of a flip.
- propensities (the nominal logging propensities e0(a_i | x_i) at the logged action) runs the MSM tipping layer: the smallest Tan odds-ratio Gamma at which the off-policy (IPS) value contrast band first includes zero (:func:tipping_gamma over the sharp one-hot :func:msm_contribution_bounds). tipping_gamma is None means the decision is robust to confounding at least as strong as gamma_max.
With informative propensities the MSM layer concerns the inverse-propensity-weighted off-policy contrast, which coincides with the raw logged contrast only under uniform logging.
Source code in src/causalrl/identification/decision.py
causalrl.identification.decision.DecisionCertificate
¶
Bases: NamedTuple
Result of :func:certify_decision.
certified is the headline: is the naive (logged) decision robust to hidden confounding,
by the strongest layer that ran? When the structural/measured layer ran it carries its
one-sided guarantee (no confounder consistent with the supplied information can flip the
sign); otherwise it reports whether the MSM layer found the decision robust up to gamma_max.
The component fields and summary make the exact guarantee explicit.
Source code in src/causalrl/identification/decision.py
causalrl.identification.bounds.pivotality_certificate(outcomes, treated, confounder_bins=None, *, mi_cap=None)
¶
One-sided sign-robustness certificate for a naive contrast under hidden confounding.
certified=True means: no hidden variable consistent with the supplied information can
flip the sign of E[Y|F=1] - E[Y|F=0]. Two modes:
- confounder_bins given (a measured hidden variable, e.g. post-hoc showdown/oracle data): certify iff the TV-form :func:confounding_bias_bound is strictly below |naive|; mi_measured reports the plug-in channel.
- mi_cap given (a structural cap on MI(I;Z) from the environment's information rules, the data-processing route): certify iff mi_cap < mi_flip computed with the outcome-span relaxation (no Z needed anywhere).
The cheapest layer of the decision stack — certificate, then MSM band
(:func:msm_contribution_bounds), then abstention (:func:tipping_gamma). One-sided:
failure to certify is NOT evidence of a flip. Verified against measured ground truth in
three game-log regimes in experiments/games/theory/verify_pivotality.py.
Source code in src/causalrl/identification/bounds.py
causalrl.identification.bounds.confounding_bias_bound(outcomes, treated, confounder_bins, *, form='tv')
¶
Upper bound on the omitted-variable bias |naive - Z-adjusted| from logged rows.
Omitted-variable bias in total-variation form:
``|bias| <= M1 * TV(P_{Z|F=1}, P_Z) + M0 * TV(P_{Z|F=0}, P_Z)``
with M_f the span over Z-bins of E[Y | F=f, Z] (each signed measure integrates
to zero, so the midrange trick gives the span, not the sup; sharp — attained with equality
by an explicit two-point family at every parameter value). form="mi" applies Pinsker
with the KL budget p*KL1 + (1-p)*KL0 = MI(F;Z) split optimally across arms (sharp
small-MI constant):
``|bias| <= sqrt( MI/2 * (M1^2/p + M0^2/(1-p)) )`` (capped at the trivial ``M1+M0``),
the (looser-than-TV) form whose budget the environment's information structure can cap.
Every stratum must contain both arms (positivity); restrict to an overlap population first —
a logger that conditions hard on Z destroys overlap, which is a finding, not a nuisance.
Source code in src/causalrl/identification/bounds.py
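The TV-form bound can be checked against the realised bias on a tiny log. The sketch below uses plug-in frequencies over parallel Python lists; the function name and the data are hypothetical, and it covers only form="tv".

```python
def bias_and_tv_bound(outcomes, treated, bins):
    """Return (|naive - Z-adjusted|, M1*TV1 + M0*TV0) from logged rows."""
    rows = list(zip(outcomes, treated, bins))
    z_values = sorted(set(bins))
    p_z = {z: sum(1 for _, _, b in rows if b == z) / len(rows) for z in z_values}

    def mean(xs):
        # Raises ZeroDivisionError when a stratum lacks an arm (positivity violated).
        return sum(xs) / len(xs)

    arm_mean, cond_mean, span, tv = {}, {}, {}, {}
    for f in (0, 1):
        arm = [(y, z) for y, t, z in rows if t == f]
        arm_mean[f] = mean([y for y, _ in arm])
        cond_mean[f] = {z: mean([y for y, b in arm if b == z]) for z in z_values}
        p_z_given_f = {z: sum(1 for _, b in arm if b == z) / len(arm) for z in z_values}
        span[f] = max(cond_mean[f].values()) - min(cond_mean[f].values())  # M_f
        tv[f] = 0.5 * sum(abs(p_z_given_f[z] - p_z[z]) for z in z_values)

    naive = arm_mean[1] - arm_mean[0]
    adjusted = sum(p_z[z] * (cond_mean[1][z] - cond_mean[0][z]) for z in z_values)
    return abs(naive - adjusted), span[1] * tv[1] + span[0] * tv[0]


outcomes = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
treated = [1, 1, 1, 0, 1, 0, 0, 0]
bins = [0, 0, 0, 0, 1, 1, 1, 1]
bias, bound = bias_and_tv_bound(outcomes, treated, bins)
```

On this log the naive contrast is 0.5, the realised bias is about 1/6 and the bound is about 1/3. Because the bound is below |naive|, no reweighting over these Z bins can flip the sign.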
causalrl.identification.bounds.mi_flip_threshold(naive, span_treated, span_control, p_treated)
¶
Channel capacity (nats) below which NO hidden confounder can flip the naive sign.
The decision-pivotality threshold (sharp form)
``MI_flip = 2 naive^2 / (M1^2/p + M0^2/(1-p))``:
if the mutual information between the logged binary action and a hidden variable Z is
below this value, the omitted-Z bias is strictly smaller than |naive| (omitted-
variable bias in TV form + Pinsker with the KL budget split optimally across arms), so the
Z-adjusted contrast has the same sign as the naive one. This constant is SHARP: a
binary-channel family attains the corresponding bias bound with ratio -> 1 in the small-MI
limit (THEORY_pivotality.md, tightness section); the additive form
2(|naive|/(M1/sqrt(p)+M0/sqrt(1-p)))^2 is its (looser) Cauchy-Schwarz relaxation.
Combined with the data-processing inequality MI(F;Z) <= MI(I;Z) — the logging actor's
policy reads only its information set I — the information structure of the
environment caps the reachable confounding budget.
span_treated / span_control are the spans of E[Y | F=f, Z=z] over z (the
regression spans M1 / M0); with Z unmeasured, use the outcome span for both
(valid, looser).
Source code in src/causalrl/identification/bounds.py
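A numeric sketch of the threshold and its additive relaxation, written directly from the two formulas above. The function names are hypothetical; `mi_bias_bound_sketch` restates the form="mi" bound from :func:confounding_bias_bound to show that the threshold is where that bound equals |naive|.

```python
import math


def mi_flip_sketch(naive, span_treated, span_control, p_treated):
    """Sharp form: 2 naive^2 / (M1^2/p + M0^2/(1-p)), in nats."""
    denom = span_treated**2 / p_treated + span_control**2 / (1.0 - p_treated)
    return 2.0 * naive**2 / denom


def mi_flip_additive_sketch(naive, span_treated, span_control, p_treated):
    """Additive relaxation: 2 (|naive| / (M1/sqrt(p) + M0/sqrt(1-p)))^2."""
    denom = span_treated / math.sqrt(p_treated) + span_control / math.sqrt(1.0 - p_treated)
    return 2.0 * (abs(naive) / denom) ** 2


def mi_bias_bound_sketch(mi, span_treated, span_control, p_treated):
    """Pinsker-form bias bound, capped at the trivial M1 + M0."""
    inner = span_treated**2 / p_treated + span_control**2 / (1.0 - p_treated)
    return min(math.sqrt(mi / 2.0 * inner), span_treated + span_control)


sharp = mi_flip_sketch(0.5, 1.0, 1.0, 0.5)              # 0.125 nats
additive = mi_flip_additive_sketch(0.5, 1.0, 1.0, 0.5)  # about 0.0625 nats
bias_at_threshold = mi_bias_bound_sketch(sharp, 1.0, 1.0, 0.5)  # equals |naive| = 0.5
```

The additive form gives the smaller threshold, so it certifies fewer decisions than the sharp form.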
causalrl.identification.bounds.tipping_gamma(bound, *, reference=0.0, gamma_max=10.0, tol=0.001)
¶
Sensitivity tipping point: the smallest gamma >= 1 at which the partial-ID interval
bound(gamma) first contains reference.
This is the causal-sensitivity reporting companion to the MSM bound kernels: it answers
"how strong would unmeasured confounding have to be (on the MSM/Rosenbaum odds-ratio scale)
to overturn the conclusion that the estimand lies strictly on one side of reference?".
A larger tipping gamma ⇒ a more robust conclusion — the odds-ratio-scale analog of the
E-value (VanderWeele & Ding, Ann. Intern. Med. 2017) and of Rosenbaum's Gamma.
bound maps a sensitivity level gamma to an :class:Interval; it must collapse to a
point at gamma = 1 and widen monotonically with gamma (as every MSM kernel here does —
e.g. lambda g: msm_contribution_bounds(y, e0, on, off, gamma=g)). Returns 1.0 if the
point already sits on reference, and None if the interval never reaches reference
by gamma_max (the conclusion is robust to confounding at least that strong).
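Because bound(gamma) widens monotonically, the tipping point can be located by bisection. The sketch below assumes the bound returns a `(lower, upper)` pair; `tipping_gamma_sketch` and `toy_bound` are hypothetical, and bisection is one reasonable search strategy, not necessarily the one the library uses.

```python
def tipping_gamma_sketch(bound, reference=0.0, gamma_max=10.0, tol=1e-3):
    """Smallest gamma in [1, gamma_max] whose interval contains `reference`, or None."""

    def contains(gamma):
        lower, upper = bound(gamma)
        return lower <= reference <= upper

    if contains(1.0):
        return 1.0   # the point estimate already sits on the reference
    if not contains(gamma_max):
        return None  # robust to confounding at least as strong as gamma_max
    lo, hi = 1.0, gamma_max  # invariant: excluded at lo, contained at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if contains(mid):
            hi = mid
        else:
            lo = mid
    return hi


def toy_bound(gamma):
    # Hypothetical interval: the point 0.3 at gamma = 1, half-width 0.1 * (gamma - 1).
    half = 0.1 * (gamma - 1.0)
    return (0.3 - half, 0.3 + half)


tip = tipping_gamma_sketch(toy_bound)                    # close to 4.0
robust = tipping_gamma_sketch(toy_bound, gamma_max=3.0)  # None
```

For the toy bound the interval first reaches zero at gamma = 4, so the returned value is within tol of 4.0. Capping the search at gamma_max=3.0 returns None instead.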