04. Bayesian networks (W5)
What's causality?
X causes Y if and only if there is some manipulation of X that leads to a change in the probability distribution of Y (Judea Pearl, 2000; Neapolitan, 2003).
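To make the definition concrete, here is a minimal simulation sketch in Python with NumPy. It assumes a hypothetical linear model in which X causes Y (Y = 2X + noise); the `simulate` helper and its `do_x`/`do_y` parameters are illustrative names, not from the source. Setting X by hand changes the distribution of Y, while setting Y by hand leaves X untouched, which is the asymmetry the definition describes.

```python
import numpy as np


def simulate(n, do_x=None, do_y=None, seed=0):
    """Draw n samples from a toy linear model X -> Y (X ~ N(0, 1), Y = 2X + noise).

    do_x / do_y: if given, that variable is set to a constant, i.e. an
    intervention (manipulation) that overrides its usual mechanism.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n) if do_x is None else np.full(n, float(do_x))
    y = 2.0 * x + rng.normal(size=n)
    if do_y is not None:
        y = np.full(n, float(do_y))  # forcing Y does not feed back into X
    return x, y


# Manipulating X shifts the distribution of Y (difference in means close to 2) ...
_, y_do1 = simulate(100_000, do_x=1.0)
_, y_do0 = simulate(100_000, do_x=0.0)
effect_of_x_on_y = y_do1.mean() - y_do0.mean()

# ... but manipulating Y leaves the distribution of X unchanged (mean stays close to 0).
x_do_y, _ = simulate(100_000, do_y=5.0)
mean_x_under_do_y = x_do_y.mean()
```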
How to discover causality?
The gold standard method: a randomized controlled experiment (RCE).
Reasons why such experiments are often unavailable:
- Infeasible
- Unethical
- Expensive
What's IDA (Intervention calculus when the DAG is Absent)?
IDA provides a methodological framework for inferring causal effects in situations where the causal DAG cannot be fully determined from observational data.
Steps for IDA
- Use the PC algorithm to learn the CPDAG (Completed Partially Directed Acyclic Graph).
- Infer the causal effects, e.g. eff(X2, Y), in all DAGs in the equivalence class.
- Find the lower bound of the causal effects.
Causal inference - IDA
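The IDA steps can be sketched as follows, assuming a linear model with Gaussian noise, where the causal effect of X on Y in a given DAG equals the coefficient of X when Y is regressed on X and the parents of X in that DAG. Learning the CPDAG with PC is not shown: the candidate parent sets of X are passed in directly, and the helper names (`effect_given_parents`, `ida_lower_bound`) are hypothetical. The lower bound is taken here as the smallest absolute effect across the candidates.

```python
import numpy as np


def effect_given_parents(data, x, y, parents):
    """Effect of column x on column y in a DAG where pa(x) = parents (linear model).

    It is the OLS coefficient of x when y is regressed on x and pa(x);
    if y itself is a parent of x, x cannot affect y, so the effect is 0.
    """
    if y in parents:
        return 0.0
    cols = [x] + list(parents)
    design = np.column_stack([np.ones(len(data)), data[:, cols]])
    coef, *_ = np.linalg.lstsq(design, data[:, y], rcond=None)
    return float(coef[1])  # coef[0] is the intercept, coef[1] belongs to x


def ida_lower_bound(data, x, y, candidate_parent_sets):
    """One effect per candidate parent set of x, plus the minimum absolute effect."""
    effects = [effect_given_parents(data, x, y, s) for s in candidate_parent_sets]
    return effects, min(abs(e) for e in effects)


# Toy data from Z -> X, Z -> Y, X -> Y with a true effect of 1.5
# (columns: 0 = X, 1 = Y, 2 = Z).
rng = np.random.default_rng(1)
n = 20_000
zs = rng.normal(size=n)
xs = 0.8 * zs + rng.normal(size=n)
ys = 1.5 * xs + zs + rng.normal(size=n)
data = np.column_stack([xs, ys, zs])

# Hypothetical candidate parent sets of X: all subsets of its neighbours {Y, Z},
# as when every edge of the CPDAG is undirected.
effects, lower_bound = ida_lower_bound(data, 0, 1, [[], [2], [1], [1, 2]])
```

Here the estimate with pa(X) = {Z} recovers the true effect of about 1.5, while the sets containing Y give 0, so the lower bound is 0: with an unoriented CPDAG the data alone cannot rule out that X has no effect on Y.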
Local structure learning algorithms
- A major weakness of the PC algorithm: its time complexity is exponential in the number of variables.
- Local structure learning instead fixes a target variable and finds only the parent and child nodes of that node.
- Both PC-Simple and Hiton-PC have polynomial time complexity in the number of variables (provided the target's parent-child set stays small).
1. PC-Simple algorithm
Steps of Example
Initial: Let the be the target, .
- Test the independence between the target and each variable.
  - are independent of Z.
- Test the independence between the target and each variable given one other variable.
  - is independent of given .
- Test the independence between the target and each variable given two other variables.
  - is independent of given .
- Since , the algorithm terminates.
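A minimal sketch of the PC-Simple procedure in the example: start with every other variable as a candidate, and at level k drop any candidate that becomes independent of the target given some size-k subset of the remaining candidates, stopping once no subset of size k can be formed. The Fisher z partial-correlation test, the `alpha` threshold, and the toy data (D <- A -> T -> B, with C unrelated) are assumptions for illustration.

```python
import math
from itertools import combinations

import numpy as np


def fisher_z_independent(data, i, j, cond, alpha):
    """Fisher z test: True if 'column i independent of column j given cond' is not rejected."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)                  # keep atanh finite
    stat = math.sqrt(len(data) - len(cond) - 3) * abs(math.atanh(r))
    p_value = math.erfc(stat / math.sqrt(2))              # two-sided normal p-value
    return p_value > alpha


def pc_simple(data, target, alpha=0.01):
    """PC-Simple-style search for the parents and children of `target`."""
    pc = [v for v in range(data.shape[1]) if v != target]
    k = 0
    while len(pc) > k:  # stop once no conditioning set of size k can be formed
        previous = list(pc)
        pc = [x for x in previous
              if not any(fisher_z_independent(data, target, x, s, alpha)
                         for s in combinations([v for v in previous if v != x], k))]
        k += 1
    return pc


# Toy data (columns: 0 = T, 1 = A, 2 = B, 3 = C, 4 = D): D <- A -> T -> B, C unrelated.
# The true parents and children of T are A and B.
rng = np.random.default_rng(0)
n = 5_000
a = rng.normal(size=n)
t = a + rng.normal(size=n)
b = t + rng.normal(size=n)
c = rng.normal(size=n)
d = a + rng.normal(size=n)
data = np.column_stack([t, a, b, c, d])
result = pc_simple(data, target=0, alpha=0.001)
```

Because `previous` is fixed within each level, the result at that level does not depend on the order in which candidates are visited.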
2. Hiton-PC algorithm
Steps of Example
Initial: Let the be the target, ,
OPEN is sorted in descending order of association strength with the target.
Set
- Test the independence between and given .
  - They are not independent.
- Test the independence between and given each combination from .
  - is independent of given .
  - Remove from .
- Test the independence between and given each combination from .
  - They are not independent.
- Test the independence between and given each combination from .
  - They are not independent.
- Test the independence between each item from (picking items from left to right) and the target, given combinations of the other items.
  - Test the independence between and given combinations of .
  - Test the independence between and given combinations of .
  - Test the independence between and given combinations of .
  - is independent given .
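The Hiton-PC steps can be sketched in the same style: a forward phase that walks OPEN in descending order of association and admits a variable only if no combination of the current candidate set separates it from the target, then a backward phase that re-tests each admitted variable against combinations of the others. The Fisher z test, `alpha`, the cap `max_k` on conditioning-set size, and the toy data are assumptions; this is a simplified sketch, not a reference implementation.

```python
import math
from itertools import combinations

import numpy as np


def fisher_z_independent(data, i, j, cond, alpha):
    """Fisher z test: True if 'column i independent of column j given cond' is not rejected."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = min(max(r, -0.999999), 0.999999)                  # keep atanh finite
    stat = math.sqrt(len(data) - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2)) > alpha         # two-sided p-value > alpha


def hiton_pc(data, target, alpha=0.01, max_k=3):
    """Hiton-PC-style search for the parents and children of `target`."""
    others = [v for v in range(data.shape[1]) if v != target]
    corr = np.corrcoef(data, rowvar=False)[target]
    # OPEN: variables associated with the target, strongest association first.
    open_list = sorted((v for v in others
                        if not fisher_z_independent(data, target, v, (), alpha)),
                       key=lambda v: abs(corr[v]), reverse=True)

    def separated(x, pool):
        """True if some subset of `pool` (size 1..max_k) makes x independent of the target."""
        return any(fisher_z_independent(data, target, x, s, alpha)
                   for k in range(1, min(len(pool), max_k) + 1)
                   for s in combinations(pool, k))

    # Forward phase: admit each OPEN variable unless the current TPC separates it.
    tpc = []
    for x in open_list:
        if not separated(x, tpc):
            tpc.append(x)
    # Backward phase: re-check every member against combinations of the others.
    for x in list(tpc):
        if separated(x, [v for v in tpc if v != x]):
            tpc.remove(x)
    return tpc


# Toy data (columns: 0 = T, 1 = A, 2 = B, 3 = C, 4 = D): D <- A -> T -> B, C unrelated.
rng = np.random.default_rng(0)
n = 5_000
a = rng.normal(size=n)
t = a + rng.normal(size=n)
b = t + rng.normal(size=n)
c = rng.normal(size=n)
d = a + rng.normal(size=n)
data = np.column_stack([t, a, b, c, d])
result = hiton_pc(data, target=0, alpha=0.001)
```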
Cohort study
Cohorts: groups that share common characteristics but differ in whether they were exposed or not exposed.
Purpose: determine how the exposure causes an outcome.
Measure:
| | Diseased | Healthy |
|---|---|---|
| Exposed | a | b |
| Not Exposed | c | d |
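The notes do not name the measure computed from this 2×2 table; two standard choices for a cohort study are the relative risk, (a/(a+b)) / (c/(c+d)), and the odds ratio, (a·d) / (b·c). A minimal sketch, with hypothetical counts used only for illustration:

```python
def relative_risk(a, b, c, d):
    """Risk of disease among the exposed divided by risk among the unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds of disease among the exposed divided by odds among the unexposed: (a*d)/(b*c)."""
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed become diseased.
rr = relative_risk(20, 80, 10, 90)  # (20/100) / (10/100), about 2.0
or_ = odds_ratio(20, 80, 10, 90)    # (20*90) / (80*10) = 2.25
```

A relative risk above 1 means the disease is more frequent among the exposed; whether that reflects causation still depends on the cohorts being comparable apart from the exposure.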
Limitations of cohort study
- Need to know a hypothesis beforehand
- Domain experts determine the control variables.
- Collect data and test the hypothesis.
- Not for data exploration.
We need:
- To start from a given dataset without any prior hypotheses.
- An automatic method to find and validate hypotheses.
- For data exploration.
Matches
References
- Week 5 Slides from Thuc (SP52023)
- Bayesian theory: an essential skill for Data Science; a data analyst unveils the mystery of Bayes statistics (Issue 426)