import math
from functools import lru_cache

@lru_cache(maxsize=None)
def dag_count(n):
    # Robinson's recurrence for the number of labeled DAGs on n nodes:
    # a(n) = sum_{i=1..n} (-1)^(i+1) * C(n, i) * 2^(i(n-i)) * a(n-i), a(0) = a(1) = 1
    if n <= 1:
        return 1
    return sum((-1) ** (i + 1) * math.comb(n, i)
               * 2 ** (i * (n - i)) * dag_count(n - i)
               for i in range(1, n + 1))
f(2) = 3, f(3) = 25, f(5) = 29281, f(10) ≈ 4.2 × 10^18
How to Search
Suppose we want to determine whether job status (J) has a causal effect on whether someone defaults on a loan (F). For simplicity, we give each variable just two values, as follows:
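For two variables there are only three candidate DAGs to search over, which matches f(2) = 3 above. A minimal sketch enumerating them for J and F (the parent-set representation is an illustrative choice, not part of the notes):

```python
# Each candidate structure maps a node to its set of parents.
candidates = [
    {"J": set(), "F": set()},   # J and F independent
    {"J": set(), "F": {"J"}},   # J -> F : job status influences default
    {"J": {"F"}, "F": set()},   # F -> J : default influences job status
]

for g in candidates:
    edges = [(parent, child) for child, parents in g.items() for parent in parents]
    print(edges if edges else "no edges")
```

A search procedure would score each of these structures against the data and keep the best one.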
How to Score
The Bayesian information criterion (BIC) score is as follows:

BIC(G : D) = ln P(D ∣ P̂, G) − (d/2) ln(m)
m: the number of data items
d: the dimension of the DAG model, i.e., the number of parameters in the model
P̂: the set of maximum likelihood values of the parameters
The BIC score is intuitively appealing because it contains (1) a term that measures how well the model predicts the data when the parameter set is equal to its ML value, and (2) a term that penalizes model complexity. Another nice feature of the BIC is that it does not depend on the prior distribution of the parameters, so there is no need to assess one.
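As an illustration (not from the notes), we can score candidate structures for the two-variable example (J and F) on synthetic binary data, using BIC = ln-likelihood − (d/2) ln(m). The parameter counts are d = 2 for the empty graph (one Bernoulli parameter per variable) and d = 3 for J → F (P(J), plus P(F ∣ J = 0) and P(F ∣ J = 1)):

```python
import math
import random

random.seed(0)
m = 1000
# Synthetic data: J ~ Bernoulli(0.5); F depends strongly on J.
data = []
for _ in range(m):
    j = int(random.random() < 0.5)
    f = int(random.random() < (0.8 if j else 0.2))
    data.append((j, f))

def loglik_bernoulli(xs):
    # ML log-likelihood of i.i.d. binary data (0 * log 0 treated as 0).
    k, n = sum(xs), len(xs)
    ll = 0.0
    for count, p in ((k, k / n), (n - k, 1 - k / n)):
        if count:
            ll += count * math.log(p)
    return ll

js = [j for j, _ in data]
fs = [f for _, f in data]

# Structure 1: J and F independent (d = 2).
ll_indep = loglik_bernoulli(js) + loglik_bernoulli(fs)
bic_indep = ll_indep - (2 / 2) * math.log(m)

# Structure 2: J -> F (d = 3), with a separate Bernoulli for each value of J.
ll_dep = loglik_bernoulli(js)
for jv in (0, 1):
    ll_dep += loglik_bernoulli([f for j, f in data if j == jv])
bic_dep = ll_dep - (3 / 2) * math.log(m)

print(bic_indep, bic_dep)  # with this strongly dependent data, J -> F should score higher
```

The extra parameter of the dependent model costs only (1/2) ln(m) in penalty, which the much better fit easily outweighs on data where F really does depend on J.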
Use statistical tests to evaluate the dependency between variables
Exponential in the number of nodes
Step 1: Correlation
Correlation graph:
Identify correlations between every pair of variables in the dataset.
An edge between two nodes represents a correlated pair.
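Step 1 can be sketched with NumPy's correlation matrix; the toy data, variable names, and the 0.2 cutoff for calling a pair "correlated" are all illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500
# Toy dataset: B and C are both driven by the common cause A.
a = rng.normal(size=m)
b = a + 0.5 * rng.normal(size=m)
c = a + 0.5 * rng.normal(size=m)
X = np.column_stack([a, b, c])
names = ["A", "B", "C"]

corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlations
threshold = 0.2                       # illustrative cutoff
edges = [(names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if abs(corr[i, j]) > threshold]
print(edges)  # all three pairs come out correlated, including (B, C)
```

Note that B and C get an edge even though neither causes the other, which is exactly the situation Step 2 is designed to detect.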
Step 2: Conditional independence tests
To test whether B and C are correlated only because of a common cause A, we use a conditional independence test, I(B, C ∣ A), e.g., partial correlation or the chi-square test.
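A sketch of the partial-correlation version of this test: regress A out of both B and C, then correlate the residuals. On toy data where A is a common cause of B and C, the marginal B–C correlation is high but the partial correlation given A should be near zero (the data-generating setup here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2000
a = rng.normal(size=m)
b = a + 0.5 * rng.normal(size=m)  # B caused by A plus noise
c = a + 0.5 * rng.normal(size=m)  # C caused by A plus noise

def residual(y, x):
    # Residual of least-squares regression of y on x (with intercept).
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_bc = np.corrcoef(b, c)[0, 1]                              # marginal correlation
r_bc_given_a = np.corrcoef(residual(b, a), residual(c, a))[0, 1]  # partial correlation
print(round(r_bc, 2), round(r_bc_given_a, 2))
# high marginal correlation, near-zero partial correlation -> evidence for I(B, C | A)
```

A near-zero partial correlation supports I(B, C ∣ A), so the B–C edge from Step 1 can be removed; for discrete data the analogous move is a chi-square test within each stratum of A.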