03. Bayesian networks (W4)
Bayesian network structure learning
Search and score based methods
- Search over the space of all possible DAGs
- Score each DAG with a scoring function
- The DAG with the highest score is the one that best fits the data
- This is an NP-hard problem: the number of possible DAGs grows super-exponentially in the number of nodes, as the count below shows
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def dag_count(n):
    """Number of DAGs on n labelled nodes (Robinson's recurrence)."""
    if n <= 1:
        return 1
    return sum(
        (-1) ** (i + 1) * math.comb(n, i) * 2 ** (i * (n - i)) * dag_count(n - i)
        for i in range(1, n + 1)
    )
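For example, the first few values (n = 0..5) are:

print([dag_count(n) for n in range(6)])   # [1, 1, 3, 25, 543, 29281]

Even a handful of nodes already makes exhaustive enumeration of DAGs impractical.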

The Bayesian information criterion (BIC) score is as follows:

BIC(G : D) = ln P(D | θ̂_G, G) - (d / 2) ln m

where θ̂_G is the maximum-likelihood (ML) estimate of the parameters, d is the dimension of the model, and m is the number of data points. The dimension is the number of parameters in the model.
The BIC score is intuitively appealing because it contains
(1): a term that shows how well the model predicts the data when the parameters are set to their ML values, and
(2): a term that penalizes model complexity.
Another nice feature of the BIC is that it does not depend on the prior distribution of the parameters, which means there is no need to assess one.
Examples
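As a small worked sketch, the BIC of a candidate structure over complete discrete data can be computed directly from counts. The function below and the toy education/job-status data are illustrative assumptions, not a library API:

import math
from collections import Counter

def bic_score(data, parents):
    """BIC of a discrete Bayesian network structure.

    data:    list of dicts, one per sample, e.g. {"E": 0, "J": 1}
    parents: dict mapping each variable to a tuple of its parent variables
    """
    m = len(data)
    score = 0.0
    for var, pa in parents.items():
        # Counts N(var = x, parents = z) and N(parents = z).
        joint = Counter((row[var], tuple(row[p] for p in pa)) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood at the ML parameters: sum over configs of N_xz * ln(N_xz / N_z).
        for (x, z), n_xz in joint.items():
            score += n_xz * math.log(n_xz / marg[z])
        # Dimension: (number of states - 1) free parameters per observed parent configuration.
        n_states = len({row[var] for row in data})
        d = (n_states - 1) * len(marg)
        score -= 0.5 * d * math.log(m)
    return score

# Toy data: E = education, J = job status (made-up counts).
data = ([{"E": 0, "J": 0}] * 40 + [{"E": 0, "J": 1}] * 10 +
        [{"E": 1, "J": 0}] * 15 + [{"E": 1, "J": 1}] * 35)
print(bic_score(data, {"E": (), "J": ("E",)}))   # structure E -> J
print(bic_score(data, {"E": (), "J": ()}))       # empty graph (no edge)

The structure with the edge E -> J scores higher on this data, since J depends strongly on E: the gain in log-likelihood outweighs the extra complexity penalty.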


Constraint-based approach
- Use statistical (conditional) independence tests to evaluate the dependencies between variables
- The number of tests is, in the worst case, exponential in the number of nodes
Correlation Graph:

- Identify correlations between every pair of variables in the dataset.
- An edge between two nodes indicates that the pair is correlated

Test whether each pair of variables is independent (marginally, or conditionally on other variables) using a statistical test, e.g. partial correlation or the Chi-square test.
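For instance, a Chi-square independence test on a contingency table for education versus job status could look like this (the counts below are made up for illustration):

import numpy as np
from scipy.stats import chi2_contingency

# Illustrative contingency table:
#                 employed  unemployed
# high_school        60         40
# degree             85         15
table = np.array([[60, 40],
                  [85, 15]])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value means we reject independence, so the correlation graph
# would keep an edge between education and job status.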

PC (Peter & Clark) algorithm

steps:
1. Start with a complete undirected graph over all the variables.
2. Remove edges using conditional independence tests, conditioning on subsets of neighbouring variables of increasing size (0, 1, 2, ...); the result is the skeleton, and the separating sets are recorded (a sketch of this phase follows the list).
3. Orient v-structures: for every non-adjacent pair X, Y with a common neighbour Z that is not in their separating set, orient the edges as X -> Z <- Y.
4. Orient as many of the remaining edges as possible using the orientation (Meek) rules, which yields a CPDAG representing the equivalence class of DAGs.
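Below is a minimal sketch of the edge-removal (skeleton) phase, assuming continuous data and a Fisher-z partial-correlation test; the function names and the toy chain data are illustrative, and the orientation phases are omitted:

import numpy as np
from itertools import combinations
from scipy.stats import norm

def fisher_z_independent(data, i, j, cond, alpha=0.05):
    """Test X_i independent of X_j given X_cond via partial correlation (Gaussian assumption)."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)                      # inverse correlation matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = np.clip(r, -0.999999, 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r))              # Fisher z-transform
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    p_value = 2 * (1 - norm.cdf(stat))
    return p_value > alpha                           # cannot reject independence

def pc_skeleton(data, alpha=0.05):
    """Skeleton phase of PC: start complete, remove edges with growing conditioning sets."""
    n_vars = data.shape[1]
    adj = {i: set(range(n_vars)) - {i} for i in range(n_vars)}
    level = 0
    while any(len(adj[i]) - 1 >= level for i in range(n_vars)):
        for i in range(n_vars):
            for j in list(adj[i]):
                others = adj[i] - {j}
                if len(others) < level:
                    continue
                for cond in combinations(others, level):
                    if fisher_z_independent(data, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        level += 1
    return adj

# Toy chain X -> Y -> Z: the X-Z edge should be removed once we condition on Y.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.8 * x + rng.normal(size=2000)
z = 0.8 * y + rng.normal(size=2000)
print(pc_skeleton(np.column_stack([x, y, z])))   # expected: 0-1 and 1-2 adjacent, 0-2 removed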
References
- Week 4 Slides from Thuc (SP52023)
- PC 算法 - 贝叶斯网络与其结构学习算法 (PC algorithm: Bayesian networks and their structure learning algorithms)
