import math
from functools import lru_cache

@lru_cache(maxsize=None)
def dag_count(n):
    # Robinson's recurrence for the number of labeled DAGs on n nodes:
    # a(n) = sum_{i=1..n} (-1)^(i+1) * C(n, i) * 2^(i(n-i)) * a(n-i), a(0) = a(1) = 1
    if n <= 1:
        return 1
    return sum((-1) ** (i + 1) * math.comb(n, i)
               * 2 ** (i * (n - i)) * dag_count(n - i)
               for i in range(1, n + 1))
f(2) = 3, f(3) = 25, f(5) = 29281, f(10) ≈ 4.2 × 10^18
How to Search
Suppose we want to determine whether job status (J) has a causal effect on whether someone defaults on a loan (F). For simplicity, we give each variable just two values, as follows:
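For two variables there are only three candidate DAGs to search over, which matches f(2) = 3 above. A minimal sketch enumerating them for J and F (the parent-set representation is an illustrative choice, not part of the notes):

```python
# Each candidate structure maps a node to its set of parents.
candidates = [
    {"J": set(), "F": set()},   # J and F independent
    {"J": set(), "F": {"J"}},   # J -> F : job status influences default
    {"J": {"F"}, "F": set()},   # F -> J : default influences job status
]

for g in candidates:
    edges = [(parent, child) for child, parents in g.items() for parent in parents]
    print(edges if edges else "no edges")
```

A search procedure would score each of these structures against the data and keep the best one.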
How to Score
The Bayesian information criterion (BIC) score is as follows:

BIC(G : D) = ln P(D ∣ P̂, G) − (d/2) ln(m)
m: the number of data items
d: the dimension of the DAG model, i.e., the number of parameters in the model
P̂: the set of maximum likelihood values of the parameters
The BIC score is intuitively appealing because it contains (1) a term that measures how well the model predicts the data when the parameter set is equal to its ML value, and (2) a term that penalizes model complexity. Another nice feature of the BIC is that it does not depend on the prior distribution of the parameters, so there is no need to assess one.
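As an illustration (not from the notes), we can score candidate structures for the two-variable example (J and F) on synthetic binary data, using BIC = ln-likelihood − (d/2) ln(m). The parameter counts are d = 2 for the empty graph (one Bernoulli parameter per variable) and d = 3 for J → F (P(J), plus P(F ∣ J = 0) and P(F ∣ J = 1)):

```python
import math
import random

random.seed(0)
m = 1000
# Synthetic data: J ~ Bernoulli(0.5); F depends strongly on J.
data = []
for _ in range(m):
    j = int(random.random() < 0.5)
    f = int(random.random() < (0.8 if j else 0.2))
    data.append((j, f))

def loglik_bernoulli(xs):
    # ML log-likelihood of i.i.d. binary data (0 * log 0 treated as 0).
    k, n = sum(xs), len(xs)
    ll = 0.0
    for count, p in ((k, k / n), (n - k, 1 - k / n)):
        if count:
            ll += count * math.log(p)
    return ll

js = [j for j, _ in data]
fs = [f for _, f in data]

# Structure 1: J and F independent (d = 2).
ll_indep = loglik_bernoulli(js) + loglik_bernoulli(fs)
bic_indep = ll_indep - (2 / 2) * math.log(m)

# Structure 2: J -> F (d = 3), with a separate Bernoulli for each value of J.
ll_dep = loglik_bernoulli(js)
for jv in (0, 1):
    ll_dep += loglik_bernoulli([f for j, f in data if j == jv])
bic_dep = ll_dep - (3 / 2) * math.log(m)

print(bic_indep, bic_dep)  # with this strongly dependent data, J -> F should score higher
```

The extra parameter of the dependent model costs only (1/2) ln(m) in penalty, which the much better fit easily outweighs on data where F really does depend on J.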
Use statistical tests to evaluate the dependency between variables
Exponential in the number of nodes
Step 1: Correlation
Correlation graph:
Identify correlations between every pair of variables in the dataset.
An edge between two nodes represents a correlated pair.
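Step 1 can be sketched with NumPy's correlation matrix; the toy data, variable names, and the 0.2 cutoff for calling a pair "correlated" are all illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 500
# Toy dataset: B and C are both driven by the common cause A.
a = rng.normal(size=m)
b = a + 0.5 * rng.normal(size=m)
c = a + 0.5 * rng.normal(size=m)
X = np.column_stack([a, b, c])
names = ["A", "B", "C"]

corr = np.corrcoef(X, rowvar=False)  # pairwise Pearson correlations
threshold = 0.2                       # illustrative cutoff
edges = [(names[i], names[j])
         for i in range(len(names)) for j in range(i + 1, len(names))
         if abs(corr[i, j]) > threshold]
print(edges)  # all three pairs come out correlated, including (B, C)
```

Note that B and C get an edge even though neither causes the other, which is exactly the situation Step 2 is designed to detect.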
Step 2: Conditional independence tests
To test whether B and C are correlated only because of a common cause A, we use a conditional independence test, I(B, C ∣ A), e.g., partial correlation or the chi-square test.
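A sketch of the partial-correlation version of this test: regress A out of both B and C, then correlate the residuals. On toy data where A is a common cause of B and C, the marginal B–C correlation is high but the partial correlation given A should be near zero (the data-generating setup here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m = 2000
a = rng.normal(size=m)
b = a + 0.5 * rng.normal(size=m)  # B caused by A plus noise
c = a + 0.5 * rng.normal(size=m)  # C caused by A plus noise

def residual(y, x):
    # Residual of least-squares regression of y on x (with intercept).
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_bc = np.corrcoef(b, c)[0, 1]                              # marginal correlation
r_bc_given_a = np.corrcoef(residual(b, a), residual(c, a))[0, 1]  # partial correlation
print(round(r_bc, 2), round(r_bc_given_a, 2))
# high marginal correlation, near-zero partial correlation -> evidence for I(B, C | A)
```

A near-zero partial correlation supports I(B, C ∣ A), so the B–C edge from Step 1 can be removed; for discrete data the analogous move is a chi-square test within each stratum of A.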