Study Guide: Bayesian Networks & Probabilistic Inference

Date: 2025.12.01 (Final Lecture)
Topic: Bayesian Networks, Probabilistic Inference Examples, Marginalization


1. Recap: Directed vs. Undirected Models

The lecture begins by briefly contrasting the two types of graphical models discussed:

  • Undirected Graphs (MRF): Use potential functions (\psi) defined on maximal cliques; require a normalization constant (the partition function Z) to form a valid probability distribution (written out explicitly below this list).
  • Directed Graphs (Bayesian Networks): Use conditional probability distributions (CPDs). The joint distribution is the product of local conditional probabilities. P(X) = \prod_{i} P(x_i | \text{parents}(x_i))
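For reference, the undirected factorization mentioned in the first item can be written out in the standard MRF (Gibbs) form, where C ranges over the maximal cliques; this notation is added here for completeness and was not spelled out in the notes:

    P(X) = \frac{1}{Z} \prod_{C} \psi_C(x_C), \qquad Z = \sum_{x} \prod_{C} \psi_C(x_C)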

2. Example 1: The "Alarm" Network (Burglary/Earthquake)

This is a classic example used to demonstrate inference in Bayesian Networks.

Scenario & Structure

  • Nodes:
    • B: Burglary (Parent, no prior causes).
    • E: Earthquake (Parent, no prior causes).
    • A: Alarm (Triggered by Burglary or Earthquake).
    • J: JohnCalls (Triggered by Alarm).
    • M: MaryCalls (Triggered by Alarm).
  • Dependencies: B \rightarrow A \leftarrow E, A \rightarrow J, A \rightarrow M.
  • Probabilities (Given):
    • P(B) = 0.05, P(E) = 0.1.
    • P(A|B, E): Table given (e.g., P(A|B, \neg E) = 0.85, P(A|\neg B, \neg E) = 0.05, etc.).
    • P(J|A) = 0.7, P(M|A) = 0.8.

Task 1: Calculate a Specific Joint Probability

Calculate the probability of the event: Burglary, No Earthquake, Alarm rings, John calls, Mary does not call.

P(B, \neg E, A, J, \neg M)
  • Decomposition: Apply the Chain Rule based on the graph structure: = P(B) \cdot P(\neg E) \cdot P(A | B, \neg E) \cdot P(J | A) \cdot P(\neg M | A)
  • Calculation: = 0.05 \times 0.9 \times 0.85 \times 0.7 \times 0.2 = 0.005355 (see the sketch below)
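A minimal sketch of this product in Python (variable names are mine; the numbers are exactly the ones quoted above):

```python
# Joint probability P(B, ~E, A, J, ~M) via the chain-rule factorization
# implied by the graph B -> A <- E, A -> J, A -> M.
p_B            = 0.05       # P(B)
p_not_E        = 1 - 0.1    # P(~E) = 1 - P(E)
p_A_given_B_nE = 0.85       # P(A | B, ~E)
p_J_given_A    = 0.7        # P(J | A)
p_notM_given_A = 1 - 0.8    # P(~M | A) = 1 - P(M | A)

joint = p_B * p_not_E * p_A_given_B_nE * p_J_given_A * p_notM_given_A
print(f"{joint:.6f}")  # 0.005355
```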

Task 2: Inference (Conditional Probability)

Calculate the probability that a Burglary occurred, given that John called and Mary did not call.

P(B | J, \neg M)
  • Formula (Bayes Rule):

    P(B | J, \neg M) = \frac{P(B, J, \neg M)}{P(J, \neg M)}
  • Numerator Calculation (P(B, J, \neg M)): We must marginalize out the unknown variables (A and E) from the joint distribution.

    P(B, J, \neg M) = \sum_{A \in \{T,F\}} \sum_{E \in \{T,F\}} P(B, E, A, J, \neg M)

    This involves summing 4 terms (combinations of A and E).

  • Denominator Calculation (P(J, \neg M)): We further marginalize out B from the numerator result.

    P(J, \neg M) = P(B, J, \neg M) + P(\neg B, J, \neg M)
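The same enumeration can be sketched in Python. The CPT entries marked "assumed" below are placeholders for rows not reproduced in these notes (only P(A|B, \neg E) and P(A|\neg B, \neg E) were quoted above); substitute the full tables from the lecture before trusting the numerical answer.

```python
from itertools import product

# Priors and CPTs. Entries marked "assumed" are placeholders, not lecture values.
P_B = {True: 0.05, False: 0.95}
P_E = {True: 0.10, False: 0.90}
P_A_true = {                     # P(A=True | B, E)
    (True,  True):  0.95,        # assumed
    (True,  False): 0.85,        # from lecture
    (False, True):  0.30,        # assumed
    (False, False): 0.05,        # from lecture
}
P_J_true = {True: 0.70, False: 0.05}   # P(J=True | A); A=False entry assumed
P_M_true = {True: 0.80, False: 0.10}   # P(M=True | A); A=False entry assumed

def joint(b, e, a, j, m):
    """Chain-rule joint P(b, e, a, j, m) following the graph structure."""
    pa = P_A_true[(b, e)] if a else 1 - P_A_true[(b, e)]
    pj = P_J_true[a] if j else 1 - P_J_true[a]
    pm = P_M_true[a] if m else 1 - P_M_true[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Numerator P(B, J, ~M): marginalize the hidden variables A and E (4 terms).
num = sum(joint(True, e, a, True, False)
          for a, e in product([True, False], repeat=2))
# Denominator P(J, ~M): additionally marginalize B.
den = num + sum(joint(False, e, a, True, False)
                for a, e in product([True, False], repeat=2))
print(num / den)   # P(B | J, ~M)
```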

3. Example 2: 4-Node Tree Structure

A simpler example to demonstrate how sums simplify during marginalization.

Scenario & Structure

  • Nodes: X_1, X_2, X_3, X_4 \in \{0, 1\} (Binary).
  • Dependencies:
    • X_1 \rightarrow X_2
    • X_2 \rightarrow X_3
    • X_2 \rightarrow X_4
  • Decomposition: P(X) = P(X_1)P(X_2|X_1)P(X_3|X_2)P(X_4|X_2).
  • Given Tables: Probabilities for all priors and conditionals are provided.

Task: Calculate the Marginal Probability P(X_3 = 1)

We need to find the probability of X_3=1 regardless of the other variables.

  • Definition: Sum the joint probability over all other variables (X_1, X_2, X_4).

    P(X_3=1) = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1, x_2, x_3=1, x_4)
  • Step 1: Expand using Graph Structure

    = \sum_{x_1} \sum_{x_2} \sum_{x_4} P(x_1)P(x_2|x_1)P(X_3=1|x_2)P(x_4|x_2)
  • Step 2: Simplify (Key Insight). Push each summation sign as far to the right as possible; the sum over x_4 only affects the last term P(x_4|x_2).

    = \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2) \left[ \sum_{x_4} P(x_4|x_2) \right]
    • Property: \sum_{x_4} P(x_4|x_2) = 1 (the probabilities of a variable, given any fixed condition, always sum to 1).
    • Therefore, the X_4 term vanishes. This makes sense intuitively: X_4 is a leaf node that is not an ancestor of X_3, so marginalizing over it changes nothing once X_2 is accounted for.
  • Step 3: Final Calculation We are left with summing over X_1 and X_2:

    = \sum_{x_1} \sum_{x_2} P(x_1)P(x_2|x_1)P(X_3=1|x_2)

    This expands to 4 terms (combinations of x_1 \in \{0,1\} and x_2 \in \{0,1\}).
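A small Python sketch of this final four-term sum. The CPT numbers here are invented placeholders, since the actual tables given in the lecture are not reproduced in these notes; only the structure of the sum is the point.

```python
# Marginal P(X3 = 1) for the tree X1 -> X2 -> {X3, X4}.
# All probability values below are illustrative placeholders (assumed).
P_X1 = {1: 0.6, 0: 0.4}              # P(X1 = x1)
P_X2_given_X1 = {1: 0.7, 0: 0.2}     # P(X2 = 1 | X1 = x1)
P_X3_given_X2 = {1: 0.9, 0: 0.3}     # P(X3 = 1 | X2 = x2)

# The sum over x4 has already collapsed to 1, so only X1 and X2 remain (4 terms).
p_x3_is_1 = sum(
    P_X1[x1]
    * (P_X2_given_X1[x1] if x2 == 1 else 1 - P_X2_given_X1[x1])
    * P_X3_given_X2[x2]
    for x1 in (0, 1)
    for x2 in (0, 1)
)
print(p_x3_is_1)
```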


4. Semester Summary & Conclusion

The lecture closes with a summary of the semester's material.

  • Key Themes Covered:
    • Discriminative vs. Generative Methods: The fundamental difference in approach (boundary vs. distribution).
    • Objective Functions: Designing Loss functions vs. Likelihood functions.
    • Optimization: Parameter estimation via derivatives (MLE).
    • Graphical Models: Reducing parameter complexity using independence assumptions (Bayes Nets, MRFs).
  • Final Exam: Scheduled for Thursday, December 11th. It will cover the concepts discussed, focusing on understanding the fundamentals (e.g., Likelihood, Generative principles) rather than rote memorization.