# Study Guide: Bayes Optimal Classifier & Graphical Models (Bayesian Networks)

**Date:** 2025.11.17

**Topic:** Bayes Error, Graphical Models (Directed), Conditional Independence, and D-separation.

---

### **1. Recap: Bayes Optimal Classifier and Bayes Error**

The lecture begins by revisiting the concept of the **Bayes Optimal Classifier**.

* **Decision Rule:** For a new data point $x_{new}$, the classifier compares the posterior probabilities $P(C_1 | x_{new})$ and $P(C_2 | x_{new})$ and assigns the label with the higher posterior.

* **Optimality:** No other classifier can outperform the Bayes Optimal Classifier; it achieves the theoretical minimum error rate.

#### **Bayes Error (Irreducible Error)**

* **Definition:** Even the optimal classifier makes mistakes, because classes often overlap in the feature space. This inevitable error is called the **Bayes Error**.

* **Cause:** It arises from inherent uncertainty, noise, or ambiguity in the data itself, not from the classifier's limitations.

* **Goal of ML:** The objective of a machine learning algorithm is not to achieve 0% error (impossible when classes overlap) but to approach the Bayes Error as closely as possible.

* **Formula:** Writing $P_i(x) = P(C_i)\,p(x|C_i)$, the risk (expected error) is the integral of the pointwise minimum over the domain:

$$R^* = \int \min[P_1(x), P_2(x)]\, dx$$

If the priors are equal, this is proportional to the area of the overlap region between the two class-conditional densities.

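The risk integral can be checked numerically. Below is a minimal sketch (not from the lecture) that approximates $R^*$ for two 1-D Gaussian class-conditionals with a Riemann sum; the means, variance, priors, and integration limits are illustrative assumptions.

```python
import math

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_error(mu1, mu2, sigma, prior1=0.5, dx=1e-3, lo=-10.0, hi=10.0):
    """Riemann-sum approximation of R* = integral of min[P1(x), P2(x)] dx,
    where P_i(x) = P(C_i) p(x | C_i)."""
    total = 0.0
    x = lo
    while x < hi:
        p1 = prior1 * gauss(x, mu1, sigma)
        p2 = (1 - prior1) * gauss(x, mu2, sigma)
        total += min(p1, p2) * dx
        x += dx
    return total

# Equal priors, unit-variance classes centred at 0 and 2: the optimal
# decision boundary sits at x = 1, and analytically R* = Phi(-1) ≈ 0.1587.
print(round(bayes_error(0.0, 2.0, 1.0), 3))
```

Moving the class means further apart shrinks the overlap, and the estimate drops toward zero, matching the intuition that Bayes error is the unavoidable overlap mass.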
---

### **2. Introduction to Graphical Models**

The focus shifts to **Generative Models**, specifically **Graphical Models** (also known as Bayesian Networks).

* **Motivation:**

    * A full multivariate Gaussian model requires estimating a **Covariance Matrix** with $D \times D$ elements.

    * The number of parameters grows quadratically ($O(D^2)$); by symmetry, the covariance matrix has $\frac{D(D+1)}{2}$ distinct parameters.

    * For high-dimensional data (like images with millions of pixels), estimating these parameters requires an enormous amount of data, which is often infeasible.

* **Solution:** Use **Prior Knowledge** to simplify the model. If we know that certain variables are independent, we can set their covariance terms to zero, significantly reducing the number of parameters to learn.

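The parameter counts above can be made concrete. A small sketch (illustrative, not from the lecture) comparing a full covariance matrix to the diagonal one obtained by assuming all features independent:

```python
def gaussian_params(D, independent=False):
    """Number of free parameters of a D-dimensional Gaussian:
    D for the mean, plus D(D+1)/2 for a full symmetric covariance,
    or just D variances if all features are assumed independent."""
    return D + (D if independent else D * (D + 1) // 2)

# Quadratic growth for the full model vs. linear growth under independence.
for D in (2, 10, 1000):
    print(D, gaussian_params(D), gaussian_params(D, independent=True))
```

For $D = 1000$ the full model already needs over 500,000 covariance parameters, while the independence assumption needs only 1,000 variances, which is the whole point of injecting prior knowledge.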
---

### **3. The Chain Rule and Independence**

Graphical models leverage the **Chain Rule of Probability** to decompose a complex joint distribution into simpler conditional probabilities.

* **General Chain Rule:**

$$P(x_1, ..., x_D) = P(x_1)\, P(x_2|x_1)\, P(x_3|x_1, x_2) \cdots P(x_D|x_1, ..., x_{D-1})$$

* **Simplification with Independence:**

If variable $x_3$ depends only on $x_1$ and is conditionally independent of $x_2$ given $x_1$, then $P(x_3|x_1, x_2)$ simplifies to $P(x_3|x_1)$.

* **Structure:** This factorization defines a **Directed Acyclic Graph (DAG)** (or Bayesian Network) where:

    * **Nodes** represent random variables.

    * **Edges (arrows)** represent direct conditional dependencies, usually read as causal influence.

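For binary variables the savings from dropping dependencies can be counted directly: a conditional factor with $k$ parents needs $2^k$ free numbers. A minimal counting sketch (illustrative assumption: all variables binary):

```python
def full_joint_params(D):
    """A joint table over D binary variables needs 2^D - 1 free numbers."""
    return 2 ** D - 1

def factored_params(num_parents):
    """Given the parent count of each node, each factor P(x_i | parents)
    needs 2^{|parents|} free numbers for a binary child."""
    return sum(2 ** k for k in num_parents)

D = 10
# Full chain rule: x_i conditioned on all i-1 predecessors -> same cost
# as the raw joint table.
chain = factored_params(range(D))
# Sparse DAG: one root plus a single parent per remaining node.
sparse = factored_params([0] + [1] * (D - 1))
print(chain, sparse)
```

The full chain-rule factorization costs exactly as much as the joint table ($2^D - 1$), so the savings come entirely from the independence assumptions that delete parents.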
---

### **4. Building a Bayesian Network (Causal Graph)**

The lecture illustrates this with a practical example involving a crying baby.

* **Scenario:** We want to model the causes of a baby crying.

* **Variables:**

    * **Cry:** The observable effect.

    * **Hungry, Sick, Diaper:** Direct causes of crying.

    * **Pororo:** A distractor (e.g., watching a cartoon) that might stop the crying.

* **Dependencies:**

    * "Hungry" and "Sick" are generally independent of each other.

    * "Cry" depends on all three direct causes.

    * "Pororo" either depends on "Cry" (a parent turns on the TV *because* the baby is crying) or influences "Cry" (the cartoon distracts the baby); the direction of the edge encodes which causal story we assume.

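One way to make the factorization concrete is to write the joint as $P(H)\,P(S)\,P(D)\,P(\text{Cry}|H,S,D)$ and check that it normalizes. The lecture only specifies the graph structure; all probabilities below, including the noisy-OR style CPT, are invented purely for illustration.

```python
import itertools

# Marginals of the three independent root causes (made-up numbers).
p_hungry, p_sick, p_diaper = 0.3, 0.1, 0.2

def p_cry_given(h, s, d):
    """Hypothetical noisy-OR style CPT for P(Cry=1 | H, S, D):
    each active cause independently raises the chance of crying,
    with a small 0.05 'leak' probability of spontaneous crying."""
    return 1 - (1 - 0.6 * h) * (1 - 0.8 * s) * (1 - 0.5 * d) * (1 - 0.05)

def joint(h, s, d, c):
    """Factorized joint P(H) P(S) P(D) P(Cry | H, S, D)."""
    p = ((p_hungry if h else 1 - p_hungry)
         * (p_sick if s else 1 - p_sick)
         * (p_diaper if d else 1 - p_diaper))
    pc = p_cry_given(h, s, d)
    return p * (pc if c else 1 - pc)

# The factorized joint still sums to 1 over all 2^4 assignments.
total = sum(joint(*bits) for bits in itertools.product((0, 1), repeat=4))
print(round(total, 10))
```

The factorization needs only 3 marginals plus an 8-entry CPT, instead of the 15 free numbers of a raw joint table over four binary variables.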
---

### **5. The Three Canonical Patterns of Independence**

To understand complex graphs, we decompose them into three fundamental 3-node patterns. These patterns determine whether two variables are independent given some evidence.

#### **1. Tail-to-Tail (Common Cause)**

* **Structure:** $X \leftarrow Z \rightarrow Y$ ($Z$ causes both $X$ and $Y$).

* **Property:** $X$ and $Y$ are dependent in general. However, once $Z$ is observed (given), $X$ and $Y$ become **conditionally independent**.

* **Example:** If $Z$ (the common cause) determines both $X$ and $Y$, knowing $Z$ accounts for their correlation, decoupling $X$ and $Y$.

#### **2. Head-to-Tail (Causal Chain)**

* **Structure:** $X \rightarrow Z \rightarrow Y$ ($X$ causes $Z$, which causes $Y$).

* **Property:** $X$ and $Y$ are dependent in general. If $Z$ is observed, the path is blocked, and $X$ and $Y$ become **conditionally independent**.

* **Example:** $X$ influences $Y$ only through $Z$. If $Z$ is fixed, learning $X$ tells us nothing more about $Y$.

#### **3. Head-to-Head (Common Effect / V-Structure)**

* **Structure:** $X \rightarrow Z \leftarrow Y$ ($X$ and $Y$ both cause $Z$).

* **Property:** **The crucial difference.** $X$ and $Y$ are naturally **independent** (marginal independence). However, if $Z$ is observed (or any of its descendants is observed), they become **dependent** ("explaining away").

* **Example:** $X$ (Hungry) $\rightarrow$ $Z$ (Cry) $\leftarrow$ $Y$ (Sick).

    * Being hungry tells us nothing about being sick (independent).

    * But if we *know* the baby is crying ($Z$ observed), then finding out the baby is hungry ($X$) makes it less likely the baby is sick ($Y$): the causes compete to explain the effect.

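The explaining-away effect in the head-to-head pattern can be verified by brute-force enumeration. Only the structure Hungry $\rightarrow$ Cry $\leftarrow$ Sick comes from the lecture; every number below is invented for illustration.

```python
import itertools

# Marginals of the two independent causes (made-up numbers).
P_H, P_S = 0.3, 0.1

def p_cry(h, s):
    """Hypothetical CPT for P(Cry=1 | Hungry=h, Sick=s), noisy-OR with a leak."""
    return 1 - (1 - 0.6 * h) * (1 - 0.8 * s) * (1 - 0.05)

def joint(h, s, c):
    """P(H, S, C) = P(H) P(S) P(C | H, S): H and S are marginally independent."""
    p = (P_H if h else 1 - P_H) * (P_S if s else 1 - P_S)
    pc = p_cry(h, s)
    return p * (pc if c else 1 - pc)

def p_sick(**ev):
    """P(Sick=1 | evidence) by enumerating the full joint."""
    match = lambda h, s, c: all({'h': h, 's': s, 'c': c}[k] == v
                                for k, v in ev.items())
    states = list(itertools.product((0, 1), repeat=3))
    den = sum(joint(h, s, c) for h, s, c in states if match(h, s, c))
    num = sum(joint(h, s, c) for h, s, c in states if match(h, s, c) and s == 1)
    return num / den

# Marginally, Hungry carries no information about Sick:
print(p_sick(), p_sick(h=1))
# Once Cry is observed, learning Hungry LOWERS P(Sick): explaining away.
print(p_sick(c=1), p_sick(c=1, h=1))
```

With these numbers, $P(\text{Sick}=1)$ and $P(\text{Sick}=1|\text{Hungry}=1)$ coincide, while conditioning on Cry makes the two causes compete, so adding Hungry to the evidence pushes the probability of Sick back down.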
---

### **6. D-Separation**

These rules form the basis of **D-separation** (Directed Separation), a formal criterion for reading conditional independence off any directed graph.

* If every path between two variables is "blocked" by the evidence set, the variables are D-separated (conditionally independent given the evidence).

* A path is blocked if:

    * it contains a chain (head-to-tail) or fork (tail-to-tail) whose middle node **is observed**, or

    * it contains a collider (head-to-head) whose middle node is **not observed** and none of whose descendants is observed.

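The blocking rules above are enough to implement a tiny d-separation checker by path enumeration. This is an illustrative sketch (brute force, suitable only for small graphs), applied to the crying-baby network from Section 4; the lecture does not give an implementation.

```python
def d_separated(parents, x, y, observed):
    """True iff x and y are d-separated given `observed`.
    `parents` maps each node to the set of its parents (a DAG)."""
    nodes = set(parents)
    children = {n: {m for m in nodes if n in parents[m]} for n in nodes}

    def descendants(n):
        seen, stack = set(), [n]
        while stack:
            for c in children[stack.pop()] - seen:
                seen.add(c)
                stack.append(c)
        return seen

    def all_paths(cur, visited):
        """Every undirected simple path from cur to y."""
        if cur == y:
            yield visited + [cur]
            return
        for nxt in (parents[cur] | children[cur]) - set(visited) - {cur}:
            yield from all_paths(nxt, visited + [cur])

    def blocked(path):
        for a, m, b in zip(path, path[1:], path[2:]):
            if a in parents[m] and b in parents[m]:
                # Collider a -> m <- b: blocked unless m or a descendant is observed.
                if m not in observed and not (descendants(m) & observed):
                    return True
            elif m in observed:
                # Chain or fork: blocked when the middle node is observed.
                return True
        return False

    return all(blocked(p) for p in all_paths(x, []))

# Crying-baby network: three independent causes, one effect, one distractor.
baby = {
    'Hungry': set(), 'Sick': set(), 'Diaper': set(),
    'Cry': {'Hungry', 'Sick', 'Diaper'},
    'Pororo': {'Cry'},
}
print(d_separated(baby, 'Hungry', 'Sick', set()))     # marginally independent
print(d_separated(baby, 'Hungry', 'Sick', {'Cry'}))   # explaining away: dependent
```

Observing a descendant of the collider (Pororo) also unblocks the path, exactly as the head-to-head rule in Section 5 predicts.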