# Study Guide: Bayes Optimal Classifier & Graphical Models (Bayesian Networks)

**Date:** 2025.11.17

**Topic:** Bayes Error, Graphical Models (Directed), Conditional Independence, and D-separation.

---

### **1. Recap: Bayes Optimal Classifier and Bayes Error**

The lecture begins by revisiting the concept of the **Bayes Optimal Classifier**.

* **Decision Rule:** For a new data point $x_{new}$, the classifier compares the posterior probabilities $P(C_1 | x_{new})$ and $P(C_2 | x_{new})$ and assigns the label with the higher posterior.

* **Optimality:** No other classifier can outperform the Bayes Optimal Classifier; it achieves the theoretical minimum error rate.

#### **Bayes Error (Irreducible Error)**

* **Definition:** Even the optimal classifier makes mistakes, because classes often overlap in the feature space. This inevitable error is called the **Bayes Error**.

* **Cause:** It arises from inherent uncertainty, noise, or ambiguity in the data itself, not from the classifier's limitations.

* **Goal of ML:** The objective of a machine learning algorithm is not to achieve 0% error (impossible when classes overlap) but to approach the Bayes Error as closely as possible.

* **Formula:** Writing $P_i(x) = P(C_i)\,p(x|C_i)$, the risk (expected error) is the integral of the pointwise minimum over the domain:

$$R^* = \int \min[P_1(x), P_2(x)]\, dx$$

If the priors are equal, this is proportional to the area of the overlap region between the two class-conditional densities.

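The risk integral can be checked numerically. Below is a minimal sketch (not from the lecture) that approximates $R^*$ for two 1-D Gaussian class-conditionals with a Riemann sum; the means, variance, priors, and integration limits are illustrative assumptions.

```python
import math

def gauss(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_error(mu1, mu2, sigma, prior1=0.5, dx=1e-3, lo=-10.0, hi=10.0):
    """Riemann-sum approximation of R* = integral of min[P1(x), P2(x)] dx,
    where P_i(x) = P(C_i) p(x | C_i)."""
    total = 0.0
    x = lo
    while x < hi:
        p1 = prior1 * gauss(x, mu1, sigma)
        p2 = (1 - prior1) * gauss(x, mu2, sigma)
        total += min(p1, p2) * dx
        x += dx
    return total

# Equal priors, unit-variance classes centred at 0 and 2: the optimal
# decision boundary sits at x = 1, and analytically R* = Phi(-1) ≈ 0.1587.
print(round(bayes_error(0.0, 2.0, 1.0), 3))
```

Moving the class means further apart shrinks the overlap, and the estimate drops toward zero, matching the intuition that Bayes error is the unavoidable overlap mass.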
---

### **2. Introduction to Graphical Models**

The focus shifts to **Generative Models**, specifically **Graphical Models** (also known as Bayesian Networks).

* **Motivation:**

    * A full multivariate Gaussian model requires estimating a **Covariance Matrix** with $D \times D$ elements.

    * The number of parameters grows quadratically ($O(D^2)$); by symmetry, the covariance matrix has $\frac{D(D+1)}{2}$ distinct parameters.

    * For high-dimensional data (like images with millions of pixels), estimating these parameters requires an enormous amount of data, which is often infeasible.

* **Solution:** Use **Prior Knowledge** to simplify the model. If we know that certain variables are independent, we can set their covariance terms to zero, significantly reducing the number of parameters to learn.

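The parameter counts above can be made concrete. A small sketch (illustrative, not from the lecture) comparing a full covariance matrix to the diagonal one obtained by assuming all features independent:

```python
def gaussian_params(D, independent=False):
    """Number of free parameters of a D-dimensional Gaussian:
    D for the mean, plus D(D+1)/2 for a full symmetric covariance,
    or just D variances if all features are assumed independent."""
    return D + (D if independent else D * (D + 1) // 2)

# Quadratic growth for the full model vs. linear growth under independence.
for D in (2, 10, 1000):
    print(D, gaussian_params(D), gaussian_params(D, independent=True))
```

For $D = 1000$ the full model already needs over 500,000 covariance parameters, while the independence assumption needs only 1,000 variances, which is the whole point of injecting prior knowledge.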
---

### **3. The Chain Rule and Independence**

Graphical models leverage the **Chain Rule of Probability** to decompose a complex joint distribution into simpler conditional probabilities.

* **General Chain Rule:**

$$P(x_1, ..., x_D) = P(x_1)\, P(x_2|x_1)\, P(x_3|x_1, x_2) \cdots P(x_D|x_1, ..., x_{D-1})$$

* **Simplification with Independence:**

If variable $x_3$ depends only on $x_1$ and is conditionally independent of $x_2$ given $x_1$, then $P(x_3|x_1, x_2)$ simplifies to $P(x_3|x_1)$.

* **Structure:** This factorization defines a **Directed Acyclic Graph (DAG)** (or Bayesian Network) where:

    * **Nodes** represent random variables.

    * **Edges (arrows)** represent direct conditional dependencies, usually read as causal influence.

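For binary variables the savings from dropping dependencies can be counted directly: a conditional factor with $k$ parents needs $2^k$ free numbers. A minimal counting sketch (illustrative assumption: all variables binary):

```python
def full_joint_params(D):
    """A joint table over D binary variables needs 2^D - 1 free numbers."""
    return 2 ** D - 1

def factored_params(num_parents):
    """Given the parent count of each node, each factor P(x_i | parents)
    needs 2^{|parents|} free numbers for a binary child."""
    return sum(2 ** k for k in num_parents)

D = 10
# Full chain rule: x_i conditioned on all i-1 predecessors -> same cost
# as the raw joint table.
chain = factored_params(range(D))
# Sparse DAG: one root plus a single parent per remaining node.
sparse = factored_params([0] + [1] * (D - 1))
print(chain, sparse)
```

The full chain-rule factorization costs exactly as much as the joint table ($2^D - 1$), so the savings come entirely from the independence assumptions that delete parents.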
---

### **4. Building a Bayesian Network (Causal Graph)**

The lecture illustrates this with a practical example involving a crying baby.

* **Scenario:** We want to model the causes of a baby crying.

* **Variables:**

    * **Cry:** The observable effect.

    * **Hungry, Sick, Diaper:** Direct causes of crying.

    * **Pororo:** A distractor (e.g., watching a cartoon) that might stop the crying.

* **Dependencies:**

    * "Hungry" and "Sick" are generally independent of each other.

    * "Cry" depends on all three direct causes.

    * "Pororo" either depends on "Cry" (a parent turns on the TV *because* the baby is crying) or influences "Cry" (the cartoon distracts the baby); the direction of the edge encodes which causal story we assume.

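One way to make the factorization concrete is to write the joint as $P(H)\,P(S)\,P(D)\,P(\text{Cry}|H,S,D)$ and check that it normalizes. The lecture only specifies the graph structure; all probabilities below, including the noisy-OR style CPT, are invented purely for illustration.

```python
import itertools

# Marginals of the three independent root causes (made-up numbers).
p_hungry, p_sick, p_diaper = 0.3, 0.1, 0.2

def p_cry_given(h, s, d):
    """Hypothetical noisy-OR style CPT for P(Cry=1 | H, S, D):
    each active cause independently raises the chance of crying,
    with a small 0.05 'leak' probability of spontaneous crying."""
    return 1 - (1 - 0.6 * h) * (1 - 0.8 * s) * (1 - 0.5 * d) * (1 - 0.05)

def joint(h, s, d, c):
    """Factorized joint P(H) P(S) P(D) P(Cry | H, S, D)."""
    p = ((p_hungry if h else 1 - p_hungry)
         * (p_sick if s else 1 - p_sick)
         * (p_diaper if d else 1 - p_diaper))
    pc = p_cry_given(h, s, d)
    return p * (pc if c else 1 - pc)

# The factorized joint still sums to 1 over all 2^4 assignments.
total = sum(joint(*bits) for bits in itertools.product((0, 1), repeat=4))
print(round(total, 10))
```

The factorization needs only 3 marginals plus an 8-entry CPT, instead of the 15 free numbers of a raw joint table over four binary variables.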
---

### **5. The Three Canonical Patterns of Independence**

To understand complex graphs, we decompose them into three fundamental 3-node patterns. These patterns determine whether two variables are independent given some evidence.

#### **1. Tail-to-Tail (Common Cause)**

* **Structure:** $X \leftarrow Z \rightarrow Y$ ($Z$ causes both $X$ and $Y$).

* **Property:** $X$ and $Y$ are dependent in general. However, once $Z$ is observed (given), $X$ and $Y$ become **conditionally independent**.

* **Example:** If $Z$ (the common cause) determines both $X$ and $Y$, knowing $Z$ accounts for their correlation, decoupling $X$ and $Y$.

#### **2. Head-to-Tail (Causal Chain)**

* **Structure:** $X \rightarrow Z \rightarrow Y$ ($X$ causes $Z$, which causes $Y$).

* **Property:** $X$ and $Y$ are dependent in general. If $Z$ is observed, the path is blocked, and $X$ and $Y$ become **conditionally independent**.

* **Example:** $X$ influences $Y$ only through $Z$. If $Z$ is fixed, learning $X$ tells us nothing more about $Y$.

#### **3. Head-to-Head (Common Effect / V-Structure)**

* **Structure:** $X \rightarrow Z \leftarrow Y$ ($X$ and $Y$ both cause $Z$).

* **Property:** **The crucial difference.** $X$ and $Y$ are naturally **independent** (marginal independence). However, if $Z$ is observed (or any of its descendants is observed), they become **dependent** ("explaining away").

* **Example:** $X$ (Hungry) $\rightarrow$ $Z$ (Cry) $\leftarrow$ $Y$ (Sick).

    * Being hungry tells us nothing about being sick (independent).

    * But if we *know* the baby is crying ($Z$ observed), then finding out the baby is hungry ($X$) makes it less likely the baby is sick ($Y$): the causes compete to explain the effect.

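The explaining-away effect in the head-to-head pattern can be verified by brute-force enumeration. Only the structure Hungry $\rightarrow$ Cry $\leftarrow$ Sick comes from the lecture; every number below is invented for illustration.

```python
import itertools

# Marginals of the two independent causes (made-up numbers).
P_H, P_S = 0.3, 0.1

def p_cry(h, s):
    """Hypothetical CPT for P(Cry=1 | Hungry=h, Sick=s), noisy-OR with a leak."""
    return 1 - (1 - 0.6 * h) * (1 - 0.8 * s) * (1 - 0.05)

def joint(h, s, c):
    """P(H, S, C) = P(H) P(S) P(C | H, S): H and S are marginally independent."""
    p = (P_H if h else 1 - P_H) * (P_S if s else 1 - P_S)
    pc = p_cry(h, s)
    return p * (pc if c else 1 - pc)

def p_sick(**ev):
    """P(Sick=1 | evidence) by enumerating the full joint."""
    match = lambda h, s, c: all({'h': h, 's': s, 'c': c}[k] == v
                                for k, v in ev.items())
    states = list(itertools.product((0, 1), repeat=3))
    den = sum(joint(h, s, c) for h, s, c in states if match(h, s, c))
    num = sum(joint(h, s, c) for h, s, c in states if match(h, s, c) and s == 1)
    return num / den

# Marginally, Hungry carries no information about Sick:
print(p_sick(), p_sick(h=1))
# Once Cry is observed, learning Hungry LOWERS P(Sick): explaining away.
print(p_sick(c=1), p_sick(c=1, h=1))
```

With these numbers, $P(\text{Sick}=1)$ and $P(\text{Sick}=1|\text{Hungry}=1)$ coincide, while conditioning on Cry makes the two causes compete, so adding Hungry to the evidence pushes the probability of Sick back down.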
---

### **6. D-Separation**

These rules form the basis of **D-separation** (Directed Separation), a formal criterion for reading conditional independence off any directed graph.

* If every path between two variables is "blocked" by the evidence set, the variables are D-separated (conditionally independent given the evidence).

* A path is blocked if:

    * it contains a chain (head-to-tail) or fork (tail-to-tail) whose middle node **is observed**, or

    * it contains a collider (head-to-head) whose middle node is **not observed** and none of whose descendants is observed.

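The blocking rules above are enough to implement a tiny d-separation checker by path enumeration. This is an illustrative sketch (brute force, suitable only for small graphs), applied to the crying-baby network from Section 4; the lecture does not give an implementation.

```python
def d_separated(parents, x, y, observed):
    """True iff x and y are d-separated given `observed`.
    `parents` maps each node to the set of its parents (a DAG)."""
    nodes = set(parents)
    children = {n: {m for m in nodes if n in parents[m]} for n in nodes}

    def descendants(n):
        seen, stack = set(), [n]
        while stack:
            for c in children[stack.pop()] - seen:
                seen.add(c)
                stack.append(c)
        return seen

    def all_paths(cur, visited):
        """Every undirected simple path from cur to y."""
        if cur == y:
            yield visited + [cur]
            return
        for nxt in (parents[cur] | children[cur]) - set(visited) - {cur}:
            yield from all_paths(nxt, visited + [cur])

    def blocked(path):
        for a, m, b in zip(path, path[1:], path[2:]):
            if a in parents[m] and b in parents[m]:
                # Collider a -> m <- b: blocked unless m or a descendant is observed.
                if m not in observed and not (descendants(m) & observed):
                    return True
            elif m in observed:
                # Chain or fork: blocked when the middle node is observed.
                return True
        return False

    return all(blocked(p) for p in all_paths(x, []))

# Crying-baby network: three independent causes, one effect, one distractor.
baby = {
    'Hungry': set(), 'Sick': set(), 'Diaper': set(),
    'Cry': {'Hungry', 'Sick', 'Diaper'},
    'Pororo': {'Cry'},
}
print(d_separated(baby, 'Hungry', 'Sick', set()))     # marginally independent
print(d_separated(baby, 'Hungry', 'Sick', {'Cry'}))   # explaining away: dependent
```

Observing a descendant of the collider (Pororo) also unblocks the path, exactly as the head-to-head rule in Section 5 predicts.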