add to final

This commit is contained in:
2025-12-06 18:32:08 +09:00
parent ac1d2e744d
commit 0fc412e690
21 changed files with 935 additions and 0 deletions

final/1110.md Normal file

@@ -0,0 +1,104 @@
# Study Guide: Generative Methods & Multivariate Gaussian Distributions
**Date:** 2025.12.01
**Topic:** Generative vs. Discriminative Models, Multivariate Gaussian Properties, Conditional and Marginal Distributions.
---
### **1. Generative vs. Discriminative Methods**
The lecture begins by contrasting the new topic (Generative Methods) with previous topics (Discriminative Methods like Linear Regression, Logistic Regression, and SVM).
* **Discriminative Methods (Separating):**
* These methods focus on finding a boundary (separating line or hyperplane) between classes.
* **Limitation:** They cannot generate new data samples because they do not model the data distribution; they only know the boundary.
    * **Hypothesis:** They assume a linear hypothesis (a separating line or hyperplane) that divides the data.
* **Generative Methods (Inferring Distribution):**
* **Goal:** To infer the **underlying distribution** (the rule or pattern) from which the data samples were drawn.
    * **Assumption:** Data is not arbitrary noise; it is assumed to be drawn from an underlying probability distribution with a specific structure.
* **Capabilities:** Once the Joint Probability Distribution (underlying distribution) is known:
        1. **Classification:** Can be performed using Bayes' Rule (see the formula after this list).
2. **Generation:** New samples can be created that follow the same patterns as the training data (e.g., generating new images or text).
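For reference, the classification step combines the learned class-conditional density $P(x \mid y)$ with the class prior $P(y)$ via Bayes' Rule:
$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{\sum_{y'} P(x \mid y')\,P(y')}$$
The class with the highest posterior $P(y \mid x)$ is then predicted.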
---
### **2. The Gaussian (Normal) Distribution**
The Gaussian distribution is the most popular choice for modeling the "hypothesis" of the underlying distribution in generative models.
#### **Why Gaussian?**
1. **Simplicity:** Defined entirely by two parameters: Mean ($\mu$) and Covariance ($\Sigma$).
2. **Central Limit Theorem:** Sums of independent random events tend to follow a Gaussian distribution.
3. **Mathematical "Closure":** The most critical reason for its use in AI is that **Conditional** and **Marginal** distributions of a Multivariate Gaussian are *also* Gaussian.
#### **Multivariate Gaussian Definition**
For a $D$-dimensional vector $x$:
$$P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
* $\mu$: Mean vector ($D$-dimensional).
* $\Sigma$: Covariance Matrix ($D \times D$).
*(Figure: 3-D surface plot of a multivariate Gaussian density.)*
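As a quick sanity check, the density formula above can be evaluated directly with NumPy and compared against `scipy.stats.multivariate_normal`; the mean, covariance, and query point below are made up purely for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x via the direct formula."""
    D = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad)

# Illustrative 2-D parameters (not taken from the lecture)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, 0.5])

print(gaussian_pdf(x, mu, Sigma))             # direct formula
print(multivariate_normal(mu, Sigma).pdf(x))  # SciPy reference, same value
```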
#### **Properties of the Covariance Matrix ($\Sigma$)**
* **Symmetric:** $\Sigma_{ij} = \Sigma_{ji}$.
* **Positive Definite:** All eigenvalues are strictly positive (so $\Sigma^{-1}$ exists and the density above is well defined).
* **Diagonal Terms:** Represent the variance of individual variables.
* **Off-Diagonal Terms:** Represent the correlation (covariance) between variables.
    * If $\Sigma_{12} = 0$, the variables are **independent** (zero covariance implies independence only for Gaussians).
* The matrix shape determines the geometry of the distribution contours (spherical vs. elliptical).
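A minimal numerical check of these properties; the 2×2 covariance matrix is an illustrative assumption, not a lecture example.

```python
import numpy as np

Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])        # illustrative covariance matrix

print(np.allclose(Sigma, Sigma.T))    # symmetric -> True
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)                        # all positive -> positive definite
# Contour geometry: the eigenvectors give the ellipse axis directions and
# sqrt(eigenvalues) the axis lengths; a diagonal Sigma with equal variances
# would give spherical (circular) contours instead.
```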
---
### **3. Independence and Factorization**
If the Covariance Matrix is **diagonal** (all off-diagonal elements are 0), the variables are independent.
* Mathematically, the inverse matrix $\Sigma^{-1}$ is also diagonal.
* The joint probability factorizes into the product of marginals:
$$P(x_1, x_2) = P(x_1)P(x_2)$$
* The "quadratic form" inside the exponential splits into a sum of separate squared terms.
---
### **4. Conditional Gaussian Distribution**
The lecture derives what happens when we observe a subset of variables (e.g., $x_2$) and want to determine the distribution of the remaining variables ($x_1$). This is $P(x_1 | x_2)$.
* **Concept:** Visually, this is equivalent to "slicing" the joint distribution at a specific value of $x_2$ (fixed constant).
* **Result:** The resulting cross-section is **also a Gaussian distribution**.
* **Parameters:** If we partition $x$, $\mu$, and $\Sigma$ into subsets, the conditional mean ($\mu_{1|2}$) and covariance ($\Sigma_{1|2}$) are given by:
* **Mean:** $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$.
* **Covariance:** $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
*(Note: The derivation involves completing the square to identify the Gaussian form).*
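A direct, minimal translation of these two formulas into NumPy; the joint parameters, the observed value, and the 1-D/1-D partition are all illustrative assumptions.

```python
import numpy as np

# Illustrative joint parameters for x = (x1, x2), each one-dimensional here
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x2 = np.array([0.5])                               # observed value of x2

# mu_{1|2} = mu1 + Sigma12 Sigma22^{-1} (x2 - mu2)
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
# Sigma_{1|2} = Sigma11 - Sigma12 Sigma22^{-1} Sigma21
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)

print(mu_cond, Sigma_cond)
```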
---
### **5. Marginal Gaussian Distribution**
The lecture explains how to find the distribution of a subset of variables ($x_1$) by ignoring the others ($x_2$). This is $P(x_1)$.
* **Concept:** This is equivalent to integrating out the unobserved variables:
$$P(x_1) = \int P(x_1, x_2) dx_2$$
* **Result:** The marginal distribution is **also a Gaussian distribution**.
* **Parameters:** Unlike the conditional case, calculating the marginal parameters is trivial. You simply select the corresponding sub-vector and sub-matrix from the joint parameters.
* Mean: $\mu_1$.
* Covariance: $\Sigma_{11}$.
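The corresponding sketch for the marginal is just sub-block extraction; the parameters below are illustrative, and a sampling check confirms that the extracted entries match the empirical mean and variance of $x_1$.

```python
import numpy as np

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Marginal of x1: take the matching entries of mu and Sigma directly
mu_marg = mu[:1]              # mu_1
Sigma_marg = Sigma[:1, :1]    # Sigma_11
print(mu_marg, Sigma_marg)

# Empirical check: sample from the joint and look only at the x1 coordinate
samples = np.random.default_rng(0).multivariate_normal(mu, Sigma, size=100_000)
print(samples[:, 0].mean(), samples[:, 0].var())   # ~ mu_1 and Sigma_11
```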
### **Summary Table**
| Distribution | Type | Parameters Derived From Joint $(\mu, \Sigma)$ |
| :--- | :--- | :--- |
| **Joint** $P(x)$ | Gaussian | Given as $\mu, \Sigma$ |
| **Conditional** $P(x_1 \| x_2)$ | Gaussian | Complex formula (involves matrix inversion of $\Sigma_{22}$) |
| **Marginal** $P(x_1)$ | Gaussian | Simple subset (extract $\mu_1$ and $\Sigma_{11}$) |
The lecture concludes by emphasizing that understanding these Gaussian properties is essential for the second half of the semester, as they form the basis for probabilistic generative models.