# Study Guide: Generative Methods & Multivariate Gaussian Distributions
**Date:** 2025.12.01

**Topic:** Generative vs. Discriminative Models, Multivariate Gaussian Properties, Conditional and Marginal Distributions.

---
### **1. Generative vs. Discriminative Methods**
The lecture begins by contrasting the new topic (Generative Methods) with the Discriminative Methods covered previously: Linear Regression, Logistic Regression, and SVM.

* **Discriminative Methods (Separating):**
    * These methods focus on finding a boundary (a separating line or hyperplane) between classes.
    * **Limitation:** They cannot generate new data samples because they do not model the data distribution; they only learn the boundary.
    * **Hypothesis:** They assume a linear function (a line or hyperplane) as the hypothesis that separates the data.

* **Generative Methods (Inferring Distribution):**
    * **Goal:** To infer the **underlying distribution** (the rule or pattern) from which the data samples were drawn.
    * **Assumption:** Data is not arbitrary noise; each sample is drawn from a specific probabilistic structure (an underlying distribution).
    * **Capabilities:** Once the Joint Probability Distribution (the underlying distribution) is known, two things become possible (sketched below):
        1. **Classification:** Can be performed using Bayes' Rule.
        2. **Generation:** New samples can be created that follow the same patterns as the training data (e.g., generating new images or text).
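To make these two capabilities concrete, here is a minimal sketch (not from the lecture) of a generative classifier that assumes one Gaussian per class; the class parameters, priors, and the helper functions `classify` and `generate` are hypothetical. Classification applies Bayes' Rule, $P(y \mid x) \propto P(x \mid y)P(y)$, and generation samples from the class-conditional distribution.

```python
# Minimal sketch: a generative model with one Gaussian per class (hypothetical parameters).
import numpy as np
from scipy.stats import multivariate_normal

# Assumed class-conditional distributions P(x | y) and priors P(y).
classes = {
    0: {"mean": np.array([0.0, 0.0]), "cov": np.eye(2), "prior": 0.6},
    1: {"mean": np.array([3.0, 3.0]), "cov": np.eye(2), "prior": 0.4},
}

def classify(x):
    """Bayes' Rule: P(y | x) is proportional to P(x | y) * P(y)."""
    scores = {y: multivariate_normal.pdf(x, c["mean"], c["cov"]) * c["prior"]
              for y, c in classes.items()}
    return max(scores, key=scores.get)

def generate(y, n=5, seed=0):
    """Generation: draw new samples from the class-conditional distribution."""
    rng = np.random.default_rng(seed)
    c = classes[y]
    return rng.multivariate_normal(c["mean"], c["cov"], size=n)

print(classify(np.array([2.5, 2.8])))  # likely class 1
print(generate(1, n=3))                # three new samples resembling class 1
```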
---
### **2. The Gaussian (Normal) Distribution**
The Gaussian distribution is the most popular choice of hypothesis for the underlying distribution in generative models.
#### **Why Gaussian?**
1. **Simplicity:** Defined entirely by two parameters: the Mean ($\mu$) and the Covariance ($\Sigma$).
2. **Central Limit Theorem:** Sums of many independent random variables tend to follow a Gaussian distribution.
3. **Mathematical "Closure":** The most critical reason for its use in AI is that the **Conditional** and **Marginal** distributions of a Multivariate Gaussian are *also* Gaussian.
#### **Multivariate Gaussian Definition**
For a $D$-dimensional vector $x$:

$$P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

* $\mu$: Mean vector ($D$-dimensional).
* $\Sigma$: Covariance Matrix ($D \times D$).
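As a quick check of the formula above, the sketch below (with hypothetical values for $\mu$, $\Sigma$, and $x$) evaluates the density directly in NumPy and compares it against `scipy.stats.multivariate_normal`.

```python
# Evaluate the multivariate Gaussian density directly and compare with SciPy.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])                 # mean vector (D = 2), hypothetical values
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # covariance matrix (D x D)
x = np.array([0.5, -1.5])                  # point at which to evaluate P(x)

D = len(mu)
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff  # (x - mu)^T Sigma^{-1} (x - mu)
norm_const = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
p_manual = np.exp(-0.5 * quad) / norm_const

p_scipy = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
print(p_manual, p_scipy)                   # the two values should agree
```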
[Image of multivariate gaussian distribution 3d plot]
#### **Properties of the Covariance Matrix ($\Sigma$)**
* **Symmetric:** $\Sigma_{ij} = \Sigma_{ji}$.
* **Positive Definite:** All eigenvalues are positive.
* **Diagonal Terms:** Represent the variance of each individual variable.
* **Off-Diagonal Terms:** Represent the covariance (and hence the correlation) between pairs of variables.
    * If $\sigma_{12} = 0$, the variables are **independent** (for Gaussians).
* The shape of the matrix determines the geometry of the distribution's contours (spherical vs. elliptical), as illustrated in the sketch below.
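The sketch below uses two made-up covariance matrices to verify these properties numerically (symmetry, positive eigenvalues) and to show how a nonzero off-diagonal term turns spherical contours into tilted ellipses: the eigenvectors of $\Sigma$ give the contour axes and the square roots of the eigenvalues give their lengths.

```python
# Check covariance-matrix properties and contour geometry for two example matrices.
import numpy as np

Sigma_spherical = np.array([[1.0, 0.0],
                            [0.0, 1.0]])   # diagonal, equal variances -> circular contours
Sigma_elliptical = np.array([[2.0, 0.8],
                             [0.8, 1.0]])  # nonzero covariance -> tilted elliptical contours

for name, S in [("spherical", Sigma_spherical), ("elliptical", Sigma_elliptical)]:
    print(name)
    print("  symmetric:", np.allclose(S, S.T))             # Sigma_ij == Sigma_ji
    eigenvalues = np.linalg.eigvalsh(S)                     # eigvalsh: for symmetric matrices
    print("  positive definite:", np.all(eigenvalues > 0))  # all eigenvalues positive
    # sqrt(eigenvalues) gives the relative lengths of the contour (ellipse) axes:
    # equal lengths -> spherical contours, unequal lengths -> elliptical contours.
    print("  contour axis lengths ~", np.sqrt(eigenvalues))
```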
---
### **3. Independence and Factorization**
If the Covariance Matrix is **diagonal** (all off-diagonal elements are 0), the variables are independent.

* Mathematically, the inverse matrix $\Sigma^{-1}$ is also diagonal.
* The joint probability then factorizes into the product of the marginals:

$$P(x_1, x_2) = P(x_1)P(x_2)$$

* The quadratic form inside the exponential splits into a sum of separate squared terms (one per variable), so the exponential factors into a product of one-dimensional Gaussians; see the numerical check below.
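A minimal numerical check of this factorization, assuming a hypothetical diagonal covariance matrix: the joint density at a test point should equal the product of the two one-dimensional marginal densities.

```python
# Verify P(x1, x2) = P(x1) * P(x2) when the covariance matrix is diagonal.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0])
Sigma = np.diag([2.0, 0.5])          # diagonal covariance -> independent variables
x = np.array([1.5, -1.0])            # arbitrary test point

p_joint = multivariate_normal.pdf(x, mean=mu, cov=Sigma)
p_product = (norm.pdf(x[0], loc=mu[0], scale=np.sqrt(Sigma[0, 0]))
             * norm.pdf(x[1], loc=mu[1], scale=np.sqrt(Sigma[1, 1])))

print(p_joint, p_product)            # should match up to floating-point error
```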
---
### **4. Conditional Gaussian Distribution**
The lecture derives what happens when we observe a subset of variables (e.g., $x_2$) and want to determine the distribution of the remaining variables ($x_1$). This is $P(x_1 | x_2)$.
* **Concept:** Visually, this is equivalent to "slicing" the joint distribution at a specific, fixed value of $x_2$.
* **Result:** The resulting cross-section is **also a Gaussian distribution**.
* **Parameters:** If we partition $x$, $\mu$, and $\Sigma$ into blocks corresponding to $x_1$ and $x_2$, the conditional mean ($\mu_{1|2}$) and covariance ($\Sigma_{1|2}$) are given by:
    * **Mean:** $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$
    * **Covariance:** $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$

*(Note: The derivation identifies the Gaussian form by completing the square in the exponent; a small numerical sketch of these formulas follows below.)*
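A minimal sketch of these formulas with a hypothetical two-dimensional joint Gaussian, partitioned so that $x_1$ and $x_2$ are each scalar; with higher-dimensional blocks, the scalar divisions below become matrix products involving the inverse of $\Sigma_{22}$, but the structure is unchanged.

```python
# Conditional Gaussian P(x1 | x2): compute mu_{1|2} and Sigma_{1|2} from the joint blocks.
import numpy as np

mu = np.array([0.0, 1.0])            # [mu_1, mu_2]
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])       # [[S11, S12], [S21, S22]]
x2_observed = 2.0                    # observed value of x2

S11, S12 = Sigma[0, 0], Sigma[0, 1]
S21, S22 = Sigma[1, 0], Sigma[1, 1]

mu_cond = mu[0] + S12 / S22 * (x2_observed - mu[1])   # mu_1 + S12 S22^{-1} (x2 - mu_2)
Sigma_cond = S11 - S12 / S22 * S21                    # S11 - S12 S22^{-1} S21

print("conditional mean:", mu_cond)
print("conditional variance:", Sigma_cond)            # <= S11: observing x2 reduces uncertainty
```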
---
### **5. Marginal Gaussian Distribution**
The lecture explains how to find the distribution of a subset of variables ($x_1$) by ignoring the others ($x_2$). This is $P(x_1)$.
* **Concept:** This is equivalent to integrating out the unobserved variables:

$$P(x_1) = \int P(x_1, x_2) dx_2$$

* **Result:** The marginal distribution is **also a Gaussian distribution**.
* **Parameters:** Unlike the conditional case, calculating the marginal parameters is trivial. You simply select the corresponding sub-vector and sub-matrix from the joint parameters (see the sketch below).
    * Mean: $\mu_1$.
    * Covariance: $\Sigma_{11}$.
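The sketch below (a hypothetical three-dimensional joint, with $x_1$ the first two components and $x_2$ the last) extracts $\mu_1$ and $\Sigma_{11}$ and confirms them against the empirical mean and covariance of sampled $x_1$ values.

```python
# Marginal Gaussian P(x1): just take the corresponding sub-vector and sub-matrix.
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.0, 1.0, -1.0])              # x1 = first two components, x2 = last component
Sigma = np.array([[2.0, 0.3, 0.5],
                  [0.3, 1.0, 0.2],
                  [0.5, 0.2, 1.5]])

mu_1 = mu[:2]                                # marginal mean: mu_1
Sigma_11 = Sigma[:2, :2]                     # marginal covariance: Sigma_11

# Empirical check: sample from the joint, keep only x1, and compare moments.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
x1_samples = samples[:, :2]
print(mu_1, x1_samples.mean(axis=0))         # should be close
print(Sigma_11)
print(np.cov(x1_samples, rowvar=False))      # should be close to Sigma_11
```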
### **Summary Table**
| Distribution | Type | Parameters Derived From Joint $(\mu, \Sigma)$ |
| :--- | :--- | :--- |
| **Joint** $P(x)$ | Gaussian | Given as $\mu, \Sigma$ |
| **Conditional** $P(x_1 \| x_2)$ | Gaussian | Complex formula (involves matrix inversion of $\Sigma_{22}$) |
| **Marginal** $P(x_1)$ | Gaussian | Simple subset (extract $\mu_1$ and $\Sigma_{11}$) |

The lecture concludes by emphasizing that understanding these Gaussian properties is essential for the second half of the semester, as they form the basis for probabilistic generative models.