# Study Guide: Generative Methods & Multivariate Gaussian Distributions

**Date:** 2025.12.01

**Topic:** Generative vs. Discriminative Models, Multivariate Gaussian Properties, Conditional and Marginal Distributions.

---

### **1. Generative vs. Discriminative Methods**

The lecture begins by contrasting the new topic (Generative Methods) with previous topics (Discriminative Methods such as Linear Regression, Logistic Regression, and SVM).

* **Discriminative Methods (Separating):**
    * These methods focus on finding a boundary (separating line or hyperplane) between classes.
    * **Limitation:** They cannot generate new data samples because they do not model the data distribution; they only know the boundary.
    * **Hypothesis:** They assume a linear function (a line or hyperplane) as the hypothesis that separates the data.
* **Generative Methods (Inferring the Distribution):**
    * **Goal:** To infer the **underlying distribution** (the rule or pattern) from which the data samples were drawn.
    * **Assumption:** Data is not arbitrary; it follows a specific probabilistic structure (i.e., it is drawn from a distribution).
    * **Capabilities:** Once the joint probability distribution (the underlying distribution) is known:
        1. **Classification:** Can be performed using Bayes' Rule.
        2. **Generation:** New samples can be created that follow the same patterns as the training data (e.g., generating new images or text).

---

### **2. The Gaussian (Normal) Distribution**

The Gaussian distribution is the most popular choice for modeling the "hypothesis" of the underlying distribution in generative models.

#### **Why Gaussian?**

1. **Simplicity:** Defined entirely by two parameters: the mean ($\mu$) and the covariance ($\Sigma$).
2. **Central Limit Theorem:** Sums of many independent random variables tend to follow a Gaussian distribution.
3. **Mathematical "Closure":** The most critical reason for its use in AI is that the **conditional** and **marginal** distributions of a multivariate Gaussian are *also* Gaussian.

#### **Multivariate Gaussian Definition**

For a $D$-dimensional vector $x$:

$$P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

* $\mu$: Mean vector ($D$-dimensional).
* $\Sigma$: Covariance matrix ($D \times D$).

*(Figure: 3D plot of a multivariate Gaussian density.)*

#### **Properties of the Covariance Matrix ($\Sigma$)**

* **Symmetric:** $\Sigma_{ij} = \Sigma_{ji}$.
* **Positive Definite:** All eigenvalues are positive. (In general a covariance matrix is positive *semi*-definite; positive definiteness is required for $\Sigma^{-1}$ and the density above to exist.)
* **Diagonal Terms:** Represent the variance of the individual variables.
* **Off-Diagonal Terms:** Represent the covariance (correlation) between variables.
    * If $\sigma_{12} = 0$, the variables are **independent** (this equivalence holds for Gaussians).
    * The matrix shape determines the geometry of the distribution contours (spherical vs. elliptical).

---

### **3. Independence and Factorization**

If the covariance matrix is **diagonal** (all off-diagonal elements are 0), the variables are independent; see the sketch after this list.

* Mathematically, the inverse matrix $\Sigma^{-1}$ is also diagonal.
* The joint probability factorizes into the product of marginals:
$$P(x_1, x_2) = P(x_1)P(x_2)$$
* The "quadratic form" inside the exponential splits into a sum of separate squared terms.
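To make Sections 2 and 3 concrete, here is a minimal NumPy sketch (not from the lecture) that evaluates the multivariate Gaussian density directly from the formula above and checks the factorization property for a diagonal covariance. The function names (`gaussian_pdf`, `univariate_pdf`) and the specific numbers are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, following the formula in Section 2."""
    D = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad_form = diff @ np.linalg.inv(Sigma) @ diff   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad_form)

def univariate_pdf(xi, mi, var):
    """1-D Gaussian density N(mi, var)."""
    return np.exp(-0.5 * (xi - mi) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Illustrative 2-D Gaussian with a *diagonal* covariance (independent variables).
mu = np.array([1.0, -2.0])
Sigma = np.diag([0.5, 2.0])          # off-diagonal terms are zero
x = np.array([1.3, -1.0])

joint = gaussian_pdf(x, mu, Sigma)
product = univariate_pdf(x[0], mu[0], 0.5) * univariate_pdf(x[1], mu[1], 2.0)

print(joint, product)   # the two values agree: P(x1, x2) = P(x1) P(x2)
```

In practice one would typically call `scipy.stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x)` instead; computing the density by hand is only meant to expose the roles of $|\Sigma|$ and $\Sigma^{-1}$ in the formula.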
---

### **4. Conditional Gaussian Distribution**

The lecture derives what happens when we observe a subset of the variables (e.g., $x_2$) and want the distribution of the remaining variables ($x_1$). This is $P(x_1 | x_2)$.

* **Concept:** Visually, this is equivalent to "slicing" the joint distribution at a specific (fixed) value of $x_2$.
* **Result:** The resulting cross-section is **also a Gaussian distribution**.
* **Parameters:** If we partition $x$, $\mu$, and $\Sigma$ into blocks, the conditional mean ($\mu_{1|2}$) and covariance ($\Sigma_{1|2}$) are given by:
    * **Mean:** $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$
    * **Covariance:** $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$

*(Note: The derivation identifies the Gaussian form by completing the square in the exponent.)*

---

### **5. Marginal Gaussian Distribution**

The lecture explains how to find the distribution of a subset of variables ($x_1$) while ignoring the others ($x_2$). This is $P(x_1)$.

* **Concept:** This is equivalent to integrating out the unobserved variables:
$$P(x_1) = \int P(x_1, x_2)\, dx_2$$
* **Result:** The marginal distribution is **also a Gaussian distribution**.
* **Parameters:** Unlike the conditional case, obtaining the marginal parameters is trivial: simply select the corresponding sub-vector and sub-matrix of the joint parameters.
    * Mean: $\mu_1$.
    * Covariance: $\Sigma_{11}$.

### **Summary Table**

| Distribution | Type | Parameters Derived From Joint $(\mu, \Sigma)$ |
| :--- | :--- | :--- |
| **Joint** $P(x)$ | Gaussian | Given as $\mu, \Sigma$ |
| **Conditional** $P(x_1 \| x_2)$ | Gaussian | Complex formula (involves inverting $\Sigma_{22}$) |
| **Marginal** $P(x_1)$ | Gaussian | Simple subset (extract $\mu_1$ and $\Sigma_{11}$) |

(Both the conditional and the marginal computations are worked through in the numerical sketch at the end of this guide.)

The lecture concludes by emphasizing that understanding these Gaussian properties is essential for the second half of the semester, as they form the basis for probabilistic generative models.
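The following is a minimal NumPy sketch of the conditional and marginal formulas from Sections 4 and 5. The 2-D example, the partition into two 1-D blocks, and the specific numbers are assumptions chosen to keep the arithmetic visible; the same code works for blocks of any size.

```python
import numpy as np

# Illustrative joint parameters for x = (x1, x2), partitioned into blocks of size 1 and 1.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Block partition: indices belonging to x1 and x2.
i1, i2 = [0], [1]
mu1, mu2 = mu[i1], mu[i2]
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]
S22 = Sigma[np.ix_(i2, i2)]

# --- Marginal P(x1): just read off the corresponding blocks (Section 5) ---
marg_mean, marg_cov = mu1, S11

# --- Conditional P(x1 | x2 = a): formulas from Section 4 ---
a = np.array([2.0])                          # observed value of x2
S22_inv = np.linalg.inv(S22)
cond_mean = mu1 + S12 @ S22_inv @ (a - mu2)  # mu_{1|2}
cond_cov  = S11 - S12 @ S22_inv @ S21        # Sigma_{1|2}

print("marginal:    mean", marg_mean, "cov", marg_cov)
print("conditional: mean", cond_mean, "cov", cond_cov)

# Observing x2 shifts the mean but never increases the uncertainty about x1:
assert cond_cov[0, 0] <= marg_cov[0, 0]
```

Note that the conditional covariance $\Sigma_{1|2}$ does not depend on the observed value of $x_2$; only the conditional mean does, which the sketch makes easy to verify by changing `a`.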