# Study Guide: Generative Methods & Multivariate Gaussian Distributions

**Date:** 2025.12.01

**Topic:** Generative vs. Discriminative Models, Multivariate Gaussian Properties, Conditional and Marginal Distributions.

---

### **1. Generative vs. Discriminative Methods**

The lecture begins by contrasting the new topic (Generative Methods) with previous topics (Discriminative Methods such as Linear Regression, Logistic Regression, and SVM).

* **Discriminative Methods (Separating):**
    * These methods focus on finding a boundary (separating line or hyperplane) between classes.
    * **Limitation:** They cannot generate new data samples because they do not model the data distribution; they only know the boundary.
    * **Hypothesis:** They assume a linear function (a line or hyperplane) as the hypothesis that separates the data.
* **Generative Methods (Inferring the Distribution):**
    * **Goal:** To infer the **underlying distribution** (the rule or pattern) from which the data samples were drawn.
    * **Assumption:** Data is not arbitrary; it follows a specific probabilistic structure (i.e., it is drawn from a distribution).
    * **Capabilities:** Once the joint probability distribution (the underlying distribution) is known:
        1. **Classification:** Can be performed using Bayes' Rule.
        2. **Generation:** New samples can be created that follow the same patterns as the training data (e.g., generating new images or text).

---

### **2. The Gaussian (Normal) Distribution**

The Gaussian distribution is the most popular choice for modeling the "hypothesis" of the underlying distribution in generative models.

#### **Why Gaussian?**

1. **Simplicity:** Defined entirely by two parameters: the mean ($\mu$) and the covariance ($\Sigma$).
2. **Central Limit Theorem:** Sums of many independent random variables tend to follow a Gaussian distribution.
3. **Mathematical "Closure":** The most critical reason for its use in AI is that the **conditional** and **marginal** distributions of a multivariate Gaussian are *also* Gaussian.

#### **Multivariate Gaussian Definition**

For a $D$-dimensional vector $x$:

$$P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

* $\mu$: Mean vector ($D$-dimensional).
* $\Sigma$: Covariance matrix ($D \times D$).

*(Figure: 3D plot of a multivariate Gaussian density.)*

#### **Properties of the Covariance Matrix ($\Sigma$)**

* **Symmetric:** $\Sigma_{ij} = \Sigma_{ji}$.
* **Positive Definite:** All eigenvalues are positive. (In general a covariance matrix is positive *semi*-definite; positive definiteness is required for $\Sigma^{-1}$ and the density above to exist.)
* **Diagonal Terms:** Represent the variance of the individual variables.
* **Off-Diagonal Terms:** Represent the covariance (correlation) between variables.
    * If $\sigma_{12} = 0$, the variables are **independent** (this equivalence holds for Gaussians).
    * The matrix shape determines the geometry of the distribution contours (spherical vs. elliptical).

---

### **3. Independence and Factorization**

If the covariance matrix is **diagonal** (all off-diagonal elements are 0), the variables are independent; see the sketch after this list.

* Mathematically, the inverse matrix $\Sigma^{-1}$ is also diagonal.
* The joint probability factorizes into the product of marginals:
$$P(x_1, x_2) = P(x_1)P(x_2)$$
* The "quadratic form" inside the exponential splits into a sum of separate squared terms.
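To make Sections 2 and 3 concrete, here is a minimal NumPy sketch (not from the lecture) that evaluates the multivariate Gaussian density directly from the formula above and checks the factorization property for a diagonal covariance. The function names (`gaussian_pdf`, `univariate_pdf`) and the specific numbers are illustrative assumptions, not part of the lecture material.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, following the formula in Section 2."""
    D = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    quad_form = diff @ np.linalg.inv(Sigma) @ diff   # (x - mu)^T Sigma^{-1} (x - mu)
    return norm_const * np.exp(-0.5 * quad_form)

def univariate_pdf(xi, mi, var):
    """1-D Gaussian density N(mi, var)."""
    return np.exp(-0.5 * (xi - mi) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Illustrative 2-D Gaussian with a *diagonal* covariance (independent variables).
mu = np.array([1.0, -2.0])
Sigma = np.diag([0.5, 2.0])          # off-diagonal terms are zero
x = np.array([1.3, -1.0])

joint = gaussian_pdf(x, mu, Sigma)
product = univariate_pdf(x[0], mu[0], 0.5) * univariate_pdf(x[1], mu[1], 2.0)

print(joint, product)   # the two values agree: P(x1, x2) = P(x1) P(x2)
```

In practice one would typically call `scipy.stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x)` instead; computing the density by hand is only meant to expose the roles of $|\Sigma|$ and $\Sigma^{-1}$ in the formula.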
---

### **4. Conditional Gaussian Distribution**

The lecture derives what happens when we observe a subset of the variables (e.g., $x_2$) and want the distribution of the remaining variables ($x_1$). This is $P(x_1 | x_2)$.

* **Concept:** Visually, this is equivalent to "slicing" the joint distribution at a specific (fixed) value of $x_2$.
* **Result:** The resulting cross-section is **also a Gaussian distribution**.
* **Parameters:** If we partition $x$, $\mu$, and $\Sigma$ into blocks, the conditional mean ($\mu_{1|2}$) and covariance ($\Sigma_{1|2}$) are given by:
    * **Mean:** $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$
    * **Covariance:** $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$

*(Note: The derivation identifies the Gaussian form by completing the square in the exponent.)*

---

### **5. Marginal Gaussian Distribution**

The lecture explains how to find the distribution of a subset of variables ($x_1$) while ignoring the others ($x_2$). This is $P(x_1)$.

* **Concept:** This is equivalent to integrating out the unobserved variables:
$$P(x_1) = \int P(x_1, x_2)\, dx_2$$
* **Result:** The marginal distribution is **also a Gaussian distribution**.
* **Parameters:** Unlike the conditional case, obtaining the marginal parameters is trivial: simply select the corresponding sub-vector and sub-matrix of the joint parameters.
    * Mean: $\mu_1$.
    * Covariance: $\Sigma_{11}$.

### **Summary Table**

| Distribution | Type | Parameters Derived From Joint $(\mu, \Sigma)$ |
| :--- | :--- | :--- |
| **Joint** $P(x)$ | Gaussian | Given as $\mu, \Sigma$ |
| **Conditional** $P(x_1 \| x_2)$ | Gaussian | Complex formula (involves inverting $\Sigma_{22}$) |
| **Marginal** $P(x_1)$ | Gaussian | Simple subset (extract $\mu_1$ and $\Sigma_{11}$) |

(Both the conditional and the marginal computations are worked through in the numerical sketch at the end of this guide.)

The lecture concludes by emphasizing that understanding these Gaussian properties is essential for the second half of the semester, as they form the basis for probabilistic generative models.
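The following is a minimal NumPy sketch of the conditional and marginal formulas from Sections 4 and 5. The 2-D example, the partition into two 1-D blocks, and the specific numbers are assumptions chosen to keep the arithmetic visible; the same code works for blocks of any size.

```python
import numpy as np

# Illustrative joint parameters for x = (x1, x2), partitioned into blocks of size 1 and 1.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

# Block partition: indices belonging to x1 and x2.
i1, i2 = [0], [1]
mu1, mu2 = mu[i1], mu[i2]
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]
S22 = Sigma[np.ix_(i2, i2)]

# --- Marginal P(x1): just read off the corresponding blocks (Section 5) ---
marg_mean, marg_cov = mu1, S11

# --- Conditional P(x1 | x2 = a): formulas from Section 4 ---
a = np.array([2.0])                          # observed value of x2
S22_inv = np.linalg.inv(S22)
cond_mean = mu1 + S12 @ S22_inv @ (a - mu2)  # mu_{1|2}
cond_cov  = S11 - S12 @ S22_inv @ S21        # Sigma_{1|2}

print("marginal:    mean", marg_mean, "cov", marg_cov)
print("conditional: mean", cond_mean, "cov", cond_cov)

# Observing x2 shifts the mean but never increases the uncertainty about x1:
assert cond_cov[0, 0] <= marg_cov[0, 0]
```

Note that the conditional covariance $\Sigma_{1|2}$ does not depend on the observed value of $x_2$; only the conditional mean does, which the sketch makes easy to verify by changing `a`.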