# Lecture Summary: Generative Methods & Probability Review

**Date:** 2025.11.06

**Topic:** Discriminative vs. Generative Models, Probability Theory, Probabilistic Inference, and Gaussian Distributions.

---

### 1. Classification Approaches: Discriminative vs. Generative

The lecture begins by distinguishing between two fundamental approaches to machine learning classification, specifically for binary problems (labels 0 or 1).

#### **Discriminative Methods (e.g., Logistic Regression)**

* **Goal:** Directly model the decision boundary or the conditional probability $P(y|x)$.
* **Mechanism:** Focuses on distinguishing classes. It learns a function that maps inputs $x$ directly to class labels $y$ (a minimal sketch follows after this list).
* **Limitation:** It does not model the underlying distribution of the data itself.
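
For concreteness, here is a minimal sketch of the discriminative approach on a made-up one-dimensional dataset; it assumes scikit-learn is available and is an illustration rather than code from the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up 1-D binary dataset: class 0 centered at -1, class 1 at +1.
rng = np.random.default_rng(0)
x0 = rng.normal(-1.0, 1.0, size=(100, 1))
x1 = rng.normal(+1.0, 1.0, size=(100, 1))
X = np.vstack([x0, x1])
y = np.array([0] * 100 + [1] * 100)

# Discriminative model: fits P(y|x) directly and never models p(x|y).
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[0.5]]))  # [P(y=0|x=0.5), P(y=1|x=0.5)]
```
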
#### **Generative Methods**

* **Goal:** Model the joint probability, i.e., the class-conditional density $P(x|y)$ together with the class prior $P(y)$.
* **Mechanism:** It learns "how the data is generated" for each class.
* **Classification:** To classify a new point, it uses **Bayes' Rule** to invert the probabilities (see the sketch after this list):

$$P(y|x) = \frac{P(x|y)P(y)}{P(x)}$$

* **Advantage:** If you know the generative model, you can solve the classification problem *and* generate new data samples.
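
Below is a minimal sketch of the generative counterpart on made-up data, assuming Gaussian class-conditional densities $p(x|y)$; the posterior is computed exactly by the Bayes' rule formula above, and the fitted model can also sample new points.

```python
import numpy as np
from scipy.stats import norm

# Made-up 1-D training data for each class.
rng = np.random.default_rng(0)
x0 = rng.normal(-1.0, 1.0, size=200)   # samples with label y=0
x1 = rng.normal(+1.0, 1.0, size=200)   # samples with label y=1

# Generative model: fit p(x|y) for each class plus the class prior P(y).
mu0, sigma0 = x0.mean(), x0.std()
mu1, sigma1 = x1.mean(), x1.std()
prior1 = len(x1) / (len(x0) + len(x1))
prior0 = 1.0 - prior1

def posterior_y1(x_new):
    """Bayes' rule: P(y=1|x) = p(x|y=1) P(y=1) / p(x)."""
    joint0 = norm.pdf(x_new, mu0, sigma0) * prior0
    joint1 = norm.pdf(x_new, mu1, sigma1) * prior1
    return joint1 / (joint0 + joint1)   # denominator is p(x)

print(posterior_y1(0.5))

# Because p(x|y) is modeled explicitly, we can also generate new class-1 samples.
print(rng.normal(mu1, sigma1, size=3))
```
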
---

### 2. Probability Theory Review

To understand Generative Methods, a strong foundation in probability is required.

#### **Random Variables**

* **Definition:** A random variable is technically a **function** (mapping) that assigns a real number to each outcome $\omega$ in the sample space $\Omega$.
* **Example:** Tossing a coin 4 times. One outcome is $\omega$ = "HHTH", and the random variable $X(\omega)$ could be "number of heads", so $X(\omega) = 3$ here (simulated in the sketch below).
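
A tiny sketch of the coin-toss example: the sample space, the outcome "HHTH", and the mapping $X(\omega)$ = number of heads. Everything here is illustrative.

```python
import itertools

# Sample space Omega: all 2^4 = 16 outcomes of tossing a coin 4 times.
Omega = ["".join(t) for t in itertools.product("HT", repeat=4)]

# The random variable X maps an outcome (a string) to a real number.
def X(omega):
    return omega.count("H")

print(X("HHTH"))  # 3

# Distribution of X under a fair coin: P(X=k) = (# outcomes with k heads) / 16.
print({k: sum(X(w) == k for w in Omega) / len(Omega) for k in range(5)})
```
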
#### **Probability vs. Probability Density Function (PDF)**

The lecture emphasizes distinguishing between discrete probability ($P$) and continuous density ($p$).

* **Discrete Probability ($P$):** Defined as a ratio of cardinalities (counts) or of areas in set diagrams (e.g., Venn diagrams).
* **Probability Density Function ($p$):** Used for continuous variables.
  * **Properties:** $p(x) \ge 0$ for all $x$, and $\int p(x)\,dx = 1$.
  * **Relationship:** The probability of $x$ falling within a range is the **integral** (area under the curve) of the PDF over that range. The probability of any single point, $P(x = x_0)$, is 0 (see the numerical check after this list).
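
A short numerical check of these properties for a standard normal density, assuming scipy is available; the numbers are only illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

pdf = norm(loc=0.0, scale=1.0).pdf   # standard normal density p(x)

# Property: the density is non-negative and integrates to 1.
total, _ = quad(pdf, -np.inf, np.inf)
print(total)          # ~1.0

# Probability of a range = area under the curve over that range.
p_range, _ = quad(pdf, -1.0, 1.0)
print(p_range)        # ~0.683 = P(-1 <= x <= 1)

# Probability of a single point is the area of a zero-width slice: 0.
p_point, _ = quad(pdf, 0.5, 0.5)
print(p_point)        # 0.0
```
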
#### **Key Statistics**

* **Expectation ($E[x]$):** The mean, i.e., the probability-weighted average of a random variable:

$$E[x] = \int x\, p(x)\, dx$$

* **Covariance:** Measures how the components of the data vary together (its spread); for random vectors this yields the covariance matrix (both statistics are computed numerically in the sketch below):

$$\mathrm{Cov}[x] = E\left[(x - \mu)(x - \mu)^T\right]$$
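
A quick numpy sketch that estimates both quantities from made-up 2-D samples; the sample mean and sample covariance approximate the integrals above.

```python
import numpy as np

# Made-up 2-D samples (rows = samples, columns = dimensions of x).
rng = np.random.default_rng(1)
x = rng.multivariate_normal(mean=[1.0, -2.0],
                            cov=[[2.0, 0.5], [0.5, 1.0]], size=1000)

mu = x.mean(axis=0)                          # sample estimate of E[x]
centered = x - mu
cov = centered.T @ centered / (len(x) - 1)   # estimate of E[(x-mu)(x-mu)^T]

print(mu)    # close to [1, -2]
print(cov)   # close to [[2, 0.5], [0.5, 1]]
print(np.allclose(cov, np.cov(x, rowvar=False)))  # matches numpy's built-in
```
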
---

### 3. The Trinity of Distributions: Joint, Conditional, and Marginal

Understanding the relationship between these three is crucial for probabilistic modeling.

#### **Joint PDF ($p(x_1, x_2)$)**

* This represents the probability density of $x_1$ and $x_2$ occurring together.
* **Importance:** If you know the joint PDF, you know *everything* about the system. You can derive all other distributions (marginal, conditional) from it.

#### **Conditional PDF ($p(x_1 \mid x_2)$)**

* Represents the density of $x_1$ given that $x_2$ is fixed to a specific value.
* Visually, this is like taking a "slice" of the joint distribution's 3D surface at $x_2 = a$ (and renormalizing it).

#### **Marginal PDF ($p(x_1)$)**

* Represents the density of $x_1$ regardless of $x_2$.
* **Calculation:** You "marginalize out" (integrate or sum over) the other variables, as in the sketch after this list.
  * Continuous: $p(x_1) = \int p(x_1, x_2)\, dx_2$.
  * Discrete: Summing the rows or columns of a probability table.
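
A minimal discrete sketch of all three objects, using a made-up 2×3 joint probability table; the numbers are arbitrary.

```python
import numpy as np

# Made-up joint table P(x1, x2): rows index x1 in {0, 1}, columns x2 in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])
assert np.isclose(joint.sum(), 1.0)          # a valid joint sums to 1

# Marginal P(x1): "marginalize out" x2 by summing across columns.
p_x1 = joint.sum(axis=1)                     # [0.40, 0.60]

# Marginal P(x2): sum across rows.
p_x2 = joint.sum(axis=0)                     # [0.35, 0.35, 0.30]

# Conditional P(x1 | x2 = 1): take the "slice" at x2 = 1, then renormalize.
p_x1_given_x2 = joint[:, 1] / p_x2[1]

print(p_x1, p_x2, p_x1_given_x2)
```
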
---

### 4. Probabilistic Inference

**Inference** is defined as calculating a desired probability (e.g., a prediction) starting from the joint probability function, using rules such as Bayes' theorem and marginalization.

#### **Handling Missing Data**

A major practical benefit of generative models (which model the joint PDF) over discriminative models (such as Logistic Regression) is the robust handling of missing data.

* **Scenario:** You have a model predicting disease ($y$) based on Age ($x_1$), Blood Pressure ($x_2$), and Oxygen ($x_3$).
* **Problem:** A patient arrives, but you cannot measure Age ($x_1$). A discriminative model might fail or require imputing the missing value (e.g., guessing the average).
* **Probabilistic Solution:** You integrate (marginalize) out the missing variable $x_1$ from the joint distribution to get the probability based only on the observed data (see the sketch after this list):

$$P(y \mid x_2, x_3) = \frac{\int p(x_1, x_2, x_3, y)\, dx_1}{p(x_2, x_3)}$$
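
A small discrete sketch of this idea with four binary variables and a made-up joint table: the missing variable is summed out and the result renormalized, mirroring the formula above.

```python
import numpy as np

# Made-up joint P(x1, x2, x3, y) over four binary variables (Age bin, Blood
# Pressure bin, Oxygen bin, disease), stored as a 2x2x2x2 table.
rng = np.random.default_rng(2)
joint = rng.random((2, 2, 2, 2))
joint /= joint.sum()                  # normalize into a valid joint distribution

x2_obs, x3_obs = 1, 0                 # observed blood pressure / oxygen values

# Marginalize out the missing variable x1 (axis 0) on the observed slice.
numer = joint[:, x2_obs, x3_obs, :].sum(axis=0)   # plays the role of the integral
p_y_given_obs = numer / numer.sum()               # dividing by P(x2, x3)

print(p_y_given_obs)                  # [P(y=0 | x2, x3), P(y=1 | x2, x3)]
```
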
---

### 5. The Gaussian Distribution

The lecture concludes with a review of the Gaussian (Normal) distribution, the most important distribution in AI/ML.

* **Univariate Gaussian:** Defined by a mean $\mu$ and a variance $\sigma^2$.
* **Multivariate Gaussian:** Defined for a vector $x \in \mathbb{R}^D$ (implemented directly in the sketch below):

$$p(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$

* **Parameters:**
  * $\mu$: Mean vector ($D$-dimensional).
  * $\Sigma$: Covariance matrix ($D \times D$). It must be **symmetric** and **positive definite**.
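
A direct sketch of the density formula above, checked against scipy's implementation; the mean, covariance, and query point are made up.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density exactly as in the formula above."""
    D = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3],         # symmetric and positive definite
                  [0.3, 1.0]])
x = np.array([0.5, 0.5])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```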