Lecture Summary: Generative Methods & Probability Review

Date: 2025.11.06
Topic: Discriminative vs. Generative Models, Probability Theory, Probabilistic Inference, and Gaussian Distributions.


1. Classification Approaches: Discriminative vs. Generative

The lecture begins by distinguishing between two fundamental approaches to machine learning classification, specifically for binary problems (labels 0 or 1).

Discriminative Methods (e.g., Logistic Regression)

  • Goal: Directly model the decision boundary or the conditional probability P(y|x).
  • Mechanism: Focuses on distinguishing classes. It learns a function that maps inputs x directly to class labels y.
  • Limitation: It does not model the underlying distribution of the data itself.

Generative Methods

  • Goal: Model the joint probability P(x, y), typically via the class-conditional density P(x|y) and the class prior P(y).
  • Mechanism: It learns "how the data is generated" for each class.
  • Classification: To classify a new point, it uses Bayes' Rule to invert the probabilities: P(y|x) = \frac{P(x|y)P(y)}{P(x)}
  • Advantage: If you know the generative model, you can solve the classification problem and generate new data samples.
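
As a minimal sketch of the Bayes'-rule inversion above, the snippet below classifies a 1-D point from an assumed generative model; the priors and Gaussian class-conditional densities are illustrative choices, not values from the lecture.

```python
import numpy as np
from scipy.stats import norm

# Illustrative (assumed) generative model: class priors P(y) and
# 1-D Gaussian class-conditional densities p(x|y).
priors = {0: 0.6, 1: 0.4}
class_conditionals = {0: norm(loc=0.0, scale=1.0),
                      1: norm(loc=2.0, scale=1.5)}

def posterior(x):
    """P(y|x) via Bayes' rule: p(x|y) P(y) / P(x)."""
    unnormalized = {y: class_conditionals[y].pdf(x) * priors[y] for y in priors}
    evidence = sum(unnormalized.values())          # P(x) = sum_y p(x|y) P(y)
    return {y: v / evidence for y, v in unnormalized.items()}

print(posterior(1.0))                              # posterior probabilities sum to 1
# Because the model is generative, new samples can also be drawn, e.g.
# class_conditionals[1].rvs(size=5) generates 5 points from class 1.
```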

2. Probability Theory Review

To understand Generative Methods, a strong foundation in probability is required.

Random Variables

  • Definition: A random variable is technically a function (mapping) that assigns a real number to each outcome \omega in the sample space \Omega.
  • Example: Tossing a coin 4 times. One outcome is the sequence "HHTH", and the random variable X(\omega) could be "number of heads", which equals 3 for this outcome.
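
To make the "random variable as a function" idea concrete, here is a small Python sketch of the coin example; the fair-coin assumption is added only to compute a probability at the end.

```python
from itertools import product

# Sample space: all sequences of 4 coin tosses (16 outcomes).
sample_space = [''.join(seq) for seq in product('HT', repeat=4)]

def X(omega):
    """Random variable: maps an outcome to the number of heads."""
    return omega.count('H')

print(X('HHTH'))    # 3, as in the lecture's example
# Assuming a fair coin, every outcome has probability 1/16, so
# P(X = 3) = #{omega : X(omega) = 3} / 16 = 4/16 = 0.25
print(sum(1 for w in sample_space if X(w) == 3) / len(sample_space))
```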

Probability vs. Probability Density Function (PDF)

The lecture emphasizes distinguishing between discrete probability (P) and continuous density (p).

  • Discrete Probability (P): Defined as a ratio of cardinalities (counts) in discrete sets, often visualized as areas in a Venn diagram.
  • Probability Density Function (p): Used for continuous variables.
    • Properties: p(x) \ge 0 for all x, and \int p(x)dx = 1.
    • Relationship: The probability of x falling within a range is the integral (area under the curve) of the PDF over that range. The probability of any single point, P(x = x_0), is 0.
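
These properties can be checked numerically; the sketch below uses a standard Gaussian as the PDF (an assumed choice) together with SciPy's numerical integration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

pdf = norm(loc=0.0, scale=1.0).pdf    # p(x): standard Gaussian density

total, _ = quad(pdf, -np.inf, np.inf)
print(total)                          # ~1.0: the PDF integrates to 1

prob_range, _ = quad(pdf, 0.0, 1.0)
print(prob_range)                     # ~0.3413: P(0 <= x <= 1) is an area

point, _ = quad(pdf, 1.0, 1.0)
print(point)                          # 0.0: a single point has zero probability
```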

Key Statistics

  • Expectation (E[x]): The mean or weighted average of a random variable. E[x] = \int x p(x) dx
  • Covariance: Measures how the data spreads around the mean and how its components vary together. For vectors, this results in a Covariance Matrix. Cov[x] = E[(x - \mu)(x - \mu)^T]
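
A minimal NumPy sketch of these two statistics, using randomly generated toy data (the data and sample size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))        # 1000 toy samples of a 2-D vector x

mu = X.mean(axis=0)                   # sample estimate of E[x]
centered = X - mu
cov = centered.T @ centered / len(X)  # sample estimate of Cov[x] = E[(x-mu)(x-mu)^T]

print(mu)                             # close to [0, 0]
print(cov)                            # 2x2 covariance matrix, close to the identity
# np.cov(X, rowvar=False) computes the same matrix (with a 1/(N-1) factor).
```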

3. The Trinity of Distributions: Joint, Conditional, and Marginal

Understanding the relationship between these three is crucial for probabilistic modeling.

Joint PDF (P(x_1, x_2))

  • This represents the probability of x_1 and x_2 occurring together.
  • Importance: If you know the Joint PDF, you know everything about the system. You can derive all other probabilities (marginal, conditional) from it.

Conditional PDF (P(x_1 | x_2))

  • Represents the probability of x_1 given that x_2 is fixed to a specific value.
  • Visually, this is like taking a "slice" of the joint distribution's 3D surface at x_2 = a, then renormalizing that slice so it integrates to 1.

Marginal PDF (P(x_1))

  • Represents the probability of x_1 regardless of x_2.
  • Calculation: You "marginalize out" (integrate or sum) the other variables.
    • Continuous: P(x_1) = \int P(x_1, x_2) dx_2.
    • Discrete: Summing rows or columns in a probability table.
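
The following toy example (the table values are made up) derives all three objects from a single discrete joint table:

```python
import numpy as np

# Joint table P(x1, x2): rows index x1, columns index x2; entries sum to 1.
joint = np.array([[0.10, 0.20],
                  [0.30, 0.40]])

# Marginals: sum out the other variable.
P_x1 = joint.sum(axis=1)                  # P(x1) = [0.3, 0.7]
P_x2 = joint.sum(axis=0)                  # P(x2) = [0.4, 0.6]

# Conditional: take the "slice" at x2 = 0 and renormalize it.
P_x1_given_x2 = joint[:, 0] / P_x2[0]     # P(x1 | x2 = 0) = [0.25, 0.75]

print(P_x1, P_x2, P_x1_given_x2)
```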

4. Probabilistic Inference

Inference is defined as calculating a desired probability (e.g., a prediction) starting from the Joint Probability function using rules like Bayes' theorem and marginalization.

Handling Missing Data

A major practical benefit of generative models (Joint PDF modeling) over discriminative models (like Logistic Regression) is robust handling of missing data.

  • Scenario: You have a model predicting disease (y) based on Age (x_1), Blood Pressure (x_2), and Oxygen (x_3).
  • Problem: A patient arrives, but you cannot measure Age (x_1). A discriminative model might fail or require imputing the missing value (e.g., guessing an average).
  • Probabilistic Solution: You integrate (marginalize) out the missing variable x_1 from the joint distribution to get the probability based only on observed data: P(y | x_2, x_3) = \frac{\int p(x_1, x_2, x_3, y) dx_1}{P(x_2, x_3)}.
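
A minimal discrete sketch of this marginalization follows; the joint table is random, purely for illustration, whereas in practice it would come from the learned generative model.

```python
import numpy as np

rng = np.random.default_rng(1)
joint = rng.random((3, 4, 2, 2))      # axes: x1 (Age), x2, x3, y
joint /= joint.sum()                  # normalize so all entries sum to 1

x2_obs, x3_obs = 1, 0                 # measured values; x1 is missing

# Sum (marginalize) over the missing x1, then fix the observed x2, x3.
P_y_x2_x3 = joint.sum(axis=0)[x2_obs, x3_obs]   # P(y, x2, x3)
P_x2_x3 = P_y_x2_x3.sum()                       # P(x2, x3): also sum over y

P_y_given_obs = P_y_x2_x3 / P_x2_x3             # P(y | x2, x3), no imputation
print(P_y_given_obs, P_y_given_obs.sum())       # distribution over y, sums to 1
```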

5. The Gaussian Distribution

The lecture concludes with a review of the Gaussian (Normal) distribution, arguably the most important distribution in AI/ML.

  • Univariate Gaussian: Defined by mean \mu and variance \sigma^2: p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
  • Multivariate Gaussian: Defined for a vector x \in R^D. P(x) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu)\right).
  • Parameters:
    • \mu: Mean vector ($D$-dimensional).
    • \Sigma: Covariance Matrix (D \times D). It must be Symmetric and Positive Definite.
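
A direct implementation of this density is sketched below; the particular \mu and \Sigma are illustrative choices (with \Sigma symmetric and positive definite).

```python
import numpy as np

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # symmetric, positive definite

def gaussian_pdf(x, mu, Sigma):
    """Multivariate Gaussian density, evaluated from the formula above."""
    D = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm_const

x = np.array([0.5, 0.5])
print(gaussian_pdf(x, mu, Sigma))
# scipy.stats.multivariate_normal(mean=mu, cov=Sigma).pdf(x) gives the same value.
```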