add to final

2025-12-06 18:32:08 +09:00
parent ac1d2e744d
commit 0fc412e690
21 changed files with 935 additions and 0 deletions

final/1027.md

@@ -0,0 +1,68 @@
# Large Margin Classifiers and Optimization
**Date:** 2025.10.27
**Topic:** Large Margin Classifiers, Optimization, Margin Definition

---
### 1. Introduction to Robust Classification
The lecture begins by shifting focus from generative methods to discriminative methods, specifically within a **linearly separable setting**.
* **Problem Setting:** The goal is to classify data that can be perfectly separated by a linear boundary (hyperplane).
* **Robustness:** While infinitely many linear classifiers can separate the data, the objective is to find the "best" one. The best classifier is defined as the one that is most **robust**, meaning it generalizes well to new test data and handles potential outliers effectively.
* **Intuition:** A robust classifier places the decision boundary in the middle of the gap between classes, maximizing the distance to the nearest data points.
### 2. Defining the Margin
The concept of the **margin** is introduced to mathematically define robustness.
* **Definition:** The margin is the distance between the decision hyperplane and the closest data points.
* **Hyperplane Equation:** The decision boundary is defined as $w^T x - b = 0$.
* **Support Lines:** To define the margin, we establish two parallel lines passing through the closest data points:
* $w^T x - b = 1$ (for class +1).
* $w^T x - b = -1$ (for class -1).
* The region between these lines contains no data points.
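As a quick numeric illustration of these conditions (the hyperplane parameters and points below are made up for this sketch, not taken from the lecture):

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen for illustration only.
w = np.array([2.0, 1.0])
b = 1.0

# Toy points: the first two sit exactly on the support lines
# w^T x - b = +1 and w^T x - b = -1; the third lies on the boundary itself.
points = np.array([
    [0.5, 1.0],   # w^T x - b = +1 -> on the +1 support line
    [0.0, 0.0],   # w^T x - b = -1 -> on the -1 support line
    [0.0, 1.0],   # w^T x - b =  0 -> on the decision boundary
])

scores = points @ w - b
for x, s in zip(points, scores):
    print(f"x = {x}, w^T x - b = {s:+.1f}")
```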
### 3. Calculating the Margin Width
The lecture derives the mathematical expression for the margin width using vector projection.
* **Vector Projection:** The margin is calculated by projecting the vector connecting a point on the boundary ($x_0$) to a support vector ($x$) onto the normal vector $w$.
* **Derivation:**
* The distance is the projection of vector $(x - x_0)$ onto the unit normal vector $\frac{w}{||w||}$.
* Using the constraint $w^T x - b = 1$ and $w^T x_0 - b = 0$, the derived margin distance is $\frac{1}{||w||}$.
* **Conclusion:** Maximizing the margin is equivalent to **minimizing the norm of the weight vector $||w||$**.
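Writing the projection from the derivation above out in one line, substituting $w^T x = 1 + b$ from the support-line equation and $w^T x_0 = b$ from the boundary equation:

$$\text{distance} = \frac{w^T (x - x_0)}{||w||} = \frac{(1 + b) - b}{||w||} = \frac{1}{||w||}$$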
### 4. The Optimization Problem
The task of finding the best classifier is formulated as a constrained optimization problem.
* **Objective Function:**
$$\min_{w, b} ||w||^2$$
(Note: Minimizing $||w||$ and minimizing $||w||^2$ give the same solution; the squared form is preferred because it is differentiable and leads to a standard quadratic program.)
* **Constraints:** All data points must be correctly classified and lie outside the margin. This is formalized as:
* $w^T x_i - b \ge 1$ for $y_i = 1$.
* $w^T x_i - b \le -1$ for $y_i = -1$.
* **Combined Constraint:** $y_i (w^T x_i - b) \ge 1$ for all $i$.
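Since this is a convex quadratic program, it can be handed directly to an off-the-shelf solver. A minimal sketch using cvxpy on a tiny made-up dataset (the library and the data are choices for this note, not part of the lecture):

```python
import cvxpy as cp
import numpy as np

# Tiny, linearly separable toy data (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin problem: min ||w||^2  s.t.  y_i (w^T x_i - b) >= 1 for all i.
objective = cp.Minimize(cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w - b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value,
      "margin =", 1 / np.linalg.norm(w.value))
```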
### 5. Optimization with Constraints (Lagrange Multipliers)
The lecture explains how to solve this optimization problem using **Lagrange Multipliers**, starting with a general example.
* **Problem Setup:** Minimize an objective function $L(x)$ subject to a constraint $g(x) \ge 0$.
* **Lagrangian:** A new objective function is defined by combining the original loss and the constraint with a multiplier $\lambda$:
$$L'(x) = L(x) - \lambda g(x)$$
(Note: The transcript discusses combining components; the sign depends on the specific maximization/minimization formulation)
* **Solution Cases:**
The solution involves taking the derivative $\frac{dL'}{dx} = 0$ and considering two cases:
1. **Feasible Region ($\lambda = 0$):** The unconstrained minimum of $L(x)$ naturally satisfies the constraint ($g(x) > 0$). In this case, the constraint is inactive.
2. **Boundary Case ($\lambda > 0$):** The unconstrained minimum violates the constraint. Therefore, the optimal solution lies *on* the boundary where $g(x) = 0$.
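Putting the stationarity condition and the two cases together, any solution must satisfy (a compact restatement of the case analysis above):

$$\frac{dL'}{dx} = 0, \qquad g(x) \ge 0, \qquad \lambda \ge 0, \qquad \lambda \, g(x) = 0$$

where the last condition holds because either $\lambda = 0$ (inactive constraint) or $g(x) = 0$ (boundary case).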
### 6. Example: Constrained Minimization
A specific mathematical example is worked through to demonstrate the method.
* **Objective:** Minimize $x_1^2 + x_2^2$ (distance from origin).
* **Constraint:** $x_2 - x_1^2 - 1 \ge 0$ (must be above a parabola).
* **Solving:**
* The Lagrangian is set up: $L' = x_1^2 + x_2^2 - \lambda(x_2 - x_1^2 - 1)$.
* **Case 1 ($\lambda = 0$):** Leads to $x_1=0, x_2=0$, which violates the constraint ($0 - 0 - 1 = -1 \not\ge 0$). This solution is discarded.
* **Case 2 (Boundary, $\lambda \ne 0$):** The solution must lie on $x_2 - x_1^2 - 1 = 0$. Solving the system of equations yields the valid minimum at $x_1=0, x_2=1$ (with $\lambda = 2 > 0$, consistent with the boundary case).
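The boundary case can be verified symbolically; a small check using sympy (a tooling choice for this note, not something used in the lecture):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)

# Lagrangian for: minimize x1^2 + x2^2  subject to  x2 - x1^2 - 1 >= 0.
L = x1**2 + x2**2 - lam * (x2 - x1**2 - 1)

# Boundary case: stationarity in x1 and x2, plus the active constraint g(x) = 0.
solutions = sp.solve(
    [sp.diff(L, x1), sp.diff(L, x2), x2 - x1**2 - 1],
    [x1, x2, lam],
    dict=True,
)
print(solutions)  # expect the real stationary point x1 = 0, x2 = 1 with lam = 2
```

The returned multiplier is positive, confirming that the constraint is active at the optimum.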
### 7. Next Steps: Support Vector Machines
The lecture concludes by linking this optimization framework back to the classifier.
* **Support Vectors:** The data points that lie exactly on the margin boundary ($g(x)=0$) are called "Support Vectors".
* **Future Topic:** This foundation leads into the **Support Vector Machine (SVM)** algorithm, which will be discussed in the next session to handle non-linearly separable data.