# Large Margin Classifiers and Optimization

**Date:** 2025.10.27

**Topic:** Large Margin Classifiers, Optimization, Margin Definition

---

### 1. Introduction to Robust Classification

The lecture begins by shifting focus from generative methods to discriminative methods, specifically within a **linearly separable setting**.

* **Problem Setting:** The goal is to classify data that can be perfectly separated by a linear boundary (hyperplane).
* **Robustness:** While infinitely many linear classifiers may separate the data, the objective is to find the "best" one. The best classifier is defined as the one that is most **robust**, meaning it generalizes well to new test data and handles potential outliers effectively.
* **Intuition:** A robust classifier places the decision boundary in the middle of the gap between the classes, maximizing the distance to the nearest data points.

### 2. Defining the Margin

The concept of the **margin** is introduced to mathematically define robustness.

* **Definition:** The margin is the distance between the decision hyperplane and the closest data points.
* **Hyperplane Equation:** The decision boundary is defined as $w^T x - b = 0$.
* **Support Lines:** To define the margin, we establish two parallel lines passing through the closest data points:
    * $w^T x - b = 1$ (for class +1).
    * $w^T x - b = -1$ (for class -1).
    * The region between these lines contains no data points.

### 3. Calculating the Margin Width

The lecture derives the mathematical expression for the margin width using vector projection.

* **Vector Projection:** The margin is calculated by projecting the vector connecting a point on the boundary ($x_0$) to a support vector ($x$) onto the normal vector $w$.
* **Derivation:**
    * The distance is the projection of the vector $(x - x_0)$ onto the unit normal vector $\frac{w}{||w||}$.
    * Using the constraints $w^T x - b = 1$ and $w^T x_0 - b = 0$:
    $$\frac{w^T (x - x_0)}{||w||} = \frac{(w^T x - b) - (w^T x_0 - b)}{||w||} = \frac{1 - 0}{||w||} = \frac{1}{||w||}$$
    * The margin distance is therefore $\frac{1}{||w||}$.
* **Conclusion:** Maximizing the margin is equivalent to **minimizing the norm of the weight vector $||w||$**.

### 4. The Optimization Problem

The task of finding the best classifier is formulated as a constrained optimization problem (a small numerical sketch appears after this section).

* **Objective Function:**
    $$\min_{w, b} ||w||^2$$
    (Note: Minimizing $||w||$ and minimizing $||w||^2$ yield the same solution; the squared form is easier to optimize.)
* **Constraints:** All data points must be correctly classified and lie outside the margin. This is formalized as:
    * $w^T x_i - b \ge 1$ for $y_i = 1$.
    * $w^T x_i - b \le -1$ for $y_i = -1$.
* **Combined Constraint:** $y_i (w^T x_i - b) \ge 1$ for all $i$.
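To make the problem in Section 4 concrete, here is a minimal numerical sketch. It assumes a tiny hand-made 2D dataset and uses SciPy's generic SLSQP solver (neither of which is part of the lecture) to minimize $||w||^2$ subject to $y_i(w^T x_i - b) \ge 1$.

```python
# Minimal sketch of the hard-margin problem in Section 4.
# Assumptions (not from the lecture): a toy 2D dataset and SciPy's SLSQP
# solver used as a generic constrained optimizer.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0],    # class +1
              [0.0, 0.0], [-1.0, 1.0]])  # class -1
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(params):
    w = params[:2]
    return w @ w                          # ||w||^2 (b does not appear here)

def margin_constraints(params):
    w, b = params[:2], params[2]
    return y * (X @ w - b) - 1.0          # each entry must be >= 0

result = minimize(objective,
                  x0=np.array([1.0, 1.0, 0.0]),  # initial guess for (w1, w2, b)
                  constraints=[{"type": "ineq", "fun": margin_constraints}],
                  method="SLSQP")

w_opt, b_opt = result.x[:2], result.x[2]
print("w =", w_opt, "b =", b_opt, "margin =", 1.0 / np.linalg.norm(w_opt))
```

A dedicated QP or SVM solver would normally be used instead; the generic solver is only meant to show that Section 4 describes an ordinary constrained minimization.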
### 5. Optimization with Constraints (Lagrange Multipliers)

The lecture explains how to solve this optimization problem using **Lagrange multipliers**, starting from a general example.

* **Problem Setup:** Minimize an objective function $L(x)$ subject to a constraint $g(x) \ge 0$.
* **Lagrangian:** A new objective function is defined by combining the original loss and the constraint with a multiplier $\lambda$:
    $$L'(x) = L(x) - \lambda g(x)$$
    (Note: The transcript discusses combining the components; the sign depends on the specific maximization/minimization formulation.)
* **Solution Cases:** The solution involves setting the derivative $\frac{dL'}{dx} = 0$ and considering two cases:
    1. **Feasible Region ($\lambda = 0$):** The unconstrained minimum of $L(x)$ naturally satisfies the constraint ($g(x) > 0$). In this case, the constraint is inactive.
    2. **Boundary Case ($\lambda > 0$):** The unconstrained minimum violates the constraint. Therefore, the optimal solution lies *on* the boundary where $g(x) = 0$.

### 6. Example: Constrained Minimization

A specific mathematical example is worked through to demonstrate the method (a symbolic check appears after Section 7).

* **Objective:** Minimize $x_1^2 + x_2^2$ (the squared distance from the origin).
* **Constraint:** $x_2 - x_1^2 - 1 \ge 0$ (the point must lie above a parabola).
* **Solving:**
    * The Lagrangian is set up: $L' = x_1^2 + x_2^2 - \lambda(x_2 - x_1^2 - 1)$.
    * **Case 1 ($\lambda = 0$):** Leads to $x_1 = 0, x_2 = 0$, which violates the constraint ($0 - 0 - 1 = -1 \not\ge 0$). This solution is discarded.
    * **Case 2 (Boundary, $\lambda \ne 0$):** The solution must lie on $x_2 - x_1^2 - 1 = 0$. Solving the stationarity conditions $2x_1(1 + \lambda) = 0$ and $2x_2 - \lambda = 0$ together with the boundary equation yields the valid minimum at $x_1 = 0, x_2 = 1$ (with $\lambda = 2 > 0$).

### 7. Next Steps: Support Vector Machines

The lecture concludes by linking this optimization framework back to the classifier.

* **Support Vectors:** The data points that lie exactly on the margin boundary ($g(x) = 0$) are called **support vectors**.
* **Future Topic:** This foundation leads into the **Support Vector Machine (SVM)** algorithm, which will be discussed in the next session to handle non-linearly separable data (a brief code preview closes these notes).
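As a quick check of the worked example in Section 6, the sketch below uses SymPy (an assumption; the lecture solves the system by hand) to solve the boundary-case equations symbolically.

```python
# Symbolic check of Section 6, Case 2: solve the stationarity conditions
# together with the active constraint g(x) = 0.
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lambda", real=True)
L = x1**2 + x2**2 - lam * (x2 - x1**2 - 1)        # Lagrangian from Section 6

stationarity = [sp.diff(L, x1), sp.diff(L, x2)]   # dL'/dx1 = 0, dL'/dx2 = 0
boundary = [x2 - x1**2 - 1]                       # active constraint g(x) = 0

solutions = sp.solve(stationarity + boundary, [x1, x2, lam], dict=True)
print(solutions)   # expected: x1 = 0, x2 = 1, lambda = 2
```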
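Finally, as a small preview of Section 7, the sketch below fits a linear SVM with scikit-learn (not covered in this lecture) on the same toy data used above. A very large `C` is used to approximate the hard-margin setting from this lecture; `support_vectors_` then returns the points lying on the margin.

```python
# Preview of the support-vector idea in Section 7 using scikit-learn's SVC
# (an assumption; the SVM algorithm itself is the next session's topic).
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin classifier of this lecture.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", clf.coef_[0], "b =", -clf.intercept_[0])  # lecture uses w^T x - b
print("support vectors:\n", clf.support_vectors_)      # points on the margin
```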