Large Margin Classifiers and Optimization
Date: 2025.10.27 Topic: Large Margin Classifiers, Optimization, Margin Definition
1. Introduction to Robust Classification
The lecture begins by shifting focus from generative methods to discriminative methods, specifically within a linearly separable setting.
- Problem Setting: The goal is to classify data that can be perfectly separated by a linear boundary (hyperplane).
- Robustness: While infinitely many linear classifiers may separate the data, the objective is to find the "best" one. The best classifier is defined as the most robust one, meaning it generalizes well to new test data and handles potential outliers effectively.
- Intuition: A robust classifier places the decision boundary in the middle of the gap between classes, maximizing the distance to the nearest data points.
2. Defining the Margin
The concept of the margin is introduced to mathematically define robustness.
- Definition: The margin is the distance between the decision hyperplane and the closest data points.
- Hyperplane Equation: The decision boundary is defined as $w^T x - b = 0$.
- Support Lines: To define the margin, we establish two parallel lines passing through the closest data points:
  - $w^T x - b = 1$ (for class +1).
  - $w^T x - b = -1$ (for class -1).
- The region between these lines contains no data points.
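As a quick illustration of these definitions, the sketch below evaluates the signed decision value $w^T x - b$ for a few points; the weight vector, offset, and points are made-up values, not taken from the lecture.

```python
import numpy as np

# Hypothetical hyperplane parameters (assumed values for illustration only)
w = np.array([2.0, 1.0])   # normal vector w
b = 1.0                    # offset b

points = np.array([
    [1.5, 0.5],   # falls on the +1 side, beyond the +1 support line
    [-0.5, 0.0],  # falls on the -1 side, beyond the -1 support line
    [0.5, 1.0],   # lies exactly on the +1 support line (w^T x - b = 1)
])

for x in points:
    score = w @ x - b                # signed decision value
    label = 1 if score >= 0 else -1  # sign gives the predicted class
    print(f"x = {x}, w^T x - b = {score:+.2f}, predicted class {label:+d}")
```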
3. Calculating the Margin Width
The lecture derives the mathematical expression for the margin width using vector projection.
- Vector Projection: The margin is calculated by projecting the vector connecting a point on the boundary ($x_0$) to a support vector ($x$) onto the normal vector $w$.
- Derivation:
  - The distance is the projection of the vector $(x - x_0)$ onto the unit normal vector $\frac{w}{||w||}$.
  - Using the constraints $w^T x - b = 1$ and $w^T x_0 - b = 0$, the derived margin distance is $\frac{1}{||w||}$.
- Conclusion: Maximizing the margin is equivalent to minimizing the norm of the weight vector $||w||$.
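Filling in the algebra behind this conclusion, the projection argument gives (using the two constraints from the bullets above):

```latex
\text{margin}
= \frac{w^T (x - x_0)}{||w||}
= \frac{(w^T x - b) - (w^T x_0 - b)}{||w||}
= \frac{1 - 0}{||w||}
= \frac{1}{||w||}
```

so a larger margin corresponds directly to a smaller $||w||$.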
4. The Optimization Problem
The task of finding the best classifier is formulated as a constrained optimization problem.
- Objective Function: $\min ||w||^2$ (Note: minimizing $||w||$ is computationally equivalent to minimizing $||w||^2$).
- Constraints: All data points must be correctly classified and lie outside the margin. This is formalized as:
  - $w^T x_i - b \ge 1$ for $y_i = 1$.
  - $w^T x_i - b \le -1$ for $y_i = -1$.
- Combined Constraint: $y_i (w^T x_i - b) \ge 1$ for all $i$.
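The lecture stays at the level of the formulation, but as a rough numerical sketch the same problem can be handed to a generic constrained solver; the toy dataset, the choice of scipy, and the SLSQP method are all assumptions made here for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy dataset (assumed, not from the lecture)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables packed as params = [w_1, w_2, b]
def objective(params):
    w = params[:2]
    return w @ w                       # minimize ||w||^2

def margin_constraints(params):
    w, b = params[:2], params[2]
    return y * (X @ w - b) - 1.0       # y_i (w^T x_i - b) - 1 >= 0 for all i

result = minimize(
    objective,
    x0=np.array([1.0, 1.0, 0.0]),      # a simple feasible starting point
    constraints=[{"type": "ineq", "fun": margin_constraints}],
    method="SLSQP",
)

w_opt, b_opt = result.x[:2], result.x[2]
print("w =", w_opt, ", b =", b_opt, ", margin =", 1.0 / np.linalg.norm(w_opt))
```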
5. Optimization with Constraints (Lagrange Multipliers)
The lecture explains how to solve this optimization problem with Lagrange multipliers, starting from a general example.
- Problem Setup: Minimize an objective function $L(x)$ subject to a constraint $g(x) \ge 0$.
- Lagrangian: A new objective function is defined by combining the original loss and the constraint with a multiplier $\lambda$: $L'(x) = L(x) - \lambda g(x)$ (Note: the transcript discusses combining the components; the sign convention depends on the specific maximization/minimization formulation).
- Solution Cases: The solution involves setting the derivative $\frac{dL'}{dx} = 0$ and considering two cases:
  - Feasible Region ($\lambda = 0$): The unconstrained minimum of $L(x)$ naturally satisfies the constraint ($g(x) > 0$). In this case, the constraint is inactive.
  - Boundary Case ($\lambda > 0$): The unconstrained minimum violates the constraint. Therefore, the optimal solution lies on the boundary where $g(x) = 0$.
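As a supplementary note (not spelled out in the transcript, but standard for this setup), the two cases can be written as a single set of first-order conditions, where the last equation, $\lambda\, g(x) = 0$, forces either $\lambda = 0$ (inactive constraint) or $g(x) = 0$ (solution on the boundary):

```latex
\frac{\partial L'}{\partial x} = \frac{\partial L}{\partial x} - \lambda \frac{\partial g}{\partial x} = 0,
\qquad \lambda \ge 0,
\qquad g(x) \ge 0,
\qquad \lambda \, g(x) = 0
```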
6. Example: Constrained Minimization
A specific mathematical example is worked through to demonstrate the method.
- Objective: Minimize $x_1^2 + x_2^2$ (the squared distance from the origin).
- Constraint: $x_2 - x_1^2 - 1 \ge 0$ (the point must lie above a parabola).
- Solving:
  - The Lagrangian is set up: $L' = x_1^2 + x_2^2 - \lambda(x_2 - x_1^2 - 1)$.
  - Case 1 ($\lambda = 0$): Leads to $x_1 = 0, x_2 = 0$, which violates the constraint ($0 - 0 - 1 = -1 \not\ge 0$). This solution is discarded.
  - Case 2 (Boundary, $\lambda \ne 0$): The solution must lie on $x_2 - x_1^2 - 1 = 0$. Solving the system of equations (written out below) yields the valid minimum at $x_1 = 0, x_2 = 1$.
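The system of equations referenced in Case 2, written out explicitly:

```latex
\begin{aligned}
\frac{\partial L'}{\partial x_1} &= 2x_1 + 2\lambda x_1 = 2x_1(1 + \lambda) = 0
  &&\Rightarrow\; x_1 = 0 \quad (\text{since } 1 + \lambda > 0)\\
\frac{\partial L'}{\partial x_2} &= 2x_2 - \lambda = 0
  &&\Rightarrow\; x_2 = \lambda / 2\\
x_2 - x_1^2 - 1 &= 0
  &&\Rightarrow\; x_2 = 1,\ \lambda = 2 > 0
\end{aligned}
```

so the constrained minimum is at $(x_1, x_2) = (0, 1)$ with an active constraint, consistent with the boundary case.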
7. Next Steps: Support Vector Machines
The lecture concludes by linking this optimization framework back to the classifier.
- Support Vectors: The data points that lie exactly on the margin boundary ($g(x) = 0$, i.e., $y_i (w^T x_i - b) = 1$) are called "support vectors".
- Future Topic: This foundation leads into the Support Vector Machine (SVM) algorithm, which will be discussed in the next session to handle non-linearly separable data.