- Regularization:
    - Shrinks model coefficients (the shrinkage term in the loss equation)
    - An extra term that penalizes large coefficient values
    - Added to the loss function: regularized loss = original loss + coefficient penalty (see the sketch after this list)
    - Without regularization, training only minimizes the original loss, i.e., maximizes training accuracy
    - Effect on unseen data: damps features with overly large contributions, trading a little training fit for better generalization
    - Reduces overfitting
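
A minimal NumPy sketch of the formula above, assuming mean squared error as the original loss; the data, weight vector `w`, and strength `alpha` are made up for illustration:

```python
import numpy as np

# Toy data and a hypothetical weight vector, just to illustrate the formula
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.5])
alpha = 0.1  # regularization strength (assumed value)

pred = X @ w
original_loss = np.mean((y - pred) ** 2)  # e.g. MSE

l1_penalty = np.sum(np.abs(w))  # Lasso-style penalty: sum of |w|
l2_penalty = np.sum(w ** 2)     # Ridge-style penalty: sum of w^2

# Regularized loss = original loss + coefficient penalty
lasso_loss = original_loss + alpha * l1_penalty
ridge_loss = original_loss + alpha * l2_penalty
print(lasso_loss, ridge_loss)
```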
- Lasso:
    - Linear regression with L1 regularization (penalty on the absolute values of the coefficients)
    - Drives some coefficients to exactly 0
    - Performs feature selection by zeroing out features with little contribution (see the sketch below)
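
A possible scikit-learn sketch of this; the dataset is synthetic and `alpha=1.0` is an arbitrary choice:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data where only 3 of 10 features are actually informative
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0)  # alpha controls the L1 penalty strength
lasso.fit(X, y)

# Several coefficients land at exactly 0 -- those features are filtered out
print(lasso.coef_)
print("selected features:", [i for i, c in enumerate(lasso.coef_) if c != 0])
```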
- Ridge:
    - Linear regression with L2 regularization (penalty on the squared values of the coefficients)
    - Shrinks coefficients toward 0, but rarely to exactly 0 (see the sketch below)
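
A matching sketch comparing Ridge against plain least squares, again on synthetic data with an arbitrary `alpha`:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=50, n_features=10, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the L2 penalty strength

# Ridge coefficients are smaller in total magnitude, but typically none are 0
print("OLS   total |w|:", np.abs(ols.coef_).sum())
print("Ridge total |w|:", np.abs(ridge.coef_).sum())
```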
- Example: regularization in logistic regression:
    - Logistic curve = probability curve between 0 and 1 for a given target class (in a 2-class problem)
    - Ratio of coefficients: slope (orientation) of the decision boundary
    - Magnitude of coefficients: steepness of the curve, i.e., confidence of the predictions
    - Without regularization:
        - Large coefficients produce high confidence (over-confidence)
        - Over-confidence pushes predicted probabilities toward 0 or 1
        - This leads to overfitting
    - With regularization (see the sketch below):
        - Smaller coefficients produce lower confidence
        - This keeps predicted probabilities more moderate (closer to 0.5)
        - Moderate, better-calibrated probabilities mean less overfitting
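
A possible sketch of this effect with scikit-learn's LogisticRegression, where `C` is the inverse of the regularization strength (a library convention, not from the notes above); the data and `C` values are made up:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# C is the INVERSE of regularization strength in scikit-learn
weak_reg = LogisticRegression(C=1000.0).fit(X, y)  # almost unregularized
strong_reg = LogisticRegression(C=0.01).fit(X, y)  # heavily regularized

print("weak reg   total |coef|:", np.abs(weak_reg.coef_).sum())
print("strong reg total |coef|:", np.abs(strong_reg.coef_).sum())

# Larger coefficients -> probabilities saturate near 0/1 (over-confidence);
# smaller coefficients -> probabilities stay closer to 0.5
print("weak reg   probs:", weak_reg.predict_proba(X[:3])[:, 1])
print("strong reg probs:", strong_reg.predict_proba(X[:3])[:, 1])
```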