Section 1 Defining The Model

THE SUPPORT VECTOR MACHINE GUIDE
> What is the Algorithm?

Support Vector Machine (SVM) is a supervised machine learning algorithm. An SVM predicts the class of a query sample by relying on labeled input data that is separated into two classes by a margin. Specifically, the data is transformed into a higher dimension, and a support vector classifier is used as a threshold (or hyperplane) to separate the two classes with minimum error.

> How Does the Algorithm Work?

Step 1: Transform training data from a low dimension into a higher dimension.

Step 2: Find a Support Vector Classifier [also called a Soft Margin Classifier] to separate the two classes [Kernel Trick].

Step 3: Return the class label → prediction of the query sample!
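The three steps above can be sketched with scikit-learn's `SVC`. The toy dataset and the choice of RBF kernel here are illustrative assumptions, not part of the guide:

```python
# A minimal sketch of the three steps above using scikit-learn's SVC.
# The dataset and parameter values are illustrative assumptions.
from sklearn.svm import SVC

# Labeled training data: two 1D class clusters (Class A = 0, Class B = 1)
X_train = [[1.0], [1.5], [2.0], [8.0], [8.5], [9.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Steps 1 & 2: the RBF kernel implicitly maps the data to a higher
# dimension and fits a support vector classifier there
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

# Step 3: return the class label of a query sample
print(model.predict([[7.5]]))  # query falls near the Class B cluster
```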

> Example of the Algorithm

Let’s start off with the basics…

Maximal Margin Classifier — a threshold placed at the midpoint between the observations on the edges of each class cluster; this threshold gives the largest possible distance (the maximal margin) between the two classes.

Maximal Margin Classifier — Correct Classification Example [Image by Author]
* Example: Since the query sample falls to the right of the threshold, it is classified as Class B, as intended! There is a bias/variance tradeoff: high bias (the selected threshold is not sensitive to outliers) and low variance (it performs well on new query samples).
* Issue: What happens when an outlier is present?

Maximal Margin Classifier — Incorrect Classification Example [Image by Author]
* Example: Since the query sample falls to the left of the threshold, it is classified as Class A, which is NOT intended! Intuitively, this does not make sense: the query sample is closer to the Class B cluster than to the Class A cluster. Here the tradeoff flips: low bias (the selected threshold is sensitive to outliers) and high variance (it performs poorly on new query samples).
* Solution: Since the Maximal Margin Classifier is very sensitive to outliers in the training data, it is necessary to select a threshold that is not sensitive to outliers and allows some misclassifications → Soft Margin Classifier.
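One way to see this sensitivity in code: in scikit-learn's `SVC`, a very large `C` approximates the Maximal Margin Classifier (no misclassifications tolerated), while a small `C` gives a Soft Margin Classifier. The 1D data below, with an assumed Class A outlier at 6.5, is illustrative:

```python
# Illustrative sketch (assumed data): an outlier from Class A sits near
# the Class B cluster. A very large C approximates the maximal margin
# classifier, so the outlier drags the threshold toward Class B; a small
# C (soft margin) tolerates one misclassification and keeps a sensible
# threshold near the midpoint of the two main clusters.
from sklearn.svm import SVC

X = [[1.0], [1.5], [2.0], [6.5], [8.0], [8.5], [9.0]]  # 6.5 is a Class A outlier
y = [0, 0, 0, 0, 1, 1, 1]

hard = SVC(kernel="linear", C=1e6).fit(X, y)  # ~maximal margin classifier
soft = SVC(kernel="linear", C=0.1).fit(X, y)  # soft margin classifier

query = [[6.0]]  # closer to the main Class B cluster than to Class A
print(hard.predict(query), soft.predict(query))  # → [0] [1]
```

The hard-margin threshold lands midway between the outlier (6.5) and the nearest Class B point (8.0), so the query at 6.0 is misclassified as Class A; the soft margin ignores the outlier and classifies it as Class B.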

Soft Margin Classifier — a threshold that is allowed to make an acceptable number of misclassifications so that new data points can still be classified correctly; cross-validation determines how many misclassifications and observations are allowed inside the soft margin to obtain the best classification. [Support Vector Classifier is another name for the Soft Margin Classifier.]

Soft Margin Classifier — Correct Classification Example [Image by Author]
* Example: Since the query sample falls to the right of the threshold, it is classified as Class B, as intended! One misclassification was allowed in order to find the optimal threshold.
* Issue: What happens when there is significant overlap between the two classes?

Soft Margin Classifier — Incorrect Classification Example [Image by Author]
* Example: If only threshold 1 is considered, the query sample falls to its right; but since both Class A and part of the Class B cluster also fall to the right, the query sample will be classified inaccurately. If only threshold 2 is considered, the query sample again falls to its right; although only Class B is represented there and the query sample is classified as Class B, as intended, part of the Class B cluster falls on the left side of the threshold, signifying potential misclassifications. Hence, there is no optimal threshold that avoids a high number of misclassifications.
* Solution: Since the Soft Margin Classifier is very sensitive to a high degree of overlap in the training data, it is necessary to select a threshold that is sensitive neither to outliers nor to overlapping classes → Support Vector Machine.
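The definition above notes that cross-validation chooses how much misclassification the soft margin tolerates. In scikit-learn that tolerance is controlled by the `C` parameter, which can be tuned with `GridSearchCV`; the data and candidate `C` values below are assumptions:

```python
# Sketch: use cross-validation to pick the soft-margin tolerance.
# Smaller C = wider margin, more misclassifications allowed inside it.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X = [[1.0], [1.8], [2.2], [3.0], [6.0], [7.0], [7.8], [9.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# 2-fold cross-validation over a few candidate C values (assumed grid)
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=2)
search.fit(X, y)
print(search.best_params_)
```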

Let’s add a y-axis (transform the data into a higher dimension)…

Support Vector Machine — the data is transformed into a higher dimension, and a support vector classifier (also known as a soft margin classifier) is used as a threshold to separate the two classes. When the data is 1D, the support vector classifier is a point; when the data is 2D, it is a line; when the data is 3D, it is a plane; and when the data is 4D or higher, it is a hyperplane (the general term for all of these).

Support Vector Machine Algorithm [Image by Author]
* Example: Since the query sample falls to the left of the threshold, it is classified as Class B, as intended! Here, the data is 2D, so the support vector classifier is a line (a hyperplane). The support vectors are the observations on the edge of and within the soft margin.

Note: To make the mathematics feasible when transforming the data into higher dimensions, SVMs use kernel functions (linear, radial basis function (RBF), polynomial, or sigmoid) to systematically find the support vector classifiers. With a kernel function, the algorithm calculates the relationships between every pair of data points as if they were in the higher dimension, without actually performing the transformation → also known as the Kernel Trick!
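A minimal sketch of the Kernel Trick itself: for 1D inputs, the degree-2 polynomial kernel (a·b + 1)² returns the same value as the dot product of the points after an explicit transformation to 3D, so the algorithm never needs to construct the higher-dimensional coordinates. The helper names below are hypothetical:

```python
# The kernel computes higher-dimensional relationships directly from the
# original 1D inputs; the explicit transform exists only to verify this.
import math

def poly_kernel(a, b):
    # Degree-2 polynomial kernel for 1D inputs: k(a, b) = (a*b + 1)^2
    return (a * b + 1) ** 2

def explicit_transform(x):
    # The 3D feature space this kernel implies: phi(x) = (x^2, sqrt(2)*x, 1)
    return (x * x, math.sqrt(2) * x, 1.0)

a, b = 3.0, 2.0
lhs = poly_kernel(a, b)                       # kernel, no transformation
rhs = sum(p * q for p, q in zip(explicit_transform(a),
                                explicit_transform(b)))  # dot product in 3D
print(lhs, rhs)  # both ≈ 49.0
```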

To conclude, the SVM is a powerful algorithm with wide use in Machine Learning and Data Science applications!