When To Use Support Vector Machines

Are you wondering when you should opt for support vector machines over other machine learning algorithms? Well, then you are in luck! In this article, we tell you everything you need to know to decide when to use support vector machines (SVMs) in your data science projects.

We start out by discussing what types of outcome variables can be predicted using support vector machines. After that, we discuss some of the advantages and disadvantages of support vector machines. Finally, we provide specific examples of situations where you should or should not use support vector machines.

Types of outcome variables SVMs can be used for
What types of outcome variables can you use support vector machines to predict? One thing to keep in mind when pondering this question is that a few different models fall under the umbrella of “support vector machines”, and they support different types of outcome variables: support vector classification predicts class labels, while support vector regression predicts numeric values.

The main types of outcomes that support vector machines handle natively are binary outcomes (via support vector classification) and numeric outcomes (via support vector regression). Some packages adapt support vector machines for multiclass outcomes by training multiple models and combining their responses, but under the hood it is still a series of binary classifiers being trained, typically in a one-vs-one or one-vs-rest scheme. The sketch below shows each case.
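As a minimal sketch, here is what each of these outcome types looks like in scikit-learn (assuming scikit-learn is installed; the data here is synthetic stand-in data, not from a real project):

```python
# A minimal sketch of the main SVM variants in scikit-learn.
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Binary classification: support vector classifier.
y_binary = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="rbf").fit(X, y_binary)

# Numeric outcome: support vector regression.
y_numeric = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=100)
reg = SVR(kernel="rbf").fit(X, y_numeric)

# Multiclass outcome: scikit-learn's SVC trains one binary classifier
# per pair of classes (one-vs-one) under the hood and combines the votes.
y_multi = np.digitize(X[:, 0], bins=[-0.5, 0.5])  # three classes
multi = SVC(kernel="rbf").fit(X, y_multi)
print(clf.predict(X[:2]), reg.predict(X[:2]), multi.predict(X[:2]))
```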

Advantages and disadvantages of SVMs
Are you wondering how support vector machines compare to similar machine learning algorithms? Here are some of the main advantages and disadvantages of support vector machines.

Advantages of support vector machines
* Handle high dimensionality well. One of the main advantages of support vector machines is that they handle high-dimensional data well. Support vector machines tend to perform well when there are many features, even when the number of features is large relative to the number of observations. This is true even if the data is relatively sparse.
* Less sensitive to some types of outliers. Another benefit of support vector machines is that they are not as sensitive to some types of outliers. Specifically, they are relatively robust to one-dimensional outliers that have an extreme value for a single variable. This is because support vector classifiers focus on the points that lie near the boundary between classes, and one-dimensional outliers tend to be far from that boundary, so they have little impact on model training. One caveat: if an outlier does happen to lie near the boundary, it may have an outsized influence.
* Can handle non-linearity. When used in conjunction with kernels, support vector machines can model non-linear relationships between the features and the outcome. This is an advantage over models like linear regression that assume a linear relationship between the features and the outcome variable. Note that this is only the case if a non-linear kernel is used; see the kernel sketch after this list.
* Can handle interactions natively. When used in conjunction with kernels, support vector machines can also capture interactions between features without the need to specify them explicitly. The same is not true of linear support vector machines that do not use a non-linear kernel.
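To make the kernel point concrete, here is a minimal sketch (assuming scikit-learn; the circular toy dataset stands in for any data with a non-linear class boundary). A linear SVM cannot separate the two rings, while an RBF-kernel SVM handles the non-linearity without any explicit feature engineering:

```python
# Sketch: linear vs. RBF kernel on a non-linear (circular) boundary.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=500, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear kernel assumes a straight-line boundary and fails here.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
# An RBF kernel learns the circular boundary from the same features.
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("rbf kernel accuracy:", rbf_svm.score(X_test, y_test))
```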

Disadvantages of support vector machines
* Slow training on datasets with many observations. One big disadvantage of support vector machines is that they take a long time to train on datasets with many observations; for common kernelized implementations, training time grows faster than linearly (roughly quadratically or worse) with the number of observations. These implementations are also not as easy to parallelize as other machine learning algorithms like random forests.
* Sensitive to parameters and modeling choices. Another disadvantage of support vector machines is that they are very sensitive to the hyperparameters used to train them, such as the regularization strength and, especially, the choice of kernel. This means that they are not a great option if you do not have much time to spend on model tuning; see the tuning sketch after this list.
* Not as straightforward to interpret. Another disadvantage of support vector machines is that they are not as straightforward to interpret as some other machine learning algorithms. They do not provide straightforward estimates of the relationship between model features and the outcome variable. There are ways to get interpretable values out of support vector machine models, but they require additional steps.
* No built-in probability estimates. Along the same lines, support vector machines do not produce probability estimates out of the box the way some other algorithms do. Probability estimates can be obtained (typically via an extra calibration step such as Platt scaling), but this adds computational cost; see the pipeline sketch after this list.
* No native handling of missing data. Most implementations of support vector machines cannot accept missing values, which means that you need to impute or otherwise handle missing data yourself before feeding the data to the model (also shown in the pipeline sketch after this list).
* Do not natively support multiclass outcomes. Most implementations of support vector machines do not natively support multiclass outcomes. They may allow you to use multiclass outcomes by training multiple binary classifiers and combining the results, but they do not support training a single model to predict a multiclass outcome.
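The parameter sensitivity mentioned above is usually addressed by searching over kernels and their parameters with cross-validation. Here is a minimal tuning sketch (assuming scikit-learn; the grid values are illustrative, not recommendations):

```python
# Sketch: tuning kernel, C, and gamma with cross-validation.
# SVM performance often varies sharply across this grid, which is
# why SVMs reward (and require) careful tuning.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

param_grid = {
    "svc__kernel": ["linear", "rbf"],
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1, 1],
}
pipe = make_pipeline(StandardScaler(), SVC())
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```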
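And here is a sketch of the pipeline pattern for the missing data and probability caveats (again assuming scikit-learn, with synthetic stand-in data). Note that `probability=True` fits an extra cross-validated calibration step (Platt scaling), which noticeably slows down training:

```python
# Sketch: impute missing values before the SVM, and request
# (computationally expensive) probability estimates.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan  # inject 10% missing values

model = make_pipeline(
    SimpleImputer(strategy="median"),  # SVC cannot accept NaNs directly
    StandardScaler(),
    SVC(probability=True),             # enables predict_proba via Platt scaling
)
model.fit(X, y)
print(model.predict_proba(X[:3]))
```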

When to use support vector machines
So when should you use support vector machines over other machine learning algorithms? Here are some examples of situations where you should consider using support vector machines.

* High-dimensional, potentially sparse datasets. You should consider using support vector machines when you have a high-dimensional dataset with many features. Support vector machines are most commonly used in fields like text analysis, image analysis, and genetic analysis because these fields often produce sparse, high-dimensional datasets. The text classification sketch below shows a typical setup.
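As a minimal sketch of that sweet spot, consider text classification (assuming scikit-learn; the four-document corpus and its labels are tiny stand-ins for a real dataset with thousands of documents):

```python
# Sketch: a sparse, high-dimensional text problem where linear SVMs shine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = [
    "the striker scored a late goal",
    "the parliament passed a new budget",
    "the keeper saved the penalty kick",
    "senators debated the tax bill",
]
labels = ["sports", "politics", "sports", "politics"]

# TfidfVectorizer produces a sparse matrix with one column per term;
# LinearSVC handles this sparse, high-dimensional input efficiently.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["the committee voted on the bill"]))
```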

When not to use support vector machines
When should you avoid using support vector machines? Here are some examples of cases where you should avoid using support vector machines.

* Inference is the primary goal. In general, you are better off avoiding support vector machines if inference is your primary goal. If inference is your primary goal, you are better off using a model like linear or logistic regression that provides interpretable coefficients out of the box.
* Not much time to tune model parameters. Compared to many other machine learning algorithms, support vector machines tend to be particularly sensitive to the choice of model parameters. This is further complicated by the fact that support vector machines can be used with different kernels that themselves have different parameters. If you do not have a lot of time to invest in tuning the parameters of your model, you are better off using a model like a random forest that is not as sensitive to parameter choice.
* Many observations. Support vector machines take a long time to train when there are many observations, so you should avoid using them on very large datasets.

Related articles
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.