An Introduction to Support Vector Machine (SVM) and the Simplified SMO Algorithm

Table of Contents What is a Support Vector Machine (SVM)?What kind of machine learning problems can be solved using SVM?What is an SVM kernel?How does it work?Identify the right hyper-planeSimplified Sequential Minimal Optimization (SMO) Sequential minimal …

Support Vector Machine (SVM) and the Simplified SMO Algorithm

What is a Support Vector Machine (SVM)?

It is a supervised machine learning algorithm that is used for the classification of data and regression analysis. As it is shown by the category supervised it belongs to, it uses labeled data. It is mostly used for classification purposes. In the classification of data, a hyperplane differentiates labeled data according to its properties. A hyperplane is a decision border line that is created to separate n-dimensional data into classes. Support vector machine algorithms work for both linear and non-linear challenges.

fig 1: General working Model of SVM

What kind of machine learning problems can be solved using SVM?

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for classification and regression challenges. It uses a technique kernel to transform your data and then based on these transformations it detects an optimal boundary among the possible outputs.

What is an SVM kernel?

Kernel takes data as input and transforms non-linear training data into linear equation form that can easily be handled into hyperplane surfaces.

How does it work?

As mentioned above , the algorithm segregates n-dimensional data between classes with a hyperplane. Now the question arises “ How can we identify the accurate hyperplane?”

Let us understand:

Identify the right hyper-plane

  1. Case 1: To classify the stars and circles, there are three hyperplanes; A, B, and C. Recall the thumb rule to identify the right decision border. select one which divided classes into better way and off course Hyperplane B doing Job effectively


Figure 2: plot

2. Case 2: There are three hyperplanes in this case which all are dividing classes well. Now the question arises which hyperplane is best to segregate the classes?SVM_3

Figure 3

The maximum distance between the nearest data points is called margin. Margin line always selects with maximum value such as draw hyperplane instead of a perpendicular line to separate classes because maximum value generates high accuracy in results. If select margin with low value, accuracy will fall less than 50% or create misclassification.


Figure 4

Above, as shown by the margin line for decision border (hyperplane) C is accurate as compared to both A and B. Hence C is the right hyperplane.

3. Case 3: Hint: Use the rules as discussed in the previous section to identify the right hyper-plane.


Figure 5

Most of you think B is right because it has a high margin as compared to A but  the main goal of the Support Vector Machine algorithm is the classification of classes with the help of a hyperplane, where the highest margin is the priority. The algorithm condition is fulfilling by hyperplane A which classified data accurately

4. Case 4: Classification of two classes. below is an outlier Example


Figure 6

Star is the outlier in circle class and the SVM algorithm has a feature to ignore the outliers and find the hyperplane that has a high margin.


Figure 7

5. Case 5: hyper-plane to segregate to classes. In this case. a linear hyperplane cannot segregate the two classes then how SVM work on this condition. looked at the linear hyper-plane.


Figure 8

SVM has the solution for this non-linear condition with additional features. By using z=x^2+y^2 plot data points on both axes


Figure 9

Z is the sum of squared x and squared y and All values of z would be always positive.

In the SVM classifier, it is easy to create classification between two classes with the help of a hyperplane but create a problem when non-linear data comes. Here we use the Kernel trick. The kernel is an SVM function that took data as input and transforms non-linear training data into linear equation form that can easily handle into hyperplane surface. it helps to segregate non-separable data into classes.

Simplified Sequential Minimal Optimization (SMO) 

Sequential minimal Optimization

Sequential Minimal optimization is an algorithm that is a very helpful algorithm during the training of Support Vector Machines to solve the issues of Quadratic Programming (QP). It fastens the procedure to train the Support Vector Machines model. It breaks large Quadratic Programming (QP) problems into small chunks and solves them analytically which ignores using an inner loop, a time-killing numerical QP optimization. The SMO required  linear memory to manage very large training data sets. Matrix calculation is neglected due to linear calculation, SMO order somewhere between the linear and quadratic for training set size for various test problems, while standard order to determine SVM algorithm somewhere between linear and cubic for training set size. In real-world applications, SMO is 1000 times faster than the chunking algorithm. 

Steps of solving SMO

There are three steps to optimize SMO Algorithm:

  • Select Parameters
  • Optimize parameters
  • Compute the Threshold

Optimization problem

To speed convergence, SMO uses heuristics  to select the two Lagrange multipliers that will optimize together. There are two choice heuristics:  one for first and two for second Lagrange multiplier. The first heuristic choice from two choices, returns the SMO algorithm outer loop.

It will solve binary/ linear classification where the dataset is from (x1, y1) to (xn, yn). xi is an input vector that have obviously direction and Yi is a binary label for it.

here C is an SVM parameter and K is for kernel function. Both are inputs supplied by the user.

SMO is an algorithm that has the solution to the optimization problem. It breaks large Quadratic Programming (QP) problems into small parts and solves them analytically. Due to linear equality constraint involving the Lagrange multipliers ai, the smallest problem that involves only two multipliers. Then, for any two multipliers a1 and a2, the constraints reduced into :

and above reduced problem solved analytically: there is  need to find a minimum of quadratic one-dimensional function.  k, is the negative value of the sum above all other terms in the equation in equality constraint that is fixed in each iteration.

The proceeding steps of algorithm are following:

  1. Find a Lagrange multiplier ai, that disobey the karush-kuhn-Tucker (KKT)  for the optimization condition problem.
  2. select the second multiplier a2, and optimize the pair of (a1, a2,)
  3. Repeat steps 1 and 2 until convergence is found.

The problem will get the solution when the multipliers satisfy the KKT conditions. To speed up the rate of convergence, a heuristic method is used. 


In this article, two important Machine  learning algorithms are described in detail. SVM stands for Support Vector Machine algorithm is supervised learning algorithm that works on classification of data and regression challenges and SMO is an very effective algorithm during the training of Support Vector Machines to solve the issues of Quadratic Programming (QP). SMO breaks large QP problems into small chunks to solve them analytically. 

Suggested Read: Top 6 PHP Alternatives Advantages and Disadvantages

Categories AI

Leave a Comment