This is very similar to the optimization problem for the linearly separable case, except that there is now an upper bound $C$ on each $\alpha_i$. To find the $\alpha_i$ we can again use a quadratic programming (QP) solver.

The key idea in generalizing a linear decision boundary to a non-linear one is to transform the input $x_i$ to a higher-dimensional space, where the classes are easier to separate. The input space is the space where the $x_i$ live; the feature space is the space of the transformed points $\phi(x_i)$. Linear operations in the feature space are equivalent to non-linear operations in the input space, so with a proper transformation, classification can become easier.

Unfortunately, computations in the feature space can be very costly because of its higher dimension. The solution is the kernel trick: in the dual problem the data points appear only through inner products, so as long as we can compute the inner product in the feature space, we never need the mapping $\phi$ explicitly. Many common geometric operations (angles, distances) can be expressed by inner products. Define the kernel function $K$ by $K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)$. A kernel function can be viewed as a similarity measure between the input objects.
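As a concrete illustration of the kernel trick, the minimal sketch below checks numerically that the degree-2 polynomial kernel $K(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$ equals the inner product under the explicit feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$; this particular map and kernel are standard textbook choices, not taken from the surrounding text.

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3 for the degree-2 polynomial kernel:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed directly in the 2-D input space,
    without ever forming phi(x) or phi(y)."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

# Both routes give the same number: the kernel evaluates the
# feature-space inner product at input-space cost.
print(np.dot(phi(x), phi(y)))  # 1.0
print(poly_kernel(x, y))       # 1.0
```

The saving becomes dramatic as the feature space grows: a degree-$d$ polynomial kernel on $\mathbb{R}^n$ implicitly works in a feature space whose dimension grows like $n^d$, yet each kernel evaluation still costs only $O(n)$ time.

Examples of kernel functions: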