One-class Support Vector Machine
Use this unsupervised learning method to perform novelty detection. Available in Excel with the XLSTAT software.
What is One-class Support Vector Machine?
In 1999, Schölkopf et al. proposed an extension of SVM to unsupervised learning, and more precisely to novelty detection.
The One-class Support Vector Machine (One-class SVM) algorithm seeks to envelop the underlying inliers. The aim is to separate the data into two classes (based on a decision function): the positive class, considered the class of inliers, and the negative class, considered the class of outliers. Most of the training data must belong to the positive class while the volume of the envelope is kept minimal.
As with the other SVM methods available in XLSTAT, the optimization problem is solved with Sequential Minimal Optimization (SMO) using second order information, as proposed by Fan et al. (Fan, R., Chen, P. & Lin, C., 2005).
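The resulting classifier has a simple form: a weighted sum of kernel evaluations against the support vectors, shifted by the offset rho. The sketch below (plain Python, with hypothetical support vectors, alpha weights and rho, not values produced by XLSTAT) shows how the sign of the decision function separates inliers from outliers, assuming an RBF kernel:

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_function(x, support_vectors, alphas, rho, gamma):
    """f(x) = sum_i alpha_i * K(sv_i, x) - rho; f(x) >= 0 flags x as an inlier."""
    return sum(a * rbf_kernel(sv, x, gamma)
               for sv, a in zip(support_vectors, alphas)) - rho

# Hypothetical values for illustration only.
svs = [(0.0, 0.0), (1.0, 1.0)]
alphas = [0.5, 0.5]
rho = 0.3
gamma = 1.0

print(decision_function((0.5, 0.5), svs, alphas, rho, gamma) >= 0)  # True: inside the envelope
print(decision_function((5.0, 5.0), svs, alphas, rho, gamma) >= 0)  # False: far away, outlier
```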
Options of One-class Support Vector Machine function in XLSTAT
SMO parameters: This option allows you to tune the optimization algorithm to your specific needs. There are 2 tunable parameters:
- Nu: This value is the regularization parameter and lies between 0 and 1. It is an upper bound on the fraction of training observations treated as outliers and a lower bound on the fraction of support vectors (see the description for more details).
- Tolerance: This value defines the tolerance used when comparing two values during the optimization. Increasing it can speed up computations.
Preprocessing: This option allows you to select the way the explanatory data are rescaled. There are 3 options available:
- Rescaling: Quantitative explanatory variables are rescaled between 0 and 1 using the observed minimum and maximum for each variable.
- Standardisation: Both qualitative and quantitative explanatory variables are standardized using the sample mean and variance for each variable.
- None: No transformation is applied.
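The two transformations above can be sketched as follows (plain Python, illustrative values only, not XLSTAT output):

```python
def rescale(values):
    """Min-max rescaling to [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization using the sample mean and (unbiased) sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return [(v - mean) / var ** 0.5 for v in values]

data = [2.0, 4.0, 6.0]
print(rescale(data))      # [0.0, 0.5, 1.0]
print(standardize(data))  # [-1.0, 0.0, 1.0]
```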
Cross-validation: Available only when "Known classes" is activated. This option allows you to run a k-fold cross-validation to quantify the quality of the classifier. The data are partitioned into k subsamples of equal size. Each subsample is retained in turn as the validation data to test the model, and the remaining k-1 subsamples are used as training data.
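A minimal sketch of the k-fold partitioning (plain Python; XLSTAT handles this internally, and real implementations typically shuffle the observations first):

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        validation = indices[start:stop]          # held-out fold
        train = indices[:start] + indices[stop:]  # remaining k-1 folds
        yield train, validation

# 6 observations, 3 folds: each observation is validated exactly once.
for train, validation in k_fold_splits(6, 3):
    print(train, validation)
```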
Kernel: This option allows you to select the kernel you wish to apply to your dataset to extend the feature space. There are 4 kernels available:
- Linear kernel: This is the basic linear dot product.
- Power kernel: This kernel is detailed in the description. If you select this kernel, you have to enter the coefficient, the degree and gamma parameters.
- RBF kernel: This is the Radial Basis Function as detailed in the description. If you select this kernel, you have to enter the gamma parameter.
- Sigmoid kernel: This kernel is detailed in the description. If you select this kernel, you have to enter the coefficient and gamma parameters.
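The usual forms of these four kernels can be sketched as follows (plain Python; the exact parametrization used by XLSTAT is given in its description, so the polynomial-style form assumed here for the power kernel is only illustrative):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    """Plain dot product."""
    return dot(x, y)

def power_kernel(x, y, gamma, coef, degree):
    """Polynomial-style form assumed here: (gamma * <x, y> + coef)^degree."""
    return (gamma * dot(x, y) + coef) ** degree

def rbf_kernel(x, y, gamma):
    """Radial Basis Function: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, gamma, coef):
    """tanh(gamma * <x, y> + coef)."""
    return math.tanh(gamma * dot(x, y) + coef)

x, y = (1.0, 2.0), (3.0, 0.5)
print(linear_kernel(x, y))              # 4.0
print(power_kernel(x, y, 1.0, 1.0, 2))  # 25.0
print(rbf_kernel(x, x, 0.5))            # 1.0 (identical points)
```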
Results of One-class Support Vector Machine in XLSTAT
Estimation: A summary description of the optimized classifier is displayed: the outlier class and the training sample size, together with the optimized bias (corresponding to rho) and the number of support vectors.
List of support vectors: A table containing the optimized value of alpha and the rescaled explanatory variables, as they were used during the optimization, is displayed for each identified support vector.
Confusion matrices: The confusion matrix is deduced from prior and posterior classifications together with the overall percentage of well-classified observations.
Performance metrics: There are 10 classification metrics displayed if this option is active:
Accuracy, Precision, Recall, F-score, Specificity, False Positive Rate (FPR), Prevalence, Cohen's kappa, Null Error Rate (NER) and Area Under Curve (AUC).
In addition to these indicators, the ROC curve is displayed for the training sample and the validation sample (if activated).
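Most of these indicators derive directly from the confusion matrix counts. A sketch of the most common ones (plain Python, hypothetical counts, not XLSTAT output):

```python
def classification_metrics(tp, fp, tn, fn):
    """Common indicators derived from the confusion matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_score": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),            # false positive rate
        "prevalence": (tp + fn) / total,  # share of actual positives
    }

m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(m["accuracy"])  # 0.85
print(m["f_score"])   # 16/19, about 0.842
```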
Predicted classes: The predicted classes obtained using the SVM classifier are displayed for the training, validation and prediction datasets (if activated). The value of the decision function is also displayed.
Cross-validation: 3 performance metrics are displayed if cross-validation is active. For each of the k folds, the classification error rate, F-score and Balanced Accuracy are displayed in the case of binary classification.
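The two fold-level indicators not covered by the list above can be sketched as (plain Python, hypothetical counts):

```python
def error_rate(tp, fp, tn, fn):
    """Fraction of misclassified observations."""
    return (fp + fn) / (tp + fp + tn + fn)

def balanced_accuracy(tp, fp, tn, fn):
    """Mean of recall (sensitivity) and specificity; robust to class imbalance."""
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (recall + specificity) / 2

print(error_rate(tp=40, fp=10, tn=45, fn=5))         # 0.15
print(balanced_accuracy(tp=40, fp=10, tn=45, fn=5))  # about 0.854
```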