Sequential floating feature selection tool

Authors

Ing. Zoltán Galáž, Ing. Jiří Mekyska, prof. Ing. Zdeněk Smékal, CSc.

Download

The software can be downloaded here.

Publication to be cited

GALÁŽ, Z. Preliminary Acoustic Analysis of Noise Components in Patients In Parkinsons Disease. In Proceedings of the 21st Conference STUDENT EEICT 2015. Brno: 2015. p. 476-480. ISBN: 978-80-214-5148-3.

Mekyska J., Galáž Z., Mžourek Z., Smékal Z., Rektorová I., et al. (2015) Assessing Progress of Parkinson’s Disease Using Acoustic Analysis of Phonation. International Work Conference on Bioinspired Intelligence (IWOBI 2015): 115-122.

Description

In biomedical signal processing (for example speech signal processing or handwriting processing), parametrization quantifies the useful information stored in the data by means of so-called parameters (features). The result is often a high-dimensional feature space. The computed features for all observations in the dataset are stored in a parametrization matrix. The next step is usually an analysis of this matrix that includes feature selection: choosing the feature subset best suited to the subsequent classification or regression task. Feature selection is one of the most important steps in data analysis because of “the curse of dimensionality”: a high-dimensional feature space can lead to overfitting, which often worsens the results of the analysis. The purpose of feature selection is therefore to select the feature subset with the highest statistical relevance for the considered application.
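To make the idea concrete, the following is a minimal Python sketch of sequential floating forward selection in its general textbook form: each forward step adds the single most helpful feature, and conditional backward steps then remove features for as long as that yields a better subset of the smaller size than any seen so far. It is not the MATLAB implementation shipped with this package. The names sffs and toy_score, and the toy criterion itself, are assumptions made purely for illustration; in practice the score would be a cross-validated classification or regression metric.

```python
def sffs(n_features, score, max_size):
    """Sequential floating forward selection over feature indices 0..n_features-1.

    score(subset) -> number, higher is better (hypothetical criterion).
    Returns (best_subset, best_score) over all subset sizes visited.
    """
    selected = []
    best_by_size = {}  # subset size -> (score, subset)

    while len(selected) < max_size:
        # Forward step: add the feature that improves the criterion most.
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        best_f = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [best_f]
        s = score(selected)
        k = len(selected)
        if k not in best_by_size or s > best_by_size[k][0]:
            best_by_size[k] = (s, list(selected))

        # Floating step: drop features while the reduced subset beats
        # the best subset of that size found so far.
        while len(selected) > 2:
            worst = max(
                selected,
                key=lambda f: score([g for g in selected if g != f]),
            )
            reduced = [g for g in selected if g != worst]
            s_red = score(reduced)
            if s_red > best_by_size[len(reduced)][0]:
                selected = reduced
                best_by_size[len(reduced)] = (s_red, list(reduced))
            else:
                break

    best_score, best_subset = max(best_by_size.values(), key=lambda t: t[0])
    return sorted(best_subset), best_score


def toy_score(subset):
    """Toy criterion: feature 0 looks best alone but is redundant with
    features 1 and 2, which are only strong together."""
    weights = [50, 40, 35, 10]
    s = set(subset)
    total = sum(weights[f] for f in s)
    if {1, 2} <= s:
        total += 50  # synergy between features 1 and 2
    if {0, 1} <= s:
        total -= 45  # redundancy between features 0 and 1
    if {0, 2} <= s:
        total -= 45  # redundancy between features 0 and 2
    return total


best_subset, best_score = sffs(4, toy_score, 4)
```

On this toy criterion, plain greedy forward selection keeps feature 0 in every subset because it is picked first. The floating step drops it once features 1 and 2 are both present, so the search ends with the subset [1, 2, 3].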

The SFFS software can select the best feature subset based on classification. The current version provides 6 classification techniques: Support Vector Machines, Naive Bayes, Discriminant Analysis, k-Nearest Neighbour, Classification Trees and Gaussian Mixture Models. It can also select features based on regression, for which it provides the Classification and Regression Trees algorithm.

The software provides several metrics to evaluate the feature selection process: 18 metrics for the classification task (classification accuracy, sensitivity, specificity, etc.) and 10 metrics for the regression task (gini index, absolute error, root mean squared error, etc.). The package also provides a function for the cross-validation (k-fold, leave-one-out) used during feature selection.

The SFFS software is written entirely in MATLAB. The testing scripts demo_cls.m and demo_reg.m are also provided. They load data from the test_cls.mat and test_reg.mat files, which contain the parametrization matrix “feat_matrix” (rows correspond to observations, columns to parameters) and the vector of labels “labels” (for the classification task, e.g. 0 = healthy and 1 = disordered; for the regression task, values on a continuous numeric scale).
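As a rough illustration of the evaluation machinery described above, the Python sketch below splits observations into k folds and computes three of the named classification metrics from 0/1 labels. It is an assumption-laden sketch, not the package's MATLAB code: the function names kfold_indices and classification_metrics are hypothetical, the folds are contiguous and unshuffled, and label 1 is treated as the positive (disordered) class. Leave-one-out corresponds to setting k equal to the number of observations.

```python
def kfold_indices(n_observations, k):
    """Split row indices 0..n-1 into k contiguous folds (no shuffling).

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    # Distribute the remainder so fold sizes differ by at most one.
    sizes = [
        n_observations // k + (1 if i < n_observations % k else 0)
        for i in range(k)
    ]
    folds, start = [], 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_observations) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")  # returned when a class is absent from y_true
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Example: 5 observations in 2 folds, and metrics for a small prediction vector.
folds = kfold_indices(5, 2)
metrics = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

In a feature selection loop, the criterion for a candidate subset would typically be one such metric averaged over the test folds, computed using only the selected columns of the parametrization matrix.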

Projects

This work was supported by projects NT13499, VG20102014033 and FEKT-S-14-2335. The described research was performed in laboratories supported by the SIX project (registration number CZ.1.05/2.1.00/03.0072) under the operational program Research and Development for Innovation.

License

To negotiate the license terms for this software, please contact the responsible person, Ing. Lukas Novak, at the Technology Transfer Office, Brno University of Technology, Kounicova 966/67a, Veveří, 60200, Brno, Czech Republic, novak@ro.vutbr.cz.