Uses of feature selection:
Faster training of the algorithms.
Increased interpretability.
Reduced overfitting.
Less noise and improved model accuracy.
Methods in feature selection:
Filter Methods
Wrapper Methods
Embedded Methods
Filter Methods:
Used as a data pre-processing step.
Feature selection is independent of any machine learning algorithm.
Features are selected based on their scores in statistical tests such as:
Pearson’s Correlation: It is used as a measure for quantifying linear dependence between two continuous variables X and Y. Its value varies from -1 to +1.
LDA: Linear discriminant analysis is used to find a linear combination of features that characterizes or separates two or more classes (or levels) of a categorical variable.
ANOVA: ANOVA stands for analysis of variance. It is similar to LDA, except that it operates on one or more categorical independent features and one continuous dependent feature. It provides a statistical test of whether the means of several groups are equal.
Chi-Square: It is a statistical test applied to groups of categorical features to evaluate the likelihood of correlation or association between them using their frequency distribution.
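As a rough illustration of how a filter method scores features without training any model, here is a minimal sketch in plain Python. It computes Pearson's correlation for continuous features and a chi-square statistic for categorical ones, then ranks features by score. The helper names (`pearson_r`, `chi_square_stat`, `rank_features`) and the toy data are made up for this example, and treating a constant column as having zero correlation is an assumption of the sketch.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def chi_square_stat(feature, target):
    """Chi-square statistic of independence for two categorical sequences."""
    n = len(feature)
    observed = Counter(zip(feature, target))
    feature_totals = Counter(feature)
    target_totals = Counter(target)
    stat = 0.0
    for f_val, f_count in feature_totals.items():
        for t_val, t_count in target_totals.items():
            # Expected count if the feature and the target were independent.
            expected = f_count * t_count / n
            stat += (observed[(f_val, t_val)] - expected) ** 2 / expected
    return stat


def rank_features(features, target, score):
    """Return feature names sorted from highest to lowest score."""
    scores = {name: score(values, target) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: "size" tracks the target linearly, "noise" does not.
numeric_features = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "noise": [2.0, 1.0, 2.0, 1.0, 2.0],
}
price = [2.0, 4.0, 6.0, 8.0, 10.0]

ranking = rank_features(
    numeric_features, price, lambda x, y: abs(pearson_r(x, y))
)
print(ranking)
```

In practice the top-ranked features, or those above a chosen threshold, would be kept. A real test would also convert the chi-square statistic to a p-value, which this sketch skips.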
Wrapper Methods:
Wrapper methods are computationally very expensive. They take a subset of features and train a model on it. Based on the inferences drawn from that model, features are added to or removed from the subset. The goal is to select the subset of features that best explains the problem.
Forward Selection: An iterative method that starts with no features in the model and, at each iteration, adds the feature that best improves the model, until adding a new variable no longer improves performance.
Backward Elimination: Starts with all features in the model and, at each iteration, removes the least significant feature, which improves the performance of the model. This is repeated until removing further features yields no improvement.
Recursive Feature Elimination: A greedy optimization approach that repeatedly builds models and sets aside the best or worst performing feature at each iteration. The process repeats until all the features are exhausted.
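The forward and backward procedures described above can be sketched as two greedy loops. This is only an illustration: the `score` callback stands in for "train a model on this subset and measure its validation performance", and `toy_score`, the `GAINS` table and the per-feature cost are invented so the example runs without any model.

```python
def forward_selection(all_features, score):
    """Start empty; keep adding the single feature that most improves score."""
    selected = []
    remaining = list(all_features)
    best_score = score(selected)
    while remaining:
        # Try each remaining feature added to the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no addition improves the model, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


def backward_elimination(all_features, score):
    """Start with everything; keep dropping the feature whose removal helps most."""
    selected = list(all_features)
    best_score = score(selected)
    while selected:
        # Try removing each feature in turn from the current subset.
        candidates = [
            (score([g for g in selected if g != f]), f) for f in selected
        ]
        top_score, drop = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no removal improves the model, so stop
        selected.remove(drop)
        best_score = top_score
    return selected, best_score


# Hypothetical stand-in for model performance: each feature contributes a
# fixed gain, and every feature used costs 2 points (a complexity penalty).
GAINS = {"age": 30, "income": 25, "zip": 1, "id": 0}


def toy_score(subset):
    return sum(GAINS[f] for f in subset) - 2 * len(subset)


features = ["age", "income", "zip", "id"]
print(forward_selection(features, toy_score))
print(backward_elimination(features, toy_score))
```

Each pass over the candidates calls `score` once per feature, which is where the computational cost of wrapper methods comes from when `score` trains a real model.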
Embedded Methods:
Embedded methods combine the qualities of filter and wrapper methods. They are implemented by algorithms that have their own built-in feature selection.
Lasso regression uses L1 regularization, which adds a penalty equivalent to the absolute value of the magnitude of the coefficients.
Ridge regression uses L2 regularization, which adds a penalty equivalent to the square of the magnitude of the coefficients.
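To make the two penalties concrete, here is a small sketch that computes them and shows their effect on a single coefficient. The one-coefficient objectives in the docstrings are a simplifying assumption chosen because they have closed-form minimizers. The names `alpha`, `lasso_shrink` and `ridge_shrink` are illustrative and not taken from any library.

```python
def l1_penalty(coefs, alpha):
    """L1 (lasso) penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(w) for w in coefs)


def l2_penalty(coefs, alpha):
    """L2 (ridge) penalty: alpha times the sum of squared coefficient values."""
    return alpha * sum(w * w for w in coefs)


def lasso_shrink(b, alpha):
    """Minimizer of 0.5 * (w - b)**2 + alpha * abs(w), i.e. soft thresholding."""
    if b > alpha:
        return b - alpha
    if b < -alpha:
        return b + alpha
    return 0.0  # small coefficients are set exactly to zero


def ridge_shrink(b, alpha):
    """Minimizer of 0.5 * (w - b)**2 + alpha * w**2."""
    return b / (1.0 + 2.0 * alpha)


# Hypothetical unpenalized coefficients for three features.
unpenalized = [3.0, 0.4, -0.2]
alpha = 0.5

lasso_coefs = [lasso_shrink(b, alpha) for b in unpenalized]
ridge_coefs = [ridge_shrink(b, alpha) for b in unpenalized]
print(lasso_coefs)
print(ridge_coefs)
```

In this simplified setting the L1 penalty drives the two small coefficients exactly to zero, which amounts to dropping those features, while the L2 penalty only shrinks every coefficient toward zero.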