Feature Selection Approach

What is feature selection?

In machine learning, feature selection refers to the process of selecting the features that contribute most to the output, or dependent, variable.

Feature selection reduces the number of features in the dataset, which improves model performance and reduces the computational effort required.

Why is feature selection required?

Overfitting: Having too many features tends to decrease prediction accuracy and leads to overfitting. Overfitting occurs when the model fits the training data too closely but fails to perform well on new data.

Redundancy: Having too many features can lead to redundancy, as some features may provide information that is already available through other features.

Cost: Having too many features also increases the cost of training and deploying the model.

Interpretability: A model with fewer features is easier to interpret and explain than one with many features.

Feature Selection Approaches:

Univariate feature selection: This method selects the best features based on univariate statistical tests such as the chi-square test or the F-test. These tests score each feature individually by the strength of its relationship with the output variable, and the highest-scoring features are kept.

A related baseline, variance thresholding, simply removes all features with zero variance, i.e. features that are constant and therefore carry no information.
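A minimal sketch of both ideas, assuming scikit-learn (the post does not name a library): SelectKBest scores each feature with the chi-square test, and VarianceThreshold drops constant features. The iris dataset and k=2 are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2

X, y = load_iris(return_X_y=True)

# Keep the k features with the highest chi-square score against the target.
selector = SelectKBest(score_func=chi2, k=2)
X_best = selector.fit_transform(X, y)
print(X_best.shape)  # (150, 2)

# Variance thresholding: drop features with zero variance (the default threshold).
X_nonconstant = VarianceThreshold().fit_transform(X)
```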

Model-based feature selection:

There are various methods for model-based feature selection:

  • L1 norm/Lasso
  • Sequential feature selection: forward and backward
  • Recursive feature elimination
  • Embedded methods: Feature Importance

L1 norm/Lasso: A regression model that uses the L1 regularization technique is called Lasso regression. Lasso shrinks the coefficients of the less important features to zero, which indicates that those features are not needed to explain the variance in the outcome variable, so they can be removed altogether.

This works particularly well for feature selection when we have a very large number of features.
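A minimal sketch of Lasso-based selection, assuming scikit-learn; the synthetic data and alpha=1.0 are illustrative, and in practice alpha would be tuned (for example with LassoCV):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Illustrative synthetic data: 20 features, only 5 of which are informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

# Fit Lasso; the L1 penalty shrinks unimportant coefficients exactly to zero.
lasso = Lasso(alpha=1.0).fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))

# SelectFromModel keeps only the features whose coefficients survived.
X_selected = SelectFromModel(lasso, prefit=True).transform(X)
print(X.shape, "->", X_selected.shape)
```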

Sequential Feature Selection (SFS): Sequential Feature Selection adds (forward selection) or removes (backward selection) features one at a time to build a feature subset in a greedy way. At each stage, the feature to add or remove is chosen based on the estimator's cross-validation score.

Forward SFS: Forward SFS starts by selecting the single feature that maximizes the cross-validation score. Features are then added to the selected set one at a time, each time picking the feature that most improves the score, until the desired number of features has been selected.

Backward SFS: Backward SFS starts with all the features and removes them one by one in the same greedy fashion, at each step dropping the feature whose removal hurts the cross-validation score the least.
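A minimal sketch of both directions using scikit-learn's SequentialFeatureSelector; the estimator, synthetic data, and n_features_to_select=4 are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic data: 10 features, 4 of which are informative.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)
estimator = LogisticRegression(max_iter=2000)

# Forward: start empty, greedily add the feature that best improves the CV score.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="forward", cv=5
).fit(X, y)

# Backward: start with all features, greedily remove the least useful one.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="backward", cv=5
).fit(X, y)

print("forward: ", forward.get_support(indices=True))
print("backward:", backward.get_support(indices=True))
```

Note that the two directions can land on different subsets, since each makes greedy choices from a different starting point.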

Recursive feature elimination (RFE): Recursive feature elimination is a method that uses an external estimator that assigns weights to features. It searches for a subset of features by starting with all features in the training dataset and successively removing features until the desired number remains.

The estimator is trained on the initial set of features and the importance of each feature is obtained. The least important features are then pruned from the current set of features. The process is repeated recursively on the pruned set until the desired number of features is eventually reached.
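A minimal sketch of RFE with scikit-learn; the logistic-regression estimator, synthetic data, and n_features_to_select=5 are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=5, random_state=0)

# step=1: prune the single least important feature on each iteration.
rfe = RFE(LogisticRegression(max_iter=2000), n_features_to_select=5, step=1).fit(X, y)

print("selected mask:", rfe.support_)  # True for kept features
print("ranking:", rfe.ranking_)        # 1 = selected; higher = pruned earlier
```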

Sequential Feature Selection (SFS) vs Recursive feature elimination (RFE)

SFS does not require the underlying model to expose feature weights: it compares candidate subsets purely by cross-validation score, which makes it more general but slower, since many models must be fitted. RFE instead relies on the estimator's own coefficients or importances to decide which feature to drop, so it is typically faster but only works with estimators that provide such weights.

Embedded methods: Embedded methods rely on algorithms that perform feature selection as part of their own training, using statistical criteria such as information gain. We can fit such an ML algorithm and then keep the subset of features with the highest importance or significance.

Embedded methods are available in algorithms such as Decision Trees, Random Forests, and other boosting models. Unlike RFE, embedded methods do not require iterative retraining.
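A minimal sketch of embedded selection via a random forest's built-in feature importances, assuming scikit-learn; the "median" threshold is an illustrative choice:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# A single fit yields impurity-based importances as a by-product of training.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Keep the features whose importance is above the median importance.
selector = SelectFromModel(forest, threshold="median", prefit=True)
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)  # (569, 30) -> (569, 15)
```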

Which is the best method for feature selection?

There is no single best method for feature selection; instead, we should try different models on different subsets of features (chosen through the statistical methods described above) and see which combination works best for our specific use case.

If you like my content on Medium or Quantifiers and find it useful, you can show your support by hitting the clap button.

To connect with me, reach out on LinkedIn. For PM interviews, you can refer to the amazing articles at Technomanagers.
