Dealing with imbalanced datasets
In machine learning, we often encounter imbalanced datasets. There are many ways to deal with them; below I list the most important things to take into account.
- Use proper evaluation metrics, such as the precision-recall curve and the ROC curve, rather than plain accuracy.
- Predict class probabilities, not the labels.
- Oversample the minority class and undersample the majority class.
- Use ensemble techniques such as bagging and majority voting.
- Adjust class weights in the loss function.
- Use proper loss functions such as log-loss and focal loss.
A detailed description of these methods can be found here.
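To make the class-weight and loss-function items more concrete, here is a minimal sketch in plain Python. It assumes a binary problem with labels 0 and 1 and predicted probabilities for the positive class. The function names, the toy data, and the choice of the "balanced" weighting heuristic (n_samples / (n_classes * class_count)) are my own illustrative assumptions, not something prescribed above, and the focal loss is shown without the optional alpha balancing term.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_log_loss(y_true, p_pos, weights, eps=1e-12):
    """Class-weighted binary log-loss; p_pos is the predicted P(y = 1)."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1 - p   # probability assigned to the true class
        total += -weights[y] * math.log(p_t)
        weight_sum += weights[y]
    return total / weight_sum


def focal_loss(y_true, p_pos, gamma=2.0, eps=1e-12):
    """Mean binary focal loss, -(1 - p_t)^gamma * log(p_t), without alpha."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)
        p_t = p if y == 1 else 1 - p
        # Well-classified examples (p_t near 1) are down-weighted.
        total += -((1 - p_t) ** gamma) * math.log(p_t)
    return total / len(y_true)


# Toy data: 8 negatives, 2 positives; the probabilities are made up.
y_true = [0] * 8 + [1] * 2
p_pos = [0.1] * 8 + [0.3, 0.6]

weights = balanced_class_weights(y_true)
print("class weights:", weights)
print("weighted log-loss:", weighted_log_loss(y_true, p_pos, weights))
print("focal loss (gamma=2):", focal_loss(y_true, p_pos))
```

With gamma set to 0 the focal loss reduces to the ordinary log-loss, which is a quick sanity check when experimenting with it. Both functions consume probabilities rather than hard labels, which is one practical reason to predict class probabilities in the first place.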