Feature importance is the most common tool for explaining a machine learning model. It is so popular that many data scientists end up believing that feature importance equals feature goodness.
It isn’t so.
When a feature is important, it merely means that the model found it useful in the training set. However, this says nothing about the ability of the feature to generalize to new data!
To account for that, we need to distinguish between two concepts:
- Prediction Contribution: the weight that a variable has in the predictions made by the model. This is determined by the patterns that the model learned on the training set. It is equivalent to feature importance.
- Error Contribution: the weight that a variable has in the errors made by the model on a holdout dataset. This is a better proxy for the feature’s performance on new data (a code sketch of both quantities follows this list).
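To make the two definitions concrete, here is a minimal sketch, not the exact computation used later in the article. It assumes a binary classifier whose SHAP values live in log-odds space (as with gradient-boosting models); the names `shap_values`, `y_true`, and `base_value` are hypothetical placeholders for a holdout set’s SHAP matrix, its labels, and the model’s base value.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import log_loss

def prediction_contribution(shap_values: pd.DataFrame) -> pd.Series:
    """Mean absolute SHAP value of each feature (i.e. feature importance)."""
    return shap_values.abs().mean().sort_values(ascending=False)

def error_contribution(shap_values: pd.DataFrame,
                       y_true: pd.Series,
                       base_value: float) -> pd.Series:
    """Change in holdout log loss attributable to each feature.

    A positive value means the feature increases the error on the
    holdout set: removing its SHAP values would make the model better.
    """
    # Full-model prediction: sum of SHAP values plus the base value,
    # mapped from log-odds to probability with the sigmoid.
    log_odds = shap_values.sum(axis=1) + base_value
    loss_full = log_loss(y_true, 1 / (1 + np.exp(-log_odds)))

    contributions = {}
    for feature in shap_values.columns:
        # Hypothetical "model without this feature": subtract its SHAP values.
        log_odds_wo = log_odds - shap_values[feature]
        loss_wo = log_loss(y_true, 1 / (1 + np.exp(-log_odds_wo)))
        contributions[feature] = loss_full - loss_wo
    return pd.Series(contributions).sort_values(ascending=False)
```

With these two rankings side by side, a feature can be highly important for the predictions yet still increase the error on unseen data.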
In this article, I will explain the logic behind the calculation of these two quantities for a classification model. I will also show an example in which using Error Contribution for feature selection leads to a far better result than using Prediction Contribution.
If you are more interested in regression than classification, you can read my previous article, “Your Features Are Important? It Doesn’t Mean They Are Good”.
- Starting from a toy example
- Which “error” should we use for classification models?
- How should we handle SHAP values in classification models?
- Computing “Prediction Contribution”
- Computing “Error Contribution”
- A real dataset example
- Proving it works: Recursive Feature Elimination with “Error Contribution”
- Conclusions