Issues with min_split_improvement and Early Stopping in h2o-3 XGBoost Classifier

Niv · February 19

I've been working with the h2o-3 library in Python to train an XGBoost classifier and have come across a couple of behaviors that seem to deviate from the expected functionality. I believe these may be bugs and would appreciate any insights or confirmations on these observations.

Issue with min_split_improvement Parameter:
The min_split_improvement parameter has no visible effect within the recommended value range (1e-10 to 1e-5). It only starts to take effect at much larger values, obtained by multiplying the recommended value by the size of my dataset (in this case, using 1e-6 * 1e9). This leads me to suspect that the loss improvement compared against this threshold is not normalized by the number of rows during training.
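
To illustrate what I mean by normalization, here is a minimal pure-Python sketch of the second-order split gain from the native XGBoost objective, computed from sums of per-row gradients and hessians. It assumes that h2o-3 forwards min_split_improvement to XGBoost's gamma unchanged, which I have not verified. The function names and the toy labels are made up for this example.

```python
def split_gain(grad_left, hess_left, grad_right, hess_right, reg_lambda=0.0):
    """Second-order split gain from the XGBoost objective, before gamma is subtracted."""

    def leaf_score(grad_sum, hess_sum):
        return grad_sum * grad_sum / (hess_sum + reg_lambda)

    g_left, h_left = sum(grad_left), sum(hess_left)
    g_right, h_right = sum(grad_right), sum(hess_right)
    return 0.5 * (
        leaf_score(g_left, h_left)
        + leaf_score(g_right, h_right)
        - leaf_score(g_left + g_right, h_left + h_right)
    )


def logistic_grad_hess(labels, prob=0.5):
    """Per-row gradient and hessian of log loss at a constant predicted probability."""
    grads = [prob - y for y in labels]
    hess = [prob * (1.0 - prob) for _ in labels]
    return grads, hess


# Toy split (made-up data): left child mostly positive, right child mostly negative.
left_labels = [1, 1, 1, 0]
right_labels = [0, 0, 0, 1]


def gain_for_copies(copies):
    """Gain of the same split after repeating every row `copies` times."""
    g_l, h_l = logistic_grad_hess(left_labels * copies)
    g_r, h_r = logistic_grad_hess(right_labels * copies)
    return split_gain(g_l, h_l, g_r, h_r)


gain_small = gain_for_copies(1)     # 8 rows
gain_large = gain_for_copies(1000)  # 8000 rows, same class proportions
print(gain_small, gain_large)
```

With reg_lambda set to 0, the gain of the identical split grows by the same factor of 1000 as the row count. If the threshold is compared against this unnormalized sum, a useful value would have to scale with the dataset size, which matches what I observe.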

Early Stopping Not Working as Expected:
When I use early stopping with a validation set, training does not halt at the point the stopping criteria imply, even though I set both the minimum loss change and the number of rounds. Training does stop before reaching the maximum number of trees, but only after running past the configured thresholds, and the final stopping point seems arbitrary to me (or at least, I cannot derive it from the parameters I've set).
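
One possible explanation, as far as I understand the documentation, is that the stopping check is based on a simple moving average over scoring events rather than on individual trees, and that scoring does not necessarily happen after every tree. The sketch below is a simplified model of that idea, not h2o-3's actual implementation. The comparison rule, the synthetic loss curve, and all names (should_stop, simulate_training, validation_loss) are assumptions made for illustration.

```python
def validation_loss(tree):
    """Synthetic lower-is-better loss: falls linearly, then plateaus from tree 20 on."""
    return max(500, 1000 - 25 * tree) / 1000


def should_stop(history, stopping_rounds, stopping_tolerance):
    """Stop when the mean of the last k scores has not improved on the mean
    of the k scores before them by at least the relative tolerance."""
    k = stopping_rounds
    if k <= 0 or len(history) < 2 * k:
        return False  # not enough scoring events to compare two windows
    reference = sum(history[-2 * k:-k]) / k
    recent = sum(history[-k:]) / k
    return recent > reference * (1.0 - stopping_tolerance)


def simulate_training(score_interval, stopping_rounds=3,
                      stopping_tolerance=1e-3, max_trees=200):
    """Return the number of trees built when the simplified check fires."""
    history = []
    for tree in range(1, max_trees + 1):
        if tree % score_interval == 0:  # the model is only scored at these trees
            history.append(validation_loss(tree))
            if should_stop(history, stopping_rounds, stopping_tolerance):
                return tree
    return max_trees


stop_every_tree = simulate_training(score_interval=1)
stop_every_5 = simulate_training(score_interval=5)
print(stop_every_tree, stop_every_5)
```

In this model the loss stops improving at tree 20, yet training ends at tree 25 when every tree is scored and at tree 45 when only every fifth tree is scored. A stopping point that depends on the scoring interval in this way would look arbitrary if one expects a per-tree check.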

Looking forward to your insights or guidance on these matters.
Thanks
Niv
