Issue transforming categorical columns in multi-node environment

This is a fairly old issue at this point, but I can still reproduce it with the latest h2o, 3.38.0.3: https://h2oai.atlassian.net/browse/PUBDEV-7381

Is anyone else running on Hadoop or another multi-node environment encountering this?

The gist of the issue: if you import data with a categorical/factor column (A) and then transform it (for example, by grouping levels) into a new column (B), that works just fine. However, if you then transform (B) further, grouping its levels into a new column (C), then (C) will be invalid. It appears that only the rows held on the leader node get correct values; the rows on the remaining nodes do not. This is pretty serious, because h2o gives no indication that anything went wrong. The problem only shows up if you check for it, for instance by comparing (B) against (C). For anyone who recognizes this issue, the only workaround I have found is to build (C) from (A) from scratch rather than from (B).
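To make the comparison and the workaround concrete, here is a minimal sketch in plain Python. It does not call h2o at all: it assumes the (B) and (C) columns have already been pulled to the client as ordinary lists, and the level names and both helper functions are hypothetical, made up for illustration. The first helper collapses the two groupings into a single A-to-C mapping (so (C) can be derived directly from (A)); the second flags any (B) level that shows up next to more than one (C) level, which should never happen if (C) is a pure regrouping of (B).

```python
from collections import defaultdict


def compose_groupings(a_to_b, b_to_c):
    """Collapse two level groupings (A -> B, then B -> C) into one A -> C map.

    B levels missing from b_to_c are passed through unchanged.
    """
    return {a: b_to_c.get(b, b) for a, b in a_to_b.items()}


def inconsistent_levels(b_values, c_values):
    """Return the B levels that appear alongside more than one C level."""
    seen = defaultdict(set)
    for b, c in zip(b_values, c_values):
        seen[b].add(c)
    return {b: sorted(cs) for b, cs in seen.items() if len(cs) > 1}


# Hypothetical level names, for illustration only.
a_to_b = {"red": "warm", "orange": "warm", "blue": "cool", "gray": "neutral"}
b_to_c = {"warm": "chromatic", "cool": "chromatic", "neutral": "achromatic"}

# Workaround: a single mapping that goes straight from A to C.
a_to_c = compose_groupings(a_to_b, b_to_c)

# Check: every B level should map to exactly one C level.
b_col = ["warm", "cool", "neutral", "warm"]
c_good = ["chromatic", "chromatic", "achromatic", "chromatic"]
c_bad = ["chromatic", "chromatic", "achromatic", "achromatic"]  # simulated bad row

print(a_to_c)
print(inconsistent_levels(b_col, c_good))
print(inconsistent_levels(b_col, c_bad))
```

An empty result from the check means (B) and (C) agree; any entry in the result names a (B) level whose rows were regrouped inconsistently, which is the symptom described above.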

I would appreciate it if someone could try to confirm this finding on their own system!

Comments