Issue transforming categorical columns in multi-node environment

This is kind of an old issue at this point, but I can still reproduce it with the latest h2o 3.38.0.3:

I'm wondering if anyone else using hadoop or a multi-node environment is encountering this?

The gist of the issue: if you import data with a categorical/factor column (A) and then transform it (for example, by grouping levels) into a new column (B), that works just fine. However, if you then transform (B) further by grouping its levels into a new column (C), then (C) will be invalid. It appears that only the values held on the leader node are correct; the values on the remaining nodes are not. This is pretty serious, because h2o gives no indication that anything went wrong. The problem only shows up if you compare (B) against (C), for instance. For people who recognize this issue, the only workaround I have found is to build (C) from (A) from scratch, rather than from (B).
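
In case it helps anyone apply the workaround, here is a minimal sketch in plain Python. It makes no h2o calls at all; the level names and both helper functions are hypothetical and only illustrate the idea. The two grouping rules (A to B, then B to C) are composed into a single A to C rule, so (C) can be derived straight from (A), and a small check counts any (B, C) pairs that contradict the intended grouping.

```python
from collections import Counter

# Hypothetical grouping rules; these level names are made up for illustration.
A_TO_B = {"cat": "pet", "dog": "pet", "cow": "farm", "pig": "farm", "wolf": "wild"}
B_TO_C = {"pet": "domestic", "farm": "domestic", "wild": "wild"}


def compose_level_maps(first, second):
    """Map each original level straight to its final group (A -> C)."""
    return {level: second[group] for level, group in first.items()}


def mismatches(b_values, c_values, b_to_c):
    """Count (B, C) pairs that disagree with the intended B -> C grouping."""
    pairs = Counter(zip(b_values, c_values))
    return {pair: n for pair, n in pairs.items() if b_to_c[pair[0]] != pair[1]}


A_TO_C = compose_level_maps(A_TO_B, B_TO_C)

# Stand-in for column (A); in practice these would be the imported values.
a = ["cat", "cow", "wolf", "dog", "pig"]
b = [A_TO_B[x] for x in a]  # (B) derived from (A)
c = [A_TO_C[x] for x in a]  # (C) derived from (A) as well, never from (B)

print(mismatches(b, c, B_TO_C))  # prints {} when (B) and (C) are consistent
```

On a real cluster you would apply the composed rule to (A) with whatever recoding you already use, and run the same kind of pair count on (B) and (C) to see whether any rows disagree.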

Would appreciate someone trying to confirm this finding on their system!


  • marek (Member, Posts: 1)

    Hi, I was able to reproduce the problem above. I think the JIRA ticket didn't get attention because it was already assigned to the reporter of the issue. We will focus on it and try to address the issue in one of the future releases. Thanks a lot for highlighting the problem!