Issue transforming categorical columns in multi-node environment

This is kind of an old issue at this point, but I can still reproduce it with the latest h2o 3.38.03: https://h2oai.atlassian.net/browse/PUBDEV-7381
I'm wondering if anyone else using Hadoop or a multi-node environment is running into this?
The gist of the issue: if you import data with a categorical/factor column (A) and then transform it into a new column (B), for example by grouping levels, that works just fine. However, if you then transform (B) further by grouping its levels into another new column (C), then (C) will be invalid. It appears that only the rows on the leader node get correct values; the rows on the remaining nodes do not.
This is pretty serious, because h2o gives no indication that anything went wrong. The problem only shows up if you, for instance, compare (B) and (C) directly.
For anyone who runs into this, the only workaround I have found is to build (C) from (A) from scratch rather than from (B).
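To make the comparison check and the workaround concrete, here is a minimal pure-Python sketch. It is not h2o code, and the helper names (`group_levels`, `compose`, `inconsistent_levels`) are hypothetical. The idea: because (C) is derived only from (B), every level of (B) should map to exactly one level of (C), so any (B) level paired with several (C) levels points to corrupted rows. The workaround corresponds to composing the A→B and B→C groupings into a single A→C mapping and applying it to (A) directly.

```python
from collections import defaultdict


def group_levels(values, mapping):
    """Map each categorical value to its grouped level; unmapped levels pass through."""
    return [mapping.get(v, v) for v in values]


def compose(first, second, levels):
    """Build a single mapping equivalent to applying `first`, then `second`.

    `levels` must cover every level of the source column so that levels
    untouched by `first` still get remapped by `second`.
    """
    return {a: second.get(first.get(a, a), first.get(a, a)) for a in levels}


def inconsistent_levels(b_col, c_col):
    """Return B levels that are paired with more than one C level.

    Since C is derived only from B, a correct C gives an empty result.
    """
    seen = defaultdict(set)
    for b, c in zip(b_col, c_col):
        seen[b].add(c)
    return {b: cs for b, cs in seen.items() if len(cs) > 1}


if __name__ == "__main__":
    a = ["x", "y", "z", "w", "x"]
    a_to_b = {"x": "xy", "y": "xy"}
    b_to_c = {"xy": "xyz", "z": "xyz"}

    b = group_levels(a, a_to_b)
    c_from_b = group_levels(b, b_to_c)                         # the chained path that misbehaves
    c_from_a = group_levels(a, compose(a_to_b, b_to_c, set(a)))  # the workaround path

    print(c_from_a == c_from_b)              # True when nothing is corrupted
    print(inconsistent_levels(b, c_from_b))  # {} when nothing is corrupted
```

On a real cluster you would collect the (B) and (C) columns (or a cross-tabulation of them) and run a check like `inconsistent_levels`; a non-empty result is the silent corruption described above.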
I would appreciate it if someone could try to confirm this finding on their system!
Comments
-
Hi, I was able to reproduce the problem above. I think the JIRA ticket didn't get attention because it was already assigned to the person who reported the issue. We will focus on it and try to address it in a future release. Thanks a lot for highlighting the problem!
-
Hi @marek,
I wanted to follow up on this issue to see whether there has been any progress. Linking to the issue on GitHub: https://github.com/h2oai/h2o-3/issues/8256
-
Hi @stuartking, @marek was able to reproduce my reported issue but has not followed up. Would you mind taking a look at the GitHub issue?
Thanks!
-
@JM05 bump. Could you please review this post/the corresponding GitHub issue? Thanks!