-
relevel vs relevel_by_frequency
I am using H2O's Python API to construct a GLM. To set the reference level for a categorical input I use relevel(). Based on the documentation, it would seem that relevel_by_frequency() would accomplish the same thing, assuming I want the most frequent level as my base. What I find strange is that the GLM coefficient…
-
Issue transforming categorical columns in multi-node environment
This is kind of an old issue at this point, but I can still reproduce it with the latest h2o 3.38.03 (https://h2oai.atlassian.net/browse/PUBDEV-7381). I'm wondering if anyone else using Hadoop or a multi-node environment is encountering this. The gist of the issue: If you import data with a categorical/factor column (A) and…
-
SAS PROC GENMOD vs H2O GLM
Hello, I am trying to replicate a model currently being built in SAS using PROC GENMOD. The coefficients from the two tools are close but not exact. Is there a reason why I should not expect them to match?
-
Does the column order in scoring dataframes matter to H2O models using mojo format?
My ML Ops team is suggesting that the reason I'm getting different scores than they are (even though the scoring process flow in R and the mojo model are supposedly the same) may be a difference in the order of the columns in the dataframes we are each scoring. I can't actually verify…
-
How does one save and re-open a flow?
This is certainly a newbie question, but I cannot figure out how to save and restore a flow. I am using H2O Flow v3.38.0.3 on my local computer (both server and browser), with Windows 10 and the Chrome browser. I can save a flow, and certainly something is being saved, since if I try to save again with the same name, I get prompted to…
-
Does H2O in clustered formation support a configurable number of concurrent HTTP connections?
I have followed the H2O documentation to create an H2O cluster in Kubernetes using the Docker h2oai/h2o-open-source-k8s:3.38.0.2 image. When I load test the cluster it can support about 80 concurrent HTTP connections that are opened with this Python code: import h2o h2o.connect(url='http://' + "clusterhostname" +':0') My…
-
How to set up H2O Context in YARN-based Spark (CDH 6)?
Hi, reposting (I think I accidentally deleted the draft, but if it's double posted please delete this one). I'm trying to deploy H2O in YARN-based Spark in CDH (using pyspark/pysparkling). Below is the script I run in my Jupyter notebook: Establishing sparksession sparkSession01 = SparkSession.builder.appName(appName01) \…
-
How long does the H2O Clustering command run?
I ran H2O clustering yesterday to try it out. It was taking a very long time and I did not receive any feedback on the elapsed time, so I wanted to ask how long one H2O clustering run can take. Is it a matter of hours or days? My dataset is reasonably sized (40 patients, 20 variables).
-
Can't connect to http://localhost:54321
Hello, I want to run the lilikoi example code with the mock data provided by the lilikoi R package. However, I am stuck at the lilikoi.machine_learning() command due to a connection error with H2O. I downloaded the H2O file and unzipped it in the R Terminal, but now I cannot connect to http://localhost:54321 as indicated…
-
Feature request properly made?
Hi, Following a discussion (https://groups.google.com/g/h2ostream/c/BEwC2iVZvgY) over on h2ostream, I opened a feature request ticket, namely https://h2oai.atlassian.net/browse/PUBDEV-8891. I suspect this issue has remained unassigned due to there being more pressing priorities, but if I logged it incorrectly, or if it…
-
h2oai.awesome-h2o on GitHub
I want to share a good resource: H2O.ai Awesome on GitHub, where we've been adding to a curated list of all the awesome projects, applications, research, tutorials, courses and books that use H2O-3, our open source, distributed machine learning platform. H2O offers parallelized implementations of many supervised and…
-
Help with Lime Values (tabular explainer) for an existing model and dataframe
Has anyone had any luck producing lime values (tabular explainer) for an existing model and dataframe with mixed data types (categorical and continuous)? I've gone through tutorials and haven't had any luck getting values to be produced, as the lime explainer expects data to be encoded and the model used data that wasn't…