Hi, reposting (I think I accidentally deleted the draft, but if it's double-posted please delete this one). I'm trying to deploy H2O on YARN-based Spark in CDH (using PySpark/PySparkling). Below is the script I run in my Jupyter notebook: Establishing the SparkSession: sparkSession01 = SparkSession.builder.appName(appName01) \…
I ran H2O clustering yesterday to try it out. Since it was taking a very long time and I got no feedback on elapsed time, I wanted to ask how long a single H2O clustering run can take. Is it a matter of hours or days? My dataset is reasonably sized (40 patients, 20 variables).