java.lang.AssertionError when trying to train a dataset for some CSVs

enpro_h2o · March 29

I am getting an error on my dataset when I try to use the isolation forest method to detect anomalies. However, the same code works fine on another, completely different dataset. What could cause this issue?

Error:

    isolationforest Model Build progress: | (failed) | 0%
    Traceback (most recent call last):
      File "h2o_test.py", line 149, in <module>
        isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 107, in train
        self._train(parms, verbose=verbose)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 199, in _train
        job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
      File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/job.py", line 89, in poll
        "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
    OSError: Job with key $03017f00000132d4ffffffff$_92ee3e892f7bc86460e80153eaec4b70 failed with an exception: java.lang.AssertionError
    stacktrace:
    java.lang.AssertionError
        at hex.tree.DHistogram.init(DHistogram.java:350)
        at hex.tree.DHistogram.init(DHistogram.java:343)
        at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.computeChunk(ScoreBuildHistogram2.java:427)
        at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.map(ScoreBuildHistogram2.java:408)
        at water.LocalMR.compute2(LocalMR.java:89)
        at water.LocalMR.compute2(LocalMR.java:81)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.popAndExecAll(ForkJoinPool.java:906)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
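
Because the same code succeeds on another file, one way to narrow this down is to profile both CSVs column by column and compare them. The assertion fires inside the tree histogram code (`hex.tree.DHistogram.init`), so columns that are entirely missing, contain infinite values, or span a numeric range too wide to represent are plausible suspects; that is a guess, not a confirmed cause. The function below is a hypothetical, standard-library-only sketch, and the `NA_TOKENS` set is an assumed list of missing-value markers.

```python
import csv
import io
import math

# Strings treated as missing values (assumed; adjust to match your data).
NA_TOKENS = {"", "NA", "N/A", "NaN", "nan", "null", "NULL"}


def profile_columns(csv_text):
    """Return {column_name: [reasons]} for columns that look suspicious.

    A column is flagged if it is entirely missing, contains a value that
    parses as +/-inf, or has a finite numeric range (max - min) that
    overflows to infinity.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = [[] for _ in header]
    for row in reader:
        for i in range(len(header)):
            # Cells missing from a short row are treated as empty.
            cell = row[i].strip() if i < len(row) else ""
            columns[i].append(cell)

    flagged = {}
    for name, cells in zip(header, columns):
        present = [c for c in cells if c not in NA_TOKENS]
        if not present:
            flagged[name] = ["all missing"]
            continue
        numbers = []
        for c in present:
            try:
                numbers.append(float(c))
            except ValueError:
                pass  # non-numeric cell, e.g. a categorical level
        reasons = []
        if any(math.isinf(x) for x in numbers):
            reasons.append("infinite value")
        finite = [x for x in numbers if math.isfinite(x)]
        if finite and math.isinf(max(finite) - min(finite)):
            reasons.append("range overflows")
        if reasons:
            flagged[name] = reasons
    return flagged
```

For example, running `profile_columns(text_stream.getvalue())` on the failing data and on the working data should show whether a flagged column appears only in the failing set; if one does, retraining with that column left out of `x` would help confirm (or rule out) it.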

Code:

    import h2o
    from h2o.estimators import H2OIsolationForestEstimator

    # text_stream is built earlier in the script (not shown) and holds the CSV text.
    temp_path = '/home/webapp/flask-api/tmp_rows/temp_file2.csv'
    with open(temp_path, 'w') as tmp_file:
        tmp_file.write(text_stream.getvalue())  # closed automatically on exit

    h2o.init()
    print("TEMP_PATH", temp_path)
    # Import the same file that was just written.
    iso_forest = h2o.import_file(temp_path)
    seed = 12345
    ntrees = 100
    isoforest = H2OIsolationForestEstimator(ntrees=ntrees, seed=seed)
    # Train on the first 65 columns of the frame.
    isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)
    predictions = isoforest.predict(iso_forest)
    print(predictions)
    h2o.cluster().shutdown()
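
Another difference worth ruling out between the two files is row width. If some rows written from `text_stream` have more or fewer fields than the header, the parsed frame could end up with a different column layout, so `col_names[0:65]` might not select the columns you expect. A minimal sketch, assuming the CSV text fits in memory; `ragged_rows` is a hypothetical helper, not part of H2O:

```python
import csv
import io


def ragged_rows(csv_text):
    """Return (line_number, field_count) for data rows whose field count
    differs from the header's. Blank lines are skipped; line numbers are
    1-based and count the header as line 1."""
    reader = csv.reader(io.StringIO(csv_text))
    expected = len(next(reader))
    mismatches = []
    for row in reader:
        if not row:
            continue  # blank line
        if len(row) != expected:
            # reader.line_num counts physical lines read so far
            mismatches.append((reader.line_num, len(row)))
    return mismatches


# Hypothetical usage with the text written to the temp file:
# print(ragged_rows(text_stream.getvalue()))
```

An empty list means every data row matches the header width; this only checks the file's shape, not the values inside it.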
