java.lang.AssertionError when training an H2O isolation forest on some CSVs
I get an error when training an isolation forest to detect anomalies on the dataset I actually need. The same code works fine on a completely different dataset. What could cause this?
Error:
isolationforest Model Build progress: | (failed) | 0%
Traceback (most recent call last):
  File "h2o_test.py", line 149, in <module>
    isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 107, in train
    self._train(parms, verbose=verbose)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/estimators/estimator_base.py", line 199, in _train
    job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/h2o/job.py", line 89, in poll
    "\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
OSError: Job with key $03017f00000132d4ffffffff$_92ee3e892f7bc86460e80153eaec4b70 failed with an exception: java.lang.AssertionError
stacktrace:
java.lang.AssertionError
    at hex.tree.DHistogram.init(DHistogram.java:350)
    at hex.tree.DHistogram.init(DHistogram.java:343)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.computeChunk(ScoreBuildHistogram2.java:427)
    at hex.tree.ScoreBuildHistogram2$ComputeHistoThread.map(ScoreBuildHistogram2.java:408)
    at water.LocalMR.compute2(LocalMR.java:89)
    at water.LocalMR.compute2(LocalMR.java:81)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1704)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.popAndExecAll(ForkJoinPool.java:906)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Code:
with open('/home/webapp/flask-api/tmp_rows/temp_file2.csv', 'w+') as tmp_file:
    temp_name = "/tmp_rows/temp_file2.csv"
    tmp_file.write(text_stream.getvalue())
    tmp_file.close()

h2o.init()
print("TEMP_nAME", temp_name)
iso_forest = h2o.import_file('/home/webapp/flask-api/{0}'.format(temp_name))

seed = 12345
ntrees = 100
isoforest = h2o.estimators.H2OIsolationForestEstimator(
    ntrees=ntrees, seed=seed)
isoforest.train(x=iso_forest.col_names[0:65], training_frame=iso_forest)

predictions = isoforest.predict(iso_forest)
print(predictions)
h2o.cluster().shutdown()
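Since the assertion is raised in hex.tree.DHistogram.init and only one of the two datasets fails, a reasonable first step is to compare the column contents of the failing CSV with the working one. The sketch below is a hypothetical, standard-library-only helper (profile_csv and suspicious_columns are names made up for illustration, not part of h2o). It flags columns that are entirely missing, constant, or contain inf/NaN values. Whether any of these actually triggers the assertion is an assumption, not something confirmed by the traceback.

```python
import csv
import math


def profile_csv(path):
    """Collect simple per-column statistics for a CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        stats = {name: {"missing": 0, "non_finite": 0, "distinct": set()}
                 for name in reader.fieldnames or []}
        for row in reader:
            for name, col in stats.items():
                value = (row.get(name) or "").strip()
                if value == "":
                    col["missing"] += 1
                    continue
                col["distinct"].add(value)
                try:
                    number = float(value)
                except ValueError:
                    continue  # text/categorical value, nothing more to check
                if not math.isfinite(number):
                    col["non_finite"] += 1
    return stats


def suspicious_columns(path):
    """Return {column: [reasons]} for columns that look unusual.

    The checks (empty, constant, inf/NaN) are guesses about what might
    differ between a CSV that trains and one that does not.
    """
    report = {}
    for name, col in profile_csv(path).items():
        reasons = []
        if not col["distinct"]:
            reasons.append("all values missing")
        elif len(col["distinct"]) == 1:
            reasons.append("constant")
        if col["non_finite"]:
            reasons.append("{} non-finite values".format(col["non_finite"]))
        if reasons:
            report[name] = reasons
    return report
```

Running suspicious_columns on both the failing file and the working file, then comparing the two reports, shows which columns differ in these respects. Columns flagged only in the failing file are candidates to drop from x before calling train again.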