Answer by Avkash@H2O · Nov 30, 2016 at 03:16 PM
Eric, no problem, we will try to help you. Please try setting up your H2O init as below and then test again:
If you pass nthreads = -1, H2O will use all available CPU cores in the system:
h2o.init(nthreads = -1)
If you want H2O to use a specific amount of your system's memory, you can set it as below:
h2o.init(max_mem_size = "30g")
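A minimal sketch that combines both settings and then checks what the cluster actually received, assuming the h2o R package and Java are installed. Note that these startup arguments only apply when h2o.init() launches a new local cluster; if an H2O instance is already running, h2o.init() connects to it and the new nthreads/max_mem_size values are not applied. The "4g" heap is a placeholder; raise it (e.g. to "30g") to suit your machine.

```r
library(h2o)

# Start a local H2O cluster using all CPU cores and a fixed JVM heap.
# Placeholder heap size; set it to what your machine can spare.
h2o.init(nthreads = -1, max_mem_size = "4g")

# Print the cores and memory the running cluster actually has,
# so you can confirm the settings above took effect.
h2o.clusterInfo()
```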
Answer by Eric · Nov 30, 2016 at 05:50 PM
Thanks Avkash. Yes, even if I specify to use all my cores (6) and memory (64GB), it still doesn't use all of it. I run R (64-bit) on a Windows machine (64-bit), using the latest Java.
What is causing the slowdown here? Thanks again for your help!
Answer by Eric · Nov 30, 2016 at 05:57 PM
No, not really. My data is not more than a gigabyte.
But I'm interested in why the GBM model fitting takes so long. Wouldn't the GBM model fit faster if it used all my CPU/RAM resources? If so, why aren't they used 100%?
Please know that I have very little knowledge of or experience with the underlying mechanisms at work here. I am just trying to find ways to speed up the process.
Answer by Avkash@H2O · Dec 06, 2016 at 12:45 AM
It is possible that you have scoring enabled at each iteration (after every tree). If you disable per-iteration scoring and let the default scoring take over, you may gain some speed. Also, trees are built sequentially, so depending on your data, training will still take some time.
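A minimal sketch of the scoring settings discussed here, using R's built-in iris data as a stand-in for a real training set. score_each_iteration and score_tree_interval are h2o.gbm() arguments; the tree count, interval, and seed are illustrative values only. Even with infrequent scoring, each tree depends on the ones before it, so the trees themselves are still built one after another.

```r
library(h2o)
h2o.init(nthreads = -1)

# Stand-in data: iris converted to an H2OFrame (Species becomes a factor).
iris_hex <- as.h2o(iris)

# score_each_iteration = FALSE (the default) avoids scoring after every tree;
# score_tree_interval = 100 scores the model once every 100 trees instead.
gbm_model <- h2o.gbm(
  x = 1:4,
  y = "Species",
  training_frame = iris_hex,
  ntrees = 200,
  score_each_iteration = FALSE,
  score_tree_interval = 100,
  seed = 1
)

# One row per scoring event, not one per tree.
h2o.scoreHistory(gbm_model)
```

With an interval of 100 trees, the scoring history should contain only a few rows rather than 200.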
Answer by Eric · Dec 06, 2016 at 12:55 AM
Thanks Avkash! But I've already set the scoring interval to 100 trees.
It is not the time taken that I'm worried about. I'm trying to understand why the whole GBM fitting process is not consuming all of the computer resources. Is the Java structure the bottleneck here?
Thanks for helping me out!!