I am trying to use the "Partial Dependence Plots" function under the "Score" menu in Flow. I select my model and data and let it run. It goes for a while, then reaches 100% with "Done" as the status. When I click on the "View" action or on the linked key, I get the error below, as if the PDP doesn't exist. Any tips on what to do? (In the first line of the error, the key is the part after the last "/".)
Error processing GET /3/PartialDependence/ppd-a271736a-34da-48e0-ac21-ae0f67e20ef3
TypeError: Cannot read property 'description' of null
at http://10.24.42.153:54321/flow/js/flow.js:3983:39
at inspect$2 (http://10.24.42.153:54321/flow/js/flow.js:3699:46)
at inspect (http://10.24.42.153:54321/flow/js/flow.js:3640:24)
at Object.self [as inspect] (http://10.24.42.153:54321/flow/js/flow.js:6934:39)
at H2O.PartialDependenceOutput (http://10.24.42.153:54321/flow/js/flow.js:12109:27)
at Object.flow_.render (http://10.24.42.153:54321/flow/js/flow.js:3617:31)
at http://10.24.42.153:54321/flow/js/flow.js:519:66
at http://10.24.42.153:54321/flow/js/flow.js:6116:37
at http://10.24.42.153:54321/flow/js/flow.js:4737:28
at http://10.24.42.153:54321/flow/js/flow.js:2780:28
at Object.<anonymous> (http://10.24.42.153:54321/flow/js/flow.js:2494:28)
at fire (http://10.24.42.153:54321/flow/js/flow-lib.js:3852:30)
at Object.fireWith [as resolveWith] (http://10.24.42.153:54321/flow/js/flow-lib.js:3964:7)
at done (http://10.24.42.153:54321/flow/js/flow-lib.js:9017:14)
at XMLHttpRequest.<anonymous> (http://10.24.42.153:54321/flow/js/flow-lib.js:9358:9)
Answer by ivy · Jan 17 at 05:43 PM
@lilyrobin - PDP is a new feature, and we are actively trying to improve it. Thank you for your patience with it during this time.
The key after the last "/" is the destination key of the computed PDP frame.
Were you predicting on categorical features? If so, how many levels do the categorical features have? You could try the R or Python API for PDP, which gives more informative error logs and handles errors better on the client than Flow does (which may resolve the error for you in this case):
- R: '?h2o.partialPlot' for the docs in R
- Python: 'help(model.partial_plot)' for the docs in Python, where 'model' is your trained model
It is a known issue that PDP cannot be computed if nbins is less than the number of levels in a categorical feature. For example, suppose you have an "animals" feature with 100 levels (e.g. cat, dog, mouse) and you try to compute a PDP with nbins of 20. That asks PDP to group the 100 animals into 20 groups and show the trend of how the "animals" feature affects a metric. PDP has no way to group the animals "smartly", i.e. should "elephant" be in the same bin as "cat" or "mouse"? Hence, for PDPs, set nbins >= the largest number of levels among your categorical features. (Note that nbins for GBM modeling is different: there, nbins >= levels may overfit. Also, for GBM you would use nbins_cat for categorical features.)
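As a rough illustration of the rule above, the sketch below picks an nbins value that is at least the largest number of levels among the categorical columns. It is plain Python rather than H2O code: the helper name `min_nbins_for_pdp`, the dict-of-lists data layout, and the "a column is categorical if it contains strings" test are all assumptions made for this example, and the default of 20 simply mirrors the nbins value used in the example above.

```python
def min_nbins_for_pdp(columns, default_nbins=20):
    """Return an nbins value >= the largest categorical level count.

    `columns` maps column name -> list of values (hypothetical layout).
    A column counts as categorical here if any non-missing value is a
    string; that rule is an assumption of this sketch, not H2O's logic.
    """
    max_levels = 0
    for values in columns.values():
        non_missing = [v for v in values if v is not None]
        if any(isinstance(v, str) for v in non_missing):
            # Number of distinct levels in this categorical column.
            max_levels = max(max_levels, len(set(non_missing)))
    # Never go below the requested default; only raise it when needed.
    return max(default_nbins, max_levels)


data = {
    "animals": ["cat", "dog", "mouse", "elephant", "cat"],
    "weight": [4.0, 20.0, 0.02, 5000.0, 3.5],
}
nbins = min_nbins_for_pdp(data, default_nbins=3)
print(nbins)
```

The resulting value is what you would then pass as nbins when requesting the PDP from R, Python, or Flow.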
Other possible workarounds, aside from using the R/Python API instead of Flow, are:
- Bin the categorical levels meaningfully yourself, based on your dataset, and use that number of bins as nbins
- "One-hot" encode the important categorical levels that are of interest to you. i.e. if you want to see if "elephant"
- Do PDPs only for numeric features. If you are using the R or Python API, you can pass an array of the column names for which you want PDPs computed. This feature was also recently added to Flow and will hopefully be released soon.
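The three workarounds can be sketched in plain Python, independent of H2O. Everything here is illustrative: the helper names (`collapse_rare_levels`, `one_hot_level`, `numeric_columns`), the "other" label, and the choice to bin by frequency are assumptions for this example. Frequency is only one possible grouping, and a grouping based on what the levels mean in your dataset may serve you better.

```python
from collections import Counter


def collapse_rare_levels(values, max_levels, other_label="other"):
    """Keep the (max_levels - 1) most frequent levels; map the rest to other_label."""
    counts = Counter(v for v in values if v is not None)
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    # Missing values (None) are left untouched.
    return [v if (v is None or v in keep) else other_label for v in values]


def one_hot_level(values, level):
    """Return a 0/1 indicator column for a single level of interest."""
    return [1 if v == level else 0 for v in values]


def numeric_columns(columns):
    """Return names of columns whose non-missing values are all int or float."""
    return [
        name
        for name, values in columns.items()
        if all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
            if v is not None
        )
    ]


animals = ["cat", "cat", "cat", "dog", "dog", "mouse", "elephant"]
binned = collapse_rare_levels(animals, max_levels=3)
is_elephant = one_hot_level(animals, "elephant")
pdp_cols = numeric_columns({"animals": animals, "weight": [4, 4, 5, 20, 22, 0.02, 5000]})
print(binned, is_elephant, pdp_cols)
```

After binning, the column has at most `max_levels` distinct levels, so that number can be used as nbins. The list returned by `numeric_columns` is the kind of column-name array you would hand to the R or Python PDP call.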