Plaid is certainly a great option for cards that only support OpenCL, at least for deep learning and feed forward… Unfortunately, to my knowledge, there isn’t really a great PyCL option for predictive modeling. I typically use Julia for big-data projects that would need that kind of processing power, and in Julia there usually isn’t a need to speed things up. I also wrote the predictive modeling module for Julia, so I usually grab my functions from the module and rework them to run on CUDA or OpenCL for tests like this. 23 vs. 5 milliseconds with 1.5 million observations isn’t a significant enough gain for me to pursue for the module itself, haha.