New paper: Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees

20 May 2012 - Cambridge MA

A new paper by Nicolas Pinto, David D. Cox, and me about generalization in auto-tuning shows that simple machine learning (function approximation by ensembles of regression trees) can make JIT auto-tuning possible. For work like ours, which involves sifting through a lot of models with different architectures, the technique promises better performance without having to muck around with case-by-case optimization.

What would be really cool is to modify Theano (or something similar) to build this kind of “predictive auto-tuning” into the normal function evaluation process, e.g. by running an exploratory option 1/10 of the time, so that it auto-tunes itself as it runs. That’s future work though.
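To make the "explore 1/10 of the time" idea concrete, here is a minimal sketch of a self-tuning wrapper. It is only an illustration: the class name `SelfTuningFunction` and the parameters `explore_prob` and `timer` are hypothetical, nothing here is Theano's API, and it tunes by plain measurement rather than by the learned model from the paper.

```python
import random
import time


class SelfTuningFunction:
    """Call the fastest known variant, but try a random one now and then."""

    def __init__(self, variants, explore_prob=0.1, seed=0, timer=time.perf_counter):
        self.variants = list(variants)    # interchangeable implementations (hypothetical)
        self.explore_prob = explore_prob  # the "1/10 of the time" from the text
        self.rng = random.Random(seed)
        self.timer = timer                # injectable clock, handy for testing
        self.total_time = [0.0] * len(self.variants)
        self.calls = [0] * len(self.variants)

    def _mean_time(self, i):
        # Untried variants score -inf, so each one gets measured at least once.
        if self.calls[i] == 0:
            return float("-inf")
        return self.total_time[i] / self.calls[i]

    def best_index(self):
        """Index of the variant with the lowest mean measured runtime so far."""
        return min(range(len(self.variants)), key=self._mean_time)

    def __call__(self, *args, **kwargs):
        if self.rng.random() < self.explore_prob:
            i = self.rng.randrange(len(self.variants))  # exploratory call
        else:
            i = self.best_index()                       # exploit the current best
        start = self.timer()
        result = self.variants[i](*args, **kwargs)
        self.total_time[i] += self.timer() - start
        self.calls[i] += 1
        return result
```

Wrapping two implementations of the same function, e.g. `f = SelfTuningFunction([impl_a, impl_b])`, and then calling `f(x)` as usual would gradually shift most calls to whichever variant measures faster, while the occasional exploratory call keeps the timings fresh.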

Citation:
J. Bergstra, N. Pinto, D. D. Cox (2012).
Machine Learning for Predictive Auto-Tuning with Boosted Regression Trees.
Proc. Innovative Parallel Computing (INPAR12).

Abstract:
The rapidly evolving landscape of multicore architectures makes the construction of efficient libraries a daunting task. A family of methods known collectively as “auto-tuning” has emerged to address this challenge. Two major approaches to auto-tuning are empirical and model-based: empirical auto-tuning is a generic but slow approach that works by measuring runtimes of candidate implementations, model-based auto-tuning predicts those runtimes using simplified hardware abstractions designed by hand. We show that machine learning methods for non-linear regression can be used to estimate timing models from data, capturing the best of both approaches. A statistically-derived model offers the speed of a model-based approach, with the generality and simplicity of empirical auto-tuning. We validate our approach using the filterbank correlation kernel described in (Pinto and Cox, 2011), where we find that 0.1 seconds of hill climbing on the regression model (“predictive auto-tuning”) can achieve almost the same speed-up as is brought by minutes of empirical auto-tuning. Our approach is not specific to filterbank correlation, nor even to GPU kernel auto-tuning, and can be applied to almost any templated-code optimization problem, spanning a wide variety of problem types, kernel types, and platforms.
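The two ingredients named in the abstract, a regression model fitted to timing data and hill climbing on that model, can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not the paper's implementation: it boosts depth-1 trees (stumps) rather than full regression trees, the function `fake_runtime` is a made-up stand-in for real kernel timings, and all names and the two integer tuning parameters are hypothetical.

```python
def fit_stump(X, residuals):
    """Best single-feature threshold split under squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:
            left = [r for x, r in zip(X, residuals) if x[j] <= t]
            right = [r for x, r in zip(X, residuals) if x[j] > t]
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            err = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, lmean, rmean)
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=300, learning_rate=0.5):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lmean, rmean = fit_stump(X, residuals)
        lval, rval = learning_rate * lmean, learning_rate * rmean
        stumps.append((j, t, lval, rval))
        pred = [p + (lval if x[j] <= t else rval) for x, p in zip(X, pred)]
    return base, stumps


def predict(model, x):
    """Predicted runtime of configuration x under the boosted model."""
    base, stumps = model
    return base + sum(lval if x[j] <= t else rval for j, t, lval, rval in stumps)


def hill_climb(model, start, bounds, max_steps=100):
    """Greedy +/-1 moves on integer configs, scored by predicted runtime only."""
    current = tuple(start)
    for _ in range(max_steps):
        neighbours = [
            current[:i] + (current[i] + d,) + current[i + 1:]
            for i in range(len(current)) for d in (-1, 1)
            if bounds[i][0] <= current[i] + d <= bounds[i][1]
        ]
        best = min(neighbours, key=lambda c: predict(model, c))
        if predict(model, best) >= predict(model, current):
            break  # no neighbour is predicted to be faster
        current = best
    return current


def fake_runtime(config):
    # Hypothetical stand-in for a measured kernel runtime; fastest at (5, 2).
    a, b = config
    return (a - 5) ** 2 + (b - 2) ** 2 + 1.0


bounds = [(0, 8), (0, 8)]
# "Measure" only two thirds of the grid; the fastest config (5, 2) is held out.
train_X = [(a, b) for a in range(9) for b in range(9) if (a - b) % 3 != 0]
train_y = [fake_runtime(c) for c in train_X]

model = fit_boosted_stumps(train_X, train_y)
found = hill_climb(model, start=(0, 0), bounds=bounds)
print(found, fake_runtime(found))
```

The search never calls `fake_runtime`; it only queries the model, which is why this kind of tuning can be fast once the model has been trained. In a real setting the training pairs would come from timing actual kernel variants, and the features would also describe the problem instance and the platform.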