This paper is one that I unfortunately missed seeing presented at NIPS, but which has been getting quite a lot of attention in ML circles in the last few days. The authors, who count among their number Jeff Dean, a well-respected and early-days Google employee, have one core point, which they reiterate throughout the paper: at their heart, database indexes are models. They may not (typically) be statistically learned, but they are structures that provide a (hopefully quite fast) mapping between an input (the key upon which the index is built) and an output (a position in memory). A Binary Tree, which is a typical such structure used for ordered data, even takes the form of, well, a tree, which is a core tool in the machine learning toolbox.
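To make the "index as model" framing concrete, here's a small illustrative sketch (my own, not code from the paper): a sorted array where a traditional binary search and a crude "learned" alternative, a hand-rolled linear fit to the key-to-position mapping followed by a local correction step, both answer the same question of where a key lives.

```python
import bisect

# Sorted keys: the "index" maps a key to its position in storage.
keys = [3, 7, 12, 18, 25, 31, 44, 58, 60, 77]
n = len(keys)

# Traditional structure: binary search is effectively a hand-built
# model of the key -> position mapping.
def tree_style_lookup(key):
    return bisect.bisect_left(keys, key)

# "Learned" alternative (illustrative only): fit a line approximating
# the key -> position function, predict, then correct locally.
mean_k = sum(keys) / n
mean_p = (n - 1) / 2
slope = (sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
         / sum((k - mean_k) ** 2 for k in keys))
intercept = mean_p - slope * mean_k

def learned_lookup(key):
    guess = int(round(slope * key + intercept))
    guess = max(0, min(n - 1, guess))
    # Walk outward from the prediction to fix the model's error,
    # landing on the leftmost position with keys[pos] >= key.
    while guess > 0 and keys[guess - 1] >= key:
        guess -= 1
    while guess < n and keys[guess] < key:
        guess += 1
    return guess

for k in [12, 44, 77]:
    assert learned_lookup(k) == tree_style_lookup(k)
```

The point of the sketch is just the equivalence of interface: both functions consume a key and emit a position, which is what licenses the paper's move of swapping one implementation for the other.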
Building on this key intuition, the paper then asks: well, if these structures are just models, could statistical models that learn, and then leverage, the distribution of the data being