One reason deep learning exploded over the last decade was the availability of programming languages that could automate the math (college-level calculus) needed to train each new model. Neural networks are trained by tuning their parameters to maximize a score that can be rapidly calculated for training data. The equations for adjusting the parameters in each tuning step once had to be derived painstakingly by hand. Deep learning platforms use a method called automatic differentiation to calculate the adjustments automatically. This has allowed researchers to rapidly explore a huge space of models, and find the ones that really work, without needing to know the underlying math.
But what about problems like climate modeling, or financial planning, where the underlying scenarios are fundamentally uncertain? For these problems, calculus alone is not enough — you also need probability theory. The "score" is no longer just a deterministic function of the parameters. Instead, it's defined by a stochastic model that makes random choices to model unknowns. If you try to use deep learning platforms on these problems, they can easily give the wrong answer. To fix this problem, MIT researchers developed ADEV, which extends automatic differentiation to handle models that make random choices. This brings the benefits of AI programming to a much broader class of problems, enabling rapid experimentation with models that can reason about uncertain situations.
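To see how differentiating a random program the ordinary way can go wrong, consider a minimal sketch in Python. It is not ADEV's interface, and the toy model and all function names are assumptions made up for illustration. The program flips a coin that lands heads with probability theta; the loss is 0 on heads and -theta/2 on tails. Differentiating each sampled run while holding the coin flip fixed, as a standard deep learning platform would, ignores the fact that theta also changes how often each branch is taken. A hand-written score-function (REINFORCE) estimator, one standard correction, accounts for that dependence.

```python
import random


def loss(b, theta):
    # Toy loss (hypothetical): 0 on heads, -theta/2 on tails.
    return 0.0 if b else -theta / 2.0


def dloss_dtheta(b, theta):
    # Derivative of the loss with the coin flip b held fixed.
    # This is all that ordinary automatic differentiation "sees".
    return 0.0 if b else -0.5


def true_gradient(theta):
    # E[loss] = (1 - theta) * (-theta / 2), so d/dtheta E[loss] = theta - 0.5.
    return theta - 0.5


def naive_estimate(theta, n, rng):
    # Average the per-sample derivative, ignoring that theta
    # also controls the probability of each branch.
    total = 0.0
    for _ in range(n):
        b = rng.random() < theta
        total += dloss_dtheta(b, theta)
    return total / n


def score_function_estimate(theta, n, rng):
    # Score-function (REINFORCE) estimator:
    #   loss * d/dtheta log p(b; theta) + d/dtheta loss
    total = 0.0
    for _ in range(n):
        b = rng.random() < theta
        dlogp = 1.0 / theta if b else -1.0 / (1.0 - theta)
        total += loss(b, theta) * dlogp + dloss_dtheta(b, theta)
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    theta = 0.3
    print("true gradient:       ", true_gradient(theta))
    print("naive estimate:      ", naive_estimate(theta, 200_000, rng))
    print("score-function est.: ", score_function_estimate(theta, 200_000, rng))
```

At theta = 0.3 the true derivative of the expected loss is -0.2. The naive average converges to -(1 - theta)/2 = -0.35 instead, while the score-function average converges to the correct value. Deriving and checking estimators like this by hand for every new model is the work that ADEV is described as automating.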
Lead author and MIT electrical engineering and computer science PhD student Alex Lew says he hopes people will be less wary of using probabilistic models now that there’s a tool to automatically differentiate them. “The need to derive low-variance, unbiased gradient estimators by hand can lead to a perception that probabilistic models are trickier or more finicky to work with than deterministic ones. But probability is an incredibly useful tool for modeling the world. My hope is that by providing a framework for building these estimators automatically, ADEV will make it more attractive to experiment with probabilistic models, possibly enabling new discoveries and advances in AI and beyond.”
Sasa Misailovic, an associate professor at the University of Illinois at Urbana-Champaign who was not involved in this research, adds: "As the probabilistic programming paradigm is emerging to solve various problems in science and engineering, questions arise on how we can make efficient software implementations built on solid mathematical principles. ADEV presents such a foundation for modular and compositional probabilistic inference with derivatives. ADEV brings the benefits of probabilistic programming — automated math and more scalable inference algorithms — to a much broader range of problems where the goal is not just to infer what is probably true but to decide what action to take next."
In addition to climate modeling and financial modeling, ADEV could also be used for operations research (for example, minimizing expected wait times at call centers by simulating customer queues and evaluating the quality of outcomes) or for tuning the algorithm that a robot uses to grasp physical objects. Co-author Mathieu Huot says he’s excited to see ADEV "used as a design space for novel low-variance estimators, a key challenge in probabilistic computations."
The research, awarded the SIGPLAN Distinguished Paper award at POPL 2023, is co-authored by Vikash Mansinghka, who leads MIT's Probabilistic Computing Project in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory and helps lead the MIT Quest for Intelligence, and by Mathieu Huot and Sam Staton, both at Oxford University. Huot adds, "ADEV gives a unified framework for reasoning about the ubiquitous problem of estimating gradients unbiasedly, in a clean, elegant and compositional way." The research was supported by the National Science Foundation, the DARPA Machine Common Sense program, and a philanthropic gift from the Siegel Family Foundation.
"Many of our most controversial decisions — from climate policy to the tax code — boil down to decision-making under uncertainty. ADEV makes it easier to experiment with new ways to solve these problems, by automating some of the hardest math," says Mansinghka. "For any problem that we can model using a probabilistic program, we have new, automated ways to tune the parameters to try to create outcomes that we want, and avoid outcomes that we don't."