Facebook Open-Sources Machine Learning Hyperparameter Tuning Tools Ax and BoTorch
News from: iThome & Facebook AI Blog.
The two tools can be used with A/B testing and machine learning platforms, finding optimal parameter configurations with minimal user involvement.
At its annual F8 conference, Facebook open-sourced several tools. Two of them, Ax and BoTorch, target the experimentation phase of AI application development. Ax is a general-purpose platform for automatically managing adaptive experimentation, while BoTorch is a library built on the open-source deep learning framework PyTorch that supports Bayesian optimization research and supplies Ax's optimization algorithms through an API, letting developers easily tune AI parameters.
Web Site: https://ai.facebook.com/blog/open-sourcing-ax-and-botorch-new-ai-tools-for-adaptive-experimentation/
Facebook notes that tuning optimal parameters in machine learning is time-consuming, especially over complex configuration spaces, which is why it built Ax and BoTorch. Both tools are already deployed at scale inside Facebook as part of its internal adaptive experimentation work, in which machine learning, under human guidance, sequentially decides which configurations to test next in order to achieve a set of experimental goals.
Facebook has applied adaptive experimentation to a wide range of problems, including the ranking models for News Feed and Instagram, and optimizing the algorithms for video playback, Facebook Live, and media uploads to deliver higher-quality video streaming. On the back end it also tunes infrastructure such as just-in-time compilers, memory allocation, and data retrieval systems for better efficiency, and it automates hyperparameter search for Facebook's FBLearner machine learning platform, achieving higher model accuracy with fewer computing resources.
The two tools lower the barrier to entry for adaptive experimentation: Ax lets developers explore parameter configurations in a resource-efficient, principled way, while BoTorch leverages PyTorch features such as auto-differentiation, massively parallel computation, and deep learning to advance Bayesian optimization research.
BoTorch is a modern library for Bayesian optimization research, a method for finding optimal system configurations under a limited budget of trials. Facebook built BoTorch because the existing Bayesian optimization tools it had been using for hyperparameter optimization could no longer keep up with its growing workloads, so it developed new methods that let BoTorch handle Facebook's diverse use cases.
Drawing on PyTorch's computational capabilities, Facebook redesigned its model implementations and optimization routines so that BoTorch is modular and offers an easy-to-use interface. BoTorch supports auto-differentiation, parallel computation on GPUs, and tight integration with PyTorch's deep learning modules. Facebook says BoTorch substantially improves the productivity of Bayesian optimization researchers, and its modular design lets researchers swap out or rearrange components to customize every part of an algorithm.
Ax lets developers connect to BoTorch through APIs and provides the functionality needed for production services and reproducible research, so developers can focus on applied problems such as exploring configurations and weighing trade-offs between experimental objectives. Facebook says Ax is widely used by engineers without extensive machine learning experience as well as by AI researchers. Although Ax relies heavily on BoTorch for its optimization algorithms, it provides generic NumPy and PyTorch interfaces so developers can plug in methods implemented in any framework.
Ax also provides automated optimization routines that choose an optimization algorithm based on the characteristics of the experiment, and developers can easily customize the default routines to fit their specific applications. In addition, Ax includes interactive visualization tools that let developers inspect the surrogate model and understand trade-offs between different outcomes, as well as a benchmarking suite for comparing the optimization performance of different algorithms on test problems, with results that can be saved for repeated comparison.
--------------------------------------------------------------------
Open-sourcing Ax and BoTorch: New AI tools for adaptive experimentation
How can researchers and engineers explore large configuration spaces that have complex trade-offs when it may take hours or days to evaluate any given configuration? This challenge frequently arises across many domains, including tuning hyperparameters for machine learning (ML) models, finding optimal product settings through A/B testing, and designing next-generation hardware.
Today we are open-sourcing two tools, Ax and BoTorch, that enable anyone to solve challenging exploration problems in both research and production — without the need for large quantities of data.
- Ax is an accessible, general-purpose platform for understanding, managing, deploying, and automating adaptive experiments.
- BoTorch, built on PyTorch, is a flexible, modern library for Bayesian optimization, a probabilistic method for data-efficient global optimization.
These tools, which have been deployed at scale here at Facebook, are part of our ongoing work in what we have termed “adaptive experimentation,” in which machine learning algorithms, with human guidance, sequentially determine what configurations to test next in order to achieve some set of goals. These methods work by building models from limited, potentially noisy data observed in experiments and applying principled exploration strategies, such as bandit optimization and Bayesian optimization, to make decisions.
At Facebook, adaptive experimentation is used to tackle a broad range of problems, including:
- Increasing the efficiency of back-end infrastructure, such as just-in-time compilers, memory allocation, and data retrieval systems.
- Optimizing algorithms for video playback, Facebook Live, and media uploads to deliver higher-quality, smoother video streaming.
- Improving response rates on prompts to take surveys or raise awareness of products, such as the blood donations feature on Facebook.
- Solving inverse problems in optics for the design of AR and VR hardware with Facebook Reality Labs.
- Automating hyperparameter search for Facebook’s FBLearner machine learning platform to achieve high model accuracy with fewer computing resources.
- Learning robust robot locomotion policies in simulated and real-world environments.
BoTorch advances the state of the art in Bayesian optimization research by leveraging the features of PyTorch, including auto-differentiation, massive parallelism, and deep learning. BoTorch provides a platform upon which researchers can build and unlocks new areas of research for tackling complex optimization problems. Ax and BoTorch leverage probabilistic models that make efficient use of data and are able to meaningfully quantify the costs and benefits of exploring new regions of problem space. In these cases, probabilistic models can offer significant benefits over standard deep learning methods such as neural networks, which often require large amounts of data to make accurate predictions and don’t provide good estimates of uncertainty.
We hope that by lowering the barrier to entry for adaptive experimentation, Ax will empower developers and researchers to explore more configurations in a principled and resource-efficient way. We also hope BoTorch will be a catalyst for research in this area by providing a powerful, versatile platform for Bayesian optimization research that integrates closely with popular deep learning libraries.
Supporting optimization from research to production
In this blog post, we’ll introduce both projects in detail and then provide a concrete example with code snippets to illustrate how simple it is to use our framework to find optimal configurations.
BoTorch: a modern library for Bayesian optimization research
The goal of Bayesian optimization is to find an optimal configuration of a system with a limited budget of experimental trials. These methods employ a probabilistic surrogate model to make predictions about possible outcomes of unobserved configurations. To search for optimal configurations, we define an acquisition function that uses the surrogate model to assign each configuration a utility. Configurations with the highest utility are tested on the system, and the process repeats. The performance of a Bayesian optimization algorithm is therefore determined by three components: the surrogate model, the acquisition function, and the methods that numerically optimize the acquisition function.
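To make these three components concrete, here is a minimal, illustrative sketch of a single-objective loop assembled from BoTorch's standard building blocks (SingleTaskGP, ExpectedImprovement, optimize_acqf). The toy objective, budgets, and settings are ours, not from the blog post, and exact module paths such as botorch.fit.fit_gpytorch_model may differ between BoTorch releases:

import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import ExpectedImprovement
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# Toy objective on [0, 1]: maximized at x = 0.3 (stand-in for an expensive system).
def objective(x):
    return -(x - 0.3) ** 2

train_X = torch.rand(5, 1, dtype=torch.double)   # a few initial configurations
train_Y = objective(train_X)                     # their observed outcomes

bounds = torch.tensor([[0.0], [1.0]], dtype=torch.double)

for _ in range(10):
    # 1. Surrogate model: fit a Gaussian process to the data gathered so far.
    gp = SingleTaskGP(train_X, train_Y)
    mll = ExactMarginalLogLikelihood(gp.likelihood, gp)
    fit_gpytorch_model(mll)

    # 2. Acquisition function: score unobserved configurations by expected improvement.
    acqf = ExpectedImprovement(gp, best_f=train_Y.max())

    # 3. Numerical optimization of the acquisition function proposes the next configuration.
    candidate, _ = optimize_acqf(acqf, bounds=bounds, q=1, num_restarts=5, raw_samples=32)

    # Evaluate the proposed configuration and fold the result back into the data.
    train_X = torch.cat([train_X, candidate])
    train_Y = torch.cat([train_Y, objective(candidate)])

Each pass through the loop spends one trial of the experimental budget, which is why the quality of the surrogate model and acquisition function matters so much when evaluations are expensive.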
Facebook has previously used Bayesian optimization for simple hyperparameter optimization tasks, but we found existing tools were insufficient to meet our growing needs. So we developed new methods that would support optimization of multiple noisy objectives, scale to highly parallel test environments, leverage low-fidelity approximations, and optimize over high-dimensional parameter spaces. While a number of Bayesian optimization packages existed, they were difficult to extend or customize, and none supported all the features necessary to tackle the diversity of use cases we encounter at Facebook.
To address these challenges, we harnessed the computational capabilities of PyTorch and rethought how we implement models and optimization routines. The result of that work is BoTorch, which provides a modular, easily extensible interface for composing Bayesian optimization primitives, including probabilistic surrogate models, acquisition functions, and optimizers. It also offers support for:
- Auto-differentiation, highly parallelized computations on modern hardware (including GPUs), and seamless integration with deep learning modules via PyTorch.
- State-of-the-art probabilistic modeling in GPyTorch, including support for multitask Gaussian processes (GPs), scalable GPs, deep kernel learning, deep GPs, and approximate inference.
- Monte Carlo-based acquisition functions via the reparameterization trick, which makes it straightforward to implement new ideas without having to impose restrictive assumptions about the underlying model.
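As a small illustration of the reparameterization trick mentioned in the last item (our own sketch, not code from the BoTorch library, with made-up numbers), correlated posterior samples can be produced from standard-normal draws and the Cholesky factor of the posterior covariance, then averaged into a Monte Carlo estimate of a batch acquisition value:

import torch

# Hypothetical posterior over f at q = 3 design points: mean mu, covariance Sigma.
mu = torch.tensor([0.10, 0.40, 0.20])
Sigma = torch.tensor([[0.30, 0.05, 0.02],
                      [0.05, 0.25, 0.04],
                      [0.02, 0.04, 0.20]])

L = torch.linalg.cholesky(Sigma)    # Sigma = L @ L.T
eps = torch.randn(500, 3)           # 500 draws from the standard normal distribution
samples = mu + eps @ L.T            # correlated samples from the joint posterior, shape (500, 3)

best_f = 0.30                                          # best value observed so far
improvement = (samples - best_f).clamp_min(0)          # improvement of each point in each sample
qei_estimate = improvement.max(dim=-1).values.mean()   # MC estimate of the joint (q-)expected improvement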
In our work, we have found that BoTorch substantially improves developer efficiency for Bayesian optimization research. It opens the door for novel methods that do not admit analytic solutions, including batch acquisition functions and proper handling of rich multitask models with multiple correlated outcomes. BoTorch’s modular design makes it possible for researchers to swap out or rearrange individual components in order to customize all aspects of their algorithm, thereby empowering them to do state-of-the-art research on modern Bayesian optimization methods.
Ax: an extensible platform for adaptive experimentation
Ax provides easy-to-use APIs to interface with BoTorch, along with the management necessary for production-ready services and reproducible research. This allows developers to focus on the applied problems, such as exploring configurations and understanding trade-offs between objectives. Similarly, it allows researchers to spend more time focusing on the building blocks of Bayesian optimization. At Facebook, Ax has been broadly applied by engineers who do not have extensive experience with machine learning, as well as by AI researchers.
The figure below illustrates how Ax and BoTorch are used within the optimization ecosystem. At Facebook, Ax interfaces with our major A/B testing and machine learning platforms, as well as simulators and other types of backend systems, requiring minimal user involvement for deploying configurations and gathering results.
Ax lowers the barriers to adaptive experimentation for developers and researchers alike through the following core features:
- Framework-agnostic interface for implementing new adaptive experimentation algorithms. While Ax makes heavy use of BoTorch for its optimization algorithms, generic NumPy and PyTorch interfaces are provided so that researchers and developers can plug in methods implemented in any framework.
- Customizable, automated optimization routines. Ax selects the appropriate optimization strategy — choosing from Bayesian optimization, bandit optimization, and other techniques — according to features of the experiment. These default routines can be easily customized by users to meet the needs of their specific applications.
- Tools for system understanding. Interactive visualizations that allow users to view the surrogate model, perform diagnostics, and understand trade-offs between different outcomes.
- Human-in-the-loop optimization. In addition to supporting multiple objectives and advancing system understanding, Ax's underlying data model enables experimenters to safely evolve their search space and goals as new data is collected.
- Ability to create custom optimization services. Multiple APIs allow using Ax either as a framework that controls deployment and data collection, or as a lightweight library that can be called via a remote service.
- A benchmarking suite for evaluating new adaptive experimentation algorithms. Easily compare optimization performance of different algorithms on test problems and save results for reproducible research.
To show what it's like to work with Ax, here is an example of a simple optimization loop using the artificial Booth function as the evaluation function:
from ax import optimize

best_parameters, _, _, _ = optimize(
    parameters=[
        {
            "name": "x1",
            "type": "range",
            "bounds": [-10.0, 10.0],
        },
        {
            "name": "x2",
            "type": "range",
            "bounds": [-10.0, 10.0],
        },
    ],
    evaluation_function=lambda p: (p["x1"] + 2*p["x2"] - 7)**2 + (2*p["x1"] + p["x2"] - 5)**2,
    minimize=True,
)

best_parameters
# returns {'x1': 1.02, 'x2': 2.97}; true min is (1, 3)
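The optimize call above owns the whole loop. For the case mentioned in the feature list, where Ax runs as a lightweight library behind a service that controls evaluation itself, the Service API exposes an ask/tell workflow. The sketch below is ours and uses ax.service.ax_client.AxClient on the same Booth function; argument names such as objective_name may vary between Ax releases:

from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name="booth_experiment",
    parameters=[
        {"name": "x1", "type": "range", "bounds": [-10.0, 10.0]},
        {"name": "x2", "type": "range", "bounds": [-10.0, 10.0]},
    ],
    objective_name="booth",
    minimize=True,
)

for _ in range(20):
    parameters, trial_index = ax_client.get_next_trial()   # "ask" Ax for the next configuration
    value = (parameters["x1"] + 2*parameters["x2"] - 7)**2 + (2*parameters["x1"] + parameters["x2"] - 5)**2
    ax_client.complete_trial(trial_index=trial_index, raw_data=value)  # "tell" Ax the observed outcome

best_parameters, _ = ax_client.get_best_parameters()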
Diving deeper: Using BoTorch with Ax for Bayesian optimization research
Having shown the big picture of what BoTorch and Ax can do, we’ll now dive into the nuts and bolts of how to take a research idea from creation to production.

In many applications, it is desirable to explore the problem space using batches of design points (i.e., configurations). For instance, simulations or ML model training jobs for hyperparameter optimization can be run in parallel on a cluster of compute resources. Doing this kind of batched exploration optimally requires the acquisition function to assess the joint value of a set of design points. One such acquisition function is the q-Expected Improvement (qEI) algorithm, from Parallel Bayesian Global Optimization of Expensive Functions by Wang et al.:

qEI(x_1, ..., x_q) = E[ max_{j=1..q} max(f(x_j) - f*, 0) ]

where f* is the best function value observed so far. qEI does not admit an analytic expression in terms of the parameters of the posterior distribution. However, it can be estimated using Monte Carlo (MC) sampling via the reparameterization trick, which involves correlating samples drawn from the standard normal distribution using the Cholesky decomposition of the posterior covariance: a joint posterior sample over the q points is written as mu + L*eps, where eps ~ N(0, I) and L is the Cholesky factor of the posterior covariance Sigma (L L^T = Sigma).

Implementing this approximation in BoTorch is straightforward:

import torch
from botorch.acquisition.monte_carlo import MCAcquisitionFunction
from botorch.acquisition.sampler import SobolQMCNormalSampler

class qExpectedImprovement(MCAcquisitionFunction):
    def __init__(self, model, best_f, num_samples=500):
        sampler = SobolQMCNormalSampler(num_samples)
        super().__init__(model=model, sampler=sampler)
        self.register_buffer("best_f", torch.as_tensor(best_f))

    def forward(self, X):
        posterior = self.model.posterior(X)            # evaluate posterior at X
        samples = self.sampler(posterior)              # sample from posterior
        delta = (samples - self.best_f).clamp_min(0)   # compute improvement per sample
        delta_max = delta.max(dim=-1)[0]               # compute maximum across the q points
        qei = delta_max.mean(dim=0)                    # average across samples
        return qei
Here, MCAcquisitionFunction is a subclass of torch.nn.Module, and so we only need to implement a forward method. self.sampler() takes 500 quasi-Monte Carlo draws from the (joint) posterior distribution over function values (as modeled by the surrogate model) at the q design points, X. The expected improvement is then the sample average, across the 500 samples, of the largest improvement among the q points over the best observed value so far (best_f).

How do we optimize this quantity? PyTorch’s autograd makes it easy to compute gradients:

qEI = qExpectedImprovement(model, best_f=0.0)
X = torch.rand(5, 10, requires_grad=True)   # candidate design points, tracked for differentiation
val = qEI(X)
val.backward()                              # back-propagate through the MC acquisition value
grad = X.grad                               # gradient of qEI with respect to the design points
This automatic gradient can then be plugged into optimizers that take full advantage of this information to efficiently find the set of design points that maximizes the full joint utility. The figure below shows the trajectory of the design points during optimization for a single batch of size q=4.
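As a rough illustration of that point (our own sketch, not code from the post), the qEI value defined above can be pushed through any gradient-based PyTorch optimizer; BoTorch's own helpers use multi-start quasi-Newton methods instead, so this only shows that the acquisition value is differentiable end to end:

X = torch.rand(4, 10, requires_grad=True)   # q = 4 candidate points in a 10-dimensional space
opt = torch.optim.Adam([X], lr=0.01)

for _ in range(200):
    opt.zero_grad()
    loss = -qEI(X)          # negate: the optimizer minimizes, we want to maximize qEI
    loss.backward()
    opt.step()
    with torch.no_grad():
        X.clamp_(0.0, 1.0)  # keep the candidates inside a unit-cube design space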
The newly minted acquisition function can be plugged into an Ax optimization loop, which internally will use quasi-second order numerical optimization algorithms in conjunction with a random-restart heuristic to optimize the utility of these q design points:
from ax.modelbridge.factory import get_botorch

def get_qEI(model, best_f):
    return qExpectedImprovement(model, best_f=best_f)

# collect some initial data...

for i in range(num_batches):
    # evaluate all trials that have not yet been evaluated
    data = experiment.eval()
    # set up the model
    model = get_botorch(
        experiment=experiment,
        data=data,
        search_space=experiment.search_space,
        acqf_constructor=get_qEI,
    )
    # generate candidates and schedule a new trial of batch size q=4
    trial = experiment.new_trial(model.gen(4))
The figure below shows how parallel evaluations can reduce the time it takes to optimize a problem.

Given that acquisition functions are a fundamental component of Bayesian optimization, it is important for researchers to easily prototype and test new variants of these functions. The above example shows how easy it is to do this by using a custom BoTorch acquisition function in a standard Ax optimization loop. Models and acquisition function optimizers can be customized in a similar fashion. As a result, researchers can focus on improving the underlying modeling and optimization algorithms in BoTorch and delegate the setup, management, deployment, and analysis to Ax.

The future of adaptive experimentation with Ax and BoTorch
Over time, we will refine the beta releases of our software, expand the set of available algorithms, and provide out-of-the-box integration with popular scheduling software. We look forward to working with the community to add user-contributed modules to Ax and BoTorch to further improve the platform.
Research areas we are particularly excited about include high-dimensional Bayesian optimization and multi-fidelity optimization. We also believe there are significant opportunities for improving the performance of numerical optimization of acquisition functions using novel parallelism-aware solvers. We plan to further explore these as well as other new features.

Used in tandem, Ax and BoTorch significantly accelerate the process of going from research to production, and we hope they will inspire new use cases for adaptive experimentation within the broader community. If you are interested in collaborating with the Ax and BoTorch teams, please reach out.

We’d like to acknowledge the contributions to BoTorch and Ax from many researchers, engineers, and data scientists at Facebook. BoTorch is designed to work seamlessly with GPyTorch, and it was developed in collaboration with Jake Gardner from Uber AI Labs and Geoff Pleiss and Andrew Gordon Wilson from Cornell University.