Google Releases What-If Tool, a Code-Free Tool for Probing Machine Learning Models
News from: iThome & Google AI Blog.
Even after a machine learning model has been trained, it needs repeated rounds of exploration and tuning. The What-If Tool makes it possible to probe machine learning models without writing any code, so that non-developers can also take part in model tuning.
Web site: https://ai.googleblog.com/2018/09/the-what-if-tool-code-free-probing-of.html
Google has released the What-If Tool, a new feature of the open-source TensorBoard web application that lets users inspect machine learning models and explore model results through an interactive visual interface, without writing any code.
Building an effective machine learning system requires attention to many aspects. Besides algorithms and performance, data is a fundamental ingredient of any good machine learning application, and TensorFlow earlier released the TensorFlow Data Validation (TFDV) tool to help developers analyze and validate data at scale. Google also notes that even after a model has been trained, many problems can remain, requiring repeated exploration and tuning.
Google argues that a good developer should be able to answer several questions about the models they have trained: How do changes in the data affect the model's predictions? How does the model perform across different groups? Is the data used to test the model diverse enough? These questions are not easy to answer. Probing a machine learning model usually requires writing one-off code to analyze a specific model, a process that is inefficient and that makes it hard for people who cannot program to participate.
To address this, Google has added the What-If Tool to the open-source TensorBoard web application, allowing users to analyze a model without writing any code. Given a TensorFlow model and a pointer to a dataset, the tool provides an interactive visual interface for exploring model results. The What-If Tool comes with a rich set of features: besides automatically visualizing datasets with Facets, users can manually edit examples from the dataset and see how the results change.
In internal testing at Google, one team discovered that their machine learning model was incorrectly ignoring an entire feature of their dataset, and another team used the What-If Tool's visualizations to sort example types from best to worst performance, uncovering the cause of their model's poor performance.
-------------------------------------------------------------------------------
Building effective machine learning (ML) systems means asking a lot of questions. It's not enough to train a model and walk away. Instead, good practitioners act as detectives, probing to understand their model better: How would changes to a datapoint affect my model’s prediction? Does it perform differently for various groups–for example, historically marginalized people? How diverse is the dataset I am testing my model on?
Answering these kinds of questions isn’t easy. Probing “what if” scenarios often means writing custom, one-off code to analyze a specific model. Not only is this process inefficient, it makes it hard for non-programmers to participate in the process of shaping and improving ML models. One focus of the Google AI PAIR initiative is making it easier for a broad set of people to examine, evaluate, and debug ML systems.
Today, we are launching the What-If Tool, a new feature of the open-source TensorBoard web application, which lets users analyze an ML model without writing code. Given pointers to a TensorFlow model and a dataset, the What-If Tool offers an interactive visual interface for exploring model results.
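The post doesn't include setup code, but to make "pointers to a TensorFlow model and a dataset" concrete: in the TensorBoard version of the tool, the dataset pointer is typically a TFRecord file of tf.train.Example protos, with the model served separately for inference (e.g. via TensorFlow Serving). A minimal sketch of the dataset side, with made-up census-style feature names:

```python
# Sketch: serialize rows into the TFRecord-of-tf.train.Example format
# the What-If Tool's TensorBoard dashboard reads. Feature names, values,
# and the output filename are illustrative, not from the post.
import tensorflow as tf

def to_example(age, occupation, label):
    """Pack one census-style row into a tf.train.Example proto."""
    return tf.train.Example(features=tf.train.Features(feature={
        "age": tf.train.Feature(int64_list=tf.train.Int64List(value=[age])),
        "occupation": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[occupation.encode("utf-8")])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }))

rows = [(39, "Adm-clerical", 0), (52, "Exec-managerial", 1)]
with tf.io.TFRecordWriter("census_examples.tfrecord") as writer:
    for age, occupation, label in rows:
        writer.write(to_example(age, occupation, label).SerializeToString())
```

Once the examples file and a served model are in place, everything else happens in the dashboard; no further code is required.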
The What-If Tool has a large set of features, including visualizing your dataset automatically using Facets, the ability to manually edit examples from your dataset and see the effect of those changes, and automatic generation of partial dependence plots, which show how the model's predictions change as any single feature is varied; the sketch below unpacks that last computation.
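The post doesn't show how partial dependence is computed, but the idea is simple enough to sketch by hand: sweep one feature across a grid of values while leaving every other feature untouched, and average the model's predictions at each grid point. The `predict_fn`, feature index, and grid below are illustrative stand-ins, not part of the What-If Tool API.

```python
# Sketch: a hand-rolled partial dependence computation. `predict_fn`
# maps a feature matrix to predicted probabilities; any trained model
# wrapped this way would do.
import numpy as np

def partial_dependence(predict_fn, X, feature_idx, grid):
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value   # force one feature to a fixed value
        averages.append(predict_fn(X_mod).mean())
    return np.array(averages)

# e.g. how does mean predicted P(income > $50k) move as age sweeps 18..79?
# curve = partial_dependence(predict_fn, X, feature_idx=0,
#                            grid=np.arange(18, 80))
```

With that in mind, let's explore two features in more detail.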
Counterfactuals
With a click of a button you can compare a datapoint to the most similar point where your model predicts a different result. We call such points "counterfactuals," and they can shed light on the decision boundaries of your model. Or, you can edit a datapoint by hand and explore how the model’s prediction changes. In the screenshot below, the tool is being used on a binary classification model that predicts whether a person earns more than $50k based on public census data from the UCI census dataset. This is a benchmark prediction task used by ML researchers, especially when analyzing algorithmic fairness — a topic we'll get to soon. In this case, for the selected datapoint, the model predicted with 73% confidence that the person earns more than $50k. The tool has automatically located the most-similar person in the dataset for which the model predicted earnings of less than $50k and compares the two side-by-side. In this case, with just a minor difference in age and an occupation change, the model’s prediction has flipped.
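The essence of that one-click counterfactual lookup can be sketched in a few lines: among datapoints the model classifies differently, find the one closest to the selected point. The tool's actual distance metric is not specified here; L1 distance over standardized numeric features is an assumption made for illustration.

```python
# Sketch: find the most similar datapoint with a flipped prediction.
# X is an (n, d) feature matrix, preds an (n,) array of predicted classes.
import numpy as np

def nearest_counterfactual(X, preds, idx):
    """Index of the closest point whose prediction differs from point idx."""
    flipped = np.flatnonzero(preds != preds[idx])
    if flipped.size == 0:
        return None  # every point gets the same prediction; no counterfactual
    distances = np.abs(X[flipped] - X[idx]).sum(axis=1)  # L1 distance
    return flipped[np.argmin(distances)]
```

In practice you would standardize the features first, so that a feature with a large numeric range (like income) doesn't dominate the distance over one with a small range (like age).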
Analysis of Performance and Algorithmic Fairness
You can also explore the effects of different classification thresholds, taking into account constraints such as different numerical fairness criteria. The below screenshot shows the results of a smile detector model, trained on the open-source CelebA dataset which consists of annotated face images of celebrities. Below, the faces in the dataset are divided by whether they have brown hair, and for each of the two groups there is an ROC curve and confusion matrix of the predictions, along with sliders for setting how confident the model must be before determining that a face is smiling. In this case, the confidence thresholds for the two groups were set automatically by the tool to optimize for equal opportunity.
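The post doesn't spell out how those per-group thresholds are chosen; the tool optimizes them automatically. As a rough illustration of the "equal opportunity" criterion it names (equal true positive rates across groups), here is a simplified search that picks, for each group, the highest confidence threshold still achieving a shared target TPR. The array names (`scores`, `labels`, `groups`) and the 0.9 target are assumptions for illustration.

```python
# Sketch: per-group thresholds equalizing the true positive rate.
# scores: model confidences in [0, 1]; labels: ground-truth 0/1;
# groups: a group id per datapoint (e.g. brown hair vs. not).
import numpy as np

def threshold_for_tpr(scores, labels, target_tpr):
    """Highest threshold whose true positive rate still meets the target."""
    pos_scores = np.sort(scores[labels == 1])  # assumes some positives exist
    # TPR at threshold t is the fraction of positive-class scores >= t.
    k = int(np.floor((1.0 - target_tpr) * len(pos_scores)))
    return pos_scores[min(k, len(pos_scores) - 1)]

# thresholds = {g: threshold_for_tpr(scores[groups == g],
#                                    labels[groups == g], 0.9)
#               for g in np.unique(groups)}
```

Equalizing TPR this way typically moves the groups' false positive rates apart, which is why the tool surfaces the ROC curve and confusion matrix for each group alongside the sliders.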
Demos
To illustrate the capabilities of the What-If Tool, we’ve released a set of demos using pre-trained models:
- Detecting misclassifications: A multiclass classification model, which predicts plant type from four measurements of a flower from the plant. The tool is helpful in showing the decision boundary of the model and what causes misclassifications. This model is trained with the UCI iris dataset.
- Assessing fairness in binary classification models: The image classification model for smile detection mentioned above. The tool is helpful in assessing algorithmic fairness across different subgroups. The model was purposefully trained without providing any examples from a specific subset of the population, in order to show how the tool can help uncover such biases in models. Assessing fairness requires careful consideration of the overall context, but this is a useful quantitative starting point.
- Investigating model performance across different subgroups: A regression model that predicts a subject's age from census information. The tool is helpful in showing relative performance of the model across subgroups and how the different features individually affect the prediction. This model is trained with the UCI census dataset.
We tested the What-If Tool with teams inside Google and saw the immediate value of such a tool. One team quickly found that their model was incorrectly ignoring an entire feature of their dataset, leading them to fix a previously-undiscovered code bug. Another team used it to visually organize their examples from best to worst performance, leading them to discover patterns about the types of examples their model was underperforming on.
We look forward to people inside and outside of Google using this tool to better understand ML models and to begin assessing fairness. And as the code is open-source, we welcome contributions to the tool.
Acknowledgments
The What-If Tool was a collaborative effort, with UX design by Mahima Pushkarna, Facets updates by Jimbo Wilson, and input from many others. We would like to thank the Google teams that piloted the tool and provided valuable feedback and the TensorBoard team for all their help.