IBM Launches DAX Data Exchange Service to Promote Open Data and AI Development

July 17, 2019

News from: iThome & IBM Developer.

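Datasets on DAX use the Linux Foundation's open data licenses, and their data formats and metadata are standardized; IBM claims they are of higher quality than typical datasets.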

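At OSCON 2019, IBM launched DAX (Data Asset eXchange), an online data exchange hub where developers and data scientists can freely use datasets with clearly defined open data licenses to train machine learning models. In 2018, IBM also launched MAX (Model Asset eXchange), an online model exchange hub that offers developers free, ready-to-use open source machine learning and deep learning models.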

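IBM says that datasets published on DAX will, wherever possible, use the Linux Foundation's Community Data License Agreement (CDLA) open data licensing framework so that data can be shared and collaborated on. DAX also gives users access to IBM and IBM Research datasets. IBM will update the datasets on DAX regularly, and IBM Cloud and AI services will integrate with DAX so that users can work with its datasets.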

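Although many kinds of datasets already exist online, IBM stresses that DAX offers high-quality datasets that have been curated, with standardized data formats and metadata, whereas most other datasets tend to lack quality and licensing checks. IBM therefore considers DAX better suited to enterprise use for building end-to-end deep learning workloads.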

---------------------------------------------------------------------------------

As more companies adopt artificial intelligence (AI), placing machine learning (ML) models into the hands of developers is imperative. To that end, the Center for Open-Source Data & AI Technologies (CODAIT) launched IBM Model Asset eXchange (MAX) in 2018 to help data scientists and developers easily discover ready-to-use free and open source machine learning and deep learning models.

Today at OSCON 2019, we announced the launch of the IBM Data Asset eXchange (DAX), an online hub for developers and data scientists to find carefully curated free and open datasets under open data licenses. Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses.

Where possible, datasets posted on DAX will use the Linux Foundation’s Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration. Furthermore, DAX provides unique access to various IBM and IBM Research datasets. IBM plans to publish new datasets on the Data Asset eXchange regularly. The datasets on DAX will integrate with IBM Cloud and AI services as appropriate.

Trusted source of open datasets

For developers, DAX provides a trusted source for carefully curated open datasets for AI. These datasets are ready for use in enterprise AI applications, with related content such as tutorials to make getting started easier.

For staff responsible for dataset usage and vetting, DAX provides curation as well as standardized dataset formats and metadata, in contrast with most other open dataset resources that tend to incorporate fewer quality and licensing terms checks. So DAX datasets are typically more straightforward to adopt within corporations.

Example of datasets in use

Two examples of the sorts of datasets we’re releasing are the Finance Proposition Bank and Contracts Proposition Bank datasets. These datasets are part of an active research program from IBM Research. This research project aims to improve the natural language understanding technologies behind multiple IBM product offerings, including Watson Natural Language Understanding and Watson Compare & Comply.

Our researchers created these datasets with input from Watson developers, matching the characteristics of the target text to those of the real-world documents that the system analyzes in production. The researchers used these datasets to train domain-specific versions of the parsers that extract semantic meaning from governing business documents such as legal agreements and financial reports.

IBM Research has a long history of doing this kind of work in the open, and we on the CODAIT team are proud to help IBM Research’s mission of openness by releasing this cutting-edge research data on the Data Asset eXchange.

Why DAX?

While there are many resources available online for finding open datasets – ranging from collections of links on GitHub to sites such as Kaggle Datasets – DAX is unique in its high level of quality and curation. DAX helps create end-to-end deep learning workflows (from using the data to train models to deploying models in standard ways) allowing developers to consume open data with confidence under clearly defined open data licenses.

Data you need to develop AI solutions

IBM designed the Data Asset eXchange repository to complement the Model Asset eXchange. The user interface for organizing the assets is consistent across the two platforms, and users can easily train models on MAX using data from the Data Asset eXchange.

The CODAIT team’s goal is to make it straightforward to use DAX and MAX assets in conjunction with IBM AI products as well as other hybrid, multicloud AI tooling, both proprietary and open source. We want to give data scientists and developers well-curated data starting points, so that it’s easier for them to start developing their AI applications and solutions.

Tommas's blog 阿湯哥的部落格