開源深度學習框架 PyTorch 1.2 加入新的模型編譯 API

開源深度學習框架 PyTorch 1.2 加入新的模型編譯 API

News from: iThome & PyTorch

新的TorchScript API可讓開發者更簡單地將PyTorch模型編譯成TorchScript,並支援更多的Python程式語言功能。

開源深度學習框架PyTorch開發團隊釋出了最新的PyTorch 1.2,這個版本的重點在增加了新的模型編譯器TorchScript API,並且擴展了開放模型格式ONNX輸出功能,另外,音訊、文字和電腦視覺應用函式庫,也都發布了新的版本,以配合最新的PyTorch核心。



開發者可以使用TorchScript,將PyTorch模型序列化以及最佳化後,在正式生產環境提供服務,TorchScript是一個編譯器,可以將PyTorch模型轉換成靜態類型圖表達(Graph Representation),為不支援Python的受限環境,提供執行PyTorch模型的方法。開發者也可以將模型轉換成TorchScript,並且將編譯後的代碼與Python一併使用。

而TorchScript在PyTorch 1.2的更新,擴充對TorchScript的支援,除了增加了一個更簡單使用的API,幫助開發者將模型編譯為TorchScript外,還支援更多可用於PyTorch模型的Python子集,包括Python程式語言的結構(Construct)以及標準函式庫,像是for迴圈、zip()與enumerate()等基於迭代的結構,還有支援數學和字串函式庫,與其他Python內建函式。

現在也更完整的支援ONNX Opset匯出,在PyTorch團隊與微軟合作之後,提供了匯出ONNX Opset 7到10版的支援,而且為支援最新的ONNX Opset 10,PyTorch 1.2還加入了常數摺疊(Constant Folding)傳遞。在PyTorch 1.2中,ScriptModule支援多輸出、Tensor Factories,並可在輸出與輸入使用數組(Tuples),開發人員也可以註冊自己的符號,以輸出自定義的操作,或在匯出時指定動態輸入的維度。

PyTorch領域函式庫則提供常用的資料集和模型等,讓開發者快速創建特定領域的基線程式,並且提供常用的功能抽象,減少開發者重複撰寫的樣板程式碼,在釋出PyTorch 1.2的同時,開發團隊也更新音訊應用類Torchaudio、文字應用類Torchtext和電腦視覺應用類Torchvision三個函式庫,以支援最新的PyTorch核心。


Torchaudio是一個機器學習函式庫,內建訊號處理功能,其使用PyTorch的GPU功能,提供波形轉換等工具,使音訊資料更易於使用,新版Torchaudio經重新設計釋出0.3.0,重點是提供了標準化以及複數功能。Torchtext則能加速自然語言處理應用的開發,新的0.4.0版本加入數個監督式學習基線,開發者可簡單地將資料載入使用。Torchvision推出0.4版本,現在新支援影片,以及資料載入、預訓練模型和轉換功能。

-----------------------------------------------------------------------------------------------



Since the release of PyTorch 1.0, we’ve seen the community expand to add new tools, contribute to a growing set of models available in the PyTorch Hub, and continually increase usage in both research and production.

From a core perspective, PyTorch has continued to add features to support both research and production usage, including the ability to bridge these two worlds via TorchScript. Today, we are excited to announce that we have four new releases including PyTorch 1.2, torchvision 0.4, torchaudio 0.3, and torchtext 0.4. You can get started now with any of these releases at pytorch.org.

PyTorch 1.2

With PyTorch 1.2, the open source ML framework takes a major step forward for production usage with the addition of an improved and more polished TorchScript environment. These improvements make it even easier to ship production models, expand support for exporting ONNX formatted models, and enhance module level support for Transformers. In addition to these new features, TensorBoard is now no longer experimental - you can simply typefrom torch.utils.tensorboard import SummaryWriter to get started.

TORCHSCRIPT IMPROVEMENTS
Since its release in PyTorch 1.0, TorchScript has provided a path to production for eager PyTorch models. The TorchScript compiler converts PyTorch models to a statically typed graph representation, opening up opportunities for optimization and execution in constrained environments where Python is not available. You can incrementally convert your model to TorchScript, mixing compiled code seamlessly with Python.

PyTorch 1.2 significantly expands TorchScript’s support for the subset of Python used in PyTorch models and delivers a new, easier-to-use API for compiling your models to TorchScript. See the migration guide for details. Below is an example usage of the new API:



To learn more, see our Introduction to TorchScript and Loading a PyTorch Model in C++ tutorials.

EXPANDED ONNX EXPORT

The ONNX community continues to grow with an open governance structure and additional steering committee members, special interest groups (SIGs), and working groups (WGs). In collaboration with Microsoft, we’ve added full support to export ONNX Opset versions 7(v1.2), 8(v1.3), 9(v1.4) and 10 (v1.5). We’ve have also enhanced the constant folding pass to support Opset 10, the latest available version of ONNX. ScriptModule has also been improved including support for multiple outputs, tensor factories, and tuples as inputs and outputs. Additionally, users are now able to register their own symbolic to export custom ops, and specify the dynamic dimensions of inputs during export. Here is a summary of the all of the major improvements:

  1. Support for multiple Opsets including the ability to export dropout, slice, flip, and interpolate in Opset 10.
  2. Improvements to ScriptModule including support for multiple outputs, tensor factories, and tuples as inputs and outputs.
  3. More than a dozen additional PyTorch operators supported including the ability to export a custom operator.
  4. Many big fixes and test infra improvements.
You can try out the latest tutorial here, contributed by @lara-hdr at Microsoft. A big thank you to the entire Microsoft team for all of their hard work to make this release happen!

NN.TRANSFORMER

In PyTorch 1.2, we now include a standard nn.Transformer module, based on the paper “Attention is All You Need”. The nn.Transformer module relies entirely on an attention mechanism to draw global dependencies between input and output. The individual components of the nn.Transformer module are designed so they can be adopted independently. For example, the nn.TransformerEncoder can be used by itself, without the larger nn.Transformer. The new APIs include:

  • nn.Transformer
  • nn.TransformerEncoder and nn.TransformerEncoderLayer
  • nn.TransformerDecoder and nn.TransformerDecoderLayer


See the Transformer Layers documentation for more information. See here for the full PyTorch 1.2 release notes.


Domain API Library Updates


PyTorch domain libraries like torchvision, torchtext, and torchaudio provide convenient access to common datasets, models, and transforms that can be used to quickly create a state-of-the-art baseline. Moreover, they also provide common abstractions to reduce boilerplate code that users might have to otherwise repeatedly write. Since research domains have distinct requirements, an ecosystem of specialized libraries called domain APIs (DAPI) has emerged around PyTorch to simplify the development of new and existing algorithms in a number of fields. We’re excited to release three updated DAPI libraries for text, audio, and vision that compliment the PyTorch 1.2 core release.

TORCHAUDIO 0.3 WITH KALDI COMPATIBILITY, NEW TRANSFORMS



Torchaudio specializes in machine understanding of audio waveforms. It is an ML library that provides relevant signal processing functionality (but is not a general signal processing library). It leverages PyTorch’s GPU support to provide many tools and transformations for waveforms to make data loading and standardization easier and more readable. For example, it offers data loaders for waveforms using sox, and transformations such as spectrograms, resampling, and mu-law encoding and decoding.

We are happy to announce the availability of torchaudio 0.3.0, with a focus on standardization and complex numbers, a transformation (resample) and two new functionals (phase_vocoder, ISTFT), Kaldi compatibility, and a new tutorial. Torchaudio was redesigned to be an extension of PyTorch and a part of the domain APIs (DAPI) ecosystem.


Standardization

Significant effort in solving machine learning problems goes into data preparation. In this new release, we’ve updated torchaudio’s interfaces for its transformations to standardize around the following vocabulary and conventions.

Tensors are assumed to have channel as the first dimension and time as the last dimension (when applicable). This makes it consistent with PyTorch’s dimensions. For size names, the prefix n_ is used (e.g. “a tensor of size (n_freqn_mel)”) whereas dimension names do not have this prefix (e.g. “a tensor of dimension (channel, time)”). The input of all transforms and functions now assumes channel first. This is done to be consistent with PyTorch, which has channel followed by the number of samples. The channel parameter of all transforms and functions is now deprecated.

The output of STFT is (channel, frequency, time, 2), meaning for each channel, the columns are the Fourier transform of a certain window, so as we travel horizontally we can see each column (the Fourier transformed waveform) change over time. This matches the output of librosa so we no longer need to transpose in our test comparisons with SpectrogramMelScaleMelSpectrogram, and MFCC. Moreover, because of these new conventions, we deprecated LC2CL and BLC2CBL which were used to transfer from one shape of signal to another.

As part of this release, we’re also introducing support for complex numbers via tensors of dimension (…, 2), and providing magphase to convert such a tensor into its magnitude and phase, and similarly complex_norm and angle.

The details of the standardization are provided in the README.

Functionals, Transformations, and Kaldi Compatibility

Prior to the standardization, we separated state and computation intotorchaudio.transforms and torchaudio.functional.

As part of the transforms, we’re adding a new transformation in 0.3.0: ResampleResample can upsample or downsample a waveform to a different frequency.

As part of the functionals, we’re introducing: phase_vocoder, a phase vocoder to change the speed of a waveform without changing its pitch, and ISTFT, the inverse STFT implemented to be compatible with STFT provided by PyTorch. This separation allows us to make functionals weak scriptable and to utilize JIT in 0.3.0. We thus have JIT and CUDA support for the following transformations: SpectrogramAmplitudeToDB (previously named SpectrogramToDB), MelScaleMelSpectrogramMFCCMuLawEncodingMuLawDecoding (previously named MuLawExpanding).

We now also provide a compatibility interface with Kaldi to ease onboarding and reduce a user’s code dependency on Kaldi. We now have an interface for spectrogramfbank, and resample_waveform.

New Tutorial

To showcase the new conventions and transformations, we have a new tutorial demonstrating how to preprocess waveforms using torchaudio. This tutorial walks through an example of loading a waveform and applying some of the available transformations to it.

We are excited to see an active community around torchaudio and eager to further grow and support it. We encourage you to go ahead and experiment for yourself with this tutorial and the two datasets that are available: VCTK and YESNO! They have an interface to download the datasets and preprocess them in a convenient format. You can find the details in the release notes here.

TORCHTEXT 0.4 WITH SUPERVISED LEARNING DATASETS

A key focus area of torchtext is to provide the fundamental elements to help accelerate NLP research. This includes easy access to commonly used datasets and basic preprocessing pipelines for working on raw text based data. The torchtext 0.4.0 release includes several popular supervised learning baselines with “one-command” data loading. 

tutorial is included to show how to use the new datasets for text classification analysis. We also added and improved on a few functions such as get_tokenizer and build_vocab_from_iterator to make it easier to implement future datasets. Additional examples can be found here.

Text classification is an important task in Natural Language Processing with many applications, such as sentiment analysis. The new release includes several popular text classification datasets for supervised learning including:


  • AG_NEWS
  • SogouNews
  • DBpedia
  • YelpReviewPolarity
  • YelpReviewFull
  • YahooAnswers
  • AmazonReviewPolarity
  • AmazonReviewFull


Each dataset comes with two parts (train vs. test), and can be easily loaded with a single command. The datasets also support an ngrams feature to capture the partial information about the local word order. Take a look at the tutorial here to learn more about how to use the new datasets for supervised problems such as text classification analysis.




In addition to the domain library, PyTorch provides many tools to make data loading easy. Users now can load and preprocess the text classification datasets with some well supported tools, like torch.utils.data.DataLoader and torch.utils.data.IterableDataset. Here are a few lines to wrap the data with DataLoader. More examples can be found here.



TORCHVISION 0.4 WITH SUPPORT FOR VIDEO


Video is now a first-class citizen in torchvision, with support for data loading, datasets, pre-trained models, and transforms. The 0.4 release of torchvision includes:
  • Efficient IO primitives for reading/writing video files (including audio), with support for arbitrary encodings and formats.
  • Standard video datasets, compatible with torch.utils.data.Dataset and torch.utils.data.DataLoader.
  • Pre-trained models built on the Kinetics-400 dataset for action classification on videos (including the training scripts).
  • Reference training scripts for training your own video models.
We wanted working with video data in PyTorch to be as straightforward as possible, without compromising too much on performance. As such, we avoid the steps that would require re-encoding the videos beforehand, as it would involve:
  • A preprocessing step which duplicates the dataset in order to re-encode it.
  • An overhead in time and space because this re-encoding is time-consuming.
  • Generally, an external script should be used to perform the re-encoding.
Additionally, we provide APIs such as the utility class, VideoClips, that simplifies the task of enumerating all possible clips of fixed size in a list of video files by creating an index of all clips in a set of videos. It also allows you to specify a fixed frame-rate for the videos. An example of the API is provided below:



Most of the user-facing API is in Python, similar to PyTorch, which makes it easily extensible. Plus, the underlying implementation is fast — torchvision decodes as little as possible from the video on-the-fly in order to return a clip from the video.Check out the torchvision 0.4 release notes here for more details.

We look forward to continuing our collaboration with the community and hearing your feedback as we further improve and expand the PyTorch deep learning platform.







留言

熱門文章