Why PyTorch Then?
In preparing for a GTC 2017 presentation, I was driven to emphasize CUDA-enabled GPUs as the platform upon which I’d run my Machine Learning applications. Although I’d already had some encouraging experience applying Apache Spark’s MLlib to a classification problem, ‘porting’ in-memory, CPU-based computations to GPUs was, and remains, ‘exploratory’ – with, perhaps, the notable exception of a cloud-based offering from Databricks themselves. So, in ramping up for this Silicon Valley event, I approached the ‘opportunity’ with an open mind and began my GPU-centric effort at NVIDIA’s page for developers. As I wrote post-event in August 2017:
Frankly, the outcome surprised me: As a consequence of my GTC-motivated reframing, I ‘discovered’ Natural Language Processing (NLP) – broadly speaking, the use of human language by computers. Moreover, in reviewing the breadth and depth of possibilities for actually doing some NLP on my Twitter data, I subsequently ‘discovered’ PyTorch – a Python-based framework for Deep Learning that can readily exploit GPUs. It’s important to note that PyTorch is neither the only choice available for engaging in NLP on GPUs nor the most obvious one. As I allude to in my GTC presentation, however, I was rapidly drawn to PyTorch.
Despite TensorFlow being (I expect) that most obvious choice, I selected PyTorch for reasons that included the following:
- Python support – At the time, the appeal was more one of personal programming preference than anything else. In my classification use case, I’d previously made use of PySpark, for example. From numpy, scipy and Cython to matplotlib, this ‘Pythonic appeal’ was both familiar and capable. Subsequently, as Python arguably became the de facto programming language for Machine Learning in particular and Data Science in general, this preference for Python support has continued to serve me well – even through the rear-view mirror of hindsight.
- Leveraged lineage – Though PyTorch leverages Torch extensively, it does so in appealing ways. For example, it (PyTorch) claims efficient memory usage when it comes to computations involving tensors, as well as a tape-based autograd system for building deep neural networks.
- GPU acceleration – Through integrations with NVIDIA’s cuDNN and NCCL libraries, PyTorch was able to claim strong GPU acceleration. And, as long as you’ve somehow taken care of deploying the CUDA-based platform for fully exploiting GPUs as computational resources, making use of PyTorch on GPUs versus CPUs is painless! Implicit here, of course, is the ability to program in Python while tapping into the ultimate means for accelerating applications on GPUs; in other words, developers derive all of the benefits inherent in programming that exploits GPUs, without any need for explicit knowledge of CUDA. From the perspective of this ‘simple-minded’ Python programmer, this benefit is nothing short of intoxicating!
- Resources – Tutorials for Machine Learning via PyTorch are readily available. Even before my GTC presentation, I’d zeroed in on NLP as the approach I wanted to pursue in analyzing data extracted from Twitter. Although my initial inclination was to ‘repeat’ classification from an NLP perspective through use of bag-of-words text classification, I fortunately glommed onto the semantically richer use of word embeddings – an approach that continues to demonstrate interesting prospects for a scientific use case that requires processing of tweets.
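To make the ‘painless’ GPU claim above concrete, here’s a minimal, hypothetical sketch (using the modern device API rather than the 0.x-era calls I worked with at the time) that runs identically on CPU or GPU and exercises the tape-based autograd mentioned earlier:

```python
import torch

# Pick a GPU if one is available; the very same code runs unchanged on a CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy computation with gradient tracking (the tape-based autograd at work).
x = torch.ones(3, device=device, requires_grad=True)
y = (x * x).sum()   # y = sum(x_i ** 2)
y.backward()        # operations recorded on the 'tape' are played back

print(x.grad)       # dy/dx_i = 2 * x_i, so a tensor of 2s
```

No CUDA programming appears anywhere: choosing the device is the only GPU-specific line.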
Not bad for version 0.1 of a framework, I’d say! In fact, by the time I was responding to a referee’s feedback in revising a book chapter (please see “Refactoring Earthquake-Tsunami Causality with Big Data Analytics” under NLP in my Data Science Portfolio), PyTorch had been revised to version 0.2.0. This was a very welcome revision in the context of that chapter, as it included a built-in method for computing cosine similarity (“cosine_similarity”) – the key discriminator for quantitatively assessing the semantic similarity between two word vectors.
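For readers curious what that looks like in practice, here’s a small, purely illustrative sketch – a hypothetical 10-word vocabulary of random embedding vectors, compared via the built-in cosine similarity:

```python
import torch
import torch.nn.functional as F

# Hypothetical vocabulary of 10 words, each mapped to a 5-dimensional vector.
embeddings = torch.nn.Embedding(10, 5)

# Look up the vectors for two arbitrary word indices.
v1 = embeddings(torch.tensor([2]))
v2 = embeddings(torch.tensor([7]))

# Cosine similarity ranges from -1 (opposite) to 1 (same direction);
# semantically similar words should score close to 1.
sim = F.cosine_similarity(v1, v2)
same = F.cosine_similarity(v1, v1)  # a vector compared with itself
print(same.item())  # 1.0, up to floating-point error
```

In a real use case the embedding table would of course be trained (or loaded from pretrained vectors) rather than randomly initialized.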
Perhaps my enthusiasm for PyTorch isn’t all that surprising, as I do fit into one of their identified user profiles:
PyTorch has gotten its biggest adoption from researchers, and it’s gotten about a moderate response from data scientists. As we expected, we did not get any adoption from product builders because PyTorch models are not easy to ship into mobile, for example. We also have people who we did not expect to come on board, like folks from OpenAI and several universities.
Towards PyTorch 1.0
In this same August 2017 O’Reilly podcast (from which I extracted the above quote on user profiles), Facebook’s Soumith Chintala stated:
Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe 2 is used for all of production. This makes it easier for us to separate out which team does what and which tools do what. What we are seeing is, users first create a PyTorch model. When they are ready to deploy their model into production, they just convert it into a Caffe 2 model, then ship into either mobile or another platform.
Perhaps it’s not entirely surprising, then, that the 1.0 release intends to “… marry PyTorch and Caffe2 which gives the production-level readiness for PyTorch.” My understanding is that researchers (and others) retain the highly favorable benefit of developing in PyTorch but then, via the new JIT compiler, acquire the ability to deploy into production via Caffe2 or “… [export] to C++-only runtimes for use in larger projects”. Thus PyTorch 1.0’s production reach extends beyond Python-based runtimes – e.g., to those that drive iOS, Android and other mobile devices. With TensorFlow having already emerged as the ‘gorilla of all frameworks’, this productionizing choice in the implementation of PyTorch 1.0 will be well received by Facebook and other proponents of Caffe2.
The productionization of PyTorch also includes:
- A C++ frontend – “… a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend …” that “… is intended to enable research in high performance, low latency and bare metal C++ applications.”
- Distributed PyTorch enhancements – Originally introduced in version 0.2.0 of PyTorch, “… the torch.distributed package … allows you to exchange Tensors among multiple machines.” Long a core competence of distributed TensorFlow, this ability to introduce parallelism via distributed processing becomes increasingly important as Deep Learning applications and their workflows transition from prototypes into production – e.g., as the demands of training escalate. In PyTorch 1.0, a new library (“C10D”) is expected to significantly enhance performance while enabling asynchronous communication – even when use is made of the familiar-to-HPC-types Message Passing Interface (MPI).
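To give a flavor of the torch.distributed package, here’s a deliberately minimal, single-process sketch using the “gloo” backend; in a real job, the rank and world size would come from your launcher, and the backend might instead be NCCL (for GPUs) or MPI:

```python
import os
import torch
import torch.distributed as dist

# Single-process sketch (world_size=1); a real deployment would launch one
# process per machine/GPU, each with its own rank. Addresses are placeholders.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

t = torch.ones(4)
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # sums this tensor across all ranks
# With world_size=1 the tensor is unchanged; with N ranks, each element
# would become N – the collective operation at the heart of data-parallel
# gradient averaging.

dist.destroy_process_group()
```

The same all_reduce pattern is what makes distributed training scale: each worker computes gradients locally, then the collective sums (or averages) them across machines.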
In May 2018, over on Facebook’s developer-centric blog, Bill Jia posted:
Over the coming months, we’re going to refactor and unify the codebases of both the Caffe2 and PyTorch 0.4 frameworks to deduplicate components and share abstractions. The result will be a unified framework that supports efficient graph-mode execution with profiling, mobile deployment, extensive vendor integrations, and more.
As of this writing, a version 1 release candidate for PyTorch 1.0 is available via GitHub.
Stable releases for previous versions are available for ‘local’ or cloud use.
Key Takeaway: Why PyTorch Now!
Whereas it might’ve been a no-brainer to adopt TensorFlow as your go-to framework for all of your Deep Learning needs, I found early releases of PyTorch to be an effective enabler over a year ago – when it was only at the 0.2.0 release stage! Fortunately, the team behind PyTorch has continued to advance its capabilities – capabilities that will soon officially include production-ready distributed processing. If you’re unaware of PyTorch, or bypassed it in the past, it’s likely worth another look right now.