PyTorch Then & Now: A Highly Viable Framework for Deep Learning

Why PyTorch Then?

In preparing for a GTC 2017 presentation, I was driven to emphasize CUDA-enabled GPUs as the platform upon which I’d run my Machine Learning applications. Although I’d already had some encouraging experience with Apache Spark’s MLlib on a classification problem, ‘porting’ from CPU-based, in-memory computation to GPUs was, and remains, ‘exploratory’ – with, perhaps, the notable exception of a cloud-based offering from Databricks. Instead, in ramping up for this Silicon Valley event, I approached this ‘opportunity’ with an open mind and began my GPU-centric effort at NVIDIA’s page for developers. As I wrote post-event in August 2017:

Frankly, the outcome surprised me: As a consequence of my GTC-motivated reframing, I ‘discovered’ Natural Language Processing (NLP) – broadly speaking, the use of human languages by a computer. Moreover, by reviewing the breadth and depth of possibilities for actually doing some NLP on my Twitter data, I subsequently ‘discovered’ PyTorch – a Python-based framework for Deep Learning that can readily exploit GPUs. It’s important to note that PyTorch is not the only choice available for engaging in NLP on GPUs, and it certainly isn’t the most-obvious choice. As I allude to in my GTC presentation, however, I was rapidly drawn to PyTorch.

Despite TensorFlow being (I expect) the most obvious choice, I selected PyTorch for reasons that included the following:

Not bad for version 0.1 of a framework, I’d say! In fact, by the time I was responding to referees’ feedback in revising a book chapter (please see “Refactoring Earthquake-Tsunami Causality with Big Data Analytics” under NLP in my Data Science Portfolio), PyTorch had been revised to version 0.2.0. This was a very welcome update in the context of that chapter revision, as it included a built-in function for computing cosine similarity (“cosine_similarity”) – the key discriminator for quantitatively assessing the semantic similarity between two word vectors.
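
For illustration, here is a minimal sketch of that kind of comparison using PyTorch’s built-in cosine similarity. The vector names and the 300-dimensional random values are my own placeholders, not the data from the chapter:

```python
import torch
import torch.nn.functional as F

# Two hypothetical 300-dimensional word vectors (e.g., from an embedding
# model); random values here, purely for illustration.
earthquake_vec = torch.randn(1, 300)
tsunami_vec = torch.randn(1, 300)

# cosine_similarity returns a value in [-1, 1]; values near 1 indicate
# semantically similar word vectors.
similarity = F.cosine_similarity(earthquake_vec, tsunami_vec, dim=1)
print(similarity.item())
```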

Perhaps my enthusiasm for PyTorch isn’t all that surprising, as I do fit into one of their identified user profiles:

PyTorch has gotten its biggest adoption from researchers, and it’s gotten about a moderate response from data scientists. As we expected, we did not get any adoption from product builders because PyTorch models are not easy to ship into mobile, for example. We also have people who we did not expect to come on board, like folks from OpenAI and several universities.

Towards PyTorch 1.0

In this same August 2017 O’Reilly podcast (from which I extracted the above quote on user profiles), Facebook’s Soumith Chintala stated:

Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe 2 is used for all of production. This makes it easier for us to separate out which team does what and which tools do what. What we are seeing is, users first create a PyTorch model. When they are ready to deploy their model into production, they just convert it into a Caffe 2 model, then ship into either mobile or another platform.

Perhaps it’s not entirely surprising, then, that the 1.0 release intends to “… marry PyTorch and Caffe2 which gives the production-level readiness for PyTorch.” My understanding is that researchers (and others) retain the benefit of developing in PyTorch but then, via the new JIT compiler, acquire the ability to deploy into production via Caffe2 or “… [export] to C++-only runtimes for use in larger projects”; thus PyTorch 1.0’s production reach extends beyond Python-based runtimes – e.g., to the runtimes that drive iOS, Android and other mobile devices. With TensorFlow already having emerged as the ‘gorilla of all frameworks’, this choice of a production path for PyTorch will be well received by Facebook and other proponents of Caffe2.
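
As a rough illustration of that develop-then-export workflow, here is a minimal sketch of capturing a model with the JIT’s tracing mode so it can be loaded outside a Python runtime. The trivial model, its name and the output file path are my own assumptions for the example:

```python
import torch
import torch.nn as nn

# A deliberately trivial model, purely for illustration.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(300, 2)

    def forward(self, x):
        return self.linear(x)

model = TinyClassifier().eval()
example_input = torch.randn(1, 300)

# torch.jit.trace records the operations executed on the example input,
# producing a serializable graph that non-Python runtimes (e.g., the C++
# runtime) can load.
traced = torch.jit.trace(model, example_input)
traced.save("tiny_classifier.pt")
```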

The productionization of PyTorch also includes:

  • A C++ frontend – “… a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend …” that “… is intended to enable research in high performance, low latency and bare metal C++ applications.”
  • Distributed PyTorch enhancements – Originally introduced in version 0.2.0 of PyTorch, “… the torch.distributed package … allows you to exchange Tensors among multiple machines.” Already a core competence of distributed TensorFlow, this ability to introduce parallelism via distributed processing becomes increasingly important as Deep Learning applications and their workflows transition from prototypes into production – e.g., as the demands of training escalate. In PyTorch 1.0, use of a new library (“C10D”) is expected to significantly enhance performance while enabling asynchronous communication – even when use is made of the Message Passing Interface (MPI), familiar to HPC types (see the sketch after this list).
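
To make the distributed piece a little more concrete, here is a minimal sketch of exchanging tensors among processes with the torch.distributed package. The backend choice (“gloo” for portability), the world size, the rendezvous address and the tensor contents are all assumptions made purely for illustration:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    # "gloo" is chosen here for portability; "mpi" or "nccl" are
    # alternatives, depending on how PyTorch was built and the hardware.
    dist.init_process_group(backend="gloo",
                            init_method="tcp://127.0.0.1:29500",
                            rank=rank, world_size=world_size)

    # Each rank contributes a tensor; all_reduce sums them so every
    # process ends up holding the same aggregate result.
    tensor = torch.ones(4) * rank
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {tensor.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(run, args=(world_size,), nprocs=world_size)
```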

In May 2018, over on Facebook’s developer-centric blog, Bill Jia posted:

Over the coming months, we’re going to refactor and unify the codebases of both the Caffe2 and PyTorch 0.4 frameworks to deduplicate components and share abstractions. The result will be a unified framework that supports efficient graph-mode execution with profiling, mobile deployment, extensive vendor integrations, and more.

As of this writing, a first release candidate for PyTorch 1.0 is available via GitHub.

Stable releases for previous versions are available for ‘local’ or cloud use.

Key Takeaway: Why PyTorch Now!

Whereas it might’ve been a no-brainer to adopt TensorFlow as your go-to framework for all of your Deep Learning needs, I found early releases of PyTorch to be an effective enabler over a year ago – when it was only at the 0.2.0 release stage! Fortunately, the team behind PyTorch has continued to advance the capabilities offered – capabilities that are soon to officially include production-ready distributed processing. If you’re unaware of PyTorch, or bypassed it in the past, it’s likely worth another look right now.

Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or me, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very legitimate, potentially disarming question will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-a-Data-Scientist elevator pitch.

To return to the question intended for this post, however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by host Srini Rao on an episode of the Unmistakable Creative podcast. Although the entire episode (and the podcast in general, frankly) is well worth a listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how, based upon his belief (“I am a writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions and became a writer. With five best-selling books to his credit, plus a high-traffic blog, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting, though, is that in working through Goins’ algorithm, the active curation of a Data Science Portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in journals with respectable impact factors. This, for a Data Scientist, is arguably the most stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as resounding evidence – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of the deliberate act of putting together my Data Science Portfolio is that I’m already in a much more informed position, able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an exercise in active self-curation, has (quite honestly) both surprised and delighted me: I clearly have been directly involved in producing a number of artifacts that legitimately represent me as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has also produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced and implied (though inconsistent) behaviors. However, when asked “What do you do?”, I am more likely to respond:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, if you will … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Biases and all, that’s how I currently feel, owing to the very nature of Data Science itself.