Developing Your Expertise in Machine Learning: Podcasts for Breadth vs. Depth

From ad hoc to highly professional, there’s no shortage of resources when it comes to learning Machine Learning. Podcasts deserve to be regarded as both viable and valuable resources, and the two I cover in this post present opportunities for improving your breadth and/or depth in Machine Learning.

Machine Learning Guide

As a component of his own process for ramping up his knowledge and skills in the area of Machine Learning, OCDevel’s Tyler Renelle has developed an impressive resource of some 30 podcast episodes. Taken together, Tyler’s Guide is primarily a breadth play when it comes to the matter of learning Machine Learning, though he alludes to depth as well in how he positions his podcasts:

Where your other resources provide the machine learning trees, I provide the forest. Consider me your syllabus. At the end of every episode I provide high-quality curated resources for learning each episode’s details.

As I expect you’ll agree, with Tyler’s Guide, the purely audio medium of podcasting permits the breadth of Machine Learning to be communicated extremely effectively; in his own words, Tyler states:

Audio may seem inferior, but it’s a great supplement during exercise/commute/chores.

I couldn’t agree more. Even from the earliest of those episodes in this series, Tyler demonstrates the viability and value of this medium. In my opinion, he is particularly effective for at least three reasons:

  1. Repetition – Important in any learning process, regardless of the medium, repetition is especially critical when podcasting is employed as a tool for learning.
  2. Analogies – Again, useful in learning regardless of the medium involved, yet extremely so in the case of podcasting. Imagine effective, simple, highly visual and sometimes fun analogies being introduced to explain, for example, a particular algorithm for Machine Learning.
  3. Enthusiasm – Perhaps a no-brainer, but enthusiasm serves to captivate interest and motivate action.

As someone who’s listened to each and every one of those 30 or so episodes, I can state with some assurance that we are truly fortunate Tyler has expended the extra effort to share what he has learned in the hope that it’ll also help others. The quality of the Guide is excellent. If anything, I recall occasionally taking exception to some of the mathematical details Tyler relates. Because Tyler approaches the Guide from the perspective of an experienced developer, these mathematical lapses are extremely minor, and certainly do not detract from the overall value of the podcast.

After sharing his Guide, Tyler started up Machine Learning Applied:

an exclusive podcast series on practical/applied tech side of the same. Smaller, more frequent episodes.

Unfortunately, with only six episodes starting from May 2018, and none since mid-July, this more-applied series hasn’t yet achieved the stature of its predecessor. I share this more as a statement of fact than criticism, as sustaining the momentum to deliver such involved content on a regular cadence is not achieved without considerable effort – and, let’s be realistic, more than just a promise of monetization.

This Week in Machine Learning and AI

Whereas OCDevel’s Guide manifests itself as a one-person, breadth play, This Week in Machine Learning and AI (TWiML&AI) exploits the interview format in probing for depth. Built upon the seemingly tireless efforts of knowledgeable and skilled interviewer Sam Charrington, TWiML&AI podcasts allow those at the forefront of Machine Learning to share the details of their work – whether that translates to their R&D projects, business ventures or some combination thereof.

Like Tyler Renelle, Sam has a welcoming and nurturing style that allows him to ensure his guests are audience-centric in their responses – even if that means an episode is tagged with a ‘geek alert’ for those conversations that include mathematical details, for example. As someone who engages in original research in Machine Learning, I have learned a lot from TWiML&AI. Specifically, after listening to a number of episodes, I’ve followed up on show notes by delving a little deeper into something that sounded interesting; and on more than a few occasions, I’ve unearthed something of value for those projects I’m working on. Though Sam has interviewed some of the most well known in this rapidly evolving field, it is truly wonderful that TWiML&AI serves as an equal-opportunity platform – a platform that allows voices that might otherwise be marginalized to also be heard.

At this point, Sam and his team at TWiML&AI have developed a community around the podcast. The opportunity for deeper interaction exists through meetups, for example – meetups that have ranged from focused discussion of a particularly impactful research paper, to a facilitated study group in support of a course. In addition to all of this online activity, Sam and his team participate actively in a plethora of events, and have even been known to host events in person as well.

One last thought regarding TWiML&AI: The team takes significant effort to ensure that each of the 185 episodes (and counting!) is well documented. While this is extremely useful, I urge you not to make your decision on what to listen to based upon teasers and notes alone. Stated differently, I can relate countless examples for which I perceived a very low level of interest prior to actually listening to an episode, only to be both surprised and delighted when I did. As I recall well from my running days, run for that first kilometre or so (0.6214 of a mile 😉 ) before you make the decision as to how far you’ll run that day.

From the understandably predictable essentials of breadth, to the sometimes surprising and delightful details of depth, these two podcasts well illustrate the complementarity between the schools of breadth and depth. Based upon my experience, you’ll be well served by taking in both of these podcasts – whether you need to jumpstart your learning or engage in continuous learning. Have a listen.

Pencasting with a Wacom tablet: Time to revisit this option

Around the start of the Fall term in September 2014, I found myself in a bit of a bind: My level of frustration with Livescribe pencasting had peaked, and I was desperately seeking alternatives. To be clear, it was changes to the Livescribe platform that were the source of this frustration, rather than pencasting as a means for visual communication. In fact, if anything, a positive aspect of the Livescribe experience was that I was indeed SOLD on pencasting as an extremely effective means for communicating visually – an approach that delivered significant value in instructional settings such as the large classes I was teaching at the university level.

In an attempt to make use of an alternative to the Livescribe platform then, I discovered and acquired a small Wacom tablet. Whereas I rapidly became proficient in use of the Livescribe Echo smartpen, because it was truly like making use of a regular pen, my own learning curve with the Wacom solution was considerably steeper.

To be concrete, you can view a relatively early attempt on YouTube. As one viewer commented:

Probably should practice the lecture. Too many pauses um er ah.

Honestly, that was more a reflection of my grasp of the Wacom platform than my expertise with the content I was attempting to convey through this real-time screen capture. In other words, my comfort level with this technology was so low that I was distracted by it. Given that many, many thousands of visual (art) professionals make use of this or similar solutions from Wacom, I’m more than willing to admit that this one was ‘on me’ – I wasn’t ‘a natural’.

With the Wacom solution, you need to train your eyes to be fixed on your screen, while your hand writes/draws/etc. on the tablet. Not exactly known for my hand-eye coordination in general, it’s evident that I struggled with this technology. As I look at the results some four years later, I’m not quite as dismayed as I expected to be. My penmanship isn’t all that bad – even though I still find writing and drawing with this tablet to be a taxing exercise in humility. In hindsight, I’m also fairly pleased with the Wacom tablet’s ability to permit use of colour, as well as lines of different thicknesses. This flexibility, completely out of scope in the solution from Livescribe, introduces a whole new level of prospects for visual communication.

Knowing that others have mastered the Wacom platform, and having some personal indication of its potential to produce useful results, I’m left with the idea of giving this approach another try – soon. I’ll let you know how it goes.

Livescribe Pencasting: Seizing Uncertainty from Success

Echo’es of a Glorified Past

[Optional musical accompaniment: From their album Meddle, Pink Floyd’s Echoes (via their YouTube channel).]

I first learned about pencasting from an elementary-school teacher at a regional-networking summit in 2011.

It took me more than a year to acquire the technology and start experimenting. My initial experiences, in making use of this technology in a large, first-year course at the university level, were extremely encouraging; and after only a week’s worth of experimentation, my ‘findings’ were summarized as follows:

  • Decent-quality pencasts can be produced with minimal effort
  • Pencasts complement other instructional media
  • Pencasts allow the teacher to address both content and skills-oriented objectives
  • Pencasts are learner-centric
  • I’m striving for authentic, not perfect pencasts

The details that support each of these findings are provided in my April 2012 blog post. With respect to test driving the pencasts in a large-lecture venue, I subsequently shared:

  • The visual aspects of the pencast are quite acceptable
  • The audio quality of the pencasts is very good to excellent
  • One-to-many live streaming of pencasts works well
  • Personal pencasting works well

Again, please refer to my original post in October 2012 for the details.

Over the next year or so, I must’ve developed on the order of 20 pencasts using the Livescribe Echo smartpen – please see below for a sample. Given that the shortest of these pencasts ran 15-20 minutes, my overall uptake and investment in the technology was significant. I unexpectedly became ‘an advocate for the medium’, as I shared my pencasts with students in the courses I was teaching, colleagues who were also instructing in universities and high schools, plus textbook publishers. At one point, I even had interest from both the publisher and an author of the textbook I was using in my weather and climate class to develop a few pencasts – pencasts that would subsequently be made available as instructional media to any other instructor who was making use of this same textbook.

[Sample pencast: Download a mathematical example – namely, Hydrostatic Equation – Usage Example – 2013-06-15T12-10-06-0. Then, use the Livescribe player, desktop, iOS or Android app to view/listen.]

The Slings and Arrows of Modernization

Unfortunately, all of this changed in the Summer of 2015. Anticipating the impending demise of Adobe Flash technology, in what was marketed as a ‘modernization’ effort, Livescribe rejected this one-time staple in favour of their own proprietary appropriation of the Adobe PDF. Along with the shift to the Livescribe-proprietary format for pencasts came an implicit requirement to make use of browser, desktop or mobile apps from this sole-source vendor. As if these changes weren’t enough, Livescribe then proceeded to close its online community – the vehicle through which many of us were sharing our pencasts with our students, colleagues, etc. My frustration was clearly evident in a comment posted to Livescribe’s blog in September 2014:

This may be the tipping point for me and Livescribe products – despite my investment in your products and in pencast development … I’ve been using virtual machines on my Linux systems to run Windows so that I can use your desktop app. The pay off for this inconvenience was being able to share pencasts via the platform-neutral Web. Your direction appears to introduce complexities that translate to diminishing returns from my increasingly marginalized Linux/Android perspective …

From the vantage point of hindsight in 2018, and owing to the ongoing realization of the demise of Flash, I fully appreciate that Livescribe had to do something about the format they employed to encode their pencasts; and, given that there aren’t any open standards available (are there?), they needed to develop their own, proprietary format. What remains unfortunate, however, is the implicit need to make use of their proprietary software to actually view and listen to the pencasts. As far as I can tell, their browser-based viewer still doesn’t work on popular Linux-based platforms (e.g., Ubuntu), while you’ll need to have a Microsoft Windows or Apple Mac OS X based platform to make use of their desktop application. Arguably, the most-positive outcome from ‘all of this’ is that their apps for iOS and Android devices are quite good. (Of course, it took them some time before the Android app followed the release of the iOS app.)

Formats aside, the company’s decision to close its community still, from the vantage point of 2018, strikes me as a strategic blunder of epic proportions. (Who turns their back on their community and expects to survive?) Perhaps they (Livescribe) didn’t want to be in the community-hosting business themselves. And while I can appreciate and respect that position, alternatives were available at the time, and abound today.

Pencasting Complexified

[Full disclosure: I neither own, nor have I used the Livescribe 3 smartpen alluded to in the following paragraph. In other words, this is my hands-off take on the smartpen. I will happily address factual errors.]

At one point, and in my opinion, the simplicity of the Livescribe Echo smartpen was its greatest attribute. As a content producer, all I needed was the pen and one of Livescribe’s proprietary notebooks, plus a quiet place in which to record my pencasts. Subsequent innovation from the company resulted in the Livescribe 3 smartpen. Though it may well be designed “… to work and write like a premium ballpoint pen …”, the complexity introduced now requires the content producer to have the pen, the notebook, a bluetooth headset plus an iOS or Android device to capture pencasts. In this case, there is a serious price to be paid for modernization – both figuratively and literally.

According to Wikipedia, the Livescribe 3 smartpen was introduced in November 2013. And despite the acquisition by Anoto about two years later, innovation appears to have ceased. So much for first-mover advantage, and Jim Marggraff’s enviable track record of innovation.

My need to pencast remains strong – even in 2018. If you’ve read this far, I’m sure you’ll understand why I might be more than slightly reluctant to fork out the cash for a Livescribe 3 smartpen. There may be alternatives, however; and I do expect that future posts may share my findings, lessons learned, best practices, etc.

Feel free to weigh in on this post via the comments – especially if you have alternatives to suggest. Please note: Support for Linux highly desirable.

Demonstrating Your Machine Learning Expertise: Optimizing Breadth vs. Depth

Developing Expertise

When it comes to developing your expertise in Machine Learning, there seem to be two schools of thought:

  • Exemplified by articles that purport to list, for example, the 10 most important methods you need to know to ace a Machine Learning interview, the School of Breadth emphasizes content-oriented objectives. Ramp up from courses and workshops to full programs (e.g., certificates, degrees), and the justification for broadening your knowledge of Machine Learning becomes self-evident.
  • Exemplified by the advice to find data that interests you and work with it using a single approach, the School of Depth emphasizes skills-oriented objectives that are progressively mastered as you delve into the data or, better yet, a problem of interest.

Depending upon whichever factors you currently have under consideration then (e.g., career stage, employment status, desired employment trajectory, …), breadth versus depth may result in an existential crisis when it comes to developing and ultimately demonstrating your expertise in Machine Learning – with a modicum of apologies if that strikes you as a tad melodramatic.

Demonstrating Expertise

Somewhat conflicted is, in all honesty, how I feel at the moment myself.

On Breadth

Even a rapid perusal of the Machine Learning specific artifacts I’ve self-curated into my online, multimedia Data Science Portfolio makes one thing glaringly evident: The breadth of my exposure to Machine Learning has been somewhat limited. Specifically, I have direct experience with classification and Natural Language Processing in Machine Learning contexts from the practitioner’s perspective. The more-astute reviewer, however, might look beyond the ‘pure ML’ sections of my portfolio and afford me additional merit for (say) my mathematical and/or physical sciences background, plus my exposure to concepts directly or indirectly applicable to Machine Learning – e.g., my experience as a scientist with least-squares modeling counting as exposure at a conceptual level to regression (just to keep this focused on breadth, for the moment).

True confession: I’ve started more than one course in Machine Learning in a blunt-instrument attempt to address this known gap in my knowledge of relevant methods. Started is, unfortunately, the operative word, as (thus far) any attempt I’ve made has not been followed through – even when there are options for community, accountability, etc. to better-ensure success. (Though ‘life got in the way’ of me participating fully in the fast.ai study group facilitated by the wonderful team that delivers the This Week in Machine Learning & AI Podcast, such approaches to learning Machine Learning are appealing in principle – even though my own engagement was grossly inconsistent.)

On Depth

What then about depth? Taking the self-serving but increasingly concrete example of my own Portfolio, it’s clear that (at times) I’ve demonstrated depth. Driven by an interesting problem aimed at improving tsunami alerting by processing data extracted from Twitter, for example, the deepening progression with co-author Jim Freemantle has been as follows:

  1. Attempt to apply an existing knowledge-representation framework to the problem by extending it (the framework) to include graph analytics
  2. Introduce tweet classification via Machine Learning
  3. Address the absence of semantics in the classification-based approach through the introduction of Natural Language Processing (NLP) in general, and embedded word vectors in particular
  4. Next steps …

(Again, please refer to my Portfolio for content relating to this use case.) Going deeper, in this case, is not a demonstration of a linear progression; rather, it is a sequence of outcomes realized through experimentation, collaboration, consultation, etc. For example, the seed to introduce Machine Learning into this tsunami-alerting initiative was planted on the basis of informal discussions at an oil and gas conference … and later, the introduction of embedded word vectors was similarly the outcome of informal discussions at a GPU technology conference.
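For readers curious what step 3 of that progression looks like in practice, here is a minimal, hypothetical sketch of classifying short texts (e.g., tweets) by averaging word vectors and feeding the result to a conventional classifier. The toy embedding table and example tweets are placeholders for illustration only; they are not the actual data, vectors or models Jim and I used.

```python
# Hypothetical sketch: classify short texts (e.g., tweets) via averaged word vectors.
# The embeddings and labelled examples below are toy placeholders, not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy word-embedding table (in practice, pretrained vectors such as GloVe or word2vec).
DIM = 4
rng = np.random.default_rng(0)
vocab = ["earthquake", "tsunami", "wave", "coast", "coffee", "movie", "lunch"]
embeddings = {word: rng.normal(size=DIM) for word in vocab}

def embed(text: str) -> np.ndarray:
    """Average the vectors of known words; return a zero vector if none are known."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vectors, axis=0) if vectors else np.zeros(DIM)

# Toy labelled tweets: 1 = potentially tsunami-relevant, 0 = not relevant.
tweets = [
    ("huge earthquake felt near the coast", 1),
    ("tsunami wave warning issued", 1),
    ("grabbing coffee before the movie", 0),
    ("what should I have for lunch", 0),
]
X = np.vstack([embed(text) for text, _ in tweets])
y = np.array([label for _, label in tweets])

clf = LogisticRegression().fit(X, y)
print(clf.predict([embed("tsunami alert after earthquake")]))  # likely [1] for this toy setup
```

The design choice this sketch illustrates is the one alluded to in step 3: embedded word vectors give the classifier a notion of semantic similarity that a bag-of-words representation lacks.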

Whereas these latter examples are intended primarily to demonstrate the School of Depth, it is clear that the two schools of thought aren’t mutually exclusive. For example, in delving into a problem of interest, Jim and I may have deepened our mastery of specific skills within NLP; however, we have also broadened our knowledge within this important subdomain of Machine Learning.

One last thought here on depth. At the outset, neither Jim nor I had any innate desire to explore NLP. Rather, the problem, and more importantly the demands of the problem, caused us to ‘gravitate’ towards NLP. In other words, we are wedded more to making scientific progress (on tsunami alerting) than to a specific method for Machine Learning (e.g., NLP).

Next Steps

Net-net then, it appears that whatever motivates us dominates in practice – in spite, perhaps, of our best intentions. In my own case, my existential crisis derives from being driven by problems into depth, while at the same time seeking to demonstrate a broader portfolio of expertise with Machine Learning. To be more specific, there’s a part of me that wants to apply LSTMs (for example) to the tsunami-alerting use case, whereas another part knows I must broaden (at least a little!) my portfolio when it comes to methods applicable to Machine Learning.

Finally then, how do I plan to address this crisis? For me, it’ll likely manifest itself as a two-pronged approach:

  1. Enrol in and follow through on a course (at least!) that exposes me to one or more methods of Machine Learning that complement my existing exposure to classification and NLP.
  2. Identify a problem, or problems of interest, that allow me to deepen my mastery of one or more of these ‘newly introduced’ methods of Machine Learning.

In a perfect situation, perhaps we’d emphasize both breadth and depth. However, when you’re attempting to introduce, pivot or re-position yourself, a trade-off between breadth and depth appears to be inevitable. An introspective reflection, based upon the substance of a self-curated portfolio, appears to be an effective and efficient means for roadmapping how gaps can be identified and ultimately addressed.

Postscript

In many settings/environments, Machine Learning, and Data Science in general, are team sports. Clearly then, a viable way to address the challenges and opportunities presented by depth versus breadth is to hire accordingly – i.e., hire for both depth and breadth in your organization.

Revisiting the Estimation of Fractal Dimension for Image Classification

Classification is a well-established use case for Machine Learning. Textbook examples abound; standard ones include the classification of email into ham versus spam, or of images into cats versus dogs.

Circa 1994, I was unaware of Machine Learning, but I did have a use case for quantitative image classification. I expect you’re familiar with those brave souls known as The Hurricane Hunters – brave because they explicitly seek to locate the eyes of hurricanes using an appropriately tricked out, military-grade aircraft. Well, these hunters aren’t the only brave souls when it comes to chasing down storms in the pursuit of atmospheric science. In an effort to better understand Atlantic storms (i.e., East Coast, North America), a few observational campaigns featured aircraft flying through blizzards at various times during Canadian winters.

In addition to standard instrumentation for atmospheric and navigational observables, these planes were tricked out in an exceptional way:

For about two-and-a-half decades, Knollenberg-type [ref 4] optical array probes have been used to render in-situ digital images of hydrometeors. Such hydrometeors are represented as a two-dimensional matrix, whose individual elements depend on the intensity of transmitted light, as these hydrometeors pass across a linear optical array of photodiodes. [ref 5]

In other words, the planes were equipped with underwing optical sensors that had the capacity to obtain in-flight images of

hydrometeor type, e.g. plates, stellar crystals, columns, spatial dendrites, capped columns, graupel, and raindrops. [refs 1,7]

(Please see the original paper for the references alluded to here.)

Even though this is hardly a problem in Big Data, a single flight might produce tens, hundreds, or even thousands of hydrometeor images that needed to be manually classified by atmospheric scientists. Because I was working for a boutique consultancy focused on atmospheric science, with excellent relationships with Environment Canada scientists who make Cloud Physics their express passion, an opportunity to automate the classification of hydrometeors presented itself.

Around this same time, I became aware of fractal geometry: a visually arresting and quantitative description of nature popularized by proponents such as Benoit Mandelbrot. Whereas simple objects (e.g., lines, planes, cubes) can be associated with an integer dimension (e.g., 1, 2 and 3, respectively), objects in nature (e.g., a coastline, a cloud outline) can be better characterized by a fractional dimension – a real-valued fractal dimension that, in these examples, lies between the integer value for a line (i.e., 1) and the value for a plane (i.e., 2).

Armed with an approach for estimating fractal dimension then, my colleagues and I sought to classify hydrometeors based on their subtle to significant geometrical expressions. Although the idea was appealing in principle, the outcome on a per-hydrometeor basis was a single, scalar result that attempted to capture geometrical uniqueness. In isolation, this approach was simply not enough to deliver an automated scheme for quantitatively classifying hydrometeors.
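For the curious, box counting is one common way to estimate a fractal dimension from a binary (outline) image. The sketch below is purely illustrative and is not the algorithm used in the original hydrometeor work; the synthetic ring image stands in for a hydrometeor outline.

```python
# Illustrative sketch: box-counting estimate of fractal dimension for a binary image.
# Not the algorithm from the original hydrometeor study; demonstrated on a synthetic shape.
import numpy as np

def box_count(image: np.ndarray, box_size: int) -> int:
    """Count boxes of side box_size that contain at least one 'on' pixel."""
    h, w = image.shape
    count = 0
    for i in range(0, h, box_size):
        for j in range(0, w, box_size):
            if image[i:i + box_size, j:j + box_size].any():
                count += 1
    return count

def fractal_dimension(image: np.ndarray, box_sizes=(2, 4, 8, 16, 32)) -> float:
    """Slope of log(count) versus log(1/box_size) gives the box-counting dimension."""
    counts = [box_count(image, s) for s in box_sizes]
    slope, _ = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return slope

# Synthetic 'hydrometeor-like' outline: a thin ring of 'on' pixels in a 256x256 image.
y, x = np.ogrid[:256, :256]
r = np.hypot(x - 128, y - 128)
ring = np.abs(r - 80) < 1.5

print(f"Estimated fractal dimension: {fractal_dimension(ring):.2f}")  # close to 1 for a smooth curve
```

Applied per image, an estimator like this yields exactly the kind of single, scalar geometric descriptor discussed above; a more convoluted outline (e.g., a dendritic crystal) would score closer to 2 than a smooth ring does.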

I well recall some of the friendly conversations I had with my scientific and engineering peers who attended the conference at Montreal’s Ecole Polytechnique. Essentially, the advice I was given was to regard the work I’d done as a single dimension of the hydrometeor-classification problem. What I really needed to do was develop additional dimensions for classifying hydrometeors. With enough dimensions, the resulting multidimensional classification scheme would have a much better chance of delivering the automated solution sought by the atmospheric scientists.

In my research, fractal dimensions were estimated using various algorithms; they were not learned. However, they could be – as is clear from the efforts of others (e.g., the prediction of fractal dimension via Machine Learning). And though my pursuit of such a suggestion will have to wait for a subsequent research effort, a learned approach might allow for the introduction of a much more multidimensional scheme for quantitative classification of hydrometeors via Machine Learning. Of course, with the hindsight of 2018, there are a number of possibilities for quantitative classification via Machine Learning – possibilities that I fully expect would result in more useful outcomes.

Whereas fractals don’t receive as much attention these days as they once did, and certainly not anything close to the deserved hype that seems to pervade most discussions of Machine Learning, there may still be some value in incorporating their ability to quantify geometry into algorithms for Machine Learning. From a very different perspective, it might be interesting to see if the architecture of deep neural networks can be characterized through an estimation of their fractal dimension – if only to tease out geometrical similarities that might be otherwise completely obscured.

While I, or (hopefully) others, ponder such thoughts, there is no denying the stunning expression of the fractal geometry of nature that fractals have rendered visual.

Prob & Stats Gaps: Sprinting for Closure

Prob & Stats Gap

When it comes to the mathematical underpinnings for Deep Learning, I’m extremely passionate. In fact, my perspective can be summarized succinctly:

Deep Learning – Deep Math = Deep Gap.

In reflecting upon my own mathematical credentials for Deep Learning, when it came to probability and statistics, I previously stated:

Through a number of courses in Time Series Analysis (TSA), my background affords me an appreciation for prob & stats. In other words, I have enough context to appreciate this need, and through use of quality, targeted resources (e.g., Goodfellow et al.’s textbook), I can close out the gaps sufficiently – in my case, for example, Bayes’ Rule and information theory.
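For reference (and not drawn from my original post), the two staples named above are typically written in standard notation as follows – Bayes’ Rule, and the Shannon entropy that serves as the gateway to information theory:

```latex
% Bayes' Rule: posterior in terms of likelihood, prior and evidence
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

% Shannon entropy of a discrete random variable X
H(X) = -\sum_{x} P(x) \log P(x)
```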

Teaching to Learn

Although I can certainly leverage quality, targeted resources, I wanted to share here a complementary approach. One reason for doing this is that resources such as Goodfellow et al.’s textbook may not be readily accessible to everyone – in other words, some homework is required before some of us are ready to crack open this excellent resource, and make sense of the prob & stats summary provided there.

So, in the spirit of progressing towards being able to leverage appropriate references such as Goodfellow et al.’s textbook, please allow me to share here a much-more pragmatic suggestion:

Tutor a few high school students in prob & stats to learn prob & stats.

Just in case the basic premise of this suggestion isn’t evident, it is: by committing to teach prob & stats, you must be able to understand prob & stats. And as an added bonus, this commitment to tutoring a few students (say) once a week establishes and reinforces a habit – a habit that is quite likely, in this case, to ensure you stick with your objective to broaden and deepen your knowledge/skills when it comes to probability and statistics.

This is also a service for which you could charge a fee – anywhere from the full rate for tutoring math at the high-school level down to gratis, depending upon the value you’ll be able to offer your students … a rate, of course, that you could adjust over time as your expertise with prob & stats develops.

Agile Sprints

Over recent years, I’ve found it particularly useful to frame initiatives such as this one in the form of Agile Sprints – an approach I’ve adopted and adapted from the pioneering efforts of J D Meier. To try this for yourself, I suggest the following two-step procedure:

  1. Review JD’s blog post on sprints – there’s also an earlier post of his that is both useful and relevant.
  2. Apply the annotated template I’ve prepared here to a sprint of your choosing. Because the sample template I’ve shared is specific to the prob & stats example I’ve been focused on in this post, I’ve also included a blank version of the sprint template here.

4DXit

Before you go, there’s one final point I’d like to draw your attention to – and that’s lead and lag measures. Whereas lag measures focus on your (wildly) important goal (WIG), lead measures emphasize those behaviors that’ll get you there. To draw from the example I shared for addressing a math gap in prob & stats, the lag measure is:

MUST have enhanced my knowledge/skills in the area of prob & stats such that I am better prepared to review Deep Learning staples such as Goodfellow et al.’s textbook

In contrast, examples of lead measures are each of the following:

SHOULD have sought tutoring positions with local and/or online services

COULD have acquired the textbook relevant for high-school level prob & stats

With appropriately crafted lead measures then, the likelihood that your WIG will be achieved is significantly enhanced. Kudos to Cal Newport for emphasizing the importance of acting on lead measures in his book Deep Work. For all four disciplines of execution, you can have a closer look at Newport’s book, go to the 4DX source (the book itself), or simply Google for resources on “the 4 disciplines of execution”.

Of course, the approach described here can be applied to much more than a gap in your knowledge/skills of prob & stats. And as I continue the process of self-curating my Data Science Portfolio, I expect to unearth additional challenges and opportunities – challenges and opportunities that can be well approached through 4DX’d Agile Sprints.

Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or me, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very legitimate, potentially disarming question will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-a-Data-Scientist elevator pitch.

To return to the question intended for this post however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by podcast host Srini Rao on an episode of the Unmistakable Creative. Although the entire episode (and the podcast in general, frankly) is well worth the listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode of the podcast, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how, based upon his belief (“I am a writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions (behavior) and became a writer. With five best-selling books to his credit, plus a high-traffic blog property, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting though is that in order to work through Goins’ algorithm, engaging in active curation of a Data Science portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog site to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in published journals with respectable impact factors. This, for a Data Scientist, is arguably a most-stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on, also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as a resounding example – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of engaging in the deliberate act of putting together my Data Science Portfolio, is that I’m already in a much more informed position, and able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an engagement of active self-curation, has (quite honestly) both surprised and delighted me; I clearly have been directly involved in the production of a number of artifacts that can be used to legitimately represent myself as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced, and implied (though inconsistent) behaviors. However, when asked the question “What do you do?”, I am more likely to respond that:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, per se … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Based on my biases, and owing to the very nature of Data Science itself, that’s how I currently feel.

Remembering a Supportive Sibling

Less than a week before I was scheduled to deliver my first presentation on a novel way for approaching an outstanding challenge in seismic processing, my younger sister Deborah passed away. She was only 50. Thanks to medical care that included extensive chemotherapy, Debbie recovered from lymphoma once, and was declared cancer free. However, a second wave of lymphoma accompanied by leukemia, proved to be more than she could handle – and we lost her during a procedure that (ironically) was attempting to provide more information about the cancers that had literally taken over her body.

Between Debbie’s passing and her funeral there was not only about a week’s lapse of time, but also the need for me to make a decision – a decision to present as scheduled at the 2015 Rice University Oil & Gas Conference in Houston or miss the event entirely. A complicating factor in making this decision was that I truly was the only person who could deliver the presentation. That’s more a pragmatic statement than a boastful one, as I had combined my background in geophysics with an increasing knowledge of Big Data Analytics; in so doing, I’d arrived at a submission for the RiceU Conference that was as uniquely of my creation as it was a disruptive suggestion – in other words, something I felt strongly to be well suited to the Conference’s Disruptive Technology Track. With the Conference being less than a week away, most of the real work had already been completed; in other words, all I needed to do was show up, make a two-minute presentation, and discuss the poster I’d prepared with those who expressed an interest.

Debbie was always highly supportive and encouraging when it came to ‘things’ like this – the expression of something worth sharing. This, despite the fact that she and I were on completely different trajectories when it came to our intellectual interests and pursuits – me in the physical sciences and technology, while Debbie favoured English literature. Despite these differences, Debbie often made a point of trying to understand and appreciate what I was working on – no matter how geekily obscure.

In recalling these traits of hers, her sincere interest in what I was doing (I suppose) just because we were siblings, my decision to follow through with the presentation at the RiceU Conference was a relatively easy one. Executing it, however, was at times challenging … and I could not have followed through without the support of my colleagues from Bright Computing.

You can still review my two-minute presentation here, thanks to the wonderful people who run this industry-leading event on an annual basis at Rice. The poster I alluded to is available here. The ideas hatched through these 2015 communications proved instrumental in spinning off additional contributions. Equally important were the interactions initiated at this 2015 RiceU Conference. Some of these interactions resulted in relationships that persist through today – relationships that have, for example, resulted in me applying Machine Learning to problems of scientific interest.

And so it is, on the occasion of what would’ve been Debbie’s 54th birthday, that I wistfully remember my sister. Without knowing that I’d have had her support and encouragement, I likely wouldn’t have followed through with that March 2015 presentation at the RiceU Conference – a decision that had immediate plus long-lasting implications to my progression as a Data Scientist.

How I Ended Up in Geophysical Fluid Dynamics

Lately, I’ve been disclosing the various biases I bring to practicing and enabling Data Science. Motivated by my decision to (finally) self-curate an online, multimedia portfolio, I felt such biases to be material in providing the context that frames this effort. Elsewhere, I’ve shared my inherently scientific bias. In this post, I want to provide additional details. These details I’ve been able to extract verbatim from a blog post I wrote for Bright Computing in January 2015; once I’d settled on geophysics (see below), I aspired to be a seismologist … but, as you’ll soon find out, things didn’t pan out quite the way I’d expected:

I always wanted to be a seismologist.

Scratch that: I always wanted to be an astronaut. How could I help it? I grew up in suburban London (UK, not Ontario) watching James Burke cover the Apollo missions. (Guess I’m also revealing my age here!)

Although I never gave my childhood dream of becoming an astronaut more than a fleeting consideration, I did pursue a career in science.

As my high-school education drew to a close, I had my choices narrowed down to being an astronomer, geophysicist or a nuclear physicist. In grade 12 at Laurier Collegiate in Scarboro (Ontario, not UK … or elsewhere), I took an optional physics course that introduced me to astronomy and nuclear physics. And although I was taken by both subjects, and influenced by wonderful teachers, I dismissed both of these as areas of focus in university. As I recall, I had concerns that I wouldn’t be employable if I had a degree in astronomy, and I wasn’t ready to confront the ethical/moral/etc. dilemmas I expected would accompany a choice of nuclear physics. Go figure!

And so it was to geophysics I was drawn, again influenced significantly by courses in physical geography taught by a wonderful teacher at this same high school. My desire to be a seismologist persisted throughout my undergraduate degree at Montreal’s McGill University, where I ultimately graduated with a B.Sc. in solid Earth geophysics. Armed with my McGill degree, I was in a position to make seismology a point of focus.

But I didn’t. Instead, at Toronto’s York University, I applied Geophysical Fluid Dynamics (GFD) to Earth’s deep interior – mostly Earth’s fluid outer core. Nothing superficial here (literally), as the core only begins some 3,000 km below where we stand on the surface!

Full disclosure: In graduate school, the emphasis was GFD. However, seismology crept in from time to time. For example, I made use of results from deep-Earth seismology in estimating the viscosity of Earth’s fluid outer core. Since this is such a deeply remote region of our planet, geophysicists need to content themselves with observations accessible via seismic and other methods.

From making use of Apache Spark to improve the performance of seismic processing (search for “Reverse-Time Seismic Migration” or “RTM” in my Portfolio), to the analysis of ‘seismic data’ extracted from Twitter (search for “Twitter” in my Portfolio), seismology has taken center stage in a number of my projects as a practitioner of Data Science. However, so has the geophysical fluid dynamics of Earth’s mantle and outer core. Clearly, you can have your geeky cake and eat it too!

Data Science: Identifying My Professional Bias

In the Summer of 1984, I arrived at Toronto’s York University as a graduate student in Physics & Astronomy. (Although my grad programme was Physics & Astronomy, my research emphasized the application of fluid dynamics to Earth’s deep interior.) Some time after that, I ran my first non-interactive computation on a cluster of VAX computers. I’m not sure if this was my first exposure to Distributed Computing or not; I am, however, fairly certain that this was the first time it (Distributed Computing) registered with me as something exceedingly cool, and exceedingly powerful.

Even back in those days, armed with nothing more than a VT100 terminal ultimately connected to a serial interface on one of the VAXes, I could be logged in and able to submit a computational job that might run on some other VAX participating in the cluster. The implied connectedness, the innate ability to make use of compute cycles on some ‘remote’ system was intellectually intoxicating – and I wasn’t even doing any parallel computing (yet)!

More than a decade later, while serving in a staff role as a computer coordinator, I became involved in procuring a modest supercomputer for those members of York’s Faculty of Pure & Applied Science who made High Performance Computing (HPC) a critical component of their research. If memory serves me correctly, this exercise resulted in the purchase of a NUMA-architecture system from SGI powered by MIPS CPUs. Though isolated initially, and as a component of the overall solution, Platform LSF was included to manage the computational workloads that would soon consume the resources of this SGI system.

The more I learned about Platform LSF, the more I was smitten by the promise and reality of Distributed Computing – a capability to be leveraged from a resource-centric perspective with this Load Sharing Facility (LSF). [Expletive deleted], Platform founder Songnian Zhou expressed the ramifications of his technical vision for this software as Utopia in a 1993 publication. Although buying the company wasn’t an option, I did manage to be hired by Platform, and work there in various roles for about seven-and-a-half years.

Between my time at Platform (now an IBM company) and much-more recently Univa, over a decade of my professional experience has been spent focused on managing workloads in Distributed Computing environments. From a small handful of VAXes, to core counts that have reached 7 figures, these environments have included clusters, grids and clouds.

My professional bias towards Distributed Computing was further enhanced through the experience of being employed by two software vendors who emphasized the management of clusters – namely Scali (Scali Manage) and subsequently Bright Computing (Bright Cluster Manager). Along with Univa (Project Tortuga and Navops Launch), Bright extended their reach to the management of HPC resources in various cloud configurations.

If it wasn’t for a technical role at Allinea (subsequently acquired by ARM), I might have ended up ‘stuck in the middle’ of the computational stack – as workload and cluster management is regarded by the HPC community (at least) as middleware … software that exists between the operating environment (i.e., the compute node and its operating system) and the toolchain (e.g., binaries, libraries) that ultimately support applications and end users (e.g., Figure 5 here).

Allinea focused on tools to enable HPC developers. Although they were in the process of broadening their product portfolio to include a profiling capability around the time of my departure, during my tenure there the emphasis was on a debugger – a debugger capable of handling code targeted for (you guessed it) Distributed Computing environments.

Things always seemed so much bigger when we were children. Whereas Kid Ian was impressed by a three-node VAX cluster, and later ‘blown away’ by a modest NUMA-architecture ‘supercomputer’, Adult Ian had the express privilege of running Allinea DDT on some of the largest supercomputers on the planet (at the time) – tracking down a bug that only showed up when more than 20K cores were used in parallel on one of Argonne’s Blue Genes, and demonstrating scalable, parallel debugging during a tutorial on some 700K cores of NCSA’s Blue Waters supercomputer. In hindsight, I can’t help but feel humbled by this impressive capability of Allinea DDT to scale to these extremes. Because HPC’s appetite for scale has extended beyond tera and petascale capabilities, and is seriously eyeing the demand to perform at the exascale, software like Allinea DDT also needs to match this penchant for extremely extreme scale.

At this point, suffice it to say that scalable Distributed Computing has been firmly encoded into my professional DNA. As with my scientifically based academic bias, it’s difficult not to frame my predisposition towards Distributed Computing in a positive light within the current context of Data Science. Briefly, it’s a common experience for the transition from prototype to production to include the introduction of Distributed Computing – if only to execute applications and/or their workflows on more powerful computers, or perhaps to simultaneously scale them in parallel.

I anticipate the need to return to this disclosure regarding the professional bias I bring to Data Science. For now though, calling out the highly influential impact Distributed Computing has had on my personal trajectory, appears warranted within the context of my Data Science Portfolio.