Disclosures Regarding My Portfolios: Attributing the Contributions of Others

‘Personal’ Achievement?

October 8, 2018 was an extremely memorable night for Drew Brees at the Mercedes-Benz Superdome in New Orleans. Under the intense scrutiny of Monday Night Football, the quarterback of the New Orleans Saints became the leading passer in the history of the National Football League. (For those not familiar with this sport, you can think of his 72,103-yard milestone as a lifetime-achievement accomplishment of ultramarathon’ic proportions.) The narrative on Brees’ contributions to ‘the game’ are anything but complete. In fact, the longer he plays, the more impressive this milestone becomes, as he continues to place distance between himself and every other NFL QB.

Of course the record books, and Brees’ inevitable induction into the Pro Football Hall of Fame, will all position this as an individual-achievement award. Whenever given the opportunity to reflect upon seemingly personal achievements such as the all-time passing leader, Brees is quick to acknowledge those who have enabled him to be so stunningly successful in such a high-profile, high-pressure role – from family and friends, to teammates, coaches, and more.

As I wrote about another NFL quarterback in a recent post, like Tom Brady, Brees remains a student-of-the-game. He is also known for his off-the-field work ethic that he practices with the utmost intensity in preparing for those moments when he takes the main stage along with his team. Therefore, when someone like Brees shares achievements with those around him, it’s clearly an act that is sincerely authentic.

Full Disclosure

At the very least, self-curating and sharing in public some collection of your work has more than the potential to come across as an act of blatant self-indulgence – and, of course, to some degree it is! At the very worst, however, is the potential for such an effort to come across as a purely individual contribution. Because contribution matters so much to me personally, I wanted to ensure that any portfolio I self-curate includes appropriate disclosures; disclosures that acknowledge the importance of collaboration, opportunity, support, and so on, from my family, friends and acquaintances, peers and co-workers, employers, customers and partners, sponsors, and more. In other words, and though in a very different context, like Brees I want to ensure that what comes across as ‘My Portfolio’ rightly acknowledges that this too is a team sport.

In the interests of generic disclosures then, the following is an attempt to ensure the efforts of others are known explicitly:

  • Articles, book chapters and posters – Based on authorships, affiliations and acknowledgements, portfolio artifacts such as articles, book chapters and posters make explicit collaborators, enablers and supporters/influencers, respectively. In this case, there’s almost no need for further disclosure.
  • Blog posts – Less formal than the written and oral forms of communication already alluded to above and below, it’s through the words themselves and/or hyperlinks introduced that the contributions of others are gratefully and willingly acknowledged. Fortunately, it is common practice for page-ranking algorithms to take into account the words and metadata that collectively comprise blog posts, and appropriately afford Web pages stronger rankings based upon these and other metrics.
  • Presentations – My intention here is to employ Presentations as a disclosure category for talks, webinars, workshops, courses, etc. – i.e., all kinds of oral communications that may or may not be recorded. With respect to this category, my experience is ‘varied’ – e.g., in not always allowing for full disclosure regarding collaborators, though less so regarding affiliations. Therefore, to make collaborators as well as supporters/influencers explicit, contribution attributions are typically included in the materials I’ve shared (e.g., the slides corresponding to my GTC17 presentation) and/or through the words I’ve spoken. Kudos are also warranted for the organizations I’ve represented in some of these cases as well, as it has been a byproduct of this representation that numerous opportunities have fallen into my lap – though often owing to a sponsorship fee, to be completely frank. Finally, sponsoring organizations are also deserving of recognition, as it is often their mandate (e.g., a lead-generation marketing program that requires a webinar, a call for papers/proposals) that inspires what ultimately manifests itself as some artifact in one of my portfolios; having been on the event-sponsor’s side more than a few times, I am only too well aware of the effort involved in creating the space for presentations … a contribution that cannot be ignored.

From explicit to vague, disclosures regarding contribution are clearly to barely evident. Regardless, for those portfolios shared via my personal blog (Data Science Portfolio and Cloud Computing Portfolio), suffice it to say that there were always others involved. I’ve done my best to make those contributions clear, however I’m sure that unintentional omissions, errors and/or (mis)representations exist. Given that these portfolios are intentionally positioned and executed as works-in-progress, I look forward to addressing matters as they arise.

Ian Lumb’s Cloud Computing Portfolio

When I first introduced it, it made sense to me (at the time, at least!) to divide my Data Science Portfolio into two parts; the latter part was “… intended to showcase those efforts that have enabled other Data Scientists” – in other words, my contributions as a Data Science Enabler.

As of today, most of what was originally placed in that latter part of my Data Science Portfolio has been transferred to a new portfolio – namely one that emphasizes Cloud computing. Thus my Cloud Computing Portfolio is a self-curated, online, multimedia effort intended to draw together into a cohesive whole my efforts in Cloud computing; specifically this new Portfolio is organized as follows:

  • Strictly Cloud – A compilation of contributions in which Cloud computing takes centerstage
  • Cloud-Related – A compilation of contributions drawn from clusters and grids to miscellany. Also drawn out in this section, however, are contributions relating to containerization.

As with my Data Science Portfolio, you’ll find in my Cloud Computing Portfolio everything from academic articles and book chapters, to blog posts, to webinars and conference presentations – in other words, this Portfolio also lives up to its multimedia billing!

Since this is intentionally a work-in-progress, like my Data Science Portfolio, feedback is always welcome as there will definitely be revisions applied !

Demonstrating Your Machine Learning Expertise: Optimizing Breadth vs. Depth

Developing Expertise

When it comes to developing your expertise in Machine Learning, there seem to be two schools of thought:

  • Exemplified by articles that purport to have listed, for example, the 10-most important methods you need to know to ace a Machine Learning interview, the School of Breadth emphasizes content-oriented objectives. By amping up with courses/workshops to programs (e.g., certificates, degrees) then, the justification for broadening your knowledge of Machine Learning is self-evident.
  • Find data that interests you, and work with it using a single approach for Machine Learning. Thus the School of Depth emphasizes skills-oriented objectives that are progressively mastered as you delve into data, or better yet, a problem of interest.

Depending upon whichever factors you currently have under consideration then (e.g., career stage, employment status, desired employment trajectory, …), breadth versus depth may result in an existential crisis when it comes to developing and ultimately demonstrating your expertise in Machine Learning – with a modicum of apologies if that strikes you as a tad melodramatic.

Demonstrating Expertise

Somewhat conflicted, at least, is in all honesty how I feel at the moment myself.

On Breadth

Even a rapid perusal of the Machine Learning specific artifacts I’ve self-curated into my online, multimedia Data Science Portfolio makes one thing glaringly evident: The breadth of my exposure to Machine Learning has been somewhat limited. Specifically, I have direct experience with classification and Natural Language Processing in Machine Learning contexts from the practitioner’s perspective. The more-astute reviewer, however, might look beyond the ‘pure ML’ sections of my portfolio and afford me additional merit for (say) my mathematical and/or physical sciences background, plus my exposure to concepts directly or indirectly applicable to Machine Learning – e.g., my experience as a scientist with least-squares modeling counting as exposure at a conceptual level to regression (just to keep this focused on breadth, for the moment).

True confession: I’ve started more than one course in Machine Learning in a blunt-instrument attempt to address this known gap in my knowledge of relevant methods. Started is, unfortunately, the operative word, as (thus far) any attempt I’ve made has not been followed through – even when there are options for community, accountability, etc. to better-ensure success. (Though ‘life got in the way’ of me participating fully in the fast.ai study group facilitated by the wonderful team that delivers the This Week in Machine Learning & AI Podcast, such approaches to learning Machine Learning are appealing in principle – even though my own engagement was grossly inconsistent.)

On Depth

What then about depth? Taking the self-serving but increasingly concrete example of my own Portfolio, it’s clear that (at times) I’ve demonstrated depth. Driven by an interesting problem aimed at improving tsunami alerting by processing data extracted from Twitter, for example, the deepening progression with co-author Jim Freemantle has been as follows:

  1. Attempt to apply existing knowledge-representation framework to the problem by extending it (the framework) to include graph analytics
  2. Introduce tweet classification via Machine Learning
  3. Address the absence of semantics in the classification-based approach through the introduction of Natural Language Processing (NLP) in general, and embedded word vectors in particular
  4. Next steps …

(Again, please refer to my Portfolio for content relating to this use case.) Going deeper, in this case, is not a demonstration of a linear progression; rather, it is a sequence of outcomes realized through experimentation, collaboration, consultation, etc. For example, the seed to introduce Machine Learning into this tsunami-alerting initiative was planted on the basis of informal discussions at an oil and gas conference … and later, the introduction of embedded word vectors, was similarly the outcome of informal discussions at a GPU technology conference.

Whereas these latter examples are intended primarily to demonstrate the School of Depth, it is clear that the two schools of thought aren’t mutually exclusive. For example, in delving into a problem of interest Jim and I may have deepened our mastery of specific skills within NLP, however we have also broadened our knowledge within this important subdomain of Machine Learning.

One last thought here on depth. At the outset, neither Jim nor I had as an objective any innate desire to explore NLP. Rather, the problem, and more importantly the demands of the problem, caused us to ‘gravitate’ towards NLP. In other words, we are wedded more to making scientific progress (on tsunami alerting) than a specific method for Machine Learning (e.g., NLP).

Next Steps

Net-net then, it appears to be that which motivates us that dominates in practice – in spite, perhaps, of our best intentions. In my own case, my existential crisis derives from being driven by problems into depth, while at the same time seeking to demonstrate a broader portfolio of expertise with Machine Learning. To be more specific, there’s a part of me that wants to apply LSTMs (foe example) to the tsunami-alerting use case, whereas another part knows I must broaden (at least a little!) my portfolio when it comes to methods applicable to Machine Learning.

Finally then, how do I plan to address this crisis? For me, it’ll likely manifest itself as a two-pronged approach:

  1. Enrol and follow through on a course (at least!) that exposes me to one or more methods of Machine Learning that compliments my existing exposure to classification and NLP.
  2. Identify a problem, or problems of interest, that allow me to deepen my mastery of one or more of these ‘newly introduced’ methods of Machine Learning.

In a perfect situation, perhaps we’d emphasize breadth and depth. However, when you’re attempting to introduce, pivot, re-position, etc. yourself, a trade off between breadth versus depth appears to be inevitable. An introspective reflection, based upon the substance of a self-curated portfolio, appears to be an effective and efficient means for roadmapping how gaps can be identified and ultimately addressed.

Postscript

In many settings/environments, Machine Learning and Data Science in general, are team sports. Clearly then, a viable way to address the challenges and opportunities presented by depth versus breadth is to hire accordingly – i.e., hire for depth and breadth in your organization.

Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or me, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very-legitimate, potentially disarming question, will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-Data-Scientist elevator pitch.

To return to the question intended for this post however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by podcast host Srini Rao on an episode of the Unmistakable Creative. Although the entire episode (and the podcast in general, frankly) is well worth the listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode of the podcast, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how based upon his belief (“I am writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions (behavior) and became a writer. With five, best selling books to his credit, plus a high-traffic-blog property, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting though is that in order to work through Goins’ algorithm, engaging in active curation of a Data Science portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog site to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in published journals with respectable impact factors. This, for a Data Scientist, is arguably a most-stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on, also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as a resounding example – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of engaging in the deliberate act of putting together my Data Science Portfolio, is that I’m already in a much more informed position, and able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an engagement of active self-curation, has (quite honestly) both surprised and delighted me; I clearly have been directly involved in the production of a number of artifacts that can be used to legitimately represent myself as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced, and implied (though inconsistent) behaviors. However, when asked the question “What do you do?”, I am more likely to respond that:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, per se … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Based on my biases, that’s what I currently feel owing to the very nature of Data Science itself.

Data Science: Celebrating Academic Personal Bias

Data Science: Celebrating My Academic Bias

In a recent post, I introduced my Data Science Portfolio. After describing the high-level organization of the Portfolio, I noted:

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Even in this earliest of drafts, I knew that I was somewhat uncomfortable with a section dedicated to academics in my Portfolio. After all shouldn’t a portfolio place more emphasis on how my knowledge and skills, academic or otherwise, have been applied to produce some tangible artifact?

Upon further reflection, I currently believe what’s material in the context of a portfolio is some indication of the bias inherent in the resulting curated showcase of one’s work. Of course to some degree the works presented, and the curation process itself, will make self-evident such personal bias.

Whereas it may make sense for an artist not to overtly disclose any bias with respect to their craft, or a curated collection their work, I currently perceive absolutely no downside in sharing my personal bias – a bias that in my own case, I believe reflects only in positive ways on the Portfolio as well as the individual items included in it.

To this end, and in the spirit of such a positive self-disclosure, my personal bias reflects my formative years in science – a background to which I well recall significant contributions from high school, that were subsequently broadened and deepened as an undergraduate and then graduate student. Even more specifically in terms of personal bias was my emphasis on the physical sciences; a bias that remains active today.

As I’ve started to share, through such posts as the one on the mathematical credentials I bring to Data Science, my choice to pursue the physical sciences was an excellent one – even through the self-critical lens of personal hindsight. An excellent choice, but albeit a biased one.

The very nature of Data Science is such that each of us carries with us our own, wonderfully unique personal bias. As we necessarily collaborate in team, project and organizational settings, I believe it’s important to not only ensure each of us preserves their personal bias, but that we leverage this perspective as fully and appropriately as possible. As a consequence it is much more likely that everyone we work with, and everything we work on, will derive maximal value.

Ian Lumb’s Data Science Portfolio

Ian Lumb’s Data Science Portfolio

I had the very fortunate opportunity to present some of my research at GTC 2017 in Silicon Valley. Even after 3 months, I found GTC to be of lasting impact. However, my immediate response to the event was to reflect upon my mathematical credentials – credentials that would allow me to pursue Deep Learning with the increased breadth and depth demanded by my research project. I crystallized this quantitative reflection into a very simple question: Do I need to go back to school? (That is, back to school to enhance my mathematical credentials.)

There were a number of outcomes from this reflection upon my math creds for Deep Learning. Although the primary outcome was a mathematical ‘gap analysis’, a related outcome is this Data Science Portfolio that I’ve just started to develop. You see, after I reflected upon my mathematical credentials, it was difficult not to broaden and deepen that reflection; so, in a sense, this Data Science Portfolio is an outcome of that more-focused reflection.

As with the purely mathematical reflection, the effort I’m putting into self-curating my Data Science Portfolio allows me to showcase existing contributions (the easy part), but simultaneously raises interesting challenges and opportunities for future efforts (the difficult part). More on the future as it develops …

For now, the portfolio is organization into two broad categories:

  • Data Science Practitioner – intended to showcase my own contributions towards the practice of Data Science
  • Data Science Enabler – intended to showcase those efforts that have enabled other Data Scientists

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Although I expect there’ll be more to share as this portfolio develops, I did want to share one observation immediately: When placed in the context of a portfolio, immune to the chronological tyranny of time, it is fascinating to me to see themes that form an arc through seemingly unrelated efforts. One fine example is the matter of semantics. In representing knowledge, for example, semantics were critical to the models I built using self-expressive data (i.e., data successively encapsulated via XML, RDF and ultimately OWL). And then again, in processing data extracted from Twitter via Natural Language Processing (NLP), I’m continually faced with the challenge of ‘retaining’ a modicum of semantics in approaches based upon Machine Learning. I did not plan this thematic arc of semantics; it is therefore fascinating to see such themes exposed – exposed particularly well by the undertaking of portfolio curation.

There’s no shortage of Data Science portfolios to view. However one thing that’s certain, is that these portfolios are likely to be every bit as diverse and varied as Data Science itself, compounded by the uniqueness of the individuals involved. And that, of course, is a wonderful thing.

Thank you for taking the time to be a traveller at the outset of this journey with me. If you have any feedback whatsoever, please don’t hesitate to reach out via a comment and/or email to ian [DOT] lumb [AT] gmail [DOT] com. Bon voyage!