Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or me, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very legitimate, potentially disarming question will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-a-Data-Scientist elevator pitch.

To return to the question intended for this post however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by podcast host Srini Rao on an episode of the Unmistakable Creative. Although the entire episode (and the podcast in general, frankly) is well worth the listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode of the podcast, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how, based upon his belief (“I am a writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions (behavior) and became a writer. With five best-selling books to his credit, plus a high-traffic blog property, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting though is that in order to work through Goins’ algorithm, engaging in active curation of a Data Science portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog site to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in published journals with respectable impact factors. This, for a Data Scientist, is arguably a most-stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on, also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as a resounding example – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of engaging in the deliberate act of putting together my Data Science Portfolio, is that I’m already in a much more informed position, and able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an engagement of active self-curation, has (quite honestly) both surprised and delighted me; I clearly have been directly involved in the production of a number of artifacts that can be used to legitimately represent myself as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced, and implied (though inconsistent) behaviors. However, when asked the question “What do you do?”, I am more likely to respond that:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, per se … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Based on my biases, that’s what I currently feel owing to the very nature of Data Science itself.

Data Science: Celebrating Academic Personal Bias

Data Science: Celebrating My Academic Bias

In a recent post, I introduced my Data Science Portfolio. After describing the high-level organization of the Portfolio, I noted:

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Even in this earliest of drafts, I knew that I was somewhat uncomfortable with a section dedicated to academics in my Portfolio. After all, shouldn’t a portfolio place more emphasis on how my knowledge and skills, academic or otherwise, have been applied to produce some tangible artifact?

Upon further reflection, I currently believe what’s material in the context of a portfolio is some indication of the bias inherent in the resulting curated showcase of one’s work. Of course to some degree the works presented, and the curation process itself, will make self-evident such personal bias.

Whereas it may make sense for an artist not to overtly disclose any bias with respect to their craft, or a curated collection of their work, I currently perceive absolutely no downside in sharing my personal bias – a bias that in my own case, I believe reflects only in positive ways on the Portfolio as well as the individual items included in it.

To this end, and in the spirit of such a positive self-disclosure, my personal bias reflects my formative years in science – a background to which I well recall significant contributions from high school, that were subsequently broadened and deepened as an undergraduate and then graduate student. Even more specifically in terms of personal bias was my emphasis on the physical sciences; a bias that remains active today.

As I’ve started to share, through such posts as the one on the mathematical credentials I bring to Data Science, my choice to pursue the physical sciences was an excellent one – even through the self-critical lens of personal hindsight. An excellent choice, albeit a biased one.

The very nature of Data Science is such that each of us carries with us our own, wonderfully unique personal bias. As we necessarily collaborate in team, project and organizational settings, I believe it’s important to not only ensure each of us preserves their personal bias, but that we leverage this perspective as fully and appropriately as possible. As a consequence it is much more likely that everyone we work with, and everything we work on, will derive maximal value.

Teaching/Learning Weather and Climate via Pencasting

I first heard about it a few years ago, and thought it sounded interesting … and then, this past Summer, I did a little more research and decided to purchase a Livescribe 8 GB Echo(TM) Pro Pack. Over the Summer, I took notes with the pen from time-to-time and found it to be somewhat useful/interesting.

Just this week, however, I decided it was time to use the pen for the originally intended purpose: Making pencasts for the course I’m currently teaching in weather and climate at Toronto’s York University. Before I share some sample pencasts, please allow me to share my findings based on less than a week’s worth of ‘experience’:

  • Decent-quality pencasts can be produced with minimal effort – I figured out the basics (e.g., how to record my voice) in a few minutes, and started on my first pencast. Transferring the pencast from the pen to the desktop software to the Web (where it can be shared with my students) also requires minimal effort. “Decent quality” here refers to both the visual and audio elements. The fact that this is both a very natural (writing with a pen while speaking!) and speedy (efficient/effective) undertaking means that I am predisposed towards actually using the technology whenever it makes sense – more on that below. Net-net: This solution is teacher-friendly.
  • Pencasts complement other instructional media – This is my current perspective … Pencasts complement the textbook readings I assign, the lecture slides plus video/audio captures I provide, the Web sites we all share, the Moodle discussion forums we engage in, the Tweets I issue, etc. In the spirit of blended learning it is my hope that pencasts, in concert with these other instructional media, will allow my TAs and me to ‘reach’ most of the students in the course.
  • Pencasts allow the teacher to address both content and skills-oriented objectives – Up to this point, my pencasts have started from a blank page. This forces me to be focused, and systematically develop towards some desired content (e.g., conceptually introducing the phase diagram for H2O) and/or skills (e.g., how to calculate the slope of a line on a graph) oriented outcome. Because students can follow along, they have the opportunity to be fully engaged as the pencast progresses. Of course, what this also means is that this technology can be effective not only in the first-year university-level course I’m currently teaching, but also at the academic levels that precede (e.g., grade school, high school, etc.) and follow (senior undergraduate and graduate) this level.
  • Pencasts are learner-centric – In addition to being teacher-friendly, pencasts are learner-centric. Although a student could passively watch and listen to a pencast as it plays out in a linear, sequential fashion, the technology almost begs you to interact with it. As noted previously, this means a student can easily replay some aspect of the pencast that they missed. Even more interestingly, however, students can interact with pencasts in a random-access mode – a mode that would almost certainly be useful when they are attempting to apply the content/skills conveyed through the pencast to a tutorial or assignment they are working on, or a quiz or exam they are actively studying for. It is important to note that both the visual and audio elements of the pencast can be manipulated with impressive responsiveness to random-access input from the student.
  • I’m striving for authentic, not perfect pencasts – With a little more practice and some planning/scripting, I’d be willing to bet that I could produce an extremely polished pencast. Based on past experience teaching today’s first-year university students, I’m fairly convinced that this is something they couldn’t care less about. Let’s face it, my in-person lectures aren’t perfectly polished, and neither are my pencasts. Because I can easily go back to existing pencasts and add to them, I don’t need to fret too much about being perfect the first time. Too much time spent fussing here would diminish the natural and speedy aspects of the technology.

Findings aside, on to samples:

  • Calculating the lapse rate for Earth’s troposphere – This is largely a skills-oriented example. It was my first pencast. I returned twice to the original pencast to make changes – once to correct a spelling mistake, and the second time to add in a bracket (“Run”) that I forgot. I communicated these changes to the students in the course via an updated link shared through a Moodle forum dedicated to pencasts. If you were to experience the updates, you’d almost be unaware of the lapse of time between the original pencast and the updates, as all of this is presented seamlessly as a single pencast to the students.
  • Introducing the pressure-temperature phase diagram for H2O – This is largely a content-oriented example. I got a little carried away in this one, and ended up packing in a little too much – the pencast is fairly long, and by the time I’m finished, the visual element is … a tad on the busy side. Experience gained.
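
For readers who would like to see the arithmetic behind the first sample, a lapse-rate calculation boils down to a rise-over-run slope with the sign flipped, so that cooling with height comes out positive. The short Python sketch below is illustrative only: the function name and the two sample points (roughly standard-atmosphere values of 15 °C at the surface and −56.5 °C at 11 km) are my assumptions here, not numbers lifted from the pencast itself.

```python
def lapse_rate(z1_km, t1_c, z2_km, t2_c):
    """Return the lapse rate in degrees C per km between two points.

    Hypothetical helper for illustration: the lapse rate is the negative
    of the slope (rise over run) of temperature plotted against altitude.
    """
    if z2_km == z1_km:
        raise ValueError("the two altitudes must differ")
    rise = t2_c - t1_c    # change in temperature (degrees C)
    run = z2_km - z1_km   # change in altitude (km)
    return -rise / run


# Assumed sample points: surface and the approximate top of the troposphere.
rate = lapse_rate(0.0, 15.0, 11.0, -56.5)
print(f"Lapse rate: {rate:.1f} C per km")
```

With these assumed points the temperature falls by 71.5 °C over 11 km, which works out to 6.5 °C per km.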

Anecdotally, initial reaction from the students has been positive. Time will tell.

Next steps:

  • Monday (October 1, 2012), I intend to use a pencast during my lecture – to introduce aspects of the stability of Earth’s atmosphere. I’ll try to share here how it went. For this intended use of the pencast, I will use a landscape mode for presentation – as I expect that’ll work well in the large lecture hall I teach in. I am, however, a little concerned that the lines I’ll be drawing will be a little too thin/faint for the students at the back of the lecture theatre to see …
  • I have two sections of the NATS 1780 Weather and Climate course to teach this year. One section is taught the traditional way – almost 350 students in a large lecture theatre, 25-student tutorial groups, supported by Moodle, etc. In striking contrast to the approach taken in the meatspace section, is the second section where almost everything takes place online via Moodle. Although I have yet to support this hypothesis with any data, it is my belief that these pencasts are an excellent way to reach out to the students in the Internet-only section of the course. More on this over the fullness of time (i.e., the current academic session.)

Feel free to comment on this post or share your own experiences with pencasts.

Remembering Steve Jobs

I was doing some errands earlier this evening (Toronto time) … While I was in the car, the all-news station (680news) I had on played some of Steve Jobs’ 2005 commencement address to Stanford grads. As I listened, and later re-read my own blog post on discovering the same address, I’m struck on the event of his passing by the importance of valuing every experience in life. In Jobs’ case, he eventually leveraged his experience with calligraphy to design the typography for the Apple Mac – after a ten-year incubation period!

I think it’s time to read that Stanford commencement address again …

RIP Steve – and thanks much.

Targeting Public Speaking Skills via Virtual Environments

Recently I shared an a-ha! moment on the use of virtual environments for confronting the fear of public speaking.

The more I think about it, the more I’m inclined to claim that the real value of such technology is in targeted skills development.

Once again, I’ll use myself as an example here to make my point.

If I think back to my earliest attempts at public speaking as a graduate student, I’d claim that I did a reasonable job of delivering my presentation. And given that the content of my presentation was likely vetted with my research peers (fellow graduate students) and supervisor ahead of time, this left me with a targeted opportunity for improvement: The Q&A session.

Countless times I can recall having a brilliant answer to a question long after my presentation was finished – e.g., on my way home from the event. Not very useful … and exceedingly frustrating.

I would also assert that this lag, between question and appropriate answer, had a whole lot less to do with my expertise in a particular discipline, and a whole lot more to do with my degree of nervousness – how else can I explain the ability to fashion perfect answers on the way home!

Over time, I like to think that I’ve improved my ability to deliver better-quality answers in real time. How have I improved? Experience. I would credit my experience teaching science to non-scientists at York, as well as my public-sector experience as a vendor representative at industry events, as particularly edifying in this regard.

Rather than submit to such baptisms of fire, and because hindsight is 20/20, I would’ve definitely appreciated the opportunity to develop my Q&A skills in virtual environments such as Nortel web.alive. Why? Such environments can easily facilitate the focused effort I required to target the development of my Q&A skills. And, of course, as my skills improve, so can the challenges brought to bear via the virtual environment.

All speculation at this point … Reasonable speculation that needs to be validated …

If you were to embrace such a virtual environment for the development of your public-speaking skills, which skills would you target? And how might you make use of the virtual environment to do so?

Confronting the Fear of Public Speaking via Virtual Environments

Confession: In the past, I’ve been extremely quick to dismiss the value of Second Life in the context of teaching and learning.

Even worse, my dismissal was not fact-based … and, if truth be told, I’ve gone out of my way to avoid opportunities to ‘gather the facts’ by attending presentations at conferences, conducting my own research online, speaking with my colleagues, etc.

So I, dear reader, am as surprised as any of you to have had an egg-on-my-face epiphany this morning …

Please allow me to elaborate:

It was at some point during this morning’s brainstorming session that the egg hit me squarely in the face:

Why not use Nortel web.alive to prepare graduate students for presenting their research?

Often feared more than death and taxes, public speaking is an essential aspect of academic research – regardless of the discipline.

Enter Nortel web.alive with its virtual environment of a large lecture hall – complete with a podium, projection screen for sharing slides, and most importantly an audience!

As a former graduate student, I could easily ‘see’ myself in this environment with increasingly realistic audiences comprised of friends, family and/or pets, fellow graduate students, my research supervisor, my supervisory committee, etc. Because Nortel web.alive only requires a Web browser, my audience isn’t geographically constrained. This geographical freedom is important as it allows for participation – e.g., between graduate students at York in Toronto and their supervisor who just happens to be on sabbatical in the UK. (Trust me, this happens!)

As the manager of Network Operations at York, I’m always keen to encourage novel use of our campus network. The public-speaking use case I’ve described here has the potential to make innovative use of our campus network, regional network (GTAnet), provincial network (ORION), and even national network (CANARIE) that would ultimately allow for global connectivity.

While I busy myself scraping the egg off my face, please chime in with your feedback. Does this sound useful? Are you aware of other efforts to use virtual environments to confront the fear of public speaking? Are there related applications that come to mind for you? (As someone who’s taught classes of about 300 students in large lecture halls, a little bit of a priori experimentation in a virtual environment would’ve been greatly appreciated!)

Update (November 13, 2009): I just Google’d the title of this article and came up with a few, relevant hits; further research is required.

Google Chrome for Linux on Bright Hub: Series Expanded

I recently posted on a new article series on Google Chrome for Linux that I’ve been developing over on Bright Hub. My exploration has turned out to be more engaging than I anticipated! At the moment, there are six articles in the series:

I anticipate a few more …

It’s also important to share that Google Chrome for Linux does not yet exist as an end-user application. Under the auspices of the Chromium Project, however, there is a significant amount of work underway. And because this work is taking place out in the open (Chromium is an Open Source Project), now is an excellent time to engage – especially for serious enthusiasts.

Survey on How Scientists Use Their Computers

How do scientists actually use computers in their day-to-day work?

A Canadian team is conducting a survey to find out:

Computers are as important to modern scientists as test tubes, but we know surprisingly little about how scientists develop and use software in their research. To find out, the University of Toronto, Simula Research Laboratory, and the National Research Council of Canada have launched an online survey in conjunction with “American Scientist” magazine. If you have 20 minutes to take part, please go to:

http://softwareresearch.ca/seg/SCS/scientific-computing-survey.html

Thanks in advance for your help!

Jo Hannay (Simula Research Laboratory)
Hans Petter Langtangen (Simula Research Laboratory)
Dietmar Pfahl (Simula Research Laboratory)
Janice Singer (National Research Council of Canada)
Greg Wilson (University of Toronto)

The results of the survey will be shared via American Scientist.

Is Desktop Software Dead?

When was the last time you were impressed by desktop software?

Really impressed?

After seeing (in chronological order) Steve Jobs, Al Gore and Tim Bray make use of Apple Keynote, I absolutely had to give it a try. And impressed I was – and to some extent, still am. For me, this revelation happened about a year ago. I cannot recall the previous instance – i.e., the time I was truly impressed by desktop software.

Although I may be premature, I can’t help but ask: Is desktop software dead?
A few data points:
  • Wikipedia states: “There is no page titled “desktop software”.” What?! I suppose you could argue I’m hedging my bets by choosing an obscure phrase (not!), but seriously, it is remarkable that there is no Wikipedia entry for “desktop software”!
  • Microsoft, easily the leading purveyor of desktop software, is apparently in trouble. Although Gartner’s recent observations target Microsoft Windows Vista, this indirectly spells trouble for all Windows applications as they rely heavily on the platform provided by Vista.
  • There’s an innovation hiatus. And that’s diplomatically generous! Who really cares about the feature/functionality improvements in, e.g., Microsoft Office? When was the last time a whole new desktop software category appeared? Even in the Apple Keynote example I shared above, I was impressed by Apple’s spin on presentation software. Although Keynote required me to unlearn habits developed through years of using Microsoft PowerPoint, I was under no delusions of having entered some new genre of desktop software.
  • Thin is in! The bloatware that is modern desktop software is crumbling under its own weight. It must be nothing short of embarrassing to see this proven on a daily basis by the likes of Google Docs. Hardware vendors must be crying in their beers as well, as for years consumers have been forced to upgrade their desktops to accommodate the latest revs of their favorite desktop OS and apps. And of course, this became a negatively reinforcing cycle, as the hardware upgrades masked the inefficiencies inherent in the bloated desktop software. Thin is in! And thin, these days, doesn’t necessarily translate to a penalty in performance.
  • Desktop software is reaching out to the network. Despite efforts like Microsoft Office Online, the lacklustre results speak for themselves. It’s 2008, and Microsoft is still playing catch-up with upstarts like Google. Even desktop software behemoth Adobe has shown better signs of getting it (network-wise) with recent entries such as Adobe AIR. (And of course, with the arrival of Google Gears, providers of networked software are reaching out to the desktop.)

The figure below attempts to graphically represent some of the data points I’ve ranted about above.

In addition to providing a summary, the figure suggests:

  • An opportunity for networked, Open Source software. AFAIK, that upper-right quadrant is completely open. I haven’t done an exhaustive search, so any input would be appreciated.
  • A new battle ground. Going forward, the battle will be less about commercial versus Open Source software. The battle will be more about desktop versus networked software.

So: Is desktop software dead?

Feel free to chime in!

To Do for Microsoft: Create a Wikipedia entry for “desktop software”.

The MFA is the New MBA: Illustrations by Steve Jobs and Apple

In March 2005, Dan Pink asserted “… the MFA is the new MBA”.
Why?

… businesses are realizing that the only way to differentiate their goods and services in today’s overstocked marketplace is to make their offerings physically beautiful and emotionally compelling. Thus the high-concept abilities of an artist are often more valuable than the easily replicated L-Directed skills of an entry-level business graduate.

I can’t think of a better illustration than Steve Jobs’ story of how the Mac became the first computer with beautiful typography.
And of course, true to form, Jobs illustrated Pink’s assertion more than two decades ago.
And since 1984, Jobs and Apple have made the illustration even more compelling with the current generation of Macs, the iPod, and most recently the iPhone.
Note-to-self: Look into MFA programs!

Aside: I’ve blogged previously about Pink’s book and its implications for displacing knowledge workers.