Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or me, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very-legitimate, potentially disarming question, will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-Data-Scientist elevator pitch.

To return to the question intended for this post however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by podcast host Srini Rao on an episode of the Unmistakable Creative. Although the entire episode (and the podcast in general, frankly) is well worth the listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode of the podcast, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how based upon his belief (“I am writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions (behavior) and became a writer. With five, best selling books to his credit, plus a high-traffic-blog property, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting though is that in order to work through Goins’ algorithm, engaging in active curation of a Data Science portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog site to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in published journals with respectable impact factors. This, for a Data Scientist, is arguably a most-stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on, also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as a resounding example – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of engaging in the deliberate act of putting together my Data Science Portfolio, is that I’m already in a much more informed position, and able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an engagement of active self-curation, has (quite honestly) both surprised and delighted me; I clearly have been directly involved in the production of a number of artifacts that can be used to legitimately represent myself as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced, and implied (though inconsistent) behaviors. However, when asked the question “What do you do?”, I am more likely to respond that:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, per se … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Based on my biases, that’s what I currently feel owing to the very nature of Data Science itself.

Remembering a Supportive Sibling

Remembering a Supportive Sibling

Less than a week before I was scheduled to deliver my first presentation on a novel way for approaching an outstanding challenge in seismic processing, my younger sister Deborah passed away. She was only 50. Thanks to medical care that included extensive chemotherapy, Debbie recovered from lymphoma once, and was declared cancer free. However, a second wave of lymphoma accompanied by leukemia, proved to be more than she could handle – and we lost her during a procedure that (ironically) was attempting to provide more information about the cancers that had literally taken over her body.

Between Debbie’s passing and her funeral, was not only a about a week’s lapse of time, but the need for me to make a decision – a decision to present as scheduled at the 2015 Rice University Oil & Gas Conference in Houston or miss the event entirely. A complicating factor in my ability to make this decision was that I truly was the only person who could deliver it. That’s more a pragmatic statement than a boastful one, as I had combined my background in geophysics with an increasing knowledge of Big Data Analytics; in so doing, I’d arrived at a submission for the RiceU Conference that was as uniquely of my creation as it was a disruptive suggestion – in other words, something I felt strongly to be well suited to the Conference’s Disruptive Technology Track. With the Conference being less than a week away, most of the real work had already been completed; in other words, all I needed to do was show up, make a two-minute presentation, and discuss the poster I’d prepared with those who expressed an interest.

Debbie was always highly supportive and encouraging when it came to ‘things’ like this – the expression of something worth sharing. This, despite the fact that she and I were on completely different trajectories when it came to our intellectual interests and pursuits – me in the physical sciences and technology, while Debbie favoured English literature. Despite these differences, Debbie often made a point of trying to understand and appreciate what I was working on – no matter how geekily obscure.

In recalling these traits of hers, her sincere interest in what I was doing (I suppose) just because we were siblings, my decision to follow through with the presentation at the RiceU Conference was a relatively easy one. Executing it, however, was at times challenging … and I could not have followed through without the support of my colleagues from Bright Computing.

You can still review my two-minute presentation here thanks to the wonderful people who run this industry-leading event on an annual basis at Rice. The poster I alluded to is available here. The ideas hatched through these 2015 communications proved instrumental in spinning off additional contributions. Equally important, were those interactions initiated at this 2015 RiceU Conference. Some of these interactions resulted in relationships that persist through today – relationships that have, for example, resulted me applying Machine Learning to problems of scientific interest.

And so it is, on the occasion of what would’ve been Debbie’s 54th birthday, that I wistfully remember my sister. Without knowing that I’d have had her support and encouragement, I likely wouldn’t have followed through with that March 2015 presentation at the RiceU Conference – a decision that had immediate plus long-lasting implications to my progression as a Data Scientist.

Targeting Public Speaking Skills via Virtual Environments

Recently I shared an a-ha! moment on the use of virtual environments for confronting the fear of public speaking.

The more I think about it, the more I’m inclined to claim that the real value of such technology is in targeted skills development.

Once again, I’ll use myself as an example here to make my point.

If I think back to my earliest attempts at public speaking as a graduate student, I’d claim that I did a reasonable job of delivering my presentation. And given that the content of my presentation was likely vetted with my research peers (fellow graduate students) and supervisor ahead of time, this left me with a targeted opportunity for improvement: The Q&A session.

Countless times I can recall having a brilliant answer to a question long after my presentation was finished – e.g., on my way home from the event. Not very useful … and exceedingly frustrating.

I would also assert that this lag, between question and appropriate answer, had a whole lot less to do with my expertise in a particular discipline, and a whole lot more to do with my degree nervousness – how else can I explain the ability to fashion perfect answers on the way home!

image006Over time, I like to think that I’ve approved my ability to deliver better-quality answers in real time. How have I improved? Experience. I would credit my experience teaching science to non-scientists at York, as well as my public-sector experience as a vendor representative at industry events, as particularly edifying in this regard.

Rather than submit to such baptisms of fire, and because hindsight is 20/20, I would’ve definitely appreciated the opportunity to develop my Q&A skills in virtual environments such as Nortel web.alive. Why? Such environments can easily facilitate the focused effort I required to target the development of my Q&A skills. And, of course, as my skills improve, so can the challenges brought to bear via the virtual environment.

All speculation at this point … Reasonable speculation that needs to be validated …

If you were to embrace such a virtual environment for the development of your public-speaking skills, which skills would you target? And how might you make use of the virtual environment to do so?

On Knowledge-Based Representations for Actionable Data …

I bumped into a professional acquaintance last week. After describing briefly a presentation I was about to give, he offered to broker introductions to others who might have an interest in the work I’ve been doing. To initiate the introductions, I crafted a brief description of what I’ve been up to for the past 5 years in this area. I’ve also decided to share it here as follows: 

As always, [name deleted], I enjoyed our conversation at the recent AGU meeting in Toronto. Below, I’ve tried to provide some context for the work I’ve been doing in the area of knowledge representations over the past few years. I’m deeply interested in any introductions you might be able to broker with others at York who might have an interest in applications of the same.

Since 2004, I’ve been interested in expressive representations of data. My investigations started with a representation of geophysical data in the eXtensible Markup Language (XML). Although this was successful, use of the approach underlined the importance of metadata (data about data) as an oversight. To address this oversight, a subsequent effort introduced a relationship-centric representation via the Resource Description Format (RDF). RDF, by the way, forms the underpinnings of the next-generation Web – variously known as the Semantic Web, Web 3.0, etc. In addition to taking care of issues around metadata, use of RDF paved the way for increasingly expressive representations of the same geophysical data. For example, to represent features in and of the geophysical data, an RDF-based scheme for annotation was introduced using XML Pointer Language (XPointer). Somewhere around this point in my research, I placed all of this into a framework.

A data-centric framework for knowledge representation.

A data-centric framework for knowledge representation.

 In addition to applying my Semantic Framework to use cases in Internet Protocol (IP) networking, I’ve continued to tease out increasingly expressive representations of data. Most recently, these representations have been articulated in RDFS – i.e., RDF Schema. And although I have not reached the final objective of an ontological representation in the Web Ontology Language (OWL), I am indeed progressing in this direction. (Whereas schemas capture the vocabulary of an application domain in geophysics or IT, for example, ontologies allow for knowledge-centric conceptualizations of the same.)  

From niche areas of geophysics to IP networking, the Semantic Framework is broadly applicable. As a workflow for systematically enhancing the expressivity of data, the Framework is based on open standards emerging largely from the World Wide Web Consortium (W3C). Because there is significant interest in this next-generation Web from numerous parties and angles, implementation platforms allow for increasingly expressive representations of data today. In making data actionable, the ultimate value of the Semantic Framework is in providing a means for integrating data from seemingly incongruous disciplines. For example, such representations are actually responsible for providing new results – derived by querying the representation through a ‘semantified’ version of the Structured Query Language (SQL) known as SPARQL. 

I’ve spoken formally and informally about this research to audiences in the sciences, IT, and elsewhere. With York co-authors spanning academic and non-academic staff, I’ve also published four refereed journal papers on aspects of the Framework, and have an invited book chapter currently under review – interestingly, this chapter has been contributed to a book focusing on data management in the Semantic Web. Of course, I’d be pleased to share any of my publications and discuss aspects of this work with those finding it of interest.

With thanks in advance for any connections you’re able to facilitate, Ian. 

If anything comes of this, I’m sure I’ll write about it here – eventually!

In the meantime, feedback is welcome.

Blended Learning Panel

York University’s Institute for Research on Learning Technologies is sponsoring a panel discussion on blended learning:

“A recent workplace survey reported by Brandon Hall Publishing (2008) indicates that employing a mix of web-technologies with face-to-face learning is more effective than either e-learning or face-to-face instructional approaches alone. To explore the use and potential of “blended learning” further, please join us for a panel discussion featuring experts from various fields …”

This event has been re-scheduled for April 2, 2009 at 12:15 pm in TEL 1009 at York’s Keele Campus. I anticipate a lively and interesting discussion!

(Please check the IRLT Web site for the latest updates on the event.)

An Eight Pack of Leadership Traits

I recently came across an article by Hank Marquis on effective leadership traits for those in IT

Marquis distills the following eight pack of traits:
  1. Leadership means focusing on the needs of others, not yourself
  2. Leadership comes from your actions, not your title
  3. Leadership makes you accountable, even if it’s not your fault
  4. Leadership is not a 9-to-5 activity
  5. Leadership takes trust from your followers
  6. Leaders get their best ideas from their team
  7. Leadership thrives on diversity
  8. Leadership comes from continuous communication
Marquis elaborates on each of these traits in the article.
And as two final nuggets to further whet your appetite, consider the following two quotes:

Effective leaders build a trusted team and then follow the team’s advice.

… always give the credit to the team. The leader’s credit comes only by crediting the team he or she leads.

Injury Time: Remembrance of Things Just Passed …

What happened?

I strained my lower back. Badly. It was the result of two careless acts: 
  1. Lifting a heavy prop awkwardly at our annual Mardi Gras event. I felt a twinge of pain, and suspect that this predisposed my back towards injury.
  2. Attempting to leave a leg-press machine before completely releasing the 220 lbs of weight that I, back included, was still supporting. 
The pivotal incident (involving the leg-press machine) happened last Wednesday at the University’s athletic complex. Of course, I finished my hamstring curls and rowed for 20 minutes before calling it a day. Doh! 
And yes, I knew then that I was in deep trouble. 
Contrary to my spouse’s advice, I hauled my sorry self off to the University the next day, because I had things that must be done. Doh! With a notable, curvaceous list (upwards to the left), and walking speed 10-20% my normal, I can honestly state that I got a deeper appreciation of what it means to be differently abled. People rushing past me, icy walkways, plus doors stiff to open, were all-of-a-sudden on my radar. 
I barely made it through that Thursday.
I started my formal convalescence (aka. sensible acknowledgement of my predicament) on Friday morning. 

What did I do?
I convalesced. At this point, I had no choice! I took muscle relaxant and installed myself upon a heating pad. Save for attending to primal bodily functions, and attempting to do a few exercises I learned in physio that last time I strained my lower back, I remained in a sub-horizontal state through the entire weekend. I had to pass on a friend’s birthday party and a ski day 😦  
But, I:

I fretted. About work – not being there, work piling up, etc. And about my exercise routine – that picked me up, and then knocked me down! I communed with my family – when they weren’t making up for my shortfalls – and with our pets (three cats and an obnoxiously vocal husky).


What did I learn?
How good people are to me. From walking the dog to driving Miss Daisy (our teenage princess to/from dance/work/friends/etc.) to countless other things I normally do, my family filled the gaps and still had some energy left over for me in my supine state. When I did hobble into the office, I received all kinds of moral and physical support from my co-workers.
In addition to valuing my health, which I’ve been consistently better at for about the past seven months, I need to be careful – especially during acts of weekend heroism (aka. attempts at being handy) and/or exercise (technique and form do matter – ouch!).
I need to allocate more time for reading. All kinds of reading. Because I really don’t watch TV, except for NFL football, there’s nothing I can do there. Reducing the amount of time I spend handling email is about the only place I believe I can claw back from. In the 4-Hour Work Week, Timothy Ferriss presents some provocative suggestions on this front; I’d better re-read that! 
The BlackBerry is a wonderfully powerful platform that suits me when I am highly mobile, but also when I’m highly immobile – like flat on my back, literally! It’s also the only device my back can actually handle me moving around with at the moment – my laptop in an over-the-shoulder case is a non-starter for me in my current condition.

Why did I share this?
So that I have something to refer back to (sorry), when I’m getting careless!  

iPhone Envy? Live Vicariously!

Canadians are faced with the ongoing reality of iPhone envy.

And although we’re not alone, the iPhone feels so close …

Therefore, in the interim, I’m living vicariously by channeling experiences from those in the US.

I recently asked a tech-savvy, former-coworker, who actually has an iPhone: “How do you like your iPhone?”

Here’s what he had to say:

My wife and I both bought an iPhone. She is madly in love with hers. I really like mine. Occasionally, email doesn’t work the way one would expect which can be frustrating. The problems are related to 1) yahoo.com problems, 2) timing issues of using pop or 3) cannot connect to edge/wifi. But it hasn’t dampened our overall satisfaction with the phone.

Also, I can’t get work email as our exchange server doesn’t have IMAP enabled, but that’s cool since I don’t want work email on my personal phone. 🙂 Everyone who see’s the phone oohs and aahs about it even if they don’t realize that its an iPhone. It just has a really slick appearance to it. In particular, when you bring up photos or a web site and you turn the phone and the picture automatically reorients itself and then you use gestures to move around or to move to the next picture and resize the picture, they REALLY get excited.

Mostly, I just appreciate it because the interface works the way you would expect/want it to.

It is definitely 1.0. Can’t wait for 1.1 both to fix small issues and to see what features it brings along!

I heard rogers is supposed to carry it, but there have been issues in the negotiations.

Earth and Space Science Informatics at the 2007 Fall Meeting of the American Geophysical Union

In a previous post, I referred to Earth Science Informatics as a discipline-in-the-making.

To support this claim, I cited a number of data points. And of these data points, the 2006 Fall Meeting of the American Geophysical Union (AGU) stands out as a key enabler.

With 22 sessions posted, the 2007 Fall Meeting of the AGU is well primed to further enable the development of this discipline.

Because I’m a passionate advocate of this intersection between the Earth Sciences and Informatics, I’m involved in convening three of the 22 Earth and Space Science Informatics sessions:

I encourage you to take a moment to review the calls for participation for these three, as well as the other 19, sessions in Earth and Space Science Informatics at the 2007 Fall Meeting of the AGU.