Data Scientist: Believe. Behave. Become.

A Litmus Test

When do you legitimately get to call yourself a Data Scientist?

How about a litmus test? You’re at a gathering of some type, and someone asks you:

So, what do you do?

At which point can you (or I, or anyone) respond with confidence:

I’m a Data Scientist.

I think the responding-with-confidence part is key here for any of us with a modicum of humility, education, experience, etc. I don’t know about you, but I’m certainly not interested in this declaration being greeted by judgmental guffaws, coughing spasms, involuntary eye motion, etc. Instead of all this overt ‘body language’, I’m sure we’d all prefer to receive an inquiring response along the lines of:

Oh, just what the [expletive deleted] is that?

Or, at least:

Dude, seriously, did you like, just make that up?

Responses to this very legitimate, potentially disarming question will need to be saved for another time – though I’m sure a quick Google search will reveal a just-what-the-[expletive deleted]-is-a-Data-Scientist elevator pitch.

To return to the question intended for this post, however, let’s focus for a moment on how a best-selling author ‘became’ a writer.

“I’m a Writer”

I was recently listening to best-selling author Jeff Goins being interviewed by podcast host Srini Rao on an episode of the Unmistakable Creative. Although the entire episode (and the podcast in general, frankly) is well worth the listen, my purpose here is to extract the discussion relating to Goins’ own process of becoming a writer. In this episode of the podcast, Goins recalls the moment when he believed he was a writer. He then set about behaving as a writer – essentially, the hard work of showing up every single day just to write. Goins continues by explaining how, based upon his belief (“I am a writer”) and his behavior (i.e., the practice of writing on a daily basis), he ultimately realized his belief through his actions and became a writer. With five best-selling books to his credit, plus a high-traffic blog, and I’m sure much more, it’s difficult now to dispute Goins’ claim of being a writer.

Believe. Behave. Become. Sounds like a simple enough algorithm, so in the final section of this post, I’ll apply it to the question posed at the outset – namely:

When do you legitimately get to call yourself a Data Scientist?

I’m a Data Scientist?

I suppose, then, that by direct application of Goins’ algorithm, you can start the process merely by believing you’re a Data Scientist. Of course, I think we all know that that’ll only get you so far, and probably not even to a first interview. More likely, I think that most would agree that we need to have some Data Science chops before we would even entertain such an affirmation – especially in public.

And this is where my Data Science Portfolio enters the picture – in part, allowing me to self-validate, to legitimize whether or not I can call myself a Data Scientist in public without the laughing, choking or winking. What’s interesting though is that in order to work through Goins’ algorithm, engaging in active curation of a Data Science portfolio is causing me to work backwards – making use of hindsight to validate that I have ‘arrived’ as a Data Scientist:

  • Become – Whereas I don’t have best sellers or even a high-traffic blog site to draw upon, I have been able to assemble a variety of relevant artifacts into a Portfolio. Included in the Portfolio are peer-reviewed articles that have appeared in published journals with respectable impact factors. This, for a Data Scientist, is arguably a most-stringent validation of an original contribution to the field. However, chapters in books, presentations at academic and industry events, and so on, also serve as valuable demonstrations of having become a Data Scientist. Though it doesn’t apply to me (yet?), the contribution of code would also serve as a resounding example – with frameworks such as Apache Hadoop, Apache Spark, PyTorch, and TensorFlow serving as canonical and compelling examples.
  • Behave – Not since the time I was a graduate student have I been able to show up every day. However, recognizing the importance of deliberate practice, there have been extended periods during which I have shown up every day (even if only for 15 minutes) to advance some Data Science project. In my own case, this was most often the consequence of holding down a full-time job at the same time – though in some cases, as is evident in the Portfolio, I have been able to work on such projects as a part of my job. Such win-win propositions can be especially advantageous for the aspiring Data Scientist and the organization s/he represents.
  • Believe – Perhaps the most important outcome of engaging in the deliberate act of putting together my Data Science Portfolio, is that I’m already in a much more informed position, and able to make a serious ‘gut check’ on whether or not I can legitimately declare myself a Data Scientist right here and right now.

The seemingly self-indulgent pursuit of developing my own Data Science Portfolio, an engagement of active self-curation, has (quite honestly) both surprised and delighted me; I clearly have been directly involved in the production of a number of artifacts that can be used to legitimately represent myself as ‘active’ in the area of Data Science. The part-time nature of this pursuit, especially since the completion of grad school (though with a few notable exceptions), has produced a number of outcomes that can be diplomatically described as works (still) in progress … and in some cases, that is unfortunate.

Net-net, there is some evidence to support a self-declaration as a Data Scientist – based upon artifacts produced, and implied (though inconsistent) behaviors. However, when asked the question “What do you do?”, I am more likely to respond that:

I am a demonstrably engaged and passionate student of Data Science – an aspiring Data Scientist, per se … one who’s actively working on becoming, behaving and ultimately believing he’s a Data Scientist.

Based on my biases, that’s what I currently feel owing to the very nature of Data Science itself.

Recent Articles on Bright Hub

I’ve added a few more articles over on Bright Hub:

Google Chrome for Linux on Bright Hub: Series Expanded

I recently posted on a new article series on Google Chrome for Linux that I’ve been developing over on Bright Hub. My exploration has turned out to be more engaging than I anticipated! At the moment, there are six articles in the series:

I anticipate a few more …

It’s also important to share that Google Chrome for Linux does not yet exist as an end-user application. Under the auspices of the Chromium Project, however, there is a significant amount of work underway. And because this work is taking place out in the open (Chromium is an Open Source Project), now is an excellent time to engage – especially for serious enthusiasts.

Introducing Jott for BlackBerry

I just learned about Jott for BlackBerry:

We have a lot of happy Blackberry customers at Jott, and Jott for BlackBerry is the ultimate BlackBerry download. It is a simple, but very powerful tool that will let you reply to emails on your BlackBerry just using your voice – either speaking directly into your BlackBerry, or while wearing a Bluetooth headset. It is seamlessly integrated into the email application you already use, and is a huge leap forward for BlackBerry lovers in three ways: first, it is 3-5 times faster than ‘thumbing’ text; two, you won’t be known for sending just terse replies because you don’t want to thumb type out a normal email message; and three, you will be safer because you won’t have to take your eyes off the road.*
(*Jott does NOT encourage messaging while driving).

Jott for Blackberry makes an already awesome device even better.

The following is the body of a reply I just created:

Thanks for sharing this interesting service with me. It's definitely something that I'm interested in investigating and it's my intention to follow up very very soon. Thanks very much. Bye for now.

Sent with my voice via Jott for Blackberry ~ http://jott.com/bb

To listen: http://www.jott.com/show.aspx?id=e4eb3151-9007-448c-bd73-7de70ecc4766

In this example, the transcription quality was excellent. Note that the recipient is advised that the response was Jott’ed, and has the option of listening to the original audio recording. Nice!

Although I’m only at the testing stage, I expect to make extensive use of Jott for BlackBerry!

Note to Jott and Google: Please enable Jott for BlackBerry in the GMail for BlackBerry application.

Annotation Modeling: To Appear in Comp & Geosci

What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:

The abstract of the paper is as follows:

Annotation Modeling with Formal Ontologies: Implications for Informal Ontologies

L. I. Lumb[1], J. R. Freemantle[2], J. I. Lederman[2] & K. D. Aldridge[2]
[1] Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
[2] Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada

Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

I expect the whole paper will be made available in the not-too-distant future …

Evolving Semantic Frameworks into Platforms: Unpublished ms.

I learned yesterday that the manuscript I submitted to HPCS 2008 was not accepted 😦
It may take my co-authors and me some time before this manuscript is revised and re-submitted.
This anticipated re-submission latency, along with the fact that we believe the content needs to be shared in a timely fashion, provides the motivation for sharing the manuscript online.
To whet your appetite, the abstract is as follows:

Evolving a Semantic Framework into a Network-Enabled Semantic Platform
A data-oriented semantic framework has been developed previously for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally as an informal ontology in OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that identifies the enabling semantic components and interfaces that already exist, as well as the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework, a J2EE (Java 2 Enterprise Edition) based implementation for a network-enabled semantic platform is provided. And although the platform is in principle usable, ongoing adoption suggests that strategies aimed at processing XML via parallel I/O techniques are likely an increasingly pressing requirement.

Injury Time: Remembrance of Things Just Passed …

What happened?

I strained my lower back. Badly. It was the result of two careless acts: 
  1. Lifting a heavy prop awkwardly at our annual Mardi Gras event. I felt a twinge of pain, and suspect that this predisposed my back towards injury.
  2. Attempting to leave a leg-press machine before completely releasing the 220 lbs of weight that I, back included, was still supporting. 
The pivotal incident (involving the leg-press machine) happened last Wednesday at the University’s athletic complex. Of course, I finished my hamstring curls and rowed for 20 minutes before calling it a day. Doh! 
And yes, I knew then that I was in deep trouble. 
Contrary to my spouse’s advice, I hauled my sorry self off to the University the next day, because I had things that must be done. Doh! With a notable, curvaceous list (upwards to the left), and a walking speed 10-20% of my normal, I can honestly state that I gained a deeper appreciation of what it means to be differently abled. People rushing past me, icy walkways, plus doors stiff to open, were all of a sudden on my radar.
I barely made it through that Thursday.
I started my formal convalescence (aka. sensible acknowledgement of my predicament) on Friday morning. 

What did I do?
I convalesced. At this point, I had no choice! I took a muscle relaxant and installed myself upon a heating pad. Save for attending to primal bodily functions, and attempting to do a few exercises I learned in physio the last time I strained my lower back, I remained in a sub-horizontal state through the entire weekend. I had to pass on a friend’s birthday party and a ski day 😦
But, I:

I fretted. About work – not being there, work piling up, etc. And about my exercise routine – that picked me up, and then knocked me down! I communed with my family – when they weren’t making up for my shortfalls – and with our pets (three cats and an obnoxiously vocal husky).


What did I learn?
How good people are to me. From walking the dog to driving Miss Daisy (our teenage princess to/from dance/work/friends/etc.) to countless other things I normally do, my family filled the gaps and still had some energy left over for me in my supine state. When I did hobble into the office, I received all kinds of moral and physical support from my co-workers.
In addition to valuing my health, which I’ve been consistently better at for about the past seven months, I need to be careful – especially during acts of weekend heroism (aka. attempts at being handy) and/or exercise (technique and form do matter – ouch!).
I need to allocate more time for reading. All kinds of reading. Because I really don’t watch TV, except for NFL football, there’s nothing I can do there. Reducing the amount of time I spend handling email is about the only place I believe I can claw time back from. In The 4-Hour Workweek, Timothy Ferriss presents some provocative suggestions on this front; I’d better re-read that!
The BlackBerry is a wonderfully powerful platform that suits me when I am highly mobile, but also when I’m highly immobile – like flat on my back, literally! It’s also the only device my back can actually handle me moving around with at the moment – my laptop in an over-the-shoulder case is a non-starter for me in my current condition.

Why did I share this?
So that I have something to refer back to (sorry), when I’m getting careless!  

Book Review: BlackBerry Enterprise Server for Microsoft Exchange

Packt Publishing claims its

… unique business model allows [them] to bring [us] more focused information, giving [us] more of what [we] need to know, and less of what [we] don’t.  

If Desai & Renfroe’s BlackBerry Enterprise Server for Microsoft Exchange: Installation and Administration is any indication, Packt actually lives up to its claim. In just 172 pages, Desai & Renfroe achieve an enviable balance between being concise and being comprehensive. This statement applies as much to what the authors have written, as to their choice of what to illustrate. Specifically:

  • Chapter 1 places the BlackBerry Enterprise Server (BES) in the broader context of Research In Motion’s (RIM) BlackBerry universe. In addition to itemizing relevant components, an introduction to the BlackBerry’s push model, security and Internet connectivity is provided. 
  • Though brief, Chapter 2 runs deep in addressing BES architecture and implementation planning. For example, we learn that the BES employs a modular architecture comprising over a dozen components. After succinctly enumerating the components and their function, BES requirements and prerequisites are identified. In addition to hardware and software requirements, recommendations are made with respect to networking your BES (e.g., firewall and/or proxy considerations) and providing it with a database. Easy to gloss over on first read are thoughtful recommendations on sizing the BES (including pointers to resources from RIM) and the database for the anticipated user load. 
  • Before BES components can be installed and enabled, the messaging environment and database server need to be configured. This is the subject of Chapter 3. Both local and remote database instances receive attention. Because each step is well illustrated, the book delivers on its intended purpose of serving as a solution guide.
  • The installation of the BES is a multistep process enabled via a wizard. As in the previous chapter, in Chapter 4 the authors guide the reader through this process making appropriate use of illustrations. They interject appropriate commentary, and are clear on out-of-scope topics. The early emphasis on delineating BES architecture (Chapter 1) is realized as the authors transition the reader through the BES installation. 
  • Of course, installing the BES is just the beginning, and therefore the next few chapters focus on the additional tasks required to operationally deliver this service to its users. After introducing the six permissible levels of administrative role on the BES, attention shifts in Chapter 5 to the matter of provisioning users, groups and devices. And with respect to devices, wireline and wireless options for provisioning are given consideration. 
  • Chapter 6 turns to policies. The BES ships with over 200 policies that can be applied variously to users, groups and devices. Also covered is the topic of provisioning software from RIM and third parties. Of particular value is the authors’ example of a software bundle targeted to a particular BlackBerry model. The ability to administer users, groups and devices with respect to policies (including software), from a single point of control (i.e., the BES server), speaks volumes to the appeal and value that this offering can deliver to corporate enterprise environments. This chapter’s treatment of policies and software provisioning serves as an excellent introduction to topics BES administrators will return to repeatedly, and likely with increasing degrees of sophistication.
  • Unlike many of the other chapters, Chapter 7 provides only an overview of multitiered administration – i.e., properties and tasks relating to users, groups, (BlackBerry) domains and servers. This enumeration of possibilities, presented in context, works effectively. 
  • A deeper discussion on security is the focus of the first part of the final chapter (Chapter 8). Encryption and authorization, both of which receive detailed consideration, amplify the value of the BES and its context in the overall BlackBerry universe for corporate enterprises. An unanticipated treatment of disaster recovery closes Chapter 8. In sufficient detail to enable a solution, the authors discuss in turn the measures needed to ensure that both the server (the BES) and its data (housed by the BES’s local or remote database) are readied for a disaster situation. 

Although Desai and Renfroe’s BES book unapologetically targets the Microsoft Exchange environment, its value is not limited here. Those working in other environments, and those interested in learning more about BES’s place in the BlackBerry universe, will almost certainly derive value from this book. Because the book is clear and concise, yet surprisingly complete and well-organized, it is likely to be well-thumbed by BES administrators of varying expertise.  

With the possible exception of a concluding chapter, page, paragraph or even sentence(!), to provide some sense of closure to the book, I am at a loss to report any omissions, oversights or errors. And although they might be better suited for a follow-on contribution of some kind, additional discussion might be given to topics such as performance and scalability (e.g., of local versus remote databases), the mapping of BlackBerry domains to organizational units, and/or improved degrees of DR.

Even though the book I reviewed was a complimentary copy provided by the publisher, I would happily pay for my own copy, and heartily recommend this book to others having interest in BES installation and administration. 

Jott Announces Local Numbers for Canada!

The following message just arrived in my inbox:

From: “Jott Networks”
Date: February 13, 2008 1:39:32 PM GMT-05:00
To: ian DOT lumb AT gmail DOT com
Subject: Canadian Local Numbers Announced
Reply-To: feedback AT jott DOT com

Hi everyone,

We are happy to announce that Canadian local numbers are finally here!

As most of you know, we have had a Toronto Jott number (647-724-5814) for some time and have been working on acquiring more local numbers across Canada.

Still confused as to why we are not releasing a toll free number? Jott requires caller ID to know who is sending a Jott to what contact information. In an effort to protect your privacy, most Canadian mobile providers have blocked caller ID information from being passed to toll free numbers. This leaves the alternative of using local access numbers across the country, so that everyone can send Jott messages without having to pay long distance fees.

Below is the list of available Jott numbers in Canada. Find the number in your area code and program it to your speed dial today!

AURORA : +12898020110
CALGARY : +14037751288
EDMONTON : +17806287799
HALIFAX : +19024828120
HAMILTON : +19054819060
KITCHENER : +15199572711
LONDON : +15194898968
MARKHAM : +12898000110
MONTREAL : +15146670329
OTTAWA : +16136861502
QUEBEC CITY : +14189072209
SAINT JOHNS : +17097570047
SHERBROOKE : +18193401636
TORONTO : +16477245365
TORONTO : +14168001067
VANCOUVER : +17787868229
VANCOUVER : +16044841347
VICTORIA : +12509847093
WINDSOR : +15198000031
WINNIPEG : +12042728154

Brought to you by Jott.com – 1-866-JOTT-123
300 East Pike Street, Suite 200
Seattle, WA 98122

This is excellent news!

And just in case this is your first visit to my blog, you can read other Jott-related posts here.

Google Blogging 2007: From Legitimizing Blogs to Wikipedia-Competitor Google Knol

There’s a recent, year-in-review entry by the Google blogging team.

Not only does this entry highlight another wonderful year for Google, it also quantitatively places blogging in perspective. If you ever had any doubts as to the legitimacy of blogging, just read this post.

Amongst the highlights, I found the announcement of the Knol test project to be of interest. Although I’m a huge fan of knowledge representation and management, especially in the context of the Semantic Web, I must confess to being confused by Knol. At the most basic level, Knol seems to be about knowledge sharing and, more specifically, about providing jumping-off points (from search-engine hits) for those seeking to understand some topic.

Therefore, I can’t help but ask: is there more to Knol than its being Google’s competitive answer to Wikipedia?

If you happen to drop by my blog, and this post, please feel free to share your take on Knol.

What am I missing?