Been writing … just not here!

Things have been far too quiet over here …

However, the good news is that I’ve been making contributions on a regular basis over at Sylabs – in fact, quite a lot of contributions!

As Sylabs’ focus is container science, most of the posts have something to do with Singularity containers. I encourage you to have a gander here.

I’ve also updated my cloud computing portfolio to reflect some of these contributions over here as well.

Enjoy!

Disclosures Regarding My Portfolios: Attributing the Contributions of Others

‘Personal’ Achievement?

October 8, 2018 was an extremely memorable night for Drew Brees at the Mercedes-Benz Superdome in New Orleans. Under the intense scrutiny of Monday Night Football, the quarterback of the New Orleans Saints became the leading passer in the history of the National Football League. (For those not familiar with this sport, you can think of his 72,103-yard milestone as a lifetime achievement of ultramarathon proportions.) The narrative on Brees’ contributions to ‘the game’ is anything but complete. In fact, the longer he plays, the more impressive this milestone becomes, as he continues to place distance between himself and every other NFL QB.

Of course the record books, and Brees’ inevitable induction into the Pro Football Hall of Fame, will position this as an individual achievement. Whenever given the opportunity to reflect upon seemingly personal achievements such as becoming the all-time passing leader, Brees is quick to acknowledge those who have enabled him to be so stunningly successful in such a high-profile, high-pressure role – from family and friends, to teammates, coaches, and more.

Like Tom Brady, about whom I wrote in a recent post, Brees remains a student of the game. He is also known for the off-the-field work ethic he practices with the utmost intensity in preparing for those moments when he takes the main stage with his team. Therefore, when someone like Brees shares achievements with those around him, it’s clearly a sincere and authentic act.

Full Disclosure

At the very least, self-curating and publicly sharing some collection of your work has the potential to come across as an act of blatant self-indulgence – and, of course, to some degree it is! At the very worst, however, such an effort has the potential to come across as a purely individual contribution. Because contribution matters so much to me personally, I wanted to ensure that any portfolio I self-curate includes appropriate disclosures; disclosures that acknowledge the importance of collaboration, opportunity, support, and so on, from my family, friends and acquaintances, peers and co-workers, employers, customers and partners, sponsors, and more. In other words, and though in a very different context, like Brees I want to ensure that what comes across as ‘My Portfolio’ rightly acknowledges that this too is a team sport.

In the interests of generic disclosures then, the following is an attempt to ensure the efforts of others are known explicitly:

  • Articles, book chapters and posters – Through authorships, affiliations and acknowledgements, portfolio artifacts such as articles, book chapters and posters make collaborators, enablers and supporters/influencers explicit, respectively. In this case, there’s almost no need for further disclosure.
  • Blog posts – Blog posts are less formal than the written and oral forms of communication alluded to above and below; it’s through the words themselves and/or the hyperlinks introduced that the contributions of others are gratefully and willingly acknowledged. Fortunately, it is common practice for page-ranking algorithms to take into account the words and metadata that collectively comprise blog posts, and to afford Web pages stronger rankings based upon these and other metrics.
  • Presentations – My intention here is to employ Presentations as a disclosure category for talks, webinars, workshops, courses, etc. – i.e., all kinds of oral communications that may or may not be recorded. With respect to this category, my experience is ‘varied’ – e.g., in not always allowing for full disclosure regarding collaborators, though less so regarding affiliations. Therefore, to make collaborators as well as supporters/influencers explicit, contribution attributions are typically included in the materials I’ve shared (e.g., the slides corresponding to my GTC17 presentation) and/or through the words I’ve spoken. Kudos are also warranted for the organizations I’ve represented in some of these cases as well, as it has been a byproduct of this representation that numerous opportunities have fallen into my lap – though often owing to a sponsorship fee, to be completely frank. Finally, sponsoring organizations are also deserving of recognition, as it is often their mandate (e.g., a lead-generation marketing program that requires a webinar, a call for papers/proposals) that inspires what ultimately manifests itself as some artifact in one of my portfolios; having been on the event-sponsor’s side more than a few times, I am only too well aware of the effort involved in creating the space for presentations … a contribution that cannot be ignored.

Disclosures regarding contribution thus range from explicit to vague – from clearly to barely evident. Regardless, for those portfolios shared via my personal blog (Data Science Portfolio and Cloud Computing Portfolio), suffice it to say that there were always others involved. I’ve done my best to make those contributions clear; however, I’m sure that unintentional omissions, errors and/or (mis)representations exist. Given that these portfolios are intentionally positioned and executed as works-in-progress, I look forward to addressing matters as they arise.

Ian Lumb’s Cloud Computing Portfolio

When I first introduced it, it made sense to me (at the time, at least!) to divide my Data Science Portfolio into two parts; the latter part was “… intended to showcase those efforts that have enabled other Data Scientists” – in other words, my contributions as a Data Science Enabler.

As of today, most of what was originally placed in that latter part of my Data Science Portfolio has been transferred to a new portfolio – namely, one that emphasizes Cloud computing. Thus my Cloud Computing Portfolio is a self-curated, online, multimedia effort intended to draw my work in Cloud computing together into a cohesive whole; specifically, this new Portfolio is organized as follows:

  • Strictly Cloud – A compilation of contributions in which Cloud computing takes center stage
  • Cloud-Related – A compilation of contributions ranging from clusters and grids to miscellany. Also drawn out in this section are contributions relating to containerization.

As with my Data Science Portfolio, you’ll find in my Cloud Computing Portfolio everything from academic articles and book chapters, to blog posts, to webinars and conference presentations – in other words, this Portfolio also lives up to its multimedia billing!

Since this Portfolio, like my Data Science Portfolio, is intentionally a work-in-progress, feedback is always welcome; revisions will definitely be applied!

Revisiting the Estimation of Fractal Dimension for Image Classification

Classification is a well-established use case for Machine Learning. Textbook examples abound; standard ones include the classification of email into ham versus spam, or of images into cats versus dogs.

Circa 1994, I was unaware of Machine Learning, but I did have a use case for quantitative image classification. I expect you’re familiar with those brave souls known as The Hurricane Hunters – brave because they explicitly seek to locate the eyes of hurricanes using an appropriately tricked out, military-grade aircraft. Well, these hunters aren’t the only brave souls when it comes to chasing down storms in the pursuit of atmospheric science. In an effort to better understand Atlantic storms (i.e., East Coast, North America), a few observational campaigns featured aircraft flying through blizzards at various times during Canadian winters.

In addition to standard instrumentation for atmospheric and navigational observables, these planes were tricked out in an exceptional way:

For about two-and-a-half decades, Knollenberg-type [ref 4] optical array probes have been used to render in-situ digital images of hydrometeors. Such hydrometeors are represented as a two-dimensional matrix, whose individual elements depend on the intensity of transmitted light, as these hydrometeors pass across a linear optical array of photodiodes. [ref 5]

In other words, the planes were equipped with underwing optical sensors that had the capacity to obtain in-flight images of

hydrometeor type, e.g. plates, stellar crystals, columns, spatial dendrites, capped columns, graupel, and raindrops. [refs 1,7]

(Please see the original paper for the references alluded to here.)

Even though this is hardly a Big Data problem, a single flight might produce tens to thousands of hydrometeor images that then needed to be classified manually by atmospheric scientists. Because I was working for a boutique consultancy focused on atmospheric science, and had excellent relationships with Environment Canada scientists who make Cloud Physics their express passion, an opportunity to automate the classification of hydrometeors presented itself.

Around this same time, I became aware of fractal geometry – a visually arresting and quantitative description of nature popularized by proponents such as Benoit Mandelbrot. Whereas simple objects (e.g., lines, planes, cubes) can be associated with an integer dimension (e.g., 1, 2 and 3, respectively), objects in nature (e.g., a coastline, a cloud outline) can be better characterized by a fractional dimension – a real-valued fractal dimension that lies between the integer value for a line (i.e., 1) and that for a plane (i.e., 2).

Armed with an approach for estimating fractal dimension, then, my colleagues and I sought to classify hydrometeors based on their subtle to significant geometrical expressions. Although the idea was appealing in principle, the outcome on a per-hydrometeor basis was a single, scalar result that attempted to capture geometrical uniqueness. In isolation, this approach was simply not enough to deliver an automated scheme for quantitatively classifying hydrometeors.
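As an aside for readers who’d like something concrete, the sketch below shows one common way of reducing an image to that single, scalar result: the box-counting method. To be clear, this is purely illustrative; it isn’t necessarily the algorithm my colleagues and I used back then, and the box sizes and test images are assumptions made for the sake of the example.

```python
import numpy as np

def box_count_dimension(binary_image, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a 2-D binary image.

    binary_image : 2-D array whose non-zero pixels mark the object
                   (e.g., a thresholded hydrometeor image)
    box_sizes    : edge lengths, in pixels, of the boxes used for counting
    """
    img = np.asarray(binary_image, dtype=bool)
    counts = []
    for s in box_sizes:
        # Trim the image so it tiles evenly into s-by-s boxes
        h = (img.shape[0] // s) * s
        w = (img.shape[1] // s) * s
        boxes = img[:h, :w].reshape(h // s, s, w // s, s)
        # Count the boxes that contain at least one occupied pixel
        counts.append(boxes.any(axis=(1, 3)).sum())
    # The estimated dimension is the slope of log N(s) versus log (1/s)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(box_sizes)), np.log(counts), 1)
    return slope

# Sanity checks: a completely filled image should come out near 2,
# and a single straight line of pixels near 1.
print(box_count_dimension(np.ones((64, 64))))   # ~2.0
line = np.zeros((64, 64))
line[32, :] = 1
print(box_count_dimension(line))                # ~1.0
```

However it is computed, the estimate collapses all of an image’s geometry into one number – which is exactly the limitation described above.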

I well recall some of the friendly conversations I had with my scientific and engineering peers who attended the conference at Montreal’s Ecole Polytechnique. Essentially, the advice I was given was to regard the work I’d done as a single dimension of the hydrometeor classification problem. What I really needed to do was develop additional dimensions for classifying hydrometeors. With enough dimensions, the resulting multidimensional classification scheme would have a much better chance of delivering the automated solution sought by the atmospheric scientists.

In my research, fractal dimensions were estimated using various algorithms; they were not learned. However, they could be – as is clear from the efforts of others (e.g., the prediction of fractal dimension via Machine Learning). And though my pursuit of such a suggestion will have to wait for a subsequent research effort, a learned approach might allow for a much more multidimensional scheme for the quantitative classification of hydrometeors via Machine Learning. Of course, with the hindsight of 2018, there are a number of possibilities for quantitative classification via Machine Learning – possibilities that I fully expect would result in more useful outcomes.
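To illustrate what a more multidimensional scheme might look like from the vantage point of 2018, here’s a sketch that combines the box-counting estimate from above with two additional shape descriptors, and hands the resulting feature vectors to an off-the-shelf classifier from scikit-learn. The synthetic ‘plate’ and ‘column’ images, the choice of features, and the use of a random forest are all assumptions made purely for illustration; they do not reflect the original work.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# box_count_dimension() is the function from the sketch above.

def shape_features(binary_image):
    """Build a small feature vector for one hydrometeor image.

    Each entry is one 'dimension' of the classification problem; the
    particular features chosen here are illustrative only.
    """
    img = np.asarray(binary_image, dtype=bool)
    rows, cols = np.nonzero(img)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return [
        box_count_dimension(img),  # geometric complexity
        img.sum() / img.size,      # fraction of the image that is occupied
        height / width,            # bounding-box aspect ratio
    ]

def toy_image(kind, rng):
    """Generate a crude, purely synthetic stand-in for a hydrometeor image."""
    img = np.zeros((64, 64), dtype=int)
    r, c = rng.integers(0, 12, size=2)  # jitter the position so samples differ
    if kind == "plate":                 # compact, filled blob
        img[10 + r:40 + r, 10 + c:40 + c] = 1
    else:                               # "column": long, thin streak
        img[30 + r:34 + r, 4 + c:52 + c] = 1
    return img

rng = np.random.default_rng(0)
kinds = ["plate", "column"] * 20
X = np.array([shape_features(toy_image(k, rng)) for k in kinds])

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, kinds)
print(clf.predict([shape_features(toy_image("plate", rng))]))  # expect ['plate']
```

The point of the sketch is simply that the fractal dimension becomes one column in a feature matrix rather than the whole story, which is precisely the advice I received at the conference.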

While fractals don’t receive as much attention these days as they once did, and certainly nothing close to the deserved hype that seems to pervade most discussions of Machine Learning, there may still be some value in incorporating their ability to quantify geometry into algorithms for Machine Learning. From a very different perspective, it might be interesting to see whether the architecture of deep neural networks can be characterized through an estimation of their fractal dimension – if only to tease out geometrical similarities that might otherwise be completely obscured.

While I, or (hopefully) others, ponder such thoughts, there is no denying the stunning expression of the fractal geometry of nature that fractals have rendered visual.

Data Science: Celebrating My Academic Bias

In a recent post, I introduced my Data Science Portfolio. After describing the high-level organization of the Portfolio, I noted:

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Even in this earliest of drafts, I knew that I was somewhat uncomfortable with a section dedicated to academics in my Portfolio. After all, shouldn’t a portfolio place more emphasis on how my knowledge and skills, academic or otherwise, have been applied to produce some tangible artifact?

Upon further reflection, I currently believe that what’s material in the context of a portfolio is some indication of the bias inherent in the resulting curated showcase of one’s work. Of course, to some degree the works presented, and the curation process itself, will make such personal bias self-evident.

Whereas it may make sense for an artist not to overtly disclose any bias with respect to their craft, or a curated collection of their work, I currently perceive absolutely no downside in sharing my personal bias – a bias that, in my own case, I believe reflects only in positive ways on the Portfolio as well as the individual items included in it.

To this end, and in the spirit of such positive self-disclosure, my personal bias reflects my formative years in science – a background to which I well recall significant contributions from high school, subsequently broadened and deepened as an undergraduate and then a graduate student. Even more specifically, my personal bias reflects an emphasis on the physical sciences; a bias that remains active today.

As I’ve started to share, through such posts as the one on the mathematical credentials I bring to Data Science, my choice to pursue the physical sciences was an excellent one – even through the self-critical lens of personal hindsight. An excellent choice, albeit a biased one.

The very nature of Data Science is such that each of us carries our own, wonderfully unique personal bias. As we necessarily collaborate in team, project and organizational settings, I believe it’s important not only to ensure that each of us preserves their personal bias, but also that we leverage this perspective as fully and appropriately as possible. As a consequence, it is much more likely that everyone we work with, and everything we work on, will derive maximal value.

ORION Summit 2011: Selected Takeaways

In no specific order, and with no attempt to be comprehensive, the following are my selected takeaways from the ORION Summit 2011:

  • 100G network boasts a series of firsts for ORION – ORION is the first R&E network anywhere in the world, and the first network of any kind in Canada, to provide a bandwidth of 100 Gb/s. At the moment, a 100G segment stretches from St. Catharines to London. However, by the end of June 2011, it is projected that the bulk of ORION will be enabled for the 100G capacity. Membership has its privileges.
  • IP is not a business – CFI CEO Gilles Patry stated (and I’m paraphrasing here) that an idea is not an invention, an invention is not a product, and a product is not a business. (I didn’t capture his source, sorry.) Patry’s remarks resonated well with author Peter Nowak’s suggestion of tracking innovation by the licensing of patents – as opposed to tracking the patents themselves. In other words, both presenters effectively made the point that Intellectual Property (IP) does not a product or business make. For a strikingly compelling illustration, see “Biomedical research has a Long Tail … and it ain’t pretty!” below.
  • Purdue is filling in the gaps – Purdue is enabling teaching and learning via mobile technologies. Their sensible approach is to leverage apps students are already using on their mobile devices, and then focus their attention on better enabling teachers and learners for creating, locating, communicating and connecting via handhelds. Their impressive and growing Studio is worth exploring.
  • The five differentiators of GenY – 10- to 30-year-olds comprise GenY. According to author Daneal Charney, this generation is characterized by the following five differentiators: GenYs are digitally savvy; GenYs are non-hierarchical; GenYs are super-collaborative; GenYs are 24/7; and finally, GenYs are highly educated. Based on teaching over 200 GenYs at York over the past academic year, living with a few GenYs, and working with a few GenYs, I believe I can cite numerous examples to substantiate Charney’s characterization.
  • Pencasts deserve attention – Teacher Zoe Branigan-Pipe illustrated just how powerful a pencast could be. Her example was based on use of a smartpen from LiveScribe. I can easily see ways I can leverage this simple technology in powerful ways in many contexts.
  • CANARIE is DAIRing to drive innovation – In their words: “DAIR is an advanced R & D environment — a `digital sandbox’ — where high-tech innovators can design, validate, prototype and demonstrate new technologies.” DAIR targets tech-oriented SMEs.
  • Is a Top 40 billing of any value? – CFI CEO Gilles Patry stated that UofT, McGill, UBC and UWaterloo made a global Top 40 ranking of the best undergraduate programs in computer science. Later in the day, Queen’s Ph.D. student Rob Horgan reminded us that such rankings may amount to hollow praise. For example, if your programme produces excellent graduates, does it really matter what its ranking against some contrived set of metrics actually is? I tend to agree.
  • P2PU – The acronym is self-expressive. In about a week’s time, OISE Ph.D. student Stian Haklev will start an introductory course on Computer Supported Collaborative Learning. Both the course, and the teaching/learning model, are worth investigating.
  • Biomedical research has a Long Tail … and it ain’t pretty! – Dr. Aled Edwards described the systemic realities of biomedical research as it stands today: Biomedical researchers are driven towards extremely conservative investigations that focus on known targets that are most likely to produce results – extremely incremental results. (Reminder from tompeters!: Incrementalism spells death for innovation.) Considering that the Human Genome Project mapped some 20,000 genes, this emphasis on a relatively small subset of the same (i.e., the meaty part of the Long Tail) is especially concerning – especially concerning to anyone who cares about research that should ultimately lead to drug discovery. So, what’s Edwards’ solution? Share results, as soon as they become available, in public – without restriction! That means that academics, behemoth pharmas and other partners start together on a level playing field. In other words, use coopetition to drive innovation. (IP is not a business!) And that’s precisely what his Structural Genomics Consortium does. By placing attention on the 17K genes that would otherwise be neglected, perhaps SGC will help to make that tail of biomedical research a little less long … a little less meaty at its base … and therefore a little less ugly.

If you attended the Summit and have your own takeaways to share, or have comments on this post, please feel free to add your $0.02 via a comment.

2010 in review

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high-level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads: This blog is on fire!

Crunchy numbers


About 3 million people visit the Taj Mahal every year. This blog was viewed about 38,000 times in 2010. If it were the Taj Mahal, it would take about 5 days for that many people to see it.


In 2010, there were 2 new posts, growing the total archive of this blog to 206 posts.

The busiest day of the year was January 6th with 195 views. The most popular post that day was sync blackberry contacts with gmail: Problem Solved!.

Where did they come from?

The top referring sites in 2010 were google.com, ifreestores.com, search.conduit.com, google.co.uk, and bigextracash.com.

Some visitors came searching, mostly for blackberry contacts to gmail, import blackberry contacts to gmail, import contacts from blackberry to gmail, sync blackberry contacts with gmail, and how to import blackberry contacts to gmail.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

  1. sync blackberry contacts with gmail: Problem Solved! (March 2008), 15 comments
  2. sync blackberry contacts with gmail (January 2007), 21 comments
  3. Microsoft Word: A Tool for Annotation (August 2006), 7 comments
  4. GMail on Your BlackBerry: Latest Client Offers Significant Featur (March 2008), 7 comments
  5. Teens: A Surprising Market Segment for the iPhone (April 2007), 2 comments