Data Science: Identifying My Professional Bias

Data Science: Identifying My Professional Bias

In the Summer of 1984, I arrived at Toronto’s York University as a graduate student in Physics & Astronomy. (Although my grad programme was Physics & Astronomy, my research emphasized the application of fluid dynamics to Earth’s deep interior.) Some time after that, I ran my first non-interactive computation on a cluster of VAX computers. I’m not sure if this was my first exposure to Distributed Computing or not not; I am, however, fairly certain that this was the first time it (Distributed Computing) registered with me as something exceedingly cool, and exceedingly powerful.

Even back in those days, armed with nothing more than a VT100 terminal ultimately connected to a serial interface on one of the VAXes, I could be logged in and able to submit a computational job that might run on some other VAX participating in the cluster. The implied connectedness, the innate ability to make use of compute cycles on some ‘remote’ system was intellectually intoxicating – and I wasn’t even doing any parallel computing (yet)!

More than a decade later, while serving in a staff role as a computer coordinator, I became involved in procuring a modest supercomputer for those members of York’s Faculty of Pure & Applied Science who made High Performance Computing (HPC) a critical component of their research. If memory serves me correctly, this exercise resulted in the purchase of a NUMA-architecture system from SGI powered by MIPS CPUs. Though isolated initially, and as a component of the overall solution, Platform LSF was included to manage the computational workloads that would soon consume the resources of this SGI system.

The more I learned about Platform LSF, the more I was smitten by the promise and reality of Distributed Computing – a capability to be leveraged from a resource-centric perspective with this Load Sharing Facility (LSF). [Expletive deleted], Platform founder Songnian Zhou expressed the ramifications of his technical vision for this software as Utopia in a 1993 publication. Although buying the company wasn’t an option, I did manage to be hired by Platform, and work there in various roles for about seven-and-a-half years.

Between my time at Platform (now an IBM company) and much-more recently Univa, over a decade of my professional experience has been spent focused on managing workloads in Distributed Computing environments. From a small handful of VAXes, to core counts that have reached 7 figures, these environments have included clusters, grids and clouds.

My professional bias towards Distributed Computing was further enhanced through the experience of being employed by two software vendors who emphasized the management of clusters – namely Scali (Scali Manage) and subsequently Bright Computing (Bright Cluster Manager). Along with Univa (Project Tortuga and Navops Launch), Bright extended their reach to the management of HPC resources in various cloud configurations.

If it wasn’t for a technical role at Allinea (subsequently acquired by ARM), I might have ended up ‘stuck in the middle’ of the computational stack – as workload and cluster management is regarded by the HPC community (at least) as middleware … software that exists between the operating environment (i.e., the compute node and its operating system) and the toolchain (e.g., binaries, libraries) that ultimately support applications and end users (e.g., Figure 5 here).

Allinea’s focuses on tools to enable HPC developers. Although they were in the process of broadening their product portfolio to include a profiling capability around the time of my departure, in my tenure there the emphasis was on a debugger – a debugger capable of handling code targeted for (you guessed it) Distributed Computing environments.

Things always seemed so much bigger when we were children. Whereas Kid Ian was impressed by a three-node VAX cluster, and later ‘blown away’ by a modest NUMA-architecture ‘supercomputer’, Adult Ian had the express privilege of running Allinea DDT on some of the largest supercomputers on the planet (at the time) – tracking down a bug that only showed up when more than 20K cores were used in parallel on one of Argonne’s Blue Genes, and demonstrating scalable, parallel debugging during a tutorial on some 700K cores of NCSA’s Blue Waters supercomputer. In hindsight, I can’t help but feel humbled by this impressive capability of Allinea DDT to scale to these extremes. Because HPC’s appetite for scale has extended beyond tera and petascale capabilities, and is eyeing seriously the demand to perform at the exascale, software like Allinea DDT needs also to match this penchant for extremely extreme scale.

At this point, suffice it to say that scalable Distributed Computing has been firmly encoded into my professional DNA. As with my scientifically based academic bias, it’s difficult not to frame my predisposition towards Distributed Computing in a positive light within the current context of Data Science. Briefly, it’s a common experience for the transition from prototype-to-production to include the introduction of Distributed Computing – if not only to merely execute applications and/or their workflows on more powerful computers, but perhaps to simultaneously scale these in parallel.

I anticipate the need to return to this disclosure regarding the professional bias I bring to Data Science. For now though, calling out the highly influential impact Distributed Computing has had on my personal trajectory, appears warranted within the context of my Data Science Portfolio.

Incorporate the Cloud into Existing IT Infrastructure => Progress ( Life Sciences )

I still have lots to share after recently attending Bio-IT World in  Boston … The latest comes as a Bright Computing contributed article to the April 2013 issue of the IEEE Life Sciences Newsletter.

The upshot of this article is:

Progress in the Life Sciences demands extension of IT infrastructure from the ground into the cloud.

Feel free to respond here with your comments.

Over “On the Bright side …” Cloud Use Cases from Bio-IT World

As some of you know, I’ve recently joined Bright Computing.

Last week, I attended Bio-IT World 2013 in Boston. Bright had an excellent show – lots of great conversations, and even an award!

During numerous conversations, the notion of extending on-site IT infrastructure into the cloud was raised. Bright has an excellent solution for this.

What also emerged during the conversations were two uses for this extension of local IT resources via the cloud. I thought this was worth capturing and sharing. You can read about the use cases I identified over “On the Bright side …

Pencasting During Lectures in Large Venues

In a recent post on pencasting as a way of teaching/learning weather and climate, I stated:

Monday (October 1, 2012), I intend to use a pencast during my lecture – to introduce aspects of the stability of Earth’s atmosphere. I’ll try to share here how it went. For this intended use of the pencast, I will use a landscape mode for presentation – as I expect that’ll work well in the large lecture hall I teach in. I am, however, a little concerned that the lines I’ll be drawing will be a little too thin/faint for the students at the back of the lecture theatre to see …

I followed through as advertized (above) earlier today.

Image

My preliminary findings are as follows:

  • The visual aspects of the pencast are quite acceptable – This is true even in large lecture halls such as the 500-seat Price Family Cinema at York University (pictured above) in Toronto, Canada where I am currently teaching. I used landscape mode for today’s pencast, and zoomed it in a little. A slightly thicker pen option would be wonderful for such situations … as would different pen colours (the default is green).
  • The audio quality of the pencasts is very good to excellent – Although my Livescribe pen came with a headset/microphone, I don’t use it. I simply use the built-in microphone on the pen, and speak normally when I am developing pencasts. Of course, the audio capabilities of the lecture hall I teach in are most excellent for playback!
  • One-to-many live streaming of pencasts works well – I streamed live directly from myLivescibe today. I believe the application infrastructure is based largely on Adobe Flash and various Web services delivered by Web Objects. Regardless of the technical underpinnings, live streaming worked well. Of course, I could’ve developed a completely self-contained PDF file, downloaded this, and run the pencast locally using Adobe Reader.
  • Personal pencasting works well – I noticed that a number of students were streaming the pencast live for themselves during the lecture. In so doing, they could control interaction with the pencast.

Anecdotally, a few students mentioned that they appreciated the pencast during the break period – my class meets once per for a three-hour session.

Although I’ve yet to hear this feedback directly from the students, I believe I need to:

  • Decrease the duration of pencasts – Today’s lasts about 10 minutes
  • Employ a less-is-more approach/strategy – My pencasts are fairly involved when done …
  • Experiment with the right balance of speaking to penning (is that even a word!?) – Probably a less-is-more approach/strategy would work well here for both the penned and spoken word …

Finally, today’s pencast on the basics of atmospheric stability:

  • Previous approach – Project an illustration taken directly from the course’s text. This is a professionally produced, visually appealing, detailed, end-result, static diagram that I embedded in my presentation software (I use Google Docs for a number of reasons.) Using a laser pointer, my pedagogy called for a systematic deconstruction this diagram – hoping that the students would be engaged enough to actually follow me. Of course, in the captured versions of my lectures, the students don’t actually see where I’m directing the laser pointer. The students have access to the course text and my lecture slides. I have no idea if/how they attempt to ingest and learn from this approach.
  • Pencasting – As discussed elsewhere, the starting point is a blank slate. Using the pencasting technology, I sketch my own rendition of the illustration from the text. As I build up the details, I explain the concept of stability analyses. Because the sketch appears as I speak, the students have the potential to follow me quite closely – and if they miss anything, they can review the pencast after class at their own pace. The end result of a pencast is a sketch that doesn’t hold a candle to the professionally produced illustration provided in the text and my lecture notes. However, to evaluate the pencast as merely a final product, I believe, misses the point completely. Why? I believe the pencast is a far superior way to teach and to learn in situations such as this one. Why? I believe the pencast allows the teacher to focus on communication – communication that the learner can also choose to be highly receptive to, and engaged by.

I still regard myself as very much a neophyte in this arena. However, as the above final paragraphs indicate, pencasting is a disruptive innovation whose value in teaching/learning merits further investigation.

Teaching/Learning Weather and Climate via Pencasting

I first heard about it a few years ago, and thought it sounded interesting … and then, this past Summer, I did a little more research and decided to purchase a Livescribe 8 GB Echo(TM) Pro Pack. Over the Summer, I took notes with the pen from time-to-time and found it to be somewhat useful/interesting.

Just this week, however, I decided it was time to use the pen for the originally intended purpose: Making pencasts for the course I’m currently teaching in weather and climate at Toronto’s York University. Before I share some sample pencasts, please allow me to share my findings based on less than a week’s worth of `experience’:

  • Decent-quality pencasts can be produced with minimal effort – I figured out the basics (e.g., how to record my voice) in a few minutes, and started on my first pencast. Transferring the pencast from the pen to the desktop software to the Web (where it can be shared with my students) also requires minimal effort. “Decent quality” here refers to both the visual and audio elements. The fact that this is both a very natural (writing with a pen while speaking!) and speedy (efficient/effective) undertaking means that I am predisposed towards actually using the technology whenever it makes sense – more on that below. Net-net: This solution is teacher-friendly.
  • Pencasts compliment other instructional media – This is my current perspective … Pencasts compliment the textbook readings I assign, the lecture slides plus video/audio captures I provide, the Web sites we all share, the Moodle discussion forums we engage in, the Tweets I issue, etc. In the spirit of blended learning it is my hope that pencasts, in concert with these other instructional media, will allow my TAs and I to `reach’ most of the students in the course.
  • Pencasts allow the teacher to address both content and skills-oriented objectives – Up to this point, my pencasts have started from a blank page. This forces me to be focused, and systematically develop towards some desired content (e.g., conceptually introducing the phase diagram for H2O) and/or skills (e.g., how to calculate the slope of a line on a graph) oriented outcome. Because students can follow along, they have the opportunity to be fully engaged as the pencast progresses. Of course, what this also means is that this technology can be as effective in the first-year university level course I’m currently teaching, but also at the academic levels that precede (e.g., grade school, high school, etc.) and follow (senior undergraduate and graduate) this level.
  • Pencasts are learner-centric – In addition to be teacher-friendly, pencasts are learner-centric. Although a student could passively watch and listen to a pencast as it plays out in a linear, sequential fashion, the technology almost begs you to interact with it. As noted previously, this means a student can easily replay some aspect of the pencast that they missed. Even more interestingly, however, students can interact with pencasts in a random-access mode – a mode that would almost certainly be useful when they are attempting to apply the content/skills conveyed through the pencast to a tutorial or assignment they are working on, or a quiz or exam they are actively studying for. It is important to note that both the visual and audio elements of the pencast can be manipulated with impressive responsiveness to random-access input from the student.
  • I’m striving for authentic, not perfect pencasts – With a little more practice and some planning/scripting, I’d be willing to bet that I could produce an extremely polished pencast. Based on past experience teaching today’s first-year university students, I’m fairly convinced that this is something they couldn’t care less about. Let’s face it, my in-person lectures aren’t perfectly polished, and neither are my pencasts. Because I can easily go back to existing pencasts and add to them, I don’t need to fret too much about being perfect the first time. Too much time spent fussing here would diminish the natural and speedy aspects of the technology.

Findings aside, on to samples:

  • Calculating the lapse rate for Earth’s troposphere – This is a largely a skills-oriented example. It was my first pencast. I returned twice to the original pencast to make changes – once to correct a spelling mistake, and the second time to add in a bracket (“Run”) that I forgot. I communicated these changes to the students in the course via an updated link shared through a Moodle forum dedicated to pencasts. If you were to experience the updates, you’d almost be unaware of the lapse of time between the original pencast and the updates, as all of this is presented seamlessly as a single pencast to the students.
  • Introducing the pressure-temperature phase diagram for H2O – This is largely a content-oriented example. I got a little carried away in this one, and ended up packing in a little too much – the pencast is fairly long, and by the time I’m finished, the visual element is … a tad on the busy side. Experience gained.

Anecdotally, initial reaction from the students has been positive. Time will tell.

Next steps:

  • Monday (October 1, 2012), I intend to use a pencast during my lecture – to introduce aspects of the stability of Earth’s atmosphere. I’ll try to share here how it went. For this intended use of the pencast, I will use a landscape mode for presentation – as I expect that’ll work well in the large lecture hall I teach in. I am, however, a little concerned that the lines I’ll be drawing will be a little too thin/faint for the students at the back of the lecture theatre to see …
  • I have two sections of the NATS 1780 Weather and Climate course to teach this year. One section is taught the traditional way – almost 350 students in a large lecture theatre, 25-student tutorial groups, supported by Moodle, etc. In striking contrast to the approach taken in the meatspace section, is the second section where almost everything takes place online via Moodle. Although I have yet to support this hypothesis with any data, it is my belief that these pencasts are an excellent way to reach out to the students in the Internet-only section of the course. More on this over the fullness of time (i.e., the current academic session.)

Feel free to comment on this post or share your own experiences with pencasts.

Foraging for Resources in the Multicore Present and Future

HPC consultant Wolfgang Gentzsch has thoughtfully updated the case of multicore architectures in the HPC context. Over on LinkedIn, via one of the HPC discussion groups, I responded with:

I also enjoyed your article, Wolfgang – thank you. Notwithstanding the drive towards cluster-on-a-chip architectures, HPC customers will require workload managers (WLMs) that interface effectively and efficiently with O/S-level features/functionalities (e.g., MCOPt Multicore Manager from eXludus for Linux, to re-state your example). To me, this is a need well evidenced in the past: For example, various WLMs were tightly integrated with IRIX’s cpuset functionality (http://www.sgi.com/products/software/irix/releases/irix658.html) to allow for topology-aware scheduling in this NUMA-based offering from SGI. In present and future multicore contexts, the appetite for petascale and exascale computing will drive the need for such WLM-O/S integrations. In addition to the multicore paradigm, what makes ‘this’ future particularly interesting, is that some of these multicore architectures will exist in a hybrid (CPU/GPU) cloud – a cloud that may compliment in-house resources via some bursting capability (e.g., Bright’s cloud bursting, http://www.brightcomputing.com/Linux-Cluster-Cloud-Bursting.php). As you also well indicated in your article, it is incumbent upon all stakeholders to ensure that this future is a friendly as possible (e.g., for developers and users). To update a phrase originally spun by Herb Sutter (http://www.gotw.ca/publications/concurrency-ddj.htm) in the multicore context, not only is the free lunch over, its getting tougher to find and ingest lunches you’re willing to pay for!

We certainly live in interesting times!

Early Win Required for Partner-Friendly, Post-Acquisition Platform Computing

Further to the LinkedIn discussion on the relatively recent acquisition of Platfom by IBM, I just posted:

Platform CEO and Founder Songnian Zhou has this to say regarding the kernel of this discussion:

“IBM expects Platform to operate as a coherent business unit within its Systems and Technology Group. We got some promises from folks at IBM. We will accelerate our investments and growth. We will deliver on our product roadmaps. We will continue to provide our industry-best support and services. We will work even harder to add value to our partners, including IBM’s competitors. We want to make new friends while keeping the old, for one is silver while the other is gold. We might even get to keep our brand name. After all, distributed computing needs a platform, and there is only one Platform Computing. We are an optimistic bunch. We want to deliver to you the best of both worlds – you know what I mean. Give us a chance to show you what we can do for you tomorrow. Our customers and partners have journeyed with Platform all these years and have not regretted it. We are grateful to them eternally.”

Unsurprisingly upbeat, Zhou, Platform and IBM, really do require that customers and partners give them a chance to prove themselves under the new business arrangement. As noted in my previous comment in this discussion, this’ll require some seriously skillful stickhandling to skirt around challenging issues such as IP (Intellectual Property) – a challenge that is particularly exacerbated by the demands of the tightly coupled integrations required to deliver tangible value in the HPC context.

How might IBM-acquired Platform best demonstrate that it’s true to its collective word:

“Give us a chance to show you what we can do for you tomorrow.”

Certainly one way, is to strike an early win with a partner that demonstrates that they (Zhou, Platform and IBM) are true to their collective word. Aspects of this demonstration should likely include:

  • IP handling disclosures. Post-acquisition Platform and the partner should be as forthcoming as possible with respect to IP (Intellectual Property) handling – i.e., they should collectively communicate how business and technical IP challenges were handled in practice.
  • Customer validation. Self-explicit, such a demonstration has negligible value without validation by a customer willing to publicly state why they are willing to adopt the corresponding solution.
  • HPC depth. This demonstration has to be comprised of a whole lot more than merely porting a Platform product to a partner’s platform that would be traditionally viewed as a competitive to IBM. As stated previously, herein lies the conundrum: “To deliver a value-rich solution in the HPC context, Platform has to work (extremely) closely with the ‘system vendor’.  In many cases, this closeness requires that Intellectual Property (IP) of a technical and/or business nature be communicated …”

Thus, as the fullness of time shifts to post-acquisition Platform, trust becomes the watchword for continued success – particularly in HPC.

For without trust, there will be no opportunity for demonstrations such as the early win outlined here.

How else might Platform-acquired IBM demonstrate that it’s business-better-than-usual?

Feel free to add your $0.02.

RTM for Android: Significant Update Indeed!

I just upgraded to the latest, native-client release of RTM (Remember the Milk) for my Motorola Xoom tablet.

As expressed by the RTM team, this is a  very significant update. Although the details of the update are well covered over on the RTM blog, in 5 minutes of usage I’m left with the impression of:

  • A much improved user interface
  • A much more deeply integrated application

Although I’ve flirted with other tools/utilities for task management (including Evernote!), I’ve always returned to, or remained with, RTM. This latest update for the Android platform gives me another reason not to bother looking elsewhere.

Kudos to Bob T. Monkey and the rest of the banana-loving team down under at RTM!

IBM-Acquired Platform: Plan for Sustained, Partner-Friendly HPC Innovation Required

Over on LinkedIn, there’s an interesting discussion taking place in the “High Performance & Super Computing” group on the recently announced acquisition of Markham-based Platform Computing by IBM. My comment (below) was stimulated by concerns regarding the implications of this acquisition for IBM’s traditional competitors (i.e., other system vendors such as Cray, Dell, HP, etc.):

It could be argued:

“IBM groks vendor-neutral software and services (e.g., IBM Global Services), and therefore coopetition.”

At face value then, it’ll be business-as-usual for IBM-acquired Platform – and therefore its pre-acquisition partners and customers.

While business-as-usual plausibly applies to porting Platform products to offerings from IBM’s traditional competitors, I believe the sensitivity to the new business relationship (Platform as an IBM business unit) escalates rapidly for any solution that has value in HPC.

Why?

To deliver a value-rich solution in the HPC context, Platform has to work (extremely) closely with the ‘system vendor’. In many cases, this closeness requires that Intellectual Property (IP) of a technical and/or business nature be communicated – often well before solutions are introduced to the marketplace and made available for purchase. Thus Platform’s new status as an IBM entity, has the potential to seriously complicate matters regarding risk, trust, etc., relating to the exchange of IP.

Although it’s been stated elsewhere that IBM will allow Platform measures of post-acquisition independence, I doubt that this’ll provide sufficient comfort for matters relating to IP. While NDAs specific to the new (and independent) Platform business unit within IBM may offer some measure of additional comfort, I believe that technically oriented approaches offer the greatest promise for mitigating concerns relating to risk, trust, etc., in the exchange of IP.

In principle, one possibility is the adoption of open standards by all stakeholders. Such standards hold the promise of allowing for the integration between products via documented interfaces and protocols, while allowing (proprietary) implementation specifics to remain opaque. Although this may sound appealing, the availability of such standards remains elusive – despite various, well-intended efforts (by HPC, Grid, Cloud, etc., communities).

While Platform’s traditional competitors predictably and understandably gorge themselves sharing FUD, it obviously behooves both Platform and IBM to expend some effort allaying the concerns of their customers and partner ecosystem.

I’d be interested to hear of others’ suggestions as to how this new business relationship might allow for sustained innovation in the HPC context from IBM-acquired Platform.

Disclaimer: Although I do not have a vested financial interest in this acquisition, I did work for Platform from 1998-2005.

To reiterate here then:

How can this new business relationship allow for sustained, partner-friendly innovation in the HPC context from IBM-acquired Platform?

Please feel free to share your thoughts on this via comments to this post.

Remembering Steve Jobs

I was doing some errands earlier this evening (Toronto time) … While I was in the car, the all-news station (680news) I had on played some of Steve Jobs’ 2005 commencement address to Stanford grads. As I listened, and later re-read my own blog post on discovering the same address, I’m struck on the event of his passing by the importance of valuing every experience in life. In Jobs’ case, he eventually leveraged his experience with calligraphy to design the typography for the Apple Mac – after a ten-year incubation period!

I think it’s time to read that Stanford commencement address again …

RIP Steve – and thanks much.