My Next Chapter in Distributed Computing: Joining Sylabs to Containerize HPC and Deep Learning

HPC in the Cloud?

Back in 2015, my long-time friend and scientific collaborator James (Jim) Freemantle suggested I give a talk to his local association of remote-sensing professionals. In hindsight, what would turn out to be much more important for me was his suggestion to juxtapose Cloud computing and High Performance Computing (HPC) in this talk. The abstract for my talk, High Performance Computing in the Cloud?, remains available via the Ontario Association of Remote Sensing (OARS) website here; it read:

High Performance Computing (HPC) in the Cloud is viable in numerous applications. Common to all successful applications for cloud-based HPC is the ability to embrace latency. Early successes were achieved with embarrassingly parallel HPC applications involving minimal amounts of data – in other words, there was little or no latency to be hidden. More recently the HPC-cloud community has become increasingly adept in its ability to ‘hide’ latency and, in the process, support increasingly more sophisticated HPC applications in public and private clouds. In this presentation, real-world applications, deemed relevant to remote sensing, will illustrate aspects of these sophistications for hiding latency in accounting for large volumes of data, the need to pass messages between simultaneously executing components of distributed-memory parallel applications, as well as (processing) workflows/pipelines. Finally, the impact of containerizing HPC for the cloud will be considered through the relatively recent creation of the Cloud Native Computing Foundation.

I delivered the talk in November 2015 at Toronto’s Ryerson University to a small, but engaged group, and made the slides available via the OARS website and Slideshare.

As you can read for yourself from the abstract and slides, or hear in the Univa webinar that followed in February 2016, I placed a lot of emphasis on latency in juxtaposing the cloud and HPC; from the perspective of HPC workloads, that emphasis remains justifiable today. Much later, however, in working hands-on on various cloud projects for Univa, I came to appreciate the challenges and opportunities that data introduces; but more on that (data) another time …

Cloud-Native HPC?

Also in hindsight, I am pleased to see that I made this relatively early connection (for me, anyway) with the ‘modern’ notion of what it means to be cloud native. Unfortunately, this is a phrase bandied about with reckless abandon at times – usage that renders the phrase devoid of meaning. So, in part, in my OARS talk and the Univa webinar, I presented cloud native as understood by the Cloud Native Computing Foundation:

Cloud native computing uses an open source software stack to deploy applications as microservices, packaging each part into its own container, and dynamically orchestrating those containers to optimize resource utilization.

Even a working, high-level knowledge of HPC is enough to appreciate that more than a little tension is likely to surface in realizing any vision of ‘cloud-native HPC’. Why? HPC applications have not traditionally been architected with microservices in mind; in fact, their implementations are the polar opposite of microservices. Therefore, taking existing HPC applications and simply executing them within a Docker container can present challenges as well as opportunities – even though numerous examples of successfully containerized HPC applications do exist (see, for example, the impressive array of case studies over at UberCloud).

In some respects, when it comes to containerizing HPC applications, this is just the tip of the iceberg. In following up on the Univa webinar with a Q&A blog on HPC in the Cloud in February 2016, I quoted Univa CTO Fritz Ferstl in regard to a question on checkpointing Docker containers:

The mainstream of the container ecosystem views them as ephemeral – i.e., you can just kill them, restart them (whether on the same node or elsewhere), and then they somehow re-establish ‘service’ (i.e., what they are supposed to do … even though this isn’t an intrinsic capability of a Docker container).

Whereas ephemeral resonates soundly with microservices-based applications, it is hardly a ‘good fit’ for HPC applications. And because they share numerous characteristics with traditional HPC applications, emerging applications and workloads in AI, Deep Learning and Machine Learning suffer a similar fate: they aren’t always a good fit for traditional containers along the implementation lines of Docker. Traditional HPC applications and workflows have long carried an implied requirement to tap into GPUs as computational resources, and more recently Deep Learning use cases have done the same. From nvidia-docker to the relatively recent and impressive integration between Univa Grid Engine and Docker, it’s blatantly evident that significant technical gymnastics are required to leverage GPUs from within a Docker container.

A Singular Fit

For these and other reasons, Singularity has been developed as a ‘vehicle’ for containerization that is simply a better fit for HPC and Deep Learning applications and their corresponding workflows. Because I have very recently joined the team at Sylabs, Inc. as a technical writer, you can expect to hear a whole lot more from me on containerization via Singularity – here or, even more frequently, over at the Sylabs blog and in Lab Notes.

Given that my acknowledged bias towards Distributed computing includes a significant dose of Cloud computing, I’m sure you can appreciate that it’s with far more than a casual degree of enthusiasm that I regard my new opportunity with Sylabs – a startup that is literally defining how GOV/EDU as well as enterprise customers can sensibly and easily acquire the benefits of containerizing their applications and workflows on everything from isolated PCs (laptops and desktops) to servers, VMs and/or instances in their datacenters and/or clouds.

From my ‘old’ friend/collaborator Jim to the team at Sylabs that I’ve just joined, and to everyone in between, it’s hard not to feel a sense of gratitude at this juncture. With HPC’s premier event less than a month away in Dallas, I look forward to reconnecting with ‘my peeps’ at SC18, and to ensuring they are aware of the exciting prospects Singularity brings to their organizations.

Preserving Content for Your Portfolio: Kudos to The Internet Archive

Preserving Science

I’ve been publishing articles since the last century.

In fact, my first legitimate publication was a letter to the science journal Nature with my then thesis supervisor (Keith Aldridge) in 1987 … that’s 31 years ago. Armed with nothing more than Google Scholar, searching for “aldridge lumb nature 1987” yields access to the article via Nature’s website in fractions of a second. Moreover, since the introduction of Digital Object Identifiers (DOIs) around the turn of the century (circa 2000), articles such as this one are uniquely identifiable and findable via a URL – e.g., the URL for our Nature letter is http://dx.doi.org/10.1038/325421a0.
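As a brief, hedged aside, here’s a minimal sketch of that findability in action – resolving the DOI above programmatically and reporting the publisher’s landing page it redirects to. The use of the requests library is simply an illustrative choice on my part:

```python
import requests

# Resolve the DOI cited above; doi.org redirects to the publisher's page.
doi_url = "http://dx.doi.org/10.1038/325421a0"

response = requests.get(doi_url, allow_redirects=True, timeout=10)
print("DOI resolves to:", response.url)      # the publisher's landing page
print("HTTP status:", response.status_code)  # 200 if openly accessible
```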

In this letter to Nature, Keith and I cite an 1880 publication authored by Lord Kelvin – who, it appears, is known for fluid dynamics in addition to the temperature scale that bears his name … and, of course, much more! Through this and other citations, Keith and I explicitly acknowledged how the contributions of others enabled us to produce our letter – in other words, we made it clear how we were able to stand on the shoulders of giants.

In addition to assigning intellectual credit where it is due, this personal reflection underscores the importance of preserving contributions over the long haul – make that 138 years in the case of Kelvin’s 1880 paper. Preservation is a well-established practice in the case of scientific journals, for example, even though it may be necessary to draw upon analog renditions captured via print or microfiche rather than some digital representation.

In self-curating portfolios recently, it’s been made increasingly clear to me that content preservation has not been a focal point in the digital realm.

Digital Properties

Let’s use Grid Computing as an illustrative example. In its heyday, a popular and reputable online magazine was GRIDtoday: “DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY”. Other than a passing reference in pioneering publisher Tom Tabor’s BIO (you can search for his BIO here), I expect you’ll be hard pressed to locate very much at all regarding this once-thriving online property. Like Grid Computing itself: GRIDtoday, gone tomorrow; RIP GRIDtoday. Of course, Grid Computing Planet (GCP) suffered a similar fate.

My purpose here is not to question those extremely reasonable business decisions that resulted in closing down operations at GCP or GRIDtoday – Tabor Communications, for example, boasts three prized ‘properties’ as of this writing … one of which (HPCwire) predates the inception of GRIDtoday, and remains a go-to source for all things HPC.

Grid Computing remains an important chapter in my professional life – especially given my claims for genetic imprinting via Distributed Computing. However, given my desire to assemble a portfolio of my work that includes Grid Computing, the /dev/null redirection of those bits that collectively represented GRIDtoday and GCP is problematic. In particular, and even though I collaborated upon articles and book chapters that have been preserved in analog and/or digital representations, articles contributed to GRIDtoday and GCP still retain value to me personally – value that I’d like to incorporate into my Portfolio.

Enter The Internet Archive

Fortunately, since close to the end of the last century, The Internet Archive has also been:

… building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, [they] provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge.

I’m not intending to imply that those items I was able to have published via GRIDtoday and GCP carry ‘a Kelvin of clout’; however, for more than purely sentimental reasons, it’s truly wonderful that The Internet Archive has attempted to preserve those artifacts that collectively comprised these publications in their heyday. Although I haven’t yet attempted to locate an article I wrote for GCP, I was able to retrieve two articles from the archive for GRIDtoday:

  • Towards The Telecosmic Grid – Published originally in December 2002, in this article I ‘channeled’ George Gilder in asserting that: “Isolating and manipulating discrete wavelengths of visible light across intelligent optical transport media results in the grid – a specific instance of The Telecosmic Grid. Several examples serve as beacons of possibility.” More on this soon (I hope) in a separate post that revisits this possibility.
  • Open Grid Forum: Necessary … but Sufficient? – Published originally in June 2006, this may be the most-opinionated article I’ve ever had appear in any media format! It generated a decent amount of traffic for GRIDtoday, as well as an interesting accusation – an accusation ‘leaked’, incidentally, through a mailing list archive.
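As an aside, The Internet Archive’s Wayback Machine also exposes a public availability API, which makes this sort of retrieval scriptable. What follows is only a minimal sketch – the GRIDtoday URL below is a hypothetical placeholder, not the actual address of either article:

```python
import requests

# Query the Wayback Machine's availability API for an archived snapshot.
WAYBACK_API = "https://archive.org/wayback/available"
target = "http://www.gridtoday.com/some/article.html"  # hypothetical URL

response = requests.get(WAYBACK_API, params={"url": target}, timeout=10)
response.raise_for_status()
closest = response.json().get("archived_snapshots", {}).get("closest")

if closest and closest.get("available"):
    print("Archived copy:", closest["url"], "captured at", closest["timestamp"])
else:
    print("No archived snapshot found for", target)
```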

Because these two GRIDtoday articles are currently accessible via The Internet Archive, I can include each of them directly in my Portfolio, and update my blog posts that make reference to them. Having laid intellectual claim (in 2002, I’ll have you know!!! 😉) to various possibilities telecosmic in nature, I’ll soon be able to revisit the same with the benefit of hindsight. Whereas I fully appreciate that business decisions need to be made, and that as a consequence once-popular landing pages necessarily disappear, it’s truly fortunate that The Internet Archive has our collective backs on this. So, if this post has any key takeaway, it’s simply this:

Please donate to The Internet Archive.

Thanks Brewster!

Disclosures Regarding My Portfolios: Attributing the Contributions of Others

‘Personal’ Achievement?

October 8, 2018 was an extremely memorable night for Drew Brees at the Mercedes-Benz Superdome in New Orleans. Under the intense scrutiny of Monday Night Football, the quarterback of the New Orleans Saints became the leading passer in the history of the National Football League. (For those not familiar with this sport, you can think of his 72,103-yard milestone as a lifetime-achievement accomplishment of ultramarathon’ic proportions.) The narrative on Brees’ contributions to ‘the game’ is anything but complete. In fact, the longer he plays, the more impressive this milestone becomes, as he continues to place distance between himself and every other NFL QB.

Of course the record books, and Brees’ inevitable induction into the Pro Football Hall of Fame, will all position this as an individual achievement. Whenever given the opportunity to reflect upon seemingly personal achievements such as becoming the all-time passing leader, Brees is quick to acknowledge those who have enabled him to be so stunningly successful in such a high-profile, high-pressure role – from family and friends, to teammates, coaches, and more.

As I wrote about another NFL quarterback, Tom Brady, in a recent post, Brees remains a student of the game. He is also known for the off-the-field work ethic that he practices with the utmost intensity in preparing for those moments when he takes the main stage along with his team. Therefore, when someone like Brees shares achievements with those around him, it’s clearly an act that is sincere and authentic.

Full Disclosure

At the very least, self-curating and publicly sharing some collection of your work has the potential to come across as an act of blatant self-indulgence – and, of course, to some degree it is! At the very worst, however, such an effort can come across as a purely individual contribution. Because contribution matters so much to me personally, I wanted to ensure that any portfolio I self-curate includes appropriate disclosures; disclosures that acknowledge the importance of collaboration, opportunity, support, and so on, from my family, friends and acquaintances, peers and co-workers, employers, customers and partners, sponsors, and more. In other words, and though in a very different context, like Brees I want to ensure that what comes across as ‘My Portfolio’ rightly acknowledges that this too is a team sport.

In the interests of generic disclosures then, the following is an attempt to ensure the efforts of others are known explicitly:

  • Articles, book chapters and posters – Based on authorships, affiliations and acknowledgements, portfolio artifacts such as articles, book chapters and posters make explicit collaborators, enablers and supporters/influencers, respectively. In this case, there’s almost no need for further disclosure.
  • Blog posts – Less formal than the written and oral forms of communication already alluded to above and below, it’s through the words themselves and/or hyperlinks introduced that the contributions of others are gratefully and willingly acknowledged. Fortunately, it is common practice for page-ranking algorithms to take into account the words and metadata that collectively comprise blog posts, and appropriately afford Web pages stronger rankings based upon these and other metrics.
  • Presentations – My intention here is to employ Presentations as a disclosure category for talks, webinars, workshops, courses, etc. – i.e., all kinds of oral communications that may or may not be recorded. With respect to this category, my experience is ‘varied’ – e.g., in not always allowing for full disclosure regarding collaborators, though less so regarding affiliations. Therefore, to make collaborators as well as supporters/influencers explicit, contribution attributions are typically included in the materials I’ve shared (e.g., the slides corresponding to my GTC17 presentation) and/or through the words I’ve spoken. Kudos are also warranted for the organizations I’ve represented in some of these cases as well, as it has been a byproduct of this representation that numerous opportunities have fallen into my lap – though often owing to a sponsorship fee, to be completely frank. Finally, sponsoring organizations are also deserving of recognition, as it is often their mandate (e.g., a lead-generation marketing program that requires a webinar, a call for papers/proposals) that inspires what ultimately manifests itself as some artifact in one of my portfolios; having been on the event-sponsor’s side more than a few times, I am only too well aware of the effort involved in creating the space for presentations … a contribution that cannot be ignored.

Disclosures regarding contribution thus range from explicit to vague – from clearly to barely evident. Regardless, for those portfolios shared via my personal blog (Data Science Portfolio and Cloud Computing Portfolio), suffice it to say that there were always others involved. I’ve done my best to make those contributions clear; however, I’m sure that unintentional omissions, errors and/or (mis)representations exist. Given that these portfolios are intentionally positioned and executed as works-in-progress, I look forward to addressing such matters as they arise.

Ian Lumb’s Cloud Computing Portfolio

When I first introduced it, it made sense to me (at the time, at least!) to divide my Data Science Portfolio into two parts; the latter part was “… intended to showcase those efforts that have enabled other Data Scientists” – in other words, my contributions as a Data Science Enabler.

As of today, most of what was originally placed in that latter part of my Data Science Portfolio has been transferred to a new portfolio – namely one that emphasizes Cloud computing. Thus my Cloud Computing Portfolio is a self-curated, online, multimedia effort intended to draw together into a cohesive whole my efforts in Cloud computing; specifically this new Portfolio is organized as follows:

  • Strictly Cloud – A compilation of contributions in which Cloud computing takes centerstage
  • Cloud-Related – A compilation of contributions drawn from clusters and grids to miscellany. Also drawn out in this section, however, are contributions relating to containerization.

As with my Data Science Portfolio, you’ll find in my Cloud Computing Portfolio everything from academic articles and book chapters, to blog posts, to webinars and conference presentations – in other words, this Portfolio also lives up to its multimedia billing!

Since this is intentionally a work-in-progress, like my Data Science Portfolio, feedback is always welcome, as there will definitely be revisions applied!

PyTorch Then & Now: A Highly Viable Framework for Deep Learning

Why PyTorch Then?

In preparing for a GTC 2017 presentation, I was driven to emphasize CUDA-enabled GPUs as the platform upon which I’d run my Machine Learning applications. Although I’d already had some encouraging experience with Apache Spark’s MLlib in a classification problem, ‘porting’ from in-memory computations on CPUs to GPUs was and remains ‘exploratory’ – with, perhaps, the notable exception of a cloud-based offering from Databricks themselves. Instead, in ramping up for this Silicon Valley event, I approached this ‘opportunity’ with an open mind and began my GPU-centric effort at an NVIDIA page for developers. As I wrote post-event in August 2017:

Frankly, the outcome surprised me: As a consequence of my GTC-motivated reframing, I ‘discovered’ Natural Language Processing (NLP) – broadly speaking, the use of human languages by a computer. Moreover, by reviewing the breadth and depth of possibilities for actually doing some NLP on my Twitter data, I subsequently ‘discovered’ PyTorch – a Python-based framework for Deep Learning that can readily exploit GPUs. It’s important to note that PyTorch is not the only choice available for engaging in NLP on GPUs, and it certainly isn’t the most-obvious choice. As I allude to in my GTC presentation, however, I was rapidly drawn to PyTorch.

Despite that most-obvious choice (I expect) of TensorFlow, I selected PyTorch for reasons that included the following:

Not bad for version 0.1 of a framework, I’d say! In fact, by the time I was responding to a referee’s feedback in revising a book chapter (please see “Refactoring Earthquake-Tsunami Causality with Big Data Analytics” under NLP in my Data Science Portfolio), PyTorch had been revised to version 0.2.0. This was a very welcome revision in the context of that chapter revision, as it included a built-in method for computing cosine similarities (“cosine_similarity”) – the key discriminator for quantitatively assessing the semantic similarity between two word vectors.
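For the curious, here’s a minimal sketch of that discriminator using today’s PyTorch API; the 300-dimensional random vectors are merely stand-ins for real word embeddings, assumed purely for illustration:

```python
import torch
import torch.nn.functional as F

# Stand-ins for trained word embeddings (e.g., for "earthquake" and "tsunami");
# real vectors would come from an embedding layer or a pre-trained model.
vec_a = torch.randn(1, 300)
vec_b = torch.randn(1, 300)

# cosine_similarity measures the angle between the vectors: values near 1.0
# suggest semantic similarity, values near 0.0 (or negative) suggest little.
similarity = F.cosine_similarity(vec_a, vec_b, dim=1)
print(f"cosine similarity: {similarity.item():.4f}")
```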

Perhaps my enthusiasm for PyTorch isn’t all that surprising, as I do fit into one of their identified user profiles:

PyTorch has gotten its biggest adoption from researchers, and it’s gotten about a moderate response from data scientists. As we expected, we did not get any adoption from product builders because PyTorch models are not easy to ship into mobile, for example. We also have people who we did not expect to come on board, like folks from OpenAI and several universities.

Towards PyTorch 1.0

In this same August 2017 O’Reilly podcast (from which I extracted the above quote on user profiles), Facebook’s Soumith Chintala stated:

Internally at Facebook, we have a unified strategy. We say PyTorch is used for all of research and Caffe 2 is used for all of production. This makes it easier for us to separate out which team does what and which tools do what. What we are seeing is, users first create a PyTorch model. When they are ready to deploy their model into production, they just convert it into a Caffe 2 model, then ship into either mobile or another platform.

Perhaps it’s not entirely surprising, then, that the 1.0 release intends to “… marry PyTorch and Caffe2 which gives the production-level readiness for PyTorch.” My understanding is that researchers (and others) retain the highly favorable benefit of developing in PyTorch but then, via the new JIT compiler, acquire the ability to deploy into production via Caffe2 or “… [export] to C++-only runtimes for use in larger projects”; thus PyTorch 1.0’s production reach extends to runtimes other than just Python-based ones – e.g., those that drive iOS, Android and other mobile devices. With TensorFlow already having emerged as the ‘gorilla of all frameworks’, this productionizing choice in the implementation of PyTorch will be well received by Facebook and other proponents of Caffe2.
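To make that develop-in-PyTorch, deploy-elsewhere path a little more concrete, here’s a minimal sketch of tracing a model with the JIT compiler so it can be loaded from a Python-free runtime. The use of torchvision’s resnet18 is purely an illustrative assumption on my part:

```python
import torch
import torchvision

# Develop and experiment in eager-mode PyTorch ...
model = torchvision.models.resnet18(pretrained=True).eval()
example_input = torch.rand(1, 3, 224, 224)

# ... then trace with the JIT compiler to produce a serialized module that
# C++-only (or mobile) runtimes can load without a Python interpreter.
traced = torch.jit.trace(model, example_input)
traced.save("resnet18_traced.pt")  # e.g., loaded later via torch::jit::load()
```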

The productionization of PyTorch also includes:

  • A C++ frontend – “… a pure C++ interface to the PyTorch backend that follows the API and architecture of the established Python frontend …” that “… is intended to enable research in high performance, low latency and bare metal C++ applications.”
  • Distributed PyTorch enhancements – Originally introduced in version 0.2.0 of PyTorch, “… the torch.distributed package … allows you to exchange Tensors among multiple machines.” Otherwise a core competence of distributed TensorFlow, this ability to introduce parallelism via distributed processing becomes increasingly important as Deep Learning applications and their workflows transition from prototypes into production – e.g., as the demands of training escalate. In PyTorch 1.0, use of a new library (“C10D”) is expected to significantly enhance performance while enabling asynchronous communications – even when use is made of the familiar-to-HPC-types Message Passing Interface (MPI). (See the sketch following this list.)
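Here’s the promised sketch of the torch.distributed package in action – a simple all-reduce across processes. The choice of the MPI backend is an assumption (it requires a PyTorch build with MPI support; “gloo” or “nccl” could be substituted), and the script would be launched via an appropriate launcher such as mpirun:

```python
import torch
import torch.distributed as dist

# Join the process group; the backend must match how PyTorch was built
# and how the job was launched (e.g., mpirun for the MPI backend).
dist.init_process_group(backend="mpi")

rank = dist.get_rank()
world_size = dist.get_world_size()

# Each rank contributes a tensor; all_reduce sums them across all processes.
tensor = torch.ones(1) * (rank + 1)
dist.all_reduce(tensor, op=dist.ReduceOp.SUM)

print(f"rank {rank} of {world_size}: reduced value = {tensor.item()}")
```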

In May 2018, over on Facebook’s developer-centric blog, Bill Jia posted:

Over the coming months, we’re going to refactor and unify the codebases of both the Caffe2 and PyTorch 0.4 frameworks to deduplicate components and share abstractions. The result will be a unified framework that supports efficient graph-mode execution with profiling, mobile deployment, extensive vendor integrations, and more.

As of this writing, a version 1 release candidate for PyTorch 1.0 is available via GitHub.

Stable releases for previous versions are available for ‘local’ or cloud use.

Key Takeaway: Why PyTorch Now!

Whereas it might’ve been a no-brainer to adopt TensorFlow as your go-to framework for all of your Deep Learning needs, I found early releases of PyTorch to be an effective enabler over a year ago – when it was only at the 0.2.0 release stage! Fortunately, the team behind PyTorch has continued to advance the capabilities offered – capabilities that are soon to officially include production-ready distributed processing. If you’re unaware of PyTorch, or bypassed it in the past, it’s likely worth another look right now.

Data Science: Identifying My Professional Bias


In the Summer of 1984, I arrived at Toronto’s York University as a graduate student in Physics & Astronomy. (Although my grad programme was Physics & Astronomy, my research emphasized the application of fluid dynamics to Earth’s deep interior.) Some time after that, I ran my first non-interactive computation on a cluster of VAX computers. I’m not sure if this was my first exposure to Distributed Computing or not; I am, however, fairly certain that this was the first time it (Distributed Computing) registered with me as something exceedingly cool, and exceedingly powerful.

Even back in those days, armed with nothing more than a VT100 terminal ultimately connected to a serial interface on one of the VAXes, I could be logged in and able to submit a computational job that might run on some other VAX participating in the cluster. The implied connectedness, the innate ability to make use of compute cycles on some ‘remote’ system was intellectually intoxicating – and I wasn’t even doing any parallel computing (yet)!

More than a decade later, while serving in a staff role as a computer coordinator, I became involved in procuring a modest supercomputer for those members of York’s Faculty of Pure & Applied Science who made High Performance Computing (HPC) a critical component of their research. If memory serves me correctly, this exercise resulted in the purchase of a NUMA-architecture system from SGI powered by MIPS CPUs. Though isolated initially, and as a component of the overall solution, Platform LSF was included to manage the computational workloads that would soon consume the resources of this SGI system.

The more I learned about Platform LSF, the more I was smitten by the promise and reality of Distributed Computing – a capability to be leveraged from a resource-centric perspective with this Load Sharing Facility (LSF). [Expletive deleted], Platform founder Songnian Zhou expressed the ramifications of his technical vision for this software as Utopia in a 1993 publication. Although buying the company wasn’t an option, I did manage to be hired by Platform, and work there in various roles for about seven-and-a-half years.

Between my time at Platform (now an IBM company) and, much more recently, Univa, over a decade of my professional experience has been focused on managing workloads in Distributed Computing environments. From a small handful of VAXes to core counts that have reached 7 figures, these environments have included clusters, grids and clouds.

My professional bias towards Distributed Computing was further enhanced through the experience of being employed by two software vendors who emphasized the management of clusters – namely Scali (Scali Manage) and subsequently Bright Computing (Bright Cluster Manager). Along with Univa (Project Tortuga and Navops Launch), Bright extended their reach to the management of HPC resources in various cloud configurations.

If it weren’t for a technical role at Allinea (subsequently acquired by ARM), I might have ended up ‘stuck in the middle’ of the computational stack – as workload and cluster management is regarded by the HPC community (at least) as middleware … software that exists between the operating environment (i.e., the compute node and its operating system) and the toolchain (e.g., binaries, libraries) that ultimately supports applications and end users (e.g., Figure 5 here).

Allinea focuses on tools that enable HPC developers. Although they were in the process of broadening their product portfolio to include a profiling capability around the time of my departure, during my tenure there the emphasis was on a debugger – a debugger capable of handling code targeted for (you guessed it) Distributed Computing environments.

Things always seemed so much bigger when we were children. Whereas Kid Ian was impressed by a three-node VAX cluster, and later ‘blown away’ by a modest NUMA-architecture ‘supercomputer’, Adult Ian had the express privilege of running Allinea DDT on some of the largest supercomputers on the planet (at the time) – tracking down a bug that only showed up when more than 20K cores were used in parallel on one of Argonne’s Blue Genes, and demonstrating scalable, parallel debugging during a tutorial on some 700K cores of NCSA’s Blue Waters supercomputer. In hindsight, I can’t help but feel humbled by this impressive capability of Allinea DDT to scale to these extremes. Because HPC’s appetite for scale has extended beyond tera- and petascale capabilities, and is now seriously eyeing the demands of exascale, software like Allinea DDT also needs to match this penchant for extreme scale.

At this point, suffice it to say that scalable Distributed Computing has been firmly encoded into my professional DNA. As with my scientifically based academic bias, it’s difficult not to frame my predisposition towards Distributed Computing in a positive light within the current context of Data Science. Briefly, it’s a common experience for the prototype-to-production transition to include the introduction of Distributed Computing – if only to execute applications and/or their workflows on more powerful computers, but perhaps also to scale these simultaneously in parallel.

I anticipate the need to return to this disclosure regarding the professional bias I bring to Data Science. For now though, calling out the highly influential impact Distributed Computing has had on my personal trajectory appears warranted within the context of my Data Science Portfolio.

Incorporate the Cloud into Existing IT Infrastructure => Progress (Life Sciences)

I still have lots to share after recently attending Bio-IT World in Boston … The latest comes as a Bright Computing contributed article to the April 2013 issue of the IEEE Life Sciences Newsletter.

The upshot of this article is:

Progress in the Life Sciences demands extension of IT infrastructure from the ground into the cloud.

Feel free to respond here with your comments.

Over “On the Bright side …” Cloud Use Cases from Bio-IT World

As some of you know, I’ve recently joined Bright Computing.

Last week, I attended Bio-IT World 2013 in Boston. Bright had an excellent show – lots of great conversations, and even an award!

During numerous conversations, the notion of extending on-site IT infrastructure into the cloud was raised. Bright has an excellent solution for this.

What also emerged during the conversations were two uses for this extension of local IT resources via the cloud. I thought this was worth capturing and sharing. You can read about the use cases I identified over at “On the Bright side …”.

Foraging for Resources in the Multicore Present and Future

HPC consultant Wolfgang Gentzsch has thoughtfully updated the case for multicore architectures in the HPC context. Over on LinkedIn, via one of the HPC discussion groups, I responded with:

I also enjoyed your article, Wolfgang – thank you. Notwithstanding the drive towards cluster-on-a-chip architectures, HPC customers will require workload managers (WLMs) that interface effectively and efficiently with O/S-level features/functionalities (e.g., MCOPt Multicore Manager from eXludus for Linux, to re-state your example). To me, this is a need well evidenced in the past: For example, various WLMs were tightly integrated with IRIX’s cpuset functionality (http://www.sgi.com/products/software/irix/releases/irix658.html) to allow for topology-aware scheduling in this NUMA-based offering from SGI. In present and future multicore contexts, the appetite for petascale and exascale computing will drive the need for such WLM-O/S integrations. In addition to the multicore paradigm, what makes ‘this’ future particularly interesting is that some of these multicore architectures will exist in a hybrid (CPU/GPU) cloud – a cloud that may complement in-house resources via some bursting capability (e.g., Bright’s cloud bursting, http://www.brightcomputing.com/Linux-Cluster-Cloud-Bursting.php). As you also well indicated in your article, it is incumbent upon all stakeholders to ensure that this future is as friendly as possible (e.g., for developers and users). To update a phrase originally spun by Herb Sutter (http://www.gotw.ca/publications/concurrency-ddj.htm) in the multicore context, not only is the free lunch over, it’s getting tougher to find and ingest lunches you’re willing to pay for!

We certainly live in interesting times!

Early Win Required for Partner-Friendly, Post-Acquisition Platform Computing

Further to the LinkedIn discussion on the relatively recent acquisition of Platform by IBM, I just posted:

Platform CEO and Founder Songnian Zhou has this to say regarding the kernel of this discussion:

“IBM expects Platform to operate as a coherent business unit within its Systems and Technology Group. We got some promises from folks at IBM. We will accelerate our investments and growth. We will deliver on our product roadmaps. We will continue to provide our industry-best support and services. We will work even harder to add value to our partners, including IBM’s competitors. We want to make new friends while keeping the old, for one is silver while the other is gold. We might even get to keep our brand name. After all, distributed computing needs a platform, and there is only one Platform Computing. We are an optimistic bunch. We want to deliver to you the best of both worlds – you know what I mean. Give us a chance to show you what we can do for you tomorrow. Our customers and partners have journeyed with Platform all these years and have not regretted it. We are grateful to them eternally.”

Unsurprisingly upbeat, Zhou, Platform and IBM really do require that customers and partners give them a chance to prove themselves under the new business arrangement. As noted in my previous comment in this discussion, this’ll require some seriously skillful stickhandling to skirt around challenging issues such as IP (Intellectual Property) – a challenge that is particularly exacerbated by the demands of the tightly coupled integrations required to deliver tangible value in the HPC context.

How might IBM-acquired Platform best demonstrate that it’s true to its collective word:

“Give us a chance to show you what we can do for you tomorrow.”

Certainly one way is to strike an early win with a partner – a win that demonstrates that they (Zhou, Platform and IBM) are true to their collective word. Aspects of this demonstration should likely include:

  • IP handling disclosures. Post-acquisition Platform and the partner should be as forthcoming as possible with respect to IP (Intellectual Property) handling – i.e., they should collectively communicate how business and technical IP challenges were handled in practice.
  • Customer validation. Almost self-evidently, such a demonstration has negligible value without validation by a customer willing to publicly state why they are willing to adopt the corresponding solution.
  • HPC depth. This demonstration has to comprise a whole lot more than merely porting a Platform product to a partner’s platform that would traditionally be viewed as competitive to IBM. As stated previously, herein lies the conundrum: “To deliver a value-rich solution in the HPC context, Platform has to work (extremely) closely with the ‘system vendor’. In many cases, this closeness requires that Intellectual Property (IP) of a technical and/or business nature be communicated …”

Thus, in the fullness of time, as attention shifts to post-acquisition Platform, trust becomes the watchword for continued success – particularly in HPC.

For without trust, there will be no opportunity for demonstrations such as the early win outlined here.

How else might IBM-acquired Platform demonstrate that it’s business-better-than-usual?

Feel free to add your $0.02.