Preserving Content for Your Portfolio: Kudos to The Internet Archive

Preserving Science

I’ve been publishing articles since the last century.

In fact, my first legitimate publication was a letter to the science journal Nature with my then thesis supervisor (Keith Aldridge) in 1987 … that’s 31 years ago. Armed with nothing more than Google Scholar, searching for “aldridge lumb nature 1987” yields access to the article via Nature’s website in fractions of a second. Moreover, since the introduction of Digital Object Identifiers (DOIs) around the turn of the millennium (circa 2000), articles such as this one are uniquely identifiable and findable via a URL – e.g., the URL for our Nature letter is http://dx.doi.org/10.1038/325421a0.

In this letter to Nature, Keith and I cite an 1880 publication authored by Lord Kelvin – who, it appears, is known for fluid dynamics in addition to the temperature scale that bears his name … and, of course, much more! Through this and other citations, Keith and I explicitly acknowledged how the contributions of others enabled us to produce our letter – in other words, we made it clear how we were able to stand on the shoulders of giants.

In addition to assigning intellectual credit where it is due, this personal reflection underscores the importance of preserving contributions over the long haul – make that 138 years in the case of Kelvin’s 1880 paper. Preservation is a well-established practice in the case of scientific journals, for example, even though it may be necessary to draw upon analog renditions captured via print or microfiche rather than some digital representation.

In recently self-curating portfolios of my own work, it’s become increasingly clear to me that content preservation has not been a focal point in the digital realm.

Digital Properties

Let’s use Grid Computing to provide an illustrative example. In its heyday, a popular and reputable online magazine was GRIDtoday: “DAILY NEWS AND INFORMATION FOR THE GLOBAL GRID COMMUNITY”. Other than a passing reference in pioneering publisher Tom Tabor’s BIO (you can search for his BIO here), I expect you’ll be hard-pressed to locate very much at all regarding this once-thriving online property. Like Grid Computing itself: GRIDtoday, gone tomorrow; RIP GRIDtoday. Of course, Grid Computing Planet (GCP) suffered a similar fate.

My purpose here is not to question those extremely reasonable business decisions that resulted in closing down operations on GCP or GRIDtoday – Tabor Communications, for example, boasts three prized ‘properties’ as of this writing … one of which (HPCwire) predates the inception of GRIDtoday, and remains a go-to source for all things HPC.

Grid Computing remains an important chapter in my professional life – especially given my claims for genetic imprinting via Distributed Computing. However, given my desire to assemble a portfolio of my work that includes Grid Computing, the /dev/null redirection of those bits that collectively represented GRIDtoday and GCP is problematic. In particular, and even though I collaborated upon articles and book chapters that have been preserved in analog and/or digital representations, the articles I contributed to GRIDtoday and GCP still retain value to me personally – value that I’d like to incorporate into my Portfolio.

Enter The Internet Archive

Fortunately, also since close to the end of the last century, The Internet Archive has been:

… building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, [they] provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge.

I’m not intending to imply that those items I was able to have published via GRIDtoday and GCP carry ‘a Kelvin of clout’; however, for more than purely sentimental reasons, it’s truly wonderful that The Internet Archive has attempted to preserve those artifacts that collectively composed these publications in their heyday. Although I haven’t yet attempted to locate an article I wrote for GCP, I was able to retrieve two articles from the archive for GRIDtoday:

  • Towards The Telecosmic Grid – Published originally in December 2002, in this article I ‘channeled’ George Gilder in asserting that: “Isolating and manipulating discrete wavelengths of visible light across intelligent optical transport media results in the grid – a specific instance of The Telecosmic Grid. Several examples serve as beacons of possibility.” More on this soon (I hope) in a separate post that revisits this possibility.
  • Open Grid Forum: Necessary … but Sufficient? – Published originally in June 2006, this may be the most-opinionated article I’ve ever had appear in any media format! It generated a decent amount of traffic for GRIDtoday, as well as an interesting accusation – an accusation ‘leaked’, incidentally, through a mailing list archive.

Because these two GRIDtoday articles are currently accessible via The Internet Archive, I can include each of them directly in my Portfolio, and update my blog posts that make reference to them. Having laid intellectual claim (in 2002, I’ll have you know!!! 😉) to various possibilities telecosmic in nature, I’ll soon be able to revisit the same through the guise of hindsight. Whereas I fully appreciate that business decisions need to be made, and as a consequence once-popular landing pages necessarily disappear, it’s truly fortunate that The Internet Archive has our collective backs on this. So, if this post has a key takeaway, it’s simply this:

Please donate to The Internet Archive.

Thanks Brewster!

Towards Tsunami Informatics: Applying Machine Learning to Data Extracted from Twitter

2018 Sulawesi Earthquake & Tsunami

Even in 2018, our ability to provide accurate tsunami advisories and warnings is exceedingly challenged.

In best-case scenarios, advisories and warnings afford inhabitants of low-lying coastal areas minutes or (hopefully) longer to react.

In best-case scenarios, advisories and warnings are based upon in situ measurements via tsunameters – as ocean-bottom changes in seawater pressure serve as reliable precursors of impending tsunami arrival. (By way of analogy, tsunameters ‘see’ tsunamis much as radars ‘see’ precipitation. Based on ‘sight’ then, both offer a reasonable ability to ‘nowcast’.)

In typical scenarios, however, advisories and warnings can communicate mixed messages. In the case of the recent Sulawesi earthquake and tsunami for example, a nearby alert (for the Makassar Strait) was retracted after some 30 minutes, even though Palu, Indonesia experienced a ‘localized’ tsunami that resulted in significant losses – with current estimates placing the number of fatalities at more than 1200 people.

With ultimate regret stemming from the significant loss of human life, the recent case of the residents of Palu is particularly painful, as alerting was not informed by tsunameter measurements owing to an ongoing dispute – an unresolved dispute that rendered the deployment of an array of tsunameters incomplete and inoperable. A dispute that, if resolved, could’ve provided this low-lying coastal area with accurate and potentially life-saving alerts.

Lessons from Past Events

It’s been only 5,025 days since the last tsunami to devastate Indonesia – the Boxing Day 2004 event in the Indian Ocean. All things considered, it’s truly wonderful that a strategic effort to deploy a network of tsunameters in this part of the planet was in place; of course, it’s well beyond tragic that execution of the project was significantly hampered, and that almost 14 years later, inhabitants of this otherwise idyllic setting are left to suffer loss of such epic proportions.

I’m a huge proponent of tsunameters as accurate indicators of last resort for tsunami alerting. In their absence, the norm is advisories and warnings that may deliver accurate alerts – “may” being the operative word here, as it is often the case that alerts are issued only to be retracted at some future time … as was the case again for the recent Sulawesi event. Obviously, tsunami centers that ‘cry wolf’ run the risk of not being taken seriously – even, perhaps, when they have correctly predicted an event of some significance.

It’s not that those scientific teams of geographers, geologists, geophysicists, oceanographers and more are in any way lax in attempting to do their jobs; it’s truly that the matter of tsunami prediction is exceedingly difficult. For example, unless you caught the January 2006 issue of Scientific American as I happened to, you’d likely be unaware that 4,933 days ago an earthquake affected (essentially) the same region as the Boxing Day 2004 event; regarded as a three-month-later aftershock, this event of similar earthquake magnitude and tectonic setting did not result in a tsunami.

Writing in this January 2006 issue of Scientific American, Geist et al. compared the two Indian Ocean events side-by-side – using one of those diagrams that this magazine is lauded for. The similarities between the two events are compelling. The seemingly subtle differences, however, are much more than compelling – as the earlier, tsunami-producing event bears testimony.

As a student of theoretical, global geophysics, but not specifically oceanography, seismology, tectonophysics or the like, I was unaware of the ‘shocking differences’ between these two events. However, my interest was captivated instantaneously!

Towards Tsunami Informatics

Graph Analytics?

It would take, however, some 3,000 days for my captivated interest to be transformed into a scientific communication. On the heels of successfully developing a framework and platform for knowledge representation with long-time friend and collaborator Jim Freemantle and others, our initial idea was to apply graph analytics to data extracted from Twitter – thus acknowledging that Twitter has the potential to serve as a source of data that might be of value in the context of tsunami alerting.

In hindsight, it’s fortunate that Jim and I did not spend a lot of time on the graph-analytics approach. In fact, arguably the most-valuable outcome from the poster we presented at a computer-science conference in June 2014 (HPCS, Halifax, Nova Scotia) was Jim’s Perl script (see, e.g., Listing 1 of our subsequent unpublished paper, or Listing 1.1 of our soon-to-be published book chapter) that extracted keyword-specified data (e.g., “#earthquake”) from Twitter streams.
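For readers curious about what such an extraction step looks like, here is a minimal Python sketch of keyword-based tweet filtering. It is not Jim’s Perl script (that appears as Listing 1 / Listing 1.1 in the works cited above); the keyword pattern, the JSON-lines input format, and the function name are assumptions for illustration only.

```python
import json
import re
import sys

# Hypothetical keyword pattern; the original Perl script filtered Twitter
# streams on user-specified keywords such as "#earthquake".
KEYWORDS = re.compile(r"#?(earthquake|tsunami)", re.IGNORECASE)

def filter_tweets(lines, pattern=KEYWORDS):
    """Yield tweet texts matching the keyword pattern from a stream of JSON lines."""
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        text = tweet.get("text", "")
        if pattern.search(text):
            yield text

if __name__ == "__main__":
    # Read JSON-lines tweet records from stdin; print matching tweet texts.
    for text in filter_tweets(sys.stdin):
        print(text)
```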

Machine Learning: Classification

About two years later, stemming from conversations at the March 2016 Rice University Oil & Gas Conference in Houston, our efforts began to emphasize Machine Learning over graph analytics. Driving for results to present at a May 2016 Big Data event at Prairie View A&M University (PVAMU, also in the Houston area), a textbook example (literally!) taken from the pages of an O’Reilly book on Learning Spark showed some promise in allowing Jim and me to classify tweets – with hammy tweets encapsulating something deemed geophysically interesting, whereas spammy ones not so much. ‘Not so much’ was determined through supervised learning – in other words, the results reported were achieved after a manual classification of tweets for the purpose of training the Machine Learning models. The need for manual training, and the absence of semantics, struck the two of us as ‘lacking’ from the outset; more specifically, each tokenized word of each tweet was represented as a feature vector – stated differently, data and metadata (e.g., Twitter handles, URLs) were all represented with the same (lacking) degree of semantic expression. Based upon our experience with knowledge-representation frameworks, we immediately sought a semantically richer solution.
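For context, the flavour of that textbook example looks roughly like the following PySpark sketch, which hashes tokenized tweets into feature vectors and trains a logistic-regression classifier on manually labelled ‘ham’ and ‘spam’ files. The file names, feature count and test tweet are assumptions; the Learning Spark book used an SGD-based logistic regression, whereas this sketch uses the L-BFGS variant from the same (older, RDD-based) MLlib API.

```python
from pyspark import SparkContext
from pyspark.mllib.feature import HashingTF
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.classification import LogisticRegressionWithLBFGS

sc = SparkContext(appName="TweetHamSpam")

# Hypothetical files of manually classified tweets (one tweet per line).
ham = sc.textFile("ham_tweets.txt")    # geophysically interesting
spam = sc.textFile("spam_tweets.txt")  # not so much

# Hash each tokenized tweet into a fixed-length feature vector.
tf = HashingTF(numFeatures=10000)
ham_features = ham.map(lambda tweet: tf.transform(tweet.split(" ")))
spam_features = spam.map(lambda tweet: tf.transform(tweet.split(" ")))

# Label the examples (1 = ham, 0 = spam) and train a classifier.
training = ham_features.map(lambda f: LabeledPoint(1, f)).union(
    spam_features.map(lambda f: LabeledPoint(0, f))).cache()
model = LogisticRegressionWithLBFGS.train(training)

# Classify an unseen (hypothetical) tweet.
test = tf.transform("magnitude 7.5 #earthquake reported near Palu".split(" "))
print(model.predict(test))  # 1 if deemed 'hammy', 0 otherwise
```

Note how every token, whether a word, a Twitter handle or a URL, becomes just another hashed feature – precisely the flat, semantics-free representation described above.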

Machine Learning: Natural Language Processing

It wasn’t until after I’d made a presentation at GTC 2017 in Silicon Valley the following year that the idea of representing words as embedded vectors would register with me. Working with Jim, two unconventional choices were made – namely, GloVe over word2vec and PyTorch over TensorFlow. Whereas academic articles justified our choice of Stanford’s GloVe, the case for PyTorch was made on less-rigorous grounds – grounds expounded in my GTC presentation and our soon-to-be published book chapter.

Our uptake of GloVe and PyTorch addressed our scientific imperative, as results were obtained for the 2017 instantiation of the same HPCS conference where this idea of tsunami alerting (based upon data extracted from Twitter) was originally hatched. In employing Natural Language Processing (NLP) via embedded word vectors, Jim and I were able to quantitatively explore tweets as word-based time series based upon their co-occurrences – stated differently, this word-vector quantification is based upon ‘the company’ (usage associations) that words ‘keep’. By referencing the pre-trained word vectors (derived from large corpora) available from the GloVe project, we were able to explore “earthquake” and “tsunami” in terms of distances, analogies and various kinds of similarities (e.g., cosine similarity).
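As an illustration of what ‘the company words keep’ buys you, the following sketch loads one of the pre-trained GloVe vector files and computes cosine similarities for a few word pairs. The file name (glove.6B.100d.txt) and the choice of comparison words are assumptions; only the file format (a word followed by its space-separated vector components) comes from the GloVe project.

```python
import numpy as np

def load_glove(path):
    """Load pre-trained GloVe vectors (one word plus its components per line)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

glove = load_glove("glove.6B.100d.txt")  # assumed local copy of a GloVe file

for a, b in [("earthquake", "tsunami"),
             ("earthquake", "aftershock"),
             ("earthquake", "weather")]:
    print(a, b, cosine_similarity(glove[a], glove[b]))
```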

Event-Reanalysis Examples

Our NLP approach appeared promising enough that we closed out 2017 with a presentation of our findings to date during an interdisciplinary session on tsunami science at the Fall Meeting of the American Geophysical Union held in New Orleans. To emphasize the scientific applicability of our approach, Jim and I focused on reanalyzing two pairs of events (see Slide 10 here). Like the pair identified years previously in the 2006 Scientific American article, each of the more-recent event pairs we chose paired an earthquake-only event with a tsunamigenic one originating in close geographic proximity, with similar oceanic and tectonic settings.

The most-promising results we reported (see slides 11 and 12 here and below) involved those cosine similarities obtained for earthquake-only versus tsunamigenic events; evident via clustering, the approach appears able to discriminate between the two classes of events based upon data extracted from Twitter. Even in our own estimation, however, the clustering is weakly discriminating at best, and we expect to apply more-advanced approaches for NLP to further separate the classes of events.
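By way of a sketch only (the actual analysis lives in the slides and book chapter referenced above), the discrimination step might look something like the following: average the GloVe vectors of the tweets gathered around each event, then cluster the resulting event vectors. The event names, placeholder tweets, GloVe file path and the choice of k-means are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Load GloVe vectors (assumed local file; word followed by its components per line).
glove = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

def tweet_vector(tweet, dim=100):
    """Average the GloVe vectors of a tweet's in-vocabulary tokens."""
    tokens = [t for t in tweet.lower().split() if t in glove]
    if not tokens:
        return np.zeros(dim, dtype=np.float32)
    return np.mean([glove[t] for t in tokens], axis=0)

def event_vector(tweets):
    """Represent an event by the mean vector of its associated tweets."""
    return np.mean([tweet_vector(t) for t in tweets], axis=0)

# Placeholder tweet collections; real data came from keyword-filtered streams.
events = {
    "earthquake_only_event": ["strong earthquake felt across the region",
                              "aftershocks continue following the earthquake"],
    "tsunamigenic_event": ["tsunami waves reported after major earthquake",
                           "coastal evacuation underway as tsunami arrives"],
}

X = np.vstack([event_vector(tweets) for tweets in events.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(events.keys(), labels)))  # which cluster each event falls into
```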

[Slides: Agile Sprints – Events – 2017 AGU Fall Meeting – Twitter Tsunami – December 8, 2017]

Discussion

Ultimately, the ability to further validate and operationally deploy this alerting mechanism would require that the data from Twitter be streamed and processed in real time – a challenge that some containerized implementation of Apache Spark would seem ideally suited to, for example. (Aspects of this Future Work are outlined in the final section of our HPCS 2017 book chapter.)
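As a sketch of what that future work might look like (and only a sketch: the socket source, host, port and keyword list below are all assumptions), Spark Structured Streaming can filter keyword-bearing tweet texts from a live feed with very little code:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("TsunamiTweetStream").getOrCreate()

# Assume a companion process forwards tweet texts, one per line, to this socket.
tweets = (spark.readStream
          .format("socket")
          .option("host", "localhost")
          .option("port", 9999)
          .load())

# Keep only tweets mentioning keywords of geophysical interest.
keywords = "(?i)(earthquake|tsunami|#earthquake|#tsunami)"
filtered = tweets.filter(col("value").rlike(keywords))

# In production this sink would feed the NLP/classification stages;
# here we simply print matches to the console.
query = filtered.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```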

When it comes to tsunamis, alerting remains a challenge – especially in those parts of the planet under-serviced by networks of tsunameters … and even seismometers, tide gauges, etc. Thus prospects for enhancing alerting capabilities remain valuable and warranted. Even though inherently fraught with subjectivity, data extracted in real time from Twitter streams appears to hold some promise as a source that complements the objective output from scientific instrumentation. Our approach, based upon Machine Learning via NLP, has demonstrated promising-enough early signs of success that ‘further research is required’. Given that this initiative has already benefited from useful discussions at conferences, suggestions are welcome, as it’s clear that even NLP has a lot more to offer beyond embedded word vectors.

Pencasting with a Wacom tablet: Time to revisit this option

Around the start of the Fall term in September 2014, I found myself in a bit of a bind: My level of frustration with Livescribe pencasting had peaked, and I was desperately seeking alternatives. To be clear, it was changes to the Livescribe platform that were the source of this frustration, rather than pencasting as a means for visual communication. In fact, if anything, a positive aspect of the Livescribe experience was that I was indeed SOLD on pencasting as an extremely effective means for communicating visually – an approach that delivered significant value in instructional settings such as the large classes I was teaching at the university level.

In an attempt to make use of an alternative to the Livescribe platform then, I discovered and acquired a small Wacom tablet. Whereas I rapidly became proficient in use of the Livescribe Echo smartpen, because it was truly like making use of a regular pen, my own learning curve with the Wacom solution was considerably steeper.

To be concrete, you can view a relatively early attempt on YouTube. As one viewer commented:

Probably should practice the lecture. Too many pauses um er ah.

Honestly, that was more a reflection of my grasp of the Wacom platform than my expertise with the content I was attempting to convey through this real-time screen capture. In other words, my comfort level with this technology was so low that I was distracted by it. Given that many, many thousands of visual (art) professionals make use of this or similar solutions from Wacom, I’m more than willing to admit that this one was ‘on me’ – I wasn’t ‘a natural’.

With the Wacom solution, you need to train your eyes to be fixed on your screen, while your hand writes/draws/etc. on the tablet. Not exactly known for my hand-eye coordination in general, I evidently struggled with this technology. As I look at the results some four years later, I’m not quite as dismayed as I expected to be. My penmanship isn’t all that bad – even though I still find writing and drawing with this tablet to be a taxing exercise in humility. In hindsight, I’m also fairly pleased with the Wacom tablet’s ability to permit use of colour, as well as lines of different thicknesses. This flexibility, completely out of scope in the solution from Livescribe, opens up a whole new level of prospects for visual communication.

Knowing that others have mastered the Wacom platform, and having some personal indication of its potential to produce useful results, I’m left with the idea of giving this approach another try – soon. I’ll let you know how it goes.

How I Ended Up in Geophysical Fluid Dynamics

Lately, I’ve been disclosing the various biases I bring to practicing and enabling Data Science. Motivated by my decision to (finally) self-curate an online, multimedia portfolio, I felt such biases to be material in providing the context that frames this effort. Elsewhere, I’ve shared my inherently scientific bias. In this post, I want to provide additional details. These details I’ve been able to extract verbatim from a blog post I wrote for Bright Computing in January 2015; once I’d settled on geophysics (see below), I aspired to be a seismologist … but, as you’ll soon find out, things didn’t pan out quite the way I’d expected:

I always wanted to be a seismologist.

Scratch that: I always wanted to be an astronaut. How could I help it? I grew up in suburban London (UK, not Ontario) watching James Burke cover the Apollo missions. (Guess I’m also revealing my age here!)

Although I never gave my childhood dream of becoming an astronaut more than a fleeting consideration, I did pursue a career in science.

As my high-school education drew to a close, I had my choices narrowed down to being an astronomer, geophysicist or a nuclear physicist. In grade 12 at Laurier Collegiate in Scarboro (Ontario, not UK … or elsewhere), I took an optional physics course that introduced me to astronomy and nuclear physics. And although I was taken by both subjects, and influenced by wonderful teachers, I dismissed both of these as areas of focus in university. As I recall, I had concerns that I wouldn’t be employable if I had a degree in astronomy, and I wasn’t ready to confront the ethical/moral/etc. dilemmas I expected would accompany a choice of nuclear physics. Go figure!

And so it was to geophysics I was drawn, again influenced significantly by courses in physical geography taught by a wonderful teacher at this same high school. My desire to be a seismologist persisted throughout my undergraduate degree at Montreal’s McGill University, where I ultimately graduated with a B.Sc. in solid Earth geophysics. Armed with my McGill degree, I was in a position to make seismology a point of focus.

But I didn’t. Instead, at Toronto’s York University, I applied Geophysical Fluid Dynamics (GFD) to Earth’s deep interior – mostly Earth’s fluid outer core. Nothing superficial here (literally), as the core only begins some 3,000 km below where we stand on the surface!

Full disclosure: In graduate school, the emphasis was GFD. However, seismology crept in from time to time. For example, I made use of results from deep-Earth seismology in estimating the viscosity of Earth’s fluid outer core. Since this is such a deeply remote region of our planet, geophysicists need to content themselves with observations accessible via seismic and other methods.

From making use of Apache Spark to improve the performance of seismic processing (search for “Reverse-Time Seismic Migration” or “RTM” in my Portfolio), to the analysis of ‘seismic data’ extracted from Twitter (search for “Twitter” in my Portfolio), seismology has taken center stage in a number of my projects as a practitioner of Data Science. However, so has the geophysical fluid dynamics of Earth’s mantle and outer core. Clearly, you can have your geeky cake and eat it too!

Data Science: Identifying My Professional Bias

In the Summer of 1984, I arrived at Toronto’s York University as a graduate student in Physics & Astronomy. (Although my grad programme was Physics & Astronomy, my research emphasized the application of fluid dynamics to Earth’s deep interior.) Some time after that, I ran my first non-interactive computation on a cluster of VAX computers. I’m not sure if this was my first exposure to Distributed Computing or not; I am, however, fairly certain that this was the first time it (Distributed Computing) registered with me as something exceedingly cool, and exceedingly powerful.

Even back in those days, armed with nothing more than a VT100 terminal ultimately connected to a serial interface on one of the VAXes, I could be logged in and able to submit a computational job that might run on some other VAX participating in the cluster. The implied connectedness, the innate ability to make use of compute cycles on some ‘remote’ system was intellectually intoxicating – and I wasn’t even doing any parallel computing (yet)!

More than a decade later, while serving in a staff role as a computer coordinator, I became involved in procuring a modest supercomputer for those members of York’s Faculty of Pure & Applied Science who made High Performance Computing (HPC) a critical component of their research. If memory serves me correctly, this exercise resulted in the purchase of a NUMA-architecture system from SGI powered by MIPS CPUs. Though isolated initially, Platform LSF was included as a component of the overall solution to manage the computational workloads that would soon consume the resources of this SGI system.

The more I learned about Platform LSF, the more I was smitten by the promise and reality of Distributed Computing – a capability to be leveraged from a resource-centric perspective with this Load Sharing Facility (LSF). [Expletive deleted], Platform founder Songnian Zhou expressed the ramifications of his technical vision for this software as Utopia in a 1993 publication. Although buying the company wasn’t an option, I did manage to be hired by Platform, and work there in various roles for about seven-and-a-half years.

Between my time at Platform (now an IBM company) and much-more recently Univa, over a decade of my professional experience has been spent focused on managing workloads in Distributed Computing environments. From a small handful of VAXes, to core counts that have reached 7 figures, these environments have included clusters, grids and clouds.

My professional bias towards Distributed Computing was further enhanced through the experience of being employed by two software vendors who emphasized the management of clusters – namely Scali (Scali Manage) and subsequently Bright Computing (Bright Cluster Manager). Along with Univa (Project Tortuga and Navops Launch), Bright extended their reach to the management of HPC resources in various cloud configurations.

If it wasn’t for a technical role at Allinea (subsequently acquired by ARM), I might have ended up ‘stuck in the middle’ of the computational stack – as workload and cluster management is regarded by the HPC community (at least) as middleware … software that exists between the operating environment (i.e., the compute node and its operating system) and the toolchain (e.g., binaries, libraries) that ultimately support applications and end users (e.g., Figure 5 here).

Allinea’s focus was on tools to enable HPC developers. Although they were in the process of broadening their product portfolio to include a profiling capability around the time of my departure, in my tenure there the emphasis was on a debugger – a debugger capable of handling code targeted for (you guessed it) Distributed Computing environments.

Things always seemed so much bigger when we were children. Whereas Kid Ian was impressed by a three-node VAX cluster, and later ‘blown away’ by a modest NUMA-architecture ‘supercomputer’, Adult Ian had the express privilege of running Allinea DDT on some of the largest supercomputers on the planet (at the time) – tracking down a bug that only showed up when more than 20K cores were used in parallel on one of Argonne’s Blue Genes, and demonstrating scalable, parallel debugging during a tutorial on some 700K cores of NCSA’s Blue Waters supercomputer. In hindsight, I can’t help but feel humbled by this impressive capability of Allinea DDT to scale to these extremes. Because HPC’s appetite for scale has extended beyond tera- and petascale capabilities, and is seriously eyeing the demand to perform at exascale, software like Allinea DDT also needs to match this penchant for extremely extreme scale.

At this point, suffice it to say that scalable Distributed Computing has been firmly encoded into my professional DNA. As with my scientifically based academic bias, it’s difficult not to frame my predisposition towards Distributed Computing in a positive light within the current context of Data Science. Briefly, it’s a common experience for the transition from prototype to production to include the introduction of Distributed Computing – if only to execute applications and/or their workflows on more powerful computers, or perhaps to simultaneously scale these in parallel.

I anticipate the need to return to this disclosure regarding the professional bias I bring to Data Science. For now though, calling out the highly influential impact Distributed Computing has had on my personal trajectory appears warranted within the context of my Data Science Portfolio.

Data Science: Celebrating My Academic Bias

In a recent post, I introduced my Data Science Portfolio. After describing the high-level organization of the Portfolio, I noted:

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Even in this earliest of drafts, I knew that I was somewhat uncomfortable with a section dedicated to academics in my Portfolio. After all, shouldn’t a portfolio place more emphasis on how my knowledge and skills, academic or otherwise, have been applied to produce some tangible artifact?

Upon further reflection, I currently believe that what’s material in the context of a portfolio is some indication of the bias inherent in the resulting curated showcase of one’s work. Of course, to some degree, the works presented, and the curation process itself, will make such personal bias self-evident.

Whereas it may make sense for an artist not to overtly disclose any bias with respect to their craft, or a curated collection of their work, I currently perceive absolutely no downside in sharing my personal bias – a bias that, in my own case, I believe reflects only in positive ways on the Portfolio as well as the individual items included in it.

To this end, and in the spirit of such positive self-disclosure, my personal bias reflects my formative years in science – a background to which I well recall significant contributions from high school that were subsequently broadened and deepened as an undergraduate and then graduate student. Even more specific in terms of personal bias was my emphasis on the physical sciences – a bias that remains active today.

As I’ve started to share, through such posts as the one on the mathematical credentials I bring to Data Science, my choice to pursue the physical sciences was an excellent one – even through the self-critical lens of personal hindsight. An excellent choice, albeit a biased one.

The very nature of Data Science is such that each of us carries our own, wonderfully unique personal bias. As we necessarily collaborate in team, project and organizational settings, I believe it’s important not only to ensure each of us preserves their personal bias, but that we leverage this perspective as fully and appropriately as possible. As a consequence, it is much more likely that everyone we work with, and everything we work on, will derive maximal value.

Recent Workshop: Nurturing Quantitative Skills for the Physical Sciences through use of Scientific Models

A few weeks back, I delivered a workshop at a conference focused on online learning. Unfortunately, abstracts were not made available via the event’s web site. In lieu of directing you elsewhere then, below is the abstract I submitted:

Nurturing Quantitative Skills for the Physical Sciences through use of Scientific Models

L. I. Lumb
Division of Natural Science, Faculty of Science, York University
Toronto, Ontario, Canada

With numerous scientists asserting that we have entered into The Anthropocene, a ‘brand new’ Geologic Epoch that underscores human impact on planet Earth, there has arguably never been a more relevant time for literacy in the physical sciences. Complicating this, however, is the implied need for quantitative skills demanded of those who seek to have more than a superficial degree of literacy in matters relating to climate or global change. Grounded by direct, personal experience in teaching science to non-scientists at the undergraduate university level, and independently validated by academic research into Science Technology Engineering Math (STEM) related programs and subjects, mastery of even the most-basic quantitative skills presents a well-established challenge in engaging learners at levels beyond the quantitatively superficial – a challenge that appears to be increasingly the case with the arriving cohort of undergraduates each Fall. In an effort to systematically develop and encourage proficiency in quantitative skills in data-rich courses in the physical sciences, a number of scientific models have been introduced by the author. Ranging from embarrassingly simple physical models using rice to investigate relative humidity in Earth’s atmosphere, to software-based models that employ spreadsheets to elucidate aspects of climate and global change, the use of scientific models presents intriguing challenges and opportunities for both instructors and students; needless to state, these challenges and opportunities can be significantly exacerbated in courses that are delivered online to numbers in excess of 100 students. After an introduction of scientific models as a pedagogical vehicle for nurturing quantitative skills, emphasis shifts to the sharing of real-world experiences with this approach in relatively large, online courses in physical sciences taught at the undergraduate level to non-majors (and therefore non-scientists). In ultimately working towards the primary example of a relatively simple, yet scientifically appropriate spreadsheet model for the Paris Climate Agreement, participants’ involvement will be scaffolded through use of other examples of models that have also been used in practice. Participants will also be encouraged to engage in a dialogue that compares and contrasts these models with more traditional approaches (e.g., formal essays). Finally, armed with some context for models as a pedagogical vehicle for quantitatively enhancing student engagement, participants will be guided through exercises that will allow them to develop their own models for their own teaching and learning requirements – whether their interests fall within or beyond scientifically oriented disciplines.

As you can see, I have a vested interest in nurturing quantitative skills, and scientific models are one of the vehicles I make use of. If you share similar interests or, better yet, if you have ideas as to what’s worked for you, please feel free to comment.

Current Events in the Classroom: Experiments on Mars-Like Clouds Stimulate the Learning Process

Everyone has an appreciation for humidity and clouds … However, when you seek to understand humidity and clouds from the scientific perspective, ‘things get technical’ in a hurry! As someone who attempts to share science with non-scientists, it’s wonderful to be able to work current events into the (physical/virtual) classroom. Some recent experimental results, aimed at simulating Martian-style clouds, allow for a highly topical teachable moment.

For the details, please see below my recent post (via Moodle) to my Weather and Climate class at Toronto’s York University:

[Image: my recent Moodle post to the Weather and Climate class]

Now, if only I could have such a cloud chamber in the (virtual) classroom …

Pencasting During Lectures in Large Venues

In a recent post on pencasting as a way of teaching/learning weather and climate, I stated:

Monday (October 1, 2012), I intend to use a pencast during my lecture – to introduce aspects of the stability of Earth’s atmosphere. I’ll try to share here how it went. For this intended use of the pencast, I will use a landscape mode for presentation – as I expect that’ll work well in the large lecture hall I teach in. I am, however, a little concerned that the lines I’ll be drawing will be a little too thin/faint for the students at the back of the lecture theatre to see …

I followed through as advertised (above) earlier today.

[Image: the 500-seat Price Family Cinema at York University]

My preliminary findings are as follows:

  • The visual aspects of the pencast are quite acceptable – This is true even in large lecture halls such as the 500-seat Price Family Cinema at York University (pictured above) in Toronto, Canada where I am currently teaching. I used landscape mode for today’s pencast, and zoomed it in a little. A slightly thicker pen option would be wonderful for such situations … as would different pen colours (the default is green).
  • The audio quality of the pencasts is very good to excellent – Although my Livescribe pen came with a headset/microphone, I don’t use it. I simply use the built-in microphone on the pen, and speak normally when I am developing pencasts. Of course, the audio capabilities of the lecture hall I teach in are most excellent for playback!
  • One-to-many live streaming of pencasts works well – I streamed live directly from myLivescribe today. I believe the application infrastructure is based largely on Adobe Flash and various Web services delivered by WebObjects. Regardless of the technical underpinnings, live streaming worked well. Of course, I could’ve developed a completely self-contained PDF file, downloaded this, and run the pencast locally using Adobe Reader.
  • Personal pencasting works well – I noticed that a number of students were streaming the pencast live for themselves during the lecture. In so doing, they could control interaction with the pencast.

Anecdotally, a few students mentioned that they appreciated the pencast during the break period – my class meets once per week for a three-hour session.

Although I’ve yet to hear this feedback directly from the students, I believe I need to:

  • Decrease the duration of pencasts – Today’s lasts about 10 minutes
  • Employ a less-is-more approach/strategy – My pencasts are fairly involved when done …
  • Experiment with the right balance of speaking to penning (is that even a word!?) – Probably a less-is-more approach/strategy would work well here for both the penned and spoken word …

Finally, today’s pencast on the basics of atmospheric stability:

  • Previous approach – Project an illustration taken directly from the course’s text. This is a professionally produced, visually appealing, detailed, end-result, static diagram that I embedded in my presentation software (I use Google Docs for a number of reasons). Using a laser pointer, my pedagogy called for a systematic deconstruction of this diagram – hoping that the students would be engaged enough to actually follow me. Of course, in the captured versions of my lectures, the students don’t actually see where I’m directing the laser pointer. The students have access to the course text and my lecture slides. I have no idea if/how they attempt to ingest and learn from this approach.
  • Pencasting – As discussed elsewhere, the starting point is a blank slate. Using the pencasting technology, I sketch my own rendition of the illustration from the text. As I build up the details, I explain the concept of stability analyses. Because the sketch appears as I speak, the students have the potential to follow me quite closely – and if they miss anything, they can review the pencast after class at their own pace. The end result of a pencast is a sketch that doesn’t hold a candle to the professionally produced illustration provided in the text and my lecture notes. However, to evaluate the pencast as merely a final product, I believe, misses the point completely. Why? I believe the pencast is a far superior way to teach and to learn in situations such as this one. Why? I believe the pencast allows the teacher to focus on communication – communication that the learner can also choose to be highly receptive to, and engaged by.

I still regard myself as very much a neophyte in this arena. However, as the above final paragraphs indicate, pencasting is a disruptive innovation whose value in teaching/learning merits further investigation.

Teaching/Learning Weather and Climate via Pencasting

I first heard about pencasting a few years ago, and thought it sounded interesting … and then, this past Summer, I did a little more research and decided to purchase a Livescribe 8 GB Echo(TM) Pro Pack. Over the Summer, I took notes with the pen from time to time and found it to be somewhat useful/interesting.

Just this week, however, I decided it was time to use the pen for the originally intended purpose: Making pencasts for the course I’m currently teaching in weather and climate at Toronto’s York University. Before I share some sample pencasts, please allow me to share my findings based on less than a week’s worth of ‘experience’:

  • Decent-quality pencasts can be produced with minimal effort – I figured out the basics (e.g., how to record my voice) in a few minutes, and started on my first pencast. Transferring the pencast from the pen to the desktop software to the Web (where it can be shared with my students) also requires minimal effort. “Decent quality” here refers to both the visual and audio elements. The fact that this is both a very natural (writing with a pen while speaking!) and speedy (efficient/effective) undertaking means that I am predisposed towards actually using the technology whenever it makes sense – more on that below. Net-net: This solution is teacher-friendly.
  • Pencasts complement other instructional media – This is my current perspective … Pencasts complement the textbook readings I assign, the lecture slides plus video/audio captures I provide, the Web sites we all share, the Moodle discussion forums we engage in, the Tweets I issue, etc. In the spirit of blended learning, it is my hope that pencasts, in concert with these other instructional media, will allow my TAs and me to ‘reach’ most of the students in the course.
  • Pencasts allow the teacher to address both content and skills-oriented objectives – Up to this point, my pencasts have started from a blank page. This forces me to be focused, and to develop systematically towards some desired content-oriented (e.g., conceptually introducing the phase diagram for H2O) and/or skills-oriented (e.g., how to calculate the slope of a line on a graph) outcome. Because students can follow along, they have the opportunity to be fully engaged as the pencast progresses. Of course, what this also means is that this technology can be effective not only in the first-year university course I’m currently teaching, but also at the academic levels that precede (e.g., grade school, high school, etc.) and follow (senior undergraduate and graduate) this level.
  • Pencasts are learner-centric – In addition to being teacher-friendly, pencasts are learner-centric. Although a student could passively watch and listen to a pencast as it plays out in a linear, sequential fashion, the technology almost begs you to interact with it. As noted previously, this means a student can easily replay some aspect of the pencast that they missed. Even more interestingly, however, students can interact with pencasts in a random-access mode – a mode that would almost certainly be useful when they are attempting to apply the content/skills conveyed through the pencast to a tutorial or assignment they are working on, or a quiz or exam they are actively studying for. It is important to note that both the visual and audio elements of the pencast can be manipulated with impressive responsiveness to random-access input from the student.
  • I’m striving for authentic, not perfect pencasts – With a little more practice and some planning/scripting, I’d be willing to bet that I could produce an extremely polished pencast. Based on past experience teaching today’s first-year university students, I’m fairly convinced that this is something they couldn’t care less about. Let’s face it, my in-person lectures aren’t perfectly polished, and neither are my pencasts. Because I can easily go back to existing pencasts and add to them, I don’t need to fret too much about being perfect the first time. Too much time spent fussing here would diminish the natural and speedy aspects of the technology.

Findings aside, on to samples:

  • Calculating the lapse rate for Earth’s troposphere – This is largely a skills-oriented example. It was my first pencast. I returned twice to the original pencast to make changes – once to correct a spelling mistake, and the second time to add in a bracket (“Run”) that I forgot. I communicated these changes to the students in the course via an updated link shared through a Moodle forum dedicated to pencasts. If you were to experience the updates, you’d almost be unaware of the lapse of time between the original pencast and the updates, as all of this is presented seamlessly as a single pencast to the students. (A minimal version of the ‘rise over run’ calculation appears in the sketch after this list.)
  • Introducing the pressure-temperature phase diagram for H2O – This is largely a content-oriented example. I got a little carried away in this one, and ended up packing in a little too much – the pencast is fairly long, and by the time I’m finished, the visual element is … a tad on the busy side. Experience gained.
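For readers who’d like the punchline of that first pencast without streaming it, here is the ‘rise over run’ calculation in miniature; the surface and tropopause values below are standard-atmosphere figures, not necessarily the ones sketched in the pencast.

```python
# Average environmental lapse rate of the troposphere via "rise over run".
# Standard-atmosphere values; the pencast may use different illustrative numbers.
T_surface = 15.0       # temperature at sea level, in degrees C
T_tropopause = -56.5   # temperature at the tropopause, in degrees C
z_surface = 0.0        # altitude of sea level, in km
z_tropopause = 11.0    # altitude of the tropopause, in km

lapse_rate = -(T_tropopause - T_surface) / (z_tropopause - z_surface)
print(f"Average tropospheric lapse rate: {lapse_rate:.1f} degrees C per km")  # ~6.5
```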

Anecdotally, initial reaction from the students has been positive. Time will tell.

Next steps:

  • Monday (October 1, 2012), I intend to use a pencast during my lecture – to introduce aspects of the stability of Earth’s atmosphere. I’ll try to share here how it went. For this intended use of the pencast, I will use a landscape mode for presentation – as I expect that’ll work well in the large lecture hall I teach in. I am, however, a little concerned that the lines I’ll be drawing will be a little too thin/faint for the students at the back of the lecture theatre to see …
  • I have two sections of the NATS 1780 Weather and Climate course to teach this year. One section is taught the traditional way – almost 350 students in a large lecture theatre, 25-student tutorial groups, supported by Moodle, etc. In striking contrast to the approach taken in the meatspace section is the second section, where almost everything takes place online via Moodle. Although I have yet to support this hypothesis with any data, it is my belief that these pencasts are an excellent way to reach out to the students in the Internet-only section of the course. More on this over the fullness of time (i.e., the current academic session).

Feel free to comment on this post or share your own experiences with pencasts.