Ian Lumb’s Data Science Portfolio

I had the very fortunate opportunity to present some of my research at GTC 2017 in Silicon Valley. Even after 3 months, I found GTC to be of lasting impact. However, my immediate response to the event was to reflect upon my mathematical credentials – credentials that would allow me to pursue Deep Learning with the increased breadth and depth demanded by my research project. I crystallized this quantitative reflection into a very simple question: Do I need to go back to school? (That is, back to school to enhance my mathematical credentials.)

There were a number of outcomes from this reflection upon my math creds for Deep Learning. Although the primary outcome was a mathematical ‘gap analysis’, a related outcome is this Data Science Portfolio that I’ve just started to develop. You see, after I reflected upon my mathematical credentials, it was difficult not to broaden and deepen that reflection; so, in a sense, this Data Science Portfolio is an outcome of that more-focused reflection.

As with the purely mathematical reflection, the effort I’m putting into self-curating my Data Science Portfolio allows me to showcase existing contributions (the easy part), but simultaneously raises interesting challenges and opportunities for future efforts (the difficult part). More on the future as it develops …

For now, the portfolio is organized into two broad categories:

  • Data Science Practitioner – intended to showcase my own contributions towards the practice of Data Science
  • Data Science Enabler – intended to showcase those efforts that have enabled other Data Scientists

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Although I expect there’ll be more to share as this portfolio develops, I did want to share one observation immediately: When placed in the context of a portfolio, immune to the chronological tyranny of time, it is fascinating to me to see themes that form an arc through seemingly unrelated efforts. One fine example is the matter of semantics. In representing knowledge, for example, semantics were critical to the models I built using self-expressive data (i.e., data successively encapsulated via XML, RDF and ultimately OWL). And then again, in processing data extracted from Twitter via Natural Language Processing (NLP), I’m continually faced with the challenge of ‘retaining’ a modicum of semantics in approaches based upon Machine Learning. I did not plan this thematic arc of semantics; it is therefore fascinating to see such themes exposed – exposed particularly well by the undertaking of portfolio curation.

There’s no shortage of Data Science portfolios to view. One thing is certain, however: these portfolios are likely to be every bit as diverse and varied as Data Science itself, compounded by the uniqueness of the individuals involved. And that, of course, is a wonderful thing.

Thank you for taking the time to be a traveller at the outset of this journey with me. If you have any feedback whatsoever, please don’t hesitate to reach out via a comment and/or email to ian [DOT] lumb [AT] gmail [DOT] com. Bon voyage!

On Knowledge-Based Representations for Actionable Data …

I bumped into a professional acquaintance last week. After describing briefly a presentation I was about to give, he offered to broker introductions to others who might have an interest in the work I’ve been doing. To initiate the introductions, I crafted a brief description of what I’ve been up to for the past 5 years in this area. I’ve also decided to share it here as follows: 

As always, [name deleted], I enjoyed our conversation at the recent AGU meeting in Toronto. Below, I’ve tried to provide some context for the work I’ve been doing in the area of knowledge representations over the past few years. I’m deeply interested in any introductions you might be able to broker with others at York who might have an interest in applications of the same.

Since 2004, I’ve been interested in expressive representations of data. My investigations started with a representation of geophysical data in the eXtensible Markup Language (XML). Although this was successful, use of the approach exposed an oversight: the importance of metadata (data about data). To address this oversight, a subsequent effort introduced a relationship-centric representation via the Resource Description Framework (RDF). RDF, by the way, forms the underpinnings of the next-generation Web – variously known as the Semantic Web, Web 3.0, etc. In addition to taking care of issues around metadata, use of RDF paved the way for increasingly expressive representations of the same geophysical data. For example, to represent features in and of the geophysical data, an RDF-based scheme for annotation was introduced using XML Pointer Language (XPointer). Somewhere around this point in my research, I placed all of this into a framework.

A data-centric framework for knowledge representation.
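
As a rough illustration of the first step in such a framework, the sketch below lifts a small XML record into subject-predicate-object triples, which is the shape of data that RDF works with. The record, its element names, and the example.org namespace are all invented for illustration; this is not the representation used in the work described here, and a real effort would likely rely on an RDF library rather than plain tuples.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record; element and attribute names are illustrative only.
XML_DOC = """
<station id="s1">
  <name>Example Station</name>
  <instrument>gravimeter</instrument>
</station>
"""

BASE = "http://example.org/data/"  # placeholder namespace (an assumption)


def xml_to_triples(xml_text, base=BASE):
    """Turn each child element of the root into a (subject, predicate, object) triple."""
    root = ET.fromstring(xml_text.strip())
    # The subject is built from the root tag and its id attribute.
    subject = base + root.tag + "/" + root.attrib["id"]
    triples = []
    for child in root:
        # Each child tag becomes a predicate; its text becomes a literal object.
        triples.append((subject, base + child.tag, (child.text or "").strip()))
    return triples


triples = xml_to_triples(XML_DOC)
for triple in triples:
    print(triple)
```

Once the data is in this relationship-centric form, metadata can be attached simply by adding further triples about the same subject.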

In addition to applying my Semantic Framework to use cases in Internet Protocol (IP) networking, I’ve continued to tease out increasingly expressive representations of data. Most recently, these representations have been articulated in RDFS – i.e., RDF Schema. And although I have not reached the final objective of an ontological representation in the Web Ontology Language (OWL), I am indeed progressing in this direction. (Whereas schemas capture the vocabulary of an application domain in geophysics or IT, for example, ontologies allow for knowledge-centric conceptualizations of the same.)

From niche areas of geophysics to IP networking, the Semantic Framework is broadly applicable. As a workflow for systematically enhancing the expressivity of data, the Framework is based on open standards emerging largely from the World Wide Web Consortium (W3C). Because there is significant interest in this next-generation Web from numerous parties and angles, implementation platforms allow for increasingly expressive representations of data today. In making data actionable, the ultimate value of the Semantic Framework is in providing a means for integrating data from seemingly incongruous disciplines. For example, such representations are actually responsible for providing new results – derived by querying the representation through a ‘semantified’ version of the Structured Query Language (SQL) known as SPARQL. 
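
To make the idea of querying an integrated representation a little more concrete, here is a minimal sketch that merges triples from two unrelated sources and answers a single triple pattern, which is roughly what one clause of a SPARQL query does. The `ex:` terms, the two sample graphs, and the `match` helper are all hypothetical; this is a toy stand-in for a SPARQL engine, not an implementation of SPARQL.

```python
# Triples from two hypothetical sources, merged into one graph (a set of tuples).
GEO = {
    ("ex:station1", "ex:locatedIn", "ex:Toronto"),
    ("ex:station1", "ex:measures", "ex:gravity"),
}
NET = {
    ("ex:router7", "ex:locatedIn", "ex:Toronto"),
    ("ex:router7", "ex:runs", "ex:ospf"),
}
GRAPH = GEO | NET


def match(graph, pattern):
    """Return variable bindings for one triple pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    results = []
    for triple in sorted(graph):  # sorted only to make the order deterministic
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that repeats in the pattern must bind consistently.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?thing WHERE { ?thing ex:locatedIn ex:Toronto }
things = [b["?thing"] for b in match(GRAPH, ("?thing", "ex:locatedIn", "ex:Toronto"))]
print(things)
```

The point of the sketch is that the query neither knows nor cares which discipline a triple came from; a shared predicate is enough to relate a gravity station to a router.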

I’ve spoken formally and informally about this research to audiences in the sciences, IT, and elsewhere. With York co-authors spanning academic and non-academic staff, I’ve also published four refereed journal papers on aspects of the Framework, and have an invited book chapter currently under review – interestingly, this chapter has been contributed to a book focusing on data management in the Semantic Web. Of course, I’d be pleased to share any of my publications and discuss aspects of this work with those finding it of interest.

With thanks in advance for any connections you’re able to facilitate, Ian. 

If anything comes of this, I’m sure I’ll write about it here – eventually!

In the meantime, feedback is welcome.

Recent Articles on Bright Hub

I’ve added a few more articles over on Bright Hub:

Google Chrome for Linux on Bright Hub: Series Expanded

I recently posted on a new article series on Google Chrome for Linux that I’ve been developing over on Bright Hub. My exploration has turned out to be more engaging than I anticipated! At the moment, there are six articles in the series:

I anticipate a few more …

It’s also important to share that Google Chrome for Linux does not yet exist as an end-user application. Under the auspices of the Chromium Project, however, there is a significant amount of work underway. And because this work is taking place out in the open (Chromium is an Open Source Project), now is an excellent time to engage – especially for serious enthusiasts.

Juniper Seminar: Key Takeaways

Yesterday, I attended the Toronto session of a Juniper seminar focused on security and datacenter solutions.

The following are the key takeaways I extracted:

  • Juniper is standards-oriented. In the area of NAC, e.g., they are co-chairing with Symantec the Trusted Computing Group’s Trusted Network Connect (TNC) effort. It’s not (yet) clear to me how the TCG interplays with the IETF … And speaking of IETF, Juniper’s Network and Security Manager (NSM) makes use of IETF’s NETCONF standard in, e.g., simplifying the provisioning of new devices on the network.
  • Juniper has a comprehensive portfolio of offerings at the intersection of security and networking. Interestingly, Juniper’s Security Threat Response Manager (STRM) OEMs technology from Q1Labs.
  • 802.1x is a solid bet. Based on a number of trends, and a variety of requirements, Juniper promotes use of 802.1x. Even though this is a path we’ve already identified, it’s good to have it independently validated …
  • Security, and other services, can be offloaded to purpose-built devices in the core. Instead of inserting, e.g., a FWSM into a device (e.g., a Cisco 65xx) that is primarily providing routing and switching services, Juniper has recently introduced a new paradigm with its SRX series. Touted as a services gateway for the core, the purpose of the SRX is to offload from the routing/switching devices various services – e.g., firewall, VPN, etc. As I understand it, the SRX runs JUNOS with various enhancements from ScreenOS (their O/S from their firewall devices). Even if you don’t make use of Juniper solutions, it may make sense to understand and potentially apply the offloading-of-services concept/paradigm in your core.
  • Juniper allows for the virtualization of switches. Juniper Virtual Chassis (VC) is currently only available for their EX 4200 platform. With VC, it’s possible to virtualize up to 10 physically distinct EX 4200s into one. Within the next year, Juniper plans to provide VC on, e.g., their EX 8200 platform. Because VMware’s vMotion requires layer-2 adjacency, server virtualization may prove to be a significant driver for switch virtualization. I expect that this will prove to be particularly relevant in, e.g., providing failover services (at the networking layer) between multiple, physically distinct, and geographically separated locations.

Even though the event appeared to be more of the sales-y/marketing-y variety, there was substantial technical content in evidence.

RDF-ization: Is That What I’ve Been Up To?

Recently, on his blog, Kingsley Idehen wrote:

RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

Although Idehen identifies a number of data sources, he does not explicitly identify two data sources I’ve been spending a fair amount of time with over the past few years: 

Of course, whether the motivation is personal/social-networking or scientific/IT related, the attention to RDF-ization is win-win for all stakeholders. Why? Anything that accelerates the RDF-ization of non-RDF data sources brings us that much closer to realizing the true value of the Semantic Web.
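
For readers who have not seen RDF-ization in practice, the following is a minimal sketch of the idea applied to a shared bookmark collection, one of the sources Idehen lists. It emits one N-Triples line per field. The bookmark records, the example.org vocabulary, and the `rdfize` helper are assumptions made purely for illustration; production tools handle datatypes, language tags, and proper vocabularies.

```python
# Hypothetical bookmark records standing in for a non-RDF data source.
BOOKMARKS = [
    {"url": "http://example.org/a", "title": "First page", "tag": "rdf"},
    {"url": "http://example.org/b", "title": 'A "quoted" title', "tag": "semweb"},
]

VOCAB = "http://example.org/vocab#"  # placeholder vocabulary (an assumption)


def escape_literal(text):
    """Escape backslashes, double quotes, and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rdfize(records, vocab=VOCAB):
    """Yield one N-Triples line per non-URL field of each record."""
    for record in records:
        subject = "<" + record["url"] + ">"  # the bookmarked URL is the subject
        for key, value in record.items():
            if key == "url":
                continue
            yield '%s <%s%s> "%s" .' % (subject, vocab, key, escape_literal(value))


lines = list(rdfize(BOOKMARKS))
print("\n".join(lines))
```

The same pattern, a stable identifier as subject and each field as a predicate, carries over to most record-like sources.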

Chrome Today the Stainless Way

Even though I recently whined about the wait for Google Chrome on Mac and Linux platforms, I haven’t spent any effort empowering some other browser with Chrome-like features/functionalities – even though this is possible. And although CrossOver Chromium caught my attention, I wasn’t sufficiently motivated to evaluate it either.

Somewhat surprising then is the fact that I have gravitated rapidly towards a quick-and-dirty evaluation of “Stainless – a multiprocess browser for OS X inspired by Google Chrome.” Inspired is definitely the operative word here as:

… the Mac version of Chrome will use a WebCore-rendered bitmap to pass between the browser and rendering processes. The strategy we use in Hypercube (and now Stainless) is far less ambitious, but a whole lot easier to do and, thus, available today for your downloading pleasure (for Leopard only, sorry).

And what a pleasure it is!
Honestly, based on the recent LifeHacker post, I expected a whole lot less than what Stainless actually delivers today – in version 0.1!

Based on about 30 minutes’ experience, Stainless:
  • Performs well – It loads Web pages quickly. And as “ps -alx | grep -i stainless” indicates, Stainless really is a multiprocess browser for OS X. For me, this alone makes Stainless worth the effort.
  • Supports AJAX – I’m writing this blog post using Google Docs via Stainless. Stainless worked fine on my initial tests with other Google productivity apps – I tested Google Spreadsheets and GMail. I therefore have some level of comfort in proclaiming it as supporting AJAX. Nice!
I’m sure I’ll have more to say soon … In the meantime, though, even at this early stage Stainless is definitely worth a serious look.

CANHEIT 2008: Update on Semantic Topologies Presentation

As I blog, CANHEIT 2008 is winding down …

And although my entire presentation will soon appear online at the conference’s Web site, I thought I’d share here an updated version of the approach image shared previously.

As you’ll see from the presentation, this work is now progressing well. There should be more to share soon.

CANHEIT 2008: York Involvement

York University will be well represented at CANHEIT 2008.
Although you’ll find the details in CANHEIT’s online programme, allow me to whet your appetite regarding our contributions:

Annotation Modeling: To Appear in Comp & Geosci

What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:

The abstract of the paper is as follows:

Annotation Modeling with Formal Ontologies: Implications for Informal Ontologies

L. I. Lumb[1], J. R. Freemantle[2], J. I. Lederman[2] & K. D. Aldridge[2]
[1] Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
[2] Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada

Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Format (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

I expect the whole paper will be made available in the not-too-distant future …