Revisiting the Estimation of Fractal Dimension for Image Classification

Classification is a well-established use case for Machine Learning. Textbook examples abound; standard ones include classifying email as ham versus spam, or images as cats versus dogs.

Circa 1994, I was unaware of Machine Learning, but I did have a use case for quantitative image classification. I expect you’re familiar with those brave souls known as The Hurricane Hunters – brave because they explicitly seek out the eyes of hurricanes in appropriately tricked-out, military-grade aircraft. Well, these hunters aren’t the only brave souls when it comes to chasing down storms in the pursuit of atmospheric science. In an effort to better understand Atlantic storms (i.e., those affecting the East Coast of North America), a few observational campaigns featured aircraft flying through blizzards at various times during Canadian winters.

In addition to standard instrumentation for atmospheric and navigational observables, these planes were tricked out in an exceptional way:

For about two-and-a-half decades, Knollenberg-type [ref 4] optical array probes have been used to render in-situ digital images of hydrometeors. Such hydrometeors are represented as a two-dimensional matrix, whose individual elements depend on the intensity of transmitted light, as these hydrometeors pass across a linear optical array of photodiodes. [ref 5]

In other words, the planes were equipped with underwing optical sensors that could capture in-flight images of

hydrometeor type, e.g. plates, stellar crystals, columns, spatial dendrites, capped columns, graupel, and raindrops. [refs 1,7]

(Please see the original paper for the references alluded to here.)

Even though this is hardly a problem in Big Data, a single flight might produce anywhere from tens to thousands of hydrometeor images that then needed to be classified manually by atmospheric scientists. Because I was working for a boutique consultancy focused on atmospheric science, and had excellent relationships with Environment Canada scientists whose express passion is Cloud Physics, an opportunity to automate the classification of hydrometeors presented itself.

Around this same time, I became aware of fractal geometry – a visually arresting and quantitative description of nature popularized by proponents such as Benoit Mandelbrot. Whereas simple objects (e.g., lines, planes, cubes) can be associated with an integer dimension (e.g., 1, 2 and 3, respectively), objects in nature (e.g., a coastline, a cloud outline) can be better characterized by a fractional dimension – a real-valued fractal dimension that lies between the value for a line (i.e., 1) and the value for a plane (i.e., 2).
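To make the notion concrete, here is a minimal box-counting sketch in Python (using NumPy) for estimating the fractal dimension of a binary hydrometeor outline. Box counting is just one common estimator – not necessarily the algorithm we used at the time – and the function and parameter names are purely illustrative.

```python
# Minimal box-counting sketch for estimating the fractal dimension of a
# binary (object vs. background) image. One common estimator among several;
# not necessarily the algorithm used in the original hydrometeor work.
import numpy as np

def box_counting_dimension(image, box_sizes=(2, 4, 8, 16, 32)):
    """Estimate D from the slope of log N(s) versus log s, where N(s) is the
    number of s-by-s boxes containing at least one object pixel."""
    image = np.asarray(image, dtype=bool)
    counts = []
    for s in box_sizes:
        # Trim so the image tiles evenly into s-by-s boxes.
        h, w = (image.shape[0] // s) * s, (image.shape[1] // s) * s
        boxes = image[:h, :w].reshape(h // s, s, w // s, s)
        # Count boxes that contain any "on" pixels (assumes at least one per size).
        counts.append(boxes.any(axis=(1, 3)).sum())
    # Fit log N(s) = -D log s + c; the fractal dimension is minus the slope.
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope

# Hypothetical usage on a hydrometeor outline mask:
# D = box_counting_dimension(hydrometeor_outline)  # e.g., ~1.26 for a Koch-like curve
```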

Armed with an approach for estimating fractal dimension, then, my colleagues and I sought to classify hydrometeors based on their subtle to significant geometrical expressions. Although the idea was appealing in principle, the outcome on a per-hydrometeor basis was a single, scalar result that attempted to capture geometrical uniqueness. In isolation, this approach was simply not enough to deliver an automated scheme for quantitatively classifying hydrometeors.

I well recall some of the friendly conversations I had with my scientific and engineering peers who attended the conference at Montreal’s Ecole Polytechnique. Essentially, the advice I was given was to regard the work I’d done as a single dimension of the hydrometeor classification problem. What I really needed to do was develop additional dimensions for classifying hydrometeors. With enough dimensions, the resulting multidimensional classification scheme would have a much better chance of delivering the automated solution sought by the atmospheric scientists.

In my research, fractal dimensions were estimated using various algorithms; they were not learned. However, they could be – as is clear from the efforts of others (e.g., the prediction of fractal dimension via Machine Learning). And though my pursuit of such a suggestion will have to wait for a subsequent research effort, a learned approach might allow for a much richer, multidimensional scheme for the quantitative classification of hydrometeors via Machine Learning. Of course, from the hindsight of 2018, there are a number of possibilities for quantitative classification via Machine Learning – possibilities that I fully expect would result in more useful outcomes.
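To illustrate what such a multidimensional scheme might look like today, here is a hedged sketch that combines the fractal dimension with a few other simple geometric descriptors and hands them to an off-the-shelf classifier. The feature set and the choice of scikit-learn’s RandomForestClassifier are assumptions for illustration only; they are not the method from the original study.

```python
# Illustrative only: a multidimensional feature vector per hydrometeor,
# classified with a standard scikit-learn model. Feature choices are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def geometric_features(mask):
    """Build one feature vector from a boolean hydrometeor silhouette."""
    mask = np.asarray(mask, dtype=bool)
    area = mask.sum()
    rows, cols = np.nonzero(mask)
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return [
        box_counting_dimension(mask),  # from the sketch above (or of its outline)
        width / height,                # aspect ratio of the bounding box
        area / (height * width),       # fill fraction within the bounding box
        np.log(area),                  # (log) size, to separate scale from shape
    ]

# masks: list of boolean silhouettes; labels: e.g. "graupel", "column", "plate", ...
# X = np.array([geometric_features(m) for m in masks])
# clf = RandomForestClassifier(n_estimators=200, random_state=0)
# print(cross_val_score(clf, X, labels, cv=5).mean())
```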

Although fractals don’t receive as much attention these days as they once did, and certainly nothing close to the deserved hype that seems to pervade most discussions of Machine Learning, there may still be some value in incorporating their ability to quantify geometry into algorithms for Machine Learning. From a very different perspective, it might be interesting to see whether the architecture of deep neural networks can be characterized through an estimation of their fractal dimension – if only to tease out geometrical similarities that might otherwise be completely obscured.

While I, or (hopefully) others, ponder such thoughts, there is no denying how stunningly fractals have rendered the fractal geometry of nature visible.


Data Science: Celebrating My Academic Bias

In a recent post, I introduced my Data Science Portfolio. After describing the high-level organization of the Portfolio, I noted:

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Even in this earliest of drafts, I knew that I was somewhat uncomfortable with a section dedicated to academics in my Portfolio. After all, shouldn’t a portfolio place more emphasis on how my knowledge and skills, academic or otherwise, have been applied to produce some tangible artifact?

Upon further reflection, I currently believe that what’s material in the context of a portfolio is some indication of the bias inherent in the resulting curated showcase of one’s work. Of course, to some degree, the works presented, and the curation process itself, will make such personal bias self-evident.

Whereas it may make sense for an artist not to overtly disclose any bias with respect to their craft, or a curated collection of their work, I currently perceive absolutely no downside in sharing my personal bias – a bias that, in my own case, I believe reflects only in positive ways on the Portfolio as well as on the individual items included in it.

To this end, and in the spirit of such positive self-disclosure, my personal bias reflects my formative years in science – a background I well recall being shaped in high school, then broadened and deepened as an undergraduate and graduate student. Even more specific, in terms of personal bias, was my emphasis on the physical sciences, a bias that remains active today.

As I’ve started to share, through such posts as the one on the mathematical credentials I bring to Data Science, my choice to pursue the physical sciences was an excellent one – even through the self-critical lens of personal hindsight. An excellent choice, albeit a biased one.

The very nature of Data Science is such that each of us carries our own, wonderfully unique personal bias. As we necessarily collaborate in team, project and organizational settings, I believe it’s important not only to ensure that each of us preserves their personal bias, but also that we leverage this perspective as fully and appropriately as possible. As a consequence, it is much more likely that everyone we work with, and everything we work on, will derive maximal value.


Ian Lumb’s Data Science Portfolio

I had the very fortunate opportunity to present some of my research at GTC 2017 in Silicon Valley. Even three months later, I find that GTC has had a lasting impact. However, my immediate response to the event was to reflect upon my mathematical credentials – credentials that would allow me to pursue Deep Learning with the increased breadth and depth demanded by my research project. I crystallized this quantitative reflection into a very simple question: Do I need to go back to school? (That is, back to school to enhance my mathematical credentials.)

There were a number of outcomes from this reflection upon my math creds for Deep Learning. Although the primary outcome was a mathematical ‘gap analysis’, a related outcome is this Data Science Portfolio that I’ve just started to develop. You see, after I reflected upon my mathematical credentials, it was difficult not to broaden and deepen that reflection; so, in a sense, this Data Science Portfolio is an outcome of broadening that more-focused reflection.

As with the purely mathematical reflection, the effort I’m putting into self-curating my Data Science Portfolio allows me to showcase existing contributions (the easy part), but simultaneously raises interesting challenges and opportunities for future efforts (the difficult part). More on the future as it develops …

For now, the portfolio is organized into two broad categories:

  • Data Science Practitioner – intended to showcase my own contributions towards the practice of Data Science
  • Data Science Enabler – intended to showcase those efforts that have enabled other Data Scientists

At the end, and for now, there is a section on my academic background – a background that has shaped so much of those intersections between science and technology that have been captured in the preceding sections of the portfolio.

Although I expect there’ll be more to share as this portfolio develops, I did want to share one observation immediately: When placed in the context of a portfolio, immune from the tyranny of chronology, it is fascinating to me to see themes that form an arc through seemingly unrelated efforts. One fine example is the matter of semantics. In representing knowledge, for example, semantics were critical to the models I built using self-expressive data (i.e., data successively encapsulated via XML, RDF and ultimately OWL). And then again, in processing data extracted from Twitter via Natural Language Processing (NLP), I’m continually faced with the challenge of ‘retaining’ a modicum of semantics in approaches based upon Machine Learning. I did not plan this thematic arc of semantics; it is therefore fascinating to see such themes exposed – exposed particularly well by the undertaking of portfolio curation.

There’s no shortage of Data Science portfolios to view. However, one thing that’s certain is that these portfolios are likely to be every bit as diverse and varied as Data Science itself, compounded by the uniqueness of the individuals involved. And that, of course, is a wonderful thing.

Thank you for taking the time to be a traveller at the outset of this journey with me. If you have any feedback whatsoever, please don’t hesitate to reach out via a comment and/or email to ian [DOT] lumb [AT] gmail [DOT] com. Bon voyage!