RDF-ization: Is That What I’ve Been Up To?

Recently, on his blog, Kingsley Idehen wrote:

RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

Although Idehen identifies a number of data sources, he does not explicitly identify two data sources I’ve been spending a fair amount of time with over the past few years: 

Of course, whether the motivation is personal/social-networking or scientific/IT related, the attention to RDF-ization is win-win for all stakeholders. Why? Anything that accelerates the RDF-ization of non-RDF data sources brings us that much closer to realizing the true value of the Semantic Web.

Google Blogging 2007: From Legitimizing Blogs to Wikipedia-Competitor Google Knol

There’s a recent, year-in-review entry by the Google blogging team. Not only does this entry highlight another wonderful year for Google, it also quantitatively places blogging in perspective. If you ever had any doubts as to the legitimacy of blogging, just read this post.

Amongst the highlights, I found the announcement of the Knol test project to be of interest. Although I’m a huge fan of knowledge representation and management, especially in the context of the Semantic Web, I must confess to being confused by Knol. At the most basic level, Knol seems to be about knowledge sharing and, more specifically, about providing jumping-off points (from search-engine hits) for those seeking to understand some topic.

Therefore, I can’t help but ask: is there more to Knol than its being Google’s competitive answer to Wikipedia?

If you happen to drop by my blog, and this post, please feel free to share your take on Knol. What am I missing?

Jott Meets the Semantic Web

While walking my husky after work yesterday, I Jott’ed myself:

Another great work out today on the electrical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good wait listing …

Most human readers would automatically parse this Jott as:

Another great workout today on the elliptical, you had over 3 kilometers and over 550 calories burned in 32 minutes. Nice work and then some good weight lifting …

Even though I don’t know a lot about Jott’s transcription engine, I’ll share my perspective on the identified differences:

  • “work out” vs. “workout” and “wait” vs. “weight” – These are subtle differences. Differences that can only be resolved with an understanding of context. In other words, a human reader knows that I was attempting to capture some data on my lunch-time exercise routine, and re-parses the Jott with contextually correct words. In order to correct such subtle ‘errors of transcription’, Jott will need to develop semantic filters. Filters that can take context into account.
  • “electrical” vs. “elliptical” and “listing” vs. “lifting” – These are glaring differences. I know, from past experience, that Jott has words like “elliptical” and “lifting” in its ‘dictionary’. Therefore, I regard these as errors originating from Jott’s inability to ‘hear’ what I’m saying. And although a context-based filter may also help here, I feel I must share some of the responsibility for not clearly articulating my Jott.
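To make the idea of a context-based filter concrete, here is a minimal sketch in Python. It is not how Jott works (I don't know its internals). The confusion sets and hint words are hypothetical, hand-picked for this one Jott, and the filter simply swaps a word for its rival when more of the rival's hint words appear in the message. Joining “work out” into “workout” is outside its scope.

```python
import re

# Hypothetical confusion sets: words a transcriber might mix up.
CONFUSION_SETS = [
    {"electrical", "elliptical"},
    {"wait", "weight"},
    {"listing", "lifting"},
]

# Hypothetical context hints: words that tend to appear near each candidate.
CONTEXT_HINTS = {
    "elliptical": {"kilometers", "calories", "burned", "workout"},
    "electrical": {"wiring", "outlet", "voltage"},
    "weight": {"kilometers", "calories", "burned", "workout"},
    "wait": {"queue", "line", "delay"},
    "lifting": {"calories", "burned", "workout", "weight"},
    "listing": {"directory", "catalog", "price"},
}


def correct_transcript(text):
    """Replace a confusable word when the message context favors its rival."""
    context = set(re.findall(r"[a-z']+", text.lower()))

    def score(candidate):
        # Count how many hint words for this candidate occur in the message.
        return len(CONTEXT_HINTS.get(candidate, set()) & context)

    def replace(match):
        word = match.group(0)
        lower = word.lower()
        for group in CONFUSION_SETS:
            if lower in group:
                best, best_score = lower, score(lower)
                for candidate in sorted(group):
                    if score(candidate) > best_score:
                        best, best_score = candidate, score(candidate)
                # Keep the original spelling unless a rival scored strictly higher.
                return word if best == lower else best
        return word

    return re.sub(r"[A-Za-z']+", replace, text)


jott = ("Another great work out today on the electrical, you had over "
        "3 kilometers and over 550 calories burned in 32 minutes. "
        "Nice work and then some good wait listing")
print(correct_transcript(jott))
```

A real filter would need to learn its hints from data rather than from a hand-written table, which is where richer representations of meaning come in.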

What does all of this mean?

Meaning, indeed, is the root of it all!

What this means is that some future version of Jott will need to do a better job of capturing meaning. What I had intended. The context in which I framed my Jott.

What this means is that in the longer term, a few major releases of Jott down the road, Jott will need to become as interested in the Semantic Web as companies like Google are today.

And as we’re experiencing with search engines like Google, this’ll take some effort and some time!

On the Use of Informal Ontologies in the Delivery of Service Oriented Architectures (SOAs)

In Service-Oriented Architecture: Concepts, Technology and Design, author Thomas Erl frames ontologies (section 10.2) in a top-down strategy for the delivery of a Service Oriented Architecture (SOA).

Ontologies are the first step in a multistep process that ultimately results in a Contemporary SOA (Erl, section 3.2.20):

Contemporary SOA represents an open, extensible, federated, composable architecture that promotes service-orientation and is comprised of autonomous, QoS-capable, vendor diverse, interoperable, discoverable, and potentially reusable services, implemented as Web services.

SOA can establish an abstraction of business logic and technology, resulting in a loose coupling between these domains.

SOA is an evolution of past platforms, preserving successful characteristics of traditional architectures, and bringing with it distinct principles that foster service-orientation in support of a service-oriented enterprise.

SOA is ideally standardized throughout an enterprise, but achieving this state requires a planned transition and the support of a still evolving technology set.

In the same chapter, Erl also provides an abridged Contemporary SOA definition:

SOA is a form of technology architecture that adheres to the principles of service-orientation. When realized through the Web services technology platform, SOA establishes the potential to support and promote these principles throughout the business process and automation domains of an enterprise.

In other words, buying into the top-down strategy can ultimately result in a Contemporary SOA and this is a big deal.

Erl also discusses the bottom-up strategy for delivering a SOA (section 10.2).

In striking contrast to the top-down strategy, and as Erl describes it, the bottom-up strategy does not incorporate ontologies. Despite the fact that “… the majority of organizations that are currently building Web services apply the bottom-up approach …” (Erl, pg. 368), Erl writes:

The bottom-up strategy is really not a strategy at all. Nor is it a valid approach to achieving a contemporary SOA. This is a realization that will hit many organizations as they begin to take service-orientation, as an architectural model, more seriously. Although the bottom-up design allows for the creation of Web services as required by applications, implementing an SOA at a later point can result in a great deal of retro-fitting and even the introduction of new standardized service layers positioned over the top of the non-standardized services produced by this approach.

After reading this chapter, one is left with the impression that Erl favors the agile strategy (Erl, section 10.4) as it attempts “… to find an acceptable balance between incorporating service-oriented design principles into business analysis environments without having to wait before introducing Web services technologies into technical environments.”

I would be willing to accept all of this on spec if it weren’t for the fact that it’s possible to create informal ontologies, in non-SOA contexts, during bottom-up processes.

And if this is possible in non-SOA contexts, then it’s reasonable that informal ontologies could be incorporated into the bottom-up strategy for SOA delivery.

I believe this is worth exploring because use of informal ontologies in a bottom-up strategy for SOA delivery may improve the potential for ultimately achieving a Contemporary SOA. (An outcome that, you’ll recall from above, Erl stated wasn’t otherwise achievable.)

I also believe this is worth exploring because, as Erl states, most organizations are attempting to gravitate towards SOAs from the bottom up.

Because the agile strategy (ideally) combines the best of both the top-down and bottom-up approaches, I also believe it’s worth exploring the potential for informal ontologies in this case as well.

Although further research is required, the figure below extends Erl’s Figure 10.3 (pg. 367) with a first-blush suggestion of how informal ontologies might be incorporated into the bottom-up strategy for SOA delivery.

[Figure: informal_ontology_soa_delivery]

It’s important to note that Erl’s original figure illustrates a five-step process that culminates with “Deploy services”.

Based on work I’ve done elsewhere, in this first-blush depiction, I believe the steps required to make use of informal ontologies would need to include:

  • “Extract service relationships” – In the work I’ve done elsewhere, this extraction has been achieved by Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL uses XSLT to extract relationships from XML and represent them in RDF.
  • “Generate informal ontology” – These days, ontologies are often expressed in the Web Ontology Language (OWL). OWL is semantically richer and more expressive than RDF, and is commonly serialized as XML. Much like the previous step, the generated informal ontology is expressed in OWL via processing that would likely make use of XSLT. This step might also involve the need to incorporate annotations.
  • “Integrate informal ontologies” – Because each act of modeling through deploying application services will result in an informal ontology, there will eventually be a pressing need to integrate these informal ontologies. This ontology integration, which may also involve top-down or formal ontologies, will provide the best possibilities for ultimately realizing a Contemporary SOA.
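The steps above can be sketched in a few lines of Python. This is an illustration under loud assumptions, not a GRDDL implementation: it uses the standard library's ElementTree in place of XSLT, plain (subject, predicate, object) tuples in place of RDF or OWL, and a made-up service description. The element names, the predicates (`hasOperation`, `consumes`, `produces`) and the services themselves are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical service description; not a real WSDL document.
ORDER_XML = """
<service name="OrderService">
  <operation name="placeOrder">
    <input type="Order"/>
    <output type="Confirmation"/>
  </operation>
</service>
"""

BILLING_XML = """
<service name="BillingService">
  <operation name="charge">
    <input type="Order"/>
  </operation>
</service>
"""


def extract_relationships(xml_text):
    """Step 1: pull service relationships out of XML as a set of triples."""
    root = ET.fromstring(xml_text)
    service = root.get("name")
    triples = set()
    for op in root.findall("operation"):
        op_name = op.get("name")
        triples.add((service, "hasOperation", op_name))
        for inp in op.findall("input"):
            triples.add((op_name, "consumes", inp.get("type")))
        for out in op.findall("output"):
            triples.add((op_name, "produces", out.get("type")))
    return triples


def terms(triples):
    """Subjects and objects mentioned in an informal ontology."""
    return {t[0] for t in triples} | {t[2] for t in triples}


def integrate(*ontologies):
    """Step 3: merge informal ontologies by taking the union of their triples."""
    merged = set()
    for ontology in ontologies:
        merged |= ontology
    return merged


orders = extract_relationships(ORDER_XML)
billing = extract_relationships(BILLING_XML)
merged = integrate(orders, billing)
# Terms that appear in both ontologies are natural integration points.
shared = terms(orders) & terms(billing)
print(sorted(merged))
print(sorted(shared))
```

Here the two services only connect through the shared `Order` term. Real integration would also have to reconcile terms that mean the same thing but are named differently, which is where formal ontologies may help.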

Even at this early stage, the use of informal ontologies in the delivery of a SOA appears promising and worth investigating.

Knowledge-Based Heuristics: Further Research is Required

Recently, I’ve blogged about:

In both cases, there’s a case to be made for combining heuristic with knowledge-based approaches.

Although I did find “heuristics” and “knowledge” juxtaposed in Googling for “knowledge-based heuristics”, I believe the tightly coupled examples I’ve described above have some degree of novelty.

Further research is required 🙂

A Bayesian-Ontological Approach for Fighting Spam

When it comes to fighting spam, Bayesian and ontological approaches are not mutually exclusive.

They could be used together in a highly complementary fashion.

For example, you could use Bayesian approaches, as they are implemented today, to build a spam ontology. In other words, the Bayesian approach would be extended through the addition of knowledge-representation methods for fighting spam.

This is almost the opposite of the Wikipedia-based approach I blogged about recently.

In the Wikipedia-based approach, the ontology consists of ham-based knowledge.

In the alternative I’ve presented here, the ontology consists of spam-based knowledge.

Both approaches are technically viable. However, it’d be interesting to see which one actually works better in practice.

Either way, automated approaches for constructing ontologies, as I’ve outlined elsewhere, are likely to be of significant applicability here.

Another point is also clear: Either way, this will be a computationally intensive activity!
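As a rough sketch of the first step in the spam-based alternative, the Python below scores each token by its smoothed log-odds of appearing in spam versus ham (the quantity a naive Bayes filter relies on) and keeps the strongly spam-leaning tokens as candidate concepts. The training messages, the threshold and the function names are all hypothetical, and a flat list of terms is of course far short of an ontology: relationships between the concepts are not modeled here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def spam_log_odds(spam_msgs, ham_msgs):
    """Per-token log P(token | spam) - log P(token | ham), add-one smoothed."""
    spam_counts = Counter(t for m in spam_msgs for t in tokenize(m))
    ham_counts = Counter(t for m in ham_msgs for t in tokenize(m))
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + len(vocab)
    ham_total = sum(ham_counts.values()) + len(vocab)
    return {
        t: math.log((spam_counts[t] + 1) / spam_total)
        - math.log((ham_counts[t] + 1) / ham_total)
        for t in vocab
    }


def candidate_spam_concepts(log_odds, threshold=1.0):
    """Tokens that lean strongly toward spam; the threshold is arbitrary."""
    return sorted(t for t, score in log_odds.items() if score >= threshold)


# Tiny, made-up training set for illustration only.
spam = ["cheap pills cheap offer", "win cheap prize now"]
ham = ["meeting notes for project", "lunch now or later"]

odds = spam_log_odds(spam, ham)
print(candidate_spam_concepts(odds))
```

The ham-based variant would be the mirror image: keep the tokens with strongly negative scores instead.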

Quantitative classification of cloud microphysical imagery via fractal dimension calculations

I recently referred to a paper I wrote for a Fractals in Engineering conference in the mid-90s:

I did lead a project at KelResearch where our objective was to classify hydrometeors (i.e., raindrops, snowflakes, etc.). The hydrometeors were observed in situ by a sensor deployed on the wing of an airplane. Data was collected as the plane flew through winter storms. (Many of these campaigns were spearheaded by Prof. R. E. Stewart.) What we attempted to do was automate the classification of the hydrometeors on the basis of their shape. More specifically, we attempted to estimate the fractal dimension of each observed hydrometeor in the hopes of providing an automated classification scheme. Although this was feasible in principle, the resolution offered by the sensor made this impractical.
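For readers unfamiliar with fractal dimension, here is a minimal box-counting sketch in Python. Box counting is one common estimator; I'm using it here purely for illustration, and it should not be read as the method used in the paper. The input format (a set of occupied pixel coordinates from a binary image) and the box sizes are assumptions.

```python
import math


def box_count(pixels, size):
    """Number of size-by-size boxes that contain at least one occupied pixel."""
    return len({(x // size, y // size) for x, y in pixels})


def fractal_dimension(pixels, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log N(s) against log(1/s)."""
    if not pixels:
        raise ValueError("need at least one occupied pixel")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    n = len(sizes)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes with known dimension.
square = {(x, y) for x in range(16) for y in range(16)}
line = {(x, 0) for x in range(16)}
print(fractal_dimension(square))  # close to 2 for a filled square
print(fractal_dimension(line))    # close to 1 for a straight line
```

With only a handful of pixels per particle, very few box sizes are available for the fit, so the estimate becomes unreliable on coarse imagery.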

I’ve now added the citation and paper to my publications list.

I expect to revisit this paper soon … stay tuned.