On Knowledge-Based Representations for Actionable Data …

I bumped into a professional acquaintance last week. After I briefly described a presentation I was about to give, he offered to broker introductions to others who might have an interest in the work I’ve been doing. To initiate the introductions, I crafted a brief description of what I’ve been up to for the past five years in this area. I’ve also decided to share it here as follows:

As always, [name deleted], I enjoyed our conversation at the recent AGU meeting in Toronto. Below, I’ve tried to provide some context for the work I’ve been doing in the area of knowledge representations over the past few years. I’m deeply interested in any introductions you might be able to broker with others at York who might have an interest in applications of the same.

Since 2004, I’ve been interested in expressive representations of data. My investigations started with a representation of geophysical data in the eXtensible Markup Language (XML). Although this was successful, use of the approach exposed an oversight: the importance of metadata (data about data). To address this oversight, a subsequent effort introduced a relationship-centric representation via the Resource Description Framework (RDF). RDF, by the way, forms the underpinnings of the next-generation Web – variously known as the Semantic Web, Web 3.0, etc. In addition to taking care of issues around metadata, use of RDF paved the way for increasingly expressive representations of the same geophysical data. For example, to represent features in and of the geophysical data, an RDF-based scheme for annotation was introduced using XML Pointer Language (XPointer). Somewhere around this point in my research, I placed all of this into a framework.

A data-centric framework for knowledge representation.

 In addition to applying my Semantic Framework to use cases in Internet Protocol (IP) networking, I’ve continued to tease out increasingly expressive representations of data. Most recently, these representations have been articulated in RDFS – i.e., RDF Schema. And although I have not reached the final objective of an ontological representation in the Web Ontology Language (OWL), I am indeed progressing in this direction. (Whereas schemas capture the vocabulary of an application domain in geophysics or IT, for example, ontologies allow for knowledge-centric conceptualizations of the same.)  

From niche areas of geophysics to IP networking, the Semantic Framework is broadly applicable. As a workflow for systematically enhancing the expressivity of data, the Framework is based on open standards emerging largely from the World Wide Web Consortium (W3C). Because there is significant interest in this next-generation Web from numerous parties and angles, implementation platforms allow for increasingly expressive representations of data today. In making data actionable, the ultimate value of the Semantic Framework is in providing a means for integrating data from seemingly incongruous disciplines. For example, such representations are actually responsible for providing new results – derived by querying the representation through SPARQL, an SQL-like query language designed for RDF.

I’ve spoken formally and informally about this research to audiences in the sciences, IT, and elsewhere. With York co-authors spanning academic and non-academic staff, I’ve also published four refereed journal papers on aspects of the Framework, and have an invited book chapter currently under review – interestingly, this chapter has been contributed to a book focusing on data management in the Semantic Web. Of course, I’d be pleased to share any of my publications and discuss aspects of this work with those finding it of interest.

With thanks in advance for any connections you’re able to facilitate, Ian. 

If anything comes of this, I’m sure I’ll write about it here – eventually!

In the meantime, feedback is welcome.
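
For readers who prefer code to prose, here is a minimal sketch of the progression described in the letter above: data in XML, relationships pulled out as RDF-style triples, and a pattern-based query standing in for SPARQL. It assumes nothing beyond the Python standard library. The XML record, the element and attribute names, the predicate names, and the values are all made up for illustration. The `match` helper is a toy substitute for a real SPARQL engine, not part of the Framework itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record; names and values are illustrative only.
XML_DATA = """
<station id="CA">
  <instrument id="SG-T012" type="superconducting gravimeter"/>
  <sample time="2004-01-01T00:00:00Z" gravity="-1.234"/>
  <sample time="2004-01-01T00:01:00Z" gravity="-1.236"/>
</station>
"""

def xml_to_triples(xml_text):
    """Turn the XML record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    station = "station:" + root.get("id")
    triples = []
    for inst in root.findall("instrument"):
        inst_id = "instrument:" + inst.get("id")
        triples.append((station, "hasInstrument", inst_id))
        triples.append((inst_id, "hasType", inst.get("type")))
    for i, sample in enumerate(root.findall("sample")):
        sample_id = f"{station}/sample/{i}"
        triples.append((station, "hasSample", sample_id))
        triples.append((sample_id, "atTime", sample.get("time")))
        triples.append((sample_id, "hasGravity", sample.get("gravity")))
    return triples

def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard, much like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

triples = xml_to_triples(XML_DATA)
# "Which instruments does station CA have?"
instruments = [o for (_, _, o) in match(triples, s="station:CA", p="hasInstrument")]
print(instruments)
```

The point of the sketch is the shift in emphasis: once the data is expressed as relationships, questions are asked of the relationships rather than of the document structure.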

Annotation Modeling: To Appear in Comp & Geosci

What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better, as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:

The abstract of the paper is as follows:

Annotation Modeling with Formal Ontologies: Implications for Informal Ontologies

L. I. Lumb[1], J. R. Freemantle[2], J. I. Lederman[2] & K. D. Aldridge[2]
[1] Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
[2] Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada

Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Framework (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

I expect the whole paper will be made available in the not-too-distant future …

Evolving Semantic Frameworks into Platforms: Unpublished ms.

I learned yesterday that the manuscript I submitted to HPCS 2008 was not accepted 😦
It may take my co-authors and me some time before this manuscript is revised and re-submitted.
This anticipated re-submission latency, along with the fact that we believe the content needs to be shared in a timely fashion, provides the motivation for sharing the manuscript online.
To whet your appetite, the abstract is as follows:

Evolving a Semantic Framework into a Network-Enabled Semantic Platform
A data-oriented semantic framework has been developed previously for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally as an informal ontology in OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that identifies the enabling semantic components and interfaces that already exist, as well as the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework, a J2EE (Java 2 Enterprise Edition) based implementation for a network-enabled semantic platform is provided. And although the platform is in principle usable, ongoing adoption suggests that strategies aimed at processing XML via parallel I/O techniques are likely an increasingly pressing requirement.

AGU Poster: Relationship-Centric Ontology Integration

Later today in San Francisco, at the 2007 Fall Meeting of the American Geophysical Union (AGU), one of my co-authors will be presenting our poster entitled “Relationship-Centric Ontology Integration” (abstract).

This poster will be in a session that I co-convened and that is described elsewhere.

A PDF version of the poster is available elsewhere (agu07_the_poster_v2.pdf).

Earth and Space Science Informatics at the 2007 Fall Meeting of the American Geophysical Union

In a previous post, I referred to Earth Science Informatics as a discipline-in-the-making.

To support this claim, I cited a number of data points. And of these data points, the 2006 Fall Meeting of the American Geophysical Union (AGU) stands out as a key enabler.

With 22 sessions posted, the 2007 Fall Meeting of the AGU is well primed to further enable the development of this discipline.

Because I’m a passionate advocate of this intersection between the Earth Sciences and Informatics, I’m involved in convening three of the 22 Earth and Space Science Informatics sessions:

I encourage you to take a moment to review the calls for participation for these three, as well as the other 19, sessions in Earth and Space Science Informatics at the 2007 Fall Meeting of the AGU.

Googling and Swoogling for Ontologies

Using http://www.google.com/search?q=filetype:owl+owl, you can Google for ontologies.

Currently, Google returns over 53,000 results.

If you prefer, you can Swoogle:

Swoogle (http://swoogle.umbc.edu/) is an specialized web search engine that discovers, analyzes and indexes knowledge encoded in semantic web documents published on the Web. Swoogle reasons about these documents and their constituent parts (e.g., terms, individuals, triples) and records meaningful metadata about them. Swoogle provides webscale semantic web data access service, which helps human users and software systems to find relevant documents, terms and triples, via its search and navigation services. Swoogle also provides a customizable algorithm inspired by Google’s PageRank algorithm but adapted to the semantics and use patterns found in semantic web documents. Swoogle currently has indexed nearly 1.3M Semantic Web documents which contain almost 240M triples. In addition to providing general Semantic Web search services, Swoogle has been used by several projects to maintain and manage specialized collections of RDF data.

Currently, Swoogle searches over 10,000 ontologies – even more if you register.

Swoogle’s stats provide even more meaningful and impressive numbers:

  • Swoogle’s aware of about 1.2 million error-free Semantic Web documents
  • Swoogle was able to parse over 412 million triples from Semantic Web documents

Based on these data points, ontology developers would be wise to consult Google or Swoogle to avoid reinventing that which already exists … !

When using pre-existing ontologies, it’s also important to remember that as developers you have various choices:

Existing ontologies are inherently reusable, extensible and refactorable.

2007 Semantic Technology Conference

This is an event I wish I’d had on my radar … alas, May is already rife with conferences!

Some data (not information!) points:

  • The tutorials/presentations content looks quite good
    • I even recognize or know some of the presenters 🙂
    • The content is bleeding-edge to current to mainstream
    • Much of the content is published
  • Judging by the demographics (e.g., a number of vendors are involved), the Semantic Web continues to get real – and some of the vendors have cool names (see the sponsors page for more 😉)
  • A primer is available for those who are new to the subject
  • San Jose will be a nice place to be in May – it’s close to San Francisco … still one of my favorite places to go!

On the Use of Informal Ontologies in the Delivery of Service Oriented Architectures (SOAs)

In Service-Oriented Architecture: Concepts, Technology and Design, author Thomas Erl frames ontologies (section 10.2) in a top-down strategy for the delivery of a Service Oriented Architecture (SOA).

What starts with ontologies, as the first step in a multistep process, ultimately results in a Contemporary SOA (Erl, section 3.2.20):

Contemporary SOA represents an open, extensible, federated, composable architecture that promotes service-orientation and is comprised of autonomous, QoS-capable, vendor diverse, interoperable, discoverable, and potentially reusable services, implemented as Web services.

SOA can establish an abstraction of business logic and technology, resulting in a loose coupling between these domains.

SOA is an evolution of past platforms, preserving successful characteristics of traditional architectures, and bringing with it distinct principles that foster service-orientation in support of a service-oriented enterprise.

SOA is ideally standardized throughout an enterprise, but achieving this state requires a planned transition and the support of a still evolving technology set.

In the same chapter, Erl also provides an abridged Contemporary SOA definition:

SOA is a form of technology architecture that adheres to the principles of service-orientation. When realized through the Web services technology platform, SOA establishes the potential to support and promote these principles throughout the business process and automation domains of an enterprise.

In other words, buying into the top-down strategy can ultimately result in a Contemporary SOA, and this is a big deal.

Erl also discusses the bottom-up strategy for delivering a SOA (section 10.2).

In striking contrast to the top-down strategy, and as Erl describes it, the bottom-up strategy does not incorporate ontologies. This is despite the fact that “… the majority of organizations that are currently building Web services apply the bottom-up approach …” (Erl, pg. 368). As Erl puts it:

The bottom-up strategy is really not a strategy at all. Nor is it a valid approach to achieving a contemporary SOA. This is a realization that will hit many organizations as they begin to take service-orientation, as an architectural model, more seriously. Although the bottom-up design allows for the creation of Web services as required by applications, implementing an SOA at a later point can result in a great deal of retro-fitting and even the introduction of new standardized service layers positioned over the top of the non-standardized services produced by this approach.

After reading this chapter, one is left with the impression that Erl favors the agile strategy (Erl, section 10.4) as it attempts “… to find an acceptable balance between incorporating service-oriented design principles into business analysis environments without having to wait before introducing Web services technologies into technical environments.”

I would be willing to accept all of this on spec if it weren’t for the fact that it’s possible to create informal ontologies, in non-SOA contexts, during bottom-up processes.

And if this is possible in non-SOA contexts, then it’s reasonable that informal ontologies could be incorporated into the bottom-up strategy for SOA delivery.

I believe this is worth exploring because use of informal ontologies in a bottom-up strategy for SOA delivery may improve the potential for ultimately achieving a Contemporary SOA. (An outcome, you’ll recall from above, Erl stated wasn’t otherwise achievable.)

I also believe this is worth exploring because, as Erl states, most organizations are attempting to gravitate towards SOAs from the bottom up.

Because the agile strategy (ideally) combines the best of both the top-down and bottom-up approaches, I also believe it’s worth exploring the potential for informal ontologies in this case as well.

Although further research is required, the figure below extends Erl’s Figure 10.3 (pg. 367) with a first-blush suggestion of how informal ontologies might be incorporated into the bottom-up strategy for SOA delivery.

informal_ontology_soa_delivery

It’s important to note that Erl’s original figure illustrates a five-step process that culminates with “Deploy services”.

Based on work I’ve done elsewhere, in this first-blush depiction, I believe the steps required to make use of informal ontologies would need to include:

  • “Extract service relationships” – In the work I’ve done elsewhere, this extraction has been achieved by Gleaning Resource Descriptions from Dialects of Languages (GRDDL). GRDDL extracts relationships from XML via XSLT and represents them in RDF.
  • “Generate informal ontology” – These days, ontologies are often expressed in the Web Ontology Language (OWL). OWL, which is commonly serialized as XML, is semantically richer and more expressive than plain XML. Much like the previous step, the generated informal ontology is expressed in OWL via processing that would likely make use of XSLT. This step might also involve the need to incorporate annotations.
  • “Integrate informal ontologies” – Because each act of modeling through deploying application services will result in an informal ontology, there will eventually be a pressing need to integrate these informal ontologies. This ontology integration, which may also involve top-down or formal ontologies, will provide the best possibilities for ultimately realizing a Contemporary SOA.
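
To make the first and third steps above a little more concrete, here is a rough sketch using only the Python standard library. It is not GRDDL: plain `xml.etree.ElementTree` stands in for the XSLT-based extraction, the OWL-generation step is skipped entirely, and "integration" is reduced to a naive union of triple sets. The two service descriptions, the predicate names (`offers`, `consumes`, `produces`), and the helper functions are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, heavily simplified service descriptions (not real WSDL).
ORDERS = '<service name="Orders"><operation name="getOrder" input="OrderId" output="Order"/></service>'
BILLING = '<service name="Billing"><operation name="createInvoice" input="Order" output="Invoice"/></service>'

def extract_relationships(xml_text):
    """Extract (subject, predicate, object) relationships from one description."""
    root = ET.fromstring(xml_text)
    service = root.get("name")
    triples = set()
    for op in root.findall("operation"):
        name = op.get("name")
        triples.add((service, "offers", name))
        triples.add((name, "consumes", op.get("input")))
        triples.add((name, "produces", op.get("output")))
    return triples

def integrate(*ontologies):
    """Naive integration: the union of the individual triple sets."""
    merged = set()
    for triples in ontologies:
        merged |= triples
    return merged

def links(merged):
    """Find operation pairs where one produces what the other consumes."""
    produced = {(o, s) for (s, p, o) in merged if p == "produces"}
    return sorted((producer, consumer, thing)
                  for (thing, producer) in produced
                  for (consumer, p, o) in merged
                  if p == "consumes" and o == thing)

merged = integrate(extract_relationships(ORDERS), extract_relationships(BILLING))
print(links(merged))
```

Even in this toy form, the integrated set reveals a cross-service relationship (one service's output feeding another's input) that neither description states on its own. A real integration would also have to reconcile differing vocabularies, which a simple union does not attempt.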

Even at this early stage, the use of informal ontologies in the delivery of a SOA appears promising and worth investigating.

Annotation Paper Submitted to HPCS 2007 Event

I’ve blogged and presented recently (locally and at an international scientific event) on the topic of annotation and knowledge representation.

Working with co-authors Jerusha Lederman, Jim Freemantle and Keith Aldridge, a written version of the recent AGU presentation has been prepared and submitted to the HPCS 2007 event. The abstract is as follows:

Semantically Enabling the Global Geodynamics Project:
Incorporating Feature-Based Annotations via XML Pointer Language (XPointer)

Earth Science Markup Language (ESML) is efficient and effective in representing scientific data in an XML-based formalism. However, features of the data being represented are not accounted for in ESML. Such features might derive from events, identifications, or some other source. In order to account for features in an ESML context, they are considered from the perspective of annotation. Although it is possible to extend ESML to incorporate feature-based annotations internally, there are complicating factors identified that apply to ESML and most XML dialects. Rather than pursue the ESML-extension approach, an external representation for feature-based annotations via XML Pointer Language (XPointer) is developed. In previous work, it has been shown that it is possible to extract relationships from ESML-based representations, and capture the results in the Resource Description Framework (RDF). Application of this same requirement to XPointer-based annotations of ESML representations results in a revised semantic framework for the Global Geodynamics Project (GGP).
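
As an illustration of the external-annotation idea in the abstract, here is a small sketch in Python. It is only an analogy: the standard library has no XPointer processor, so the limited path syntax supported by `xml.etree.ElementTree` stands in for XPointer, and the data document is a made-up XML fragment rather than ESML. The record ids, values, and feature descriptions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data document; the annotations never modify it.
DATA = """
<dataset>
  <record id="r1"><value>9.8061</value></record>
  <record id="r2"><value>9.8075</value></record>
  <record id="r3"><value>9.8062</value></record>
</dataset>
"""

# Annotations live outside the data. Each pairs a pointer (an ElementTree
# path expression, standing in for an XPointer) with a feature description.
ANNOTATIONS = [
    {"pointer": ".//record[@id='r2']", "feature": "possible earthquake signal"},
    {"pointer": ".//record[@id='r9']", "feature": "dangling pointer example"},
]

def resolve(root, annotations):
    """Resolve each pointer against the data; return (resolved, dangling)."""
    resolved, dangling = [], []
    for note in annotations:
        target = root.find(note["pointer"])
        if target is None:
            # The pointer refers to nothing in the data document.
            dangling.append(note)
        else:
            resolved.append((target.get("id"), target.findtext("value"), note["feature"]))
    return resolved, dangling

root = ET.fromstring(DATA)
resolved, dangling = resolve(root, ANNOTATIONS)
print(resolved)
print(len(dangling))
```

Keeping the annotations external leaves the data representation untouched, at the cost of pointers that can dangle if the data changes. That is one reason a check like `resolve` seems worth having.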

Once the paper is accepted, I’ll make a pre-submission version available online.

Because the AGU session I participated in has also issued a call for papers, I’ll be extending the HPCS 2007 submission in various interesting ways.

And finally, thoughts are starting to gel on how annotations may be worked into the emerging notions I’ve been having on knowledge-based heuristics.

Stay tuned.

Knowledge-Based Heuristics: Further Research is Required

Recently, I’ve blogged about:

In both cases, there’s a case to be made for combining heuristic with knowledge-based approaches.

Although I did find “heuristics” and “knowledge” juxtaposed in Googling for “knowledge-based heuristics”, I believe the tightly coupled examples I’ve described above have some degree of novelty.

Further research is required 🙂