On Knowledge-Based Representations for Actionable Data …

I bumped into a professional acquaintance last week. After I briefly described a presentation I was about to give, he offered to broker introductions to others who might have an interest in the work I’ve been doing. To initiate the introductions, I crafted a brief description of what I’ve been up to for the past 5 years in this area. I’ve also decided to share it here:

As always, [name deleted], I enjoyed our conversation at the recent AGU meeting in Toronto. Below, I’ve tried to provide some context for the work I’ve been doing in the area of knowledge representations over the past few years. I’m deeply interested in any introductions you might be able to broker with others at York who might have an interest in applications of the same.

Since 2004, I’ve been interested in expressive representations of data. My investigations started with a representation of geophysical data in the eXtensible Markup Language (XML). Although this was successful, use of the approach exposed an oversight: the importance of metadata (data about data). To address this oversight, a subsequent effort introduced a relationship-centric representation via the Resource Description Framework (RDF). RDF, by the way, forms the underpinnings of the next-generation Web – variously known as the Semantic Web, Web 3.0, etc. In addition to taking care of issues around metadata, use of RDF paved the way for increasingly expressive representations of the same geophysical data. For example, to represent features in and of the geophysical data, an RDF-based scheme for annotation was introduced using XML Pointer Language (XPointer). Somewhere around this point in my research, I placed all of this into a framework.

A data-centric framework for knowledge representation.

 In addition to applying my Semantic Framework to use cases in Internet Protocol (IP) networking, I’ve continued to tease out increasingly expressive representations of data. Most recently, these representations have been articulated in RDFS – i.e., RDF Schema. And although I have not reached the final objective of an ontological representation in the Web Ontology Language (OWL), I am indeed progressing in this direction. (Whereas schemas capture the vocabulary of an application domain in geophysics or IT, for example, ontologies allow for knowledge-centric conceptualizations of the same.)  

From niche areas of geophysics to IP networking, the Semantic Framework is broadly applicable. As a workflow for systematically enhancing the expressivity of data, the Framework is based on open standards emerging largely from the World Wide Web Consortium (W3C). Because there is significant interest in this next-generation Web from numerous parties and angles, implementation platforms allow for increasingly expressive representations of data today. In making data actionable, the ultimate value of the Semantic Framework is in providing a means for integrating data from seemingly incongruous disciplines. For example, such representations are actually responsible for providing new results – derived by querying the representation through a ‘semantified’ version of the Structured Query Language (SQL) known as SPARQL. 

I’ve spoken formally and informally about this research to audiences in the sciences, IT, and elsewhere. With York co-authors spanning academic and non-academic staff, I’ve also published four refereed journal papers on aspects of the Framework, and have an invited book chapter currently under review – interestingly, this chapter has been contributed to a book focusing on data management in the Semantic Web. Of course, I’d be pleased to share any of my publications and discuss aspects of this work with those finding it of interest.

With thanks in advance for any connections you’re able to facilitate, Ian. 

If anything comes of this, I’m sure I’ll write about it here – eventually!

In the meantime, feedback is welcome.
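
For readers who would like something concrete: the claim in the letter above, that integrating data from different disciplines and then querying the combined representation can yield new results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions. The triples, the `ex:` names, and the `match` and `query` helpers are all hypothetical, and the tiny pattern matcher is a stand-in for a SPARQL engine, not SPARQL itself.

```python
# Hypothetical triples from two unrelated domains (geophysics and networking).
GEO = [("ex:station1", "ex:locatedIn", "ex:Toronto"),
       ("ex:station1", "ex:recorded", "ex:event42")]
NET = [("ex:router7", "ex:locatedIn", "ex:Toronto"),
       ("ex:router7", "ex:loggedOutage", "ex:outage9")]

# Roughly the basic graph pattern of this SPARQL query (prefix ex: assumed):
#   SELECT ?s ?r WHERE { ?s ex:recorded ?e . ?s ex:locatedIn ?place .
#                        ?r ex:locatedIn ?place . ?r ex:loggedOutage ?o . }
PATTERNS = [("?s", "ex:recorded", "?e"),
            ("?s", "ex:locatedIn", "?place"),
            ("?r", "ex:locatedIn", "?place"),
            ("?r", "ex:loggedOutage", "?o")]


def match(triples, pattern):
    """Yield variable bindings for one triple pattern; '?x' terms are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several triple patterns into a list of complete solutions."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            # Substitute variables already bound by earlier patterns.
            bound = tuple(sol.get(term, term) for term in pattern)
            for b in match(triples, bound):
                next_solutions.append({**sol, **b})
        solutions = next_solutions
    return solutions


if __name__ == "__main__":
    # Merging is just concatenation here; the join happens in the query.
    for solution in query(GEO + NET, PATTERNS):
        print(solution["?s"], "shares a location with", solution["?r"])
```

Neither list on its own relates the station to the router; the relationship only appears once the two sets of triples are queried together.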

RDF-ization: Is That What I’ve Been Up To?

Recently, on his blog, Kingsley Idehen wrote:

RDF-ization is a term used by the Semantic Web community to describe the process of generating RDF from non RDF Data Sources such as (X)HTML, Weblogs, Shared Bookmark Collections, Photo Galleries, Calendars, Contact Managers, Feed Subscriptions, Wikis, and other information resource collections.

Although Idehen identifies a number of data sources, he does not explicitly identify two data sources I’ve been spending a fair amount of time with over the past few years: 

Of course, whether the motivation is personal/social-networking or scientific/IT related, the attention to RDF-ization is win-win for all stakeholders. Why? Anything that accelerates the RDF-ization of non-RDF data sources brings us that much closer to realizing the true value of the Semantic Web.
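
As a rough illustration of what RDF-ization means in practice, here is a minimal sketch that turns a small XML record into subject-predicate-object triples and prints them as N-Triples. It assumes nothing beyond Python's standard library. The XML element names, the station identifier, the `http://example.org/ggp/` base URI, and the `rdfize` and `to_ntriples` helpers are all hypothetical; a real conversion would follow the vocabulary of the data source in question.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML record for one instrument reading (illustrative names only).
XML_RECORD = """
<station id="ST01">
  <name>Example Station</name>
  <reading time="2007-12-10T00:00:00Z">
    <gravity units="volt">-1.234</gravity>
  </reading>
</station>
"""

BASE = "http://example.org/ggp/"  # assumed base URI, not a real vocabulary


def rdfize(xml_text, base=BASE):
    """Return (subject, predicate, object) triples for one station record."""
    root = ET.fromstring(xml_text)
    station = base + "station/" + root.get("id")
    triples = [(station, base + "name", root.findtext("name"))]
    for i, reading in enumerate(root.findall("reading")):
        node = f"{station}/reading/{i}"
        gravity = reading.find("gravity")
        triples.append((station, base + "hasReading", node))
        triples.append((node, base + "time", reading.get("time")))
        triples.append((node, base + "gravity", gravity.text))
        triples.append((node, base + "units", gravity.get("units")))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples; objects that look like URIs become IRIs."""
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http://") else '"' + o + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(rdfize(XML_RECORD)))
```

The point of the exercise is that relationships implicit in the XML nesting (a reading belongs to a station) become explicit, queryable statements.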

Annotation Modeling: In Press

Our manuscript on annotation modeling is one step closer to publication now, as late last night my co-authors and I received sign-off on the copy-editing phase. The journal, Computers and Geosciences, is now preparing proofs.
For the most part then, as authors, we’re essentially done.
However, we may not be able to resist the urge to include a “Note Added in Proof”. At the very least, this note will allude to:

  • The work being done to refactor Annozilla for use in a Firefox 3 context; and
  • How annotation is figuring in OWL2 (Google “W3C OWL2” for more).

Stay tuned …

Annotation Modeling: To Appear in Comp & Geosci

What a difference a day makes!
Yesterday I learned that my paper on semantic platforms was rejected.
Today, however, the news was better, as a manuscript on annotation modeling was accepted for publication.
It’s been a long road for this paper:

The abstract of the paper is as follows:

Annotation Modeling with Formal Ontologies:
Implications for Informal Ontologies

L. I. Lumb[1], J. R. Freemantle[2], J. I. Lederman[2] & K. D. Aldridge[2]
[1] Computing and Network Services, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada
[2] Earth & Space Science and Engineering, York University, 4700 Keele Street, Toronto, Ontario, M3J 1P3, Canada

Knowledge representation is increasingly recognized as an important component of any cyberinfrastructure (CI). In order to expediently address scientific needs, geoscientists continue to leverage the standards and implementations emerging from the World Wide Web Consortium’s (W3C) Semantic Web effort. In an ongoing investigation, previous efforts have been aimed towards the development of a semantic framework for the Global Geodynamics Project (GGP). In contrast to other efforts, the approach taken has emphasized the development of informal ontologies, i.e., ontologies that are derived from the successive extraction of Resource Description Format (RDF) representations from eXtensible Markup Language (XML), and then Web Ontology Language (OWL) from RDF. To better understand the challenges and opportunities for incorporating annotations into the emerging semantic framework, the present effort focuses on knowledge-representation modeling involving formal ontologies. Although OWL’s internal mechanism for annotation is constrained to ensure computational completeness and decidability, externally originating annotations based on the XML Pointer Language (XPointer) can easily violate these constraints. Thus the effort of modeling with formal ontologies allows for recommendations applicable to the case of incorporating annotations into informal ontologies.

I expect the whole paper will be made available in the not-too-distant future …

Evolving Semantic Frameworks into Platforms: Unpublished ms.

I learned yesterday that the manuscript I submitted to HPCS 2008 was not accepted 😦
It may take my co-authors and me some time before this manuscript is revised and re-submitted.
This anticipated re-submission latency, along with the fact that we believe the content needs to be shared in a timely fashion, provides the motivation for sharing the manuscript online.
To whet your appetite, the abstract is as follows:

Evolving a Semantic Framework into a Network-Enabled Semantic Platform
A data-oriented semantic framework has been developed previously for a project involving a network of globally distributed scientific instruments. Through the use of this framework, the semantic expressivity and richness of the project’s ASCII data is systematically enhanced as it is successively represented in XML (eXtensible Markup Language), RDF (Resource Description Framework) and finally as an informal ontology in OWL (Web Ontology Language). In addition to this representational transformation, there is a corresponding transformation from data into information into knowledge. Because this framework is broadly applicable to ASCII and binary data of any origin, it is appropriate to develop a network-enabled semantic platform that identifies the enabling semantic components and interfaces that already exist, as well as the key gaps that need to be addressed to completely implement the platform. After briefly reviewing the semantic framework, a J2EE (Java 2 Enterprise Edition) based implementation for a network-enabled semantic platform is provided. And although the platform is in principle usable, ongoing adoption suggests that strategies aimed at processing XML via parallel I/O techniques are likely an increasingly pressing requirement.

AGU Poster: Relationship-Centric Ontology Integration

Later today in San Francisco, at the 2007 Fall Meeting of the American Geophysical Union (AGU), one of my co-authors will be presenting our poster entitled “Relationship-Centric Ontology Integration” (abstract).

This poster will be in a session for which I was a co-convenor; the session is described elsewhere.

A PDF-version of the poster is available elsewhere (agu07_the_poster_v2.pdf).

Annotation’s Two Solitudes

Most fundamentally, and despite the fact that the term is frequently used, annotation has not been universally defined. Although annotation can be fairly consistently regarded as editorial metadata, or even information about data, there exists considerable latitude regarding the origin/destination scope of annotation. In the language-centric OWL context, annotations originate internal to the ontology and can annotate internally and to some extent externally. Anecdotally, this remains a common usage mode for annotation, as evidenced in the online and off-line literature as well as in the most mature scientific projects. In contrast, in the referencing-based case of XPointer, annotations originate external to the formal ontology, and can annotate the ontology itself or make external references. (The figure below attempts to clarify this not entirely subtle distinction.)

The two solitudes with respect to definitions carry forward into differences in annotation terminology. As a very concrete example, both the OWL and XPointer communities separately define and make use of the same annotation properties or types. There is clearly an opportunity for the W3C to better harmonize the efforts and common interests of these distinct communities.

Annotation Location versus Outcome

Annotation Really Is A Big Deal!

My expressed interest in annotation began as a footnote:

An alternative approach has the following two steps: First, extract RDF from the .GGP and .AUX files as before. Second, incorporate data contained in the .LOG file via annotation. Annotation is a well-established practice [45, Chapter 4] involving RDF and the XML Pointer Language (XPointer, [22]) — essentially a URI-centric fragment identifier. This conversion flow is currently under investigation and the corresponding manuscript is in preparation.

This footnote appeared in a paper that was published by the IEEE for HPCS 2006. The alluded-to manuscript will soon be available from the IEEE and will be presented in mid-May at HPCS 2007.

In addition to this manuscript on annotation, along with my co-authors, I’ve recently submitted a broader-based treatment to a special issue (“Geoscience Knowledge Representation for Cyberinfrastructure”) of Computers & Geosciences (C&G). The abstract of the C&G submission is as follows:

Incorporating Annotations into Formal and Informal Ontologies: Experiences and Implications

L. I. Lumb, J. R. Freemantle, J. I. Lederman & K. D. Aldridge

Abstract

Traditionally, and to a first approximation, annotations can be regarded as comments. In the case of the Web Ontology Language (OWL), this perspective is largely accurate, as annotations are internal constructs included with the language. As internal constructs, annotations in OWL Description Logic (DL) are also constrained to ensure, ultimately, that they do not negatively impact on the ontology’s ability to remain computationally complete and decidable. Formal ontologies, however, can also be annotated externally with the XML Pointer Language (XPointer). Because XPointer-based annotations are quite likely to result in violations of the constraints traditionally placed on OWL DL’s built-in annotations, there exist potentially serious consequences for maintaining self-contained formal ontologies. Insight gained in modeling annotations in formal ontologies using top-down strategies can be applied to informal ontologies. In part, the previous practice of incorporating feature-based annotations directly into informal ontologies is regarded differently, as the XPointer-based annotations may require more complex OWL dialects in which computational completeness and decidability cannot be guaranteed. Critical to the development of informal ontologies is Gleaning Resource Descriptions from Dialects of Languages (GRDDL), as it facilitates the extraction of Resource Description Format (RDF) relationships from representations cast in the eXtensible Markup Language (XML). In order to fully enable the creation of informal ontologies, however, an analogous functionality is required to extract OWL classes, properties and individuals from RDF-based representations. Although a strategy for this capability has been specified, hopefully community based efforts will soon target a corresponding implementation.

Key words:
Annotation, Formal Ontology, Informal Ontology, Ontology, Semantic Web, XPointer, Web Ontology Language

In addition to these papers, I’ve blogged a lot about annotation. And the more I delve into annotation, the more I’m taken by its applicability. For example, I’ll be making a presentation at CANHEIT 2007 on annotation and wikis.

Annotation really is a big deal!

Annozilla: A Firefox Plug-in for Annotation

In early August I wrote: “… the only Web browser that I know of that supports annotation is the W3C’s Amaya.”

I am delighted to report that there is a Firefox plug-in for annotation:

This is the Annozilla project, designed to view and create annotations associated with a web page, as defined by the W3C Annotea project. The idea is to store annotations as RDF on a server, using XPointer (or at least XPointer-like constructs) to identify the region of the document being annotated.

It’s aligned with the W3C – it makes use of W3C standards like XPointer and RDF.

This is precisely what I was goading Google into doing with Google Notebook.
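
To make the quoted description a little more concrete, here is a minimal sketch of what such an annotation might look like as RDF/XML, built with Python's standard library. The annotation namespace and the property names (`annotates`, `context`, `body`) are my recollection of the Annotea annotation schema and should be checked against the W3C documentation. The `build_annotation` helper, the page URL, the XPointer expression, the body URL, and the creator are all hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ANN_NS = "http://www.w3.org/2000/10/annotation-ns#"  # Annotea schema (as recalled)
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_annotation(page_url, xpointer, body_url, creator):
    """Return RDF/XML text describing one annotation of a region of a page."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("a", ANN_NS)
    ET.register_namespace("dc", DC_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description")

    # Type the resource as an annotation.
    rdf_type = ET.SubElement(desc, f"{{{RDF_NS}}}type")
    rdf_type.set(f"{{{RDF_NS}}}resource", ANN_NS + "Annotation")

    # The document being annotated.
    annotates = ET.SubElement(desc, f"{{{ANN_NS}}}annotates")
    annotates.set(f"{{{RDF_NS}}}resource", page_url)

    # The XPointer fragment identifies the annotated region within the page.
    context = ET.SubElement(desc, f"{{{ANN_NS}}}context")
    context.text = f"{page_url}#{xpointer}"

    ET.SubElement(desc, f"{{{DC_NS}}}creator").text = creator

    # The annotation text itself lives at a separate URL.
    body = ET.SubElement(desc, f"{{{ANN_NS}}}body")
    body.set(f"{{{RDF_NS}}}resource", body_url)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_annotation(
        "http://example.org/page.html",
        'xpointer(id("intro")/p[2])',
        "http://example.org/notes/1.html",
        "A. Reader",
    ))
```

Note that the annotation lives entirely outside the page it describes; only the XPointer in the context ties it to a region of that page.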