If you peruse my blog to any extent, it won’t take you long to come across posts on automated schemes for data/information/knowledge representation. Although I tend to work with semi-structured data, typically of scientific origin, others work with raw text.
Making sense of raw text is something that humans do routinely from a very young age. However, try to get a computer to make sense of raw text, and you’re in for a treat.
For reasons like these, it’s interesting to hear about research being done at UC Irvine on text mining. Why? To quote a recent press release: “Text mining allows a computer to extract useful information from unstructured text.”
As the following excerpt from the same press release states, such an approach appears promising:
… the model generated a list of words that included “rider,” “bike,” “race,” “Lance Armstrong” and “Jan Ullrich.” From this, researchers were easily able to identify that topic as the Tour de France. By examining the probability of words appearing in stories about the Tour de France, researchers learned that Armstrong was written about seven times as much as Ullrich. Charting information over time, researchers discovered that discussion of Tour de France peaked in the summer months but decreased slightly year to year.
I’ll need to do some further reading to understand whether the topic modeling (see “About topic modeling” at http://www.ics.uci.edu/community/news/press/view_press?id=51) referred to in this context bears any resemblance to the topic maps I’ve read about in texts on the Semantic Web.
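To make the topic-modeling side a bit more concrete, here’s a minimal sketch of the general technique using the gensim library. This is not the UCI group’s actual system; the toy documents and parameter choices are entirely my own invention:

```python
# A minimal topic-modeling sketch using gensim's LDA implementation.
# The toy corpus and parameters are illustrative only, not the UCI
# group's data or code.
from gensim import corpora, models

# Pre-tokenized toy documents covering two rough themes.
documents = [
    ["rider", "bike", "race", "armstrong", "ullrich"],
    ["armstrong", "race", "tour", "stage", "bike"],
    ["election", "vote", "candidate", "campaign"],
    ["vote", "campaign", "poll", "candidate", "election"],
]

# Map each unique token to an integer id, then convert each document
# into a bag-of-words vector (a list of (token_id, count) pairs).
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Fit a two-topic LDA model over the corpus.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=42)

# Each topic is a probability distribution over words; printing the
# top words per topic is how one would recognize, say, a "Tour de
# France" topic in the press release's example.
for topic_id in range(lda.num_topics):
    print(topic_id, lda.show_topic(topic_id, topn=5))
```

Comparing the per-word probabilities within a topic is also what lets you say, as the press release does, that one name (Armstrong) appears far more often than another (Ullrich) in stories on the same theme.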
As far as I can tell, the UCI project did not use Topic Maps. However, Topic Maps certainly could have been used here, both to model the term set fed in as input and to capture the terms, documents, and relationships produced as output.
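As a rough illustration of what that output side might look like, here’s a hand-rolled sketch that loosely mirrors Topic Maps concepts (topics, occurrences, associations) using plain Python dictionaries rather than an actual Topic Maps engine. Every identifier and number below is made up for illustration:

```python
# A hand-rolled sketch of capturing topic-model output in Topic Maps
# terms: topics, occurrences, and associations. Plain dictionaries
# stand in for a real Topic Maps library; all names and values here
# are invented.

# A topic for the theme the model discovered, with the source
# documents it appears in recorded as occurrences.
tour_de_france = {
    "id": "tour-de-france",
    "names": ["Tour de France"],
    "occurrences": ["doc-0017", "doc-0042"],  # hypothetical document ids
}

# Topics for terms the model associated with that theme.
armstrong = {"id": "lance-armstrong", "names": ["Lance Armstrong"]}
ullrich = {"id": "jan-ullrich", "names": ["Jan Ullrich"]}

# Associations tie terms to the theme; the model's word probabilities
# could ride along as a property of each association (the 7:1 ratio
# below just echoes the press release's Armstrong/Ullrich example).
associations = [
    {"type": "mentioned-in",
     "roles": {"term": armstrong["id"], "theme": tour_de_france["id"]},
     "probability": 0.07},
    {"type": "mentioned-in",
     "roles": {"term": ullrich["id"], "theme": tour_de_france["id"]},
     "probability": 0.01},
]

for a in associations:
    print(a["roles"]["term"], "->", a["roles"]["theme"],
          "p =", a["probability"])
```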