Blazing Semantic Trails
Relationship Web: Blazing Semantic Trails between Web Resources
Amit Sheth, IEEE Fellow, and Cartic Ramakrishnan
Supplying keywords to search engines and receiving documents in response has been the prevalent mode of access to information on the Web. Although there has been a recent shift toward entity-aware methods of information access, these remain devoid of semantics. Semantics are increasingly recognized as the lynchpin of search, integration of resources (data and services), and analytics applications on the Web. We argue that relationships are at the heart of semantics. We envision a Web of Relationships (including implicit and explicit links with associated meaningful descriptions and properties) that relate content across Web resources. Such a meta-layer of relationships interconnecting heterogeneous Web resources will serve as a configurable lens that could provide context-customizable access to Web resources. Under this powerful new paradigm, information access over the Web will be transformed from a mere document retrieval operation into an information framework that supports insight elicitation and semantic analytics over Web resources. Contrast this with the current keyword-in-document-out paradigm of information access, in which a human carefully selects keywords for a search engine that returns a set of documents, and the human then goes through these documents, finds relevant facts, and synthesizes knowledge or insight within his or her brain. In this column, we outline the vision and discuss how recent improvements in content extraction and semantic annotation will help create such a Relationship Web. We then describe new browsing, search, and analytics techniques (such as semantic browsing, hypothesis-based document retrieval, and knowledge discovery over text) powered by the Relationship Web. A couple of examples from the biomedical domain are given to highlight the merits of the Relationship Web vision over the contemporary keyword-in-document-out paradigm of access to Web resources.
Vannevar Bush’s MEMEX
The Memex vision outlined by Dr. Vannevar Bush in his 1945 Atlantic Monthly article [1] pointed out the limitations of a topic-hierarchy-centric document organization mechanism and proposed a contrasting view. Describing how the human brain navigates an information space in what he called trail blazing, Dr. Bush said, “It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain.” In this article Dr. Bush did not explicitly describe the role of relationships in realizing the Memex vision. Later, while studying the semantics of Semantic Networks, Dr. William Woods [2] paid specific attention to the need for representing various kinds of relationships between objects so that inference mechanisms can utilize these relationships in addition to attributes. The need for special focus on relationships is echoed by Grady Booch [3] when he says, “An object by itself is intensely uninteresting.”
“Everything's connected, all along the line. Cause and effect. That's the beauty of it. Our job is to trace the connections and reveal them.” Jack in Terry Gilliam’s 1985 film - “Brazil”
MREF: An Early Strawman
In our past work we have observed the changing focus from documents to entities and on to relationships. We have also investigated a broad variety of issues related to modeling, validating, discovering, and exploiting various types of relationships between entities in content [4]. The first result of these efforts was the concept of Metadata Reference Links (MREFs), which proposed associating semantic metadata with hypertext links [5, 6].
When we introduced MREF in 1996, in an effort to elevate the role of relationships and to use relationship metadata to organize resources on the Web, we had indeed noticed the importance of metadata, as recognized in R. Guha’s work on the Meta Content Framework (MCF). When we revisited MREF in 1998, we defined it on top of the Resource Description Framework (RDF, http://www.w3.org/RDF/) because we recognized that RDF had elevated relationships to first-class objects in modeling data and metadata on the Web. RDF is now a W3C standard for describing and exchanging the semantics of Web data.
New Capabilities and Enablers
Our early attempts at MREF faced several limitations that can be addressed with today’s emerging capabilities:
• Success in developing large populated domain ontologies, especially in such fields as life sciences and health care (e.g., Open Biological Ontologies, http://obo.sourceforge.net/), and the boost ontologies have provided for both annotation and the ability to compute complex relationships and perform different forms of reasoning.
• Standardization of, acceptance of, and comprehensive support for RDF/RDFS, with associated query languages, including emerging support for path and subgraph extraction [7], that support modeling of complex relationships, and the corresponding computation infrastructure.
• Emerging capabilities for semantic metadata extraction and annotation, encompassing named entity recognition [8], [9], [10]; disambiguation/reference reconciliation [11]; and relationship extraction [12], permit us to create RDF from all types of data: structured, semi-structured, and textual/unstructured. The ability to extract metadata from other digital media, including scientific experimental data, sensor data, and multimedia (audio, video, image, etc.), is rapidly emerging.
Types of Relationships
Understanding the specific meanings ascribed to relationships that “connect the dots” is very important as we utilize the Relationship Web for knowledge discovery and insight into data for problem solving. One interesting dimension along which we can study relationships, distinguishing implicit from formal relationships, was introduced in [13]. We extend this with one more type of relationship, which we call the Explicit Linguistic Relationship.
Implicit Relationships
These are relationships between terms or documents that are implied by things such as the following:
• Co-occurrence of terms in the same cluster, after a clustering process based on some similarity measure is completed.
• Linking of one document to another via a hyperlink, indicative of some relationship between the anchor text of the link and the content of the target page.
• Two documents belonging to categories that are siblings in a concept hierarchy.
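The implicit relationships listed above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the cluster assignments, the category hierarchy, and the document names are hypothetical toy data, and the two helper functions are not taken from any existing system.

```python
from itertools import combinations

# Hypothetical toy data (assumptions for illustration only).
term_cluster = {"migraine": 0, "magnesium": 0, "baseball": 1}      # term -> cluster id
parent_of = {"Cardiology": "Medicine", "Neurology": "Medicine",    # category -> parent
             "Pitching": "Sports"}
doc_category = {"doc1": "Cardiology", "doc2": "Neurology", "doc3": "Pitching"}


def co_clustered(clusters):
    """Return pairs of terms that a clustering step placed in the same cluster."""
    return {
        tuple(sorted((a, b)))
        for a, b in combinations(clusters, 2)
        if clusters[a] == clusters[b]
    }


def sibling_documents(doc_cat, parents):
    """Return pairs of documents whose categories differ but share a parent."""
    pairs = set()
    for d1, d2 in combinations(sorted(doc_cat), 2):
        c1, c2 = doc_cat[d1], doc_cat[d2]
        p1, p2 = parents.get(c1), parents.get(c2)
        if c1 != c2 and p1 is not None and p1 == p2:
            pairs.add((d1, d2))
    return pairs


implicit_term_links = co_clustered(term_cluster)
implicit_doc_links = sibling_documents(doc_category, parent_of)
```

Neither function names the relationship it finds; it only reports that some relationship is implied, which is what distinguishes these links from named relationships.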
Explicit Linguistic Relationships
These are instances of named relationships between known entities. A good example of such a relationship is seen in Figure 1. The phrase “he beat Randy Johnson” in the first paragraph, after being passed through a relationship extraction [12] and pronominal resolution [14] engine, can be converted into the triple Dontrelle Willis->beat->Randy Johnson.
Formal Relationships
These are relationships whose semantics are conveyed by ontology schemas expressed in RDFS (http://www.w3.org/TR/rdf-schema/) or OWL (http://www.w3.org/TR/owl-features/). Semantics that are represented in some well-formed syntactic form (governed by syntax rules) are referred to as Formal Semantics. Before delving into formal relationships, we present a few examples of what we mean by formal semantics, together with the relationships that are indicative of such semantics:
• The semantics of subsumption in Description Logics, reflecting the human tendency of categorizing by means of broader or narrower descriptions. The ISA relationship is used to indicate subsumption.
• The semantics of Partonomy, accounting for what is part of an object, not what category the object belongs to. The PART_OF relationship is used to indicate this.
There are some necessary and sufficient features that make a language formal and by association make its semantics formal. These features include:
• The notions of Model and Model Theoretic Semantics: Expressions in a formal language are interpreted in models. The structure common to all models in which a given language is interpreted (the model structure for the model-theoretic interpretation of the given language) reflects certain basic presuppositions about the “structure of the world” that are implicit in the language.
• The Principle of Compositionality: The meaning of an expression is a function of the meanings of its parts and the way they are syntactically combined. In other words, the semantics of an expression are computed using the semantics of its parts, obtained using an interpretation function.
Relationships that possess formal semantics can be used to make inferences. This is due to the fact that formal relationships of the type described above have semantic interpretations that are grounded in set containment. In contrast, however, the semantic interpretation of an Explicit Linguistic Relationship, as in Dontrelle Willis->beat->Randy Johnson, is not as straightforward.
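To make the link between formal relationships, set containment, and inference concrete, here is a minimal Python sketch. It assumes a hypothetical three-class ISA hierarchy and a hand-written interpretation that maps each class to a set of individuals; it is not a Description Logic reasoner, only an illustration of why ISA statements can be chained.

```python
def isa_closure(isa_edges):
    """Compute the transitive closure of direct (subclass, superclass) ISA pairs."""
    closure = set(isa_edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def satisfies(interpretation, isa_edges):
    """A model satisfies the ISA statements if every subclass extension
    is contained in the extension of its superclass."""
    return all(interpretation[sub] <= interpretation[sup] for sub, sup in isa_edges)


# Hypothetical schema and model (assumptions for illustration only).
direct_isa = {("Pitcher", "BaseballPlayer"), ("BaseballPlayer", "Athlete")}
model = {
    "Pitcher": {"Dontrelle Willis", "Randy Johnson"},
    "BaseballPlayer": {"Dontrelle Willis", "Randy Johnson", "a catcher"},
    "Athlete": {"Dontrelle Willis", "Randy Johnson", "a catcher", "a sprinter"},
}

inferred = isa_closure(direct_isa)
```

Because set containment is transitive, any model that satisfies the two direct statements also satisfies the inferred statement that a Pitcher is an Athlete. No comparable set-based reading is available for the relationship "beat", which is why its interpretation is harder to pin down.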
Building the Relationship Web
We anticipate that the Relationship Web will mainly use Implicit Relationships and Explicit Linguistic Relationships, with the use of Formal Relationships when available. Our recent work [12] in schema-driven extraction of named relationships can be seen as a top-down approach to relationship extraction. Beginning with an ontology schema containing a rich set of named relationships (and their synonyms), we extracted instances of these relationships between known entities in text. This is a use of Formal Relationships to create Explicit Linguistic Relationships between Web resources. A complementary bottom-up approach to relationship extraction could use statistical word co-occurrence patterns to establish the existence of named relationships between terms in text. This can be seen as a use of Implicit Relationships to extract Explicit Linguistic Relationships. A good example of such a bottom-up approach is the work on learning relational similarity [15]. Regardless of whether a top-down or a bottom-up approach is used, the objective is to superimpose the extracted relationships on the content from which they were extracted.
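The top-down idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the schema-driven method referred to above: it only checks whether a synonym of a schema relationship occurs between two adjacent mentions of known entities in a sentence. The schema, the entity list, and the function name are hypothetical.

```python
import re

# Hypothetical schema: relationship name -> surface synonyms (assumption).
SCHEMA = {
    "inhibits": ["inhibits", "blocks", "suppresses"],
    "causes": ["causes", "induces"],
}
# Hypothetical list of known entities (assumption).
ENTITIES = ["magnesium", "calcium channel", "stress", "migraine"]


def extract_triples(sentence, schema, entities):
    """Return (subject, relationship, object) triples found by surface matching."""
    text = sentence.lower()
    # Locate every mention of a known entity, with its character span.
    mentions = []
    for entity in entities:
        for m in re.finditer(r"\b" + re.escape(entity) + r"\b", text):
            mentions.append((m.start(), m.end(), entity))
    mentions.sort()
    triples = []
    # Look for a relationship synonym between each pair of adjacent mentions.
    for (_, end1, subj), (start2, _, obj) in zip(mentions, mentions[1:]):
        between = text[end1:start2]
        for rel, synonyms in schema.items():
            if any(re.search(r"\b" + re.escape(s) + r"\b", between) for s in synonyms):
                triples.append((subj, rel, obj))
    return triples


example = "Magnesium blocks calcium channel activity, and stress induces migraine."
triples = extract_triples(example, SCHEMA, ENTITIES)
```

A real extractor would rely on syntactic analysis rather than adjacency, but the sketch shows how the schema's named relationships and synonyms drive what is looked for in the text.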
Complex Relationships and their role in the Relationship Web
Based on the types of relationships described, we propose the notion of complex relationships. These are relationships that combine one or more types of relationships: Implicit, Explicit Linguistic, or Formal relationships. One powerful form of complex relationships that has recently been defined is Semantic Associations [16, 17], which are based on intuitive notions such as connectivity and semantic similarity. A formalization of Semantic Associations has been given using the RDF data model [17]. The RDF data model may be represented as a labeled directed graph, where each triple <Subject, Predicate, Object> (RDF statement) is represented by two nodes, labeled Subject and Object, and an arc, labeled Predicate (or Property), leading from Subject to Object. The ρ-operator [16, 17] allows questions such as “How is X related to Y?” to be asked over an RDF graph (typically created from semantic metadata extracted from heterogeneous documents) and returns a set of paths connecting X to Y.
We have also investigated the challenging issue of ranking these paths [18, 19]. Ranking of Semantic Associations was necessitated by the sheer number of such associations between a pair of entities, even on moderate-size RDF graphs. Even a ranked list of associations can be daunting for a user to interpret and may in some cases cause severe cognitive overload. In a related effort aimed at reducing such cognitive overload, we have also adapted subgraph discovery techniques to discover relatively small but informative subgraphs connecting the entities in the result of a given execution of the ρ-operator [20]. The path and subgraph discovery techniques discussed above have employed either user-specified criteria or statistical measures computed over the RDF graph to rank the resulting paths.
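As an illustration of the kind of question the ρ-operator answers, the following Python sketch enumerates simple directed paths between two nodes of a small triple set. It is a simplified approximation under stated assumptions: only directed path associations are covered, the hop limit and the shortest-first ordering are stand-ins chosen here rather than the published ranking schemes, and the triples are hypothetical.

```python
from collections import defaultdict


def rho_paths(triples, x, y, max_hops=4):
    """Enumerate simple directed paths from x to y, each as a list of triples."""
    outgoing = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))
    results = []

    def walk(node, visited, path):
        if node == y and path:
            results.append(list(path))
            return
        if len(path) == max_hops:
            return
        for p, o in outgoing[node]:
            if o not in visited:          # keep paths simple (no repeated nodes)
                visited.add(o)
                path.append((node, p, o))
                walk(o, visited, path)
                path.pop()
                visited.remove(o)

    walk(x, {x}, [])
    # Naive stand-in for ranking: shorter paths first.
    return sorted(results, key=len)


# Hypothetical RDF-like triples (assumptions for illustration only).
graph = [
    ("magnesium", "isa", "calcium_channel_blocker"),
    ("calcium_channel_blocker", "inhibits", "stress"),
    ("stress", "symptom_of", "migraine"),
    ("magnesium", "studied_in", "trial_1"),
    ("trial_1", "about", "migraine"),
]

paths = rho_paths(graph, "magnesium", "migraine")
```

Even this five-triple graph yields two distinct answers to “How is magnesium related to migraine?”, which hints at why ranking and subgraph summarization become necessary on realistic graphs.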
Another form of complex relationships was introduced earlier in the InfoQuilt project [6], where we sought to model complex causal relationships such as “Volcano affects Environment” using a combination of Explicit Linguistic Relationships and numerical constraints (Figure 3).
Browsing the Relationship Web
Dr. Vannevar Bush suggested that Memex [1], a personal information store, would help a user “stitch” related documents together into structures that he referred to as “trails.” He presented a few examples of the use of such trails, some of which are listed below.
“The physician, puzzled by her patient's reactions, strikes the trail established in studying an earlier similar case, and runs rapidly through analogous case histories, with side references to the classics for the pertinent anatomy and histology. The chemist, struggling with the synthesis of an organic compound, has all the chemical literature before him in his laboratory, with trails following the analogies of compounds, and side trails to their physical and chemical behavior.”
Once named relationships have been extracted from Web resources, the natural next step is to superimpose the extracted metadata back onto the original text. Following this logic and inspired by Dr. Bush’s vision, we propose a Semantic Browsing paradigm in which the user of a Semantic Browser will be able to traverse a document space based on named relationships between entities in two documents of interest. Moreover, such a Semantic Browser will support the creation of what we refer to as Semantic Trails. Figure 4 shows a prototype of a Semantic Browser that can be used to browse a Relationship Web for biomedical literature from PubMed. The figure shows the idea of starting a semantic trail with a concept in one document and going to another document that has a concept related to it through a series of named relationships defined in a UMLS-based ontology.
Exploring the Relationship Web via Semantic Trails
Semantic Trails can be represented as sequences of document sets along with the graph pattern used by the user in finding the trail. Formally, a simple form of Semantic Trail (ST) defined for textual documents is a pair ST = (D, G), where D is the sequence of documents traversed using the fragments of the graph pattern G in sequence. All graph patterns in Semantic Trails are required to be acyclic. The Semantic Browser will support indexing of Semantic Trails for future retrieval. For initial implementations of Semantic Trail retrieval we intend to use SPARQL (http://www.w3.org/TR/rdf-sparql-query/), which supports path expression based queries. Some recent advances in querying with user preferences [21] could also potentially be used for this purpose.
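One possible in-memory representation of a simple Semantic Trail is sketched below in Python. It assumes that a trail pairs a sequence of document sets with an ordered list of graph-pattern fragments, one document set per fragment; that pairing, the class and field names, and the document identifiers are assumptions made for illustration. The acyclicity requirement is checked with a standard topological-sort test.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_acyclic(pattern: List[Triple]) -> bool:
    """Return True if the directed graph induced by the pattern has no cycle."""
    nodes = {n for s, _, o in pattern for n in (s, o)}
    indegree = {n: 0 for n in nodes}
    for _, _, o in pattern:
        indegree[o] += 1
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for s, _, o in pattern:
            if s == n:
                indegree[o] -= 1
                if indegree[o] == 0:
                    ready.append(o)
    # Every node is removed exactly when there is no cycle.
    return removed == len(nodes)


@dataclass
class SemanticTrail:
    documents: List[Set[str]]  # one set of document ids per pattern fragment
    pattern: List[Triple]      # ordered fragments of the graph pattern

    def __post_init__(self):
        if len(self.documents) != len(self.pattern):
            raise ValueError("expected one document set per pattern fragment")
        if not is_acyclic(self.pattern):
            raise ValueError("graph patterns in Semantic Trails must be acyclic")


# Hypothetical trail (document ids and pattern are assumptions).
trail = SemanticTrail(
    documents=[{"doc_a", "doc_b"}, {"doc_b"}, {"doc_c"}],
    pattern=[
        ("magnesium", "isa", "calcium_channel_blocker"),
        ("calcium_channel_blocker", "inhibits", "stress"),
        ("stress", "symptom_of", "migraine"),
    ],
)
```

Storing the pattern alongside the documents is what would let a browser index a trail and later retrieve or replay it against new content.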
Hypothesis-driven Document Retrieval
Another application that the Relationship Web will support will change the way documents are retrieved, presented to the user and analyzed by them.
After submitting a keyword query to a conventional search engine, the user is presented with a list of resources that are relevant to the query. The relevance of these resources is based on algorithms such as PageRank or, in the case of biomedical literature databases such as PubMed, on human editorial judgment.
Although a document is deemed relevant, the actual nature of the “relatedness” will be unclear until the user clicks the link to read the article and validates the “relatedness” for herself. The choice of links to visit in the process of browsing arguably depends on one’s interpretation of the sentence or phrase containing the link. This subjectivity is heightened in some cases by very limited metadata and by a user’s knowledge, especially about what links recognizable concepts. When visiting the selected link, one attempts to validate a relationship between the information request that was expressed as keywords and the content in the page. Throughout the process, the relationships between the concepts play a critical role; that is, the knowledge being sought in the search process lies in the relationships between concepts.
We envision a more powerful retrieval paradigm in which a complex relationship between entities (a hypothesis of the “relatedness”) is given as input and the results are presented to the user as a sequence of (possibly overlapping) document clusters, wherein each cluster contains documents that corroborate a specific fragment of the hypothesis, as illustrated in Figure 5. The hypothesis in the input above suggests that calcium channel blockers inhibit stress, which is an observed symptom of migraine patients. Since magnesium is a natural calcium channel blocker, this hypothesis seeks to find documents that collectively validate it.
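The retrieval paradigm described here can be sketched as follows in Python, under the assumption that each document has already been annotated with the set of triples extracted from it. The function name, the document identifiers, and the annotations are hypothetical; the hypothesis fragments follow the magnesium and migraine example.

```python
def corroborating_clusters(hypothesis, annotations):
    """For each hypothesis fragment, collect the documents whose extracted
    triples contain that fragment. Clusters may overlap."""
    return [
        (fragment,
         sorted(doc for doc, triples in annotations.items() if fragment in triples))
        for fragment in hypothesis
    ]


# The hypothesis as an ordered list of relationship fragments.
hypothesis = [
    ("magnesium", "isa", "calcium_channel_blocker"),
    ("calcium_channel_blocker", "inhibits", "stress"),
    ("stress", "symptom_of", "migraine"),
]

# Hypothetical per-document annotations (assumptions for illustration only).
annotations = {
    "doc_a": {hypothesis[0]},
    "doc_b": {hypothesis[0], hypothesis[1]},
    "doc_c": {hypothesis[2]},
    "doc_d": {("aspirin", "treats", "headache")},
}

clusters = corroborating_clusters(hypothesis, annotations)
```

Here doc_b appears in two clusters, and no single document supports the whole hypothesis; it is the sequence of clusters taken together that corroborates it.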
We believe that the time is now right for a shift from the keyword-in-document-out paradigm to something that will provide more insight into content. Leveraging the power of relationships, we have in this article proposed the idea of a Relationship Web. We have suggested how this will transform search, browsing, and retrieval of Web resources, taking the user one step closer to an automated interpretation of Web content via the use of relationships. We anticipate that preliminary forms of a Relationship Web will take shape in specific domains such as biomedical literature. We plan to create one such Relationship Web for literature from PubMed pertaining to urological diseases. As the scalability of semantic metadata extraction tools grows, larger community-wide Relationship Webs over heterogeneous Web content will emerge.
The Relationship Web takes you away from “which document could have the information I need” to “what’s in the resources that gives me the insight and knowledge I need for decision making.”
How does the Relationship Web relate to the Semantic Web? We see the Semantic Web as an enabler of the Relationship Web. What metadata, annotation, or labeling is to the Semantic Web, relationships of all forms (implicit, explicit and formal) are to the Relationship Web. The primary goal of the Semantic Web has been described (by Tim Berners-Lee and many others) as integration of data or labeling of Web resources for more precise exploitation by both machines and humans. Going to the next level, the Relationship Web organizes Web resources for analysis that goes beyond integration to trailblazing, leading to deeper insights and better decision making.
1. Bush, V., As We May Think. The Atlantic Monthly, 1945. 176(1): p. 101-108.
2. Woods, W., What's in a link: Foundations for Semantic Networks, in Representation and Understanding, D. Bobrow and A. Collins, Editors. 1975, Academic Press: New York. p. 35-82.
3. Booch, G., Object Oriented Design with Applications. 1990: Benjamin-Cummings Publishing Co., Inc. Redwood City, CA USA.
4. Sheth, A.P., I.B. Arpinar, and V. Kashyap, Relationships at the Heart of Semantic Web: Modeling, Discovering and Exploiting Complex Semantic Relationships, in Enhancing the Power of the Internet, Studies in Fuzziness and Soft Computing, M. Nikravesh, et al., Editors. 2003, Springer-Verlag.
5. Shah, K. and A. Sheth, Logical Information Modeling of Web-Accessible Heterogeneous Digital Assets, in Proc. Forum on Research and Technology Advances in Digital Libraries. 1998, IEEE Computer Soc. Press, Los Alamitos, Calif. p. 266-275.
6. Sheth, A., S. Thacker, and S. Patel, Complex relationships and knowledge discovery support in the InfoQuilt system. The VLDB Journal, 2003. 12(1): p. 2-27.
7. Anyanwu, K., A. Maduko, and A. Sheth, SPARQ2L: Towards Support For Subgraph Extraction Queries in RDF Databases, in The 16th International World Wide Web Conference, (WWW2007). 2007: Banff, Canada. p. 117-127.
8. Hammond, B., A. Sheth, and K. Kochut, Semantic Enhancement Engine: A Modular Document Enhancement Platform for Semantic Applications over Heterogeneous Content, in Real World Semantic Web Applications, V. Kashyap and L. Shklar, Editors. 2002, Ios Press Inc. p. 29-49.
9. Dill, S., et al., SemTag and Seeker: Bootstrapping the Semantic Web Via Automated Semantic Annotation, in Twelfth International World Wide Web Conference. 2003: Budapest, Hungary. p. 178-186.
10. Popov, B., et al. KIM - Semantic Annotation Platform. in 2nd International Semantic Web Conference (ISWC2003). 2003. Sanibel Island, Florida, USA: Springer-Verlag.
11. Dong, X., A. Halevy, and J. Madhavan, Reference reconciliation in complex information spaces, in SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 2005, ACM. p. 85-96.
12. Ramakrishnan, C., K.J. Kochut, and A.P. Sheth, A Framework for Schema-Driven Relationship Discovery from Unstructured Text, in ISWC 2006. 2006. p. 583-596.
13. Sheth, A.P., C. Ramakrishnan, and C. Thomas, Semantics for the Semantic Web: the Implicit, the Formal and the Powerful. International Journal on Semantic Web and Information Systems, 2005. 1(1): p. 1-18.
14. Lappin, S. and H. Leass, An algorithm for pronominal anaphora resolution. Comput. Linguist., 1994. 20(4): p. 535-561.
15. Turney, P., Measuring Semantic Similarity by Latent Relational Analysis. 2005. Available from: http://arxiv.org/abs/cs/0508053
16. Anyanwu, K. and A. Sheth, The ρ Operator: Discovering and Ranking Associations on the Semantic Web. SIGMOD Record, 2002. 31(4): p. 42-47.
17. Anyanwu, K. and A. Sheth, ρ-Queries: enabling querying for semantic associations on the semantic web, in Proceedings of the 12th international conference on World Wide Web. 2003, ACM Press: Budapest, Hungary. p. 690-699.
18. Aleman-Meza, B., et al. Context-Aware Semantic Association Ranking. in First International Workshop on Semantic Web and Databases. 2003. Berlin, Germany.
19. Aleman-Meza, B., et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Computing, 2005. 9(3): p. 37-44.
20. Ramakrishnan, C., et al., Discovering Informative Connection Subgraphs in Multi-relational Graphs. SIGKDD Explorations, 2005. 7(2): p. 56-63.
21. Siberski, W., J.Z. Pan, and U. Thaden, Querying the Semantic Web with Preferences, in Lecture Notes in Computer Science: The Semantic Web - ISWC 2006. 2006. p. 612-624.
Amit Sheth is the LexisNexis Ohio Eminent Scholar and director of the Knowledge Enabled Information and Services (Kno.e.sis) Center (http://knoesis.wright.edu) at Wright State University. His research interests include the Semantic Web, services science, and information integration and analysis. Sheth is a fellow of the IEEE. Contact him at firstname.lastname@example.org
Cartic Ramakrishnan is a Ph.D. student in the Kno.e.sis center, Department of Computer Science and Engineering, Wright State University. His research interests are focused on the role of semantics in text mining and analytics.