From Knoesis wiki
Note: This is a draft version of SA-REST. The latest (and active) version is the W3C Member Submission.
SA-REST is a simple and open microformat for enhancing Web resources with additional semantic information. In addition to HTML and XHTML, the SA-REST approach can also be used to enrich Atom, RSS, and arbitrary XML. SA-REST is one of several open microformat standards.
Collaborators in the SA-REST initiative are listed by affiliation.
- kno.e.sis center, Wright State University, Dayton, OH
  - Amit P. Sheth
  - Karthik Gomadam
  - Ajith Ranabahu
Current mashup tools and technologies
One of the main drawbacks of the current state of the art is the lack of support for interoperability, especially data interoperability. Most existing tools limit their support either to services internal to the vendor that created them (for example, Google Mashup Editor offers a complete abstraction of the Google Maps service) or to services with standard output types such as RSS or Atom (Yahoo! Pipes), so the interoperability problem is not highlighted. Mashup development involves two kinds of complexity:
- Visual: complexity arising from the need to create intuitive visual elements and handle the various events relating to them.
- Data: complexity arising from heterogeneity in data schemas and formats.
The primary objective of SA-REST is to address the data complexity issue.
SA-REST in a nutshell
SA-REST is a microformat for adding meta-data to Web resources, including (but not limited to) REST API descriptions in HTML and XHTML. Developers can directly embed meta-data from various models, such as an ontology, a taxonomy, or a tag cloud, into their API descriptions. The embedded meta-data can be used to improve search (for example, faceted search for APIs) and data mediation (in conjunction with XML annotation), and to ease the integration of services into mashups.
Rather than talking only about SWS (not relevant), we should instead talk about other microformats and the synergy therein with SA-REST.
Researchers in the area of Semantic Web Services have proposed various specifications, the prominent of which are
In 2005, the W3C initiated a charter to create a standard for adding semantics to WSDL descriptions. The WSDL-S specification (submitted by the Services Research Lab at kno.e.sis, from the LSDIS Lab in GA, along with IBM) was taken as the primary input for the charter. This led to the standardization of SAWSDL (Semantic Annotation of WSDL and XML Schema). SAWSDL has had a significant impact on the evolution of SA-REST. The key difference between the two frameworks is that SA-REST adopts a microformat-based approach. Even so, the principles of schema annotation, lifting, and lowering can be carried over directly from SAWSDL for XML data objects in the RESTful environment.
The number of available APIs is growing fast. In April 2008, we found that about 700 APIs had been added to ProgrammableWeb. By September, that number was over 900. Currently, general-purpose search engines like Google are largely used to find these APIs. However, they treat API documents like any other document when indexing and ranking. As a result, a search for APIs, even with a specific query like "Maps API", returns API resources scattered all over the result set. Web API directories like ProgrammableWeb do present a more domain-specific solution. However, they largely rely on user tags for classification and searching.
Adding meta-data that captures the various facets of APIs (their functionality, the message types they support, client-side bindings, protocol) can allow for better searching. The results of one such framework, APIHut, are presented in Faceted Search for APIs. We also present our initial evaluation of precision and recall metrics. SA-REST can improve faceted search significantly. Using the known techniques of GRDDL and XSLT, one can extract RDF representations of APIs, which can then be indexed and searched.
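As a rough illustration of how facet meta-data could drive search, the following Python sketch builds a tiny in-memory facet index. It is not APIHut's implementation: the class name `FacetIndex`, its methods, the facet names, and the example URIs are all hypothetical and chosen only to mirror the facets listed above.

```python
from collections import defaultdict


class FacetIndex:
    """Toy faceted index: maps (facet, value) pairs to sets of API URIs.

    Hypothetical helper for illustration only; not part of SA-REST or APIHut.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, api_uri, facets):
        """Register an API under each (facet, value) pair in `facets`."""
        for facet, value in facets.items():
            self._postings[(facet, value)].add(api_uri)

    def search(self, required):
        """Return the APIs that match every (facet, value) pair in `required`."""
        postings = [self._postings.get(item, set()) for item in required.items()]
        if not postings:
            return set()
        return set.intersection(*postings)


index = FacetIndex()
# Example entries are invented for this sketch.
index.add("http://example.org/apis/maps-a", {"functionality": "maps", "protocol": "REST"})
index.add("http://example.org/apis/maps-b", {"functionality": "maps", "protocol": "SOAP"})
index.add("http://example.org/apis/photos", {"functionality": "photos", "protocol": "REST"})

rest_map_apis = index.search({"functionality": "maps", "protocol": "REST"})
```

Each additional facet in the query narrows the result by set intersection, which is the basic behaviour a faceted search interface exposes to the user.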
Data Mediation and Mediatability
The importance of enabling easier approaches to data mediation is well understood. In the context of mashups this is even more important, largely because developers are often faced with the burden of handling data on the client side. SA-REST will address this issue in two ways:
- Adopting XML annotation from SAWSDL: This will allow us to add the lifting and lowering transformations to data elements as a part of the API description. Information about SAWSDL lifting and lowering can be found in the SAWSDL spec on schema annotations. There is a small catch that we have to address here. In the WSDL world, XML was the de facto data exchange format. In the RESTful environment, however, developers can use many formats such as JSON, GData, and RSS. It will be interesting to investigate this as a part of the SA-REST effort.
- Mediatability: Mediatability is a measure of the estimated human effort required to perform data mediation manually. Having additional annotations can significantly help us in computing the mediatability. Even when automatic mediation is not possible, knowing how hard or easy the mediation between two services is can help developers choose services for their mashups.
Smart mashups are those that give the end user more flexibility to change certain services in a mashup. For example, in the popular Housing Maps mashup, if the quality of Yahoo! Maps in a certain area is better than that of Google Maps, the user should have the flexibility to switch. To realize this, we are pursuing a meta-level approach to mashup creation. In this approach, the developer creates the mashup application at a meta level and services are added to it at run time. In this context, there needs to be a way for the developer to specify the requirements for a service and for the system to check whether the service the user prefers meets those requirements. Annotations make this task easier to accomplish.
Semi-automatic text annotation is a significant research area, primarily due to the large volume of text data that becomes available every day. It is not viable to annotate such volumes of data purely by human effort, so one needs to employ text processing techniques to provide automatic markup. One major challenge in text processing is disambiguation: selecting the correct semantics of a word that may be used across domains to represent different concepts.
The domain-rel property acts as a guide to describe the domain(s) a certain text snippet describes and hence provides a means for the text processing / automatic annotation engines to perform effective disambiguation.
Design principles and methods
XHTML Design principles
Due to the specific nature of these annotations, several design principles are usually followed when designing XHTML-based microformats. These patterns are well documented in the hCalendar microformat specification. Here we outline the most important design principles that were followed in designing this microformat.
- Reuse the schema (names, objects, properties, values, types, hierarchies, constraints) as much as possible from pre-existing, established, well-supported standards by reference. This introduces minimal or no disruption to the regular machinery that interacts with this markup.
- Use a generic structural element (e.g. <span> or <div>), or the appropriate contextual element (e.g. an <li> inside a <ul> or <ol>).
- If the format of the data according to the original schema is too long and/or not human-friendly, place the literal data into the 'title' attribute, leaving the human-readable text inside the element. This is used extensively in the hCalendar format, where human-readable text is wrapped in <abbr> elements whose title carries the machine-friendly annotation. The author of hCalendar explains this specific design decision in detail.
Now we provide two styles of applying the microformat. These styles can be used interchangeably depending on readability and convenience considerations.
Class and Title Style
Guided by the first design principle, the class and title attributes are used to provide a name-value relationship for the text content. For example, marking up the word "Maps" with a specific class reference would appear in XHTML as follows:
<span class="sem-class" title="http://apihut.com/schema/apihut-taxonomy#Map">Maps</span>
Class Only Style
This is a convenient way of achieving the same objective of attaching a name-value pair to text content. The class attribute contains the name and the value separated by a space: the first token is taken as the name and the remainder as the value. Class only style has the additional benefit of not producing an undesired tooltip on the text, but it gives up the neat separation of name and value and hence sacrifices readability.
<span class="sem-class http://apihut.com/schema/apihut-taxonomy#Map">Maps</span>
The microformat properties can be categorized into two major types.
Block markup pertains to a block such as div or body. Such markup applies to a larger portion of text that may or may not contain other markup.
<body class="domain-rel" title="http://apihut.com/taxonomies/domainClassification.rdf#maps"> ... </body>
In this example, the domain-rel property is added to the body and hence covers the complete text content encapsulated by the <body> element. This particular property indicates that API descriptions inside the body belong to the maps domain as described in the domain model.
Element markup applies to a single element, such as a span, that wraps a word, a phrase, or a single resource. Element markup should not contain other markup.
Often it is necessary to associate multiple values with a single property. For example, one might need to indicate that a certain text content as a whole is relevant to both the mathematics and biology domains. In such situations SA-REST allows an enumeration to be used as the value of a property. An enumeration is a whitespace-separated list of references. The following example illustrates the use of an enumeration as a property value.
<body class="domain-rel" title="http://apihut.com/taxonomies/domainClassification.rdf#mathematics http://apihut.com/taxonomies/domainClassification.rdf#biology"> ... </body>
When using the class only style, everything after the name in the class attribute is considered to be the value. The previous markup, when written in class only style, would appear as follows.
<body class="domain-rel http://apihut.com/taxonomies/domainClassification.rdf#mathematics http://apihut.com/taxonomies/domainClassification.rdf#biology"> ... </body>
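The two styles and the enumeration rule can be read by one small routine. The following Python sketch is illustrative only and is not part of the SA-REST draft: the names `parse_annotation`, `AnnotationCollector`, and `collect_annotations` are hypothetical, and the sketch assumes that references following the property name in the class attribute take precedence over the title attribute.

```python
from html.parser import HTMLParser

# Property names as used in this document.
SA_REST_PROPERTIES = {"domain-rel", "sem-rel", "sem-class"}


def parse_annotation(attrs):
    """Return (property, [references]) for one element, or None.

    `attrs` is a dict of the element's attributes. Assumption of this
    sketch: the SA-REST property name is the first token of `class`.
    """
    tokens = (attrs.get("class") or "").split()
    if not tokens or tokens[0] not in SA_REST_PROPERTIES:
        return None
    name, inline_values = tokens[0], tokens[1:]
    if inline_values:
        # Class only style: the references follow the name in `class`.
        return name, inline_values
    # Class and title style: the references are in `title`.
    title_values = (attrs.get("title") or "").split()
    return (name, title_values) if title_values else None


class AnnotationCollector(HTMLParser):
    """Collects (tag, property, references) for every annotated element."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        found = parse_annotation(dict(attrs))
        if found:
            self.annotations.append((tag, *found))


def collect_annotations(markup):
    collector = AnnotationCollector()
    collector.feed(markup)
    collector.close()
    return collector.annotations
```

Under these assumptions, both spellings of the mathematics and biology example yield the same result: a `domain-rel` annotation on `body` whose value is a list of the two references.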
Basic SA-REST properties
SA-REST has three basic properties, discussed below. These properties provide mechanisms for adding richer semantic information to any Web resource. However, one can extend SA-REST to capture resource-specific semantics. Examples include SA-REST extensions for Web APIs and SA-REST extensions for social networking profiles. An author wishing to create a new microformat, however, is strongly urged to consider reusing an existing one for the resource type before attempting an extension.
The domain-rel property allows domain information to be described for an entire resource. If a given resource (such as a blog post) has content spanning multiple domains, it is preferable to add multiple domain-rel elements, each surrounding one section of the resource. If such a separation cannot be made, the title attribute should be an enumeration of the domains.
Simple domain-rel example
<span class="domain-rel" title="http://apihut.com/schemas/socialnetworking#socialnetworks"> The growing trend of "liking" has recently caught a lot of attention of both network users as well as developers.</span>
The example below illustrates a multi-domain scenario where the domain contexts can be separated in the content. This style of annotation is desirable for resources, such as integrated feeds, that draw content from multiple sources.
Multi-domain domain-rel example 1
<span class="domain-rel" title="http://apihut.com/schemas/socialnetworking#socialnetworks"> The growing trend of "liking" has recently caught a lot of attention of both network users as well as developers...</span> <span class="domain-rel" title="http://apihut.com/schemas/economy#banking"> I also came across this interesting discussion on bailout that talked about nationalization of banks</span>
A very frequent scenario is one where a resource content spans multiple domains and the content is not contextualized.
Multi-domain domain-rel example 2
<span class="domain-rel" title="http://apihut.com/schemas/socialnetworking#socialnetworks http://apihut.com/schemas/economy#recession"> One often wonders the future of advertisement driven Web applications in the current economic scenario. For example, social networking applications such as...</span>
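To show how a text processing engine might use domain-rel scopes as disambiguation hints, the following Python sketch groups text by the domain(s) of the innermost enclosing domain-rel element. It is a minimal illustration under stated assumptions, not a prescribed algorithm: `DomainScopeParser` and `group_text_by_domain` are hypothetical names, and the rule that the innermost annotated ancestor wins is an assumption of this sketch.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomainScopeParser(HTMLParser):
    """Attributes each text run to the innermost enclosing domain-rel scope."""

    def __init__(self):
        super().__init__()
        self._open = []  # stack of (tag, list of domains or None)
        self.text_by_domain = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        tokens = (attrs.get("class") or "").split()
        domains = None
        if tokens and tokens[0] == "domain-rel":
            # Class only style first, then class and title style.
            domains = tokens[1:] or (attrs.get("title") or "").split()
        self._open.append((tag, domains))

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing elements carry no text content

    def handle_endtag(self, tag):
        # Pop up to and including the nearest matching open element.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        for _, domains in reversed(self._open):
            if domains:
                for domain in domains:
                    self.text_by_domain[domain].append(text)
                break


def group_text_by_domain(markup):
    parser = DomainScopeParser()
    parser.feed(markup)
    parser.close()
    return dict(parser.text_by_domain)
```

Text inside an element annotated with an enumeration is filed under every listed domain, which matches the non-contextualized case in the second multi-domain example.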
The sem-rel property captures the semantics of a link. It evolves from the popular rel tag. One application of sem-rel is to describe a data model that is captured in an XSD. The primary purpose of sem-rel is to allow developers to add "top level annotation" to third-party schemas. The sem-rel property also allows an enumeration within the title attribute. It is used in conjunction with the <a> element.
<a href="http://foo.xsd" class="sem-rel" title="http://taxonomy.org/computerscience#firstname"> This is the input schema</a>
<a href="www.teachmegooglemaps.com" class="sem-rel" title="http://apihut.com/taxonomies/domainClassification.rdf#maps"> Learn Google Maps</a>
sem-class is an element markup property. It can be used to mark up a single entity within a resource. Like the value of domain-rel, the value of sem-class can be an enumeration. For example, in a blog, sem-class can be used to mark up single words, while in a page (such as one on YouTube), sem-class can mark up a video object. Both scenarios are exemplified below.
One striking observation in the evolution of <span class="sem-class" title="http://tap.stanford.edu/#computer"> Computers </span> is the relationship between speed and size.
The example below illustrates the markup of a Flash media object. The markup describes the actual video that is embedded therein. In this case, the video is an American Dad episode from hulu.com (which may not be active all the time).
sem-class on non-textual content
<span class="sem-class" title="http://entertainment.org/schemas/tv#american_dad"><div id="player-container" style="text-align: center;"> <embed id="player" height="368" width="790" flashvars="stage_width=790&amp;stage_height=368&amp;content_id=m1ppkqeh&amp;bitrate=700000&amp;user_id=-1" bgcolor="#000000" allowfullscreen="true" allowscriptaccess="sameDomain" quality="high" name="player" style="z-index: 10;" src="/player.swf" type="application/x-shockwave-flash"/> </div></span>
The most straightforward way to process the documents is to use XSLT along with GRDDL. XSLT is a well-supported and flexible way to transform XML documents from one form to another, the target typically being XML or some other text format. The GRDDL specification describes how an XSLT transformation can be used to convert annotated XHTML/HTML documents to RDF. The following snippet shows how the transformation stylesheet is specified according to GRDDL.
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:grddl='http://www.w3.org/2003/g/data-view#' grddl:transformation="glean_title.xsl http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl" >
The subsequent processing can be done using the RDF representation.
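For readers who want to see the kind of output such a transformation might produce, the sketch below is a plain Python stand-in, not GRDDL or XSLT. It walks a well-formed XHTML document and emits N-Triples lines. Everything RDF-specific here is an assumption made for illustration: the predicate namespace `http://example.org/sarest#` is invented, and so is the choice of subject (the link target for sem-rel on an element with href, the document URI otherwise). The function name `annotations_to_ntriples` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical predicate namespace, invented for this sketch.
SAREST_NS = "http://example.org/sarest#"
PROPERTIES = ("domain-rel", "sem-rel", "sem-class")


def annotations_to_ntriples(xhtml, document_uri):
    """Return a list of N-Triples lines for the SA-REST annotations in `xhtml`.

    `xhtml` must be well-formed XML. References are assumed to be absolute
    URIs and are emitted without validation.
    """
    root = ET.fromstring(xhtml)
    triples = []
    for element in root.iter():  # document order
        tokens = (element.get("class") or "").split()
        if not tokens or tokens[0] not in PROPERTIES:
            continue
        name = tokens[0]
        # Class only style first, then class and title style.
        values = tokens[1:] or (element.get("title") or "").split()
        # Assumption: a sem-rel annotation describes the link target.
        href = element.get("href")
        subject = href if name == "sem-rel" and href else document_uri
        for value in values:
            triples.append(f"<{subject}> <{SAREST_NS}{name}> <{value}> .")
    return triples
```

A real deployment would more likely publish an XSLT stylesheet and reference it through grddl:transformation as shown above; the Python version only makes the mapping from annotations to triples explicit.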