KnowledgeWiki

From Knoesis wiki
Revision as of 03:26, 15 February 2016 by Nishita (Talk | contribs) (Important Queries)


KnowledgeWiki: A semantic platform for creating, integrating and curating knowledge graphs. Federated Semantic Services Platform for Open Materials Science and Engineering: Building a semantic infrastructure to create domain models for the materials science domain, provide semantic search over multi-model data sources, and enable the exchange of materials science data via Linked Data.

People

Graduate Students: Nishita Jaykumar, PavanKalyan Yallamelli, Vinh Nguyen, Sarasi Lalithsena
External Collaborators: Clare Paul AFRL/RX PI
Faculty: Amit P. Sheth (Advisor), Krishnaprasad Thirunarayan
Past Members: Kalpa Gunaratna, Swapnil Soni, Siva Kumar Cheekula and Mary Panahiazar

Introduction

The White House’s Materials Genome Initiative (MGI) seeks to substantially improve the process of new material discovery and development, and shorten the time to deployment. Two of the core components of this initiative - new and sophisticated computer modeling technologies and next-generation experimental tools - received initial federal research support through 2012. The third major component is developing solutions for broader access to scientific data about materials, to help achieve the goal of faster development of new materials at lower cost.

Our approach recognizes the need for providing easy access to large amounts of highly distributed and heterogeneous data, including unstructured (scientific literature or publications), semi-structured and structured data. We recognize the need to support a variety of data as well as resources that provide data using APIs and Web services. We recognize the need for tools to be able to easily exchange data. We also recognize the need for integrated provenance (i.e., data lineage) to support data quality and relevance, and for access control so that organizations can share information when desired and yet keep valuable intellectual property confidential. To address these requirements, we will use recent advances in the Semantic Web (standards, search and query processing techniques and tools, Web of Data or Linked Open Data) and semantic services computing, along with integral support for provenance and access control. In a complementary effort during the first year, the development of domain models and knowledge bases (ontologies, taxonomies, and vocabularies) will be carried out with support from AFRL’s Materials and Manufacturing Directorate.

This three-year project will undertake three broad classes of tasks. The first relates to creating semantic infrastructure, including the ability to create semantic metadata for a variety of data types using domain models and knowledge bases. The second relates to semantic search over all varieties of data, including resources with service-based access. The third relates to the development of a novel semantic data exchange scheme for materials science (termed Linked Open Materials Data) based on an open data approach.

Approach

We identified Semantic MediaWiki as the ideal tool for the task at hand. We determined that 11 templates and 28 properties are necessary for capturing data in materials science.

This section covers the overall architecture of the system, the development of a new extension based on the singleton template, the representation of provenance metadata for triples, and an algorithm for identifying the singleton templates for any given RDF dataset.

Overall Architecture

This section explains data collection via the existing semantic forms, the integration of the singleton property template in SMW, the new data representation, the creation of triples, and the representation of each entity as a wiki page.

The system works in three phases: data collection, data representation, and data management. Together they collect the data, represent it using the singleton template, and create the triples and pages in the wiki.

Singleton Template Extension

In MediaWiki, templates are the simplest way to include reusable markup. The singleton template extension was implemented to represent the provenance information of RDF triples. This extension was seamlessly incorporated into the existing extension.
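To make the idea concrete, the following is a minimal Python sketch of how a single statement and its provenance could be expressed with a singleton property. It is not the extension's code. The function name `to_singleton_triples`, the `#suffix` naming scheme and the example.org URIs are assumptions made for illustration; only `rdf:singletonPropertyOf` and `dcterms:source` come from this page.

```python
import uuid

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SINGLETON_PROPERTY_OF = RDF_NS + "singletonPropertyOf"


def to_singleton_triples(subject, generic_property, obj, metadata, suffix=None):
    """Return the triples that attach `metadata` to one statement.

    `suffix` makes the singleton property URI unique. A random UUID is used
    when none is given (this naming scheme is an assumption).
    """
    suffix = suffix or uuid.uuid4().hex
    singleton = f"{generic_property}#{suffix}"
    triples = [
        # The statement itself, asserted with the unique property.
        (subject, singleton, obj),
        # Link from the unique property back to the generic property.
        (singleton, SINGLETON_PROPERTY_OF, generic_property),
    ]
    # Provenance (meta-properties) is attached to the singleton property.
    for meta_property, meta_value in metadata.items():
        triples.append((singleton, meta_property, meta_value))
    return triples


# Hypothetical example data.
example = to_singleton_triples(
    "http://example.org/Steel",
    "http://example.org/hasDefinition",
    "An alloy of iron and carbon",
    {"http://purl.org/dc/terms/source": "http://example.org/handbook"},
    suffix="1",
)
```

Because the provenance hangs off the singleton property rather than the subject, two definitions of the same term from different sources stay distinguishable.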

Data Model

In this section we discuss the details of the data model and vocabulary modelling. The following table lists the templates:


 

No. | Template Name | Resource Link
1 | Definition Text | http://matvocab.org/wiki-dev/index.php/Template:tmpltDefinitionText
2 | Definitions on Other Websites | http://matvocab.org/wiki-dev/index.php/Template:tmpltDefinitionOnOtherWebsite
3 | Name, Abbreviations, Symbols, Synonyms, and Units | http://matvocab.org/wiki-dev/index.php/Template:tmpltName
4 | Image | http://matvocab.org/wiki-dev/index.php/Template:tmpltImage
5 | Video | http://matvocab.org/wiki-dev/index.php/Template:tmpltVideo
6 | Sound | http://matvocab.org/wiki-dev/index.php/Template:tmpltSound
7 | Equation | http://matvocab.org/wiki-dev/index.php/Template:tmpltEquation
8 | Code Snippet | http://matvocab.org/wiki-dev/index.php/Template:tmpltCodeSnippet
9 | Source Code | http://matvocab.org/wiki-dev/index.php/Template:tmpltSourceCode
10 | Related Information | http://matvocab.org/wiki-dev/index.php/Template:tmpltRelatedPageInThisWiki


Properties


 

No. | Property Name | Resource Link
1 | skos:definition | http://matvocab.org/wiki-dev/index.php/Property:skos:definition
2 | dcterms:source | http://matvocab.org/wiki-dev/index.php/Property:dcterms:source
3 | mv:sourceType | http://matvocab.org/wiki-dev/index.php/Property:mv:sourceType
4 | mv:sourceURL | http://matvocab.org/wiki-dev/index.php/Property:mv:sourceURL
5 | dcterms:license | http://matvocab.org/wiki-dev/index.php/Property:dcterms:license
6 | dcterms:creator | http://matvocab.org/wiki-dev/index.php/Property:dcterms:creator
7 | mv:sourceType | http://matvocab.org/wiki-dev/index.php/Property:mv:sourceType
8 | rdfs:isDefinedBy | http://matvocab.org/wiki-dev/index.php/Property:rdfs:isDefinedBy
9 | rdfs:comment | http://matvocab.org/wiki-dev/index.php/Property:rdfs:comment
10 | rdfs:label | http://matvocab.org/wiki-dev/index.php/Property:rdfs:label
11 | vaem:abbreviation | http://matvocab.org/wiki-dev/index.php/Property:vaem:abbreviation
12 | mv:symbol | http://matvocab.org/wiki-dev/index.php/Property:mv:symbol
13 | qudt:unit | http://matvocab.org/wiki-dev/index.php/Property:qudt:unit
14 | schema:image | http://matvocab.org/wiki-dev/index.php/Property:schema:image
15 | schema:video | http://matvocab.org/wiki-dev/index.php/Property:schema:video
16 | mo:recording_of | http://matvocab.org/wiki-dev/index.php/Property:mo:recording_of
17 | xhv:math | http://matvocab.org/wiki-dev/index.php/Property:xhv:math
18 | mv:codeSnippet | http://matvocab.org/wiki-dev/index.php/Property:mv:codeSnippet
19 | schema:programmingLanguage | http://matvocab.org/wiki-dev/index.php/Property:schema:programmingLanguage
20 | rdfs:comment | http://matvocab.org/wiki-dev/index.php/Property:rdfs:comment
21 | schema:programmingLanguage | http://matvocab.org/wiki-dev/index.php/Property:schema:programmingLanguage
22 | rdfs:seeAlso | http://matvocab.org/wiki-dev/index.php/Property:rdfs:comment
23 | dcterms:references | http://matvocab.org/wiki-dev/index.php/Property:rdfs:seeAlso
24 | dcterms:bibliographicCitation | http://matvocab.org/wiki-dev/index.php/Property:dcterms:bibliographicCitation
25 | dcterms:identifier | http://matvocab.org/wiki-dev/index.php/Property:dcterms:identifier
26 | mv:synonym | http://matvocab.org/wiki-dev/index.php/Property:mv:synonym


Property Template Approach

  • This section describes the steps taken to automatically create the wiki pages for the entities in the YAGO dataset:
  1. Identify the list of regular properties.
  2. Identify the list of generic properties.
  3. Create one page per property:
    1. For each property, count the datatypes of its values (using a GROUP BY query).
    2. If it has only one datatype, map that datatype to the corresponding wiki datatype (create the "Has type" annotation with that type).
    3. Otherwise, create an empty property.
    4. If the object is a URI, the datatype is Page.
  4. Create the list of regular templates:
    1. The name of the template is taken from the name of the property.
    2. Generate the regular template tag.
  5. Create the list of singleton templates:
    1. The name of the template is taken from the generic property.
    2. Add a UUID property for each existing value of the singleton property.
    3. Generate the meta-template tag/code for each template.
  6. For properties created from YAGO, add another property that captures the data of the original property (a link from this property to the original property in the YAGO dataset).
  7. Configure Virtuoso.
  8. Analyze the statistics of the concepts.
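Step 3 above can be sketched in Python as follows. This is an illustrative sketch, not the project's code: the function name `wiki_type_for_properties`, the input format and the XSD-to-wiki mapping table are assumptions. `Page`, `Text`, `Number` and `Date` are standard Semantic MediaWiki datatypes.

```python
from collections import defaultdict

# Hypothetical mapping from XSD datatypes to Semantic MediaWiki types.
XSD_TO_WIKI_TYPE = {
    "xsd:string": "Text",
    "xsd:integer": "Number",
    "xsd:double": "Number",
    "xsd:date": "Date",
}


def wiki_type_for_properties(observations):
    """Choose a wiki type for each property.

    `observations` is an iterable of (property, kind) pairs, where kind is
    "uri" for resource objects or an XSD datatype name for literals.
    Returns {property: wiki type or None}. None stands for an empty
    property page (no "Has type" annotation).
    """
    kinds = defaultdict(set)
    for prop, kind in observations:
        kinds[prop].add(kind)

    result = {}
    for prop, seen in kinds.items():
        if seen == {"uri"}:
            result[prop] = "Page"  # step 3.4: URI objects become pages
        elif len(seen) == 1:
            # step 3.2: a single datatype is mapped to its wiki type
            result[prop] = XSD_TO_WIKI_TYPE.get(next(iter(seen)))
        else:
            result[prop] = None  # step 3.3: mixed datatypes, empty property
    return result


# Hypothetical observations, e.g. gathered with a GROUP BY query.
types = wiki_type_for_properties([
    ("bornIn", "uri"),
    ("bornIn", "uri"),
    ("height", "xsd:double"),
    ("label", "xsd:string"),
    ("label", "xsd:integer"),
])
```

Here `bornIn` becomes a Page property, `height` a Number property, and `label` is left without a type because its values mix two datatypes.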

Important Queries

  • Get all the regular properties (triples with regular properties):
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT distinct ?p
WHERE {
  ?s ?p ?o .
  FILTER (NOT EXISTS {?s rdf:singletonPropertyOf ?x . })
  FILTER (NOT EXISTS {?p rdf:singletonPropertyOf ?y . })
  FILTER (NOT EXISTS {?o rdf:singletonPropertyOf ?z . })
}
  • What are the singleton properties :
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT  distinct ?x ?s
WHERE {
   ?s rdf:singletonPropertyOf ?x .  
}
  • What are the generic properties :
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT distinct ?x
WHERE {
  ?s ?p ?o .
  ?p rdf:singletonPropertyOf ?x . 
}
  • What are all the meta-properties of the singleton property ?
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix exp: <http://example.org/>
SELECT distinct ?x ?p
WHERE {
  ?s ?p ?o .
  ?s rdf:singletonPropertyOf ?x .
  FILTER (?p != rdf:singletonPropertyOf )
}
  • Get all triples whose object is a literal:
select ?s ?p ?o
WHERE {
  ?s ?p ?o .
  FILTER isLiteral(?o)
}
  • Get all properties for each entity
select ?s ?p ?o
where {
?s ?p ?o
}
  • Get all concepts in the graph
prefix owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?a
WHERE {
  ?a a owl:Thing .
}
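For readers without a SPARQL endpoint at hand, the partition computed by the first four queries can be approximated in plain Python over an in-memory list of triples. This is a sketch under assumptions: the function name `classify_properties`, the prefixed-name strings and the sample triples are hypothetical, and the result only mirrors the intent of the queries, not their exact result sets.

```python
SINGLETON_PROPERTY_OF = "rdf:singletonPropertyOf"


def classify_properties(triples):
    """Split predicates into singleton, generic, regular and meta sets."""
    # Maps each singleton property to the generic property it instantiates.
    generic_of = {s: o for s, p, o in triples if p == SINGLETON_PROPERTY_OF}
    singleton = set(generic_of)

    # Generic properties whose singleton property is used in some triple.
    used_predicates = {p for _, p, _ in triples}
    generic = {generic_of[p] for p in used_predicates if p in generic_of}

    regular, meta = set(), set()
    for s, p, o in triples:
        if p == SINGLETON_PROPERTY_OF:
            continue
        if s in singleton:
            meta.add(p)  # describes a singleton property (provenance)
        elif p not in singleton and o not in singleton:
            regular.add(p)  # no singleton property involved anywhere
    return {
        "singleton": singleton,
        "generic": generic,
        "regular": regular,
        "meta": meta,
    }


# Hypothetical sample graph.
sample = [
    ("Obama", "bornIn#1", "Hawaii"),
    ("bornIn#1", "rdf:singletonPropertyOf", "bornIn"),
    ("bornIn#1", "hasSource", "Wikipedia"),
    ("Obama", "rdfs:label", "Barack Obama"),
]
groups = classify_properties(sample)
```

On the sample graph, `bornIn#1` is the only singleton property, `bornIn` the only generic property, `rdfs:label` the only regular property and `hasSource` the only meta-property.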

For more information please visit YagoDataset