Report on the open house discussion at SWPM2010 Shanghai, China, November 7, 2010


What is the most important item of provenance standardization for you and your community?


We had around 14 people participating in the discussion. About a third of the attendees were from the provenance community; the rest came from a variety of communities, including social web, pharmaceutical companies, and libraries, or were simply new to provenance research.

Discussion Points

Three themes emerged from the discussion:

1. A requirement for a minimum provenance model. This was the starting topic. The need for an extensible minimum provenance model came up time and again during the discussion, starting with the concept of metametadata introduced by Johanna Völker during the presentation of her paper on RDF provenance through named graphs (presentation). Such a minimum model should avoid the temptation to model every edge case instead of just the main case. This was strongly supported by other participants, who also emphasized that the solution is probably not a single global minimum but a modular model that can be adapted and extended to different scenarios.

2. A lack of outreach beyond the provenance community. Where are the consumers of provenance information? What is their involvement in the XG? It was suggested that providers and consumers of provenance data should be brought into a possible WG. A big problem is getting users in different domains, such as pharma, to produce provenance information. This can only be done by reducing the entry barriers to the technology, namely by:

  • providing a very lightweight provenance model and
  • offering assisted and trustworthy means of recording and managing provenance information transparently to the user.

3. The need for liaison. If a working group is formed, more liaisons must be established, both for collaboration and for dissemination. For example, in the XG we never talked to the library community, whose members are surely experts in metadata; the metametadata work mentioned by Johanna comes exactly from there, and they are moving very fast.

Other interesting, though apparently less critical, issues were:

1. It should be possible to attach modifiers to provenance statements to express whether they were produced by automated means or by humans. If the context or situation in which provenance records are created is also documented, such provenance information can be interpreted more meaningfully.

2. A layered approach seems to be needed to query provenance records where the metametadata concept is applied. Named graphs and federated SPARQL seemed very relevant here as a way to avoid writing fiendishly complex queries.
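
As an illustration of the two points above, the following is a minimal sketch, in plain Python rather than an RDF store, of how provenance statements could be grouped into named graphs and how a metametadata layer could record whether each graph was produced by a human or by automated means. It was not presented at the workshop. All identifiers (the `ex:` names, `Quad`, `graphs_produced_by`, `provenance_of`) are hypothetical and chosen only for illustration; a real deployment would more likely use an RDF dataset queried with SPARQL.

```python
from collections import namedtuple

# A quad is an RDF-style triple plus the name of the graph that contains it.
Quad = namedtuple("Quad", "subject predicate obj graph")

# All "ex:" identifiers are made up for this sketch.
quads = [
    # Layer 0: the data itself.
    Quad("ex:report", "ex:title", "SWPM2010 open house", "ex:data"),
    # Layer 1: provenance of the data, split across two named graphs.
    Quad("ex:report", "ex:createdBy", "ex:alice", "ex:prov-manual"),
    Quad("ex:report", "ex:derivedFrom", "ex:notes", "ex:prov-auto"),
    # Layer 2: metametadata, i.e. statements about the provenance graphs.
    Quad("ex:prov-manual", "ex:producedBy", "human", "ex:meta"),
    Quad("ex:prov-auto", "ex:producedBy", "automated", "ex:meta"),
]


def graphs_produced_by(quads, mode, meta_graph="ex:meta"):
    """Return the names of provenance graphs whose recorded modifier equals mode."""
    return {
        q.subject
        for q in quads
        if q.graph == meta_graph
        and q.predicate == "ex:producedBy"
        and q.obj == mode
    }


def provenance_of(quads, resource, mode):
    """Return sorted (predicate, object) pairs about resource, taken only
    from provenance graphs that were produced in the given mode."""
    selected = graphs_produced_by(quads, mode)
    return sorted(
        (q.predicate, q.obj)
        for q in quads
        if q.subject == resource and q.graph in selected
    )


human_prov = provenance_of(quads, "ex:report", "human")
auto_prov = provenance_of(quads, "ex:report", "automated")
```

Because the modifier is attached once to the named graph rather than to every statement, the query proceeds in two simple steps (select graphs by modifier, then read statements from them), which is the kind of layering that keeps individual queries short.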