The First International Workshop on the role of Semantic Web in Provenance Management
October 25, 2009, Westfields Conference Center, Washington D.C., USA.
(Co-located with the 8th International Semantic Web Conference, ISWC-2009)


The Second SWPM Workshop organized at ISWC2010 - SWPM2010



  • Call for papers: Special IEEE Internet Computing Issue on "Provenance in Web Applications". CFP details
  • Author presentations are available in Program section!
  • Oct 26, 2009: A bullet list of one-liners from the audience, in response to the question: "What is your single most important item in the research agenda for semantics and provenance?" This list will appear on (or be linked from) the recently created W3C Incubator on Provenance
  • The workshop proceedings are available online at http://ceur-ws.org/Vol-526


  • 8.30am Registration
  • 8.45am - 9.00am Introduction
  • 9.00am - 10.00am Keynote Address
  • 10.00am - 10.30am Break
  • 10.30am - 12.00noon Session I
    • Presentation 1 (10:30am - 11:00am): Semantic Provenance for Science Data Products: Application to Image Data Processing, Stephan Zednik, Peter Fox, Deborah L. McGuinness, Paulo Pinheiro da Silva and Cynthia Chang.
    • Presentation 2 (11:00am - 11:30am): Reasoning With Provenance, Trust and all that other Meta Knowlege in OWL, Simon Schenk, Renata Dividino and Steffen Staab
    • Presentation 3 (11:30am - 12:00noon): A New Perspective on Semantics of Data Provenance (Invited Paper), Sudha Ram and Jun Liu
  • 12 - 1.30pm Lunch
  • 1.30pm - 3:00pm Session II
    • Presentation 4 (1:30pm - 2:00pm): Semantically Annotated Provenance in the Life Science Grid, Bin Cao, Beth Plale, Girish Subramanian, Paolo Missier, Carole Goble and Yogesh Simmhan
    • Presentation 5 (2:00pm - 2:30pm): On User Views in Scientific Workflow Systems (Invited Paper), Susan Davidson, Yi Chen, Peng Sun and Sarah Cohen-Boulakia
    • Presentation 6 (2:30pm - 3:00pm): Provenance information in biomedical knowledge repositories - A use case (Invited Paper), Olivier Bodenreider
  • 3:00pm - 3.30pm Break
  • 3.30pm - 5:00pm Session III
    • Presentation 7 (3:30pm - 4:00pm): Using Web Data Provenance for Quality Assessment, Olaf Hartig and Jun Zhao.
    • Presentation 8 (4:00pm - 4:30pm): Towards Usable and Interoperable Workflow Provenance: Empirical Case Studies using PML, James Michaelis, Li Ding, Zhenning Shangguan, Stephan Zednik, Rui Huang, Paulo Pinheiro da Silva, Nicholas Del Rio and Deborah L. McGuinness.
  • 4:30pm - 5:00pm Open House
    • Discussion Agenda: What is your single most important item in the research agenda for semantics and provenance?


The growing eScience infrastructure is enabling scientists to generate scientific data on an industrial scale [1]. Similarly, the Web 2.0 paradigm is enabling Web users to create focused applications that combine data from multiple sources, popularly referred to as “mashups”, on an extremely large scale [2]. Managing various forms of apparently ancillary metadata, in addition to the primary data products of eScience, Web, and business applications, is increasingly recognized as critical for the correct interpretation of the data. This workshop focuses specifically on metadata that describes the origins of the data. The term provenance, from the French word “provenir” meaning “to come from”, describes the lineage, or origins, of a data entity. Provenance metadata is required to correctly interpret the results of a process execution, to validate data processing tools, to verify the quality of data, and to associate measures of trust with the data.

The primary objective of this workshop is to explore the role of Semantic Web and its standards in addressing some of the critical challenges facing provenance management, namely:

  • Efficiently capturing and propagating provenance information as data is processed, fragmented and recombined across multiple applications and domains on a Web scale.
  • A common representation model for provenance, underpinned by a formal theory for use by both agents and humans [8] [9].
  • Interoperability of provenance information generated in distributed environments such as the Web and myGrid [10].
  • Tools leveraging the Semantic Web for visualization of provenance information [11].

Relevance and Timeliness

The scale at which data across different domains (biomedical informatics [3],[4], astronomy [5], oceanography [6], and Web mashups [2]) is created and processed mandates the use of automated software tools for both the processing and analysis of provenance metadata in a scalable way. The proof layer in the Semantic Web layer cake, corresponding to provenance information, has been identified as an important component for the implementation of “trust mechanisms” and effective information extraction from the Web [7]. Sahoo et al. [8] recently brought together the elements of the Semantic Web and provenance metadata to define “semantic provenance.”

Several workshops, each addressing different aspects of provenance, have been held, such as Provenance in Databases [12], Provenance in Scientific Workflows [12], and IPAW (2006 through 2008), but none of them has specifically addressed the role of Semantic Web in provenance management. Further, the recent funding (by NSF and NIH, respectively) of large eScience projects such as the Semantic Provenance Capture in Data Ingest Systems (SPCDIS) [13] and the Semantic Problem Solving Environment for T.cruzi [14] makes this workshop timely and relevant. The recently approved IEEE Internet Computing special issue on “Provenance in Web applications in Business, eScience and Social Networking” also emphasizes the increasing importance of provenance management for computer science researchers.


The workshop anticipates the participation of researchers in academia, industry, and government involved in both provenance management and Semantic Web. Given the focus of this workshop on real world eScience, Web, and business projects, we expect domain scientists, Web technologists, and researchers in industry who are interested in provenance management to actively participate in this workshop. The workshop also aims to raise awareness among provenance researchers about Semantic Web and correspondingly highlight provenance management as a rich problem domain for Semantic Web researchers.

Workshop Format

Invited Talk

Prof. Carole Goble
Professor, School of Computer Science, University of Manchester

Paper Presentations

The workshop solicits the submission of original research papers dealing with analytical, theoretical, and practical aspects of provenance management using Semantic Web. Topics of interest include, but are not restricted to:

  • Representation models for provenance, provenance ontologies
  • Provenance analysis (reasoning, knowledge discovery, user-defined rules)
  • Annotation of scientific data using provenance ontologies
  • Role of provenance in social networks, social media and Web 2.0 (mashups)
  • Interoperability and propagation of provenance across applications
  • Large scale storage and efficient querying of provenance
  • Provenance infrastructure for eScience, business, and Web applications
  • Role of provenance in scientific data management

Duration of the Workshop

The workshop is scheduled to be a full-day meeting.



  • Amit Sheth
Amit Sheth is an educator, researcher, and entrepreneur. He is the LexisNexis Ohio Eminent Scholar for Advanced Data Management and Analysis and the director of the Kno.e.sis Center at Wright State University. He has some of the best-cited papers (h-index 58) in information integration, workflow management, Semantic Web and semantic web services, and his research interests include semantics-empowered sensor and social computing on the Web. His research has led to two companies and many deployed systems and applications. http://knoesis.org/amit
  • Vassilis Christophides
Vassilis Christophides is an Associate Professor at the Department of Computer Science, University of Crete, and affiliated researcher at the FORTH-ICS, Hellas. His main research interests include Semantic Web and Peer-to-Peer information management systems, semistructured and XML/RDF data models and query languages as well as description and composition languages for e-services. He has published over 60 articles in international conferences and journals and received the 2004 SIGMOD Test of Time Award and the Best Paper Award at the 2nd and 6th International Semantic Web Conference in 2003 and 2007. http://www.ics.forth.gr/~christop/

Organizing Committee/PC Co-Chairs

  • Juliana Freire, University of Utah
Juliana Freire is an Associate Professor at the School of Computing at the University of Utah. Previously, she was a member of the technical staff at the Database Systems Research Department at Bell Laboratories (Lucent Technologies) and an Assistant Professor at OGI/OHSU. An important theme in Professor Freire's work is the development of data management technology to address new problems introduced by emerging applications, including the Web and scientific applications. Her recent research has focused on two main topics: scientific data management and Web mining. Within scientific data management, she is best known for her work in provenance and scientific workflows, and for being a co-creator of VisTrails.
Further details are at: http://www.cs.utah.edu/~juliana/
  • Paolo Missier, University of Manchester, UK
Paolo Missier joined the School of Computer Science, University of Manchester as a doctoral student in 2004 and has been a Research Fellow there since 2008. His recent research interests are in data and information quality, process automation and workflow technology, and their implications for data and metadata management. Prior to joining academia, Paolo was a Research Scientist at Bellcore (Telcordia), NJ, USA and then an independent consultant for the Italian Public Administration, an independent researcher on national Italian and EU projects, as well as an adjunct Professor in Databases in Milano, Italy.
Further details are at: http://www.cs.man.ac.uk/~pmissier/
  • Satya S. Sahoo, Kno.e.sis Center, Wright State University
Satya Sahoo is a researcher and doctoral student at the Kno.e.sis Center, Wright State University. His research interests include semantic provenance, knowledge representation, and information integration in biomedical and sensor domains. He has defined a formal logic-based provenance management framework for scientific data (part of the NIH-funded project Semantic PSE for T.cruzi).
Further details are at: http://cci.case.edu/cci/index.php/Satya_Sahoo, Email: satyasahoo@ieee.org

Program Committee

  • Aleksander Slominski, IBM Research
  • Bertram Ludäscher, University of California Davis
  • Beth Plale, Indiana University
  • Claudio Silva, University of Utah
  • Francisco Curbera, IBM Research
  • Giorgos Flouris, FORTH-ICS, Greece
  • Ilkay Altintas, San Diego Supercomputer Center, UCSD
  • James Cheney, University of Edinburgh
  • Jun Zhao, Oxford University
  • Kei Cheung, Yale University
  • Krishnaprasad Thirunarayan, Wright State University
  • Luc Moreau, University of Southampton
  • Nirmal Mukhi, IBM Research
  • Olivier Bodenreider, National Library of Medicine, NIH
  • Paulo Pinheiro da Silva, University of Texas at El Paso
  • Peter Fox, Tetherless World Research Constellation, RPI
  • Roger Barga, Microsoft Research
  • Sarah Cohen-Boulakia, Universite Paris-Sud
  • Sudha Ram, Arizona State University
  • Val Tannen, University of Pennsylvania
  • Yogesh Simmhan, Microsoft Research

Submissions of Papers

Submissions and reviewing will be handled using the EasyChair reviewing system. Submitted papers will be refereed by at least three members of the Program Committee. Accepted papers will be published as CEUR Workshop Proceedings and also made available to attendees on electronic media (either CD or USB stick).

All submissions should be at most 6 pages long and submitted in PDF format, using the IEEE format (http://www.ieee.org/web/publications/pubservices/confpub/AuthorTools/conferenceTemplates.html).

Please submit your paper using the EasyChair site at: http://www.easychair.org/conferences/?conf=swpm2009

Important Dates

  • Submissions due (Closed): Friday, August 14, 2009 23:59 (11:59pm) Hawaii time
  • Notification: Monday, August 31, 2009
  • Camera ready papers due: October 2, 2009
  • Workshop Date: October 25, 2009, Westfields Conference Center, Washington D.C., USA.


  1. Hey T, Trefethen, A. E. Cyberinfrastructure for e-Science. Science 2005;308(5723):817-821.
  2. Maximilien EM, Ranabahu, A., Gomadam, K. An Online Platform for Web APIs and Service Mashups. IEEE Internet Computing 2008;12(5):32-43.
  3. www.nbirn.net.
  4. http://bioontology.org.
  5. http://pan-starrs.ifa.hawaii.edu/.
  6. http://www.neptune.washington.edu/.
  7. Sizov S. What Makes You Think That? The Semantic Web's Proof Layer. IEEE Intelligent Systems 2007;22(6):94-99.
  8. Sahoo SS, Sheth, A., Henson, C. Semantic Provenance for eScience: Managing the Deluge of Scientific Data. IEEE Internet Computing 2008;12(4):46-54.
  9. Santos E, Lins, L., Ahrens, J.P., Freire, J., Silva, C.T. A First Study on Clustering Collections of Workflow Graphs. In: IPAW 2008. Utah; 2008.
  10. http://www.mygrid.org.uk/.
  11. Freire J, Silva, C.T., Callahan, S.P., Santos, E., Scheidegger, C.E., Vo, H.T. Managing Rapidly-Evolving Scientific Workflows. In: Proceedings of the International Provenance and Annotation Workshop (IPAW); 2006.
  12. http://wiki.esi.ac.uk/Principles_of_Provenance.
  13. http://spcdis.hao.ucar.edu/.
  14. http://knoesis.wright.edu/research/semsci/projects/tcruzi/.
