
A Data Integration System based on ASP

Nicola Leone, Francesco Ricca, Luca Agostino Rubino, Giorgio Terracina

Abstract

The task of an information integration system is to combine data residing at different sources, providing the user with a unified view of them, called the global schema [1]. Simple data integration scenarios have been widely studied and efficient systems are already available. However, when constraints are imposed on the quality of the global data, the integration process becomes significantly harder and may yield ambiguous results. Substantial research effort has been devoted to this area, but no system efficiently implementing the corresponding techniques is available yet. Building on the experience we gained in the INFOMIX project [2], we propose a new system that aims at overcoming some of the limitations experienced in real-world scenarios. In particular, the new data integration system provides:
  1. A comprehensive information model, through which the knowledge about the integration domain can be easily specified. It supports the definition of expressive integrity constraints (ICs) over the global schema and a precise characterization of the relationship between the global schema and the local data sources.
  2. The capability of dealing with data that may be inconsistent with respect to the global ICs [3,4,5] (a small example is sketched after this list).
  3. Advanced information integration algorithms guaranteeing a formal correspondence between the semantics of the data integration system and the computed query answers, especially in the handling of inconsistent data.
  4. Mass-memory-based evaluation strategies to deal with real-world scenarios involving massive amounts of data.
  5. A user-friendly graphical user interface for both designing and querying integration systems.
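To make points 1 and 2 concrete, the following sketch, written in DLV-style datalog/ASP notation, shows a minimal GAV-style specification; all relation names (source1, source2, player) and data are purely illustrative, and the sketch is not the system's actual input syntax:

    % Hypothetical source facts (illustrative data).
    source1(rossi, milan).
    source2(rossi, inter).
    source2(verdi, roma).

    % GAV mapping: the global relation player(Name, Team)
    % is defined as a view over the local sources.
    player(N, T) :- source1(N, T).
    player(N, T) :- source2(N, T).

    % Global integrity constraint: Name is a key for player.
    % The retrieved data violate it (rossi occurs with two teams),
    % so enforcing the IC naively leaves no answer set at all:
    % the whole global database would be rejected.
    :- player(N, T1), player(N, T2), T1 != T2.

Rather than rejecting the entire database, the repair-based semantics sketched below keeps as much of the retrieved data as possible.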
The system is based on Answer Set Programming (ASP) [6] and exploits datalog-based methods for answering user queries that are sound and complete with respect to the semantics of query answering; this guarantees meaningful data integration. It incorporates a number of optimization techniques that "localize" the inefficient computation due to the handling of inconsistencies, limiting it to a very small fragment of the input. This yields fast query answering even in such a powerful data integration framework. The problem of consistent query answering is reduced to cautious reasoning on disjunctive datalog programs, which makes it possible to compute the query results precisely and effectively by using state-of-the-art disjunctive datalog systems; the formal query semantics is thus captured even in the presence of inconsistent data. Finally, the system adopts DLVDB [7] as its internal query evaluation engine, which provides mass-memory evaluation and distributed data management features.
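As an illustration of this reduction, continuing the hypothetical scenario above (a sketch of the general technique of [4,5], not the system's actual encoding), the key violation can be repaired by a disjunctive guess over tuple deletions; each answer set of the program corresponds to a minimal repair, and the consistent answers to a query are exactly its cautious consequences, i.e., the atoms derived in every answer set:

    % Retrieved global instance (violates the key on Name).
    playerD(rossi, milan).
    playerD(rossi, inter).
    playerD(verdi, roma).

    % Repair guess: for every pair of key-conflicting tuples,
    % delete one of them; answer-set minimality ensures that
    % each answer set encodes a minimal set of deletions.
    del(N, T1) v del(N, T2) :- playerD(N, T1), playerD(N, T2), T1 != T2.

    % The repaired relation keeps the tuples that were not deleted.
    player(N, T) :- playerD(N, T), not del(N, T).

    % Two user queries over the repaired global relation.
    teamOfVerdi(T) :- player(verdi, T).
    teamOfRossi(T) :- player(rossi, T).

Here teamOfVerdi(roma) holds in every answer set, so it is a consistent answer under cautious reasoning; teamOfRossi(milan) and teamOfRossi(inter) each hold in only one answer set, so neither is returned, matching the intuition that rossi's team is ambiguous in the integrated data.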

References

  1. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. PODS 2002. (2002) 233-246 
  2. Leone, N. et al.: The INFOMIX System for Advanced Integration of Incomplete and Inconsistent Data. In: Proc. ACM SIGMOD 2005. (2005) 915-917
  3. Bertossi, L.E., Hunter, A., Schaub, T., eds.: Inconsistency Tolerance. Volume 3300 of LNCS. Springer (2005)
  4. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent Query Answers in Inconsistent Databases. In: Proc. PODS 1999, ACM Press (1999) 68-79 
  5. Chomicki, J., Marcinkowski, J.: Minimal-change integrity maintenance using tuple deletions. Information and Computation 197(1/2) (2005) 90-121 
  6. Gelfond, M., Lifschitz, V.: Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing 9(3/4) (1991) 365-385
  7. Terracina, G., Leone, N., Lio, V., Panetta, C.: Experimenting with recursive queries in database and logic programming systems. Theory and Practice of Logic Programming (TPLP) 8(2) (2008) 129-165