Nicola Leone, Francesco Ricca, Luca Agostino Rubino, Giorgio Terracina
The task of an information integration system is to combine data residing at different sources,
providing the user with a unified view of them, called the global schema [1]. Simple data
integration scenarios have been widely studied, and efficient systems are already available.
However, when constraints are imposed on the quality of the global data, the integration
process becomes difficult and may yield ambiguous results. Significant research effort has
been devoted to this area [3], but no system efficiently implementing the
corresponding techniques is available yet.
Starting from the experience we gained in the INFOMIX project [2],
we propose a new system that aims at overcoming some limitations experienced in real-world scenarios.
In particular, the new data integration system provides:
- A comprehensive information model, through which knowledge about the integration
domain can be easily specified. The model also allows expressive integrity constraints
(ICs) to be defined over the global schema, together with a precise characterization of the
relationship between the global schema and the local data sources.
- The capability of dealing with data that may be inconsistent with respect to the ICs defined over the global schema.
- Advanced information integration algorithms guaranteeing a formal
correspondence between the semantics of the data integration system and the computed query answers, especially in
the handling of inconsistent data.
- Mass-memory-based evaluation strategies to deal with real-world scenarios involving
massive amounts of data.
- A user-friendly graphical user interface for both designing and querying integration systems.
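As a concrete illustration of the kind of specification the information model supports, the sketch below shows a key constraint over a global relation and GAV-style mappings from two sources, in DLV-style datalog notation. The relation and source names are hypothetical, chosen only for the example:

```
% Hypothetical global relation: player(Name, Team).
% Key constraint over the global schema: Name is a key for player.
:- player(N, T1), player(N, T2), T1 != T2.

% Mappings from two local sources to the global schema (GAV style):
player(N, T) :- source1_roster(N, T).
player(N, T) :- source2_roster(N, T).
```

If the two sources report the same player with different teams, the retrieved global database violates the key constraint, which is exactly the situation the inconsistency-handling machinery described below must deal with.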
In particular, the system is based on Answer Set Programming (ASP) [6] and
exploits datalog-based methods for answering user queries,
which are sound and complete with respect to the semantics of query answering. This guarantees
meaningful data integration.
It incorporates a number of optimization techniques that "localize" and limit the inefficient
computation due to the handling of inconsistencies to a very small fragment of the input. This
allows fast query answering to be obtained, even in such a powerful data-integration framework.
The problem of consistent query answering [4,5] is reduced to cautious reasoning on
disjunctive datalog programs, which allows the query results to be computed effectively and precisely
using state-of-the-art disjunctive datalog systems. The formal query semantics is thus captured even
in the presence of inconsistent data.
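The reduction above can be sketched as follows, again with hypothetical relation names. The encoding follows the common deletion-based repair scheme [5]: a disjunctive rule guesses which of two key-violating tuples to delete, and a query atom is a consistent answer iff it is cautiously true, i.e., true in every answer set (repair):

```
% Facts retrieved from the sources (hypothetical, violating the key):
player("smith", "teamA").  player("smith", "teamB").

% Guess tuple deletions that restore the key constraint on Name:
del_player(N, T1) v del_player(N, T2) :-
    player(N, T1), player(N, T2), T1 != T2.

% Repaired relation: the tuples that were not deleted.
r_player(N, T) :- player(N, T), not del_player(N, T).

% User query over the repaired relation.
q(N) :- r_player(N, _).
```

Here there are two answer sets, one deleting each conflicting tuple; q("smith") holds in both, so "smith" is a consistent answer even though his team is ambiguous. This is exactly the cautious-reasoning task that state-of-the-art disjunctive datalog systems evaluate.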
Finally, the system adopts DLVDB [7] as its internal query evaluation engine,
which provides mass-memory evaluation and distributed data management features.
1. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. PODS 2002. (2002) 233-246
2. Leone, N., et al.: The INFOMIX System for Advanced Integration of Incomplete and Inconsistent Data. In: Proc. ACM SIGMOD 2005. (2005) 915-917
3. Bertossi, L.E., Hunter, A., Schaub, T., eds.: Inconsistency Tolerance. Volume 3300 of LNCS. Springer (2005)
4. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent Query Answers in Inconsistent Databases. In: Proc. PODS 1999, ACM Press (1999) 68-79
5. Chomicki, J., Marcinkowski, J.: Minimal-change integrity maintenance using tuple deletions. Information and Computation 197(1/2) (2005) 90-121
6. Gelfond, M., Lifschitz, V.: Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing 9(3/4) (1991) 365-385
7. Terracina, G., Leone, N., Lio, V., Panetta, C.: Experimenting with recursive queries in database and logic programming systems. Theory and Practice of Logic Programming (TPLP) 8(2) (2008) 129-165