
Ontology-based data integration

From Wikipedia, the free encyclopedia
Revision as of 11:27, 27 March 2018

Ontology-based data integration involves the use of ontologies to combine data or information from multiple heterogeneous sources.[1] It is one of several data integration approaches and may be classified as Global-As-View (GAV).[2] The effectiveness of ontology-based data integration is closely tied to the consistency and expressivity of the ontology used in the integration process.[3]
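In a Global-As-View setting, each element of the global (ontology-level) schema is defined as a query over the sources, so a query against the global schema is answered by unfolding those definitions. The following is a minimal Python sketch of that idea only, assuming two hypothetical in-memory sources (`hr_db`, `crm_db`) and a hypothetical global concept `Person`; it does not reproduce the mechanism of any particular system.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

# Two hypothetical sources that describe people with different field names.
hr_db: List[Record] = [
    {"emp_name": "Ada", "emp_age": 36},
    {"emp_name": "Alan", "emp_age": 41},
]
crm_db: List[Record] = [
    {"fullName": "Grace", "years": 45},
]

# GAV: each global concept is *defined* as a view (a query) over the sources.
GLOBAL_VIEWS: Dict[str, Callable[[], List[Record]]] = {
    "Person": lambda: (
        [{"name": r["emp_name"], "age": r["emp_age"]} for r in hr_db]
        + [{"name": r["fullName"], "age": r["years"]} for r in crm_db]
    ),
}


def query_global(concept: str, predicate: Callable[[Record], bool]) -> List[Record]:
    """Answer a query over the global schema by unfolding the concept's view."""
    return [r for r in GLOBAL_VIEWS[concept]() if predicate(r)]


if __name__ == "__main__":
    print(query_global("Person", lambda r: r["age"] > 40))
    # [{'name': 'Alan', 'age': 41}, {'name': 'Grace', 'age': 45}]
```

What makes this GAV rather than Local-As-View is that the global concept is written in terms of the sources, so adding a new source means editing the `Person` view.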

Background

Data from multiple sources exhibit several types of heterogeneity. The following hierarchy is often used to classify them:[4][5][6]

  • Syntactic heterogeneity: results from differences in the representation format of data.
  • Schematic or structural heterogeneity: arises when data sources differ in the native model or structure used to store data. Schematic heterogeneity, which appears particularly in structured databases, is also an aspect of structural heterogeneity.[4]
  • Semantic heterogeneity: arises from differences in the interpretation of the 'meaning' of data.
  • System heterogeneity: results from the use of different operating systems and hardware platforms.
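To make the first three categories concrete, the sketch below takes two hypothetical sources that report the same kind of measurement: one delivers CSV with a Fahrenheit column, the other JSON with a nested Celsius field. Parsing both formats addresses syntactic heterogeneity, flattening them into one record shape addresses structural heterogeneity, and converting to a single agreed unit addresses (a simple case of) semantic heterogeneity. The field names, formats and target record are assumptions chosen for illustration; system heterogeneity is not modeled.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical source A: CSV text, temperature in degrees Fahrenheit.
SOURCE_A = "station,temp_f\nNorth,50\nSouth,68\n"
# Hypothetical source B: nested JSON text, temperature in degrees Celsius.
SOURCE_B = '[{"site": {"id": "East"}, "reading": {"celsius": 25.0}}]'


def from_source_a(text: str) -> List[Dict[str, object]]:
    # Syntactic: parse CSV. Semantic: convert Fahrenheit to Celsius.
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"station": r["station"], "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1)}
        for r in rows
    ]


def from_source_b(text: str) -> List[Dict[str, object]]:
    # Syntactic: parse JSON. Structural: flatten nested objects into one record shape.
    return [
        {"station": d["site"]["id"], "temp_c": float(d["reading"]["celsius"])}
        for d in json.loads(text)
    ]


if __name__ == "__main__":
    unified = from_source_a(SOURCE_A) + from_source_b(SOURCE_B)
    print(unified)
    # [{'station': 'North', 'temp_c': 10.0}, {'station': 'South', 'temp_c': 20.0}, {'station': 'East', 'temp_c': 25.0}]
```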

Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address semantic heterogeneity in data sources. In domains such as bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies [1] have made it possible for the data integration community to leverage them for the semantic integration of data and information.
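A minimal way to picture an ontology in this role is as a set of concepts with explicit identifiers, labels (including synonyms) and named relationships between concepts. The sketch below uses plain Python rather than an ontology language such as OWL; the concept identifiers, labels and the `is_a` relationship are hypothetical. Two sources that use different words for the same condition resolve to the same concept, and following `is_a` links shows how a specific concept relates to more general ones.

```python
from typing import Dict, Set, Tuple

# Hypothetical concept identifiers mapped to their known labels (synonyms).
LABELS: Dict[str, Set[str]] = {
    "EX:0001": {"myocardial infarction", "heart attack"},
    "EX:0002": {"cardiovascular disease"},
    "EX:0003": {"disease"},
}

# Named relationships as (subject, relation, object) triples.
TRIPLES: Set[Tuple[str, str, str]] = {
    ("EX:0001", "is_a", "EX:0002"),
    ("EX:0002", "is_a", "EX:0003"),
}


def resolve(term: str) -> str:
    """Map a source-specific term to its concept identifier."""
    needle = term.strip().lower()
    for concept, labels in LABELS.items():
        if needle in labels:
            return concept
    raise KeyError(f"no concept for term {term!r}")


def ancestors(concept: str) -> Set[str]:
    """All concepts reachable from `concept` via 'is_a' (transitive closure)."""
    found: Set[str] = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in TRIPLES:
            if subj == current and rel == "is_a" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


if __name__ == "__main__":
    # Two sources use different wording for the same condition.
    print(resolve("Heart attack") == resolve("myocardial infarction"))  # True
    print(sorted(ancestors(resolve("heart attack"))))  # ['EX:0002', 'EX:0003']
```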

The role of ontologies

Ontologies enable the unambiguous identification of entities in heterogeneous information systems and the assertion of applicable named relationships that connect these entities. Specifically, ontologies play the following roles:

  • Content Explication[1]

By explicitly defining terms and the relationships between them, the ontology enables accurate interpretation of data from multiple sources.

In some systems, such as SIMS,[7] queries are formulated using the ontology as a global query schema.

  • Verification[1]

The ontology is used to verify the mappings that integrate data from multiple sources. These mappings may be user-specified or generated by the system.
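The roles above can be illustrated together in one small sketch. Source columns are annotated with ontology terms (content explication), a query posed in ontology vocabulary is routed to every column that supplies the requested property for the requested class or one of its subclasses (the ontology as a global query schema), and each mapping is checked against the ontology's declared classes and property domains (verification). All class, property and source names, and the checking rule itself, are hypothetical simplifications and are not taken from SIMS or any other system.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ontology: class -> parent class, and property -> domain class.
SUBCLASS_OF: Dict[str, Optional[str]] = {"Drug": None, "Antibiotic": "Drug", "Patient": None}
PROPERTY_DOMAIN: Dict[str, str] = {"drugName": "Drug", "patientId": "Patient"}

# Hypothetical mappings: (source, column) -> (class, property),
# written by a user or produced by an automatic matcher.
MAPPINGS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("pharmacy_db", "med_label"): ("Antibiotic", "drugName"),
    ("formulary_db", "product"): ("Drug", "drugName"),
    ("clinic_db", "mrn"): ("Drug", "patientId"),  # deliberately inconsistent
}


def is_subclass(cls: Optional[str], ancestor: str) -> bool:
    """True if `cls` equals `ancestor` or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def explain(source: str, column: str) -> str:
    """Content explication: state what a source column means in ontology terms."""
    cls, prop = MAPPINGS[(source, column)]
    return f"{source}.{column} holds the {prop} of an instance of {cls}"


def sources_for(cls: str, prop: str) -> List[str]:
    """Query model: columns supplying `prop` for `cls` or any of its subclasses."""
    return sorted(
        f"{s}.{c}" for (s, c), (mc, mp) in MAPPINGS.items()
        if mp == prop and is_subclass(mc, cls)
    )


def verify() -> List[str]:
    """Verification: flag mappings that do not fit the ontology's definitions."""
    problems: List[str] = []
    for (s, c), (cls, prop) in MAPPINGS.items():
        if cls not in SUBCLASS_OF or prop not in PROPERTY_DOMAIN:
            problems.append(f"{s}.{c}: unknown class or property")
        elif not is_subclass(cls, PROPERTY_DOMAIN[prop]):
            problems.append(f"{s}.{c}: {prop} is not defined for {cls}")
    return problems


if __name__ == "__main__":
    print(explain("pharmacy_db", "med_label"))
    print(sources_for("Drug", "drugName"))  # ['formulary_db.product', 'pharmacy_db.med_label']
    print(verify())  # ['clinic_db.mrn: patientId is not defined for Drug']
```

Note that the query for `Drug` also reaches the pharmacy column, which was only annotated with the more specific class `Antibiotic`; that is the practical benefit of querying through the ontology's hierarchy rather than through source column names.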

Approaches using ontologies for data integration

Ontology-based data integration applications implement one of three main architectures:[1]

Single ontology approach
A single ontology is used as a global reference model in the system.[3] This is the simplest approach; the other approaches can simulate it.[1] SIMS[7] is a prominent example of this approach. The Structured Knowledge Source Integration component of Research Cyc is another prominent example (see "Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries").[8][9]
Multiple ontologies
Multiple ontologies, each modeling an individual data source, are combined for integration. Although this approach is more flexible than the single ontology approach, it requires mappings to be created between the ontologies. Ontology mapping is a challenging problem and the focus of a large body of research in computer science (see http://www.ontologymatching.org/). The OBSERVER system[10] is an example of this approach.
Hybrid approaches
The hybrid approach uses multiple ontologies that subscribe to a common, top-level vocabulary.[11] The top-level vocabulary defines the basic terms of the domain. Because the ontologies share this vocabulary, they are easier to integrate than fully independent ontologies.
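One way to see why a shared vocabulary helps is to translate a term from one local ontology to another through the common term, and to count how many mappings each architecture needs for n sources. The sketch below assumes three hypothetical local ontologies whose terms are each mapped to a shared top-level term. The pairwise figure for the multiple-ontologies approach is a worst case that assumes every pair of ontologies must be mapped directly; real systems may need fewer.

```python
from typing import Dict, Optional

# Hypothetical local ontologies: local term -> shared top-level term.
LOCAL_TO_SHARED: Dict[str, Dict[str, str]] = {
    "hospital": {"Inpatient": "Person", "Ward": "Location"},
    "insurer": {"Member": "Person", "Region": "Location"},
    "registry": {"Subject": "Person", "Site": "Location"},
}


def translate(term: str, source: str, target: str) -> Optional[str]:
    """Translate a local term between two ontologies via the shared vocabulary.

    Returns the first matching target term, or None if there is no match.
    """
    shared = LOCAL_TO_SHARED[source].get(term)
    if shared is None:
        return None
    for local, common in LOCAL_TO_SHARED[target].items():
        if common == shared:
            return local
    return None


def mapping_counts(n: int) -> Dict[str, int]:
    """Mappings needed for n sources under each architecture (pairwise is a worst case)."""
    return {
        "single": n,                   # each source -> the one global ontology
        "multiple": n * (n - 1) // 2,  # every pair of source ontologies mapped directly
        "hybrid": n,                   # each local ontology -> the shared vocabulary
    }


if __name__ == "__main__":
    print(translate("Inpatient", "hospital", "registry"))  # Subject
    print(mapping_counts(10))  # {'single': 10, 'multiple': 45, 'hybrid': 10}
```

Under these assumptions, adding an eleventh source costs one new mapping in the hybrid case but up to ten new pairwise mappings in the multiple-ontologies case.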

See also

References

  1. H. Wache; T. Vögele; U. Visser; H. Stuckenschmidt; G. Schuster; H. Neumann; S. Hübner (2001). Ontology-Based Integration of Information: A Survey of Existing Approaches. CiteSeerX 10.1.1.142.4390.
  2. Maurizio Lenzerini (2002). Data Integration: A Theoretical Perspective. pp. 243–246.
  3. Alrehamy, Hassan; Walker, Coral (2018-03-26). "SemLinker: automating big data integration for casual users". Journal of Big Data. 5: 14. doi:10.1186/s40537-018-0123-x. ISSN 2196-1115.
  4. A.P. Sheth (1999). Changing Focus on Interoperability in Information Systems: From System, Syntax, Structure to Semantics. pp. 5–30.
  5. AHM02 Tutorial 5: Data Integration and Mediation. Contributors: B. Ludaescher, I. Altintas, A. Gupta, M. Martone, R. Marciano, X. Qian.
  6. "AHM02 Tutorial 5: Data Integration and Mediation". users.sdsc.edu. Retrieved 2017-11-23.
  7. Y. Arens; C. Hsu; C.A. Knoblock (1996). Query Processing in the SIMS Information Mediator.
  8. http://www.cyc.com/content/semantic-knowledge-source-integration
  9. http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/2299
  10. E. Mena; V. Kashyap; A. Sheth; A. Illarramendi (1996). OBSERVER: An Approach for Query Processing in Global Information Systems based on Interoperation across Pre-existing Ontologies.
  11. Cheng Hian Goh (1997). Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Systems.