User:Aandre156/sandbox: Difference between revisions
Line 95: | Line 95: | ||
* [[Datalog]] |
* [[Datalog]] |
||
* [[Prolog]] |
* [[Prolog]] |
||
* [[Conjunctive query]] |
|||
* [[DBMS]] |
* [[DBMS]] |
||
* [[Semantic Web Rule Language]] |
|||
== References == |
== References == |
Revision as of 13:32, 30 June 2022
Paradigm | Logic, Declarative |
---|---|
Family | Datalog |
Designed by | Oxford University |
Major implementations | |
into JupyterLab |
The Vadalog system is a state-of-the-art Knowledge Graph Management System (KGMS) that offers a language for performing complex logic reasoning tasks over data stored in knowledge graphs and that, at the same time, it delivers a platform to support the entire spectrum of data science tasks: data integration, pre-processing, statistical analysis, machine learning, algorithmic modeling and probabilistic reasoning. [1] Its language is based on an extension of the rule-based language Datalog, Warded Datalog ±, a high-performance Datalog system using an aggressive termination control strategy.[2] The system is implemented and integrated in JupyterLab, which is therefore at the hearth of the Vadalog system. Through this integration, Vadalog can support the entire spectrum of data science activities and tools. The system can make use of relational databases, such as Postgres and MySQL, graph databases, such as neo4j, machine learning tools (e.g., Weka and scikit-learn), and a web data extraction tool, OXPath. Additional Python libraries and extensions can also be easily integrated into the system. [3]
Vadalog is the result of the VADA research programme, a joint effort of the universities of Oxford, Manchester and Edinburgh and around 20 industrial partners. [4]
Knowledge Graph Management Systems (KGMS)
A knowledge graph management system (KGMS) has the purpose to manage and handle knowledge graphs, which incorporate large amounts of data in the form of facts and relationships. It can be seen as the sum of three components: [5]
where:
- KBMS is a knowledge base management system
- Big Data, which is the need of handling large amounts of data, especially when considering that knowledge graphs have been thought as a solution for integrating multiple data sources, both corporate and public knowledge, which results in a very large knowledge graph.
- (Data) Analytics is the need to provide access to existing software packages for machine learning, text mining, data analytics, and data visualization and to combine them together in the same platform
On a more technical basis, the main requirements that can be identified for defining a KGMS are:
- Define a language and a formalism with high expressive power, in terms of the kind of rules it is possible to define. [6] [7]
- Cost-effective data wrangling, in all its steps, from data cleaning to web data extraction and big data access from many different sources [8]
- Efficient logical, probabilistic and ontological reasoning [9] [5]
- Low complexity, both in terms of space complexity and syntax
- Need of interfaces to access many heterogeneous data sources, such as corporate RDBMS, NoSQL or RDF stores, the web, machine-learning and analytics packages. [3]
Other requirements reflect standard DBMS functions and services, as proposed by Codd. [10]
Vadalog System
Vadalog offers a platform that fulfills the requirements of a KGMS. It is able to perform rule-based reasoning tasks on top of knowledge graphs and it also supports the data science workflow, such as data visualization and machine learning.[5]
Reasoning task and recursion
A rule is an expression of the form where:
- are the atoms of the body
- is the atom of the head.
A rule allows to infer new knowledge starting from the variables that are in the body: when all the variables in the body of a rule are successfully assigned, the rule is activated and it results in the derivation of the head predicate: given a database and a set of rules , a reasoning task aims at inferring the so-called intensional knowledge, applying the rules to the database (the extensional knowledge).
The most wide-spread form of knowledge that has been adopted over the last decades has been in the form of rules, be it in rule-based systems, ontology-based systems or other forms and it can be typically captured in knowledge graphs. [3] The nature of knowledge graphs also makes the presence of recursion in these rules a particularly important aspect. Recursion means that the same rules might be called multiple times before obtaining the final answer of the reasoning task and it is particularly powerful as it allows an inference based on previously inferred results. This implies that the system must provide a strategy that guarantees termination. More technically, a program is recursive if the dependency graph built with the application of the rules is cyclical. The simplest form of recursion is that in which the head of a rule also appears in the body (self-recursive rules).
The query language
The Vadalog language allows to answer reasoning queries that also include recursion. It is based on Warded Datalog ±, which belongs to the Datalog± family of languages that extends Datalog by existential quantifiers in rule heads[11] and at the same time restricts its syntax in order to achieve decidability and tractability.[12] [13] Existential rules are also known as tuple-generating dependencies (tgds), which are rules where existential quantification is allowed in the head. [1]
An existential rule has the following form:
or, alternatively, in Datalog syntax, it can be written as follows:
p(X,Z) :- r(X).
It should be noted that variables in Vadalog are like variables in first-order logic and that a variable is local to the rule in which it occurs. This means that occurrences of the same variable name in different rules refer to different variables.
Warded Datalog ±
In case of a set of rules , consisting of the following:
r(X,Y) :- p(X).
p(Z) :- r(X,Z).
the variable Z in the second rule is said to be dangerous, since the first rule will generate a null in the second term of the atom r and this will be injected to the second rule to get the atom p, leading to a propagation of nulls when trying to find an answer to the program. If arbitrary propagation is allowed, reasoning is undecidable and the program will be infinite.[3] Warded Datalog± overcomes this issue asking the requirement that for every rule in a set , all the variables in the rule bodies must coexist in at least one atom in the head, called a ward. The concept of wardness restricts the way dangerous variable can be used inside a program. Although this is a limit in terms of expressive power, with this requirement and thanks to its architecture and termination algorithms, Warded Datalog± is able to find answers to a program in a finite number of steps. It also exhibits a good trade-off between computational complexity and expressive power, capturing PTIME data complexity while allowing ontological reasoning and the possibility of running programs with recursion. [14]
Vadalog extension
Vadalog replicates in its entirety Warded Datalog± and it extends it with the inclusion in the language of:
- monotonic aggregations (min, max, sum, prod, count operators) [15]
- the support of different data types (strings, integer, double, date, boolean, set, lists, marked nulls)
- rich annotation mechanism to define how to interact with data sources and external libraries
In addiction, the system provides an highly engineered architecture to allow efficient computation. This is done in two ways:
- adopting an in-memory processing architecture and a pull stream-based approach, which limits the memory consumption or make it predictable.
- using an aggressive termination control strategy, which detects patterns and redundancy while building of the chase-graph (used to generate the answers) as early as possible and cuts off computation when they occur. [3] This refers to the isomorphism of facts, which means that if a fact h is isomorphic to h', the system will only explore the chase graph of fact h, reducing the number of steps needed. [16]
The Vadalog system is therefore able to perform ontological reasoning tasks, as it belongs to the Datalog family. Reasoning with the logical core of Vadalog captures OWL 2 QL and SPARQL [17] (through the use of existential quantifiers), and graph analytics (through support for recursion and aggregation). The declarative nature of the language makes the code concise, easy-to-read and manageable. [1]
Jupyter Support
The system provides an integration with the JupyterLab platform, where Vadalog programs can be written and run and the output can be read, exploiting the functionalities of the platform. It gives also the possibility to evaluate the correctness of the program, run it and analyse the derivation process of output facts. The integration with data science tools are realized in terms of data bindings primitives and functions. [3]
- Data binding primitives: bindings allows to connect to external data sources or systems like a database, a framework or a library. This include connection with relational database as Postgres or MySQL, and graph databases, such as neo4j. it is also possible to integrate machine learning libraries (Weka, scikit-learn, Tensorflow) and a web data extraction tool, OXPath, which allows to retrieve data directly from the web. This is possible through the so-called annotations: these are special facts that augment the sets of existential rules with specific behaviors. The @input annotation tells the system that the facts for an atom of the program are imported from an external data source, for example a relational database. The @output annotation specifies that the facts for an atom of the program will be exported to an external target, for example the standard output or a relational database. Other annotations like
@bind
,@mapping
,@qbind
allow to customize the data sources for the@input
annotation or the targets for the@output
annotation. - Functions: custom expressions which make use of arithmetic operators, string manipulation or comparison can also be supported. Libraries written in Python are also enabled and they can be integrated with the dedicated annotation
@library
The Jupyter support includes the possibility to dispose of tools as syntax highlighting, code analysis (checking whether the code is correct or there are errors) and explanations of results (how the result has been obtained): all these functionalities are embedded in the notebook and help in writing and analyzing Vadalog codes.
See also
References
- ^ a b c Bellomarini, Luigi; Fayzrakhmanov, Ruslan R.; Gottlob, Georg; Kravchenko, Andrey; Laurenza, Eleonora; Nenov, Yavor; Reissfelder, Stéphane; Sallinger, Emanuel; Sherkhonov, Evgeny; Vahdati, Sahar; Wu, Lianlong (2022-04-01). "Data science with Vadalog: Knowledge Graphs with machine learning and reasoning in practice". Future Generation Computer Systems. 129: 407–422. doi:10.1016/j.future.2021.10.021. ISSN 0167-739X.
- ^ Bellomarini, Luigi; Sallinger, Emanuel; Gottlob, Georg (2018-05-01). "The Vadalog system: datalog-based reasoning for knowledge graphs". Proceedings of the VLDB Endowment. 11 (9): 975–987. doi:10.14778/3213880.3213888. ISSN 2150-8097.
- ^ a b c d e f Bellomarini, Luigi; Fayzrakhmanov, Ruslan R.; Gottlob, Georg; Kravchenko, Andrey; Laurenza, Eleonora; Nenov, Yavor; Reissfelder, Stephane; Sallinger, Emanuel; Sherkhonov, Evgeny; Wu, Lianlong (2018-07-23). "Data Science with Vadalog: Bridging Machine Learning and Reasoning". arXiv:1807.08712 [cs]. doi:10.48550/arxiv.1807.08712.
- ^ Bellomarini, Luigi; Gottlob, Georg; Pieris, Andreas; Sallinger, Emanuel (2018). Benzmüller, Christoph; Ricca, Francesco; Parent, Xavier; Roman, Dumitru (eds.). "Vadalog: A Language and System for Knowledge Graphs". Rules and Reasoning. Cham: Springer International Publishing: 3–8. doi:10.1007/978-3-319-99906-7_1. ISBN 978-3-319-99906-7.
- ^ a b c Bellomarini, Luigi; Gottlob, Georg; Pieris, Andreas; Sallinger, Emanuel (2018). Tjoa, A Min; Bellatreche, Ladjel; Biffl, Stefan; van Leeuwen, Jan; Wiedermann, Jiří (eds.). "Swift Logic for Big Data and Knowledge Graphs". SOFSEM 2018: Theory and Practice of Computer Science. Cham: Springer International Publishing: 3–16. doi:10.1007/978-3-319-73117-9_1. ISBN 978-3-319-73117-9.
- ^ Bellomarini, Luigi; Fakhoury, Daniele; Gottlob, Georg; Sallinger, Emanuel (2019). "Knowledge Graphs and Enterprise AI: The Promise of an Enabling Technology". 2019 IEEE 35th International Conference on Data Engineering (ICDE). Macao, Macao: IEEE: 26–37. doi:10.1109/ICDE.2019.00011. ISBN 978-1-5386-7474-1.
- ^ Dantsin, Evgeny; Eiter, Thomas; Gottlob, Georg; Voronkov, Andrei (2001-09-01). "Complexity and expressive power of logic programming". ACM Computing Surveys. 33 (3): 374–425. doi:10.1145/502807.502810. ISSN 0360-0300.
- ^ Konstantinou, Nikolaos; Koehler, Martin; Abel, Edward; Civili, Cristina; Neumayr, Bernd; Sallinger, Emanuel; Fernandes, Alvaro A.A.; Gottlob, Georg; Keane, John A.; Libkin, Leonid; Paton, Norman W. (2017-05-09). "The VADA Architecture for Cost-Effective Data Wrangling". Proceedings of the 2017 ACM International Conference on Management of Data. SIGMOD '17. New York, NY, USA: Association for Computing Machinery: 1599–1602. doi:10.1145/3035918.3058730. ISBN 978-1-4503-4197-4.
- ^ Cali`, Andrea; Gottlob, Georg; Pieris, Andreas (2012-12-01). "Towards more expressive ontology languages: The query answering problem". Artificial Intelligence. 193: 87–128. doi:10.1016/j.artint.2012.08.002. ISSN 0004-3702.
- ^ Connolly, Thomas M.; Begg, Carolyn E. (2014). Database Systems – A Practical Approach to Design Implementation and Management (6th ed.) (6th ed.). Pearson. ISBN 978-1292061184.
- ^ Calì, Andrea; Gottlob, Georg; Lukasiewicz, Thomas (2012-07-01). "A general Datalog-based framework for tractable query answering over ontologies". Journal of Web Semantics. Special Issue on Dealing with the Messiness of the Web of Data. 14: 57–83. doi:10.1016/j.websem.2012.03.001. ISSN 1570-8268.
- ^ Abiteboul, Serge; Hull, Richard; Vianu, Victor (1995). Foundations of Databases: The Logical Level (1st ed.). USA: Addison-Wesley Longman Publishing Co., Inc. ISBN 978-0-201-53771-0.
- ^ Calì, Andrea; Gottlob, Georg; Lukasiewicz, Thomas; Marnette, Bruno; Pieris, Andreas (2010). "Datalog+/-: A Family of Logical Knowledge Representation and Query Languages for New Applications". 25th Annual IEEE Symposium on Logic in Computer Science: 228–242. doi:10.1109/LICS.2010.27.
- ^ Bellomarini, Luigi; Benedetto, Davide; Gottlob, Georg; Sallinger, Emanuel (2022-03-01). "Vadalog: A modern architecture for automated reasoning with large knowledge graphs". Information Systems. 105: 101528. doi:10.1016/j.is.2020.101528. ISSN 0306-4379.
- ^ Shkapsky, Alexander; Yang, Mohan; Zaniolo, Carlo (2015). "Optimizing recursive queries with monotonic aggregates in DeALS". 2015 IEEE 31st International Conference on Data Engineering: 867–878. doi:10.1109/ICDE.2015.7113340.
- ^ Bellomarini, Luigi; Laurenza, Eleonora; Sallinger, Emanuel; Sherkhonov, Evgeny (2020). Gutiérrez-Basulto, Víctor; Kliegr, Tomáš; Soylu, Ahmet; Giese, Martin; Roman, Dumitru (eds.). "Reasoning Under Uncertainty in Knowledge Graphs". Rules and Reasoning. Cham: Springer International Publishing: 131–139. doi:10.1007/978-3-030-57977-7_9. ISBN 978-3-030-57977-7.
- ^ Gottlob, G.; Pieris, Andreas (2015). "Beyond SPARQL under OWL 2 QL Entailment Regime: Rules to the Rescue". IJCAI.