8  Querying Data through Ontologies

 8.1  Introduction
 8.2  Querying RDF data: notation and semantics
 8.3  Querying through RDFS ontologies
 8.4  Answering queries through DL-LITE ontologies
  8.4.1  DL-LITE
  8.4.2  Consistency checking
  8.4.3  Answer set evaluation
  8.4.4  Impact of combining DL-LITER and DL-LITEF on query answering
 8.5  Further reading
 8.6  Exercises

8.1  Introduction

As we saw in the previous chapter, ontologies form the backbone of the Semantic Web by providing a conceptual view of data and services available worldwide via the Web. We discussed the RDF language for describing knowledge, and a family of languages, called description logics, that provide formal foundations for ontologies and in particular for the OWL ontology language recommended by the W3C.

In this chapter, the focus is on querying RDF data. Since massive volumes of RDF sources are more and more present on the Web, specifying queries for RDF data that can be evaluated efficiently is becoming every day more and more important.

We will see that the set of query answers strongly depends on the semantic context. We will study query answering when the ontology is specified in RDFS and then when the ontology is specified in DL-LITE. The ontology language DL-LITE belongs to the DL family. It has been designed as a trade-off between the ability to describe a wide range of domains of interest and query evaluation efficiency. More precisely, we will focus on two important fragments of DL-LITE, namely DL-LITER and DL-LITEF .

We will observe that the problem of answering queries through ontologies is quite different from that of answering database queries for the following two essential reasons:

Implicit facts.
In a DBMS setting, all the facts are explicit. For instance, the constraint “Every PhD student is a student” enforces that, before inserting a value v in the PhDStudent table, this value is also inserted in the Student table (if not already there). In an ontology context, someone may be a student not explicitly but because of the constraint used as an inference rule. In addition, the implicit facts may be incompletely known, coming from constraints such as “A professor teaches at least one master course”. From a fact such as Professor(dupond), one can infer the two facts Teaches(dupond,x) and MasterCourse(x) for some unknown value x. Such partially known implicit facts may however be useful for answering queries such as “Give me all the persons who teach a master course”.
Inconsistency.
In a DBMS setting, the constraint “each course must have a single responsible” is also viewed as a law that cannot be violated. An update of the corresponding table that would violate this law would simply be rejected. In an ontology context, such local verifications are not sufficient for checking data inconsistency. Because of data incompleteness, this may require intricate reasoning on different constraints and data distributed over different tables. For instance, if in addition of the previous key constraint, it is declared that “only professors can be responsible of courses in which they must teach”, “a master course is taught by a single teacher”, and “lecturers are not professors”, the presence of the three following facts in different tables of the database makes it inconsistent: Lecturer(jim), TeachesIn(jim,ue431) and MasterCourse(ue431). The reason is that because of the constraint “only professors can be responsible of courses in which they must teach”, we can infer that the course ue431 must have a responsible x who is unknown but for whom we have a partial information: s/he is a professor and s/he teaches in the course ue431. Without knowing x, the fact that she is a professor is sufficient to infer that xjim (since jim is a lecturer and thus not a professor). Therefore, the course ue431 is taught by two distinct teachers, which is forbidden for a master course.

From this, it should be clear that query answering through ontologies is more complicated that in classical databases. We have to reason to find which inferences may participate in answering a given query. We also have to reason to verify the consistency of our knowledge.

8.2  Querying RDF data: notation and semantics

In this section, we set the stage for querying RDF data. We also discuss the impact of ontologies (knowledge on the domain of interest) on the answers. To simplify, we ignore here blank nodes.

Figure 8.1 is an enhanced version of the University example that we will use throughout this chapter. The first column provides RDF data in the triple syntax, while the second column shows the corresponding facts in FOL.


Subject Predicate Object FOL semantics




dupond Leads infoDept Leads(dupond,infoDept)
dupond rdf:type Professor Professor(dupond)
durand ResponsibleOf ue111 ResponsibleOf(durand,ue111)
durand Leads csDept Leads(durand,csDept)
paul TeachesTo pierre TeachesTo(paul,pierre)
paul rdf:type PhdStudent PhDStudent(paul)
paul EnrolledIn infoDept EnrolledIn(paul, infodept)
pierre EnrolledIn infoDept EnrolledIn(pierre, infodept)
pierre rdf:type Undergrad Undergrad(pierre)
pierre RegisteredTo ue111 Registered(pierre, ue111)
ue111 OfferedBy infoDept OfferedBy(ue111,infoDept)
ue111 rdf:type CSCourse CSCourse(ue111)
jim EnrolledIn csDept EnrolledIn(jim, csDept)
csDept rdf:type TeachingDept TeachingDept(csDept)





Figure 8.1: RDF triple syntax and its FOL semantics


RDF triples can be asserted in a very flexible way and almost without constraints. The association of some ontology is not a requirement. Users can update a collection of RDF statements freely by just adding/removing triples. The only reserved word in the RDF vocabulary is rdf:type that is used to relate constant names to types, i.e., classes in domain of interest or unary predicates in FOL world.

Let us now consider querying a set of RDF facts, for which the query language SPARQL has been proposed. We briefly consider it next. query language. We briefly consider it next.

SPARQL (pronounced “sparkle”) is a recursive acronym standing for SPARQL Protocol And RDF Query Language. It is a W3C recommendation as of 2008. Although it does borrow some features from XQuery (functions and operators), it is based on the graph model underlying RDF data.

For instance, the following query expresses in SPARQL the search of all the individuals who are enrolled in a department led by a Professor.

select x where x EnrolledIn y, z Leads y, z rdf:type Professor

We used here an SQL-like syntax. There exists competing syntaxes for expressing SPARQL queries. The corresponding query in FOL notation is:

q(x)  : - yz[EnrolledIn(x,y) Leads(z,y) Professor(z)]

This is a conjunctive query, i.e., a FOL formula without negation or disjunction, of the form

q(x1,...,xm ) :- ∃y1,...,yn[R1(u1) ∧ ...∧ Rp (up)]
where each ui is a vector of variables in {x1,...,xm,y1,...,yn} or constants, and each variable xi appears in the body of the query (i.e., for each x∈{x1,...,xm}, there exists ui such that xui).

In the remainder of this chapter, we use conjunctive queries as the query language for RDF. From the example, it should be clear that all we say is relevant to SPARQL. We use a (standard) simplified notation for conjunctive queries. We omit the existential quantifiers and denote the connector by a comma. Observe that this does not introduce any ambiguity. In particular, all variables not occurring in the “head of the query” (i.e., in q(x1,...,xm)) are understood as existentially quantified. The variables in x1,...,xm are said to be distinguished.

In this simplified form, the example SPARQL query becomes:

q(x) :- EnrolledIn(x,y), Leads(z,y), Professor(z)

Now consider a query in the general form:

q(x1,...,xm) :- R 1(u1),...,Rp(up)
with existential variables y1,...,yn. Following the standard FOL semantics, the evaluation of the query consists in finding valuations ν of the variables for which the closed fact Ri(ν(ui)) “holds” for each i. The corresponding answer is then q(ν(x1),...,ν(xm)). Equally, we may say that (ν(x1),...,ν(xm)) is an answer for the query q. When the query is unary (i.e., it has a single distinguished variable), we either say “q(a) is an answer” or “a is an answer for q”.

An essential issue is in the meaning of “holds” in the previous informal definition.

Inference with data only. In the simplest case, a fact Ri(ν(ui)) holds if it is a known fact (explicitly stated in the data store). For instance, consider the previous conjunctive query in the University example. The evaluation of the query q(x) against the facts of Figure 8.1 returns {paul, pierre} as its answer set. To see why paul is an answer, we just check that, by mapping the distinguished variable to the constant paul, and the existential variables y and z respectively to the constants infoDept and dupond, all the conjuncts in the query definition are satisfied by facts in the database. The same holds if the distinguished variable x is instantiated with the constant pierre.

More inference using an ontology. Now, let us assume that a fact Ri(ν(ui)) holds if it is a known fact or if it is a consequence of the known facts by taking the ontological statements into account. Suppose now that we also have the knowledge that someone responsible for a class has to be a professor, that is in DL syntax:

∃ResponsibleOf ⊑ Professor.
Additional answers can then be inferred. For instance, for the query q(x), the additional answer jim is obtained. It would come from the presence in the data of the facts EnrolledIn(jim, csDept), Leads(durand, csDept), and from the fact Professor(durand), which, without being explicitly stated in the data, is logically entailed from the fact ResponsibleOf(durand, ue111) and ResponsibleOf Professor.

To see why, we just have to consider the FOL semantics of this DL statement:

∀x∀y [ResponsibleOf (x,y)⇒  Professor(x)].

This logical implication indeed allows inferring the fact Professor(durand) from the fact ResponsibleOf(durand, ue111).

More subtly, we can get answers from partially instantiated facts that can be logically entailed by the knowledge base. Suppose that we know that a professor teaches at least one course, that is in DL syntax:

Professor⊑ ∃TeachesIn .
and consider the query q(x) : -TeachesIn(x,y)

From the explicit ground fact Professor(durand) and the contraint Professor ⊑∃TeachesIn, we know that TeachesIn(durand,v) holds for some unknown value v. The valuation of v may vary in the different “worlds” satisfying the constraint. This is however sufficient to infer that answer q(durand) is true in all these possible worlds.

Formal definition of answer set. Recall that φψ (i.e., φimplies ψ) if each interpretation making φ true also makes ψ true, or equivalently, every model of φ is a model of ψ. We next provide a formal definition of the answer set of a query for a DL knowledge base, that captures the general setting where data (the Abox A) is associated to an ontology (the Tbox T ) to form a DL knowledge base K = TA. A query to the knowledge base is a conjunctive query using class or property predicates from the given knowledge base with the proper arity. (Class predicates have arity one and property predicates arity 2.) A valuation ν of a set of variables {z1,...,zp} is a substitution (denoted {z1/a1,...,zp/ap}) that assigns each variable zi to a constant ai (two distinct variables may be assigned to a same constant). Given two valuations ν and νof two disjoint sets of variables {z1,...,zp} and {v1,...,vk} , νν denotes the valuation assigning the variables zi to the corresponding constants in ν, and the variables vj to the corresponding constants in ν.

We can now formally define the notion of answers.

Definition 8.2.1 Let q(x1,...,xm)  : - R1(u1),...,Rp(up) be a query to a knowledge base K. An answer is a ground fact q(ν(x1),...,ν(xm)) for some valuation ν of the distinguished variables such that in every model of Kthere exists a valuation νof the existential variables for which Ri(νν(ui)) is true for each i. The answer set of qfor Kis the set of all such answers. It is denoted q(K).

Consider again the previous University query example. We have seen that its answer set varies depending on the knowledge base against which it is evaluated. In particular, if A is the set of facts of Figure 8.1 and T is {∃ResponsibleOf Professor}, we have:

Boolean queries. To conclude this section, we consider a particular interesting case of queries, that of Boolean queries. The arity of a query is the number of its distinguished variables. A query of arity 0, i.e., a query of the form q()  :  ..., is called a Boolean query. Note that there is a single possible answer, namely q(). In this case, we see that as a positive answer to the query, i.e., as true. If the answer set is empty, we see that as false. To see an example, consider the query

q’() :- Student(x), TeachesTo(x,y)

This Boolean query asks whether there exists a student teaching to other students. Suppose T= {PhDStudent Student}. Then we can distinguish two cases:

In the second case, the fact Student(paul), although not in the Abox A, can be inferred from the fact PhDStudent(paul) in A and the inclusion statement PhDStudent Student in T. Together with the fact TeachesTo(paul,pierre) present in the Abox, it makes the body of the query q satisfied.

8.3  Querying through RDFS ontologies

In this section, we consider RDF data without blank nodes (that can be seen as an Abox), associated to an RDFS ontology (that can be seen as a very simple Tbox). RDF data and RDFS statements can be denoted and stored as triples. However, the important point is that RDFS statements have a logical semantics which can be operationalized as a set of inference rules (see Section 7.3 in Chapter 7). We illustrate here how this can be used to answer queries.

Figure 8.2 is an example of an RDFS ontology that can be associated to the RDF data in Figure 8.1. The RDFS statements composing the ontology are given in three notations: the triple notation, the DL notation, and the corresponding FOL notation.


RDFS notation DL notation FOL notation



AcademicStaff rdfs:subClassOfStaff AcademicStaff Staff AcademicStaff (X) Staff (X)
Professor rdfs:subClassOfAcademicStaff Professor AcademicStaff Professor(X) AcademicStaff (X)
Lecturer rdfs:subClassOfAcademicStaff Lecturer AcademicStaff Lecturer(X) AcademicStaff (X)
PhDStudent rdfs:subClassOfLecturer PhDStudent Lecturer PhDStudent(X) Lecturer(X)
PhDStudent rdfs:subClassOfStudent PhDStudent Student PhDStudent(X) Student(X)
TeachesIn rdfs:domainAcademicStaff TeachesIn AcademicStaff TeachesIn(X,Y) AcademicStaff (X)
TeachesIn rdfs:rangeCourse TeachesIn-Course TeachesIn(X,Y) Course(Y)
ResponsibleOf rdfs:domainProfessor ResponsibleOf Professor ResponsibleOf (X,Y) Professor(X)
ResponsibleOf rdfs:rangeCourse ResponsibleOf -Course ResponsibleOf (X,Y) Course(Y)
TeachesTo rdfs:domainAcademicStaff TeachesTo AcademicStaff TeachesTo(X,Y) AcademicStaff (X)
TeachesTo rdfs:rangeStudent TeachesTo-Student TeachesTo(X,Y) Student(Y)
Leads rdfs:domainAdminStaff Leads AdminStaff Leads(X,Y) AdminStaff (X)
Leads rdfs:rangeDept Leads-Dept Leads(X,Y) Dept(Y)
RegisteredIn rdfs:domainStudent RegisteredIn Student RegisteredIn(X,Y) Student(X)
RegisteredIn rdfs:rangeCourse RegisteredIn-Course RegisteredIn(X,Y) Course(Y)
ResponsibleOf rdfs:subPropertyOfTeachesIn ResponsibleOf TeachesIn ResponsibleOf (X,Y) TeachesIn(X,Y)




Figure 8.2: An RDFS ontology expressed in different notations


As already said, these RDFS statements can be used to infer new triples (i.e., new facts) from the RDF database. For example, the RDF triple durandResponsibleOf ue111in Figure 8.1 corresponds to the fact ResponsibleOf (durand,ue111), and the RDFS statement ResponsibleOf rdfs:domainProfessorcorresponds to the logical rule: ResponsibleOf (X,Y) Professor(X). The condition of this rule can be mapped with the fact ResponsibleOf (durand,ue111) by the substitution {X/durand, Y/ue111}, and thus the corresponding instantiation of the conclusion Professor(durand) can be inferred. This new fact can in turn trigger a rule such as Professor(X) AcademicStaff (X), thereby allowing the inference of additional facts such as AcademicStaff (durand).

More generally, RDFS statements correspond to rules that can be applied in a forward-chaining manner to the initial set of facts until saturation, i.e., until no more fact can be inferred. It is important to see that the variables in the head of rule all occur in the body. In other words, no variable is quantified existentially. So rules always infer new ground facts. Such rules are said to be safe. We will use unsafe rules when we consider DL-LITE, which will render query processing more complicated.

The simple forward-chaining Algorithm 1 starts with the set of initial facts and repeats inference steps until saturation.


Saturation(A,T )

Input: An Abox A and an RDFS Tbox T

Output: The set of facts that are inferred: Δ0

(1)F A

(2)Δ0 A

(3)repeat Δ1 ←∅

(4)     foreach rule condition conclusion in T ,

(5)          if there exists a substitution σ such that σ.condition Δ0

(6)          and σ.conclusionF

(7)               add σ.conclusion to Δ1

(8)     F F Δ1

(9)     Δ0 Δ1

(10)until Δ1 =

Algorithm 1: The Saturation algorithm

Figure 8.3 shows the facts resulting from the application of Algorithm 1 to the facts of Figure 8.1 and the rules of Figure 8.2.


Asserted facts Inferred facts


Leads(dupond,infoDept) AdminStaff(dupond)
Professor(dupond) Dept(infoDept)
ResponsibleOf(durand,ue111) AcademicStaff(dupond)
Leads(durand,csDept) Professor(durand)
TeachesTo(paul,pierre) Course(ue111)
PhDStudent(paul) AcademicStaff(durand)
EnrolledIn(paul, infodept) AdminStaff(durand)
EnrolledIn(pierre, infodept) Dept(csDept)
Undergrad(pierre) AcademicStaff(paul)
Registered(pierre, ue111) Student(pierre)
OfferedBy(ue111,infoDept) Student(paul)
CSCourse(ue111) Student(pierre)
EnrolledIn(jim, csDept) Lecturer(paul)
TeachingDept(csDept) AcademicStaff(paul)
Staff(paul)
Staff(dupond)
Staff(durand)



Figure 8.3: Inferred facts from RDF facts and an associated RDFS ontology


To answer queries from RDF facts associated to an RDFS ontology, one can proceed as follows. First one compute all the inferred facts (in a bottom-up manner) with the previous algorithm. Each step of the loop can be computed, for instance, using a standard relational query engine. This yields a new database consisting of the set of all the facts (asserted or inferred). Then one can evaluate the query directly on that database using a standard relational query engine.

For example, the standard evaluation against the set of (asserted + inferred) facts in Figure 8.3 of the query

q(x) :- E nrolled(x ,y),Leads(z,y ),Professor(z)
(searching for individuals enrolled in a department led by a Professor) returns {paul,pierre,jim} as its answer set. If we evaluate the same query against the set of asserted facts only, we do not find the answer jim.

Complexity analysis. It is interesting to estimate both the maximum number of inferred triples and the worst-case time complexity for inferring them. Of course, this depends on the number of asserted triples (i.e., the size of the data) and also on the number of axioms in the ontology (i.e., the size of the ontology). Let M be the number of facts in the Abox and N the number of axioms in the Tbox. From the presence of some initial fact C(a), one can derive a number of new facts C(a) for some class C. Note that the number of such C(a) is bounded by the number of axioms in the ontology, i.e., it is less than N. Now consider some initial fact R(a,b). From it, one can derive some facts R(a,b) or R(b,a) as well as some facts C(a) or C(b) for some Rand C. Again, one can observe that for a particular R(a,b), the number of new facts one can derive is bounded by the number of axioms in the ontology, i.e., it is less than N. Since the number of initial facts is M, the number of facts one can derive is bounded by M×N. Observe in particular that it is linear in the number of database facts.

Now consider the worst-case time complexity for inferring them by the Algorithm 1. We have to perform at most M×N iterations. Each iteration can be performed in polynomial time. So the algorithm is in PTIME. One can show more precisely that it is in 0((M×N)2).

8.4  Answering queries through DL-LITE ontologies

In this section, we consider two important fragments of the DL-LITE ontology language of the DL family. As we will see, querying is feasible for these two languages even though they provide a quite rich framework for describing semantics. We study querying through ontologies expressed in these two fragments.

8.4.1  DL-LITE

A DL-LITE ontology may contain axioms corresponding (up to the syntax) to those allowed in an RDFS ontology. Besides, it may contain other axioms, of three kinds: positive inclusions (PI for short), negative inclusions (NI) and key constraints (Key). Figure 8.4 shows examples of these three kinds of DL-LITE axioms with their corresponding FOL semantics. These constraints are not expressible in RDFS.


DL notation Corresponding logical rule



PI Professor ⊑∃TeachesIn Professor(X) ⇒∃YTeachesIn(X,Y)
Course ⊑∃RegisteredIn- Course(X) ⇒∃YRegisteredIn(Y,X)



NI Student ⊑¬Staff Student(X) ⇒¬Staff (X)



Key (functResponsibleOf -) ResponsibleOf (Y,X) ResponsibleOf (Z,X) Y = Z




Figure 8.4: Examples of DL-LITE axioms not expressible in RDFS


We next consider in turn these new kinds of axioms.

Positive inclusion and incompleteness. A positive inclusion axiom is an expression of one of the following forms:

DL notation Corresponding logical rule


B ⊑∃P B(X) ⇒∃YP(X,Y)


Q ⊑∃P Q(X,Y) ⇒∃ZP(X,Z)


B ⊑∃P- B(X) ⇒∃YP(Y,X)
Q ⊑∃P- Q(X,Y) ⇒∃ZP(Z,X)


P Q- or  P-Q P(X,Y) Q(Y,X)


where P and Q denote properties and B denotes a class. Recall that P- denotes the inverse of P, i.e., P-(x,y) iff P(y,x) for all x,y.

Observe that expressions of the form P B belong to DL-LITE since they already are in RDFS. Expressions of the form P Q (so equivalently P-Q-) also belong to DL-LITE for the same reason.

It is important to note that the logical rules corresponding to PI axioms expressible in DL-LITE are not necessarily safe (as opposed to RDFS that uses only safe rules.) Consider the rule

∀X (Professor(X) ⇒ ∃ Y(TeachesIn(X ,Y )))
The variable Y is existentially quantified. As already mentioned, the main issue is that, as a consequence, such an axiom does not produce new facts (i.e., ground atoms) from initial facts, but only an incomplete information in the form of atoms that may be partially instantiated. For example, from the fact Professor(durand), the previous axiom permits to infer that there exists some course(s) y that durand teaches. In other words, we know that there exists some fact of the form TeachesIn(durand,y) that is true but we do not know the value of y. This makes it difficult to apply the bottom-up approach described in Section 8.3. Such an approach is not appropriate for answering queries through DL-LITE ontologies.

Negative inclusion and inconsistencies. A negative inclusion axioms is an expression that takes one of the forms:

DL notation

B 1 ⊑¬B2

R1 ⊑¬R2

where

The corresponding logic rules are left as an exercise. An example of NI (expressing the constraint “Students do not teach courses”) and the corresponding logical rule are as follows:

DL notation Corresponding logical rule


Student ⊑¬∃TeachesIn Student(X) ⇒¬∃YTeachesIn(X,Y)
or equivalently, YTeachesIn(X,Y) ⇒¬Student(X)


NIs express disjointness constraints between classes or between properties, and thus introduce negation in the language. Therefore, the knowledge base against which the queries have to be evaluated may be inconsistent, i.e., a model of the corresponding theory may not exist. Note that this is not possible with RDFS ontologies: we showed an algorithm that computed a model (indeed the smallest model).

For example, adding the NI Student ⊑¬Staff to the ontology of Figure 8.2 leads to the inconsistency of the knowledge base made of the facts in Figure 8.1 and of the axioms in the ontology of Figure 8.2 enriched with that NI. The reason is that from the fact PhDStudent(paul), and the inclusion axiom PhdStudent Student, we can infer the fact Student(paul), and in turn the literal ¬Staff (paul) from the NI Student ⊑¬Staff . On the other hand, the fact Staff (paul) can be inferred from the fact PhDStudent(paul) and the inclusion axioms PhdStudent Lecturer, Lecturer AcademicStaff and AcademicStaff Staff .

Key constraints and more inconsistencies. Key constraints are expressed by functionality axioms of the form (functP) or (functP-) where P is a property and P- denotes the inverse property of P. Figure 8.5 shows their logical semantics in the form of logical rules.


DL notation corresponding logical rule


(functP) P(X,Y) P(X,Z) Y = Z
(functP-) P(Y,X) P(Z,X) Y = Z



Figure 8.5: Functionality axioms expressible in DL-LITE and not in RDFS

Observe that key constraints may also lead to inconsistencies. This is the case if we attempt to equate two distinct constants, e.g., durand and dupond. For instance, the axiom (functResponsibleOf -) expresses that a course must have a unique professor responsible for it. Therefore, a knowledge base containing this axiom and the two facts:

ResponsibleOf (durand,ue111) and ResponsibleOf (dupond,ue111),

would be inconsistent. This is because we assume implicitly that an individual is denoted by a single constant. This natural (in practice) assumption is called in logic the unique name assumption.

We will consider the following two fragments of DL-LITE:

DL-LITER is obtained by extending the axioms of RDFS with the PI and NI axioms.
DL-LITEF is obtained by extending the axioms of RDFS with key constraints, the PI and NI axioms, but excluding inclusion between properties. Note that, since DL-LITEF does not permit to express inclusion between properties, RDFS is not included in DL-LITEF .

One may wonder why one would choose such a convoluted language. Why not simply extend RDFS with the 3 kinds of axioms? This is because functional constraints interact with inclusion constraints in intricate ways. Query evaluation when they are all present is much more complex. This will be illustrated by an example in Section 8.4.4.

From the previous discussion, there are two fundamental differences between query answering in the context of RDFS and of DL-LITE knowledge bases:

Inconsistency.
RDFS does not permit expressing any form of negation, so an RDFS knowledge base is always consistent. On the other hand, a DL-LITE knowledge base may be inconsistent. Thus, answering queries through DL-LITE ontologies requires to make sure that the data is consistent with respect to the constraints expressed in the ontology.
Incompleteness.
The rules corresponding to RDFS axioms are safe thereby allowing the simple bottom-up algorithm we described. On the other hand, axioms in DL-LITE may correspond to unsafe rules. Thus a bottom-up approach may infer atoms that are not ground, i.e., some incomplete facts. Therefore, we will have to use a top-down approach for evaluating the queries that is more appropriate than the bottom-up approach.

In Section 8.4.2, we show an algorithm for checking consistency of a DL-LITE knowledge base, and in Section 8.4.3 an algorithm for answering conjunctive queries posed to a DL-LITE knowledge base. The particularity of these two algorithms is that they work in two-steps:

  1. in a first step, we reason with the Tbox alone (i.e., the ontology without the data) and some conjunctive queries;
  2. in the second step, we evaluate these conjunctive queries against the data in the Abox.

Separating ontology reasoning from data processing is typically a desired feature (when possible). In particular, such an approach has the practical interest that it makes it possible to use an SQL engine for the second step, thus taking advantage of well-established query optimization strategies supported by standard relational data management systems. In the first step, we deal with the Tbox only, typically of much smaller size.

In Section 8.4.4, we show by an example that DL-LITER and DL-LITEF are two maximal fragments of the DL-Lite family for which reformulating queries into SQL is possible: combining constraints expressible in DL-LITER and DL-LITEF may result in an infinite number of non redundant SQL reformulations for some queries.

8.4.2  Consistency checking

Towards consistency checking, the first step uses the Tbox alone. It consists in computing the deductive closure of the Tbox, i.e., all the inclusion axioms that are logically entailed by the axioms declared in the Tbox. More precisely, the deductive closure (closure for short) of a DL-LITE Tbox is defined as follows.

Definition 8.4.1 (Closure of a Tbox) Let T be a DL-LITEF or a DL-LITER Tbox. The closure of T, denoted by cl(T ), is inductively defined as follows:

  1. All the statements in T are also in cl(T ).
  2. If B1 B2 and B2 B3 are in cl(T ), then B1 B3 is in cl(T ).
  3. If R1 R2 and R2 B are in cl(T ), then R1 B is in cl(T ).
  4. If R1 R2 and R2-B are in cl(T ), then R1-B is in cl(T ).
  5. If R1 R2 and R2 R3 are in cl(T ), then R1 R3 is in cl(T ).
  6. If R1 R2 is in cl(T ), then R1-R2-is in cl(T ).
  7. If B1 B2 and B2 ⊑¬B3 (or B3 ⊑¬B2) are in cl(T ), then B1 ⊑¬B3 is in cl(T ).
  8. If R1 R2 and R2 ⊑¬B (or B⊑¬∃R2) are in cl(T ), then R1 ⊑¬B is in cl(T ).
  9. If R1 R2 and R2-⊑¬B (or B⊑¬∃R2-) are in cl(T ), then R1-⊑¬B is in cl(T ).
  10. If R1 R2 and R2 ⊑¬R3 (or R3 ⊑¬R2) are in cl(T ), then R1 ⊑¬R3 is in cl(T ).
  11. If R1 ⊑¬R2 or R2 ⊑¬R1 is in cl(T ), then R1-⊑¬R2-is in cl(T ).
    1. In the case in which T is a DL-LITEF Tbox, if one of the statements R ⊑¬∃R or R-⊑¬∃R-is in cl(T ), then both such statements are in cl(T ).
    2. In the case in which T is a DL-LITER Tbox, if one of the statements R ⊑¬∃R, R-⊑¬∃R-, or R⊑¬Ris in cl(T ), then all three such statements are in cl(T ).

Observe that although all axioms should be considered to construct this closure, only negative inclusions and key constraints can raise an inconsistency. The set of all the negative inclusions and key constraints in cl(T ) is called the NI-closure. For example, consider the Tbox of Figure 8.6 made of the RDFS ontology shown in Figure 8.2 enriched with the PIs and NI shown in Figure 8.4. The NI-closure of that Tbox is shown in Figure 8.7.


DL notation FOL notation


AcademicStaff Staff AcademicStaff (X) Staff (X)
Professor AcademicStaff Professor(X) AcademicStaff (X)
Lecturer AcademicStaff Lecturer(X) AcademicStaff (X)
PhDStudent Lecturer PhDStudent(X) Lecturer(X)
PhDStudent Student PhDStudent(X) Student(X)
TeachesIn AcademicStaff TeachesIn(X,Y) AcademicStaff (X)
TeachesIn-Course TeachesIn(X,Y) Course(Y)
ResponsibleOf Professor ResponsibleOf (X,Y) Professor(X)
ResponsibleOf -Course ResponsibleOf (X,Y) Course(Y)
TeachesTo AcademicStaff TeachesTo(X,Y) AcademicStaff (X)
TeachesTo-Student TeachesTo(X,Y) Student(Y)
Leads AdminStaff Leads(X,Y) AdminStaff (X)
Leads-Dept Leads(X,Y) Dept(Y)
RegisteredIn Student RegisteredIn(X,Y) Student(X)
RegisteredIn-Course RegisteredIn(X,Y) Course(Y)
ResponsibleOf TeachesIn ResponsibleOf (X,Y) TeachesIn(X,Y)
Professor ⊑∃TeachesIn Professor(X) ⇒∃YTeachesIn(X,Y)
Course ⊑∃RegisteredIn- Course(X) ⇒∃YRegisteredIn(Y,X)


Student ⊑¬Staff Student(X) ⇒¬Staff (X)



Figure 8.6: A DL-LITE Tbox


This example shows that it is possible to infer an important number of new NIs. In fact, we have to compute all the consequences. But as we will see there is at most a polynomial number of consequences.


DL notation FOL notation


Student ⊑¬Staff Student(X) ⇒¬Staff (X)
PhDStudent ⊑¬Staff PhDStudent(X) ⇒¬Staff (X)
TeachesTo-⊑¬Staff TeachesTo(Y,X) ⇒¬Staff (X)
RegisteredIn ⊑¬Staff RegisteredIn(X,Y) ⇒¬Staff (X)
Lecturer ⊑¬Student Lecturer(X) ⇒¬Student(X)
Lecturer ⊑¬PhDStudent Lecturer(X) ⇒¬PhDStudent(X)
Lecturer ⊑¬∃TeachesTo- Lecturer(X) ⇒¬∃Y[TeachesTo(Y,X)]
Lecturer ⊑¬∃RegisteredIn Lecturer(X) ⇒¬∃Y[RegisteredIn(X,Y)]
Professor ⊑¬Student Professor(X) ⇒¬Student(X)
Professor ⊑¬PhDStudent Professor(X) ⇒¬PhDStudent(X)
Professor ⊑¬∃TeachesTo- Professor(X) ⇒¬∃Y[TeachesTo(Y,X)]
Professor ⊑¬∃RegisteredIn Professor(X) ⇒¬∃Y[RegisteredIn(X,Y)]
AcademicStaff ⊑¬Student AcademicStaff (X) ⇒¬Student(X)
AcademicStaff ⊑¬PhDStudent AcademicStaff (X) ⇒¬PhDStudent(X)
AcademicStaff ⊑¬∃TeachesTo- AcademicStaff (X) ⇒¬∃Y[TeachesTo(Y,X)]
AcademicStaff ⊑¬∃RegisteredIn AcademicStaff (X) ⇒¬∃Y[RegisteredIn(X,Y)]
Staff ⊑¬Student Staff (X) ⇒¬Student(X)
Staff ⊑¬PhDStudent Staff (X) ⇒¬PhDStudent(X)
Staff ⊑¬∃TeachesTo- Staff (X) ⇒¬∃Y[TeachesTo(Y,X)]
Staff ⊑¬∃RegisteredIn Staff (X) ⇒¬∃Y[RegisteredIn(X,Y)]
TeachesTo ⊑¬Student TeachesTo(X,Y) ⇒¬Student(X)
TeachesTo ⊑¬PhDStudent TeachesTo(X,Y) ⇒¬PhDStudent(X)
TeachesTo ⊑¬∃TeachesTo- TeachesTo(X,Y) ⇒¬∃Z[TeachesTo(Z,X)]
TeachesTo ⊑¬∃RegisteredIn TeachesTo(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)]
TeachesIn ⊑¬Student TeachesIn(X,Y) ⇒¬Student(X)
TeachesIn ⊑¬PhDStudent TeachesIn(X,Y) ⇒¬PhDStudent(X)
TeachesIn ⊑¬∃TeachesTo- TeachesIn(X,Y) ⇒¬∃Z[TeachesTo(Z,X)]
TeachesIn ⊑¬∃RegisteredIn TeachesIn(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)]



Figure 8.7: The NI-closure of the Tbox in Figure 8.6


We use three propositions for analyzing the consistency problem: one for evaluating the complexity of evaluating the closure and the last two for showing the logical soundness and completeness of this closure. Finally, a fourth proposition will show how to use these results for data consistency checking.

Proposition 8.4.2 (Size of the closure of a Tbox and complexity of its computation) Let T be a DL-LITEF or a DL-LITER Tbox.

  1. The number of statements in cl(T ) is at most polynomial in the size of T.
  2. cl(T ) can be computed in polynomial time in the size of T.

Proof (sketch). (1.) Follows from the form of the statements that are allowed in a DL-LITEF or a DL-LITER Tbox.

For (2.), consider the items (2.) to (12.) in Definition 8.4.1. These are closure rules that are exhaustively applied to the Tbox until saturation. Let T0 = T . For each i, let Δi be the set of statements that can be derived from Ti directly using the closure rules (2.) to (12.). Let Ti+1 = Ti Δi. Clearly, for each i, Ti cl(T ), so its size is polynomial in the size of T .

Now since the size of Ti is polynomial in the size of T , each step of the computation can clearly be performed in PTIME. Since the number of steps is less than the number of statements in cl(T ), the entire computation can be performed in PTIME.

The next proposition states the soundness of the closure.

Proposition 8.4.3 (Soundness of the closure of a Tbox) For each T, Tcl(T ). In other words, for each Abox Asatisfying a Tbox T, Aalso satisfies cl(T ).

Proof (sketch). cl(T )T , since T is included in cl(T ). Clearly, the application of each closure rule is sound. So for each i, TiTi+1. By induction, T = T0Ti for each i. Thus Tcl(T ), so Tcl(T ).

The next proposition establishes the completeness of the closure of a Tbox T : cl(T ) contains all the PIs, NIs and key constraints that are logically entailed by that Tbox (up to equivalence).

Proposition 8.4.4 (Completeness of the closure of a Tbox) Let T be a DL-LITEF or a DL-LITER Tbox. Then

  1. Let XY be a NI or a PI. If TX⊑¬X

    then X⊑¬Xcl(T ),

    otherwise TXY iff XY cl(T ) or ¬Y ⊑¬Xcl(T ).

  2. T(funct R) iff (funct R) cl(T ) or R⊑¬∃Rcl(T ).

Proof (sketch). We build a canonical interpretation I of the classes and properties appearing in T as follows: for each X such that X⊑¬Xcl(T ), I(X) = ; for the other classes or properties, we associate a constant (respectively a pair of constants) with each class (respectively each property) and we initialize their interpretations with those (pairs of) constants. Then, we complete these interpretations by applying the positive inclusions in cl(T ) as logical rules in a bottom up manner until saturation. For instance, if A⊑∃P is in cl(T ), from the initial state where a is I(A), and (p1,p2) is in I(P), we add a new constant p3 in the domain of interpretation and the pair (a,p3) in I(P). Now, if P ⊑∃Q is also in cl(T ), we add two new constants p4 and p5 in the domain and the pairs (a,p4) and (p1,p5) to I(Q).

Clearly, by construction, I is a model of each PI in T . It is also a model of each NI X⊑¬Y in T . Suppose that this not the case: there would exist a constant x which is in I(X) and in I(Y). By construction of I, it would exist a chain of positive inclusions in cl(T ) between X and Y and thus X Y would be in cl(T ), and therefore X ⊑¬X would be in cl(T ) too, and in this case I(X) would be empty, which contradicts that I is not a model of X⊑¬Y.

To prove (1), if TX ⊑¬X, in every model of T , X must be empty, in particular in I. By construction of I, this means that X⊑¬Xcl(T ). Otherwise, consider a PI XY such that TX Y. Since I is a model of T , I(X) I(Y). By construction of I, this means that there exists a chain of positive inclusions in cl(T ) between X and Y and thus X Y is in cl(T ).

Consider now a NI X⊑¬Y such that neither X⊑¬Y nor Y ⊑¬X belong to cl(T ). Let us define the interpretation J such that

In particular J(X) = D (since X⊑¬X is not in cl(T )), and J(Y) = D (since neither X⊑¬Y nor Y ⊑¬X belong to cl(T )). Clearly, J is a model of T , but it is not a model of X⊑¬Y. Therefore, TX⊑¬Y. This ends the proof of (1).

For proving (the contraposite of) (2), we adapt the above canonical interpretation I by initializing with {(p,q),(p,r)} the interpretation of all the properties R such that neither (functR) nor R⊑¬∃R belong to cl(T ). And we show that the resulting interpretation Iis a model of T in which the constraints of functionality of such R is not satisfied.

Finally, the last proposition establishes that checking consistency can be reduced to check whether the data in A satisfy every NI in the closure.

Proposition 8.4.5 (Consistency checking using NI-closure) Let T be a DL-LITEF or a DL-LITER Tbox. Let Abe an Abox associated to T. T ,Ais unsatisfiable iff there exists a NI or a key constraint in the closure of T which is violated by some facts of A.

Proof (sketch). For every constant a appearing in the Abox A, we define A(a) as the set of facts extracted from A as follows:

We first show that if T ,Ais unsatisfiable, there exists a constant a such that T ,A(a) is unsatisfiable. In fact we show the contrapositive: suppose that for every constant a, T ,A(a)is satisfiable: for each a, there exists an interpretation Ia satisfying the inclusions in T and all the facts in A(a). It is easy to show that the interpretation I defined on the union of domains of interpretations as follows is a model of T ,A(which is then satisfiable):

Then, since each A(a) is a conjunction of facts of the form X(a), if T ,Ais unsatisfiable, there exists a constant a such that T ,X1(a) ... Xn(a) is unsatisfiable. Therefore, T ,x(X1(x) ... Xn(x)) is unsatisfiable. This entails: Tx(¬X1(x) ... ∨¬Xn(x)), which in DL notation corresponds to: TX1 ⊑¬X2 ... ⊔¬Xn. Because of the form of the inclusion allowed in DL-LITE, there must exist i such that TX1 ⊑¬Xi. According to Proposition 8.4.4, this entails that the corresponding NI X1 ⊑¬Xi is in the closure of T and that A violates it (since it includes X1(a) and Xi(a)).

Conversely, it is easy to show that if a NI in the closure of T is violated by some facts in the Abox, then T ,Ais unsatisfiable. If it were not the case, since according Proposition 8.4.3 T and cl(T ) have the same models, there would be a model in which all the NIs in the closure of T would be satisfied by the facts in A.

The second step of consistency checking, after the NI-closure is computed, does not require any further computation on the Tbox T . This second step simply consists in evaluating against the Abox A (seen as a relational database) a Boolean query corresponding to each negated NI in the NI-closure of the Tbox. If one of those Boolean queries is evaluated to true against A, it means that some data in the Abox A violates the corresponding NI, and therefore the knowledge base K = T ,Ais inconsistent.

For example, consider the NI: TeachesTo ⊑¬PhDStudent. Its corresponding FOL formula φ and its negation are:

∀x,y′[TeachesTo(x,y′)⇒  ¬PhD Student(x)]   φ
∃x,y′[TeachesTo(x,y′)∧ PhD Student(x)]    ¬ φ
and the corresponding Boolean query is:
qunsat() :- TeachesTo(x,y ′),PhD Student(x)
i.e., the direct translation of the negation of the NI.

Consider the evaluation of the qunsat query against the Abox A of Figure 8.1. It evaluates to true: consider valuation ν with ν(x) = pierre, ν(y) = paul and the facts

PhD Stu den t(paul) and TeachesTo(pau l,pierre).
Thus, the knowledge base K made of the Tbox of Figure 8.6 and the Abox of Figure 8.1 is inconsistent.

The transformation of NIs into Boolean queries that correspond to their negation is described in Definition 8.4.6.

Definition 8.4.6 (Transformation of NIs into Boolean queries) The transformation δ of NIs into Boolean queries corresponding to their negation is defined as follows:

δ(B 1 ⊑ ¬ B2) =   qunsat :- γ 1(x ),γ2(x) such that
                  γi(x )= Ai(x) if Bi = Ai
                  γi(x )= Pi(x,yi) if Bi = ∃Pi
                  γ (x )= P (y ,x) if B = ∃P -
                   i      i  i      i     i
δ(R 1 ⊑ ¬ R2) =   qunsat :- ρ1(x,y) , ρ2(x,y) such that
                  ρi(x,y) = Pi(x,y ) if Ri = Pi-
                  ρi(x,y) = Pi(y,x ) if Ri = P i
δ((functP))   =   qunsat :- P (x ,y) , P(x,z) , y⁄= z
δ((functP-))  =   qunsat :- P (x ,y) , P(z,y) , x⁄= z

This second step of consistency checking is summarized in the Consistent Algorithm (Algorithm 2). In the algorithm, for each NI clause α, the query qunsat,α is an SQL query computing the Boolean conjunctive queries δ(α). Also, db(A) denotes the A set in a relational database.


Consistent(T ,A)

Input: a KB K = T ,A

Output: true if K is satisfiable, false otherwise

(1)qunsat = (i.e., qunsat is false)

(2)foreach αcln(T )  let  qunsat = qunsat qunsat,α(db(A))

(3)

(4)if qunsat = return true

(5)else return false

Algorithm 2: The Consistent algorithm

It is important to emphasize that this two-step approach for consistency checking does not require any inference on the data. The only inferences concern the Tbox and consist in computing the deductive closure of its axioms, from which the NI-closure (denoted cln(T ) in the Algorithm) is extracted.

Consider the Abox Aobtained from the inconsistent Abox A in Figure 8.1 by deleting the fact PhDStudent(paul). The knowledge base made of the Abox Ain Figure 8.8 and the Tbox T in Figure 8.6 is consistent. (See Exercise 8.6.4.)


Subject Predicate Object FOL semantics




dupond Leads infoDept Leads(dupond,infoDept)
dupond rdf:type Professor Professor(dupond)
durand ResponsibleOf ue111 ResponsibleOf(durand,ue111)
durand Leads csDept Leads(durand,csDept)
paul TeachesTo pierre TeachesTo(paul,pierre)
pierre EnrolledIn infoDept EnrolledIn(pierre, infodept)
pierre rdf:type Undergrad Undergrad(pierre)
pierre RegisteredTo ue111 Registered(pierre, ue111)
ue111 OfferedBy infoDept OfferedBy(ue111,infoDept)
ue111 rdf:type CSCourse CSCourse(ue111)
jim EnrolledIn csDept EnrolledIn(jim, csDept)
csDept rdf:type TeachingDept TeachingDept(csDept)





Figure 8.8: A: an Abox consistent w.r.t the Tbox of Figure 8.6


8.4.3  Answer set evaluation

In the previous section, the negative constraints played the main role. Once we know the knowledge base is consistent and move to query answering, the positive constraints take over.

Answering queries to a DL-LITE knowledge base is done in two steps. The first step is the query reformulation, which consists in translating the original query q into a set Q of queries. The second step consists in evaluating the queries in Q over the Abox (again seen as a relational database). The beauty of the approach is that this will provide the answer set. Of course, simply evaluating q over the Abox would possibly yield an incomplete answer. Completeness is achieved by the “reasoning” in the reformulation step. During this step, we access only the Tbox and not the data.

The query reformulation step is performed by the PerfectRef (Algorithm 3). It consists in reformulating the initial query by using the PIs in T as rewriting rules. The intuition is that PIs are seen as logical rules that are applied in backward-chaining to query atoms in order to expand them (in a resolution style). In databases, this is called a chase.

The queries we consider, i.e., the conjunctive queries, consist of several atoms. In general, because of the existential variables, new variables are introduced in queries. So we could be lead to generate more and more queries with new variables. It turns out that we will be able to control this process and generate only a finite number of distinct queries. This is due to the limitations of the constraints allowed in the Tbox. As outlined in Section 8.4.4, as soon as we allow the combination of key constraints with inclusions of properties, we may generate an infinite number of non redundant queries.

Consider a PI rule αβ. Applicability of the rule to an atom of a query is defined by:

As usual, _ denotes here an unbounded existential variable of a query.

The following definition defines the result gr(g,I) of the goal reduction of the atom g using the PI I, which is at the core of PerfectRef.

Definition 8.4.7 (Backward application of a PI to an atom) Let I be an inclusion assertion that is applicable to the atom g. Then, gr(g,I) is the atom defined as follows:

if g = A(x) and I = A1 A, then gr(g,I) = A1(x)
if g = A(x) and I = P A, then gr(g,I) = P(x,_)
if g = A(x) and I = P-A, then gr(g,I) = P(_,x)
if g = P(x,_) and I = A⊑∃P, then gr(g,I) = A(x)
if g = P(x,_) and I = P1 ⊑∃P, then gr(g,I) = P1(x,_)
if g = P(x,_) and I = P1-⊑∃P, then gr(g,I) = P1(_,x)
if g = P(_,x) and I = A⊑∃P-, then gr(g,I) = A(x)
if g = P(_,x) and I = P1 ⊑∃P-, then gr(g,I) = P1(x,_)
if g = P(_,x) and I = P1-⊑∃P-, then gr(g,I) = P1(_,x)
if g = P(x1,x2) and either I = P1 P or I = P1-P- then gr(g,I) = P1(x1,x2)
if g = P(x1,x2) and either I = P1 P-or I = P1-P then gr(g,I) = P1(x2,x1)

The subtle point of PerfectRef is the need of simplifying the produced reformulations, so that some PIs that were not applicable to a reformulation become applicable to its simplifications. A simplification amounts to unify two atoms of a reformulation using their most general unifier and then to switch the possibly new unbounded existential variables to the anonymous variable denoted _.

Let us illustrate the reformulation step of the following query using the PIs in the Tbox T of Figure 8.6:

q(x):- TeachesIn(x,y), RegisteredIn(z,y), Student(z).

Figure 8.9 shows the result returned by PerfectRef (q(x),T ).


Initial query
q(x):- TeachesIn(x,y), RegisteredIn(z,y), Student(z)



Reformulations applied PI Reformulated query



q1(x):- ResponsibleOf(x,y), RegisteredIn(z,y), Student(z) ResponsibleOf TeachesIn q(x)
q2(x):- TeachesIn(x,y), RegisteredIn(z,y), PhDStudent(z) PhDStudent Student q(x)
q3(x):- TeachesIn(x,y), RegisteredIn(z,y), TeachesTo(_,z) TeachesTo-Student q(x)
q4(x):- TeachesIn(x,y), RegisteredIn(_,y) RegisteredIn Student q(x)
q5(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z) PhDStudent Student q1(x)
q6(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z) TeachesTo-Student q1(x)
q7(x):- ResponsibleOf(x,y), RegisteredIn(_,y), RegisteredIn Student q1(x)
q8(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z) ResponsibleOf TeachesIn q2(x)
q9(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z) ResponsibleOf TeachesIn q3(x)
q10(x):- ResponsibleOf(x,y), RegisteredIn(_,y) ResponsibleOf TeachesIn q4(x)
q11(x):- TeachesIn(x,y), Course(y) Course ⊑∃RegisteredIn- q4(x)
q12(x):- TeachesIn(x,_) TeachesIn-Course q11(x)
q13(x):- ResponsibleOf(x,_) ResponsibleOf TeachesIn q12(x)
q14(x):- Professor(x) Professor ⊑∃TeachesIn q12(x)




Figure 8.9: A query and its reformulations obtained by PerfectRef applied to the Tbox of Figure 8.6


We detail here the inference chain leading to some reformulations that are particularly interesting for getting answers for q from the data in the Abox Aof Figure 8.8. This also illustrates the need of the simplification step. The reformulation:

q4(x):- TeachesIn(x,y), RegisteredIn(_,y).

is obtained by:

In turn, q4(x) can be reformulated by the backward application of the PI Course ⊑∃RegisteredIn- to the atom RegisteredIn(_,y), which results in the reformulation q11(x):

q11(x):- TeachesIn(x,y), Course(y).

Then, the reformulation q12(x) is produced by the backward application of the PI TeachesIn-Course, and the simplification by unification of the two atoms followed by the replacement of the existential variable y, now unbounded, with the anonymous variable _.

q12(x):- TeachesIn(x,_).

Finally, the reformulations q13(x) and q14(x) are obtained from the backward application of the PIs ResponsibleOf TeachesIn and Professor ⊑∃TeachesIn respectively.

q13(x):- ResponsibleOf(x,_).
q14(x):- Professor(x).

It is important to notice that the answers durand and dupond are obtained for the initial query q(x) thanks to those reformulations q13(x) and q14(x): they would not be returned by the standard evaluation of the query q(x) against the Abox Aof Figure 8.8.

In the PerfectRef algorithm (Algorithm 3):


PerfectRef (q,T )

Input: a conjunctive query q and a Tbox T

Output: a union of conjunctive queries: PR

(1)PR := {q}

(2)repeat

(3)PR:= PR

(4)foreach qPR

(5)(a) foreach gq

(6)if a PI I T is applicable to g

(7)PR := PR∪{q[g/gr(g,I)]}

(8)(b) foreach g1,g2 q

(9)if g1 et g2 sont unifiables

(10)PR := PR∪{τ(reduce(q,g1,g2))}

(11)until PR= PR 

(12)

Algorithm 3: The PerfectRef algorithm

Figure 8.9 shows the result returned by PerfectRef (q(x),T ), where q(x) is the query of the previous example, and T is the Tbox of Figure 8.6. The second column makes explicit the PI used for obtaining the corresponding reformulation. Note that equivalent reformulations can be produced by different inferences, such as for example the reformulations q4(x) and q7(x).

Although we will not prove it here, the following properties hold:

soundness.
All the facts computed using PerfectRef are correct query answers.
completeness.
All query answers are obtained.
complexity.
Since we touch the data only for the evaluation of FOL queries, the worst-case complexity is PTIME in the size of the Abox. The number of reformulations is PTIME in the size of the Tbox. Therefore, the complexity of evaluating a query against a DL-LITER or DL-LITEF knowledge base is PTIME in the size of the knowledge base.

8.4.4  Impact of combining DL-LITER and DL-LITEF on query answering

In this section, we exhibit an example showing that the interaction of key constraints (the specificity of DL-LITEF ) with inclusion constraints between properties (the specificity of DL-LITER ) may lead to a reformulation of a query into an infinite number of conjunctive rewritings, each one likely to bring additional answers. This makes an algorithmic approach such as the one we used for DL-LITER and DL-LITEF in isolation incomplete for query answering when DL-LITER and DL-LITEF are combined together.

Consider a Tbox made of the following inclusion axioms, in which R and P are two properties and S is a class:

R ⊑ P
(functP )
S ⊑ ∃R
   -
∃R   ⊑ ∃R
Let us consider the following query:
q(x) :-R(z,x)
The following query expression is a valid reformulation for the query q:
r1(x) :-S(x1),P(x1,x)

To see this, we observe that from the fact S(x1) and the PI S ⊑∃R, it can be inferred that there exists y such that R(x1,y) holds, and thus P(x1,y) holds too (since RP). From the functionality constraint on P and the conjunct P(x1,x) in the body of r1, we can now infer that y = x, and thus that R(x1,x) holds. Therefore, zR(z,x) is logically entailed by x1S(x1) P(x1,x), i.e., r1(x) is contained in the query q(x), and thus is a valid reformulation of the query q(x).

It turns out that the situation is even more subtle. Surprisingly, this reformulation r1(x) is not the only one. In fact there exists an infinite number of different reformulations for q(x). Let k 2. The following query is a valid reformulation of q(x):

r(x) :- S(x ),P(x ,x  ),...,P (x,x)
k        k     k  k- 1        1

To show that rk(x) is logically contained in q(x), we exploit again the axiom of functionality of P and the inclusion axiom between R and P: from the fact S(xk) and the PI S ⊑∃R, it can be inferred that there exists yk such that R(xk,yk) holds, and thus P(xk,yk) holds too (since RP). Since P is functional, we get: yk = xk-1, and thus R(xk,xk-1) holds. Now, based on the PI R-⊑∃R, there exists yk-1 such that R(xk-1,yk-1) holds, and with the same reasoning as before, we get yk-1 = xk-2, and thus R(xk-1,xk-2) holds. By induction, we obtain that R(x1,x) holds, i.e., rk(x) is logically contained in the query q(x).

One can also show that for each k, there exists an Abox such that the reformulation rk returns answers that are not returned by the reformulation rk for k<k. Thus, there exists an infinite number of non redundant conjunctive reformulations.

It can be shown that if we combine key constraints and inclusions of properties in a restricted way, this problem can be avoided. For instance, if key constraints are forbidden on properties involved in right-hand side of an inclusion axiom, there is a finite number of non redundant conjunctive reformulations and they can be found by the PerfectRef algorithm.

8.5  Further reading

The spreading of RDF data on the Web and the importance of queries for such data is illustrated by the Billion Triple Track of the Semantic Web Challenge 1. The idea is to “do something” efficiently with one billion of RDFS triples.

RDFS. Reasoners for RDFS are available on line and can be downloaded, like for instance the Jena2 ontology API [104], which implements the forward-chaining algorithm we described. A SPARQL [175] engine included in the same programmatic Jena environment enables storing and querying data sets made of (asserted + inferred) triples. In fact, since RDFS statements are also stored as RDF triplets, SPARQL can also be used to query the schema, and not only the RDF data. For guaranteeing the completeness of the answers to a schema query, all the inference rules that we have given Figure 7.7 ( section describing RDFS) in Chapter 7 must be taken into account, and not only the subset that we considered in the forward-chaining algorithm we described. We mentioned the practical advantage of separating the computation over a Tbox from that over the Abox. This is useful also from a theoretical point of view. This gives a bound on the data complexity (the complexity in terms of the Abox only) of consistency checking and of query answering. We showed that they can be performed using FOL queries and it is known [1599] that evaluating FOL queries over a relational database is in LOGSPACE in the size of the database.

DL-lite. DL-LITER has been recently incorporated into the version OWL2 [176] of OWL as the profile called OWL2 QL. The proof that the PerfectRef Algorithm computes the whole answer is shown in [38]. It follows that the complexity of query answering by reformulation in these fragments of DL-LITE is polynomial in the size of the Tbox, and in LOGSPACE in the size of the Abox.

A major result in [38] is that DL-LITER and DL-LITEF are two maximal fragments of the DL-Lite family supporting tractable query answering over large amounts of data. It has been shown in [38] that consistency checking and instance recognition (which a particular case of query answering), while being LOGSPACE both for DL-LITER and DL-LITEF Tboxes, are PTIME-COMPLETE for the languages that combine the axioms of both (denoted DL-LITERF). This complexity result shows it is unlikely that an approach based on query reformulation would provide a complete query answering algorithm for DL-LITERF.

QuOnto ([13]) is a JAVA tool implementing the DL-Lite family of ontology representation languages. It permits the declaration of an ontology as a DL-LITE Tbox, the construction of an associated Abox that can be stored and as a MySQL database. The consistency checking of the resulting DL-LITE knowledge base, and query answering by reformulation are the core functionalities of QuOnto, based on the implementation in Java of the algorithms presented in this chapter.

Datalog+-. Recent research [1211] has extended the Datalog database query language towards query answering over ontologies. This has resulted in a unifying framework based on a family of expressive extensions of Datalog, called Datalog+-, that captures DL-LITER and DL-LITEF .

8.6  Exercises

Exercise 8.6.1 With the University example, find a new query that has a different answer:

  1. on the RDF data vs. the RDF data together with the RDFS ontology.
  2. on the RDF data vs. the RDF data together with the DL-LITE ontology.

Exercise 8.6.2 Prove that the Saturation algorithm runs in 0((M×N)2).

Exercise 8.6.3 Prove that the rules used for computing the TBox closure are sound.

Exercise 8.6.4 Consider Abox Ain Figure 8.1 and Aobtained by deleting the fact PhDStudent(paul), and the Tbox T in Figure 8.6. Show that:

  1. The knowledge base AT is inconsistent.
  2. The knowledge base A′∪T is consistent.

Exercise 8.6.5 Give the FOL rule corresponding to the different cases of negative inclusion axioms.

1http://challenge.semanticweb.org/