8.1 Introduction

8.2 Querying RDF data: notation and semantics

8.3 Querying through RDFS ontologies

8.4 Answering queries through DL-LITE ontologies

8.4.1 DL-LITE

8.4.2 Consistency checking

8.4.3 Answer set evaluation

8.4.4 Impact of combining DL-LITEand DL-LITEon query answering

8.5 Further reading

8.6 Exercises

8.2 Querying RDF data: notation and semantics

8.3 Querying through RDFS ontologies

8.4 Answering queries through DL-LITE ontologies

8.4.1 DL-LITE

8.4.2 Consistency checking

8.4.3 Answer set evaluation

8.4.4 Impact of combining DL-LITEand DL-LITEon query answering

8.5 Further reading

8.6 Exercises

As we saw in the previous chapter, ontologies form the backbone of the Semantic Web by providing a conceptual view of data and services available worldwide via the Web. We discussed the RDF language for describing knowledge, and a family of languages, called description logics, that provide formal foundations for ontologies and in particular for the OWL ontology language recommended by the W3C.

In this chapter, the focus is on querying RDF data. Since massive volumes of RDF sources are more and more present on the Web, specifying queries for RDF data that can be evaluated efficiently is becoming every day more and more important.

We will see that the set of query answers strongly depends on the semantic context. We will study query answering when the ontology is specified in RDFS and then when the ontology is specified in DL-LITE. The ontology language DL-LITE belongs to the DL family. It has been designed as a trade-off between the ability to describe a wide range of domains of interest and query evaluation efficiency. More precisely, we will focus on two important fragments of DL-LITE, namely DL-LITE and DL-LITE.

We will observe that the problem of answering queries through ontologies is quite different from that of answering database queries for the following two essential reasons:

- Implicit facts.
- In a DBMS setting, all the facts are explicit. For instance, the constraint “Every PhD student is a student” enforces that, before inserting a value v in the PhDStudent table, this value is also inserted in the Student table (if not already there). In an ontology context, someone may be a student not explicitly but because of the constraint used as an inference rule. In addition, the implicit facts may be incompletely known, coming from constraints such as “A professor teaches at least one master course”. From a fact such as Professor(dupond), one can infer the two facts Teaches(dupond,x) and MasterCourse(x) for some unknown value x. Such partially known implicit facts may however be useful for answering queries such as “Give me all the persons who teach a master course”.
- Inconsistency.
- In a DBMS setting, the constraint “each course must have a single responsible” is also viewed as a law that cannot be violated. An update of the corresponding table that would violate this law would simply be rejected. In an ontology context, such local verifications are not sufficient for checking data inconsistency. Because of data incompleteness, this may require intricate reasoning on different constraints and data distributed over different tables. For instance, if in addition of the previous key constraint, it is declared that “only professors can be responsible of courses in which they must teach”, “a master course is taught by a single teacher”, and “lecturers are not professors”, the presence of the three following facts in different tables of the database makes it inconsistent: Lecturer(jim), TeachesIn(jim,ue431) and MasterCourse(ue431). The reason is that because of the constraint “only professors can be responsible of courses in which they must teach”, we can infer that the course ue431 must have a responsible x who is unknown but for whom we have a partial information: s/he is a professor and s/he teaches in the course ue431. Without knowing x, the fact that she is a professor is sufficient to infer that x≠jim (since jim is a lecturer and thus not a professor). Therefore, the course ue431 is taught by two distinct teachers, which is forbidden for a master course.

From this, it should be clear that query answering through ontologies is more complicated that in classical databases. We have to reason to find which inferences may participate in answering a given query. We also have to reason to verify the consistency of our knowledge.

In this section, we set the stage for querying RDF data. We also discuss the impact of ontologies (knowledge on the domain of interest) on the answers. To simplify, we ignore here blank nodes.

Figure 8.1 is an enhanced version of the University example that we will use throughout this chapter. The first column provides RDF data in the triple syntax, while the second column shows the corresponding facts in FOL.

Subject | Predicate | Object | FOL semantics |

dupond | Leads | infoDept | Leads(dupond,infoDept) |

dupond | rdf:type | Professor | Professor(dupond) |

durand | ResponsibleOf | ue111 | ResponsibleOf(durand,ue111) |

durand | Leads | csDept | Leads(durand,csDept) |

paul | TeachesTo | pierre | TeachesTo(paul,pierre) |

paul | rdf:type | PhdStudent | PhDStudent(paul) |

paul | EnrolledIn | infoDept | EnrolledIn(paul, infodept) |

pierre | EnrolledIn | infoDept | EnrolledIn(pierre, infodept) |

pierre | rdf:type | Undergrad | Undergrad(pierre) |

pierre | RegisteredTo | ue111 | Registered(pierre, ue111) |

ue111 | OfferedBy | infoDept | OfferedBy(ue111,infoDept) |

ue111 | rdf:type | CSCourse | CSCourse(ue111) |

jim | EnrolledIn | csDept | EnrolledIn(jim, csDept) |

csDept | rdf:type | TeachingDept | TeachingDept(csDept) |

RDF triples can be asserted in a very flexible way and almost without constraints. The association of some ontology is not a requirement. Users can update a collection of RDF statements freely by just adding/removing triples. The only reserved word in the RDF vocabulary is rdf:type that is used to relate constant names to types, i.e., classes in domain of interest or unary predicates in FOL world.

Let us now consider querying a set of RDF facts, for which the query language SPARQL has been proposed. We briefly consider it next. query language. We briefly consider it next.

SPARQL (pronounced “sparkle”) is a recursive acronym standing for SPARQL Protocol And RDF Query Language. It is a W3C recommendation as of 2008. Although it does borrow some features from XQuery (functions and operators), it is based on the graph model underlying RDF data.

For instance, the following query expresses in SPARQL the search of all the individuals who are enrolled in a department led by a Professor.

We used here an SQL-like syntax. There exists competing syntaxes for expressing SPARQL queries. The corresponding query in FOL notation is:

q(x) : - ∃y∃z[EnrolledIn(x,y) ∧Leads(z,y) ∧Professor(z)]

This is a conjunctive query, i.e., a FOL formula without negation or disjunction, of the form

In the remainder of this chapter, we use conjunctive queries as the query language for
RDF. From the example, it should be clear that all we say is relevant to SPARQL. We
use a (standard) simplified notation for conjunctive queries. We omit the existential
quantifiers and denote the connector ∧ by a comma. Observe that this does not introduce any
ambiguity. In particular, all variables not occurring in the “head of the query” (i.e., in
q(x_{1},...,x_{m})) are understood as existentially quantified. The variables in x_{1},...,x_{m} are said to be
distinguished.

In this simplified form, the example SPARQL query becomes:

Now consider a query in the general form:

An essential issue is in the meaning of “holds” in the previous informal definition.

Inference with data only. In the simplest case, a fact R_{i}(ν(u_{i})) holds if it is a known fact (explicitly
stated in the data store). For instance, consider the previous conjunctive query in the University
example. The evaluation of the query q(x) against the facts of Figure 8.1 returns {paul, pierre} as its
answer set. To see why paul is an answer, we just check that, by mapping the distinguished variable
to the constant paul, and the existential variables y and z respectively to the constants
infoDept and dupond, all the conjuncts in the query definition are satisfied by facts in the
database. The same holds if the distinguished variable x is instantiated with the constant
pierre.

More inference using an ontology. Now, let us assume that a fact R_{i}(ν(u_{i})) holds if it is a known
fact or if it is a consequence of the known facts by taking the ontological statements into account.
Suppose now that we also have the knowledge that someone responsible for a class has to be a
professor, that is in DL syntax:

To see why, we just have to consider the FOL semantics of this DL statement:

This logical implication indeed allows inferring the fact Professor(durand) from the fact ResponsibleOf(durand, ue111).

More subtly, we can get answers from partially instantiated facts that can be logically entailed by the knowledge base. Suppose that we know that a professor teaches at least one course, that is in DL syntax:

From the explicit ground fact Professor(durand) and the contraint Professor ⊑∃TeachesIn, we know that TeachesIn(durand,v) holds for some unknown value v. The valuation of v may vary in the different “worlds” satisfying the constraint. This is however sufficient to infer that answer q(durand) is true in all these possible worlds.

Formal definition of answer set. Recall that φ⊧ψ (i.e., φimplies ψ) if each interpretation making
φ true also makes ψ true, or equivalently, every model of φ is a model of ψ. We next
provide a formal definition of the answer set of a query for a DL knowledge base, that
captures the general setting where data (the Abox ) is associated to an ontology (the
Tbox ) to form a DL knowledge base = ∪. A query to the knowledge base is a
conjunctive query using class or property predicates from the given knowledge base
with the proper arity. (Class predicates have arity one and property predicates arity
2.)
A valuation ν of a set of variables {z_{1},...,z_{p}} is a substitution (denoted {z_{1}/a_{1},...,z_{p}/a_{p}}) that
assigns each variable z_{i} to a constant a_{i} (two distinct variables may be assigned to a same constant).
Given two valuations ν and ν′ of two disjoint sets of variables {z_{1},...,z_{p}} and {v_{1},...,v_{k}} , ν∘ν′
denotes the valuation assigning the variables z_{i} to the corresponding constants in ν, and the
variables v_{j} to the corresponding constants in ν′.

We can now formally define the notion of answers.

Definition 8.2.1 Let q(x_{1},...,x_{m}) : - R_{1}(u_{1}),...,R_{p}(u_{p}) be a query to a knowledge base . An
answer is a ground fact q(ν(x_{1}),...,ν(x_{m})) for some valuation ν of the distinguished variables such
that in every model of there exists a valuation ν′of the existential variables for which R_{i}(ν∘ν′(u_{i}))
is true for each i. The answer set of qfor is the set of all such answers. It is denoted q().

Consider again the previous University query example. We have seen that its answer set varies depending on the knowledge base against which it is evaluated. In particular, if is the set of facts of Figure 8.1 and is {∃ResponsibleOf ⊑Professor}, we have:

- q() = {paul,pierre}.
- q(∪ ) = {paul,pierre,jim}.

Boolean queries. To conclude this section, we consider a particular interesting case of queries, that of Boolean queries. The arity of a query is the number of its distinguished variables. A query of arity 0, i.e., a query of the form q() : ..., is called a Boolean query. Note that there is a single possible answer, namely q(). In this case, we see that as a positive answer to the query, i.e., as true. If the answer set is empty, we see that as false. To see an example, consider the query

This Boolean query asks whether there exists a student teaching to other students. Suppose ′ = {PhDStudent ⊑Student}. Then we can distinguish two cases:

- q′() = ∅ and the answer is no.
- q′(,′) = {q′()} and the answer is yes.

In the second case, the fact Student(paul), although not in the Abox , can be inferred from the fact PhDStudent(paul) in and the inclusion statement PhDStudent ⊑Student in ′. Together with the fact TeachesTo(paul,pierre) present in the Abox, it makes the body of the query q′ satisfied.

In this section, we consider RDF data without blank nodes (that can be seen as an Abox), associated to an RDFS ontology (that can be seen as a very simple Tbox). RDF data and RDFS statements can be denoted and stored as triples. However, the important point is that RDFS statements have a logical semantics which can be operationalized as a set of inference rules (see Section 7.3 in Chapter 7). We illustrate here how this can be used to answer queries.

Figure 8.2 is an example of an RDFS ontology that can be associated to the RDF data in Figure 8.1. The RDFS statements composing the ontology are given in three notations: the triple notation, the DL notation, and the corresponding FOL notation.

RDFS notation | DL notation | FOL notation |

⟨AcademicStaff rdfs:subClassOfStaff ⟩ | AcademicStaff ⊑Staff | AcademicStaff (X) ⇒Staff (X) |

⟨Professor rdfs:subClassOfAcademicStaff ⟩ | Professor ⊑AcademicStaff | Professor(X) ⇒AcademicStaff (X) |

⟨Lecturer rdfs:subClassOfAcademicStaff ⟩ | Lecturer ⊑AcademicStaff | Lecturer(X) ⇒AcademicStaff (X) |

⟨PhDStudent rdfs:subClassOfLecturer⟩ | PhDStudent ⊑Lecturer | PhDStudent(X) ⇒Lecturer(X) |

⟨PhDStudent rdfs:subClassOfStudent⟩ | PhDStudent ⊑Student | PhDStudent(X) ⇒Student(X) |

⟨TeachesIn rdfs:domainAcademicStaff ⟩ | ∃TeachesIn ⊑AcademicStaff | TeachesIn(X,Y) ⇒AcademicStaff (X) |

⟨TeachesIn rdfs:rangeCourse⟩ | ∃TeachesIn^{-}⊑Course | TeachesIn(X,Y) ⇒Course(Y) |

⟨ResponsibleOf rdfs:domainProfessor⟩ | ∃ResponsibleOf ⊑Professor | ResponsibleOf (X,Y) ⇒Professor(X) |

⟨ResponsibleOf rdfs:rangeCourse⟩ | ∃ResponsibleOf ^{-}⊑Course | ResponsibleOf (X,Y) ⇒Course(Y) |

⟨TeachesTo rdfs:domainAcademicStaff ⟩ | ∃TeachesTo ⊑AcademicStaff | TeachesTo(X,Y) ⇒AcademicStaff (X) |

⟨TeachesTo rdfs:rangeStudent⟩ | ∃TeachesTo^{-}⊑Student | TeachesTo(X,Y) ⇒Student(Y) |

⟨Leads rdfs:domainAdminStaff ⟩ | ∃Leads ⊑AdminStaff | Leads(X,Y) ⇒AdminStaff (X) |

⟨Leads rdfs:rangeDept⟩ | ∃Leads^{-}⊑Dept | Leads(X,Y) ⇒Dept(Y) |

⟨RegisteredIn rdfs:domainStudent⟩ | ∃RegisteredIn ⊑Student | RegisteredIn(X,Y) ⇒Student(X) |

⟨RegisteredIn rdfs:rangeCourse⟩ | ∃RegisteredIn^{-}⊑Course | RegisteredIn(X,Y) ⇒Course(Y) |

⟨ResponsibleOf rdfs:subPropertyOfTeachesIn⟩ | ResponsibleOf ⊑TeachesIn | ResponsibleOf (X,Y) ⇒TeachesIn(X,Y) |

As already said, these RDFS statements can be used to infer new triples (i.e., new facts) from the RDF database. For example, the RDF triple ⟨durandResponsibleOf ue111⟩ in Figure 8.1 corresponds to the fact ResponsibleOf (durand,ue111), and the RDFS statement ⟨ResponsibleOf rdfs:domainProfessor⟩ corresponds to the logical rule: ResponsibleOf (X,Y) ⇒Professor(X). The condition of this rule can be mapped with the fact ResponsibleOf (durand,ue111) by the substitution {X/durand, Y/ue111}, and thus the corresponding instantiation of the conclusion Professor(durand) can be inferred. This new fact can in turn trigger a rule such as Professor(X) ⇒AcademicStaff (X), thereby allowing the inference of additional facts such as AcademicStaff (durand).

More generally, RDFS statements correspond to rules that can be applied in a forward-chaining manner to the initial set of facts until saturation, i.e., until no more fact can be inferred. It is important to see that the variables in the head of rule all occur in the body. In other words, no variable is quantified existentially. So rules always infer new ground facts. Such rules are said to be safe. We will use unsafe rules when we consider DL-LITE, which will render query processing more complicated.

The simple forward-chaining Algorithm 1 starts with the set of initial facts and repeats inference steps until saturation.

Saturation(, )

Input: An Abox and an RDFS Tbox

Output: The set of facts that are inferred: Δ_{0}

(4) foreach rule condition ⇒conclusion in ,

(5) if there exists a substitution σ such that σ.condition ∈ Δ_{0}

(10)until Δ_{1} = ∅

Figure 8.3 shows the facts resulting from the application of Algorithm 1 to the facts of Figure 8.1 and the rules of Figure 8.2.

Asserted facts | Inferred facts |

Leads(dupond,infoDept) | AdminStaff(dupond) |

Professor(dupond) | Dept(infoDept) |

ResponsibleOf(durand,ue111) | AcademicStaff(dupond) |

Leads(durand,csDept) | Professor(durand) |

TeachesTo(paul,pierre) | Course(ue111) |

PhDStudent(paul) | AcademicStaff(durand) |

EnrolledIn(paul, infodept) | AdminStaff(durand) |

EnrolledIn(pierre, infodept) | Dept(csDept) |

Undergrad(pierre) | AcademicStaff(paul) |

Registered(pierre, ue111) | Student(pierre) |

OfferedBy(ue111,infoDept) | Student(paul) |

CSCourse(ue111) | Student(pierre) |

EnrolledIn(jim, csDept) | Lecturer(paul) |

TeachingDept(csDept) | AcademicStaff(paul) |

Staff(paul) | |

Staff(dupond) | |

Staff(durand) | |

To answer queries from RDF facts associated to an RDFS ontology, one can proceed as follows. First one compute all the inferred facts (in a bottom-up manner) with the previous algorithm. Each step of the loop can be computed, for instance, using a standard relational query engine. This yields a new database consisting of the set of all the facts (asserted or inferred). Then one can evaluate the query directly on that database using a standard relational query engine.

For example, the standard evaluation against the set of (asserted + inferred) facts in Figure 8.3 of the query

Complexity analysis. It is interesting to estimate both the maximum number of inferred triples and the worst-case time complexity for inferring them. Of course, this depends on the number of asserted triples (i.e., the size of the data) and also on the number of axioms in the ontology (i.e., the size of the ontology). Let M be the number of facts in the Abox and N the number of axioms in the Tbox. From the presence of some initial fact C(a), one can derive a number of new facts C′(a) for some class C′. Note that the number of such C′(a) is bounded by the number of axioms in the ontology, i.e., it is less than N. Now consider some initial fact R(a,b). From it, one can derive some facts R′(a,b) or R′(b,a) as well as some facts C′(a) or C′(b) for some R′ and C′. Again, one can observe that for a particular R(a,b), the number of new facts one can derive is bounded by the number of axioms in the ontology, i.e., it is less than N. Since the number of initial facts is M, the number of facts one can derive is bounded by M×N. Observe in particular that it is linear in the number of database facts.

Now consider the worst-case time complexity for inferring them by the Algorithm
1. We have to perform at most M×N iterations. Each iteration can be performed in
polynomial time. So the algorithm is in PTIME. One can show more precisely that it is in
0((M×N)^{2}).

In this section, we consider two important fragments of the DL-LITE ontology language of the DL family. As we will see, querying is feasible for these two languages even though they provide a quite rich framework for describing semantics. We study querying through ontologies expressed in these two fragments.

A DL-LITE ontology may contain axioms corresponding (up to the syntax) to those allowed in an RDFS ontology. Besides, it may contain other axioms, of three kinds: positive inclusions (PI for short), negative inclusions (NI) and key constraints (Key). Figure 8.4 shows examples of these three kinds of DL-LITE axioms with their corresponding FOL semantics. These constraints are not expressible in RDFS.

DL notation | Corresponding logical rule | |

PI | Professor ⊑∃TeachesIn | Professor(X) ⇒∃YTeachesIn(X,Y) |

Course ⊑∃RegisteredIn^{-} | Course(X) ⇒∃YRegisteredIn(Y,X) | |

NI | Student ⊑¬Staff | Student(X) ⇒¬Staff (X) |

Key | (functResponsibleOf ^{-}) | ResponsibleOf (Y,X) ∧ResponsibleOf (Z,X) ⇒Y = Z |

We next consider in turn these new kinds of axioms.

Positive inclusion and incompleteness. A positive inclusion axiom is an expression of one of the following forms:

DL notation | Corresponding logical rule |

B ⊑∃P | B(X) ⇒∃YP(X,Y) |

∃Q ⊑∃P | Q(X,Y) ⇒∃ZP(X,Z) |

B ⊑∃P^{-} | B(X) ⇒∃YP(Y,X) |

∃Q ⊑∃P^{-} | Q(X,Y) ⇒∃ZP(Z,X) |

P ⊑Q^{-} or P^{-}⊑Q | P(X,Y) ⇒Q(Y,X) |

where P and Q denote properties and B denotes a class. Recall that P^{-} denotes the inverse of P, i.e.,
P^{-}(x,y) iff P(y,x) for all x,y.

Observe that expressions of the form ∃P ⊑B belong to DL-LITE since they already are in RDFS.
Expressions of the form P ⊑Q (so equivalently P^{-}⊑Q^{-}) also belong to DL-LITE for the same
reason.

It is important to note that the logical rules corresponding to PI axioms expressible in DL-LITE are not necessarily safe (as opposed to RDFS that uses only safe rules.) Consider the rule

Negative inclusion and inconsistencies. A negative inclusion axioms is an expression that takes one of the forms:

DL notation |

B_{
1} ⊑¬B_{2} |

R_{1} ⊑¬R_{2} |

where

- B
_{1}and B_{2}are either classes or expressions of the form ∃P or ∃P^{-}for some property P - where R
_{1}and R_{2}are either properties or inverses of properties.

The corresponding logic rules are left as an exercise. An example of NI (expressing the constraint “Students do not teach courses”) and the corresponding logical rule are as follows:

DL notation | Corresponding logical rule |

Student ⊑¬∃TeachesIn | Student(X) ⇒¬∃YTeachesIn(X,Y) |

or equivalently, ∃YTeachesIn(X,Y) ⇒¬Student(X) | |

NIs express disjointness constraints between classes or between properties, and thus introduce negation in the language. Therefore, the knowledge base against which the queries have to be evaluated may be inconsistent, i.e., a model of the corresponding theory may not exist. Note that this is not possible with RDFS ontologies: we showed an algorithm that computed a model (indeed the smallest model).

For example, adding the NI Student ⊑¬Staff to the ontology of Figure 8.2 leads to the inconsistency of the knowledge base made of the facts in Figure 8.1 and of the axioms in the ontology of Figure 8.2 enriched with that NI. The reason is that from the fact PhDStudent(paul), and the inclusion axiom PhdStudent ⊑Student, we can infer the fact Student(paul), and in turn the literal ¬Staff (paul) from the NI Student ⊑¬Staff . On the other hand, the fact Staff (paul) can be inferred from the fact PhDStudent(paul) and the inclusion axioms PhdStudent ⊑Lecturer, Lecturer ⊑AcademicStaff and AcademicStaff ⊑Staff .

Key constraints and more inconsistencies. Key constraints are expressed by functionality axioms of
the form (functP) or (functP^{-}) where P is a property and P^{-} denotes the inverse property of P.
Figure 8.5 shows their logical semantics in the form of logical rules.

DL notation | corresponding logical rule |

(functP) | P(X,Y) ∧P(X,Z) ⇒Y = Z |

(functP^{-}) | P(Y,X) ∧P(Z,X) ⇒Y = Z |

Observe that key constraints may also lead to inconsistencies. This is the case if we
attempt to equate two distinct constants, e.g., durand and dupond. For instance, the axiom
(functResponsibleOf ^{-}) expresses that a course must have a unique professor responsible for it.
Therefore, a knowledge base containing this axiom and the two facts:

ResponsibleOf (durand,ue111) and ResponsibleOf (dupond,ue111),

would be inconsistent. This is because we assume implicitly that an individual is denoted by a single constant. This natural (in practice) assumption is called in logic the unique name assumption.

We will consider the following two fragments of DL-LITE:

- DL-LITE is obtained by extending the axioms of RDFS with the PI and NI axioms.
- DL-LITE is obtained by extending the axioms of RDFS with key constraints, the PI and NI axioms, but excluding inclusion between properties. Note that, since DL-LITE does not permit to express inclusion between properties, RDFS is not included in DL-LITE.

One may wonder why one would choose such a convoluted language. Why not simply extend RDFS with the 3 kinds of axioms? This is because functional constraints interact with inclusion constraints in intricate ways. Query evaluation when they are all present is much more complex. This will be illustrated by an example in Section 8.4.4.

From the previous discussion, there are two fundamental differences between query answering in the context of RDFS and of DL-LITE knowledge bases:

- Inconsistency.
- RDFS does not permit expressing any form of negation, so an RDFS knowledge base is always consistent. On the other hand, a DL-LITE knowledge base may be inconsistent. Thus, answering queries through DL-LITE ontologies requires to make sure that the data is consistent with respect to the constraints expressed in the ontology.
- Incompleteness.
- The rules corresponding to RDFS axioms are safe thereby allowing the simple bottom-up algorithm we described. On the other hand, axioms in DL-LITE may correspond to unsafe rules. Thus a bottom-up approach may infer atoms that are not ground, i.e., some incomplete facts. Therefore, we will have to use a top-down approach for evaluating the queries that is more appropriate than the bottom-up approach.

In Section 8.4.2, we show an algorithm for checking consistency of a DL-LITE knowledge base, and in Section 8.4.3 an algorithm for answering conjunctive queries posed to a DL-LITE knowledge base. The particularity of these two algorithms is that they work in two-steps:

- in a first step, we reason with the Tbox alone (i.e., the ontology without the data) and some conjunctive queries;
- in the second step, we evaluate these conjunctive queries against the data in the Abox.

Separating ontology reasoning from data processing is typically a desired feature (when possible). In particular, such an approach has the practical interest that it makes it possible to use an SQL engine for the second step, thus taking advantage of well-established query optimization strategies supported by standard relational data management systems. In the first step, we deal with the Tbox only, typically of much smaller size.

In Section 8.4.4, we show by an example that DL-LITE and DL-LITE are two maximal fragments of the DL-Lite family for which reformulating queries into SQL is possible: combining constraints expressible in DL-LITE and DL-LITE may result in an infinite number of non redundant SQL reformulations for some queries.

Towards consistency checking, the first step uses the Tbox alone. It consists in computing the deductive closure of the Tbox, i.e., all the inclusion axioms that are logically entailed by the axioms declared in the Tbox. More precisely, the deductive closure (closure for short) of a DL-LITE Tbox is defined as follows.

Definition 8.4.1 (Closure of a Tbox) Let be a DL-LITEor a DL-LITETbox. The closure of , denoted by cl( ), is inductively defined as follows:

- All the statements in are also in cl( ).
- If B
_{1}⊑B_{2}and B_{2}⊑B_{3}are in cl( ), then B_{1}⊑B_{3}is in cl( ). - If R
_{1}⊑R_{2}and ∃R_{2}⊑B are in cl( ), then ∃R_{1}⊑B is in cl( ). - If R
_{1}⊑R_{2}and ∃R_{2}^{-}⊑B are in cl( ), then ∃R_{1}^{-}⊑B is in cl( ). - If R
_{1}⊑R_{2}and R_{2}⊑R_{3}are in cl( ), then R_{1}⊑R_{3}is in cl( ). - If R
_{1}⊑R_{2}is in cl( ), then R_{1}^{-}⊑R_{2}^{-}is in cl( ). - If B
_{1}⊑B_{2}and B_{2}⊑¬B_{3}(or B_{3}⊑¬B_{2}) are in cl( ), then B_{1}⊑¬B_{3}is in cl( ). - If R
_{1}⊑R_{2}and ∃R_{2}⊑¬B (or B⊑¬∃R_{2}) are in cl( ), then ∃R_{1}⊑¬B is in cl( ). - If R
_{1}⊑R_{2}and ∃R_{2}^{-}⊑¬B (or B⊑¬∃R_{2}^{-}) are in cl( ), then ∃R_{1}^{-}⊑¬B is in cl( ). - If R
_{1}⊑R_{2}and R_{2}⊑¬R_{3}(or R_{3}⊑¬R_{2}) are in cl( ), then R_{1}⊑¬R_{3}is in cl( ). - If R
_{1}⊑¬R_{2}or R_{2}⊑¬R_{1}is in cl( ), then R_{1}^{-}⊑¬R_{2}^{-}is in cl( ). -
- In the case in which is a DL-LITE Tbox, if one of the statements ∃R ⊑¬∃R or
∃R
^{-}⊑¬∃R^{-}is in cl( ), then both such statements are in cl( ). - In the case in which is a DL-LITE Tbox, if one of the statements ∃R ⊑¬∃R,
∃R
^{-}⊑¬∃R^{-}, or R⊑¬Ris in cl( ), then all three such statements are in cl( ).

- In the case in which is a DL-LITE Tbox, if one of the statements ∃R ⊑¬∃R or
∃R

Observe that although all axioms should be considered to construct this closure, only negative inclusions and key constraints can raise an inconsistency. The set of all the negative inclusions and key constraints in cl( ) is called the NI-closure. For example, consider the Tbox of Figure 8.6 made of the RDFS ontology shown in Figure 8.2 enriched with the PIs and NI shown in Figure 8.4. The NI-closure of that Tbox is shown in Figure 8.7.

DL notation | FOL notation |

AcademicStaff ⊑Staff | AcademicStaff (X) ⇒Staff (X) |

Professor ⊑AcademicStaff | Professor(X) ⇒AcademicStaff (X) |

Lecturer ⊑AcademicStaff | Lecturer(X) ⇒AcademicStaff (X) |

PhDStudent ⊑Lecturer | PhDStudent(X) ⇒Lecturer(X) |

PhDStudent ⊑Student | PhDStudent(X) ⇒Student(X) |

∃TeachesIn ⊑AcademicStaff | TeachesIn(X,Y) ⇒AcademicStaff (X) |

∃TeachesIn^{-}⊑Course | TeachesIn(X,Y) ⇒Course(Y) |

∃ResponsibleOf ⊑Professor | ResponsibleOf (X,Y) ⇒Professor(X) |

∃ResponsibleOf ^{-}⊑Course | ResponsibleOf (X,Y) ⇒Course(Y) |

∃TeachesTo ⊑AcademicStaff | TeachesTo(X,Y) ⇒AcademicStaff (X) |

∃TeachesTo^{-}⊑Student | TeachesTo(X,Y) ⇒Student(Y) |

∃Leads ⊑AdminStaff | Leads(X,Y) ⇒AdminStaff (X) |

∃Leads^{-}⊑Dept | Leads(X,Y) ⇒Dept(Y) |

∃RegisteredIn ⊑Student | RegisteredIn(X,Y) ⇒Student(X) |

∃RegisteredIn^{-}⊑Course | RegisteredIn(X,Y) ⇒Course(Y) |

ResponsibleOf ⊑TeachesIn | ResponsibleOf (X,Y) ⇒TeachesIn(X,Y) |

Professor ⊑∃TeachesIn | Professor(X) ⇒∃YTeachesIn(X,Y) |

Course ⊑∃RegisteredIn^{-} | Course(X) ⇒∃YRegisteredIn(Y,X) |

Student ⊑¬Staff | Student(X) ⇒¬Staff (X) |

This example shows that it is possible to infer an important number of new NIs. In fact, we have to compute all the consequences. But as we will see there is at most a polynomial number of consequences.

DL notation | FOL notation |

Student ⊑¬Staff | Student(X) ⇒¬Staff (X) |

PhDStudent ⊑¬Staff | PhDStudent(X) ⇒¬Staff (X) |

∃TeachesTo^{-}⊑¬Staff | TeachesTo(Y,X) ⇒¬Staff (X) |

∃RegisteredIn ⊑¬Staff | RegisteredIn(X,Y) ⇒¬Staff (X) |

Lecturer ⊑¬Student | Lecturer(X) ⇒¬Student(X) |

Lecturer ⊑¬PhDStudent | Lecturer(X) ⇒¬PhDStudent(X) |

Lecturer ⊑¬∃TeachesTo^{-} | Lecturer(X) ⇒¬∃Y[TeachesTo(Y,X)] |

Lecturer ⊑¬∃RegisteredIn | Lecturer(X) ⇒¬∃Y[RegisteredIn(X,Y)] |

Professor ⊑¬Student | Professor(X) ⇒¬Student(X) |

Professor ⊑¬PhDStudent | Professor(X) ⇒¬PhDStudent(X) |

Professor ⊑¬∃TeachesTo^{-} | Professor(X) ⇒¬∃Y[TeachesTo(Y,X)] |

Professor ⊑¬∃RegisteredIn | Professor(X) ⇒¬∃Y[RegisteredIn(X,Y)] |

AcademicStaff ⊑¬Student | AcademicStaff (X) ⇒¬Student(X) |

AcademicStaff ⊑¬PhDStudent | AcademicStaff (X) ⇒¬PhDStudent(X) |

AcademicStaff ⊑¬∃TeachesTo^{-} | AcademicStaff (X) ⇒¬∃Y[TeachesTo(Y,X)] |

AcademicStaff ⊑¬∃RegisteredIn | AcademicStaff (X) ⇒¬∃Y[RegisteredIn(X,Y)] |

Staff ⊑¬Student | Staff (X) ⇒¬Student(X) |

Staff ⊑¬PhDStudent | Staff (X) ⇒¬PhDStudent(X) |

Staff ⊑¬∃TeachesTo^{-} | Staff (X) ⇒¬∃Y[TeachesTo(Y,X)] |

Staff ⊑¬∃RegisteredIn | Staff (X) ⇒¬∃Y[RegisteredIn(X,Y)] |

∃TeachesTo ⊑¬Student | TeachesTo(X,Y) ⇒¬Student(X) |

∃TeachesTo ⊑¬PhDStudent | TeachesTo(X,Y) ⇒¬PhDStudent(X) |

∃TeachesTo ⊑¬∃TeachesTo^{-} | TeachesTo(X,Y) ⇒¬∃Z[TeachesTo(Z,X)] |

∃TeachesTo ⊑¬∃RegisteredIn | TeachesTo(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)] |

∃TeachesIn ⊑¬Student | TeachesIn(X,Y) ⇒¬Student(X) |

∃TeachesIn ⊑¬PhDStudent | TeachesIn(X,Y) ⇒¬PhDStudent(X) |

∃TeachesIn ⊑¬∃TeachesTo^{-} | TeachesIn(X,Y) ⇒¬∃Z[TeachesTo(Z,X)] |

∃TeachesIn ⊑¬∃RegisteredIn | TeachesIn(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)] |

We use three propositions for analyzing the consistency problem: one for evaluating the complexity of evaluating the closure and the last two for showing the logical soundness and completeness of this closure. Finally, a fourth proposition will show how to use these results for data consistency checking.

Proposition 8.4.2 (Size of the closure of a Tbox and complexity of its computation) Let be a DL-LITEor a DL-LITETbox.

- The number of statements in cl( ) is at most polynomial in the size of .
- cl( ) can be computed in polynomial time in the size of .

Proof (sketch). (1.) Follows from the form of the statements that are allowed in a DL-LITE or a DL-LITE Tbox.

For (2.), consider the items (2.) to (12.) in Definition 8.4.1. These are closure rules that are
exhaustively applied to the Tbox until saturation. Let _{0} = . For each i, let Δ_{i} be the set of
statements that can be derived from _{i} directly using the closure rules (2.) to (12.). Let
_{i+1} = _{i} ∪ Δ_{i}. Clearly, for each i, _{i} ⊆cl( ), so its size is polynomial in the size of
.

Now since the size of _{i} is polynomial in the size of , each step of the computation can clearly
be performed in PTIME. Since the number of steps is less than the number of statements in cl( ),
the entire computation can be performed in PTIME. □

The next proposition states the soundness of the closure.

Proposition 8.4.3 (Soundness of the closure of a Tbox) For each , ≡ cl( ). In other words, for each Abox satisfying a Tbox , also satisfies cl( ).

Proof (sketch). cl( )⊧ , since is included in cl( ). Clearly, the application of each closure rule
is sound. So for each i, _{i}⊧_{i+1}. By induction, = _{0}⊧_{i} for each i. Thus ⊧cl( ), so
≡cl( ). □

The next proposition establishes the completeness of the closure of a Tbox : cl( ) contains all the PIs, NIs and key constraints that are logically entailed by that Tbox (up to equivalence).

Proposition 8.4.4 (Completeness of the closure of a Tbox) Let be a DL-LITEor a DL-LITETbox. Then

- Let X⊑Y be a NI or a PI. If ⊧X⊑¬X
then X⊑¬X∈cl( ),

otherwise ⊧X⊑Y iff X⊑Y ∈cl( ) or ¬Y ⊑¬X∈cl( ).

- ⊧(funct R) iff (funct R) ∈cl( ) or ∃R⊑¬∃R∈cl( ).

Proof (sketch). We build a canonical interpretation I of the classes and properties appearing in as
follows: for each X such that X⊑¬X∈cl( ), I(X) = ∅; for the other classes or properties, we
associate a constant (respectively a pair of constants) with each class (respectively each
property) and we initialize their interpretations with those (pairs of) constants. Then, we
complete these interpretations by applying the positive inclusions in cl( ) as logical rules
in a bottom up manner until saturation. For instance, if A⊑∃P is in cl( ), from the
initial state where a is I(A), and (p_{1},p_{2}) is in I(P), we add a new constant p_{3} in the
domain of interpretation and the pair (a,p_{3}) in I(P). Now, if ∃P ⊑∃Q is also in cl( ), we
add two new constants p_{4} and p_{5} in the domain and the pairs (a,p_{4}) and (p_{1},p_{5}) to
I(Q).

Clearly, by construction, I is a model of each PI in . It is also a model of each NI X⊑¬Y in . Suppose that this not the case: there would exist a constant x which is in I(X) and in I(Y). By construction of I, it would exist a chain of positive inclusions in cl( ) between X and Y and thus X ⊑Y would be in cl( ), and therefore X ⊑¬X would be in cl( ) too, and in this case I(X) would be empty, which contradicts that I is not a model of X⊑¬Y.

To prove (1), if ⊧X ⊑¬X, in every model of , X must be empty, in particular in I. By construction of I, this means that X⊑¬X∈cl( ). Otherwise, consider a PI X⊑Y such that ⊧X ⊑Y. Since I is a model of , I(X) ⊆I(Y). By construction of I, this means that there exists a chain of positive inclusions in cl( ) between X and Y and thus X ⊑Y is in cl( ).

Consider now a NI X⊑¬Y such that neither X⊑¬Y nor Y ⊑¬X belong to cl( ). Let us define the interpretation J such that

- J(Z) = ∅ for each class or property Z appearing in the right-hand side of a NI in cl( ) of the form X⊑¬U or U ⊑¬X,
- J(A) = D (where D is the whole domain of interpretation) for the other classes, and J(P) = D×D for the other properties.

In particular J(X) = D (since X⊑¬X is not in cl( )), and J(Y) = D (since neither X⊑¬Y nor Y ⊑¬X belong to cl( )). Clearly, J is a model of , but it is not a model of X⊑¬Y. Therefore, ⊭X⊑¬Y. This ends the proof of (1).

For proving (the contraposite of) (2), we adapt the above canonical interpretation I by initializing with {(p,q),(p,r)} the interpretation of all the properties R such that neither (functR) nor ∃R⊑¬∃R belong to cl( ). And we show that the resulting interpretation I′ is a model of in which the constraints of functionality of such R is not satisfied. □

Finally, the last proposition establishes that checking consistency can be reduced to check whether the data in satisfy every NI in the closure.

Proposition 8.4.5 (Consistency checking using NI-closure) Let be a DL-LITE or a DL-LITETbox. Let be an Abox associated to . ⟨ ,⟩is unsatisfiable iff there exists a NI or a key constraint in the closure of which is violated by some facts of .

Proof (sketch). For every constant a appearing in the Abox , we define (a) as the set of facts extracted from as follows:

- if A(a) ∈, then A(a) is added in (a)
- if P(a,b) ∈, then (∃P)(a) is added in (a) and (∃P
^{-})(b) is added in (b)

We first show that if ⟨ ,⟩ is unsatisfiable, there exists a constant a such that ⟨ ,(a)⟩
is unsatisfiable. In fact we show the contrapositive: suppose that for every constant a,
⟨ ,(a)⟩ is satisfiable: for each a, there exists an interpretation I_{a} satisfying the inclusions
in and all the facts in (a). It is easy to show that the interpretation I defined on
the union of domains of interpretations as follows is a model of ⟨ ,⟩ (which is then
satisfiable):

- for every class or property X: I(X) = ⋃
_{ a}I_{a}(X) - for every constant a: I(a) = I
_{a}(a)

Then, since each (a) is a conjunction of facts of the form X(a), if ⟨ ,⟩ is unsatisfiable, there exists
a constant a such that ,X_{1}(a) ∧ ... ∧X_{n}(a) is unsatisfiable. Therefore, ,∃x(X_{1}(x) ∧ ... ∧X_{n}(x)) is
unsatisfiable. This entails: ⊧∀x(¬X_{1}(x) ∨ ... ∨¬X_{n}(x)), which in DL notation corresponds to:
⊧X_{1} ⊑¬X_{2} ⊔ ... ⊔¬X_{n}. Because of the form of the inclusion allowed in DL-LITE, there must exist i
such that ⊧X_{1} ⊑¬X_{i}. According to Proposition 8.4.4, this entails that the corresponding
NI X_{1} ⊑¬X_{i} is in the closure of and that violates it (since it includes X_{1}(a) and
X_{i}(a)).

Conversely, it is easy to show that if a NI in the closure of is violated by some facts in the Abox, then ⟨ ,⟩ is unsatisfiable. If it were not the case, since according Proposition 8.4.3 and cl( ) have the same models, there would be a model in which all the NIs in the closure of would be satisfied by the facts in . □

The second step of consistency checking, after the NI-closure is computed, does not require any further computation on the Tbox . This second step simply consists in evaluating against the Abox (seen as a relational database) a Boolean query corresponding to each negated NI in the NI-closure of the Tbox. If one of those Boolean queries is evaluated to true against , it means that some data in the Abox violates the corresponding NI, and therefore the knowledge base = ⟨ ,⟩ is inconsistent.

For example, consider the NI: ∃TeachesTo ⊑¬PhDStudent. Its corresponding FOL formula φ and its negation are:

Consider the evaluation of the q_{unsat} query against the Abox of Figure 8.1. It evaluates to true:
consider valuation ν with ν(x) = pierre, ν(y′) = paul and the facts

The transformation of NIs into Boolean queries that correspond to their negation is described in Definition 8.4.6.

Definition 8.4.6 (Transformation of NIs into Boolean queries) The transformation δ of NIs into Boolean queries corresponding to their negation is defined as follows:

This second step of consistency checking is summarized in the Consistent Algorithm
(Algorithm 2). In the algorithm, for each NI clause α, the query q_{unsat,α} is an SQL query
computing the Boolean conjunctive queries δ(α). Also, db() denotes the set in a relational
database.

Consistent( ,)

Input: a KB = ⟨ ,⟩

Output: true if is satisfiable, false otherwise

(1)q_{unsat} = ∅ (i.e., q_{unsat} is false)

(2)foreach α∈cln( ) let q_{unsat} = q_{unsat} ∪q_{unsat,α}(db())

(4)if q_{unsat} = ∅return true

(5)else return false

It is important to emphasize that this two-step approach for consistency checking does not require any inference on the data. The only inferences concern the Tbox and consist in computing the deductive closure of its axioms, from which the NI-closure (denoted cln( ) in the Algorithm) is extracted.

Consider the Abox ′ obtained from the inconsistent Abox in Figure 8.1 by deleting the fact PhDStudent(paul). The knowledge base made of the Abox ′ in Figure 8.8 and the Tbox in Figure 8.6 is consistent. (See Exercise 8.6.4.)

Subject | Predicate | Object | FOL semantics |

dupond | Leads | infoDept | Leads(dupond,infoDept) |

dupond | rdf:type | Professor | Professor(dupond) |

durand | ResponsibleOf | ue111 | ResponsibleOf(durand,ue111) |

durand | Leads | csDept | Leads(durand,csDept) |

paul | TeachesTo | pierre | TeachesTo(paul,pierre) |

pierre | EnrolledIn | infoDept | EnrolledIn(pierre, infodept) |

pierre | rdf:type | Undergrad | Undergrad(pierre) |

pierre | RegisteredTo | ue111 | Registered(pierre, ue111) |

ue111 | OfferedBy | infoDept | OfferedBy(ue111,infoDept) |

ue111 | rdf:type | CSCourse | CSCourse(ue111) |

jim | EnrolledIn | csDept | EnrolledIn(jim, csDept) |

csDept | rdf:type | TeachingDept | TeachingDept(csDept) |

In the previous section, the negative constraints played the main role. Once we know the knowledge base is consistent and move to query answering, the positive constraints take over.

Answering queries to a DL-LITE knowledge base is done in two steps. The first step is the query reformulation, which consists in translating the original query q into a set Q of queries. The second step consists in evaluating the queries in Q over the Abox (again seen as a relational database). The beauty of the approach is that this will provide the answer set. Of course, simply evaluating q over the Abox would possibly yield an incomplete answer. Completeness is achieved by the “reasoning” in the reformulation step. During this step, we access only the Tbox and not the data.

The query reformulation step is performed by the PerfectRef (Algorithm 3). It consists in reformulating the initial query by using the PIs in as rewriting rules. The intuition is that PIs are seen as logical rules that are applied in backward-chaining to query atoms in order to expand them (in a resolution style). In databases, this is called a chase.

The queries we consider, i.e., the conjunctive queries, consist of several atoms. In general, because of the existential variables, new variables are introduced in queries. So we could be lead to generate more and more queries with new variables. It turns out that we will be able to control this process and generate only a finite number of distinct queries. This is due to the limitations of the constraints allowed in the Tbox. As outlined in Section 8.4.4, as soon as we allow the combination of key constraints with inclusions of properties, we may generate an infinite number of non redundant queries.

Consider a PI rule α⇒β. Applicability of the rule to an atom of a query is defined by:

- It is applicable to an atom A(x) of a query if A occurs in β.
- It is applicable to an atom P(x
_{1},x_{2}) of a query if- α⇒β is a role inclusion assertion and P or P
^{-}occurs in β; - x
_{2}= _ and β is ∃P; - x
_{1}= _ and β is ∃P^{-}.

- α⇒β is a role inclusion assertion and P or P

As usual, _ denotes here an unbounded existential variable of a query.

The following definition defines the result gr(g,I) of the goal reduction of the atom g using the PI I, which is at the core of PerfectRef.

Definition 8.4.7 (Backward application of a PI to an atom) Let I be an inclusion assertion that is applicable to the atom g. Then, gr(g,I) is the atom defined as follows:

if | g = A(x) | and | I = A_{1} ⊑A, | then | gr(g,I) = A_{1}(x) |

if | g = A(x) | and | I = ∃P ⊑A, | then | gr(g,I) = P(x,_) |

if | g = A(x) | and | I = ∃P^{-}⊑A, | then | gr(g,I) = P(_,x) |

if | g = P(x,_) | and | I = A⊑∃P, | then | gr(g,I) = A(x) |

if | g = P(x,_) | and | I = ∃P_{1} ⊑∃P, | then | gr(g,I) = P_{1}(x,_) |

if | g = P(x,_) | and | I = ∃P_{1}^{-}⊑∃P, | then | gr(g,I) = P_{1}(_,x) |

if | g = P(_,x) | and | I = A⊑∃P^{-}, | then | gr(g,I) = A(x) |

if | g = P(_,x) | and | I = ∃P_{1} ⊑∃P^{-}, | then | gr(g,I) = P_{1}(x,_) |

if | g = P(_,x) | and | I = ∃P_{1}^{-}⊑∃P^{-}, | then | gr(g,I) = P_{1}(_,x) |

if | g = P(x_{1},x_{2}) | and | either I = P_{1} ⊑P or I = P_{1}^{-}⊑P^{-} | then | gr(g,I) = P_{1}(x_{1},x_{2}) |

if | g = P(x_{1},x_{2}) | and | either I = P_{1} ⊑P^{-}or I = P_{1}^{-}⊑P | then | gr(g,I) = P_{1}(x_{2},x_{1}) |

The subtle point of PerfectRef is the need of simplifying the produced reformulations, so that some PIs that were not applicable to a reformulation become applicable to its simplifications. A simplification amounts to unify two atoms of a reformulation using their most general unifier and then to switch the possibly new unbounded existential variables to the anonymous variable denoted _.

Let us illustrate the reformulation step of the following query using the PIs in the Tbox of Figure 8.6:

Figure 8.9 shows the result returned by PerfectRef (q(x), ).

Initial query | ||

q(x):- TeachesIn(x,y), RegisteredIn(z,y), Student(z) | ||

Reformulations | applied PI | Reformulated query |

q_{1}(x):- ResponsibleOf(x,y), RegisteredIn(z,y), Student(z) | ResponsibleOf ⊑TeachesIn | q(x) |

q_{2}(x):- TeachesIn(x,y), RegisteredIn(z,y), PhDStudent(z) | PhDStudent ⊑Student | q(x) |

q_{3}(x):- TeachesIn(x,y), RegisteredIn(z,y), TeachesTo(_,z) | ∃TeachesTo^{-}⊑Student | q(x) |

q_{4}(x):- TeachesIn(x,y), RegisteredIn(_,y) | ∃RegisteredIn ⊑Student | q(x) |

q_{5}(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z) | PhDStudent ⊑Student | q_{1}(x) |

q_{6}(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z) | ∃TeachesTo^{-}⊑Student | q_{1}(x) |

q_{7}(x):- ResponsibleOf(x,y), RegisteredIn(_,y), | ∃RegisteredIn ⊑Student | q_{1}(x) |

q_{8}(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z) | ResponsibleOf ⊑TeachesIn | q_{2}(x) |

q_{9}(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z) | ResponsibleOf ⊑TeachesIn | q_{3}(x) |

q_{10}(x):- ResponsibleOf(x,y), RegisteredIn(_,y) | ResponsibleOf ⊑TeachesIn | q_{4}(x) |

q_{11}(x):- TeachesIn(x,y), Course(y) | Course ⊑∃RegisteredIn^{-} | q_{4}(x) |

q_{12}(x):- TeachesIn(x,_) | ∃TeachesIn^{-}⊑Course | q_{11}(x) |

q_{13}(x):- ResponsibleOf(x,_) | ResponsibleOf ⊑TeachesIn | q_{12}(x) |

q_{14}(x):- Professor(x) | Professor ⊑∃TeachesIn | q_{12}(x) |

We detail here the inference chain leading to some reformulations that are particularly interesting for getting answers for q from the data in the Abox ′ of Figure 8.8. This also illustrates the need of the simplification step. The reformulation:

is obtained by:

- the backward application to the atom Student(z) of q(x) of the PI: ∃RegisteredIn ⊑
Student, which leads to the reformulation
- followed by a simplification step, consisting in unifying the two redundant atoms in the body of q′: the atom Registered(z,y) is kept instead of the atom Registered(z,_) because y is an existential variable which is bounded within the body of q′. But now, the existential variable z is unbounded within the body of q′: it is replaced by the anonymous variable _.

In turn, q_{4}(x) can be reformulated by the backward application of the PI Course ⊑∃RegisteredIn^{-}
to the atom RegisteredIn(_,y), which results in the reformulation q_{11}(x):

Then, the reformulation q_{12}(x) is produced by the backward application of the PI ∃
TeachesIn^{-}⊑Course, and the simplification by unification of the two atoms followed by the
replacement of the existential variable y, now unbounded, with the anonymous variable
_.

Finally, the reformulations q_{13}(x) and q_{14}(x) are obtained from the backward application of the
PIs ResponsibleOf ⊑TeachesIn and Professor ⊑∃TeachesIn respectively.

It is important to notice that the answers durand and dupond are obtained for the initial query q(x)
thanks to those reformulations q_{13}(x) and q_{14}(x): they would not be returned by the standard
evaluation of the query q(x) against the Abox ′ of Figure 8.8.

In the PerfectRef algorithm (Algorithm 3):

- The notation q[g/gr(g,I)] (Line 7) denotes the replacement of the atom g in the body of the query q with the result gr(g,I) of the backward application of the PI I to the atom g,
- The operator reduce(q,g,g′) (Line 10) denotes the simplification of the body of q obtained by replacing the conjunction of its two atoms g and g’ with their most general unifier (if g and g’ can be unified),
- The operator τ (Line 10) replaces in the body of a query all the possibly new unbounded existential variables with the anonymous variable denoted _.

PerfectRef (q, )

Input: a conjunctive query q and a Tbox

Output: a union of conjunctive queries: PR

(6)if a PI I ∈ is applicable to g

(9)if g_{1} et g_{2} sont unifiables

(10)PR := PR∪{τ(reduce(q,g_{1},g_{2}))}

(12)

Figure 8.9 shows the result returned by PerfectRef (q(x), ), where q(x) is the query of the
previous example, and is the Tbox of Figure 8.6. The second column makes explicit the PI used
for obtaining the corresponding reformulation. Note that equivalent reformulations can
be produced by different inferences, such as for example the reformulations q_{4}(x) and
q_{7}(x).

Although we will not prove it here, the following properties hold:

- soundness.
- All the facts computed using PerfectRef are correct query answers.
- completeness.
- All query answers are obtained.
- complexity.
- Since we touch the data only for the evaluation of FOL queries, the worst-case complexity is PTIME in the size of the Abox. The number of reformulations is PTIME in the size of the Tbox. Therefore, the complexity of evaluating a query against a DL-LITE or DL-LITE knowledge base is PTIME in the size of the knowledge base.

In this section, we exhibit an example showing that the interaction of key constraints (the specificity of DL-LITE) with inclusion constraints between properties (the specificity of DL-LITE) may lead to a reformulation of a query into an infinite number of conjunctive rewritings, each one likely to bring additional answers. This makes an algorithmic approach such as the one we used for DL-LITE and DL-LITE in isolation incomplete for query answering when DL-LITE and DL-LITE are combined together.

Consider a Tbox made of the following inclusion axioms, in which R and P are two properties and S is a class:

To see this, we observe that from the fact S(x_{1}) and the PI S ⊑∃R, it can be inferred that there
exists y such that R(x_{1},y) holds, and thus P(x_{1},y) holds too (since R⊑P). From the functionality
constraint on P and the conjunct P(x_{1},x) in the body of r_{1}, we can now infer that y = x, and
thus that R(x_{1},x) holds. Therefore, ∃zR(z,x) is logically entailed by ∃x_{1}S(x_{1}) ∧P(x_{1},x),
i.e., r_{1}(x) is contained in the query q(x), and thus is a valid reformulation of the query
q(x).

It turns out that the situation is even more subtle. Surprisingly, this reformulation r_{1}(x) is not the
only one. In fact there exists an infinite number of different reformulations for q(x). Let k ≥ 2. The
following query is a valid reformulation of q(x):

To show that r_{k}(x) is logically contained in q(x), we exploit again the axiom of functionality of P
and the inclusion axiom between R and P: from the fact S(x_{k}) and the PI S ⊑∃R, it can be inferred
that there exists y_{k} such that R(x_{k},y_{k}) holds, and thus P(x_{k},y_{k}) holds too (since R⊑P). Since P is
functional, we get: y_{k} = x_{k-1}, and thus R(x_{k},x_{k-1}) holds. Now, based on the PI ∃R^{-}⊑∃R, there
exists y_{k-1} such that R(x_{k-1},y_{k-1}) holds, and with the same reasoning as before, we get y_{k-1} = x_{k-2},
and thus R(x_{k-1},x_{k-2}) holds. By induction, we obtain that R(x_{1},x) holds, i.e., r_{k}(x) is logically
contained in the query q(x).

One can also show that for each k, there exists an Abox such that the reformulation r_{k} returns
answers that are not returned by the reformulation r_{k′} for k′<k. Thus, there exists an infinite
number of non redundant conjunctive reformulations.

It can be shown that if we combine key constraints and inclusions of properties in a restricted way, this problem can be avoided. For instance, if key constraints are forbidden on properties involved in right-hand side of an inclusion axiom, there is a finite number of non redundant conjunctive reformulations and they can be found by the PerfectRef algorithm.

The spreading of RDF data on the Web and the importance of queries for such
data is illustrated by the Billion Triple Track of the Semantic Web Challenge
^{1}. The
idea is to “do something” efficiently with one billion of RDFS triples.

RDFS. Reasoners for RDFS are available on line and can be downloaded, like for instance the Jena2 ontology API [104], which implements the forward-chaining algorithm we described. A SPARQL [175] engine included in the same programmatic Jena environment enables storing and querying data sets made of (asserted + inferred) triples. In fact, since RDFS statements are also stored as RDF triplets, SPARQL can also be used to query the schema, and not only the RDF data. For guaranteeing the completeness of the answers to a schema query, all the inference rules that we have given Figure 7.7 ( section describing RDFS) in Chapter 7 must be taken into account, and not only the subset that we considered in the forward-chaining algorithm we described. We mentioned the practical advantage of separating the computation over a Tbox from that over the Abox. This is useful also from a theoretical point of view. This gives a bound on the data complexity (the complexity in terms of the Abox only) of consistency checking and of query answering. We showed that they can be performed using FOL queries and it is known [159, 9] that evaluating FOL queries over a relational database is in LOGSPACE in the size of the database.

DL-lite. DL-LITE has been recently incorporated into the version OWL2 [176] of OWL as the profile called OWL2 QL. The proof that the PerfectRef Algorithm computes the whole answer is shown in [38]. It follows that the complexity of query answering by reformulation in these fragments of DL-LITE is polynomial in the size of the Tbox, and in LOGSPACE in the size of the Abox.

A major result in [38] is that DL-LITE and DL-LITE are two maximal fragments of the DL-Lite family supporting tractable query answering over large amounts of data. It has been shown in [38] that consistency checking and instance recognition (which a particular case of query answering), while being LOGSPACE both for DL-LITE and DL-LITE Tboxes, are PTIME-COMPLETE for the languages that combine the axioms of both (denoted DL-LITE). This complexity result shows it is unlikely that an approach based on query reformulation would provide a complete query answering algorithm for DL-LITE.

QuOnto ([13]) is a JAVA tool implementing the DL-Lite family of ontology representation languages. It permits the declaration of an ontology as a DL-LITE Tbox, the construction of an associated Abox that can be stored and as a MySQL database. The consistency checking of the resulting DL-LITE knowledge base, and query answering by reformulation are the core functionalities of QuOnto, based on the implementation in Java of the algorithms presented in this chapter.

Datalog^{+-}. Recent research [12, 11] has extended the Datalog database query language towards
query answering over ontologies. This has resulted in a unifying framework based on a family
of expressive extensions of Datalog, called Datalog^{+-}, that captures DL-LITE and
DL-LITE.

Exercise 8.6.1 With the University example, find a new query that has a different answer:

- on the RDF data vs. the RDF data together with the RDFS ontology.
- on the RDF data vs. the RDF data together with the DL-LITE ontology.

Exercise 8.6.2 Prove that the Saturation algorithm runs in 0((M×N)^{2}).

Exercise 8.6.3 Prove that the rules used for computing the TBox closure are sound.

Exercise 8.6.4 Consider Abox in Figure 8.1 and ′obtained by deleting the fact PhDStudent(paul), and the Tbox in Figure 8.6. Show that:

- The knowledge base ∪ is inconsistent.
- The knowledge base ′∪ is consistent.

Exercise 8.6.5 Give the FOL rule corresponding to the different cases of negative inclusion axioms.

^{1}http://challenge.semanticweb.org/