8 Querying Data through Ontologies

As we saw in the previous chapter, ontologies form the backbone of the Semantic Web by providing a conceptual view of data and services available worldwide via the Web. We discussed the RDF language for describing knowledge, and a family of languages, called description logics, that provide formal foundations for ontologies and in particular for the OWL ontology language recommended by the W3C.

In this chapter, the focus is on querying RDF data. Since massive volumes of RDF sources are more and more present on the Web, specifying queries for RDF data that can be evaluated efficiently is becoming every day more and more important.

We will see that the set of query answers strongly depends on the semantic context. We will study query answering when the ontology is specified in RDFS and then when the ontology is specified in DL-LITE. The ontology language DL-LITE belongs to the DL family. It has been designed as a trade-off between the ability to describe a wide range of domains of interest and query evaluation efficiency. More precisely, we will focus on two important fragments of DL-LITE, namely DL-LITE $R$ and DL-LITE $F$ .

We will observe that the problem of answering queries through ontologies is quite different from that of answering database queries for the following two essential reasons:

From this, it should be clear that query answering through ontologies is more complicated that in classical databases. We have to reason to find which inferences may participate in answering a given query. We also have to reason to verify the consistency of our knowledge.

8.2 Querying RDF data: notation and semantics

In this section, we set the stage for querying RDF data. We also discuss the impact of ontologies (knowledge on the domain of interest) on the answers. To simplify, we ignore here blank nodes.

Figure 8.1 is an enhanced version of the University example that we will use throughout this chapter. The first column provides RDF data in the triple syntax, while the second column shows the corresponding facts in FOL.

Subject	Predicate	Object	FOL semantics

dupond	Leads	infoDept	Leads(dupond,infoDept)
dupond	rdf:type	Professor	Professor(dupond)
durand	ResponsibleOf	ue111	ResponsibleOf(durand,ue111)
durand	Leads	csDept	Leads(durand,csDept)
paul	TeachesTo	pierre	TeachesTo(paul,pierre)
paul	rdf:type	PhdStudent	PhDStudent(paul)
paul	EnrolledIn	infoDept	EnrolledIn(paul, infodept)
pierre	EnrolledIn	infoDept	EnrolledIn(pierre, infodept)
pierre	rdf:type	Undergrad	Undergrad(pierre)
pierre	RegisteredTo	ue111	Registered(pierre, ue111)
ue111	OfferedBy	infoDept	OfferedBy(ue111,infoDept)
ue111	rdf:type	CSCourse	CSCourse(ue111)
jim	EnrolledIn	csDept	EnrolledIn(jim, csDept)
csDept	rdf:type	TeachingDept	TeachingDept(csDept)

Figure 8.1:

RDF triple syntax and its FOL semantics

RDF triples can be asserted in a very flexible way and almost without constraints. The association of some ontology is not a requirement. Users can update a collection of RDF statements freely by just adding/removing triples. The only reserved word in the RDF vocabulary is rdf:type that is used to relate constant names to types, i.e., classes in domain of interest or unary predicates in FOL world.

Let us now consider querying a set of RDF facts, for which the query language SPARQL has been proposed. We briefly consider it next. query language. We briefly consider it next.

SPARQL (pronounced “sparkle”) is a recursive acronym standing for SPARQL Protocol And RDF Query Language. It is a W3C recommendation as of 2008. Although it does borrow some features from XQuery (functions and operators), it is based on the graph model underlying RDF data.

For instance, the following query expresses in SPARQL the search of all the individuals who are enrolled in a department led by a Professor.

We used here an SQL-like syntax. There exists competing syntaxes for expressing SPARQL queries. The corresponding query in FOL notation is:

This is a conjunctive query, i.e., a FOL formula without negation or disjunction, of the form

In the remainder of this chapter, we use conjunctive queries as the query language for RDF. From the example, it should be clear that all we say is relevant to SPARQL. We use a (standard) simplified notation for conjunctive queries. We omit the existential quantifiers and denote the connector ∧ by a comma. Observe that this does not introduce any ambiguity. In particular, all variables not occurring in the “head of the query” (i.e., in q(x₁,...,x_m)) are understood as existentially quantified. The variables in x₁,...,x_m are said to be distinguished.

An essential issue is in the meaning of “holds” in the previous informal definition.

Inference with data only. In the simplest case, a fact R_i(ν(u_i)) holds if it is a known fact (explicitly stated in the data store). For instance, consider the previous conjunctive query in the University example. The evaluation of the query q(x) against the facts of Figure 8.1 returns {paul, pierre} as its answer set. To see why paul is an answer, we just check that, by mapping the distinguished variable to the constant paul, and the existential variables y and z respectively to the constants infoDept and dupond, all the conjuncts in the query definition are satisfied by facts in the database. The same holds if the distinguished variable x is instantiated with the constant pierre.

More inference using an ontology. Now, let us assume that a fact R_i(ν(u_i)) holds if it is a known fact or if it is a consequence of the known facts by taking the ontological statements into account. Suppose now that we also have the knowledge that someone responsible for a class has to be a professor, that is in DL syntax:

This logical implication indeed allows inferring the fact Professor(durand) from the fact ResponsibleOf(durand, ue111).

More subtly, we can get answers from partially instantiated facts that can be logically entailed by the knowledge base. Suppose that we know that a professor teaches at least one course, that is in DL syntax:

From the explicit ground fact Professor(durand) and the contraint Professor ⊑∃TeachesIn, we know that TeachesIn(durand,v) holds for some unknown value v. The valuation of v may vary in the different “worlds” satisfying the constraint. This is however sufficient to infer that answer q(durand) is true in all these possible worlds.

Formal definition of answer set. Recall that φ⊧ψ (i.e., φimplies ψ) if each interpretation making φ true also makes ψ true, or equivalently, every model of φ is a model of ψ. We next provide a formal definition of the answer set of a query for a DL knowledge base, that captures the general setting where data (the Abox

) is associated to an ontology (the Tbox

) to form a DL knowledge base

∪

. A query to the knowledge base is a conjunctive query using class or property predicates from the given knowledge base with the proper arity. (Class predicates have arity one and property predicates arity 2.) A valuation ν of a set of variables {z₁,...,z_p} is a substitution (denoted {z₁/a₁,...,z_p/a_p}) that assigns each variable z_i to a constant a_i (two distinct variables may be assigned to a same constant). Given two valuations ν and ν′ of two disjoint sets of variables {z₁,...,z_p} and {v₁,...,v_k} , ν∘ν′ denotes the valuation assigning the variables z_i to the corresponding constants in ν, and the variables v_j to the corresponding constants in ν′.

Definition 8.2.1 Let q(x₁,...,x_m) : - R₁(u₁),...,R_p(u_p) be a query to a knowledge base . An answer is a ground fact q(ν(x₁),...,ν(x_m)) for some valuation ν of the distinguished variables such that in every model of there exists a valuation ν′of the existential variables for which R_i(ν∘ν′(u_i)) is true for each i. The answer set of qfor is the set of all such answers. It is denoted q().

Consider again the previous University query example. We have seen that its answer set varies depending on the knowledge base against which it is evaluated. In particular, if

is the set of facts of Figure 8.1 and

is {∃ResponsibleOf ⊑Professor}, we have:

Boolean queries. To conclude this section, we consider a particular interesting case of queries, that of Boolean queries. The arity of a query is the number of its distinguished variables. A query of arity 0, i.e., a query of the form q() : ..., is called a Boolean query. Note that there is a single possible answer, namely q(). In this case, we see that as a positive answer to the query, i.e., as true. If the answer set is empty, we see that as false. To see an example, consider the query

This Boolean query asks whether there exists a student teaching to other students. Suppose

′ = {PhDStudent ⊑Student}. Then we can distinguish two cases:

In the second case, the fact Student(paul), although not in the Abox

, can be inferred from the fact PhDStudent(paul) in

and the inclusion statement PhDStudent ⊑Student in

′. Together with the fact TeachesTo(paul,pierre) present in the Abox, it makes the body of the query q′ satisfied.

8.3 Querying through RDFS ontologies

In this section, we consider RDF data without blank nodes (that can be seen as an Abox), associated to an RDFS ontology (that can be seen as a very simple Tbox). RDF data and RDFS statements can be denoted and stored as triples. However, the important point is that RDFS statements have a logical semantics which can be operationalized as a set of inference rules (see Section 7.3 in Chapter 7). We illustrate here how this can be used to answer queries.

Figure 8.2 is an example of an RDFS ontology that can be associated to the RDF data in Figure 8.1. The RDFS statements composing the ontology are given in three notations: the triple notation, the DL notation, and the corresponding FOL notation.

RDFS notation	DL notation	FOL notation

⟨AcademicStaff rdfs:subClassOfStaff ⟩	AcademicStaff ⊑Staff	AcademicStaff (X) ⇒Staff (X)
⟨Professor rdfs:subClassOfAcademicStaff ⟩	Professor ⊑AcademicStaff	Professor(X) ⇒AcademicStaff (X)
⟨Lecturer rdfs:subClassOfAcademicStaff ⟩	Lecturer ⊑AcademicStaff	Lecturer(X) ⇒AcademicStaff (X)
⟨PhDStudent rdfs:subClassOfLecturer⟩	PhDStudent ⊑Lecturer	PhDStudent(X) ⇒Lecturer(X)
⟨PhDStudent rdfs:subClassOfStudent⟩	PhDStudent ⊑Student	PhDStudent(X) ⇒Student(X)
⟨TeachesIn rdfs:domainAcademicStaff ⟩	∃TeachesIn ⊑AcademicStaff	TeachesIn(X,Y) ⇒AcademicStaff (X)
⟨TeachesIn rdfs:rangeCourse⟩	∃TeachesIn^-⊑Course	TeachesIn(X,Y) ⇒Course(Y)
⟨ResponsibleOf rdfs:domainProfessor⟩	∃ResponsibleOf ⊑Professor	ResponsibleOf (X,Y) ⇒Professor(X)
⟨ResponsibleOf rdfs:rangeCourse⟩	∃ResponsibleOf ^-⊑Course	ResponsibleOf (X,Y) ⇒Course(Y)
⟨TeachesTo rdfs:domainAcademicStaff ⟩	∃TeachesTo ⊑AcademicStaff	TeachesTo(X,Y) ⇒AcademicStaff (X)
⟨TeachesTo rdfs:rangeStudent⟩	∃TeachesTo^-⊑Student	TeachesTo(X,Y) ⇒Student(Y)
⟨Leads rdfs:domainAdminStaff ⟩	∃Leads ⊑AdminStaff	Leads(X,Y) ⇒AdminStaff (X)
⟨Leads rdfs:rangeDept⟩	∃Leads^-⊑Dept	Leads(X,Y) ⇒Dept(Y)
⟨RegisteredIn rdfs:domainStudent⟩	∃RegisteredIn ⊑Student	RegisteredIn(X,Y) ⇒Student(X)
⟨RegisteredIn rdfs:rangeCourse⟩	∃RegisteredIn^-⊑Course	RegisteredIn(X,Y) ⇒Course(Y)
⟨ResponsibleOf rdfs:subPropertyOfTeachesIn⟩	ResponsibleOf ⊑TeachesIn	ResponsibleOf (X,Y) ⇒TeachesIn(X,Y)

Figure 8.2:

An RDFS ontology expressed in different notations

As already said, these RDFS statements can be used to infer new triples (i.e., new facts) from the RDF database. For example, the RDF triple ⟨durandResponsibleOf ue111⟩ in Figure 8.1 corresponds to the fact ResponsibleOf (durand,ue111), and the RDFS statement ⟨ResponsibleOf rdfs:domainProfessor⟩ corresponds to the logical rule: ResponsibleOf (X,Y) ⇒Professor(X). The condition of this rule can be mapped with the fact ResponsibleOf (durand,ue111) by the substitution {X/durand, Y/ue111}, and thus the corresponding instantiation of the conclusion Professor(durand) can be inferred. This new fact can in turn trigger a rule such as Professor(X) ⇒AcademicStaff (X), thereby allowing the inference of additional facts such as AcademicStaff (durand).

More generally, RDFS statements correspond to rules that can be applied in a forward-chaining manner to the initial set of facts until saturation, i.e., until no more fact can be inferred. It is important to see that the variables in the head of rule all occur in the body. In other words, no variable is quantified existentially. So rules always infer new ground facts. Such rules are said to be safe. We will use unsafe rules when we consider DL-LITE, which will render query processing more complicated.

The simple forward-chaining Algorithm 1 starts with the set of initial facts and repeats inference steps until saturation.

Saturation(

)

Input: An Abox and an RDFS Tbox

Output: The set of facts that are inferred: Δ₀

(1)F ←

(2)Δ₀ ←

(3)repeat Δ₁ ←∅

(4) foreach rule condition ⇒conclusion in ,

(5) if there exists a substitution σ such that σ.condition ∈ Δ₀

(6) and σ.conclusion∉F

(7) add σ.conclusion to Δ₁

(8) F ←F ∪ Δ₁

(9) Δ₀ ← Δ₁

(10)until Δ₁ = ∅

Algorithm 1:

The Saturation algorithm

Figure 8.3 shows the facts resulting from the application of Algorithm 1 to the facts of Figure 8.1 and the rules of Figure 8.2.

Asserted facts	Inferred facts

Leads(dupond,infoDept)	AdminStaff(dupond)
Professor(dupond)	Dept(infoDept)
ResponsibleOf(durand,ue111)	AcademicStaff(dupond)
Leads(durand,csDept)	Professor(durand)
TeachesTo(paul,pierre)	Course(ue111)
PhDStudent(paul)	AcademicStaff(durand)
EnrolledIn(paul, infodept)	AdminStaff(durand)
EnrolledIn(pierre, infodept)	Dept(csDept)
Undergrad(pierre)	AcademicStaff(paul)
Registered(pierre, ue111)	Student(pierre)
OfferedBy(ue111,infoDept)	Student(paul)
CSCourse(ue111)	Student(pierre)
EnrolledIn(jim, csDept)	Lecturer(paul)
TeachingDept(csDept)	AcademicStaff(paul)
	Staff(paul)
	Staff(dupond)
	Staff(durand)

Figure 8.3:

Inferred facts from RDF facts and an associated RDFS ontology

To answer queries from RDF facts associated to an RDFS ontology, one can proceed as follows. First one compute all the inferred facts (in a bottom-up manner) with the previous algorithm. Each step of the loop can be computed, for instance, using a standard relational query engine. This yields a new database consisting of the set of all the facts (asserted or inferred). Then one can evaluate the query directly on that database using a standard relational query engine.

For example, the standard evaluation against the set of (asserted + inferred) facts in Figure 8.3 of the query

Complexity analysis. It is interesting to estimate both the maximum number of inferred triples and the worst-case time complexity for inferring them. Of course, this depends on the number of asserted triples (i.e., the size of the data) and also on the number of axioms in the ontology (i.e., the size of the ontology). Let M be the number of facts in the Abox and N the number of axioms in the Tbox. From the presence of some initial fact C(a), one can derive a number of new facts C′(a) for some class C′. Note that the number of such C′(a) is bounded by the number of axioms in the ontology, i.e., it is less than N. Now consider some initial fact R(a,b). From it, one can derive some facts R′(a,b) or R′(b,a) as well as some facts C′(a) or C′(b) for some R′ and C′. Again, one can observe that for a particular R(a,b), the number of new facts one can derive is bounded by the number of axioms in the ontology, i.e., it is less than N. Since the number of initial facts is M, the number of facts one can derive is bounded by M×N. Observe in particular that it is linear in the number of database facts.

Now consider the worst-case time complexity for inferring them by the Algorithm 1. We have to perform at most M×N iterations. Each iteration can be performed in polynomial time. So the algorithm is in PTIME. One can show more precisely that it is in 0((M×N)²).

8.4 Answering queries through DL-LITE ontologies

In this section, we consider two important fragments of the DL-LITE ontology language of the DL family. As we will see, querying is feasible for these two languages even though they provide a quite rich framework for describing semantics. We study querying through ontologies expressed in these two fragments.

8.4.1 DL-LITE

A DL-LITE ontology may contain axioms corresponding (up to the syntax) to those allowed in an RDFS ontology. Besides, it may contain other axioms, of three kinds: positive inclusions (PI for short), negative inclusions (NI) and key constraints (Key). Figure 8.4 shows examples of these three kinds of DL-LITE axioms with their corresponding FOL semantics. These constraints are not expressible in RDFS.

	DL notation	Corresponding logical rule

PI	Professor ⊑∃TeachesIn	Professor(X) ⇒∃YTeachesIn(X,Y)
	Course ⊑∃RegisteredIn^-	Course(X) ⇒∃YRegisteredIn(Y,X)

NI	Student ⊑¬Staff	Student(X) ⇒¬Staff (X)

Key	(functResponsibleOf ^-)	ResponsibleOf (Y,X) ∧ResponsibleOf (Z,X) ⇒Y = Z

Figure 8.4:

Examples of DL-LITE axioms not expressible in RDFS

Positive inclusion and incompleteness. A positive inclusion axiom is an expression of one of the following forms:

where P and Q denote properties and B denotes a class. Recall that P^- denotes the inverse of P, i.e., P^-(x,y) iff P(y,x) for all x,y.

Observe that expressions of the form ∃P ⊑B belong to DL-LITE since they already are in RDFS. Expressions of the form P ⊑Q (so equivalently P^-⊑Q^-) also belong to DL-LITE for the same reason.

It is important to note that the logical rules corresponding to PI axioms expressible in DL-LITE are not necessarily safe (as opposed to RDFS that uses only safe rules.) Consider the rule

Negative inclusion and inconsistencies. A negative inclusion axioms is an expression that takes one of the forms:

The corresponding logic rules are left as an exercise. An example of NI (expressing the constraint “Students do not teach courses”) and the corresponding logical rule are as follows:

NIs express disjointness constraints between classes or between properties, and thus introduce negation in the language. Therefore, the knowledge base against which the queries have to be evaluated may be inconsistent, i.e., a model of the corresponding theory may not exist. Note that this is not possible with RDFS ontologies: we showed an algorithm that computed a model (indeed the smallest model).

For example, adding the NI Student ⊑¬Staff to the ontology of Figure 8.2 leads to the inconsistency of the knowledge base made of the facts in Figure 8.1 and of the axioms in the ontology of Figure 8.2 enriched with that NI. The reason is that from the fact PhDStudent(paul), and the inclusion axiom PhdStudent ⊑Student, we can infer the fact Student(paul), and in turn the literal ¬Staff (paul) from the NI Student ⊑¬Staff . On the other hand, the fact Staff (paul) can be inferred from the fact PhDStudent(paul) and the inclusion axioms PhdStudent ⊑Lecturer, Lecturer ⊑AcademicStaff and AcademicStaff ⊑Staff .

Key constraints and more inconsistencies. Key constraints are expressed by functionality axioms of the form (functP) or (functP^-) where P is a property and P^- denotes the inverse property of P. Figure 8.5 shows their logical semantics in the form of logical rules.

DL notation	corresponding logical rule

(functP)	P(X,Y) ∧P(X,Z) ⇒Y = Z
(functP^-)	P(Y,X) ∧P(Z,X) ⇒Y = Z

Figure 8.5:

Functionality axioms expressible in DL-LITE and not in RDFS

Observe that key constraints may also lead to inconsistencies. This is the case if we attempt to equate two distinct constants, e.g., durand and dupond. For instance, the axiom (functResponsibleOf ^-) expresses that a course must have a unique professor responsible for it. Therefore, a knowledge base containing this axiom and the two facts:

would be inconsistent. This is because we assume implicitly that an individual is denoted by a single constant. This natural (in practice) assumption is called in logic the unique name assumption.

One may wonder why one would choose such a convoluted language. Why not simply extend RDFS with the 3 kinds of axioms? This is because functional constraints interact with inclusion constraints in intricate ways. Query evaluation when they are all present is much more complex. This will be illustrated by an example in Section 8.4.4.

From the previous discussion, there are two fundamental differences between query answering in the context of RDFS and of DL-LITE knowledge bases:

In Section 8.4.2, we show an algorithm for checking consistency of a DL-LITE knowledge base, and in Section 8.4.3 an algorithm for answering conjunctive queries posed to a DL-LITE knowledge base. The particularity of these two algorithms is that they work in two-steps:

Separating ontology reasoning from data processing is typically a desired feature (when possible). In particular, such an approach has the practical interest that it makes it possible to use an SQL engine for the second step, thus taking advantage of well-established query optimization strategies supported by standard relational data management systems. In the first step, we deal with the Tbox only, typically of much smaller size.

In Section 8.4.4, we show by an example that DL-LITE $R$ and DL-LITE $F$ are two maximal fragments of the DL-Lite family for which reformulating queries into SQL is possible: combining constraints expressible in DL-LITE $R$ and DL-LITE $F$ may result in an infinite number of non redundant SQL reformulations for some queries.

8.4.2 Consistency checking

Towards consistency checking, the first step uses the Tbox alone. It consists in computing the deductive closure of the Tbox, i.e., all the inclusion axioms that are logically entailed by the axioms declared in the Tbox. More precisely, the deductive closure (closure for short) of a DL-LITE Tbox is defined as follows.

Definition 8.4.1 (Closure of a Tbox) Let be a DL-LITE $F$ or a DL-LITE $R$ Tbox. The closure of , denoted by cl( ), is inductively defined as follows:

All the statements in are also in cl( ).
If B₁ ⊑B₂ and B₂ ⊑B₃ are in cl( ), then B₁ ⊑B₃ is in cl( ).
If R₁ ⊑R₂ and ∃R₂ ⊑B are in cl( ), then ∃R₁ ⊑B is in cl( ).
If R₁ ⊑R₂ and ∃R₂^-⊑B are in cl( ), then ∃R₁^-⊑B is in cl( ).
If R₁ ⊑R₂ and R₂ ⊑R₃ are in cl( ), then R₁ ⊑R₃ is in cl( ).
If R₁ ⊑R₂ is in cl( ), then R₁^-⊑R₂^-is in cl( ).
If B₁ ⊑B₂ and B₂ ⊑¬B₃ (or B₃ ⊑¬B₂) are in cl( ), then B₁ ⊑¬B₃ is in cl( ).
If R₁ ⊑R₂ and ∃R₂ ⊑¬B (or B⊑¬∃R₂) are in cl( ), then ∃R₁ ⊑¬B is in cl( ).
If R₁ ⊑R₂ and ∃R₂^-⊑¬B (or B⊑¬∃R₂^-) are in cl( ), then ∃R₁^-⊑¬B is in cl( ).
If R₁ ⊑R₂ and R₂ ⊑¬R₃ (or R₃ ⊑¬R₂) are in cl( ), then R₁ ⊑¬R₃ is in cl( ).
If R₁ ⊑¬R₂ or R₂ ⊑¬R₁ is in cl( ), then R₁^-⊑¬R₂^-is in cl( ).
1. In the case in which is a DL-LITE $F$ Tbox, if one of the statements ∃R ⊑¬∃R or ∃R^-⊑¬∃R^-is in cl( ), then both such statements are in cl( ).
2. In the case in which is a DL-LITE $R$ Tbox, if one of the statements ∃R ⊑¬∃R, ∃R^-⊑¬∃R^-, or R⊑¬Ris in cl( ), then all three such statements are in cl( ).

Observe that although all axioms should be considered to construct this closure, only negative inclusions and key constraints can raise an inconsistency. The set of all the negative inclusions and key constraints in cl(

) is called the NI-closure. For example, consider the Tbox of Figure 8.6 made of the RDFS ontology shown in Figure 8.2 enriched with the PIs and NI shown in Figure 8.4. The NI-closure of that Tbox is shown in Figure 8.7.

DL notation	FOL notation

AcademicStaff ⊑Staff	AcademicStaff (X) ⇒Staff (X)
Professor ⊑AcademicStaff	Professor(X) ⇒AcademicStaff (X)
Lecturer ⊑AcademicStaff	Lecturer(X) ⇒AcademicStaff (X)
PhDStudent ⊑Lecturer	PhDStudent(X) ⇒Lecturer(X)
PhDStudent ⊑Student	PhDStudent(X) ⇒Student(X)
∃TeachesIn ⊑AcademicStaff	TeachesIn(X,Y) ⇒AcademicStaff (X)
∃TeachesIn^-⊑Course	TeachesIn(X,Y) ⇒Course(Y)
∃ResponsibleOf ⊑Professor	ResponsibleOf (X,Y) ⇒Professor(X)
∃ResponsibleOf ^-⊑Course	ResponsibleOf (X,Y) ⇒Course(Y)
∃TeachesTo ⊑AcademicStaff	TeachesTo(X,Y) ⇒AcademicStaff (X)
∃TeachesTo^-⊑Student	TeachesTo(X,Y) ⇒Student(Y)
∃Leads ⊑AdminStaff	Leads(X,Y) ⇒AdminStaff (X)
∃Leads^-⊑Dept	Leads(X,Y) ⇒Dept(Y)
∃RegisteredIn ⊑Student	RegisteredIn(X,Y) ⇒Student(X)
∃RegisteredIn^-⊑Course	RegisteredIn(X,Y) ⇒Course(Y)
ResponsibleOf ⊑TeachesIn	ResponsibleOf (X,Y) ⇒TeachesIn(X,Y)
Professor ⊑∃TeachesIn	Professor(X) ⇒∃YTeachesIn(X,Y)
Course ⊑∃RegisteredIn^-	Course(X) ⇒∃YRegisteredIn(Y,X)

Student ⊑¬Staff	Student(X) ⇒¬Staff (X)

Figure 8.6:

A DL-LITE Tbox

This example shows that it is possible to infer an important number of new NIs. In fact, we have to compute all the consequences. But as we will see there is at most a polynomial number of consequences.

DL notation	FOL notation

Student ⊑¬Staff	Student(X) ⇒¬Staff (X)
PhDStudent ⊑¬Staff	PhDStudent(X) ⇒¬Staff (X)
∃TeachesTo^-⊑¬Staff	TeachesTo(Y,X) ⇒¬Staff (X)
∃RegisteredIn ⊑¬Staff	RegisteredIn(X,Y) ⇒¬Staff (X)
Lecturer ⊑¬Student	Lecturer(X) ⇒¬Student(X)
Lecturer ⊑¬PhDStudent	Lecturer(X) ⇒¬PhDStudent(X)
Lecturer ⊑¬∃TeachesTo^-	Lecturer(X) ⇒¬∃Y[TeachesTo(Y,X)]
Lecturer ⊑¬∃RegisteredIn	Lecturer(X) ⇒¬∃Y[RegisteredIn(X,Y)]
Professor ⊑¬Student	Professor(X) ⇒¬Student(X)
Professor ⊑¬PhDStudent	Professor(X) ⇒¬PhDStudent(X)
Professor ⊑¬∃TeachesTo^-	Professor(X) ⇒¬∃Y[TeachesTo(Y,X)]
Professor ⊑¬∃RegisteredIn	Professor(X) ⇒¬∃Y[RegisteredIn(X,Y)]
AcademicStaff ⊑¬Student	AcademicStaff (X) ⇒¬Student(X)
AcademicStaff ⊑¬PhDStudent	AcademicStaff (X) ⇒¬PhDStudent(X)
AcademicStaff ⊑¬∃TeachesTo^-	AcademicStaff (X) ⇒¬∃Y[TeachesTo(Y,X)]
AcademicStaff ⊑¬∃RegisteredIn	AcademicStaff (X) ⇒¬∃Y[RegisteredIn(X,Y)]
Staff ⊑¬Student	Staff (X) ⇒¬Student(X)
Staff ⊑¬PhDStudent	Staff (X) ⇒¬PhDStudent(X)
Staff ⊑¬∃TeachesTo^-	Staff (X) ⇒¬∃Y[TeachesTo(Y,X)]
Staff ⊑¬∃RegisteredIn	Staff (X) ⇒¬∃Y[RegisteredIn(X,Y)]
∃TeachesTo ⊑¬Student	TeachesTo(X,Y) ⇒¬Student(X)
∃TeachesTo ⊑¬PhDStudent	TeachesTo(X,Y) ⇒¬PhDStudent(X)
∃TeachesTo ⊑¬∃TeachesTo^-	TeachesTo(X,Y) ⇒¬∃Z[TeachesTo(Z,X)]
∃TeachesTo ⊑¬∃RegisteredIn	TeachesTo(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)]
∃TeachesIn ⊑¬Student	TeachesIn(X,Y) ⇒¬Student(X)
∃TeachesIn ⊑¬PhDStudent	TeachesIn(X,Y) ⇒¬PhDStudent(X)
∃TeachesIn ⊑¬∃TeachesTo^-	TeachesIn(X,Y) ⇒¬∃Z[TeachesTo(Z,X)]
∃TeachesIn ⊑¬∃RegisteredIn	TeachesIn(X,Y) ⇒¬∃Z[RegisteredIn(X,Z)]

Figure 8.7:

The NI-closure of the Tbox in Figure 8.6

We use three propositions for analyzing the consistency problem: one for evaluating the complexity of evaluating the closure and the last two for showing the logical soundness and completeness of this closure. Finally, a fourth proposition will show how to use these results for data consistency checking.

Proof (sketch). (1.) Follows from the form of the statements that are allowed in a DL-LITE $F$ or a DL-LITE $R$ Tbox.

For (2.), consider the items (2.) to (12.) in Definition 8.4.1. These are closure rules that are exhaustively applied to the Tbox until saturation. Let

₀ =

. For each i, let Δ_i be the set of statements that can be derived from

_i directly using the closure rules (2.) to (12.). Let

_i+1 =

_i ∪ Δ_i. Clearly, for each i,

_i ⊆cl(

), so its size is polynomial in the size of

Now since the size of

_i is polynomial in the size of

, each step of the computation can clearly be performed in PTIME. Since the number of steps is less than the number of statements in cl(

), the entire computation can be performed in PTIME. □

Proof (sketch). cl(

)⊧

, since

is included in cl(

). Clearly, the application of each closure rule is sound. So for each i,

_i⊧

_i+1. By induction,

₀⊧

_i for each i. Thus

⊧cl(

), so

≡cl(

). □

The next proposition establishes the completeness of the closure of a Tbox

: cl(

) contains all the PIs, NIs and key constraints that are logically entailed by that Tbox (up to equivalence).

Proposition 8.4.4 (Completeness of the closure of a Tbox) Let be a DL-LITE $F$ or a DL-LITE $R$ Tbox. Then

Let X⊑Y be a NI or a PI. If ⊧X⊑¬X
then X⊑¬X∈cl( ),
otherwise ⊧X⊑Y iff X⊑Y ∈cl( ) or ¬Y ⊑¬X∈cl( ).
⊧(funct R) iff (funct R) ∈cl( ) or ∃R⊑¬∃R∈cl( ).

Proof (sketch). We build a canonical interpretation I of the classes and properties appearing in

as follows: for each X such that X⊑¬X∈cl(

), I(X) = ∅; for the other classes or properties, we associate a constant (respectively a pair of constants) with each class (respectively each property) and we initialize their interpretations with those (pairs of) constants. Then, we complete these interpretations by applying the positive inclusions in cl(

) as logical rules in a bottom up manner until saturation. For instance, if A⊑∃P is in cl(

), from the initial state where a is I(A), and (p₁,p₂) is in I(P), we add a new constant p₃ in the domain of interpretation and the pair (a,p₃) in I(P). Now, if ∃P ⊑∃Q is also in cl(

), we add two new constants p₄ and p₅ in the domain and the pairs (a,p₄) and (p₁,p₅) to I(Q).

Clearly, by construction, I is a model of each PI in

. It is also a model of each NI X⊑¬Y in

. Suppose that this not the case: there would exist a constant x which is in I(X) and in I(Y). By construction of I, it would exist a chain of positive inclusions in cl(

) between X and Y and thus X ⊑Y would be in cl(

), and therefore X ⊑¬X would be in cl(

) too, and in this case I(X) would be empty, which contradicts that I is not a model of X⊑¬Y.

To prove (1), if

⊧X ⊑¬X, in every model of

, X must be empty, in particular in I. By construction of I, this means that X⊑¬X∈cl(

). Otherwise, consider a PI X⊑Y such that

⊧X ⊑Y. Since I is a model of

, I(X) ⊆I(Y). By construction of I, this means that there exists a chain of positive inclusions in cl(

) between X and Y and thus X ⊑Y is in cl(

Consider now a NI X⊑¬Y such that neither X⊑¬Y nor Y ⊑¬X belong to cl(

). Let us define the interpretation J such that

In particular J(X) = D (since X⊑¬X is not in cl(

)), and J(Y) = D (since neither X⊑¬Y nor Y ⊑¬X belong to cl(

)). Clearly, J is a model of

, but it is not a model of X⊑¬Y. Therefore,

⊭X⊑¬Y. This ends the proof of (1).

For proving (the contraposite of) (2), we adapt the above canonical interpretation I by initializing with {(p,q),(p,r)} the interpretation of all the properties R such that neither (functR) nor ∃R⊑¬∃R belong to cl(

). And we show that the resulting interpretation I′ is a model of

in which the constraints of functionality of such R is not satisfied. □

Finally, the last proposition establishes that checking consistency can be reduced to check whether the data in

satisfy every NI in the closure.

Proof (sketch). For every constant a appearing in the Abox

, we define

(a) as the set of facts extracted from

as follows:

We first show that if ⟨

⟩ is unsatisfiable, there exists a constant a such that ⟨

(a)⟩ is unsatisfiable. In fact we show the contrapositive: suppose that for every constant a, ⟨

(a)⟩ is satisfiable: for each a, there exists an interpretation I_a satisfying the inclusions in

and all the facts in

(a). It is easy to show that the interpretation I defined on the union of domains of interpretations as follows is a model of ⟨

⟩ (which is then satisfiable):

Then, since each

(a) is a conjunction of facts of the form X(a), if ⟨

⟩ is unsatisfiable, there exists a constant a such that

,X₁(a) ∧ ... ∧X_n(a) is unsatisfiable. Therefore,

,∃x(X₁(x) ∧ ... ∧X_n(x)) is unsatisfiable. This entails:

⊧∀x(¬X₁(x) ∨ ... ∨¬X_n(x)), which in DL notation corresponds to:

⊧X₁ ⊑¬X₂ ⊔ ... ⊔¬X_n. Because of the form of the inclusion allowed in DL-LITE, there must exist i such that

⊧X₁ ⊑¬X_i. According to Proposition 8.4.4, this entails that the corresponding NI X₁ ⊑¬X_i is in the closure of

and that

violates it (since it includes X₁(a) and X_i(a)).

Conversely, it is easy to show that if a NI in the closure of

is violated by some facts in the Abox, then ⟨

⟩ is unsatisfiable. If it were not the case, since according Proposition 8.4.3

and cl(

) have the same models, there would be a model in which all the NIs in the closure of

would be satisfied by the facts in

. □

The second step of consistency checking, after the NI-closure is computed, does not require any further computation on the Tbox

. This second step simply consists in evaluating against the Abox

(seen as a relational database) a Boolean query corresponding to each negated NI in the NI-closure of the Tbox. If one of those Boolean queries is evaluated to true against

, it means that some data in the Abox

violates the corresponding NI, and therefore the knowledge base

= ⟨

⟩ is inconsistent.

For example, consider the NI: ∃TeachesTo ⊑¬PhDStudent. Its corresponding FOL formula φ and its negation are:

Consider the evaluation of the q_unsat query against the Abox

of Figure 8.1. It evaluates to true: consider valuation ν with ν(x) = pierre, ν(y′) = paul and the facts

The transformation of NIs into Boolean queries that correspond to their negation is described in Definition 8.4.6.

Definition 8.4.6 (Transformation of NIs into Boolean queries) The transformation δ of NIs into Boolean queries corresponding to their negation is defined as follows:

δ(B 1 ⊑ ¬ B2) = qunsat :- γ 1(x ),γ2(x) such that γi(x )= Ai(x) if Bi = Ai γi(x )= Pi(x,yi) if Bi = ∃Pi γ (x )= P (y ,x) if B = ∃P - i i i i i δ(R 1 ⊑ ¬ R2) = qunsat :- ρ1(x,y) , ρ2(x,y) such that ρi(x,y) = Pi(x,y ) if Ri = Pi- ρi(x,y) = Pi(y,x ) if Ri = P i δ((functP)) = qunsat :- P (x ,y) , P(x,z) , y⁄= z δ((functP-)) = qunsat :- P (x ,y) , P(z,y) , x⁄= z

This second step of consistency checking is summarized in the Consistent Algorithm (Algorithm 2). In the algorithm, for each NI clause α, the query q_unsat,α is an SQL query computing the Boolean conjunctive queries δ(α). Also, db(

) denotes the

set in a relational database.

Consistent(

)

Input: a KB = ⟨ ,⟩

Output: true if is satisfiable, false otherwise

(1)q_unsat = ∅ (i.e., q_unsat is false)

(2)foreach α∈cln( ) let q_unsat = q_unsat ∪q_unsat,α(db())

(3)

(4)if q_unsat = ∅return true

(5)else return false

Algorithm 2:

The Consistent algorithm

It is important to emphasize that this two-step approach for consistency checking does not require any inference on the data. The only inferences concern the Tbox and consist in computing the deductive closure of its axioms, from which the NI-closure (denoted cln(

) in the Algorithm) is extracted.

Consider the Abox

′ obtained from the inconsistent Abox

in Figure 8.1 by deleting the fact PhDStudent(paul). The knowledge base made of the Abox

′ in Figure 8.8 and the Tbox

in Figure 8.6 is consistent. (See Exercise 8.6.4.)

Subject	Predicate	Object	FOL semantics

dupond	Leads	infoDept	Leads(dupond,infoDept)
dupond	rdf:type	Professor	Professor(dupond)
durand	ResponsibleOf	ue111	ResponsibleOf(durand,ue111)
durand	Leads	csDept	Leads(durand,csDept)
paul	TeachesTo	pierre	TeachesTo(paul,pierre)
pierre	EnrolledIn	infoDept	EnrolledIn(pierre, infodept)
pierre	rdf:type	Undergrad	Undergrad(pierre)
pierre	RegisteredTo	ue111	Registered(pierre, ue111)
ue111	OfferedBy	infoDept	OfferedBy(ue111,infoDept)
ue111	rdf:type	CSCourse	CSCourse(ue111)
jim	EnrolledIn	csDept	EnrolledIn(jim, csDept)
csDept	rdf:type	TeachingDept	TeachingDept(csDept)

Figure 8.8:

′: an Abox consistent w.r.t the Tbox of Figure 8.6

8.4.3 Answer set evaluation

In the previous section, the negative constraints played the main role. Once we know the knowledge base is consistent and move to query answering, the positive constraints take over.

Answering queries to a DL-LITE knowledge base is done in two steps. The first step is the query reformulation, which consists in translating the original query q into a set Q of queries. The second step consists in evaluating the queries in Q over the Abox (again seen as a relational database). The beauty of the approach is that this will provide the answer set. Of course, simply evaluating q over the Abox would possibly yield an incomplete answer. Completeness is achieved by the “reasoning” in the reformulation step. During this step, we access only the Tbox and not the data.

The query reformulation step is performed by the PerfectRef (Algorithm 3). It consists in reformulating the initial query by using the PIs in

as rewriting rules. The intuition is that PIs are seen as logical rules that are applied in backward-chaining to query atoms in order to expand them (in a resolution style). In databases, this is called a chase.

The queries we consider, i.e., the conjunctive queries, consist of several atoms. In general, because of the existential variables, new variables are introduced in queries. So we could be lead to generate more and more queries with new variables. It turns out that we will be able to control this process and generate only a finite number of distinct queries. This is due to the limitations of the constraints allowed in the Tbox. As outlined in Section 8.4.4, as soon as we allow the combination of key constraints with inclusions of properties, we may generate an infinite number of non redundant queries.

Consider a PI rule α⇒β. Applicability of the rule to an atom of a query is defined by:

The following definition defines the result gr(g,I) of the goal reduction of the atom g using the PI I, which is at the core of PerfectRef.

Definition 8.4.7 (Backward application of a PI to an atom) Let I be an inclusion assertion that is applicable to the atom g. Then, gr(g,I) is the atom defined as follows:

if	g = A(x)	and	I = A₁ ⊑A,	then	gr(g,I) = A₁(x)
if	g = A(x)	and	I = ∃P ⊑A,	then	gr(g,I) = P(x,_)
if	g = A(x)	and	I = ∃P^-⊑A,	then	gr(g,I) = P(_,x)
if	g = P(x,_)	and	I = A⊑∃P,	then	gr(g,I) = A(x)
if	g = P(x,_)	and	I = ∃P₁ ⊑∃P,	then	gr(g,I) = P₁(x,_)
if	g = P(x,_)	and	I = ∃P₁^-⊑∃P,	then	gr(g,I) = P₁(_,x)
if	g = P(_,x)	and	I = A⊑∃P^-,	then	gr(g,I) = A(x)
if	g = P(_,x)	and	I = ∃P₁ ⊑∃P^-,	then	gr(g,I) = P₁(x,_)
if	g = P(_,x)	and	I = ∃P₁^-⊑∃P^-,	then	gr(g,I) = P₁(_,x)
if	g = P(x₁,x₂)	and	either I = P₁ ⊑P or I = P₁^-⊑P^-	then	gr(g,I) = P₁(x₁,x₂)
if	g = P(x₁,x₂)	and	either I = P₁ ⊑P^-or I = P₁^-⊑P	then	gr(g,I) = P₁(x₂,x₁)

The subtle point of PerfectRef is the need of simplifying the produced reformulations, so that some PIs that were not applicable to a reformulation become applicable to its simplifications. A simplification amounts to unify two atoms of a reformulation using their most general unifier and then to switch the possibly new unbounded existential variables to the anonymous variable denoted _.

Let us illustrate the reformulation step of the following query using the PIs in the Tbox

of Figure 8.6:

Initial query
q(x):- TeachesIn(x,y), RegisteredIn(z,y), Student(z)

Reformulations	applied PI	Reformulated query

q₁(x):- ResponsibleOf(x,y), RegisteredIn(z,y), Student(z)	ResponsibleOf ⊑TeachesIn	q(x)
q₂(x):- TeachesIn(x,y), RegisteredIn(z,y), PhDStudent(z)	PhDStudent ⊑Student	q(x)
q₃(x):- TeachesIn(x,y), RegisteredIn(z,y), TeachesTo(_,z)	∃TeachesTo^-⊑Student	q(x)
q₄(x):- TeachesIn(x,y), RegisteredIn(_,y)	∃RegisteredIn ⊑Student	q(x)
q₅(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z)	PhDStudent ⊑Student	q₁(x)
q₆(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z)	∃TeachesTo^-⊑Student	q₁(x)
q₇(x):- ResponsibleOf(x,y), RegisteredIn(_,y),	∃RegisteredIn ⊑Student	q₁(x)
q₈(x):- ResponsibleOf(x,y), RegisteredIn(z,y), PhDStudent(z)	ResponsibleOf ⊑TeachesIn	q₂(x)
q₉(x):- ResponsibleOf(x,y), RegisteredIn(z,y), TeachesTo(_,z)	ResponsibleOf ⊑TeachesIn	q₃(x)
q₁₀(x):- ResponsibleOf(x,y), RegisteredIn(_,y)	ResponsibleOf ⊑TeachesIn	q₄(x)
q₁₁(x):- TeachesIn(x,y), Course(y)	Course ⊑∃RegisteredIn^-	q₄(x)
q₁₂(x):- TeachesIn(x,_)	∃TeachesIn^-⊑Course	q₁₁(x)
q₁₃(x):- ResponsibleOf(x,_)	ResponsibleOf ⊑TeachesIn	q₁₂(x)
q₁₄(x):- Professor(x)	Professor ⊑∃TeachesIn	q₁₂(x)

Figure 8.9:

A query and its reformulations obtained by PerfectRef applied to the Tbox of Figure 8.6

We detail here the inference chain leading to some reformulations that are particularly interesting for getting answers for q from the data in the Abox

′ of Figure 8.8. This also illustrates the need of the simplification step. The reformulation:

In turn, q₄(x) can be reformulated by the backward application of the PI Course ⊑∃RegisteredIn^- to the atom RegisteredIn(_,y), which results in the reformulation q₁₁(x):

Then, the reformulation q₁₂(x) is produced by the backward application of the PI ∃ TeachesIn^-⊑Course, and the simplification by unification of the two atoms followed by the replacement of the existential variable y, now unbounded, with the anonymous variable _.

Finally, the reformulations q₁₃(x) and q₁₄(x) are obtained from the backward application of the PIs ResponsibleOf ⊑TeachesIn and Professor ⊑∃TeachesIn respectively.

It is important to notice that the answers durand and dupond are obtained for the initial query q(x) thanks to those reformulations q₁₃(x) and q₁₄(x): they would not be returned by the standard evaluation of the query q(x) against the Abox

′ of Figure 8.8.

PerfectRef (q,

)

Input: a conjunctive query q and a Tbox

Output: a union of conjunctive queries: PR

(1)PR := {q}

(2)repeat

(3)PR′ := PR

(4)foreach q∈PR′

(5)(a) foreach g∈q

(6)if a PI I ∈ is applicable to g

(7)PR := PR∪{q[g/gr(g,I)]}

(8)(b) foreach g₁,g₂ ∈q

(9)if g₁ et g₂ sont unifiables

(10)PR := PR∪{τ(reduce(q,g₁,g₂))}

(11)until PR′ = PR

(12)

Algorithm 3:

The PerfectRef algorithm

Figure 8.9 shows the result returned by PerfectRef (q(x),

), where q(x) is the query of the previous example, and

is the Tbox of Figure 8.6. The second column makes explicit the PI used for obtaining the corresponding reformulation. Note that equivalent reformulations can be produced by different inferences, such as for example the reformulations q₄(x) and q₇(x).

8.4.4 Impact of combining DL-LITE $R$ and DL-LITE $F$ on query answering

In this section, we exhibit an example showing that the interaction of key constraints (the specificity of DL-LITE $F$ ) with inclusion constraints between properties (the specificity of DL-LITE $R$ ) may lead to a reformulation of a query into an infinite number of conjunctive rewritings, each one likely to bring additional answers. This makes an algorithmic approach such as the one we used for DL-LITE $R$ and DL-LITE $F$ in isolation incomplete for query answering when DL-LITE $R$ and DL-LITE $F$ are combined together.

Consider a Tbox made of the following inclusion axioms, in which R and P are two properties and S is a class:

To see this, we observe that from the fact S(x₁) and the PI S ⊑∃R, it can be inferred that there exists y such that R(x₁,y) holds, and thus P(x₁,y) holds too (since R⊑P). From the functionality constraint on P and the conjunct P(x₁,x) in the body of r₁, we can now infer that y = x, and thus that R(x₁,x) holds. Therefore, ∃zR(z,x) is logically entailed by ∃x₁S(x₁) ∧P(x₁,x), i.e., r₁(x) is contained in the query q(x), and thus is a valid reformulation of the query q(x).

It turns out that the situation is even more subtle. Surprisingly, this reformulation r₁(x) is not the only one. In fact there exists an infinite number of different reformulations for q(x). Let k ≥ 2. The following query is a valid reformulation of q(x):

To show that r_k(x) is logically contained in q(x), we exploit again the axiom of functionality of P and the inclusion axiom between R and P: from the fact S(x_k) and the PI S ⊑∃R, it can be inferred that there exists y_k such that R(x_k,y_k) holds, and thus P(x_k,y_k) holds too (since R⊑P). Since P is functional, we get: y_k = x_k-1, and thus R(x_k,x_k-1) holds. Now, based on the PI ∃R^-⊑∃R, there exists y_k-1 such that R(x_k-1,y_k-1) holds, and with the same reasoning as before, we get y_k-1 = x_k-2, and thus R(x_k-1,x_k-2) holds. By induction, we obtain that R(x₁,x) holds, i.e., r_k(x) is logically contained in the query q(x).

One can also show that for each k, there exists an Abox such that the reformulation r_k returns answers that are not returned by the reformulation r_k′ for k′<k. Thus, there exists an infinite number of non redundant conjunctive reformulations.

It can be shown that if we combine key constraints and inclusions of properties in a restricted way, this problem can be avoided. For instance, if key constraints are forbidden on properties involved in right-hand side of an inclusion axiom, there is a finite number of non redundant conjunctive reformulations and they can be found by the PerfectRef algorithm.

8.5 Further reading

The spreading of RDF data on the Web and the importance of queries for such data is illustrated by the Billion Triple Track of the Semantic Web Challenge ¹. The idea is to “do something” efficiently with one billion of RDFS triples.

RDFS. Reasoners for RDFS are available on line and can be downloaded, like for instance the Jena2 ontology API [104], which implements the forward-chaining algorithm we described. A SPARQL [175] engine included in the same programmatic Jena environment enables storing and querying data sets made of (asserted + inferred) triples. In fact, since RDFS statements are also stored as RDF triplets, SPARQL can also be used to query the schema, and not only the RDF data. For guaranteeing the completeness of the answers to a schema query, all the inference rules that we have given Figure 7.7 ( section describing RDFS) in Chapter 7 must be taken into account, and not only the subset that we considered in the forward-chaining algorithm we described. We mentioned the practical advantage of separating the computation over a Tbox from that over the Abox. This is useful also from a theoretical point of view. This gives a bound on the data complexity (the complexity in terms of the Abox only) of consistency checking and of query answering. We showed that they can be performed using FOL queries and it is known [159, 9] that evaluating FOL queries over a relational database is in LOGSPACE in the size of the database.

DL-lite. DL-LITE $R$ has been recently incorporated into the version OWL2 [176] of OWL as the profile called OWL2 QL. The proof that the PerfectRef Algorithm computes the whole answer is shown in [38]. It follows that the complexity of query answering by reformulation in these fragments of DL-LITE is polynomial in the size of the Tbox, and in LOGSPACE in the size of the Abox.

A major result in [38] is that DL-LITE $R$ and DL-LITE $F$ are two maximal fragments of the DL-Lite family supporting tractable query answering over large amounts of data. It has been shown in [38] that consistency checking and instance recognition (which a particular case of query answering), while being LOGSPACE both for DL-LITE $R$ and DL-LITE $F$ Tboxes, are PTIME-COMPLETE for the languages that combine the axioms of both (denoted DL-LITE

). This complexity result shows it is unlikely that an approach based on query reformulation would provide a complete query answering algorithm for DL-LITE

QuOnto ([13]) is a JAVA tool implementing the DL-Lite family of ontology representation languages. It permits the declaration of an ontology as a DL-LITE Tbox, the construction of an associated Abox that can be stored and as a MySQL database. The consistency checking of the resulting DL-LITE knowledge base, and query answering by reformulation are the core functionalities of QuOnto, based on the implementation in Java of the algorithms presented in this chapter.

Datalog^+-. Recent research [12, 11] has extended the Datalog database query language towards query answering over ontologies. This has resulted in a unifying framework based on a family of expressive extensions of Datalog, called Datalog^+-, that captures DL-LITE $R$ and DL-LITE $F$ .

¹http://challenge.semanticweb.org/

DL notation	Corresponding logical rule

B ⊑∃P	B(X) ⇒∃YP(X,Y)

∃Q ⊑∃P	Q(X,Y) ⇒∃ZP(X,Z)

B ⊑∃P^-	B(X) ⇒∃YP(Y,X)
∃Q ⊑∃P^-	Q(X,Y) ⇒∃ZP(Z,X)

P ⊑Q^- or P^-⊑Q	P(X,Y) ⇒Q(Y,X)

DL notation	Corresponding logical rule

Student ⊑¬∃TeachesIn	Student(X) ⇒¬∃YTeachesIn(X,Y)
	or equivalently, ∃YTeachesIn(X,Y) ⇒¬Student(X)