Contents
Introduction
I
Modeling Web Data
1
Data Model
1.1
Semistructured data
1.2
XML
1.2.1
XML documents
1.2.2
Serialized and tree-based forms
1.2.3
XML syntax
1.2.4
Typing and namespaces
1.2.5
To type or not to type
1.3
Web Data Management with XML
1.3.1
Data exchange
1.3.2
Data integration
1.4
The XML World
1.4.1
XML dialects
1.4.2
XML standards
1.5
Further reading
1.6
Exercises
1.6.1
XML documents
1.6.2
XML standards
2
XPath and XQuery
2.1
Introduction
2.2
Basics
2.2.1
XPath and XQuery data model for documents
2.2.2
The XQuery model (continued) and sequences
2.2.3
Specifying paths in a tree: XPath
2.2.4
A first glance at XQuery expressions
2.2.5
XQuery vs XSLT
2.3
XPath
2.3.1
Steps and path expressions
2.3.2
Evaluation of path expressions
2.3.3
Generalities on axes and node tests
2.3.4
Axes
2.3.5
Node tests and abbreviations
2.3.6
Predicates
2.3.7
XPath 2.0
2.4
FLWOR expressions in XQuery
2.4.1
Defining variables: the for and let clauses
2.4.2
Filtering: the where clause
2.4.3
The return clause
2.4.4
Advanced features of XQuery
2.5
XPath foundations
2.5.1
A relational view of an XML tree
2.5.2
Navigational XPath
2.5.3
Evaluation
2.5.4
Expressiveness and first-order logic
2.5.5
Other XPath fragments
2.6
Further reading
2.7
Exercises
3
Typing
3.1
Motivating Typing
3.2
Automata
3.2.1
Automata on Words
3.2.2
Automata on Ranked Trees
3.2.3
Unranked Trees
3.2.4
Trees and Monadic Second-Order Logic
3.3
Schema Languages for XML
3.3.1
Document Type Definitions
3.3.2
XML Schema
3.3.3
Other Schema Languages for XML
3.4
Typing Graph Data
3.4.1
Graph Semistructured Data
3.4.2
Graph Bisimulation
3.4.3
Data guides
3.5
Further reading
3.6
Exercises
4
XML Query Evaluation
4.1
XML fragmentation
4.2
XML identifiers
4.2.1
Region-based identifiers
4.2.2
Dewey-based identifiers
4.2.3
Structural identifiers and updates
4.3
XML evaluation techniques
4.3.1
Structural join
4.3.2
Optimizing structural join queries
4.3.3
Holistic twig joins
4.4
Further reading
4.5
Exercises
5
Putting into Practice: Managing an XML Database with EXIST
5.1
Pre-requisites
5.2
Installing EXIST
5.3
Getting started with EXIST
5.4
Running XPath and XQuery queries with the sandbox
5.4.1
XPath
5.4.2
XQuery
5.4.3
Complement: XPath and XQuery operators and functions
5.5
Programming with EXIST
5.5.1
Using the XML:DB API with EXIST
5.5.2
Accessing EXIST with Web Services
5.6
Projects
5.6.1
Getting started
5.6.2
Shakespeare Opera Omnia
5.6.3
MusicXML on line
6
Putting into Practice: Tree Pattern Evaluation using SAX
6.1
Tree-pattern dialects
6.2
CTP evaluation
6.3
Extensions
II
Web Data Semantics and Integration
7
Ontologies, RDF, and OWL
7.1
Introduction
7.2
Ontologies by example
7.3
RDF, RDFS, and OWL
7.3.1
Web resources, URI, namespaces
7.3.2
RDF
7.3.3
RDFS: RDF Schema
7.3.4
OWL
7.4
Ontologies and (Description) Logics
7.4.1
Preliminaries: the DL jargon
7.4.2
: the prototypical DL
7.4.3
Simple DLs for which reasoning is polynomial
7.4.4
The DL-LITE family: a good trade-off
7.5
Further reading
7.6
Exercises
8
Querying Data through Ontologies
8.1
Introduction
8.2
Querying RDF data: notation and semantics
8.3
Querying through RDFS ontologies
8.4
Answering queries through DL-LITE ontologies
8.4.1
DL-LITE
8.4.2
Consistency checking
8.4.3
Answer set evaluation
8.4.4
Impact of combining DL-LITE and DL-LITE on query answering
8.5
Further reading
8.6
Exercises
9
Data Integration
9.1
Introduction
9.2
Containment of conjunctive queries
9.3
Global-as-view mediation
9.4
Local-as-view mediation
9.4.1
The Bucket algorithm
9.4.2
The Minicon algorithm
9.4.3
The Inverse-rules algorithm
9.4.4
Discussion
9.5
Ontology-based mediators
9.5.1
Adding functionality constraints
9.5.2
Query rewriting using views in DL-LITE
9.6
Peer-to-Peer Data Management Systems
9.6.1
Answering queries using GLAV mappings is undecidable
9.6.2
Decentralized DL-LITE
9.7
Further reading
9.8
Exercises
10
Putting into Practice: Wrappers and Data Extraction with XSLT
10.1
Extracting Data from Web Pages
10.2
Restructuring Data
11
Putting into Practice: Ontologies in Practice (by Fabian M. Suchanek)
11.1
Exploring and installing YAGO
11.2
Querying YAGO
11.3
Web access to ontologies
11.3.1
Cool URIs
11.3.2
Linked Data
12
Putting into Practice: Mashups with YAHOO! PIPES and XProc
12.1
YAHOO! PIPES: A Graphical Mashup Editor
12.2
XProc: An XML Pipeline Language
III
Building Web Scale Applications
13
Web search
13.1
The World Wide Web
13.2
Parsing the Web
13.2.1
Crawling the Web
13.2.2
Text Preprocessing
13.3
Web Information Retrieval
13.3.1
Inverted Files
13.3.2
Answering Keyword Queries
13.3.3
Large-scale Indexing with Inverted Files
13.3.4
Clustering
13.3.5
Beyond Classical IR
13.4
Web Graph Mining
13.4.1
PageRank
13.4.2
HITS
13.4.3
Spamdexing
13.4.4
Discovering Communities on the Web
13.5
Hot Topics in Web Search
13.6
Further Reading
13.7
Exercises
14
An Introduction to Distributed Systems
14.1
Basics of distributed systems
14.1.1
Networking infrastructures
14.1.2
Performance of a distributed storage system
14.1.3
Data replication and consistency
14.2
Failure management
14.2.1
Failure recovery
14.2.2
Distributed transactions
14.3
Required properties of a distributed system
14.3.1
Reliability
14.3.2
Scalability
14.3.3
Availability
14.3.4
Efficiency
14.3.5
Putting everything together: the CAP theorem
14.4
Particularities of P2P networks
14.5
Case study: a Distributed File System for very large files
14.5.1
Large scale file system
14.5.2
Architecture
14.5.3
Failure handling
14.6
Further reading
15
Distributed Access Structures
15.1
Hash-based structures
15.1.1
Distributed Linear Hashing
15.1.2
Consistent Hashing
15.1.3
Case study: CHORD
15.2
Distributed indexing: Search Trees
15.2.1
Design issues
15.2.2
Case study: BATON
15.2.3
Case study: BIGTABLE
15.3
Further reading
15.4
Exercises
16
Distributed Computing with MAPREDUCE and PIG
16.1
MAPREDUCE
16.1.1
Programming model
16.1.2
The programming environment
16.1.3
MAPREDUCE internals
16.2
PIG
16.2.1
A simple session
16.2.2
The data model
16.2.3
The operators
16.2.4
Using MAPREDUCE to optimize PIG programs
16.3
Further reading
16.4
Exercises
17
Putting into Practice: Full-Text Indexing with LUCENE (by Nicolas Travers)
17.1
Preliminary: a LUCENE sandbox
17.2
Indexing plain-text with LUCENE – A full example
17.2.1
The main program
17.2.2
Create the Index
17.2.3
Adding documents
17.2.4
Searching the index
17.2.5
LUCENE querying syntax
17.3
Put it into practice!
17.3.1
Indexing a directory content
17.3.2
Web site indexing (project)
17.4
LUCENE – Tuning the scoring (project)
18
Putting into Practice: Recommendation Methodologies (by Alban Galland)
18.1
Introduction to recommendation systems
18.2
Pre-requisites
18.3
Data analysis
18.4
Generating some recommendations
18.4.1
Global recommendation
18.4.2
User-based collaborative filtering
18.4.3
Item-based collaborative filtering
18.5
Projects
18.5.1
Scaling
18.5.2
The probabilistic way
18.5.3
Improving recommendation
19
Putting into Practice: Large-Scale Data Management with HADOOP
19.1
Installing and running HADOOP
19.2
Running MAPREDUCE jobs
19.3
PIGLATIN scripts
19.4
Running in cluster mode (optional)
19.4.1
Configuring HADOOP in cluster mode
19.4.2
Starting, stopping and managing HADOOP
19.5
Exercises
20
Putting into Practice: COUCHDB, a JSON Semi-Structured Database
20.1
Introduction to the COUCHDB document database
20.1.1
JSON, a lightweight semi-structured format
20.1.2
COUCHDB, architecture and principles
20.1.3
Preliminaries: set up your COUCHDB environment
20.1.4
Adding data
20.1.5
Views
20.1.6
Querying views
20.1.7
Distribution strategies: master-master, master-slave and shared-nothing
20.2
Putting COUCHDB into Practice!
20.2.1
Exercises
20.2.2
Project: build a distributed bibliographic database with COUCHDB
20.3
Further reading
References