Part III
Building Web Scale Applications

13  Web Search
 13.1  The World Wide Web
 13.2  Parsing the Web
 13.3  Web Information Retrieval
 13.4  Web Graph Mining
 13.5  Hot Topics in Web Search
 13.6  Further Reading
 13.7  Exercises
14  An Introduction to Distributed Systems
 14.1  Basics of distributed systems
 14.2  Failure management
 14.3  Required properties of a distributed system
 14.4  Particularities of P2P networks
 14.5  Case study: a Distributed File System for very large files
 14.6  Further reading
15  Distributed Access Structures
 15.1  Hash-based structures
 15.2  Distributed indexing: Search Trees
 15.3  Further reading
 15.4  Exercises
16  Distributed Computing with MAPREDUCE and PIG
 16.1  MAPREDUCE
 16.2  PIG
 16.3  Further reading
 16.4  Exercises
17  Putting into Practice: Full-Text Indexing with LUCENE (by Nicolas Travers)
 17.1  Preliminary: a LUCENE sandbox
 17.2  Indexing plain text with LUCENE – A full example
 17.3  Put it into practice!
 17.4  LUCENE – Tuning the scoring (project)
18  Putting into Practice: Recommendation Methodologies (by Alban Galland)
 18.1  Introduction to recommendation systems
 18.2  Pre-requisites
 18.3  Data analysis
 18.4  Generating some recommendations
 18.5  Projects
19  Putting into Practice: Large-Scale Data Management with HADOOP
 19.1  Installing and running HADOOP
 19.2  Running MAPREDUCE jobs
 19.3  PIGLATIN scripts
 19.4  Running in cluster mode (optional)
 19.5  Exercises
20  Putting into Practice: COUCHDB, a JSON Semi-Structured Database
 20.1  Introduction to the COUCHDB document database
 20.2  Putting COUCHDB into Practice!
 20.3  Further reading