Ph.D. thesis

Distributed data management with the rule-based language: Webdamlog

Material:

  • The full manuscript of my thesis is available in open access via TEL: link here
  • For a quick glance at my thesis you can go through the slides of my defense: link here
  • A video of the presentation I gave at my defense, with additional commentary, is also available: right-click this link and save the target to download it.

Defense at 2pm on December 5, 2013

Abstract

We introduce Webdamlog, a datalog-style language for managing distributed data and knowledge. The language extends datalog in a number of ways, notably with a novel feature, namely delegation, allowing peers to exchange not only facts but also rules. We present a user study that demonstrates the usability of the language. We describe a Webdamlog engine that extends a distributed datalog engine, namely Bud, with the support of delegation and of a number of other novelties of Webdamlog such as the possibility to have variables denoting peers or relations. We mention novel optimization techniques, notably one based on the provenance of facts and rules. We exhibit experiments that demonstrate that the rich features of Webdamlog can be supported at reasonable cost and that the engine scales to large volumes of data. Finally, we discuss the implementation of a Webdamlog peer system that provides an environment for the engine. In particular, a peer supports wrappers to exchange Webdamlog data with non-Webdamlog peers. We illustrate these peers by presenting a picture management application that we used for demonstration purposes.
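To make the idea of delegation concrete, here is a minimal Python sketch. It is not the Webdamlog engine described in the thesis (which extends Bud); the classes `Atom` and `Rule`, the function `evaluate`, and the peers, relations and facts are all hypothetical names chosen for illustration. It assumes that a peer evaluates the body of a rule from left to right and that, once it reaches an atom located at another peer, it ships the remaining, partially instantiated rule to that peer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    relation: str   # relation name (could also be a variable)
    peer: str       # peer name, or a variable such as "$p"
    args: tuple     # constants or variables (strings starting with "$")


@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple     # tuple of Atom, evaluated left to right


def is_var(term):
    return isinstance(term, str) and term.startswith("$")


def substitute(atom, binding):
    """Replace every bound variable in the atom by its value."""
    sub = lambda t: binding.get(t, t)
    return Atom(sub(atom.relation), sub(atom.peer), tuple(sub(a) for a in atom.args))


def match(atom, fact, binding):
    """Extend the binding so that the atom matches the fact, or return None."""
    if len(atom.args) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(atom.args, fact):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def evaluate(local, facts, rule):
    """Evaluate a rule at peer `local`.

    Returns (derived, delegated): head facts that could be derived locally,
    and (target_peer, residual_rule) pairs to send to other peers.
    """
    derived, delegated = [], []

    def walk(i, binding):
        if i == len(rule.body):
            derived.append(substitute(rule.head, binding))
            return
        atom = substitute(rule.body[i], binding)
        if is_var(atom.peer):
            raise ValueError("peer variable must be bound by an earlier atom")
        if atom.peer != local:
            # Delegation: the rest of the rule is shipped to the remote peer.
            rest = tuple(substitute(a, binding) for a in rule.body[i:])
            delegated.append((atom.peer, Rule(substitute(rule.head, binding), rest)))
            return
        for fact in facts.get(atom.relation, ()):
            new = match(atom, fact, binding)
            if new is not None:
                walk(i + 1, new)

    walk(0, {})
    return derived, delegated


# Informally: album@alice($photo) :- friend@alice($p), photos@$p($photo)
rule = Rule(
    Atom("album", "alice", ("$photo",)),
    (Atom("friend", "alice", ("$p",)), Atom("photos", "$p", ("$photo",))),
)
alice_facts = {"friend": [("bob",), ("carol",)]}
bob_facts = {"photos": [("sunset.jpg",)]}

derived_at_alice, delegations = evaluate("alice", alice_facts, rule)
# Bob receives a rule, not just facts, and evaluates it on his own data.
derived_at_bob, _ = evaluate("bob", bob_facts, delegations[0][1])
```

In this sketch Alice derives nothing by herself. She sends one residual rule to each friend, and Bob's evaluation of his copy yields a fact for the relation `album` at Alice. The variable `$p` in the peer position illustrates what the abstract calls variables denoting peers.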

Jury

  • Serge Abiteboul (advisor)
  • Christine Collet (reviewer)
  • Pascal Molli (reviewer)
  • Nicole Bidoit (examiner)
  • Bogdan Cautis (examiner)
  • David Gross-Amblard (examiner)

Summary

Information management on the Internet relies on a wide variety of systems, each specialized for a particular task. The personal data and favorite applications of a Web user are typically distributed across many heterogeneous devices and systems, e.g., residing on a smartphone, laptop, tablet, TV box, or managed by Facebook, Google, etc. Additional data and computational resources are also available to the user from relatives, friends, colleagues, possibly via social network systems. Because of the distribution and heterogeneity, the management of personal data and knowledge has become a major challenge.

A Web user regularly faces information management tasks that may be extremely cumbersome to carry out manually. Yet automating these tasks, for example by writing scripts, is far beyond the skills of most Web users. Some systems attempt to provide integrated services to support these needs. For instance, Facebook provides a wrapper service to integrate Dropbox accounts and blogs. However, such services are often limited in the functionality they support. Also, by delegating such services to systems like Facebook, a user is led to entrust more and more of his data to a single company, at the cost of losing ownership and control of his own data.

Our goal is to enable a Web user to easily specify distributed data management tasks in place, i.e., without centralizing the data with a single provider. Our system is therefore not a replacement for Facebook, or any centralized system, but an alternative that allows users to launch their own peers on their machines with their own personal data, and to collaborate with Web services.

Towards this goal, we propose Webdamlog, an elegant language for managing distributed data and knowledge. As a datalog-style language, its main benefits are the familiar ones: a declarative approach reduces the conceptual burden on the user, while at the same time allowing for powerful performance optimizations on the part of the system. Besides this language, our contributions consist of the design and implementation of an engine supporting Webdamlog, novel optimization techniques tailored to this setting, and the development of an environment for the peers supporting Webdamlog.

Date & Venue

On Thursday, December 5, 2013, at 2pm in the room “Pavillon des Jardins” at the École Normale Supérieure de Cachan.



More details on how to reach “ENS de Cachan” from Paris are available on the LSV website.

Other useful links