Webdam Project

PhD defense of Alban Galland

September 19th, 2011

Alban Galland will defend his PhD on September 28th, 2011, at 15:00, in room 455 of PCRI at Gif-sur-Yvette (Plateau de Saclay).

Title: Distributed Data Management with Access Control – Social Networks and Data of the Web

Abstract: The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Managing all this information is rapidly moving beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules as in Datalog¬¬ (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem, such as distribution, concurrency, and expressivity, in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base, and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation for overcoming the problems of Web data management, in particular with respect to access control.
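
To make the combination of deductive and active rules more concrete, here is a minimal Python sketch. It is not WebdamLog syntax or semantics: the peer names, relations and rules are hypothetical, and evaluation is a plain naive fixpoint followed by one stratum with negation. Deductive rules derive intensional facts locally until nothing new appears; an active rule then turns derived facts into messages that update relations at other peers.

```python
from collections import defaultdict

# Hypothetical extensional (stored) facts at peer "alice".
alice = {
    "friend": {("alice", "bob"), ("bob", "carol")},
    "blocked": {("carol",)},
}

def deductive_fixpoint(db):
    """Naive fixpoint of two hypothetical deductive rules:
         reach(X, Y) :- friend(X, Y)
         reach(X, Z) :- reach(X, Y), friend(Y, Z)
       followed, in a higher stratum, by a rule with negation:
         visible(Y)  :- reach("alice", Y), not blocked(Y)
    """
    reach = set(db["friend"])
    while True:
        derived = {(x, z) for (x, y) in reach for (y2, z) in db["friend"] if y == y2}
        if derived <= reach:  # no new fact: fixpoint reached
            break
        reach |= derived
    visible = {(y,) for (x, y) in reach if x == "alice" and (y,) not in db["blocked"]}
    return {"reach": reach, "visible": visible}

def active_rules(idb):
    """Hypothetical active rule: for each visible(Y), send notify("alice")
    to peer Y, i.e. an update on a relation hosted by another peer."""
    outbox = defaultdict(set)
    for (y,) in idb["visible"]:
        outbox[y].add(("notify", "alice"))
    return outbox

idb = deductive_fixpoint(alice)
messages = active_rules(idb)
print(sorted(idb["visible"]))  # [('bob',)]
print(dict(messages))          # {'bob': {('notify', 'alice')}}
```

Keeping derivation (local, to fixpoint) separate from the outbox (effects on other peers) mirrors the split between intensional data and updates/communications described in the abstract.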

Brainstorming on Foundations of Web Data Management

September 10th, 2009

The Webdam workshop "Brainstorming on Foundations of Web Data Management" took place on August 28th, 2009 at Télécom ParisTech. It was an occasion to present Webdam's first achievements to a panel of especially talented researchers, each a leading figure in their research field. It was also an occasion to share and compare different visions of how the theoretical foundations of Web data management should be laid.
In particular, we heard the following presentations:

Serge Abiteboul, Webdam in brief: Serge presented the main motivations and goals of Webdam. Noting that the management of distributed data on the Web is not supported by robust models and theory, he proposed to focus on information residing in autonomous systems, following the direction of AXML. This talk raised an interesting debate with the audience on concurrency control and, more generally, on expectations about Webdam.
Marie-Christine Rousset, Representing and Reasoning on Web Data Semantics, Survey and Challenges: Marie-Christine presented the importance of data semantics for constraining metadata in Web data management, which allows reasoning on knowledge using logic. This talk raised questions on the best kind of logic to use, the limitations of RDF, and extensions to numeric properties.
Stefano Ceri, Search Computing: Stefano presented his work on the ERC project Search Computing, mostly focused on data management and query optimization. This talk built natural bridges with Webdam, since the use of rich data will deeply improve the quality of search and process modeling; social networks are also a natural path for promising interactions. This research also raises questions about the link between search and probabilistic databases.
Georg Gottlob, Web Data Extraction – Present and Future: Georg's talk argued for the need for tools that bridge the gap between unstructured and structured information in order to feed data management systems. He proposed a language for expressing such extraction methods, and tools to support it. It raised questions about creating new annotations on Datalog and managing duplicates.
Tova Milo, Querying Past and Future in Web Applications: Tova presented applications that would grow more naturally on top of a rich distributed data management system. In particular, she focused on the need to understand and optimize interaction with the user by taking past interactions into account. The main challenge that animated the debate was generalizing the application to more generic scenarios, in particular by using a representation of workflows.
Peter Buneman, Provenance in databases and workflow: Peter's talk demonstrated the importance of where-, how- and why-provenance. It provided tools and models for use in the presence of complex workflows. This topic is of direct interest for Webdam, since keeping track of provenance is fundamental in such distributed environments.
Dan Suciu, Belief Databases: Dan demonstrated the importance of managing beliefs in distributed data management systems, where each user has a consistent view of the database even though inconsistencies may appear across views. This talk raised an interesting discussion on the representation of beliefs and the kind of logic to use in such a system.
Val Tannen, Provenance Propagation: Val developed the previous speakers' analysis of the need for provenance in update and feedback propagation in a Web data management system. He proposed an algebraic view of provenance in order to understand it better and obtain general results. The debate focused on the summarization of why-provenance and on levels of abstraction.
Victor Vianu, Static Analysis of Active XML Systems: AXML is an early model of a Web data management system that supports tasks controlled by guards. Victor presented how properties of the system can be expressed in tree-LTL and verified, providing theoretical guarantees on the behavior of the system.
Pierre Senellart, Probabilistic XML: Survey and Challenges: Pierre presented how probabilistic XML databases can leverage uncertainty to better represent knowledge in a distributed data management system. He also explained how to reason on such databases. It raised exciting challenges, such as continuous probability distributions and the tractability of dependencies.
Luc Segoufin, Links with FoX Project: FoX is a European project that focuses on the safe processing of dynamic data over the Internet. It deals with problems similar to Webdam's: data modeling and specification; querying, extracting and exchanging XML data; modeling and verification of temporal behavior; and handling incomplete information. The two projects have already collaborated fruitfully.
Balder ten Cate, Structural Characterizations of Schema Mapping Languages: Balder presented the importance of schema mappings for data integration in a distributed data management system. He proposed a study of schema mapping languages. It raised interesting issues on adapting this model to XML data and on schema mapping optimization.
Serge Abiteboul, Recent works around AXML: In this presentation, Serge introduced an existing application of AXML: business artifacts. They allow a workflow to be represented in a data-centric way, well suited to highly distributed applications. This raised a large number of questions about interaction between autonomous systems, synchronization, movement of artifacts, monitoring, quality of service, access control…
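
The report does not say which formalism Val presented. One well-known algebraic view of provenance, the provenance semirings of Green, Karvounarakis and Tannen, annotates each tuple with a semiring element: joins multiply annotations and alternative derivations (union, projection) add them. The following minimal Python sketch computes provenance polynomials for one join-then-project query; the relations and tuple identifiers (r1, s1, ...) are hypothetical.

```python
from collections import Counter

# A provenance polynomial is a Counter mapping a monomial (sorted tuple of
# tuple identifiers, with repetition) to its natural-number coefficient.
def var(name):
    return Counter({(name,): 1})

def add(p, q):  # semiring "+": alternative derivations (union, projection)
    return p + q

def mul(p, q):  # semiring "*": joint use of tuples (join)
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

# Hypothetical annotated relations R(a, b) and S(b, c).
R = {("x", "y"): var("r1"), ("x", "z"): var("r2")}
S = {("y", "w"): var("s1"), ("z", "w"): var("s2")}

# Q(a, c) :- R(a, b), S(b, c)   (join on b, then project b away)
Q = {}
for (a, b), p in R.items():
    for (b2, c), q in S.items():
        if b == b2:
            Q[(a, c)] = add(Q.get((a, c), Counter()), mul(p, q))

def count_derivations(poly):
    # Homomorphism to (N, +, *) mapping every tuple identifier to 1.
    return sum(poly.values())

print(dict(Q[("x", "w")]))             # {('r1', 's1'): 1, ('r2', 's2'): 1}
print(count_derivations(Q[("x", "w")]))  # 2
```

The polynomial r1·s1 + r2·s2 records how the answer was derived; specializing it (to counts, booleans, trust levels) is what makes the algebraic view general.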

PeerSoN: a P2P social network

March 10th, 2009

Report on the presentation of Sonja Buchegger, March 9th, 2009
See the PeerSoN web site and slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Context

Ubiquitous computing is a model in which devices and systems collaborate to carry out tasks for the user without the user being aware of it. This paradigm raises privacy problems, since you leave traces everywhere in a virtual world integrated with the real world. All these data could be used for data mining, from advertising to surveillance. This virtual world also usually suffers from a lack of memory loss: it never forgets. These systems also tend to centralize users' data in one part of the system. The personal (private), public and commercial spheres collide in this context.

Social networks are another model where this privacy issue arises, since users store very personal data on these systems. They are usually Web 2.0 services that require an Internet connection. Their main feature is to let users keep in touch with their friends in an ambient way.

Integrating ubiquitous computing and social networks into a ubiquitous P2P social network that preserves privacy is therefore especially challenging. One of the main reasons to design such a system is that social networks naturally collide with the real world, so ubiquity is especially desirable. It also settles most ownership questions about data and prevents systems from dictating terms of use.

Distribution

Social networks and ubiquitous computing are naturally distributed. PeerSoN uses distributed storage of data. To solve the online availability problem, it replicates data on friends' machines, with the key parameters chosen from a trace of users characterizing their temporal and geographical distribution. To solve bootstrapping, it also stores data on random nodes. Peers communicate directly but use a DHT for lookup. In a first version, this DHT was built using OpenDHT and PlanetLab, but too many availability problems led to a centralized emulation of the DHT (put/get/remove operations) in the current version. Peers are identified by the hash of a globally unique identifier (such as an email address). When connecting to the DHT, users register their user ID, their machine's IP address and their data.
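
A minimal Python sketch of the lookup layer described above. SHA-1 is an assumption (the report only says "the hash"), an in-memory dictionary stands in for the centralized put/get/remove emulation, and the class, function names, email and IP address are hypothetical, not PeerSoN's actual API.

```python
import hashlib

def peer_id(global_uid: str) -> str:
    """Peer identifier: hash of a globally unique identifier (e.g. an email).
    SHA-1 is assumed here for illustration."""
    return hashlib.sha1(global_uid.encode("utf-8")).hexdigest()

class CentralLookup:
    """Hypothetical stand-in for the centralized DHT emulation (put/get/remove)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # A key may hold several values, e.g. several machines of one user.
        self._table.setdefault(key, []).append(value)

    def get(self, key):
        return list(self._table.get(key, []))

    def remove(self, key, value):
        values = self._table.get(key, [])
        if value in values:
            values.remove(value)
        if not values:
            self._table.pop(key, None)

def register(dht, email, ip, data_keys):
    """On connection: register the user ID, the machine IP and the data held."""
    uid = peer_id(email)
    dht.put(uid, {"ip": ip, "data": list(data_keys)})
    return uid

dht = CentralLookup()
uid = register(dht, "alice@example.org", "192.0.2.10", ["profile", "wall"])
print(dht.get(uid))  # [{'ip': '192.0.2.10', 'data': ['profile', 'wall']}]
```

The lookup service only maps identifiers to locations; once a peer has the IP address, it fetches data directly, which is why a centralized emulation of the lookup does not put the data itself on a central server.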

Direct exchange

In order not to depend on a network connection (and to push ubiquity further), the design should take delay-tolerant networks into account. They are useful for carrying information from friend to friend; asynchronous messaging is an example of such content. However, it is not clear that distribution will work well this way. They are also useful for storage, since the system should use whatever storage is available nearby.
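
A toy Python simulation of store-carry-forward delivery restricted to friends, under stated assumptions: encounters are given as a time-ordered list of contacts (hypothetical data), each peer keeps a buffer, and when two friends meet they both keep a copy of every undelivered message (an epidemic strategy limited to the friend graph). It only illustrates the mechanism; it says nothing about whether distribution works well this way, which the talk left open.

```python
def simulate(friends, contacts, messages):
    """Epidemic store-carry-forward restricted to friends.
    friends:  set of frozenset({a, b}) friendship pairs
    contacts: list of (time, a, b) encounters
    messages: list of (msg_id, src, dst)
    Returns {msg_id: delivery_time} for the messages that reached dst."""
    buffers = {}
    for msg_id, src, dst in messages:
        buffers.setdefault(src, set()).add((msg_id, dst))
    delivered = {}
    for t, a, b in sorted(contacts):
        if frozenset((a, b)) not in friends:
            continue  # only friends carry each other's messages
        carried = buffers.get(a, set()) | buffers.get(b, set())
        for msg_id, dst in carried:
            if dst in (a, b) and msg_id not in delivered:
                delivered[msg_id] = t  # a carrier met the destination
        # Both peers keep a copy of everything still undelivered.
        pending = {(m, d) for (m, d) in carried if m not in delivered}
        buffers[a] = set(pending)
        buffers[b] = set(pending)
    return delivered

friends = {frozenset(("alice", "bob")), frozenset(("bob", "carol"))}
contacts = [(1, "alice", "bob"), (2, "alice", "carol"), (3, "bob", "carol")]
messages = [("m1", "alice", "carol"), ("m2", "carol", "dave")]
print(simulate(friends, contacts, messages))  # {'m1': 3}
```

Here m1 reaches carol through bob even though alice and carol (not friends) met directly at time 2, while m2 stays buffered because no friend ever meets dave.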

Access control

There is a trade-off between privacy and search: users define what they want to be searchable. The system emulates fine-grained access control with keys (who can see which part of the profile). This method would also provide protection against the storage provider. Key management emulates a standard public key infrastructure, and keys may be exchanged by direct contact.
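
A minimal Python sketch of key-based, fine-grained access control, assuming one symmetric key per profile section. The cipher is a toy SHA-256 counter-mode keystream, unauthenticated and for illustration only; a real system would use a vetted authenticated-encryption library. The PKI step (exchanging section keys securely) is not shown, and the section names and friends are hypothetical. The storage provider only ever holds ciphertext.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream: SHA-256(key || nonce || counter) blocks. Not a vetted cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes):
    nonce = secrets.token_bytes(16)  # fresh nonce so a keystream is never reused
    stream = keystream(key, nonce, len(plaintext))
    return nonce, bytes(p ^ k for p, k in zip(plaintext, stream))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    stream = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))

# Hypothetical profile: one symmetric key per section.
profile = {"public": b"Alice, Paris", "friends": b"mobile number", "family": b"home address"}
section_keys = {section: secrets.token_bytes(32) for section in profile}

# What the storage provider holds: ciphertext only.
stored = {s: encrypt(section_keys[s], data) for s, data in profile.items()}

# Keys handed out by the owner (e.g. on direct contact): who sees which section.
keyrings = {
    "bob": {s: section_keys[s] for s in ("public", "friends")},
    "carol": {"public": section_keys["public"]},
}

def read(keyring, section):
    key = keyring.get(section)
    if key is None:
        return None  # no key for this section: cannot decrypt
    nonce, ciphertext = stored[section]
    return decrypt(key, nonce, ciphertext)

print(read(keyrings["bob"], "friends"))    # b'mobile number'
print(read(keyrings["carol"], "friends"))  # None
```

Access rights are thus expressed by who holds which key rather than by a check on the server, which is what lets untrusted storage hosts participate.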

Related work and issues

Distributed file management: the usual assumptions are that data is stable and that interest follows a Zipf distribution. In a social network context, data changes a lot and the distribution of interest is local.
Anonymity: distribution in a DHT leaves fewer traces of the query.
Media storage: the storage should be optimized using novelty.

On-going work

Response-time tests under different network assumptions.

Anonymisation in social-based P2P networks

March 2nd, 2009

Report on the presentation of Fabrice Le Fessant, February 23rd, 2009
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Context

In the context of P2P file-sharing networks, malicious peers may try to log the queries issued on the network in order to build upload and download profiles of other peers. To avoid censorship in particular, one may want to design a network where untrusted peers can contribute to the life of the network without being able to locate either the publisher or the querier. A social-based P2P network naturally fits this requirement: friends are not hidden but are trusted, and they can anonymise the exchanges.

Previous work

There are already some social-based P2P networks, such as the Turtle network. It is close to Gnutella but based on a social network, which means that connections are chosen and trusted. Search is done by flooding, which is quite expensive in bandwidth.
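
To see why flooding is expensive in bandwidth, here is a small Python sketch of TTL-limited flooding over a friend graph that counts one message per transmission. The graph, the TTL and the duplicate-suppression rule (each peer forwards a given query once) are assumptions for illustration, not Turtle's exact protocol.

```python
from collections import deque

def flood(graph, origin, has_file, ttl):
    """Breadth-first flooding: each peer forwards the query once to all its
    friends except the sender, while the TTL lasts.
    Returns (peers holding the file that were reached, messages sent)."""
    seen = {origin}
    hits, messages = set(), 0
    queue = deque([(origin, None, ttl)])
    while queue:
        peer, sender, hops_left = queue.popleft()
        if peer in has_file:
            hits.add(peer)
        if hops_left == 0:
            continue
        for friend in graph[peer]:
            if friend == sender:
                continue
            messages += 1            # every transmission costs bandwidth
            if friend not in seen:   # duplicates are received but not re-forwarded
                seen.add(friend)
                queue.append((friend, peer, hops_left - 1))
    return hits, messages

graph = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}
print(flood(graph, "A", {"D"}, 2))  # ({'D'}, 6)
```

Finding a single file on a four-peer graph with five friendships already costs six messages, and the count grows with the number of edges within the TTL radius. Each peer also only learns the query from a friend, never directly from its originator.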

There are also anti-censorship networks, such as Freenet. It manages small encrypted documents. Search is done by depth-first search, guided by a notion of distance between users. Data is accessed through replication along the back-path (the path the query took, in reverse). Such a network could easily be limited to friends.
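
A rough Python sketch of the Freenet-style search described above: depth-first, trying the neighbour whose identifier is closest to the requested key first, backtracking on failure, and replicating the data on every peer along the back-path. Numeric identifiers, absolute difference as the distance and the hop limit are simplifying assumptions; real Freenet routing differs in many details.

```python
def dfs_lookup(graph, ident, store, start, key, max_hops):
    """graph: peer -> list of neighbours; ident: peer -> numeric identifier;
    store: peer -> {key: data}. Returns the data or None.
    On success, every peer on the path back to `start` caches a copy."""
    visited = set()

    def visit(peer, hops):
        visited.add(peer)
        if key in store[peer]:
            return store[peer][key]
        if hops == max_hops:
            return None
        # Distance-oriented DFS: closest identifier to the key first.
        for nxt in sorted(graph[peer], key=lambda p: abs(ident[p] - key)):
            if nxt in visited:
                continue
            data = visit(nxt, hops + 1)
            if data is not None:
                store[peer][key] = data  # replication on the back-path
                return data
        return None  # dead end: backtrack

    return visit(start, 0)

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"], "E": ["C"]}
ident = {"A": 10, "B": 40, "C": 60, "D": 45, "E": 75}
store = {peer: {} for peer in graph}
store["E"][70] = b"doc"
print(dfs_lookup(graph, ident, store, "A", 70, max_hops=3))  # b'doc'
print(sorted(p for p in store if 70 in store[p]))            # ['A', 'C', 'E']
```

After one successful lookup, the document is also cached at C and A, so later requests stop earlier, while B, which was not on the path, holds nothing.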

GNUnet is another example of an anti-censorship network. Search is done by a limited breadth-first search. It uses a shortcut system that randomly modifies the identifiers on queries for anonymisation. There is also a credit system to avoid flooding. It has been shown that these two optimizations are in fact weaknesses for anonymisation.

Some clues about Orkut

Simulations have been run on a trace of Orkut. They raised interesting questions about the topology of the network:

What is the distribution of node degrees?
What happens to connectivity when nodes are removed?

The answers to these questions depend deeply on how the crawl was performed.
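
Both questions about the Orkut trace can be measured on any crawled graph with a few lines of Python. This sketch computes the degree histogram and the size of the largest connected component after removing a chosen set of nodes (for instance the highest-degree ones); the tiny edge list is hypothetical, not Orkut data.

```python
from collections import Counter, defaultdict

def degree_histogram(edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg.values())  # degree -> number of nodes with that degree

def largest_component(edges, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are gone.
    Nodes left without any edge are not counted."""
    adj = defaultdict(set)
    for u, v in edges:
        if u not in removed and v not in removed:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best

edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]
print(sorted(degree_histogram(edges).items()))  # [(1, 3), (2, 1), (3, 1)]
print(largest_component(edges))                # 5
print(largest_component(edges, {"hub"}))        # 2
```

Removing the single high-degree node shrinks the largest component from 5 to 2, which is the kind of effect that depends heavily on whether the crawl over- or under-sampled such hubs.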

Problem

How should big files be managed?
How can the attacker's strength be specified so as to obtain different theoretical guarantees?
How can a real network be restricted to a sub-network?

Ideas

The load should be balanced between query time and publication time. Most P2P methods do their work at query time, but one could also think of a diffusion process when a resource is published (through subscription to feeds, replication, or materialization of local index tables). Both methods can be mixed. This is the case in structured networks such as DHTs, where a distributed index is materialized and then queried.

Finally, the methods should be optimized depending on file type and file size.
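
A minimal Python sketch of the mixed approach used by structured networks: at publication time, each keyword is hashed onto a responsible peer that stores an index entry; at query time, a single lookup reaches that peer instead of flooding. SHA-1, the consistent-hashing ring and the peer and file names are assumptions for illustration.

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16)

class KeywordIndex:
    def __init__(self, peers):
        self.ring = sorted((h(p), p) for p in peers)  # consistent-hashing ring
        self.index = {p: {} for p in peers}           # per-peer index tables

    def responsible(self, keyword):
        # First peer clockwise from the keyword's hash, wrapping around.
        i = bisect.bisect_left(self.ring, (h(keyword), ""))
        return self.ring[i % len(self.ring)][1]

    def publish(self, owner, filename, keywords):
        # Publication-time work: one index update per keyword.
        for kw in keywords:
            self.index[self.responsible(kw)].setdefault(kw, set()).add((owner, filename))

    def query(self, keyword):
        # Query-time work: a single lookup at the responsible peer.
        return self.index[self.responsible(keyword)].get(keyword, set())

idx = KeywordIndex(["p1", "p2", "p3", "p4"])
idx.publish("alice", "song.ogg", ["jazz", "live"])
print(idx.query("jazz"))  # {('alice', 'song.ogg')}
print(idx.query("rock"))  # set()
```

The cost moves from queries to publications: each publication pays one update per keyword, after which every query is a single directed lookup.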

Incentives for Users of Social Software

January 14th, 2009

Report on the presentation of Panayotis Antoniadis, January 14th, 2009
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Context

The presentation focused on how to understand and model user behavior in P2P systems. The design of incentive mechanisms must indeed take into account not only economics but also social behavior.

Social networks are directly connected to the notion of self-organized communities. P2P systems are becoming social more slowly than centralized systems, either for legal reasons or because they are less open to participation (friend-to-friend networks control their access). Web-based communities are efficient, but some already use P2P for content distribution. Some may argue that this content distribution could be used to manage the whole community, making the web-access layer useless. This layer also leads to privacy and censorship problems, which encourage systems that enable independence. Other reasons to use P2P systems may come from applications that use distribution features. In fact, the benefit is still not entirely obvious: web-based and P2P systems seem complementary. For example, web-based systems could be used to meet people and P2P systems to interact with friends (in a private way).

Challenges

There are two kinds of challenges:

Technical issues: content distribution, information integrity, various privacy/security issues. In general, identity matters in social systems and some data should not be shared
Incentive issues: participation, resource sharing, trust

The presentation focused on the latter.

Example of Wireless Neighborhood communities

Hybrid online communities are both physical and virtual communities. The notion is connected to P2P systems and ad hoc networks. They are used to provide Internet access for everybody without relying on hotspots, but it is also fun to have a private network, and it is a means of organizing things in a neighborhood. There are already localized communities (lifeAt, i-neighbors, peuplade.fr, Facebook neighborhoods, meetup…). These services usually allow users to exchange services and information with their neighbors, but they are web-based (not physical). There are also grassroots wireless network communities (seattle wireless net, awmn). The idea is to bring both components together to create incentives.

About incentives

Economic vision: users share resources through markets or reciprocity (tokens, reputation…). The goal is to design markets such that, when they reach equilibrium, users have the targeted behavior. Modeling the optimization process is possible when utility and cost are known. But there is an information problem, because utility and cost are usually unknown, even to the users themselves, and are therefore hard to predict.

Social vision: even without economic reasons, P2P systems still work, because people make efforts without direct economic incentives. Some reasons are spirit, value/cost ratio, self-efficacy, altruism, or social incentives in general…

In general, incentives cover a wide range from extrinsic to intrinsic motivations: payments, reciprocity, long-term benefit, popularity, status, self-image, sense of efficacy, community vision, interest, fun…

The economic vision can be difficult to apply at the resource layer. For example, FON lets you share part of your Wi-Fi with other Foneros, based on global reciprocity. But resource usage is deeply asymmetric, since fewer people will try to access the Wi-Fi of someone living in an isolated place. There are other models, such as the yellow chair project or the wifi-thank-you site, where incentives are purely social. In general, incentives are not additive: the more you control (extrinsic incentives), the less self-motivated people are (intrinsic incentives).

An interesting idea is to use a cross-layer incentive : sharing low-level resources is rewarded at the level of the community and reciprocally the good members of the community get more low-level resources.
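
One possible reading of the cross-layer idea, sketched in Python under explicit assumptions: each peer's contribution of low-level resources serves as its community reputation, and a shared capacity is then split in proportion to that reputation, with a small guaranteed floor so newcomers are not shut out. The scoring rule, the floor and the numbers are hypothetical; the talk did not specify a mechanism.

```python
def allocate(capacity, contributions, floor=0.1):
    """Split `capacity` among peers. A fraction `floor` of the capacity is
    shared equally; the rest is shared in proportion to each peer's
    contribution (used here as a hypothetical reputation score)."""
    n = len(contributions)
    base = floor * capacity / n
    remaining = capacity - base * n
    total = sum(contributions.values())
    alloc = {}
    for peer, c in contributions.items():
        share = remaining * c / total if total > 0 else remaining / n
        alloc[peer] = base + share
    return alloc

print(allocate(120, {"alice": 30, "bob": 10, "carol": 0}))
# {'alice': 85.0, 'bob': 31.0, 'carol': 4.0}
```

The floor is the design lever: raising it weakens the extrinsic reward for sharing, which is consistent with the observation above that more control is not simply additive with intrinsic motivation.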

Application to social-software design

The key features of social software are:

the general vision or promise
the community outcome
the personal image of the user
the local activity: the user must feel that somebody has seen her profile or interacts with her
what the user sees about the rest of the world
the types of relations and interactions…

On-going work

After studying at the Computer Science Department of the University of Crete and at the Department of Informatics of the Athens University of Economics and Business, Panayotis is now a post-doctoral researcher at the LIP6 Laboratory (Pierre and Marie Curie University, Paris), working on the design of incentive mechanisms for shared network testbeds (like PlanetLab) and virtual communities (on the Internet and wireless networks). Panayotis is collaborating with Benedicte Le Grand, Marcelo Dias De Amorim, Ileana Apostol and Tridib Banerjee. He is a member of the wip project.

Updated 01/20/2009 thanks to P. Antoniadis's helpful comments
