Webdam dissemination

April 2nd, 2010

Serge Abiteboul attended the Datalog 2.0 Conference at Oxford, where he presented a survey of work on AXML and Webdam (Abiteboul10WSOxford).

Ioana Manolescu organized a very interesting panel at the last Conference on Extending Database Technology in Lausanne (held jointly with the International Conference on Database Theory). Serge was a member of the panel (Manolescu10EDBTPanel, Abiteboul10EDBTPanel). Recent Webdam work on probabilistic XML was also presented at ICDT.

Serge is co-organizer of the Dagstuhl workshop Enabling Holistic Approaches to Business Process Lifecycle Management.

It is nice to see that topics that are important for Webdam are gaining popularity:

  • Datalog, which had fallen out of fashion, is striking back.
  • Data-centric workflows are one of the topics of the Dagstuhl workshop.
  • In general, there is renewed interest in distributed data management.

Spring news

April 2nd, 2010

The Webdam team is very happy to welcome Fabian Suchanek as a new member, starting June 1st, 2010. Fabian was previously a visiting researcher at the Search Labs of Microsoft Research.
We should be able to announce other great hires soon…

Webdam at BDA 2010 Summer School

October 22nd, 2009

The thematic 2010 summer school of the BDA (Bases de données avancées) conference, on distributed very large databases, will take place in Les Houches (France) from May 16th to May 21st, 2010, with the support of Webdam.

Webdam will also contribute to the summer school with the following presentations:

  • Web data processing, Pierre Senellart and Philippe Rigaux
  • Logical approach for uniform querying of distributed and heterogeneous data, Marie-Christine Rousset

A logo for Webdam

August 20th, 2009

Serge designed a wonderful logo for Webdam. Here are some versions you may find useful.

  • Webdam logo in pdf
  • Webdam logo in eps
  • Webdam logo in png
  • Small Webdam logo in png
  • White Webdam logo in pdf
  • White Webdam logo in eps
  • White Webdam logo in png
  • Small white Webdam logo in png

Francois Bancilhon visiting Webdam

May 4th, 2009

Francois Bancilhon is visiting Webdam on Monday 4 May 2009. He is currently working on the creation of PACMAN, a cooperative R&D project based in Paris and focused on mobile Internet applications.
He will present the PACMAN project at 2:00pm in the meeting room of building G.

Title: The PACMAN project

Summary: PACMAN is an R&D project submitted to the last call for projects (March-April 2009) of the Systematic and Cap Digital competitiveness clusters.
The participants of the project are Agence France Presse, Alcatel-Lucent, Bearstech, Dexxon, Haploid, INRIA (Arles and Indes projects), LIP6 (SMA and SPR projects), Mandriva, Streamezzo, and the City of Paris.
The project is focused on mobile Internet applications. It considers that the hardware/software platform for mobility is the smartphone (i.e., iPhoneOS, Android, Windows Mobile, Symbian and WebOS).
The objective of the project is to deliver a set of technologies supporting mobile application development, to validate the use of these technologies with a significant set of mobile applications, and to distribute these applications through an online store (PacStore). Technologies include development and runtime tools and middleware. One of the key problems addressed by the project is portability (a single code base for the same application running on different smartphones).
Applications targeted by the project include social applications, training applications, multi-player games and ubiquitous applications.

Short Bio: François Bancilhon is currently working on the creation of PACMAN, a cooperative R&D project based in Paris and focused on mobile Internet applications. Before this, he was the chairman and CEO of Mandriva (formerly Mandrakesoft), one of the world's top Linux publishers. Prior to Mandriva, he founded and/or managed several software startups in France and in the US. Before becoming an entrepreneur, François was a researcher and a university professor, in France and the US, specializing in database technology. François holds an engineering degree from the École des Mines de Paris, a PhD from the University of Michigan and a Doctorate from the University of Paris XI.

Webdam at EDBT Summer School 2009

April 30th, 2009

The first results of Webdam will be presented at the 9th EDBT Summer School (August 31st to September 4th) during the first slot of the programme, dedicated to “First results from the ERC-funded projects”.

In case Serge is not available to present Webdam there (though he is announced as a speaker :-)), Marie-Christine Rousset will give the presentation. As a matter of fact, she will also give a tutorial on “semantic oriented data spaces”.

PeerSoN: a P2P social network

March 10th, 2009

Report on the presentation of Sonja Buchegger, March 9th, 2009
See the PeerSoN web site and slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Context

Ubiquitous computing is a model in which devices and systems collaborate to carry out tasks for the user without the user being aware of it. This paradigm raises privacy problems, since users leave traces everywhere in a virtual world that is integrated with the real one. All of this data could be mined for purposes ranging from advertising to surveillance. This virtual world usually suffers from an inability to forget. These systems also tend to centralize user data in one part of the system. The personal (private), public and commercial spheres collide in this context.

Social networks are another model in which this privacy issue arises, since users store very personal data on these systems. They are usually Web 2.0 services that require an Internet connection. Their main feature is to let users keep in touch with their friends in an ambient way.

Integrating ubiquitous computing and social networks into a ubiquitous, privacy-preserving P2P social network is therefore especially challenging. One of the main reasons to design such a system is that social networks naturally overlap with the real world, which makes ubiquity especially desirable. It also settles most questions of data ownership and prevents the system from dictating terms of use.

Distribution

Social networks and ubiquitous computing are naturally distributed. PeerSoN uses distributed storage of data. To address online availability, it replicates data on friends; the key parameters are chosen from a trace of users that characterizes their temporal and geographical distribution. To address bootstrapping, it also uses storage on random nodes. Peers communicate directly but use a DHT for lookup. In a first version this DHT was built on OpenDHT and PlanetLab, but too many availability problems led to a centralized emulation of the hash table (put/get/remove operations) in the current version. Peers are identified by the hash of a globally unique identifier (such as an email address). When connecting to the DHT, a user registers a user id, the machine's IP address and his data.
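
To make the lookup step concrete, here is a minimal sketch of a centralized stand-in for the DHT that offers only put/get/remove, with peers keyed by the hash of an email address. It is not PeerSoN code: the names `peer_id` and `LookupService`, the choice of SHA-1, and the shape of the registered record are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def peer_id(unique_identifier: str) -> str:
    """Hash a globally unique identifier (here an email address) into a peer id.

    SHA-1 and the lower-casing are assumptions of this sketch.
    """
    normalized = unique_identifier.strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


@dataclass
class LookupService:
    """Hypothetical centralized emulation of the hash table (put/get/remove)."""

    _table: dict = field(default_factory=dict)

    def put(self, key: str, value: dict) -> None:
        # Several records may be registered under the same key.
        self._table.setdefault(key, []).append(value)

    def get(self, key: str) -> list:
        # Return a copy so callers cannot mutate the stored list.
        return list(self._table.get(key, []))

    def remove(self, key: str, value: dict) -> None:
        entries = self._table.get(key, [])
        if value in entries:
            entries.remove(value)
        if not entries:
            self._table.pop(key, None)


# A user registers a user id, a machine IP address and some data on connecting.
lookup = LookupService()
alice = peer_id("alice@example.org")
record = {"ip": "192.0.2.10", "replicas": ["bob@example.org"]}
lookup.put(alice, record)
```

A friend who knows the email address can recompute `peer_id` locally and call `lookup.get` to find where to contact the peer directly; the data exchange itself does not go through the lookup service.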

Direct exchange

To avoid depending on a network connection (and to push ubiquity further), the design should take delay-tolerant networks into account. They are useful for carrying information from friend to friend; asynchronous messaging is an example of such content. It is not clear, however, that distribution will work well this way. They are also useful for storage, since the system should use whatever storage is available nearby.

Access control

There is a trade-off between privacy and search. The user defines what he wants to be searchable. The system emulates fine-grained access control with keys (who can see which part of the profile). This method would also protect against the storage provider. Key management emulates a standard public key infrastructure, and keys may be exchanged by direct contact.

Related work and issues

  • Distributed file management: the usual assumptions are that data is stable and that interests follow Zipf’s distribution. In the SN context, data changes a lot and the distribution of interest is local.
  • Anonymity: distribution in a DHT leaves fewer traces of the query.
  • Media storage: the storage should be optimized using novelty.

On-going work

Response time tests under different assumptions about the network.

Sonja Buchegger visiting Webdam

March 9th, 2009

Sonja Buchegger is visiting Webdam on Monday 9 March 2009. She is a senior research scientist at the Deutsche Telekom Laboratories, Berlin.
She will present PeerSoN, a P2P social network, at 2:00pm in the meeting room of building N.

Title: Peerson: P2P Social Networks

Summary: Online Social Networks like Facebook, MySpace, Xing, etc. have become extremely popular. Yet they have some limitations that we want to overcome for a next generation of social networks: privacy concerns and requirements of Internet connectivity, both of which are due to web-based applications on a central site whose owner has access to all data.
To overcome these limitations, we envision a paradigm shift from client-server to a peer-to-peer infrastructure coupled with encryption so that users keep control of their data and can use the social network also locally, without Internet access. This shift gives rise to many research questions intersecting networking, security, distributed systems and social network analysis, leading to a better understanding of how technology can support social interactions.

Short Bio: Sonja Buchegger is a senior research scientist at the Deutsche Telekom Laboratories, Berlin. In 2005 and 2006, she was a post-doctoral scholar at the University of California at Berkeley, School of Information. She received her Ph.D. in Communication Systems from EPFL, Switzerland, in 2004, a graduate degree in Computer Science in 1999, and undergraduate degrees in Computer Science in 1996 and in Business Administration in 1995 from the University of Klagenfurt, Austria. In 2003 and 2004 she was a research and teaching assistant at EPFL and from 1999 to 2003 she worked at the IBM Zurich Research Laboratory in the Network Technologies Group. Her current research interests are in social, economic, and security aspects of self-organized networks.

Anonymisation in social-based P2P networks

March 2nd, 2009

Report on the presentation of Fabrice Le Fessant, February 23rd, 2009
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Context

In P2P file-sharing networks, a malicious peer may try to log the queries issued on the network in order to build upload and download profiles of other peers. To avoid censorship in particular, one may want to design a network in which untrusted peers can contribute to the life of the network without being able to locate either the publisher or the querier. A social-based P2P network naturally fits this requirement: friends are not hidden but trusted, and they can anonymise the exchanges.

Previous work

There are already some social-based P2P networks, such as the Turtle network. It is close to Gnutella but built on a social network, which means that connections are chosen and trusted. Search is done by flooding, which is quite expensive in bandwidth.

There are also anti-censorship networks, such as Freenet. It manages small encrypted documents. Search is a depth-first search guided by a notion of distance between users. Data is made accessible by replication along the return path. Such a network could easily be restricted to friends.

GNUnet is another example of an anti-censorship network. Search is a limited breadth-first search. It uses a shortcut system that randomly modifies the id on queries for anonymisation. There is also a credit system to avoid flooding. It has been shown that these two optimizations in fact weaken anonymisation.
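
The bandwidth cost of flooding, and the effect of limiting the breadth-first search, can be illustrated with a small sketch. This is not the protocol of any of the networks above: the function `flood_search`, the toy friend graph, and the simplification that a peer forwards the query to all of its friends (including the one it came from) are assumptions made for illustration.

```python
from collections import deque


def flood_search(friends, start, holders, ttl):
    """Flood a query over friend links for at most `ttl` hops.

    Returns the set of file holders reached and the number of messages sent.
    """
    seen = {start}
    found = set()
    messages = 0
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in holders:
            found.add(node)
        if depth == ttl:
            continue  # hop limit reached: do not forward any further
        for friend in friends[node]:
            messages += 1  # every forward costs a message, even a duplicate
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    return found, messages


# Hypothetical friend graph: each peer only talks to peers it trusts.
friends = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}

reached, cost = flood_search(friends, "a", holders={"e"}, ttl=3)
missed, lower_cost = flood_search(friends, "a", holders={"e"}, ttl=2)
```

On this graph the three-hop flood finds the holder at the price of more messages than there are links, while the two-hop search is cheaper but misses it, which is the trade-off a hop limit introduces.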

Some clues about Orkut

Some simulations have been done based on a trace of Orkut. They raised interesting questions about the topology of the network.

  • What is the distribution of node degrees?
  • What happens to connectivity when nodes are removed?

The answers to these questions depend heavily on how the crawl was performed.
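
Both questions can be phrased as simple computations on a crawled graph. The sketch below is only an illustration on a made-up five-node graph, not on the Orkut trace; the function names and the adjacency-set representation are assumptions.

```python
from collections import Counter


def degree_distribution(graph):
    """Map each degree to the number of nodes that have it."""
    return Counter(len(neighbours) for neighbours in graph.values())


def largest_component(graph, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [root]
        size = 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for other in graph[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        best = max(best, size)
    return best


# Hypothetical undirected friendship graph with one highly connected node.
graph = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub", "b"},
    "b": {"hub", "a"},
    "c": {"hub"},
    "d": {"hub"},
}

degrees = degree_distribution(graph)
before = largest_component(graph)
after = largest_component(graph, removed={"hub"})
```

Removing the single high-degree node splits this toy graph into small pieces. A crawl that follows links from a few seeds tends to over-sample such nodes, which is one way the crawling method can bias both measurements.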

Problems

  • How to manage big files?
  • How to specify the power of the attacker in order to obtain different theoretical guarantees?
  • How to restrict the real network to a sub-network?

Ideas

The load should be balanced between query time and publication time. Most P2P methods are query-based, but one could also think of a diffusion process when a resource is published (through subscription to feeds, replication or materialization of local index tables). Both methods could be mixed. This is the case in structured networks such as DHTs, where a distributed index is materialized and queried.

Finally, the methods should be optimized depending on the file type and the file size.

Social Networks APIs

January 26th, 2009

Report on the presentation of Alban Galland, January 26th, 2009
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Some existing APIs

There are different kinds of APIs for social networks: micro-formats, ontologies, query interaction with social sites and application definition for social sites… These different kinds of APIs give different levels of detail on how to represent a social graph and how to query it.

Micro-formats

XFN (XHTML Friends Network) is a micro-format used to tag hyperlinks with predefined friendship concepts. There are plenty of other micro-formats that are also linked to social networks. These micro-formats are easy to use and naturally distributed, but it is hard to get a global view of the network and to query it.
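
As an illustration of how light the micro-format approach is, the sketch below extracts XFN relationships from the `rel` attribute of hyperlinks using Python's standard `html.parser` module. The class name `XFNParser`, the sample page and its URLs are made up, and `XFN_VALUES` lists only a subset of the XFN 1.1 relationship values.

```python
from html.parser import HTMLParser

# A subset of the relationship values defined by XFN 1.1.
XFN_VALUES = {"friend", "acquaintance", "contact", "met", "co-worker",
              "colleague", "kin", "spouse", "me"}


class XFNParser(HTMLParser):
    """Collect (href, [relationship values]) pairs from <a rel="..."> links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        # Keep only the rel tokens that are XFN values (drops e.g. "nofollow").
        relations = [value for value in rel.lower().split() if value in XFN_VALUES]
        if relations and attributes.get("href"):
            self.links.append((attributes["href"], relations))


# Hypothetical page fragment.
page = """
<a href="http://bob.example.org/" rel="friend met">Bob</a>
<a href="http://carol.example.org/" rel="colleague nofollow">Carol</a>
<a href="http://example.org/about">About</a>
"""

parser = XFNParser()
parser.feed(page)
```

Each page yields only the outgoing links of its owner, which is why a global view of the network requires crawling and aggregating many such pages.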

Ontologies

FOAF (Friend Of A Friend) is an ontology in RDF and OWL that describes a social network. In particular, it offers a large number of ways to identify a person. Because ontologies are hard to design, this one is still unstable. It is also hard to get a global view of the network, and descriptions can become overly complex. Nevertheless, some tools such as the Google Social Graph API add a layer to query both XFN and FOAF information on the web as a whole.

Interaction with social sites

OpenSocial is an example of a set of APIs that allow applications to interact with social sites. It is a package of three APIs (JavaScript, REST and RPC). Using one of these APIs, a (viewer, application) pair can query a social site (the container) about the social information of a distinct user (the owner). For example, the CoolApplication application could use my credentials as a logged-in Orkut user to query Orkut about some of my friends. The readable data are the users' profiles (corresponding to the Orkut profile) and their lists of friends. There are also some discovery capabilities over an unstructured table of data. This feature was probably designed for containers that are not owned by Google but would like to use the API. That is, of course, the limit of the approach “design for yourself, then try to convince others”. The container can implement its own security policy to filter what is readable, but there is no way to specify such a policy and no indication of what the default policy should be. The application can also write two kinds of information: a log of activities and some persistent attribute-value pairs. Facebook's FBJS, REST-like and Connect APIs are designed in the same spirit.
These APIs allow some data to be transferred from social sites to applications, which makes the former more useful and the latter usable on more containers. They are nevertheless designed in a centralized way (around a container), and privacy is largely under-specified.

Application specification

FBML is an example of application specification (or at least of rendering a user interface using embedded service calls to Facebook). The principle is quite different from the interaction specification, since FBML is evaluated by Facebook and the data are not transferred to the application server. FBML can access data using FQL or complex tags that wrap FQL queries. FQL is an SQL-like language, and the Facebook social network is described as tables with indexability constraints that restrict the queries. This kind of API limits the expressiveness of applications, but it protects privacy better since it does not transfer any data to the applications.

Some clues about design of SN APIs

To summarize, a good API should be easy to use, distributed, easy to query as a whole and allow data transfer with privacy control.

Such APIs must rely on a model of social networks with:

  • People, who can be identified and authenticated and who have a profile
  • Relationships, which can be of different types
  • Applications, which can be identified and authenticated, are described by code and can write some part of a user's profile.

They must also allow queries with access rights. We believe that a good model of a social network is a distributed knowledge base with access rights. We are currently defining such a model.
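
The three kinds of entities and the access rights can be sketched with a few data classes. This is only a single-node illustration, not the model being defined by the team: every class, field and method name is hypothetical, and both authentication and distribution are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    ident: str
    profile: dict = field(default_factory=dict)
    # Profile attribute -> identifiers allowed to read it (besides the owner).
    readers: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str  # for instance "friend" or "colleague"


@dataclass
class Application:
    ident: str
    writable: set = field(default_factory=set)  # attributes it may write


class SocialBase:
    """Hypothetical in-memory knowledge base with access rights."""

    def __init__(self):
        self.people = {}
        self.relationships = set()

    def add_person(self, person):
        self.people[person.ident] = person

    def relate(self, source, target, kind):
        self.relationships.add(Relationship(source, target, kind))

    def related(self, source, kind):
        return sorted(r.target for r in self.relationships
                      if r.source == source and r.kind == kind)

    def read(self, viewer, owner, attribute):
        person = self.people[owner]
        if viewer == owner or viewer in person.readers.get(attribute, set()):
            return person.profile.get(attribute)
        raise PermissionError(f"{viewer} may not read {attribute} of {owner}")

    def write(self, app, owner, attribute, value):
        if attribute not in app.writable:
            raise PermissionError(f"{app.ident} may not write {attribute}")
        self.people[owner].profile[attribute] = value


base = SocialBase()
base.add_person(Person("alice", {"city": "Paris"}, {"city": {"bob"}}))
base.add_person(Person("bob"))
base.relate("alice", "bob", "friend")
status_app = Application("status-app", writable={"status"})
base.write(status_app, "alice", "status", "at the seminar")
```

Here `base.read("bob", "alice", "city")` succeeds because Bob is listed as a reader of that attribute, while any attribute without an explicit reader list is visible to its owner only; this closed default is itself a design assumption of the sketch.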
