Webdam Project

Brainstorming on Foundations of Web Data Management

September 10th, 2009

The Webdam Workshop on Brainstorming on Foundations of Web Data Management took place on August 28th, 2009 at Télécom ParisTech. It was an occasion to present Webdam's first achievements to a panel of distinguished researchers, each a leading figure in their research field. It was also an occasion to share and compare different visions of how web data management should be founded theoretically.
In particular, we had the following exciting presentations:

Serge Abiteboul, Webdam in brief: Serge presented the main motivations and goals of Webdam. Noting that the management of distributed data on the web is not supported by robust models and theory, he proposed to focus on information residing in autonomous systems, following the direction of AXML. This talk raised an interesting debate with the audience on concurrency control and, more generally, on expectations about Webdam.
Marie-Christine Rousset, Representing and Reasoning on Web Data Semantics, Survey and Challenges: Marie-Christine presented the importance of data semantics to constrain meta-data for web data management, which allows reasoning on knowledge using logic. This talk raised questions on the best kind of logic to use, the limitations of RDF and extensions to numeric properties.
Stefano Ceri, Search Computing: Stefano presented his work on the ERC project Search Computing, mostly focused on data management and query optimization. This talk built natural bridges with Webdam, since the use of rich data would deeply improve the quality of search and process modeling; social networks are also a natural path for promising interaction. This research also raises questions about the link between search and probabilistic databases.
Georg Gottlob, Web Data Extraction - Present and Future: Georg’s talk argued for the need for tools that bridge the gap between unstructured and structured information in order to feed the data management system. He proposed a language for expressing such extraction methods and tools to support it. It raised questions about creating new annotations on Datalog and managing duplicates.
Tova Milo, Querying Past and Future in Web Applications: Tova presented applications that would grow more naturally on top of a rich distributed data management system. In particular, she focused on the need to understand and optimize the interaction with the user, taking past interactions into account. The main challenge that animated the debate was the generalization of the application to a more generic scenario, using in particular a representation of workflows.
Peter Buneman, Provenance in databases and workflow: Peter’s talk demonstrated the importance of where-, how- and why-provenance. It provided some tools and models to use in the presence of complex workflows. This topic is of direct interest for Webdam, since keeping track of provenance is fundamental in such distributed environments.
Dan Suciu, Belief Databases: Dan demonstrated the importance of managing beliefs in distributed data management systems, where each user has a consistent view of the database even if inconsistencies may appear across views. This talk raised an interesting discussion on the representation of belief and the kind of logic to use in such a system.
Val Tannen, Provenance Propagation: Val developed the analysis of the previous speakers about the need for provenance in update and feedback propagation in a web data management system. He proposed an algebraic view of provenance in order to better understand it and obtain general results. The debate focused on summarization of why-provenance and levels of abstraction.
Victor Vianu, Static Analysis of Active XML Systems: AXML is a first model of a web data management system that can support tasks controlled by guards. Victor presented how properties of the system can be expressed in tree-LTL logic and verified, providing theoretical guarantees on the behavior of the system.
Pierre Senellart, Probabilistic XML: Survey and Challenges: Pierre presented how probabilistic XML databases could leverage uncertainty to better represent knowledge in a distributed data management system. He also explained how to reason on such a database. It raised exciting challenges such as continuous probability distributions and the tractability of dependencies.
Luc Segoufin, Links with FoX Project: FoX is a European project which focuses on the safe processing of dynamic data over the Internet. It deals with problems similar to those of Webdam: data modeling and specification, querying, extracting and exchanging XML data, modeling and verification of temporal behavior, and handling incomplete information. The two projects have already produced fruitful collaborations.
Balder ten Cate, Structural Characterizations of Schema Mapping Languages: Balder presented the importance of schema mappings for data integration in a distributed data management system. He proposed a study of schema mapping languages. It raised interesting issues on adapting this model to XML data and on schema mapping optimization.
Serge Abiteboul, Recent works around AXML: In this presentation, Serge introduced an existing application of AXML: the business artifact. This allows representing a workflow in a data-centric way, well suited to highly distributed applications. This raised a large number of questions about interaction between autonomous systems, synchronization, movement of artifacts, monitoring, quality of service, access control…

News Database Theory, Dissemination, Model, Ontologies, P2P, Workshop

Social Networks APIs

January 26th, 2009

Report on the presentation of Alban Galland, January 26th, 2009
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Some existing APIs

There are different kinds of APIs for social networks: micro-formats, ontologies, query interaction with social sites and application definition for social sites… These different kinds of APIs give different levels of detail on how to represent a social graph and how to query it.

Micro-formats

XFN (XHTML Friends Network) is a micro-format which is used to tag hyperlinks with predefined friendship concepts. There are plenty of other micro-formats which are also linked to social networks. These micro-formats are easy to use and naturally distributed, but it is hard to get a global view of the network and to query it.
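To make the idea of tagging hyperlinks concrete, here is a minimal sketch of how a crawler might read XFN annotations from a page. It assumes the annotations are carried in the rel attribute of anchor elements; the XFN_VALUES set is only a subset of the vocabulary, and the function name extract_xfn_links and the example URLs are hypothetical.

```python
from html.parser import HTMLParser

# A subset of the XFN relationship values (the full vocabulary is larger).
XFN_VALUES = {"friend", "acquaintance", "contact", "met",
              "co-worker", "colleague", "kin", "me"}


class XFNLinkParser(HTMLParser):
    """Collect (href, xfn_values) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, not all of them XFN.
        rel_values = (attributes.get("rel") or "").split()
        xfn = sorted(value for value in rel_values if value in XFN_VALUES)
        if xfn and attributes.get("href"):
            self.links.append((attributes["href"], xfn))


def extract_xfn_links(html):
    """Return the XFN-annotated links found in an HTML string."""
    parser = XFNLinkParser()
    parser.feed(html)
    return parser.links


page = """
<a href="http://alice.example.org" rel="friend met">Alice</a>
<a href="http://bob.example.org" rel="co-worker">Bob</a>
<a href="http://example.org/about" rel="nofollow">About</a>
"""
for href, relations in extract_xfn_links(page):
    print(href, relations)
```

Each page only describes the outgoing links of its author, which illustrates why a global view requires crawling and merging many pages.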

Ontologies

FOAF (Friend Of A Friend) is an ontology in RDF and OWL which describes a social network. In particular, it contains a large number of ways to identify a person. Because ontologies are hard to design, this one is still unstable. It is also hard to get a global view of the network, and descriptions can become overly complex. Nevertheless, some tools such as the Google Social Graph API add a layer to query both XFN and FOAF information on the web as a whole.

Interaction with social sites

OpenSocial is an example of a set of APIs which allow applications to interact with social sites. It is a package of three APIs (JavaScript, REST and RPC). Using one of these APIs, a (viewer, application) pair can query a social site (the container) about the social information of a distinct user (the owner). For example, the CoolApplication application could use my credentials as a logged-in user on Orkut to query it about some of my friends. The readable data are the profiles of the users (corresponding to the Orkut profile) and their lists of friends. There are also some discovery capabilities over an unstructured table of data. This feature was probably designed for containers which are not owned by Google but would like to use the API. That is, of course, the limit of the approach “one designs for oneself and then tries to convince others”. The container can implement its own security policy to filter what is readable, but there is no way to specify such a policy and no indication of what the default policy should be. The application can also write two kinds of information: a log of activities and some persistent attribute-value pairs. The FBJS, REST-like and Connect APIs of Facebook are designed in the same spirit.
These APIs allow some transfer of data from social sites to applications, which makes the former more useful and the latter usable on more containers. They are nevertheless designed in a centralized way (with a container), and privacy is largely under-specified.

Application specification

FBML is an example of application specification (or at least of rendering a user interface using embedded service calls to Facebook). The principle is rather different from that of interaction specification, since FBML is evaluated by Facebook and the data are not transferred to the application server. FBML can access data using FQL or complex tags which wrap FQL queries. FQL is an SQL-like language, and the Facebook social network is described as tables with indexability constraints which restrict the queries. This kind of API limits the expressiveness of applications, but it protects privacy better since it does not transfer any data to the applications.

Some clues about design of SN APIs

To summarize, a good API should be easy to use, distributed, easy to query as a whole and allow data transfer with privacy control.

Such APIs must rely on a model of social networks with:

People, who can be identified and authenticated and who have a profile
Relationships, which can be of different types
Applications, which can be identified and authenticated, are described by their code and can write some part of the profile of a user.

They must also allow queries subject to access rights. We believe that a good model of a social network is a distributed knowledge base with access rights. We are currently defining such a model.
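As an illustration only, the three ingredients listed above (people, typed relationships, applications) and the access-controlled queries could be sketched as follows. This is not the model being defined by the project: every class, field and method name here is hypothetical, authentication is left out, and everything lives in a single process rather than being distributed.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    person_id: str
    profile: dict = field(default_factory=dict)
    # Hypothetical access rights: attribute name -> ids allowed to read it.
    readers: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    kind: str  # for example "friend" or "colleague"


@dataclass
class Application:
    app_id: str
    # Profile attributes this application is allowed to write.
    writable_attributes: set = field(default_factory=set)


class SocialNetwork:
    def __init__(self):
        self.people = {}
        self.relationships = set()

    def add_person(self, person):
        self.people[person.person_id] = person

    def relate(self, source, target, kind):
        self.relationships.add(Relationship(source, target, kind))

    def related(self, person_id, kind):
        """Ids reachable from person_id through one relationship of this kind."""
        return {r.target for r in self.relationships
                if r.source == person_id and r.kind == kind}

    def read(self, requester_id, owner_id, attribute):
        """Return a profile attribute only if the requester may read it."""
        owner = self.people[owner_id]
        allowed = owner.readers.get(attribute, set())
        if requester_id != owner_id and requester_id not in allowed:
            raise PermissionError(f"{requester_id} cannot read {attribute}")
        return owner.profile.get(attribute)

    def write(self, application, owner_id, attribute, value):
        """Let an application write only the attributes it was granted."""
        if attribute not in application.writable_attributes:
            raise PermissionError(f"{application.app_id} cannot write {attribute}")
        self.people[owner_id].profile[attribute] = value


network = SocialNetwork()
network.add_person(Person("alice", {"email": "alice@example.org"},
                          {"email": {"bob"}}))
network.add_person(Person("bob"))
network.relate("alice", "bob", "friend")
print(network.read("bob", "alice", "email"))
```

Here the default policy is to deny any read that is not explicitly granted, which is one possible answer to the question of a default that the existing APIs leave open.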

News API, Meetings, Model, Social Networks

Introduction to Social Networks on Web

December 11th, 2008

Report on the presentation of Pierre Senellart, December 11, 2008.
See slides for more details.
Warning: this report reflects the understanding of the post author (Alban Galland) and nothing more.

Typology

Definition: a social content web site is a web site with users, content and implicit or explicit links between users.

This rather broad definition covers blog sites and multimedia content sites as well as sites explicitly based on social networks (SN). Social content web sites are either user-based or content-based. User-based sites may be pure SN (professional, such as LinkedIn; friendship, such as MySpace; or mixed, such as Facebook), blog communities (SkyRock) or dating sites (Meetic). Content-based sites are sites where users can share or annotate content and meet through common interests. They may be catalogs of content (from music, such as Last.fm, to bookmarks, such as delicious), content-sharing sites (pictures on Flickr, videos on YouTube), content-producing sites (Wikipedia, forums, Yahoo! Answers…) or web shops (eBay or Amazon).

Models

The natural model is a graph, directed or undirected, which could be multipartite (users, content, tags …). The links between users could be explicit (bridging links, declaration) or implicit (bonding links, through content).

SN graphs are characterized by:

sparsity
small distances (small-world graphs, the six degrees of separation theory)
high transitivity (clustering: two nodes close to a third one are likely to be close to each other)
a degree distribution that follows a power law
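Sparsity and transitivity can be measured directly on an adjacency structure. The sketch below is an illustration on a toy undirected graph, not something taken from the talk: it computes the edge density (a low value means a sparse graph) and the average local clustering coefficient, one common way to quantify transitivity. The function names and the example graph are assumptions.

```python
from itertools import combinations


def density(adjacency):
    """Fraction of possible undirected edges that are present."""
    node_count = len(adjacency)
    if node_count < 2:
        return 0.0
    edge_count = sum(len(neighbours) for neighbours in adjacency.values()) / 2
    return 2 * edge_count / (node_count * (node_count - 1))


def local_clustering(adjacency, node):
    """Fraction of pairs of neighbours of node that are themselves linked."""
    neighbours = adjacency[node]
    degree = len(neighbours)
    if degree < 2:
        return 0.0
    linked_pairs = sum(1 for u, v in combinations(neighbours, 2)
                       if v in adjacency[u])
    return 2 * linked_pairs / (degree * (degree - 1))


def average_clustering(adjacency):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adjacency, node)
               for node in adjacency) / len(adjacency)


# Toy graph: a triangle a-b-c with one extra node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print("density:", density(graph))
print("average clustering:", average_clustering(graph))
print("degrees:", {node: len(neighbours) for node, neighbours in graph.items()})
```

On a real social graph the density would be far lower than on this four-node example, while the clustering coefficient would stay comparatively high.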

SN are neither random graphs (which are only sparse with small distances) nor random modifications of a regular grid (which are only sparse, with small distances and high transitivity). They are closer to scale-free graphs, built by adding nodes one by one and linking each new node so as to preserve the properties described above.
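One well-known way of building a graph by adding nodes one by one is preferential attachment, where each new node links to existing nodes with probability proportional to their current degree. The report does not name a specific construction, so the sketch below is an assumption chosen for illustration; the function name and parameters are hypothetical. This construction reproduces a heavy-tailed degree distribution, but not necessarily the high transitivity mentioned above.

```python
import random
from collections import Counter


def preferential_attachment_graph(node_count, links_per_node, seed=None):
    """Grow an undirected graph; requires node_count > links_per_node >= 1."""
    rng = random.Random(seed)
    adjacency = {node: set() for node in range(node_count)}
    # Each node appears in this list once per incident edge, so a uniform
    # draw from it picks a node with probability proportional to its degree.
    endpoints = []

    def connect(u, v):
        adjacency[u].add(v)
        adjacency[v].add(u)
        endpoints.extend([u, v])

    # Start from a small fully connected core.
    core_size = links_per_node + 1
    for u in range(core_size):
        for v in range(u + 1, core_size):
            connect(u, v)

    # Add the remaining nodes one by one.
    for new_node in range(core_size, node_count):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))
        for target in targets:
            connect(new_node, target)
    return adjacency


graph = preferential_attachment_graph(node_count=200, links_per_node=2, seed=0)
degree_histogram = Counter(len(neighbours) for neighbours in graph.values())
for degree in sorted(degree_histogram):
    print(degree, degree_histogram[degree])
```

Printing the histogram typically shows many low-degree nodes and a few hubs, which is the qualitative signature of a power-law degree distribution.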

Algorithms

PageRank: this algorithm is used to rank nodes in a graph according to their importance in the graph. It is not helpful on undirected graphs, since the score converges to a value proportional to the degree of the node, but variants exist.
Search for communities: communities can be extracted from the graph using minimum cut/maximum flow algorithms, Markov clustering (MCL) or the removal of high-betweenness edges.
Improving information retrieval (IR): tags can be used to improve semantic search. Recommendation is also a topic of interest, using collaborative filtering (user-based) or item-based recommendation. Finally, IR can be biased by distance in the SN graph.
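The remark about PageRank on undirected graphs can be checked numerically. The sketch below is a plain power iteration, written for illustration and not taken from the talk. It assumes a connected, non-bipartite undirected graph in which every node has at least one neighbour, and it sets the damping factor to 1.0 (a pure random walk); with the usual damping of 0.85 the scores only approximately follow the degrees. The function name and the toy graph are assumptions.

```python
def pagerank(adjacency, damping=0.85, iterations=200):
    """Power-iteration PageRank; every node must have at least one neighbour."""
    node_count = len(adjacency)
    rank = {node: 1.0 / node_count for node in adjacency}
    for _ in range(iterations):
        # Teleportation share, identical for every node.
        new_rank = {node: (1.0 - damping) / node_count for node in adjacency}
        # Each node splits its damped score equally among its neighbours.
        for node, neighbours in adjacency.items():
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Toy undirected graph: a triangle a-b-c with one extra node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
scores = pagerank(graph, damping=1.0)
total_degree = sum(len(neighbours) for neighbours in graph.values())
for node in sorted(graph):
    # With damping 1.0 the two columns should agree closely.
    print(node, round(scores[node], 4), len(graph[node]) / total_degree)
```

Since the ranking carries no more information than the degree in this setting, a degree count is enough on undirected graphs, which is why variants are needed there.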

Conclusion

SN is larger than Facebook!
There are some natural models and some natural research directions on IR, trust …

News Algorithms, Meetings, Model, Social Networks
