WebDam-MoDaS Workshop in Eilat
Joint workshop on Web data management and Crowd data sourcing
Eilat, October 2012
Presentation
This meeting is held jointly by Webdam (in its last year) and MoDaS (in its first). It will bring together members of the two projects and leading international specialists in these topics.
Meeting Topic:
We are being overwhelmed by the masses of information that are available. Typically, pieces of information are noisy: imprecise, incomplete, inconsistent. This may be the case for global information on the public Web as well as for private information in social network systems. We are concerned with combining all the techniques we can to evaluate the quality of information and to improve it. This will typically involve both reasoning in an imprecise environment (as stressed by Webdam) and relying on crowd participation (as advocated by MoDaS). The workshop will bring together the two approaches, with an emphasis on the intersection of the two topics, but will also consider each topic separately so as to bring both groups up to date on both. The workshop will serve both as an assessment for Webdam and as a brainstorming session for MoDaS.
Program chairs: Tova Milo (Tel Aviv University), Serge Abiteboul (INRIA, ENS Cachan)
Brief closing words
The participants
This was a reasonably small workshop (in terms of number of participants). However, the diversity of the talks brought up a number of opportunities for synergy between complementary approaches. The two topics are fascinating and very active. The workshop highlighted the rich interaction between them.
This is a brief conclusion that attempts to summarize some of the discussions during the brainstorming meetings at the workshop. We will ignore a large number of issues that were raised but were already nicely covered in the talks and can be found in their slides. We will focus on a few issues that we felt were more novel and striking.
This brief report is organized as follows. We briefly consider Web data management, then crowd data sourcing. Finally we discuss issues that relate to both together.
Web data management
In the spirit of the evolution of the Webdam project over the last couple of years, the focus of the workshop on that topic was on personal and social data management. This puts more emphasis on imprecision, inconsistencies, beliefs, opinions, etc. It also brings up, in this setting, the issue of ontology alignment (since there is no reason all individuals should use the same ontology).
Notably through the work of Webdam, it is now understood that approaches to these problems require combining distributed data management, knowledge management (deductive databases), and probabilistic data management. This clearly requires more work, as well as investigation of issues that are still largely unexplored, such as privacy.
Crowd data sourcing
For crowd data sourcing, the workshop highlighted the richness of applications, notably in the sciences. Crowd data sourcing is a more recent topic and this workshop helped clarify some of its aspects:
By essence, crowd data sourcing leads to an open-world semantics where facts that are not yet known to be true may be stated by individuals.
The issue was raised of the distinction between facts and opinions. It is sometimes possible to approach this issue using probabilities: a fact has a probability close to 1 whereas an opinion does not. This clearly relates to beliefs (how many people think A holds) and trust (how trustworthy these people are).
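To make the last point concrete, here is a minimal sketch of one possible way to combine beliefs and trust into a probability-like score. It is an illustration under our own assumptions, not a method presented at the workshop: the function names, the trust values in [0, 1], and the 0.95 threshold are all hypothetical.

```python
def weighted_belief(answers):
    """Trust-weighted share of contributors who think statement A holds.

    `answers` is a list of (holds, trust) pairs, where `holds` is a bool
    and `trust` is a hypothetical trust score in [0, 1].
    Returns None when there is no trusted answer at all.
    """
    total_trust = sum(trust for _, trust in answers)
    if total_trust == 0:
        return None
    supporting = sum(trust for holds, trust in answers if holds)
    return supporting / total_trust


def classify(score, threshold=0.95):
    """Label a statement as fact-like or opinion-like (assumed threshold).

    A score close to 1 (or close to 0, meaning the negation is almost
    surely true) is treated as a fact; anything in between as an opinion.
    """
    if score is None:
        return "unknown"
    if score >= threshold or score <= 1 - threshold:
        return "fact"
    return "opinion"
```

For instance, two fully trusted contributors who agree yield a fact-like score, whereas an even split between equally trusted contributors yields an opinion.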
Towards a world of knowledge for machines and humans
To see a simple example, suppose we want to contact a friend. A system may try to help locate this friend using the information available on the Web (social network systems, personal agendas, etc.). This network of systems may try to reason collectively to find an answer. A system participating in this task may also target individuals with questions in a crowdsourcing style, possibly asking them to validate some beliefs. Similarly, information can be pushed to the user as a result of a collective effort by machines and people. We can thus envision a world where machines and humans collaborate to process information. This was the idea underlying the cooperation between MoDaS and Webdam.
An essential difference between the two projects is distribution. Distribution is of the essence in Web data management: the data sources are distributed (and autonomous). For now, the crowd data sourcing work in MoDaS seems to favor a centralized setting, but there is no fundamental reason for that.
On the other hand, we kept running into issues that were found to be common to both topics during the workshop:
- Imprecision plays a critical role in both cases with issues such as uncertainty, inconsistencies, trust and belief. In particular, probabilities play an important role for both.
- Scaling, of course, in the number of machines or of humans.
- Intentionality/Open world.
In both cases, data and knowledge exist somewhere and have to be discovered:
- By exchanging knowledge between systems (Webdam)
- By asking individuals (MoDaS)
- By integrating/interrogating knowledge/data bases (many talks)
- By deciding how to allocate tasks (both projects).
All these aspects may be seen as sketching the contours of a wide research area that encompasses both Webdam and MoDaS.