Webdam Project

Julia Stoyanovich and Gerome Miklau are going to give a talk at Télécom ParisTech on December 5th

November 15th, 2011

Webdam is very happy to welcome you to Télécom ParisTech on December 5th for two talks organized by Pierre Senellart.

The talks will take place at Télécom ParisTech, 46 rue Barrault, 75013 Paris, in room C017 in the basement.

Schedule:

14:00 Gerome Miklau
15:00 Julia Stoyanovich

Gerome Miklau talk abstract

Using Inference to Improve the Accuracy of Differentially-Private Output

Differential privacy is a rigorous privacy standard that protects against powerful adversaries, offers precise accuracy guarantees, and has been successfully applied to a range of data analysis tasks. When differential privacy is satisfied, participants in a dataset enjoy the compelling assurance that information released about the dataset is virtually indistinguishable whether or not their personal data is included.

Differential privacy is achieved by introducing randomness into query answers, and a major goal of research in this area is to devise methods that offer the best accuracy for a fixed level of privacy. The original algorithm for achieving differential privacy, commonly called the Laplace mechanism, returns the true answer after the addition of random noise drawn from a Laplace distribution. If an analyst requires only the answer to a single query about the database, then a version of the Laplace mechanism is known to offer optimal accuracy. But the Laplace mechanism can be severely suboptimal when a set of correlated queries is submitted, and despite much recent work, optimal strategies for answering a collection of correlated queries are not known.
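As a rough illustration of the Laplace mechanism described above, here is a minimal Python sketch. It is not taken from the talk: the function names, the `sensitivity` and `epsilon` parameters, and the choice of noise scale `sensitivity / epsilon` follow the standard textbook formulation and are assumptions here.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise with the given scale.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Return the true answer plus Laplace noise (hypothetical helper).

    Assumption: `sensitivity` is the largest change in the query answer
    caused by adding or removing one participant, and `epsilon` is the
    privacy parameter. Smaller epsilon means more noise.
    """
    return true_answer + laplace_noise(sensitivity / epsilon, rng)


# Example: a counting query (sensitivity 1) whose true answer is 100.
rng = random.Random(0)
noisy_count = laplace_mechanism(100, sensitivity=1.0, epsilon=0.5, rng=rng)
```

The noise is independent of the data, so each released answer is unbiased, but the error grows as epsilon shrinks.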

After reviewing the basic principles of differential privacy, I will describe two examples of how query constraints and statistical inference can be used to construct more accurate differentially-private algorithms, with no privacy penalty. The first example comes from our recent work investigating the properties of a social network that can be studied without threatening the privacy of individuals and their connections. I will show that the degree distribution of a network can be estimated privately and accurately by asking a special query for which constraints are known to hold, and then exploiting the constraints to infer a more accurate final result. The second example comes from the analysis of more typical tabular data (such as census or medical data). When answering a set of predicate counting queries, I will show that correlations amongst the queries can be exploited to significantly reduce error introduced by the privacy mechanism.
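The abstract does not spell out the special query or its constraints. One plausible reading, assumed here purely for illustration, is that the query returns the degree sequence in sorted order, so the true answer is known to be nondecreasing. The sketch below adds Laplace noise to a sorted degree sequence and then fits the closest nondecreasing sequence with the pool-adjacent-violators algorithm. The sensitivity value of 2 and all function names are assumptions, not details from the talk.

```python
import random


def laplace_noise(scale, rng):
    """Zero-mean Laplace sample as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def isotonic_fit(values):
    """Least-squares nondecreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def private_sorted_degrees(degrees, epsilon, rng, sensitivity=2.0):
    """Noisy sorted degree sequence plus its constrained estimate.

    Assumption: the sorted degree sequence has sensitivity 2.
    The fit uses only the noisy output, so it is pure post-processing.
    """
    noisy = [d + laplace_noise(sensitivity / epsilon, rng) for d in sorted(degrees)]
    return noisy, isotonic_fit(noisy)


def squared_error(estimate, truth):
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))


degrees = [1, 1, 2, 2, 2, 3, 5, 8]  # made-up toy graph degrees
noisy, fitted = private_sorted_degrees(degrees, epsilon=0.5, rng=random.Random(1))
```

Because the true sorted sequence already satisfies the constraint, projecting the noisy answer onto the set of nondecreasing sequences cannot increase its squared error, and it costs no additional privacy.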

Julia Stoyanovich talk abstract

Ranked Exploration of Large Structured Datasets

In online applications such as Yahoo! Personals and Trulia.com, users define structured profiles in order to find potentially interesting matches. Typically, profiles are evaluated against large datasets and produce thousands of ranked matches. Highly ranked results tend to be homogeneous, which hinders data exploration. For example, a dating website user who is looking for a partner between 20 and 40 years old, and who sorts the matches by income from higher to lower, will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry, before seeing any matches in different age groups and walks of life. An alternative to presenting results in a ranked list is to find clusters, identified by a combination of attributes that correlate with rank, and that allow for richer exploration of the result set.

In the first part of this talk I will propose a novel data exploration paradigm, termed rank-aware interval-based clustering. I will formally define the problem and, to solve it, will propose a novel measure of locality, together with a family of clustering quality measures appropriate for this application scenario. These ingredients may be used by a variety of clustering algorithms, and I will present BARAC, a particular subspace-clustering algorithm that enables rank-aware interval-based clustering in domains with heterogeneous attributes. I will present results of a large-scale user study that validates the effectiveness of this approach. I will also demonstrate scalability with an extensive performance evaluation on datasets from Yahoo! Personals, a leading online dating site, and on restaurant data from Yahoo! Local.

In the second part of this talk I will describe on-going work on data exploration for datasets in which multiple alternative rankings are defined over the items, and where each ranking orders only a subset of the items. Such datasets arise naturally in a variety of application domains, including social (e.g., restaurant and movie rating sites) and biological (e.g., analysis of genetic data). In these datasets there is often a need to aggregate multiple rankings, computing, e.g., a single ranked list of differentially expressed genes across a variety of experimental conditions, or of restaurants that are well-liked by one’s friends. I will argue that blindly aggregating multiple rankings into a single list may lead to an uninformative result, because it may not fully leverage opinions of different, possibly disagreeing, groups of judges. I will describe a framework that robustly identifies ranked agreement, i.e., it finds groups of judges whose rankings can be meaningfully aggregated. Finally, I will show how structured attributes of items and of judges can be used to guide the process of identifying ranked agreement, and to describe the resulting consensus rankings to a user.
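To make the point about blind aggregation concrete, here is a small Python sketch. It uses a Borda count, which is a stand-in chosen for illustration and not the framework from the talk. The scoring convention for partial rankings (items a judge did not rank receive no points from that judge) is likewise an assumption.

```python
from collections import defaultdict


def borda_scores(rankings):
    """Aggregate rankings with a simple Borda count.

    Each ranking lists items from best to worst and may cover only a
    subset of the items. In a ranking of k items, the item at position
    i (0-based) earns k - 1 - i points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += k - 1 - position
    return dict(scores)


# Two hypothetical groups of judges with opposite tastes.
group_a = [["a", "b", "c"], ["a", "b", "c"]]
group_b = [["c", "b", "a"], ["c", "b", "a"]]

combined = borda_scores(group_a + group_b)  # every item gets the same score
within_a = borda_scores(group_a)            # a clear order: a, then b, then c
```

Aggregating all four judges yields a three-way tie, while each group on its own has a clear consensus. That is the kind of situation in which first identifying groups of agreeing judges is more informative than a single global list.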

Bio:
Julia Stoyanovich is a Visiting Scholar at the University of Pennsylvania. Julia holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics and Statistics from the University of Massachusetts at Amherst. After receiving her B.S., Julia went on to work for two start-ups and one real company in New York City, where she interacted with, and was puzzled by, a variety of massive datasets. Julia’s research focuses on modeling and exploring large datasets in the presence of rich semantic and statistical structure. She has recently worked on personalized search and ranking in social content sites, rank-aware clustering in large structured datasets that focus on dating and restaurant reviews, data exploration in repositories of biological objects as diverse as scientific publications, functional genomics experiments and scientific workflows, and representation and inference in large datasets with missing values.

Events Algorithms, Anonymization, Meetings, Members, Visitors

PhD defense of Alban Galland

September 19th, 2011

Alban Galland will defend his PhD on September 28th, 2011 at 15:00, in room 455 of PCRI at Gif-sur-Yvette (Plateau de Saclay).

Title: Distributed Data Management with Access Control
– Social Networks and Data of the Web

Abstract: The amount of information on the Web is growing very rapidly. Users as well as companies bring data to the network and are willing to share it with others. They quickly reach a situation where their information is hosted on many machines they own and on a large number of autonomous systems where they have accounts. Managing all this information is rapidly moving beyond human expertise. We introduce WebdamExchange, a novel distributed knowledge-base model that includes logical statements for specifying information, access control, secrets, distribution, and knowledge about other peers. These statements can be communicated, replicated, queried, and updated, while keeping track of time and provenance. The resulting knowledge guides distributed data management. The WebdamExchange model is based on WebdamLog, a new rule-based language for distributed data management that combines, in a formal setting, deductive rules as in Datalog with negation (to specify intensional data) and active rules as in Datalog¬¬ (for updates and communications). The model provides a novel setting with a strong emphasis on dynamicity and interactions (in a Web 2.0 style). Because the model is powerful, it provides a clean basis for the specification of complex distributed applications. Because it is simple, it provides a formal framework for studying many facets of the problem such as distribution, concurrency, and expressivity in the context of distributed autonomous peers. We also discuss an implementation of a proof-of-concept system that handles all the components of the knowledge base and experiments with a lighter system designed for smartphones. We believe that these contributions are a good foundation to overcome the problems of Web data management, in particular with respect to access control.

News defense, Members, P2P, Privacy, Report

Webdam meeting

March 14th, 2011

Webdam gathered its members at Telecom ParisTech in March 2011. The program and public version of the slides presented are available for consultation.

Events Meetings, Members, Workshop

Daniel and Yael joining Webdam

January 21st, 2011

Webdam is very happy to announce that Daniel Deutch and Yael Amsterdamer are joining the project at the beginning of February.

Daniel did his PhD with Tova Milo at Tel Aviv U., then a postdoc at the Computer & Information Science Department of the University of Pennsylvania, Philadelphia.

Yael is starting a PhD, also with Tova Milo.

News Members

Bruno Marnette joining Webdam

May 25th, 2010

Webdam is very happy that Bruno Marnette, who previously visited us for 2 months, has joined the team as a postdoc.

Bruno Marnette was doing his PhD at Oxford University under the direction of Georg Gottlob.

News Members

Fabian & Yannis & Summer School

May 19th, 2010

Fabian Suchanek, who recently joined Webdam, received an Honorable Mention at the ACM SIGMOD Dissertation Awards.
Yannis Papakonstantinou will be visiting Webdam for 3 weeks in July.
The BDA summer school co-organized by Webdam is happening now in Les Houches. An important part of the Webdam textbook is being tested.

News Members, SummerSchool, Visitors

Meghyn Bienvenu joining Webdam

April 20th, 2010

Webdam is very happy to announce that Meghyn Bienvenu is joining the project.

She is currently at the University of Bremen. She is ranked 1st at CNRS as a candidate for LRI, University of Orsay.

News Members

Victor Vianu officially joining Webdam

April 9th, 2010

The Webdam team is very happy to announce that Victor Vianu, who was already collaborating actively with us, will join us for 14 months, starting on the first of July. Victor will first be on sabbatical and then on leave from UCSD, where he is a professor.

This great hire, like that of Fabian, shows that web data management, the topic of the project, is an attractive research area.

News Members

Spring news

April 2nd, 2010

The Webdam team is very happy to welcome Fabian Suchanek as a new member, starting June 1st, 2010. Fabian was previously a visiting researcher at the Search Labs of Microsoft Research.
We should be able to announce other great hires soon…

News Members

Members update

September 8th, 2009

A number of researchers joined Webdam on September 1st

Yannis Katsis, PhD from UCSD
Amélie Marian, Prof. at Rutgers, for September and December
Philippe Rigaux, Prof. at U. Dauphine
Marie-Christine Rousset, Prof. at U. Grenoble – part time

One extended his stay for a year

Evgeny Kharlamov, PhD student at Bolzano

Two are leaving us

Bogdan Marinoiu who finished a PhD and is joining SAP/BO
Bruno Marnette, PhD student at Oxford, after a 2-month internship

A new assistant joined the project

Isabelle Biercewicz

We thank Marie Domingues for her help during the first year of Webdam.

News Members
