Anonymisation in social-based P2P networks
Report on the presentation of Fabrice Le Fessant, February 23th, 2009
See slides for more details.
Warning : this report outlines the understanding of the post author (Alban Galland) and nothing more.
In a context of P2P file sharing networks, some malicious peer may try to keep a log of the queries issued on the network in order to build upload and download profiles of other peers. To avoid censorship in particular, one may want to design a network where non-trusted peers may contribute to the life of the network without being able to locate publisher neither querier. A social-based P2P network naturally fits this requirement : friends are not hidden but trusted and they can anonymise the exchanges.
There is already some social based P2P networks, such as the turtle network. It is close to gnutella but based on social network, which means that connexions are chosen and trusted. The search is done by flooding, which is quiet expensive in bandwidth.
There is also some anti-censorship networks, such as freenet. It manages small encrypted documents. The search is done by depth-first search, oriented by a notion of distance between users. The data is accessed by replication on the back-path. Such a network could be easily limited to friends.
Gnunet is another example of anti-censorship networks. The search is done by a limited breadth-first search. It use a shortcut system to randomly modify the id on the queries for the anonymisation. There is also a credit system to avoid flooding. It has been shown that these two optimizations are indeed a weakness for the anonymisation.
Some clues about Orkut
Some simulations have been done based on a trace of Orkut. They raised interesting questions about the topology of the network.
- What is the distribution of the nodes degrees?
- What happen for the connectivity when removing nodes?
The answers of these questions deeply depend of how the crawl have been made.
- How to manage big files?
- How to specify the level of the attacker to have different theoretical guaranties?
- How to restrict real network to sub-network?
The load should be balanced between query time and publication time. Most of the P2P methods are based on the query, but one could also think of diffusion process when a resource is published (through subscription to feeds, replication or local index tables materialization). Both methods could be mixed. It is the case in structured networks such as DHT where a distributed index is materialized and queried.
Finally, the methods should be optimized depending of the file type and the file size.