MIT Develops A Faster Way To Preserve Online Privacy

Privacy is a fundamental right that most people cherish. However, while browsing the internet, people often reveal personal information to various online databases that third-party companies could access. When an individual searches for medical symptoms online, the search engine could reveal their health conditions to Google, online medical databases such as WebMD, and even the advertisers associated with these platforms.

Researchers have been developing techniques that allow users to retrieve information from databases privately for decades. However, these techniques have remained slow and impractical. But now, a team of researchers from MIT has developed a scheme for private information retrieval that is about 30 times faster than other comparable methods, making it an efficient solution to preserve online privacy.

To Give Users Control Over Their Own Data

Their scheme allows users to search for information on a database without revealing their query to the server. This could enable private communication by preventing messaging apps from knowing what users are saying or who they are talking to. The technique could also fetch relevant online ads without advertising servers learning a user’s interests.

This work is really about giving users back some control over their own data. In the long run, we’d like browsing the web to be as private as browsing a library. This work doesn’t achieve that yet, but it starts building the tools to let us do this sort of thing quickly and efficiently in practice.

Alexandra Henzinger – a computer science graduate student and lead author of the paper introducing the technique.

The team includes Matthew Hong, an MIT computer science graduate student; Henry Corrigan-Gibbs, the Douglas Ross Career Development Professor of Software Technology in the MIT Department of Electrical Engineering and Computer Science (EECS) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Sarah Meiklejohn, a professor in cryptography and security at University College London and a staff research scientist at Google; and senior author Vinod Vaikuntanathan, an EECS professor and principal investigator in CSAIL. The research will be presented at the 2023 USENIX Security Symposium.

Earlier Techniques

The first schemes for private information retrieval were developed in the 1990s, partly by researchers at MIT. These techniques enable a user to communicate with a remote server that holds a database and read records from that database without the server knowing what the user is reading.

However, to preserve privacy, these techniques force the server to touch every single item in the database, making it impossible to tell which entry a user is searching for. But touching every item in a database slows down the query process, making the scheme impractical.

Simple PIR

To speed up the process, the MIT researchers developed a protocol called Simple PIR, in which the server performs much of the underlying cryptographic work in advance before a client even sends a query. This preprocessing step produces a data structure that holds compressed information about the database contents, which the client downloads before sending a query. The data structure acts as a hint for the client about what is in the database.

According to Henzinger, “Once the client has this hint, it can make an unbounded number of queries, and these queries are going to be much smaller in both the size of the messages you are sending and the work that you need the server to do. This is what makes Simple PIR so much faster.”

However, the size of the hint can be relatively large, making it difficult to implement the technique on real-world devices. For instance, to query a 1-gigabyte database, the client would need to download a 124-megabyte hint, driving up communication costs.

Double PIR

To reduce the size of the hint, the researchers developed a second technique called Double PIR, which involves running the Simple PIR scheme twice. The new method is nearly the fastest possible scheme for PIR and only requires a single server, making it simpler than many top-performing techniques that require two servers.

Double PIR can preserve privacy while running at about 10 gigabytes per second, compared with the previous throughput limit of about 300 megabytes per second. The researchers said their approach is “particularly appealing” because the same hint can be used an unlimited number of times, rendering the cost of computing the hint insignificant in a typical scenario where the same database is accessed many times.

Source: MIT News, Adam Zewe