The '20 Questions' Problem with the SAFE Act

If the SAFE Act passes, FBI Agents could craft sequential queries to access content without a warrant. A simple “U. S. person query cap amendment” can fix this.

Introduction

Congress is reviewing whether and how to renew part of the permission it retroactively gave to the administration to conduct warrantless surveillance on Americans, in the form of Title VII of the FISA Amendments Act of 2008. As of the date of writing, the current deadline for that Title’s expiration is April 19, 2024.

Under this Act, NSA gathers, without a warrant, a vast “foreign intelligence information” database by “targeting” over 200,000 foreign nationals, who need not have engaged in any crime in order to be targeted. By design, this database includes the communications of the Americans who communicate with those foreign nationals.

The FBI has a query interface for this database that mixes content and metadata selectors, and permits agents to view the content of matching communications with a US person endpoint (if the other endpoint is a previously targeted foreign party located outside of the US) without a warrant.¹ The FBI has queried this database for information on U.S. persons over four million times in the last three years. The secret Foreign Intelligence Surveillance Court reviews the agencies’ targeting, minimization and querying procedures, receives reports of abuses, and certifies annually that the procedures – and therefore the surveillance – are constitutional, whether the abuses get remedied or not.

This “backdoor search loophole” blows a hole in the Fourth Amendment.

In response, pro-civil liberties legislators in both parties have proposed, in two bills, a change that would require a warrant by default before the FBI is able to query this “702 database” for information on U.S. persons. Congress deadlocked on the issue, trapped between the administration, the agencies and the Intelligence Committees on one side, who would prefer to see warrantless surveillance continue without meaningful reforms; and, on the other side, progressives, the Freedom Caucus, and the House Judiciary Committee, who have tried to enable floor votes on this and other key surveillance reforms.

Now, in an attempt at compromise, the Chair of the Senate Judiciary Committee (Dick Durbin, D-IL) and the Ranking Member of Judiciary (Mike Lee, R-UT) have introduced the “SAFE Act.” The bill contains a new proposed warrant requirement that would differ from the warrant requirements that reformers previously proposed. It is Restore the Fourth’s view that the agencies’ current practices under Title VII of FISA do violate the Fourth Amendment. Politically, civil society’s best chance in the near future to close this loophole in law enforcement’s access to the fruits of mass surveillance does rest with the SAFE Act, which is why we have endorsed it. However, this article seeks to inform legislators, journalists and civil liberties folks, by discussing a key problem with the SAFE Act’s warrant requirement as it stands.

The SAFE Act’s “Twenty Questions Problem” with Queries

The SAFE Act’s warrant language requires agents to get a warrant only in order to view the full actual content of the U. S. person’s communications queried. Agents would be able to query using selectors that are associated with U.S. persons in combination with any other query parameters, and determine whether that query returns a response or responses – without needing to get a warrant.

This language is flawed. The fundamental issue is that the number of queries an agent can make, with respect to a U. S. person and without getting a warrant, is not capped at all. With enough content-sensitive queries returning the number of matches, the non-contents output alone is enough to make inferences about the content of communications, without obtaining the warrant that the SAFE Act envisions would be required of FBI agents who want to learn the content of a message. Decades of academic literature, well-known to the intelligence community, provide optimal strategies for minimizing the number of queries required to ascertain content information. The FBI could therefore, with only a little trouble, skirt the ‘warrant requirement’ for any U. S. person, and obtain key details about content without having to trigger the SAFE Act’s requirements for ‘covered queries.’ We’re dubbing this the “Twenty Questions problem.”

This drafting problem could have been easily avoided if there had been more people in the room well versed in computer science research, who could identify ahead of time the unintended cybersecurity and civil liberties consequences of tech-related bills. The intelligence agencies hire a ton of people like this, which is why we’re confident that they’re both aware of this potential issue, and capable of exploiting it. As former NSA Director Hayden put it, the intelligence community feels it to be a duty to “play to the edge” of what the authorizing language in statute allows. So we need help from our allies to draft and pass an amendment that caps these queries, so that all U. S. persons can truly be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures.

Understanding 702 Queries Through Game Theory

Game Theory is the branch of mathematics, computer science and related fields that models the behavior of players in a multiparty contest (or collaboration), and the optimal strategies that players can use to win or otherwise optimize their measure of success. In this particular game, to simplify, the law enforcement and intelligence community (“LE/IC”) strives to retain the ability to access as much data collected via mass surveillance as possible, without pesky friction like warrant restrictions required by the US Constitution. Civil liberties organizers strive to implement controls (such as legislation) that ensure warrant requirements are respected.

The popular game “Twenty Questions” is played as a contest between a guesser and the possessor of a secret noun. The guesser attempts to identify the secret in as few yes/no questions as possible, and in no more than 20, in order to win. The possessor of the secret must answer honestly and promptly each time.

If the reader has played the game before, they may be familiar with a popular and efficient strategy employed by experienced guessers: starting with an extremely broad query and narrowing the focus. Optimally, the guesser proceeds each time with a question that divides the set of remaining possible nouns roughly in half. This strategy is known to computer scientists as “binary search.” A typical starting sequence in the game is:

Is it tangible/physical?
(if yes) Is it or has it ever been alive?
(if yes) Is it an animal/human?
etc.

This strategy is extremely efficient. It is also so reproducible that it can not only be done by human children, but was also implemented successfully by 2003 in a commercially available children’s toy.
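The halving strategy described above can be sketched in a few lines of Python. This is a minimal illustration rather than a model of any real system: the function names, the list of candidate nouns and the secret are all hypothetical, and the oracle is assumed to answer honestly, as the rules of the game require.

```python
def questions_needed(num_candidates: int) -> int:
    """Worst-case number of yes/no questions needed to single out one
    candidate by halving, i.e. ceil(log2(num_candidates))."""
    return (num_candidates - 1).bit_length()


def guess_by_halving(candidates, is_in_group):
    """Identify the secret using only yes/no answers.

    candidates  -- list of every possible secret (hypothetical)
    is_in_group -- oracle answering "is the secret one of these?"
    Returns (secret, number_of_questions_asked).
    """
    remaining = list(candidates)
    asked = 0
    while len(remaining) > 1:
        half = remaining[: len(remaining) // 2]
        asked += 1
        # Each answer discards roughly half of the remaining candidates.
        remaining = half if is_in_group(half) else remaining[len(half):]
    return remaining[0], asked


# Hypothetical example: 1,000 possible nouns, one of them secret.
nouns = [f"noun-{i}" for i in range(1000)]
secret = "noun-737"
found, asked = guess_by_halving(nouns, lambda group: secret in group)
print(found, asked, questions_needed(len(nouns)))
print(questions_needed(2**20))  # candidates separable with 20 questions
```

Twenty honest yes/no answers are enough to separate 2^20 = 1,048,576 candidates, which is why the game is winnable so often.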

In a more complex game, where the aim is to discover any substantial substring of (or even the topic of) a body of text (communication content), the principle of cleverly crafting successive yes/no queries remains the same. A wide body of mathematical and computer science literature dating back to the 2000s (see References, below) provides strategies. These allow for a human or computer-assisted guesser to reach a precise conclusion about text content most of the time, in surprisingly few iterations.
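One family of such strategies is adaptive group testing: ask whether the text contains any word from a group, discard the whole group on a "no", and split it on a "yes". The sketch below assumes a hypothetical oracle, `any_match`, that answers only yes or no. We do not know whether the real query interface supports "any of these terms" selectors, so treat that as an assumption. Without it, a guesser would have to test terms one at a time. The vocabulary and hidden words are invented for illustration.

```python
def find_present_words(vocabulary, any_match):
    """Return (words present in the hidden text, yes/no queries used).

    vocabulary -- candidate words the guesser cares about (hypothetical)
    any_match  -- assumed oracle: True if the hidden text contains at
                  least one word from the group it is given
    """
    queries = 0
    present = []
    stack = [list(vocabulary)]
    while stack:
        group = stack.pop()
        queries += 1
        if not any_match(group):
            continue  # a single "no" rules out the whole group
        if len(group) == 1:
            present.append(group[0])
            continue
        mid = len(group) // 2
        stack.append(group[mid:])
        stack.append(group[:mid])
    return present, queries


# Hypothetical example: 1,024 candidate words, 3 of them in the message.
vocabulary = [f"word{i:04d}" for i in range(1024)]
hidden_words = {"word0007", "word0300", "word0999"}
present, queries = find_present_words(
    vocabulary, lambda group: any(w in hidden_words for w in group)
)
print(sorted(present), queries, "queries instead of", len(vocabulary))
```

The point of the sketch is the ratio: when only a few candidate words are actually present, the guesser needs far fewer queries than there are candidate words.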

So, let’s define a game between two parties. Alex is a person with Fourth Amendment protections (“U. S. person”) who has communicated in text with a foreigner targeted under Section 702 (“target”). His antagonist is Sam, an FBI agent who wants a promotion and sticks to rules, but doesn’t care much for constitutional intent.

Let’s assume Sam has already executed a query that has returned information that one matching communication exists in the database, one endpoint of which is Alex. Let’s also assume Sam is curious, but not yet confident enough to pursue a judicial warrant. Sam’s goal is to find out as much of the message contents as possible.

Sam wins if she discovers more than a threshold amount (c%) of the message content only by making successive queries that return the number of matches and each of those messages’ metadata but not content. (She may believe this will help her to convince a judge to issue a warrant that unminimizes the rest of the message content, or to secure a conviction with or without the full message content, thereby gaining a promotion.)

Alex wins the game if his privacy remains protected in line with the best traditions of Fourth Amendment case law, by the content of his communication remaining hidden and him remaining unprosecuted for its contents unless a probable cause warrant is obtained.

Due to the nature of NSA mass surveillance and data retention, Sam is the only active player. Alex, the U. S. person, doesn’t even know queries are happening, and may never know even if he is prosecuted. If he wants to litigate the constitutionality of this collection or querying, he won’t be able to prove it happened, because all the proof is in the government’s hands, and they’re under no obligation to disclose it. Even if the government accidentally leaks that Sam queried Alex, and that their prosecution of Alex rested on that query, Alex will still likely be barred from litigating, because of the state secrets privilege. Under the SAFE Act, records of U. S. person queries will only be kept if they are “covered queries” – i.e., the kind of queries that trigger a warrant by revealing (the whole of) the communication’s content, so even if discovery were permitted, there would be no records to unearth.

It’s clear, from a game-theoretic perspective, that under the SAFE Act, the FBI agent can follow a “Twenty Questions” strategy, evade both a warrant and the necessity of record-keeping, and discover substantial information about the content; and that the U. S. person queried would have no possible winning strategy.

Examples and Scenarios

Here are some fictitious examples of unconstitutional query sequences that would be allowed by the SAFE Act, but precluded by our proposed fix (further below).

Scenario 1: Racial Justice Protests

Alex, a US citizen, is present – in any capacity – at Black Lives Matter protests. The movement is one we know to have previously been the object of Section 702 surveillance. She holds first-hand information, from her observations of protester and police activities, that is in the public interest and intended for imminent distribution. However, out of frustration with what she perceives as a lack of appropriate coverage by US media outlets, she contacts journalists at media outlets in other countries, such as the UK, Germany and Qatar, with a sample of the information in her possession.

At the event, the police arrested a journalist from a small paper critical of the local government. A legal dispute is ongoing, in which the government’s defense counsel alleges that the journalist attacked a police officer. Alex believes her information can break this deadlock, and is in the public interest.

However, one of the journalists Alex has contacted, Marwan, has (unbeknownst to her) previously interviewed an individual source that was hostile to US forces in Afghanistan and was therefore on the FISA target list. Now the foreign journalist is on the target list too. Alex’s email to Marwan is below.

FBI Agent Sam is conducting a counterterrorism assessment after a group designated by the U. S. as a “foreign terrorist organization” threatened a “black day for the FBI.” She has received a report about a suspect taking photos of a sensitive facility in Seattle the FBI director will visit.

She queries a date range and the selectors “Marwan photos violent attack plan Seattle security director.” The above message’s metadata appears but the content is “minimized” and does not appear on screen.

Agent Sam wants to know what is in the email, and executes the following sequence of queries to intentionally learn more about the Fourth Amendment-protected message content.

Marwan violent attack plan Seattle security director blueprint → 0 matches
Marwan photos violent attack plan Seattle security director black → 1 match

Agent Sam realizes the query was overbroad and tries to clarify whether this corresponds to the threat.

Marwan photos violent attack plan Seattle security director “black day” → 0 matches

Finding that it does not, on a hunch Agent Sam checks:

Marwan photos violent attack plan Seattle security director “black lives” → 1 match

At this point the message likely pertains to First Amendment-protected activity inside the United States, and Agent Sam should stop. However, she feels law enforcement is underappreciated, and she has been following the controversy around the journalist’s arrest in Seattle closely. She wants to help the FBI settle the matter in favor of the police if she can. Seeing the date on the email (metadata), the match to “Black Lives” (content selector) and the recipient being an Al Jazeera employee (metadata), Agent Sam decides to test her hypothesis.

Marwan photos violent attack plan Seattle security director journalist → 1 match
Marwan photos violent attack plan Seattle security director police protest → 1 match

Satisfied that this message is of use to law enforcement, she sends what she knows about the message to her associate, who is investigating the disturbances during the civil unrest in Seattle. Based on this information alone establishing its relevance, a warrant is obtained for a search of Alex’s devices or home in relation to the evidence of the altercation within the United States, in spite of its lack of relevance to counterterrorism.
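The query sequence in this scenario can be replayed against a toy count-only interface. Everything in the sketch is an assumption made for illustration. The message text is a hypothetical stand-in that we invented so the counts come out as in the scenario. The matching rule (every term must appear, quoted phrases must appear contiguously) is inferred from the example results, not taken from any documentation of the real interface.

```python
import re
import shlex

# Hypothetical stand-in for Alex's email, invented for this sketch only.
MESSAGE = (
    "Marwan, I have photos from the Black Lives Matter protest in Seattle. "
    "They show police starting the violent attack on the journalist. "
    "The security director had a plan to clear the square."
)


def tokenize(text):
    """Lower-case the text and split it into plain words."""
    return re.findall(r"[a-z]+", text.lower())


def contains(tokens, term):
    """True if the word or phrase appears contiguously in the tokens."""
    words = tokenize(term)
    n = len(words)
    return n > 0 and any(
        tokens[i:i + n] == words for i in range(len(tokens) - n + 1)
    )


def count_matches(database, query):
    """Assumed interface: return only the number of messages that
    contain every term in the query, never the content itself."""
    terms = shlex.split(query)  # keeps "quoted phrases" together
    return sum(
        all(contains(tokenize(message), term) for term in terms)
        for message in database
    )


BASE = "Marwan photos violent attack plan Seattle security director"
QUERIES = [
    BASE,
    "Marwan violent attack plan Seattle security director blueprint",
    BASE + " black",
    BASE + ' "black day"',
    BASE + ' "black lives"',
    BASE + " journalist",
    BASE + " police protest",
]

database = [MESSAGE]
counts = [count_matches(database, query) for query in QUERIES]
for query, n in zip(QUERIES, counts):
    print(f"{query} -> {n}")
```

At no point does the toy interface display the message, yet the counts alone tell the agent that it mentions photos, Black Lives, a journalist, police and a protest.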

Other Scenarios

One could easily, given more time, imagine other scenarios implicating situations of interest to the FBI that are highly controversial. For example:

2. (Non-governmental) persons present outside the Capitol on 6 January 2021

3. Christian religious groups contacting foreign co-religionists

4. Anti-corruption research involving offshore bank employees as sources.

Regardless of the specific scenario and whether it falls along partisan lines, we should all be concerned with a bill that has the intent of protecting the US Constitution, but that, due to a technicality, fails to do so.

How do we fix the Twenty Questions problem?

We are not suggesting that Sens. Durbin and Lee switch the language on warrants in the SAFE Act out for the better language on warrants in the Government Surveillance Reform Act or the Protect Liberty and End Warrantless Surveillance Act. That would be politically difficult in this context.

We are suggesting that the Twenty Questions problem can be adequately addressed with a short amendment that allows an agent only a specific small number of queries (not “covered query”, but any query) pertaining to communications of a U. S. person that the FBI agent knows or could reasonably know to be the same individual, within a defined and reasonable time period, before getting a warrant under the procedures the SAFE Act outlines.
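To make the idea concrete, here is a minimal sketch of how such a cap could be enforced in software. It is not proposed statutory language and not a description of any existing system. The class name, the cap of three queries, the 30-day window and the choice to count per agent and per U. S. person are all hypothetical values picked for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class QueryCap:
    """Allow at most max_queries warrantless queries per (agent, person)
    within a sliding time window. All parameters are hypothetical."""

    def __init__(self, max_queries: int, window: timedelta):
        self.max_queries = max_queries
        self.window = window
        self._history = defaultdict(deque)  # (agent, person) -> timestamps

    def authorize(self, agent: str, person: str, when: datetime,
                  has_warrant: bool = False) -> bool:
        if has_warrant:
            return True  # a warrant lifts the cap
        history = self._history[(agent, person)]
        # Forget queries that have aged out of the window.
        while history and when - history[0] >= self.window:
            history.popleft()
        if len(history) >= self.max_queries:
            return False  # cap reached: a warrant is needed first
        history.append(when)
        return True


cap = QueryCap(max_queries=3, window=timedelta(days=30))
start = datetime(2024, 4, 1)
decisions = [
    cap.authorize("sam", "alex", start + timedelta(hours=i)) for i in range(5)
]
with_warrant = cap.authorize(
    "sam", "alex", start + timedelta(hours=6), has_warrant=True
)
after_window = cap.authorize("sam", "alex", start + timedelta(days=31))
print(decisions, with_warrant, after_window)
```

The cap matters because k yes/no answers can distinguish at most 2^k possibilities, so a small k sharply limits what can be inferred about content before a judge is involved.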

¹ We base part of our analysis of the status quo query interfaces on the Attorney General’s publication of the heavily redacted 2020 NSA Querying Procedures, and part on private communications from allied civil rights organizations. A proper mathematical analysis of the problem would require two things. The first is complete documentation of the query interface, which we do not have and do not expect ever to be provided to the public without excessive redaction. The second is regulation to ensure that the interface does not change or become more powerful, which we do not think likely ever to pass Congress, both for political reasons and because of the technical specialization required.