What a Data Clean Room Does With Your Data

There is a phrase you will hear from any company that runs its advertising through a data clean room: we never share personal data, and we never store a profile of you. Both halves are usually true. Neither tells you what happened to your data.

A data clean room is the technology that now sits between most large advertisers, retailers, publishers and the platforms they buy from. It is sold as the privacy-respecting answer to the death of the third-party cookie. The reassurance rests on a real piece of engineering. Understanding that engineering is the only way to see why the reassurance is thinner than it sounds.

What a company is allowed to keep

Start with the rule the whole arrangement is built around. Under the GDPR, information is personal data when it relates to someone who can be identified, directly or indirectly. The American term “personally identifiable information”, or PII, points at the same idea more loosely. The word doing the work is identifiable, and above all the indirect kind: a record with no name on it is still personal data if it can be tied back to a person by means reasonably likely to be used.

That same word, identifiable, is the gap companies build into. If data genuinely cannot be linked to a person, it falls outside the rules. So the move is to keep the valuable part of a profile and blur the part that pins it to one individual. A company can hold that you earn within a certain band, shop for certain things and lean a certain way politically, while recording your location as a neighbourhood rather than a street address. The profile now fits perhaps five households instead of one. The claim that follows is that a cohort of five is not an individual, so the data is no longer personal.

That claim holds up to a point, and it is the foundation the clean-room pitch is built on. The catch is the same word a third time. “Identifiable” includes being singled out indirectly, and a blurred profile that fits five people collapses back toward one the moment it is matched against a second dataset that narrows the field.
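The collapse from five candidates to one can be sketched in a few lines of Python. Everything here is invented for illustration: the households, the attribute names and the second dataset are assumptions, not data from any real system.

```python
# Hypothetical households. The "id" field stands in for whatever pseudonymous
# key a real dataset would carry; no names are needed for the effect to show.
population = [
    {"id": "H1", "area": "Northside", "income_band": "60-80k", "leaning": "centre"},
    {"id": "H2", "area": "Northside", "income_band": "60-80k", "leaning": "centre"},
    {"id": "H3", "area": "Northside", "income_band": "60-80k", "leaning": "centre"},
    {"id": "H4", "area": "Northside", "income_band": "60-80k", "leaning": "centre"},
    {"id": "H5", "area": "Northside", "income_band": "60-80k", "leaning": "centre"},
    {"id": "H6", "area": "Northside", "income_band": "20-40k", "leaning": "centre"},
    {"id": "H7", "area": "Southbank", "income_band": "60-80k", "leaning": "centre"},
]

# Hypothetical second dataset held by another company, keyed the same way.
last_purchase = {
    "H1": "sofa", "H2": "laptop", "H3": "cargo bike", "H4": "sofa",
    "H5": "laptop", "H6": "cargo bike", "H7": "cargo bike",
}


def candidates(records, **attrs):
    """Return every record consistent with all of the given attributes."""
    return [r for r in records if all(r[k] == v for k, v in attrs.items())]


# The blurred profile on its own fits a cohort of five.
cohort = candidates(population, area="Northside", income_band="60-80k", leaning="centre")

# Matching against the second dataset narrows the same cohort to one.
narrowed = [r for r in cohort if last_purchase.get(r["id"]) == "cargo bike"]

print(len(cohort))                  # 5
print([r["id"] for r in narrowed])  # ['H3']
```

Neither dataset singles anyone out alone. The match does.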

That collapse is the point of a clean room, not a side effect. The room is engineered to sit inside the identifiability gap: it recovers the precision that blurring gave away, then lets both parties carry on describing the output as though no personal data were involved. The commercial logic settles any doubt about which half matters. What an advertiser pays for is the ability to find, price and target a real person more accurately than before. A genuinely anonymous result, one that could never be acted on at the level of an individual, would not justify the infrastructure. Companies do not build clean rooms to process data that has stopped being about anyone. The precision is the product; the non-personal framing is the packaging.

A company will sometimes reach for gentler explanations. A clean room keeps data sealed against a breach, and it cuts the volume of raw records changing hands. Both are true, and neither pays for itself. Storage makes the point plainly. The profile described above, a row of attributes like an income band, a few interests and a neighbourhood, comes to a kilobyte or two. A one-terabyte drive costs around fifty dollars and holds hundreds of millions of such records, more than enough for a profile on every adult in Europe. No company spends a fortune to avoid a fifty-dollar disk. Security tells the same story: it is a problem answered far more cheaply by tools built for it. What needs this particular architecture, and justifies a platform that costs orders of magnitude more, is the combination at the centre of it.
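The storage arithmetic is easy to check. The sketch below uses only the rough figures given above (a record of one to two kilobytes, a one-terabyte drive at about fifty dollars) and assumes decimal units; the function name is made up for the example.

```python
DRIVE_BYTES = 10**12   # one terabyte, decimal convention (assumption)
DRIVE_COST_USD = 50    # the rough price quoted above


def records_per_drive(record_bytes: int) -> int:
    """How many fixed-size profile records fit on the drive."""
    return DRIVE_BYTES // record_bytes


for size in (1_000, 2_000):  # "a kilobyte or two"
    n = records_per_drive(size)
    usd_per_million = DRIVE_COST_USD * 1_000_000 / n
    print(f"{size} bytes per record: {n:,} records, "
          f"${usd_per_million:.2f} per million profiles")
```

On those assumptions the drive holds between 500 million and one billion profiles, at five to ten cents per million. Storage cost cannot be what the platform is for.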

What a clean room actually does

Two companies each hold data about the same people. A retailer knows what you bought and the city you bought it in. A publisher knows what you read, an income band it has modelled for you, and the device identifiers that follow you around. Neither wants to hand its customer list to the other. A clean room lets them combine the two without either side downloading the other’s table.

The match happens on a shared identifier, usually a hashed email address or a persistent pseudonymous ID. LiveRamp’s RampID and similar identity systems exist precisely to let one company’s encoding of a person line up with another’s, so the two records about you can be joined without either party seeing the raw key. Snowflake, InfoSum, Google’s Ads Data Hub and AWS Clean Rooms all build the same basic room: a sealed environment where data from two or more parties can be queried together, but where the inputs stay invisible to everyone except their owner.
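A minimal sketch of matching on a hashed email follows. It assumes plain SHA-256 over a trimmed, lower-cased address, which is a common generic approach and not a description of how RampID or any named vendor works. The tables, addresses and the `match_key` helper are hypothetical.

```python
import hashlib


def match_key(email: str) -> str:
    """Normalise an email address and return its SHA-256 hex digest."""
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


# Each party hashes its own customer list before it enters the room.
retailer = {
    match_key("ana@example.com"): {"bought": "running shoes", "city": "Lyon"},
    match_key("ben@example.com"): {"bought": "kettle", "city": "Leeds"},
}
publisher = {
    match_key(" Ana@Example.com "): {"reads": "fitness", "income_band": "high"},
    match_key("cat@example.com"): {"reads": "politics", "income_band": "middle"},
}

# The join: keys present on both sides refer to the same person.
overlap = retailer.keys() & publisher.keys()
print(len(overlap))  # 1
```

The hash hides the address from a casual reader, but it is deterministic: anyone who already holds the email can recompute the same key. That is what makes the match possible, and it is why a hashed identifier is pseudonymous, not anonymous.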

What comes out is not your record. It is an answer to a question. People who bought this product over-index on these interests and this income band. Build a model of who else looks like them. The output is an aggregate or a trained model, pushed straight into ad targeting. No row with your name on it ever changes hands.
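“Over-index” has a simple arithmetic meaning: the share of a segment showing some trait, divided by the share of the wider population showing it, times one hundred. A score above 100 means the segment over-indexes. The sketch below uses invented counts and a hypothetical `index_score` helper.

```python
def index_score(segment_hits: int, segment_size: int,
                base_hits: int, base_size: int) -> float:
    """Index = (share in segment) / (share in base population) * 100."""
    return 100 * (segment_hits * base_size) / (segment_size * base_hits)


# Invented example: 30 of 100 buyers read cycling content,
# against 1,000 of 10,000 people in the matched population overall.
score = index_score(segment_hits=30, segment_size=100,
                    base_hits=1_000, base_size=10_000)
print(score)  # 300.0
```

A score of 300 says buyers are three times as likely as the base population to hold the interest. The number is an aggregate, but it is computed over matched, person-level rows.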

Minimisation as architecture, not reduction

This is where the design is genuinely clever, and where the marketing starts to mislead.

The merged, high-accuracy picture of you is never written down as a single profile. It does not need to be. It can be reproduced on demand by running the query again. The room holds the inputs; the profile is a result it can generate whenever an advertiser asks. Saying “we don’t store a profile” describes a filing decision, not a limit on what the system knows about you.
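The difference between storing a profile and being able to regenerate it can be shown with a toy join. This is a sketch under stated assumptions, not any vendor's design: the tables, the keys `k1` to `k3` and the `audience` function are all hypothetical.

```python
from typing import Callable, Dict, Iterator, Set

Table = Dict[str, dict]  # match key -> one party's attributes for that person


def joined_rows(retailer: Table, publisher: Table) -> Iterator[dict]:
    """Yield merged rows one at a time; nothing is written to storage."""
    for key in retailer.keys() & publisher.keys():
        yield {"key": key, **retailer[key], **publisher[key]}


def audience(retailer: Table, publisher: Table,
             predicate: Callable[[dict], bool]) -> Set[str]:
    """Re-run the join and return the match keys that satisfy the predicate."""
    return {row["key"] for row in joined_rows(retailer, publisher) if predicate(row)}


retailer = {
    "k1": {"bought": "running shoes", "city": "Lyon"},
    "k2": {"bought": "kettle", "city": "Lyon"},
}
publisher = {
    "k1": {"reads": "fitness", "income_band": "high"},
    "k3": {"reads": "politics", "income_band": "middle"},
}


def wanted(row: dict) -> bool:
    return row["bought"] == "running shoes" and row["income_band"] == "high"


first = audience(retailer, publisher, wanted)
second = audience(retailer, publisher, wanted)  # same answer, rebuilt from the inputs
print(first == second)  # True
```

No merged table exists at any point, yet the same person is selected on every run. The profile lives in the inputs plus the query.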

The accuracy gain is the reason any of this is worth doing, and it does not come from averaging two guesses. It comes from two specific moves, set out below.

Move	What one party holds	What the match adds	Result
Corroboration	A retailer’s inferred “high income”, guessed from purchases	A publisher’s declared income band for the same person	A soft guess becomes a confident attribute
Enrichment	The retailer’s purchase history and city	The publisher’s content interests and device graph	A fuller profile than either company held alone
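The two moves in the table can be sketched on a single matched pair of records. The field names, values and the `combine` function are illustrative assumptions; a real room would apply the same logic across a whole join.

```python
def combine(retailer_row: dict, publisher_row: dict) -> dict:
    """Merge one person's two records, applying both moves from the table."""
    merged = {}

    # Corroboration: an inferred attribute is checked against a declared one.
    inferred = retailer_row["inferred_income"]
    declared = publisher_row["declared_income"]
    merged["income_band"] = declared
    merged["income_basis"] = (
        "inferred and declared agree" if inferred == declared else "declared only"
    )

    # Enrichment: each side contributes attributes the other never held.
    merged["purchases"] = retailer_row["purchases"]
    merged["city"] = retailer_row["city"]
    merged["interests"] = publisher_row["interests"]
    merged["devices"] = publisher_row["devices"]
    return merged


retailer_row = {"inferred_income": "high", "purchases": ["espresso machine"], "city": "Lyon"}
publisher_row = {"declared_income": "high", "interests": ["fitness"], "devices": ["device-123"]}

merged = combine(retailer_row, publisher_row)
print(merged["income_basis"])  # inferred and declared agree
print(sorted(merged))
```

The merged record carries six fields where each input carried three or four, and the income band is no longer a guess.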

A clean room turns two partial, deniable datasets into one sharp one, then declines to keep the result on the grounds that it was never really there.

You cannot audit a clean room from the outside, and you were never told your data was in one. What you can see is the raw material that feeds it: the records, profiles and identifiers a company would bring to the match. A Snapshot Scan reveals what is already exposed.

Three things the “no personal data” claim gets wrong

Strip away the architecture and three plain points remain.

First, on roles. Each company that brings data into the room is a controller of that data under the GDPR. It decided to collect it and decided to feed it into a match. The clean-room vendor that operates the environment is usually a processor acting on instructions. The parties to a match are often joint controllers of the matching step itself. The Court of Justice has been clear, in the Fashion ID case, that you can be a joint controller of one stage of processing without having any access to, or responsibility for, what a partner does before or after it. A company does not escape controller status by never seeing the other side’s rows. The architecture is built to feel like shared, diluted responsibility. The law locates responsibility at each step where a party decides what happens.

Second, on the disappearing profile. “We never store a merged record, so there is no personal data” quietly swaps one argument for another. Whether a profile is kept is a question about storage. Whether something is personal data is a question about identifiability. The second does not follow from the first. Each party still holds and processes real, identified data about you. The merged view simply is not saved as a row. A profile that can be regenerated at will, and is regenerated every time an advertiser asks, has not stopped existing because it lives in a query instead of a table.

Third, on what the law actually regulates. The GDPR governs processing, and its definition of processing in Article 4(2) explicitly includes “combination”. Matching two datasets about a person and computing over the result is processing personal data, saved or not. The output being an aggregate does not change what was done to produce it. A clean room can be a sound way to minimise who sees raw data. It does not make the underlying combination disappear from the reach of the rules.

The strongest counter-argument, and where it stops

The clean-room industry has a real legal card to play, and it is worth putting fairly.

European identifiability has never been absolute. Since the Breyer ruling in 2016, the test has turned on the “means reasonably likely to be used” to identify someone, weighing whether a combination is realistic or barred by law or cost. In September 2025 the Court of Justice went further. In EDPS v SRB it held that the same pseudonymised dataset can be personal data for the party holding the key to re-identify it, yet not personal data for a recipient who genuinely cannot reverse it and holds nothing else to match it against. Identifiability, the Court said, depends on the circumstances and the position of the actor.

Read quickly, that sounds like permission. It is not, for the situation a clean room creates. SRB concerned a one-way transfer to a recipient who was, in effect, blind: no key, no side data, no realistic path back to a person. A clean room is the opposite arrangement. It exists so that two parties who each already identify the same people can combine what they hold, and the result is routed back toward those same people through the identity graph that made the match possible. The narrow exception the Court carved for a blind recipient does not reach the companies feeding the room, for whom the data never stopped being personal in the first place.

The regulators have not settled the rest. The European Data Protection Board’s pseudonymisation guidelines, issued in draft in January 2025, take the position that data which can be attributed to a person by combining it with additional information remains personal data. Those guidelines are still in draft as this is written, and the Board is reconsidering parts of them in light of SRB. This is a live argument, not a closed one, and none of this is legal advice. The practical point for an individual survives the uncertainty: the company holding your data is processing it whatever a court eventually says about a downstream recipient.

The safety controls, and where they leak

Vendors do not rely on the legal argument alone. They point to technical guarantees, and the better ones are real. Differential privacy, offered in AWS Clean Rooms and enforced in Google’s BigQuery clean rooms, adds calibrated noise so that a result looks almost the same whether or not any single person is in the data, and cuts queries off once a “privacy budget” is spent. Aggregation thresholds refuse to return an answer that describes too few people.
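A toy version of those two controls, noise with a privacy budget and a minimum cohort size, might look like the sketch below. It is not the AWS or Google implementation. The `NoisyCounter` class, its parameters and the data are hypothetical, and suppressing on the true count is a simplification that a production system would need to handle more carefully.

```python
import random


class NoisyCounter:
    """Toy counting interface with Laplace noise, a budget and a threshold."""

    def __init__(self, total_epsilon: float, min_cohort: int, seed=None):
        self.remaining = total_epsilon   # privacy budget left to spend
        self.min_cohort = min_cohort     # smallest cohort that may be reported
        self._rng = random.Random(seed)

    def _laplace(self, scale: float) -> float:
        # The difference of two exponential draws with mean `scale`
        # follows a Laplace distribution centred on zero with that scale.
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def count(self, rows, predicate, epsilon: float):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon        # spent even if the answer is suppressed
        true_count = sum(1 for r in rows if predicate(r))
        if true_count < self.min_cohort:
            return None                  # aggregation threshold: too few people
        # A count changes by at most 1 per person, so the noise scale is 1/epsilon.
        return true_count + self._laplace(1 / epsilon)


rows = [{"income_band": "high"}] * 40 + [{"income_band": "middle"}] * 60
room = NoisyCounter(total_epsilon=1.0, min_cohort=10, seed=7)
noisy = room.count(rows, lambda r: r["income_band"] == "high", epsilon=0.5)
print(noisy)           # the true count of 40 plus Laplace noise
print(room.remaining)  # 0.5
```

A smaller `epsilon` means more noise and stronger protection, and a larger one means the opposite. That setting is chosen by whoever configures the room.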

Notice what Google’s own description concedes: the control “prevents data from being reidentified when it’s shared”. The protection exists because, without it, the join can identify people. That is the admission underneath the architecture.

[Figure: two-layer diagram. The combination of two identified datasets is processing personal data under GDPR Article 4(2) and carries no control. Differential privacy and aggregation thresholds sit only on the output exit, cleaning the answer but not undoing the join.]

These controls also have soft edges. Differential privacy is usually opt-in, applied only if the data owner switches it on, and its strength is a dial the owner sets. A weak setting buys weak protection. Aggregation thresholds stop the most obvious singling-out, but a determined analyst can ask many narrowly different questions, and small or unusual cohorts leak. None of these mechanisms touch the part that matters most to you: that two identified datasets about you were combined to produce the answer in the first place. They police the exit. They do not undo the processing.
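The “many narrowly different questions” problem is usually called a differencing attack. The sketch below shows it against a bare aggregation threshold with no noise. The rows, the threshold of three and the `thresholded_count` function are assumptions made for the example.

```python
def thresholded_count(rows, predicate, min_cohort=3):
    """Return a count only if it describes at least `min_cohort` people."""
    n = sum(1 for r in rows if predicate(r))
    return n if n >= min_cohort else None


# Hypothetical matched rows. "T" is the only person aged 41 in the area.
rows = [
    {"id": "A", "area": "Northside", "age": 34, "high_income": True},
    {"id": "B", "area": "Northside", "age": 52, "high_income": True},
    {"id": "C", "area": "Northside", "age": 29, "high_income": True},
    {"id": "D", "area": "Northside", "age": 47, "high_income": False},
    {"id": "E", "area": "Northside", "age": 38, "high_income": False},
    {"id": "T", "area": "Northside", "age": 41, "high_income": True},
]

# Question 1: how many people in Northside are high income?
q1 = thresholded_count(rows, lambda r: r["area"] == "Northside" and r["high_income"])

# Question 2: the same question, excluding anyone aged 41.
q2 = thresholded_count(
    rows, lambda r: r["area"] == "Northside" and r["high_income"] and r["age"] != 41
)

print(q1, q2)   # 4 3
print(q1 - q2)  # 1
```

Both answers clear the threshold, so both are returned. Their difference is one, which tells the analyst that the single 41-year-old in Northside is high income.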

Why this lands on you

A clean room is, from your seat, a place where companies you may never have heard of merge what they each know about you, generate a profile sharp enough to price and target you, decline to keep that profile as a record, and treat the whole arrangement as privacy-friendly. There is no row to ask them to delete, because the row is assembled on demand. There was no notice, because the match happened between two businesses.

You cannot reach inside these rooms. You can shrink what they have to work with. Every clean-room match starts from data a company already holds about you: records bought from brokers, profiles on people-search sites, identifiers leaked or scraped and re-onboarded into an identity graph. Reduce that raw material at its source and you reduce what any match can reconstruct.

Knowing what is being combined starts with seeing what you expose. A Snapshot Scan shows what a search returns about you today; the Mirror maps your full discoverable footprint, the records and identifiers a company would bring to a match. Where that exposure traces back to data brokers and people-search platforms, the Eraser removes it at the source, so there is less of you to rebuild.

Sources

The technology

Snowflake, “What Is a Data Clean Room? How It Works and Use Cases”.
AWS, “AWS Clean Rooms Differential Privacy”.
Google Cloud, “Differential privacy enforcement in BigQuery data clean rooms”.
LiveRamp documentation, RampID and RampID Translation.
InfoSum, product overview.

The law and guidance

Court of Justice of the European Union, EDPS v SRB, Case C-413/23 P, judgment of 4 September 2025.
Court of Justice of the European Union, Breyer v Bundesrepublik Deutschland, Case C-582/14, judgment of 19 October 2016.
Court of Justice of the European Union, Fashion ID, Case C-40/17, judgment of 29 July 2019.
General Data Protection Regulation, Article 4(1) (personal data), Article 4(2) (processing), and Recital 26 (identifiability and anonymisation).
European Data Protection Board, Guidelines 01/2025 on Pseudonymisation (draft, January 2025).
