How Data Brokers Use Your Digital Footprint

A single fact about you is worth almost nothing. Your email address, your postcode, the make of the phone you are reading this on: raw attributes like these trade for fractions of a cent in data markets. One widely cited estimate puts a basic demographic record at around $0.0005.

By the time that same information has been cleaned, matched to your name, scored, and placed in front of a lender, it can decide whether you are approved for a mortgage and what interest rate you pay over the next thirty years.

Nothing magical was added along the way. The same data was refined by brokers, identity-resolution firms, consumer-reporting agencies, fraud vendors, and pricing systems until it became decision-grade.

This article does not define a digital footprint or offer a self-audit checklist. It starts one step later, once traces have been collected and matched and are being used as inputs to credit, insurance, housing, fraud, advertising, and pricing decisions.

Most writing about personal data stops at the first step: your data is collected, and it is sold. That is true, and we cover the collection side in depth across our data broker ecosystems hub. This article follows what happens after collection: who refines the footprint, which decision systems use it, and why reducing the underlying records is an Eraser problem as much as a Mirror problem.

What a footprint reveals once it becomes an inference

Raw data is potent because it predicts things you never disclosed.

In 2013, researchers Michal Kosinski, David Stillwell, and Thore Graepel published a study in the Proceedings of the National Academy of Sciences. Using only Facebook "Likes" from around 58,000 volunteers, their model predicted male sexual orientation in 88% of cases, distinguished Black and white respondents in 95% of cases, and separated Democrats from Republicans in 85% of cases. It also estimated personality, intelligence, religious views, and substance use.

None of those volunteers had stated any of those things. The Likes were the kind of data most people consider trivial: a film, a band, a brand. Combined and modelled, they became a profile of attributes the person had kept private. This is the principle the rest of the chain depends on. A footprint is not a stack of facts you handed over; it is raw signal, and once modelled, signal becomes inference. Inference is what decisions run on.
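To see how individually weak signals add up, here is a minimal sketch. It assumes a naive-Bayes-style model in which each Like shifts the log-odds of some undisclosed trait by a fixed amount. The Like names, the weights, and the 10% base rate are all invented for illustration; this is not the pipeline used in the PNAS study, only a toy version of the general idea.

```python
import math

# Hypothetical log-odds contributions for a made-up trait; not from any real model.
LIKE_WEIGHTS = {
    "film_a": 0.4,
    "band_b": 0.6,
    "brand_c": 0.5,
    "show_d": -0.3,
}
# Assumption: 10% of people have the trait before any Likes are seen.
PRIOR_LOG_ODDS = math.log(0.10 / 0.90)


def trait_probability(likes):
    """Combine per-Like evidence into one probability by summing log-odds."""
    log_odds = PRIOR_LOG_ODDS + sum(LIKE_WEIGHTS.get(like, 0.0) for like in likes)
    return 1.0 / (1.0 + math.exp(-log_odds))


no_likes = trait_probability([])
one_like = trait_probability(["band_b"])
three_likes = trait_probability(["film_a", "band_b", "brand_c"])
print(f"no Likes:    {no_likes:.2f}")
print(f"one Like:    {one_like:.2f}")
print(f"three Likes: {three_likes:.2f}")
```

No single Like moves the estimate far. The shift comes from accumulation, which is why data that looks trivial item by item stops being trivial once it is combined.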

Identity-resolution firms turn scattered traces into one profile

Raw signals are scattered across hundreds of sources: a purchase here, a sign-up there, a device seen on one site and again on another. Before any of it can be used to make a decision about a specific person, it has to be stitched together and attached to a name. That step is called identity resolution, and it is its own industry.

LiveRamp is one of the larger operators. Companies send it customer records, and it matches those records against its identity databases to return a single pseudonymous identifier (a "RampID") that represents you across otherwise disconnected datasets. Its data marketplace, by the company's own description, hosts more than 100,000 audience segments drawn from 125 data providers.

The Austrian research group Cracked Labs documented this model in detail in its report on identity surveillance for marketing. The point of resolution is to make the scattered coherent: to take a thousand low-value fragments and assemble them into one addressable profile that can be bought, scored, and acted on. Working out which firms in this chain are brokers is a separate exercise, since few use the label.
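The stitching step can be sketched as simple record linkage. The example below assumes records are joined whenever they share a normalised email address or a device identifier, and it groups them with a small union-find. The records, the field names, and the hashed profile ID are hypothetical; real identity graphs use far more signals, and nothing here describes how LiveRamp or any other vendor actually matches records.

```python
import hashlib
from collections import defaultdict

# Hypothetical records from unrelated sources; every value is made up.
RECORDS = [
    {"source": "retailer", "email": "Ana.Example@mail.test ", "device": "dev-1"},
    {"source": "newsletter", "email": "ana.example@mail.test", "device": None},
    {"source": "ad_network", "email": None, "device": "dev-1"},
    {"source": "forum", "email": "someone.else@mail.test", "device": "dev-9"},
]


def keys_for(record):
    """Return the normalised identifiers a record can be matched on."""
    keys = []
    if record["email"]:
        keys.append("email:" + record["email"].strip().lower())
    if record["device"]:
        keys.append("device:" + record["device"])
    return keys


def resolve(records):
    """Group records that share any identifier, directly or transitively."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for idx, record in enumerate(records):
        for key in keys_for(record):
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    profiles = defaultdict(list)
    for idx, record in enumerate(records):
        # A short hash stands in for a pseudonymous profile identifier.
        pid = hashlib.sha256(f"profile-{find(idx)}".encode()).hexdigest()[:12]
        profiles[pid].append(record["source"])
    return dict(profiles)


profiles = resolve(RECORDS)
for pid, sources in profiles.items():
    print(pid, sources)
```

The ad-network record carries no email at all, yet it lands in the same profile as the newsletter sign-up because both connect through the retailer record. That transitive joining is what turns fragments into one addressable profile.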

Credit scoring uses footprint signals as risk inputs

The clearest demonstration that ordinary digital exhaust carries financial weight comes from credit research.

In a 2020 study in The Review of Financial Studies, Tobias Berg and colleagues analysed the "digital footprint" left by customers of an e-commerce lender: simple variables such as the device and operating system used, the time of day of the order, and whether the customer's name appeared in their email address. These easily collected signals matched the predictive power of traditional credit-bureau scores for forecasting default.

Whether you shop from an expensive phone, whether your email looks like a real name or a string of numbers, whether you order at 2am: none of these looks like financial information, yet each can shape how a lender prices you. Where alternative-data scoring is used, the inputs are broad and frequently invisible to the applicant. What the lender sees is a number and a set of reason codes. What produced that number is rarely shown.
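As a rough illustration of how little is needed, the sketch below derives variables of the kind described above from a single order record. The field names, the 00:00 to 06:00 cutoff for a night-time order, and the digits-in-email check are assumptions made for this example; it shows feature extraction only and is not the model or variable set from the Berg study.

```python
from datetime import datetime


def footprint_features(order):
    """Derive simple footprint variables from one hypothetical order record."""
    email_local = order["email"].split("@")[0].lower()
    name_parts = [part.lower() for part in order["name"].split()]
    hour = datetime.fromisoformat(order["placed_at"]).hour
    return {
        "os": order["os"],
        "device_type": order["device_type"],
        # Assumed cutoff: orders placed between midnight and 06:00 count as night orders.
        "night_order": 0 <= hour < 6,
        # Naive substring check; very short names could match by accident.
        "name_in_email": any(part in email_local for part in name_parts),
        "digits_in_email": any(ch.isdigit() for ch in email_local),
    }


# Hypothetical order; none of these values describe a real customer.
order = {
    "name": "Ana Example",
    "email": "ana.example@mail.test",
    "os": "iOS",
    "device_type": "mobile",
    "placed_at": "2024-03-01T02:15:00",
}
features = footprint_features(order)
print(features)
```

Everything in the output is available to a merchant at checkout without asking the customer a single question, which is what makes these inputs easy to collect and hard for an applicant to notice.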

Insurance exchanges turn claims history into pricing signals

Insurance runs on a similar logic, with an established record exchange behind it.

When you apply for auto or home cover in the United States, the insurer can pull a report from C.L.U.E., the Comprehensive Loss Underwriting Exchange operated by LexisNexis Risk Solutions. It holds up to seven years of claims history, and LexisNexis states that roughly 99.6% of the auto insurance industry contributes data to it. The report lists each claim, the amounts paid, and a field labelled "Fault Indicator" recording who the reporting insurer held responsible.

So a single phone call to ask whether a small incident is covered can become a logged event that a future insurer reads years later, and prices accordingly. Because C.L.U.E. is governed by the Fair Credit Reporting Act, you are entitled to request your own report and correct errors. Few people know it exists, and fewer check it before it is used to set their premium.

Tenant-screening vendors turn records into housing decisions

Housing works the same way, and here the consequences have been tested in court.

Many landlords no longer read an application; they read a score. SafeRent Solutions, formerly CoreLogic Rental Property Solutions, produces one of the widely used tenant scores, built from credit history, non-tenancy debt, and eviction records. The landlord sees a single number and an accept-or-decline recommendation.

In 2024, SafeRent settled a federal case in Massachusetts, Louis v. SafeRent Solutions, for $2.28 million. The plaintiffs argued the score disproportionately disadvantaged Black and Hispanic applicants who used housing vouchers, in part because the model did not account for the voucher's contribution to rent. As part of the settlement, SafeRent agreed to stop using its score to screen certain voucher holders without an alternative review. The company did not admit wrongdoing.

Beyond the settlement figure, two details are worth noting. The scoring was opaque even to the people it judged: the inputs and weightings were not disclosed. And a guaranteed source of rent, the voucher, was treated as if it were not there. A decision that shaped where a family could live was delegated to a model neither the applicant nor, arguably, the landlord fully understood.

One decision profile can sit behind a loan, a premium, a fraud check, and a tenancy decision before you ever speak to a person. The Eraser reduces the broker and provider records that feed those systems; the Mirror maps what is visible first.

Surveillance pricing uses personal data to set what you pay

In the cases above, your data decides whether you are accepted. In the next, it decides what you are charged.

In January 2025, the US Federal Trade Commission published the initial findings of a study into what it called surveillance pricing: the practice of setting individual prices using personal data rather than a single market price. The agency had ordered information from eight intermediary firms in 2024 and based its first report on documents from Mastercard, Accenture, PROS, Bloomreach, Revionics, and McKinsey.

The findings described pricing tuned to signals including a shopper's precise location and browsing history. The FTC noted that behaviour as granular as mouse movements on a page, or the specific items left abandoned in a cart, could be used to tailor what a person is shown and what they are asked to pay. Law firms now describe surveillance pricing as a coming front in privacy litigation.

With surveillance pricing, the price stops describing the product and starts describing you: your urgency, your alternatives, your willingness to pay, inferred from data you did not know was being read.

Fraud and identity systems score whether you look real

The same footprint that prices you is also used to judge whether you are real.

When you log in or check out, many banks and merchants quietly run the transaction past a fraud-and-identity engine. LexisNexis ThreatMetrix is one of the largest. The company reports insight into 1.4 billion tokenised digital identities and says it informs 110 million authentication and trust decisions a day, combining device characteristics and behavioural signals into a real-time risk score that approves, challenges, or blocks the attempt.

For most people this is invisible and, when it works, beneficial: it is part of why obvious fraud is stopped. But it is the same mechanism seen from another angle: your device, your location, and your patterns are continuously scored, and a profile that looks unusual can see a legitimate person declined with no explanation and no obvious way to appeal.

Adjacent use case: scraped personal data can also end up in AI training corpora, where removal is harder than correcting a record in a database. That is important enough for its own article. Here the focus stays on systems that use a footprint to score, price, approve, challenge, or reject a person.

Figure: The decision layer. Seven places a digital footprint is used and what the buyer sees in each: credit and lending (a default-risk score), insurance (LexisNexis C.L.U.E.), tenant screening (SafeRent), personalised pricing (FTC-studied Revionics and Bloomreach), fraud and identity (LexisNexis ThreatMetrix), ad targeting (LiveRamp RampID), and AI training (web scraping).

How much is a decision-ready footprint worth?

Trace one person's data through these stages and the price tells the story.

A raw attribute, a demographic field or an email, trades for roughly $0.0005 to $0.01.
Cleaned and enriched into a fuller profile, it is worth on the order of $0.10 a record, and more for richer data.
Resolved to a named person and packaged as an aged marketing lead, $0.50 to $5.
Sharpened into a fresh, high-intent lead in a valuable category, $10 to $50 for insurance and $20 to $200 for a mortgage or refinance.
Turned into the decision report a buyer actually pulls, a tenant screen or background check, $30 to $75.
And the decision that report informs, the loan, the tenancy, the premium across years, is worth thousands.
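The multiples implied by the ladder can be checked with a few lines of arithmetic. The price bands below are copied from the list above, with the insurance and mortgage lead ranges merged into one band. The only added assumption is that a decision "worth thousands" is pinned at a flat $1,000 so that it can be divided; the real figure varies widely.

```python
# (low, high) price bands in US dollars, copied from the ladder above.
LADDER = {
    "raw attribute": (0.0005, 0.01),
    "enriched profile": (0.10, 0.10),
    "aged marketing lead": (0.50, 5.00),
    "fresh high-intent lead": (10.00, 200.00),
    "decision report": (30.00, 75.00),
}
# Assumption: "thousands" is pinned to $1,000 purely to allow arithmetic.
ASSUMED_DECISION_VALUE = 1_000.00


def fold_range(band, base):
    """Smallest and largest multiple of the base band that a band can represent."""
    low, high = band
    base_low, base_high = base
    return low / base_high, high / base_low


raw = LADDER["raw attribute"]
for stage, band in LADDER.items():
    if stage == "raw attribute":
        continue
    smallest, largest = fold_range(band, raw)
    print(f"{stage:<26} {smallest:>10,.0f}x to {largest:>10,.0f}x")

decision = (ASSUMED_DECISION_VALUE, ASSUMED_DECISION_VALUE)
smallest, largest = fold_range(decision, raw)
print(f"{'decision (assumed $1,000)':<26} {smallest:>10,.0f}x to {largest:>10,.0f}x")
```

On these figures the decision report works out at between 3,000 and 150,000 times the price of a raw attribute, and an assumed $1,000 decision at between 100,000 and 2,000,000 times. The spread is wide because both ends of the comparison are ranges, which is another reason to read the ladder as a pattern and not a precise price list.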

Figure: The value-escalation ladder on a log scale. One person's data is priced from $0.0005 as a raw attribute, rising through enrichment, identity resolution, and decision-grade scoring to a decision worth thousands: roughly a 100,000-fold climb, none of which the person captures.

A caution on reading this: these are different markets, not one record literally resold up a single chain. Enrichment APIs, lead generation, and regulated screening reports are distinct businesses. The ladder is illustrative; it shows the pattern of how value compounds as data is refined.

From raw attribute to decision product, the value climbs by a factor of tens of thousands. The person the data describes captures none of that increase, and carries the full weight of the decision at the end of it.

Downstream use: some systems do more than decide. A footprint can produce a prediction, the prediction can drive an ad, price, prompt or nudge, and the response can become fresh data. That behaviour-change loop matters, but it sits downstream of the decision layer: first the footprint becomes a prediction, then the prediction becomes an intervention.

Who captures the value, and who carries the consequence?

Two things recur across every one of these layers.

Concentration. The same handful of firms appear again and again. LexisNexis Risk Solutions sits behind the insurance exchange (C.L.U.E.), a major fraud-scoring network (ThreatMetrix), and identity and credit-adjacent data. The refiners are few, and they see you from many directions at once.

Asymmetry. The value created by refining your footprint accrues to the companies that do the refining. The consequences land on you: the rate, the rejection, the price, the nudge. You supply the crude for nothing and buy back the finished product in the form of a worse deal, usually without being told that is what happened.

Your rights are a weak backstop

There are legal limits. In the EU and UK, GDPR Article 22 restricts purely automated decisions with significant effects and gives a right to human review; in the US, the Fair Credit Reporting Act governs credit, insurance, tenant, and employment reports, and gives you the right to see and dispute them. These matter, and we cover how to use them in our work on access requests and data broker removal. But they are a backstop you have to know about and actively invoke, against systems designed to run silently. Most people never see the decision being made, so they never reach for the right that might check it.

How to reduce your footprint as a decision input

You cannot opt out of every scoring system. You can reduce how much raw material those systems receive.

The practical goal is not disappearance; it is reducing your exposure as an input. That means knowing what is currently visible and inferable about you, removing what can be removed at the source, and correcting the records (credit, C.L.U.E., screening files) that feed the highest-stakes decisions. The less coherent and current your profile, the less confidently any of these systems can price, gate, or steer you. The Eraser removes broker and provider records at the source; a Mirror investigation is the lighter mapping-first option when you need to see what is findable and inferable today.

Sources

Academic research

Michal Kosinski, David Stillwell & Thore Graepel, Private traits and attributes are predictable from digital records of human behavior (PNAS, 2013).
Tobias Berg, Valentin Burg, Ana Gombović & Manju Puri, On the Rise of FinTechs: Credit Scoring Using Digital Footprints (The Review of Financial Studies, 2020; NBER w24551).
Nicholas Carlini et al., Extracting Training Data from Large Language Models (USENIX Security, 2021).
Shoshana Zuboff, The Age of Surveillance Capitalism (2019).
Garrett Johnson et al., The Online Display Ad Effectiveness Funnel & Carryover (Wharton).

Products, filings and reporting

LexisNexis C.L.U.E. Auto (Comprehensive Loss Underwriting Exchange).
Louis v. SafeRent Solutions, $2.28M settlement (Cohen Milstein, 2024).
FTC surveillance pricing study (Federal Trade Commission, January 2025).
LexisNexis ThreatMetrix / Digital Identity Network.
Cracked Labs, Pervasive Identity Surveillance for Marketing Purposes (LiveRamp analysis).
CNIL, Monetisation of personal data: how much is our data worth?
