A single fact about you is worth almost nothing. Your email address, your postcode, the make of the phone you are reading this on — brokers trade raw attributes like these for a fraction of a cent each. One widely cited estimate puts a basic demographic record at around $0.0005.
By the time that same information has been cleaned, matched to your name, scored, and placed in front of a lender, it can decide whether you are approved for a mortgage and what interest rate you pay over the next thirty years.
Nothing was added along the way. The same data was cleaned, matched, and scored until it was worth acting on — the way crude oil gains its value in the refinery, not the ground.
Most writing about personal data stops at the first step: your data is collected, and it is sold. That is true, and we cover the collection side in depth across our data broker ecosystems hub. This article follows what happens next — what your digital footprint is actually used for, who sees it at each stage, and how its value grows as it moves down the chain toward a decision about your life.
What your digital footprint actually reveals about you
The reason raw data is potent is that it predicts things you never disclosed.
In 2013, researchers Michal Kosinski, David Stillwell, and Thore Graepel published a study in the Proceedings of the National Academy of Sciences. Using only Facebook "Likes" from around 58,000 volunteers, their model distinguished gay from straight men in 88% of cases, Black from white respondents in 95% of cases, and Democrats from Republicans in 85% of cases. It also estimated personality, intelligence, religious views, and substance use.
None of those volunteers had stated any of those things. The Likes were the kind of data most people consider trivial: a film, a band, a brand. Combined and modelled, they became a profile of attributes the person had kept private. This is the principle the rest of the chain depends on. A footprint is not a stack of facts you handed over; it is raw signal, and once modelled, that signal becomes inference. Inference is what decisions run on.
From raw data to a named profile: how identity resolution works
Raw signals are scattered across hundreds of sources — a purchase here, a sign-up there, a device seen on one site and again on another. Before any of it can be used to make a decision about a specific person, it has to be stitched together and attached to a name. That step is called identity resolution, and it is its own industry.
LiveRamp is one of the larger operators. Companies send it customer records, and it matches those records against its identity databases to return a single pseudonymous identifier — a "RampID" — that represents you across otherwise disconnected datasets. Its data marketplace, by the company's own description, hosts more than 100,000 audience segments drawn from 125 data providers.
The Austrian research group Cracked Labs documented this model in detail in its report on identity surveillance for marketing. The point of resolution is to make the scattered coherent: to take a thousand low-value fragments and assemble them into one addressable profile that can be bought, scored, and acted on. Once resolved, a profile is ready to be scored, priced, and used.
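To make the stitching step concrete, here is a minimal sketch of one simple approach: link any two records that share an identifier, then give each linked group a single pseudonymous ID. It is an illustration under stated assumptions, not LiveRamp's method. The field names, the `pid-` label, and the sample records are all hypothetical, and commercial systems are far more elaborate.

```python
# Hypothetical records from four unrelated sources (invented for illustration).
SAMPLE_RECORDS = [
    {"source": "shop", "email": "Ana.Lee@example.com ", "device_id": "d1"},
    {"source": "newsletter", "email": "ana.lee@example.com", "phone": "555-0100"},
    {"source": "app", "phone": "555-0100", "device_id": "d2"},
    {"source": "forum", "email": "other@example.com"},
]

IDENTIFIER_FIELDS = ("email", "phone", "device_id")


def normalise(kind: str, value: str) -> str:
    """Trim whitespace; lower-case emails so trivial variants match."""
    value = value.strip()
    return value.lower() if kind == "email" else value


def resolve(records: list[dict]) -> list[str]:
    """Return one pseudonymous ID per record; linked records share an ID."""
    parent = list(range(len(records)))  # union-find forest, one node per record

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as root so IDs are stable.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen: dict[tuple[str, str], int] = {}
    for index, record in enumerate(records):
        for kind in IDENTIFIER_FIELDS:
            value = record.get(kind)
            if not value:
                continue
            key = (kind, normalise(kind, value))
            if key in first_seen:
                union(index, first_seen[key])  # shared identifier: same person
            else:
                first_seen[key] = index

    return [f"pid-{find(i)}" for i in range(len(records))]


print(resolve(SAMPLE_RECORDS))
```

In the sample, the shop and newsletter records share an email and the newsletter and app records share a phone number, so all three collapse into one profile even though the shop and the app have no field in common. The forum record stays separate.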
How your digital footprint is used in credit scoring
The clearest demonstration that ordinary digital exhaust carries financial weight comes from credit research.
In a 2020 study in The Review of Financial Studies, Tobias Berg and colleagues analysed the "digital footprint" left by customers of an e-commerce lender: simple variables such as the device and operating system used, the time of day of the order, and whether the customer's name appeared in their email address. These easily collected signals matched the predictive power of traditional credit-bureau scores for forecasting default.
Whether you shop from an expensive phone, whether your email looks like a real name or a string of numbers, whether you order at 2am — none of which you would think of as financial information — can shape how a lender prices you. Where alternative-data scoring is used, the inputs are broad and frequently invisible to the applicant. What the lender sees is a number and a set of reason codes. What produced that number is rarely shown.
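A sketch of how signals like these could feed a score follows. It uses a plain logistic model, a common choice for default prediction, but every weight below is invented for illustration. None of the numbers come from Berg and colleagues or from any lender, and the feature names are hypothetical.

```python
import math

# Hypothetical weights, invented for illustration only.
# Positive values push the estimated default probability up.
WEIGHTS = {
    "bias": -2.0,
    "night_order": 0.6,
    "no_name_in_email": 0.5,
    "low_cost_device": 0.4,
}


def default_probability(night_order: bool, name_in_email: bool,
                        low_cost_device: bool) -> float:
    """Combine three footprint signals into a probability between 0 and 1."""
    z = WEIGHTS["bias"]
    z += WEIGHTS["night_order"] * float(night_order)
    z += WEIGHTS["no_name_in_email"] * float(not name_in_email)
    z += WEIGHTS["low_cost_device"] * float(low_cost_device)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


daytime_named = default_probability(False, True, False)
night_anonymous = default_probability(True, False, True)
print(f"{daytime_named:.3f} vs {night_anonymous:.3f}")
```

The applicant never supplies any of the three inputs as "financial" information, yet under these made-up weights the second profile is scored as a noticeably higher risk than the first.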
How insurance companies use your data: the C.L.U.E. report
Insurance runs on a similar logic, with an established record exchange behind it.
When you apply for auto or home cover in the United States, the insurer can pull a C.L.U.E. report — the Comprehensive Loss Underwriting Exchange, operated by LexisNexis Risk Solutions. It holds up to seven years of claims history, and LexisNexis states that roughly 99.6% of the auto insurance industry contributes data to it. The report lists each claim, the amounts paid, and a field labelled "Fault Indicator" recording who the reporting insurer held responsible.
So a single phone call to ask whether a small incident is covered can become a logged event that a future insurer reads years later, and prices accordingly. Because C.L.U.E. is governed by the Fair Credit Reporting Act, you are entitled to request your own report and correct errors. Few people know it exists, and fewer check it before it is used to set their premium.
How tenant-screening algorithms decide if you get the apartment
The same pattern reaches housing, and here the consequences have been tested in court.
Many landlords no longer read an application; they read a score. SafeRent Solutions, formerly CoreLogic Rental Property Solutions, produces one of the widely used tenant scores, built from credit history, non-tenancy debt, and eviction records. The landlord sees a single number and an accept-or-decline recommendation.
In 2024, SafeRent settled a federal case in Massachusetts, Louis v. SafeRent Solutions, for $2.28 million. The plaintiffs argued the score disproportionately disadvantaged Black and Hispanic applicants who used housing vouchers, in part because the model did not account for the voucher's contribution to rent. As part of the settlement, SafeRent agreed to stop using its score to screen certain voucher holders without an alternative review. The company did not admit wrongdoing.
Beyond the settlement figure, two details are worth noting. The scoring was opaque even to the people it judged: the inputs and weightings were not disclosed. And a guaranteed source of rent, the voucher, was treated as if it were not there. A decision that shaped where a family could live was delegated to a model neither the applicant nor, arguably, the landlord fully understood.
The same profile sits behind a loan, a premium, and a tenancy decision before you ever speak to a person. A Mirror investigation shows you what those systems can see about you today.
What is surveillance pricing? How your data sets the price you pay
In the cases above, your data decides whether you are accepted. In the next, it decides what you are charged.
In January 2025, the US Federal Trade Commission published the initial findings of a study into what it called surveillance pricing: the practice of setting individual prices using personal data rather than a single market price. The agency had ordered information from eight intermediary firms in 2024 and based its first report on documents from Mastercard, Accenture, PROS, Bloomreach, Revionics, and McKinsey.
The findings described pricing tuned to signals including a shopper's precise location and browsing history. The FTC noted that behaviour as granular as mouse movements on a page, or the specific items left abandoned in a cart, could be used to tailor what a person is shown and what they are asked to pay. Law firms now describe surveillance pricing as a coming front in privacy litigation.
With surveillance pricing, the price stops describing the product and starts describing you: your urgency, your alternatives, your willingness to pay, inferred from data you did not know was being read.
How your data is used for fraud and identity risk scoring
The same footprint that prices you is also used to judge whether you are real.
When you log in or check out, many banks and merchants quietly run the transaction past a fraud-and-identity engine. LexisNexis ThreatMetrix is one of the largest. The company reports insight into 1.4 billion tokenised digital identities and says it informs 110 million authentication and trust decisions a day, combining device characteristics and behavioural signals into a real-time risk score that approves, challenges, or blocks the attempt.
For most people this is invisible and, when it works, beneficial — it is part of why obvious fraud is stopped. But it is the same mechanism seen from another angle: your device, your location, and your patterns are continuously scored, and a profile that looks unusual can see a legitimate person declined with no explanation and no obvious way to appeal.
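The approve, challenge, or block pattern can be sketched in a few lines. This is a toy rule-based example, not how ThreatMetrix or any real engine works: the signals, point values, and thresholds are all hypothetical.

```python
def risk_score(known_device: bool, km_from_usual_location: float,
               typing_matches_history: bool) -> int:
    """Add up hypothetical penalty points; a higher total looks riskier."""
    score = 0
    if not known_device:
        score += 40
    if km_from_usual_location > 500:
        score += 35
    if not typing_matches_history:
        score += 25
    return score


def decide(score: int, challenge_at: int = 40, block_at: int = 75) -> str:
    """Map a score onto the three outcomes using assumed thresholds."""
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "approve"


# Usual phone, usual place, usual behaviour.
print(decide(risk_score(True, 2.0, True)))
# Same legitimate person, new phone, travelling abroad.
print(decide(risk_score(False, 800.0, True)))
```

The second call shows the failure mode described above: nothing fraudulent has happened, but two unusual signals together cross the block threshold, and the person sees only a declined transaction.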
Is your data used to train AI?
The newest buyer of your footprint is the model.
Large language models are trained on vast quantities of scraped text, and that text contains personal information. In a 2021 paper presented at the USENIX Security Symposium, Nicholas Carlini and colleagues showed that an attacker could prompt GPT-2 into reproducing hundreds of verbatim sequences from its training data, including names, phone numbers, and email addresses of real individuals.
This is the difference with AI training. A credit or insurance score is a calculation derived from your data. A trained model can, under the right conditions, hold and surface the data itself. Once your information becomes training material, removing it from the system that learned from it is far harder than correcting a record in a database.
How much is your digital footprint worth? The value-escalation ladder
Trace one person's data through these stages and the price tells the story.
- A raw attribute — a demographic field, an email — trades for roughly $0.0005 to $0.01.
- Cleaned and enriched into a fuller profile, it is worth on the order of $0.10 a record, and more for richer data.
- Resolved to a named person and packaged as an aged marketing lead, $0.50 to $5.
- Sharpened into a fresh, high-intent lead in a valuable category, $10 to $50 for insurance and $20 to $200 for a mortgage or refinance.
- Turned into the decision report a buyer actually pulls — a tenant screen or background check — $30 to $75.
- And the decision that report informs — the loan, the tenancy, the premium across years — is worth thousands.
A caution on reading this: these are different markets, not one record literally resold up a single chain. Enrichment APIs, lead generation, and regulated screening reports are distinct businesses. The ladder is illustrative — it shows the pattern of how value compounds as data is refined.
From raw attribute to decision product, the value climbs by a factor of tens of thousands. The person the data describes captures none of that increase, and carries the full weight of the decision at the end of it.
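The size of that climb depends on which ends of the price ranges you compare. The short calculation below uses only the figures from the ladder above; the stage labels and the function name are shorthand introduced here.

```python
# (stage, low price in USD, high price in USD), copied from the ladder above.
LADDER = [
    ("raw attribute", 0.0005, 0.01),
    ("enriched profile", 0.10, 0.10),
    ("aged marketing lead", 0.50, 5.00),
    ("fresh insurance lead", 10.00, 50.00),
    ("fresh mortgage lead", 20.00, 200.00),
    ("decision report", 30.00, 75.00),
]


def value_multiples(ladder, start: str, end: str) -> tuple[float, float]:
    """Return the (smallest, largest) price multiple between two stages."""
    prices = {name: (low, high) for name, low, high in ladder}
    start_low, start_high = prices[start]
    end_low, end_high = prices[end]
    smallest = end_low / start_high   # cheapest end product vs dearest input
    largest = end_high / start_low    # dearest end product vs cheapest input
    return smallest, largest


smallest, largest = value_multiples(LADDER, "raw attribute", "decision report")
print(f"{smallest:,.0f}x to {largest:,.0f}x")  # prints "3,000x to 150,000x"
```

On these figures the multiple falls between roughly 3,000 and 150,000, a range that brackets "tens of thousands". The final rung, the decision itself, is left out because the ladder gives it no price range.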
The feedback loop: how your footprint is used to change your behaviour
So far this is a supply chain — raw material in, decision out. The harder idea is that it is a loop.
In The Age of Surveillance Capitalism (2019), Shoshana Zuboff argues that personal data is claimed as raw material, processed into prediction products, and sold in what she calls behavioural futures markets to parties who want to know what you will do. Her sharper claim is the next one: predictions are not only used to anticipate behaviour but to shape it — what she terms economies of action, a structure aimed at nudging people toward profitable outcomes.
The shaping is measurable, which is how the industry knows it works. Controlled advertising experiments, the gold standard for measuring effect, find that display advertising produces a median lift of around 16.6% in site visits and 8.1% in conversions. Brand-lift and incrementality tests exist precisely to quantify how much a campaign moved you compared with a group that saw nothing.
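For readers who want to see what "lift" means arithmetically, the sketch below computes it from a treated group and a control group. The counts are invented, chosen only so that the result matches the 8.1% conversion figure quoted above; they are not data from any study.

```python
def lift(treated_conversions: int, treated_size: int,
         control_conversions: int, control_size: int) -> float:
    """Relative increase in conversion rate of the treated group over control."""
    treated_rate = treated_conversions / treated_size
    control_rate = control_conversions / control_size
    return (treated_rate - control_rate) / control_rate


# Invented example: 100,000 people shown the ad, 100,000 held back.
example = lift(treated_conversions=1_081, treated_size=100_000,
               control_conversions=1_000, control_size=100_000)
print(f"{example:.1%}")  # prints "8.1%"
```

Because the control group saw nothing, the difference between the two rates is attributed to the campaign rather than to behaviour that would have happened anyway.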
Put the steps together and the line becomes a circle. Your footprint produces a prediction. The prediction drives an intervention aimed at you — an ad, a price, a prompt. Your response is measured. The behaviour you were steered into becomes new footprint data, which sharpens the next prediction. Research has even framed behaviour modification as a way to improve prediction: change the person, and they become easier to forecast.
You are, in effect, fed your own footprint, scored on how you react, and gradually moved toward the behaviour the system finds profitable. At that point your footprint is doing more than describing you. It is being used to move you.
Who captures the value, and who carries the consequence
Across all these layers, two patterns recur.
The first is concentration. The same handful of firms appear again and again. LexisNexis Risk Solutions sits behind the insurance exchange (C.L.U.E.), a major fraud-scoring network (ThreatMetrix), and identity and credit-adjacent data. The refiners are few, and they see you from many directions at once.
The second is asymmetry. The value created by refining your footprint accrues to the companies that do the refining. The consequences — the rate, the rejection, the price, the nudge — land on you. You supply the crude for nothing and buy back the finished product in the form of a worse deal, usually without being told that is what happened.
Your rights are a weak backstop
There are legal limits. In the EU and UK, GDPR Article 22 restricts purely automated decisions with significant effects and gives a right to human review; in the US, the Fair Credit Reporting Act governs credit, insurance, tenant, and employment reports, and gives you the right to see and dispute them. These matter, and we cover how to use them in our work on access requests and data broker removal. But they are a backstop you have to know about and actively invoke, against systems designed to run silently. Most people never see the decision being made, so they never reach for the right that might check it.
How to reduce your digital footprint as a decision input
You cannot opt out of being scored. You can reduce how much raw material the scoring runs on.
The practical goal is not disappearance; it is reducing your exposure as an input. That means knowing what is currently visible and inferable about you, removing what can be removed at the source, and correcting the records — credit, C.L.U.E., screening files — that feed the highest-stakes decisions. The less coherent and current your profile, the less confidently any of these systems can price, gate, or steer you. A Mirror investigation maps what is findable and inferable today; the Eraser removes what can be removed at the source.
Sources
Academic research
- Michal Kosinski, David Stillwell & Thore Graepel, Private traits and attributes are predictable from digital records of human behavior (PNAS, 2013)
- Tobias Berg, Valentin Burg, Ana Gombović & Manju Puri, On the Rise of FinTechs: Credit Scoring Using Digital Footprints (The Review of Financial Studies, 2020; NBER w24551)
- Nicholas Carlini et al., Extracting Training Data from Large Language Models (USENIX Security, 2021)
- Shoshana Zuboff, The Age of Surveillance Capitalism (2019) — Harvard Gazette interview
- Garrett Johnson et al., The Online Display Ad Effectiveness Funnel & Carryover (Wharton)
Products, filings and reporting
- LexisNexis C.L.U.E. Auto (Comprehensive Loss Underwriting Exchange)
- Louis v. SafeRent Solutions — $2.28M settlement (Cohen Milstein, 2024)
- FTC surveillance pricing study (Federal Trade Commission, January 2025)
- LexisNexis ThreatMetrix / Digital Identity Network
- Cracked Labs, Pervasive Identity Surveillance for Marketing Purposes (LiveRamp analysis, PDF)
- CNIL, Monetisation of personal data: how much is our data worth?