
Username and Alias Correlation: Methodology, Tooling, and Likelihood Assessment

A username is not anonymous. It is a behavioural fingerprint dressed as a pseudonym. Tracing aliases across platforms is one of the most reliable methods in modern OSINT.

Most people choose their first handle at fourteen or fifteen and never fully abandon it. They modify it slightly — adding a number, swapping a letter, appending a birth year — but the core string persists. It reappears on gaming platforms, forums, email addresses, commit histories, crypto wallets, and job applications. By the time someone believes their online persona is separate from their real identity, they have typically left dozens of data points connecting the two. The exposure this creates for executives and public figures is part of a wider pattern covered in our Executive Digital Privacy hub.

This article is a practitioner walkthrough of how username-to-identity correlation actually works: the methodology, the tooling, the structure mapping, and the likelihood framework used to assess confidence before drawing conclusions. It covers the full investigative arc, from a single handle to a confirmed attribution, with documented cases at each layer.

The same process used to investigate threat actors is used against private individuals — by employers, by estranged parties, by data brokers building commercial profiles. Understanding the methodology from the analyst’s side is a prerequisite for understanding the exposure it creates.

Why Usernames Persist — The Behavioural Foundation

Before the methodology, the mechanism.

A username is not random. It is selected from a constrained personal vocabulary: a nickname, a character from media consumed in adolescence, a combination of a name and a memorable number, a meaningful word in a second language. The selection is not consciously strategic. The person who chose “darkwolf1994” on a gaming forum in 2010 was not thinking about OSINT. They were thinking about what felt like them.

That feeling of self-identification is exactly why the handle persists. People return to strings that feel right. When they create a new account — a professional email, a development repo, a forum registration — they reach for the same vocabulary. The string may be compressed, reversed, split, or suffixed with a number. But the root survives.

This persistence creates what analysts call the username footprint: a distributed set of handles, across platforms and time, that share enough structural features to be recognisable as products of the same mind.

Four patterns dominate:

Direct reuse. The same string appears on multiple platforms: darkwolf1994 on Reddit, darkwolf1994 on Steam, darkwolf1994 on a defunct forum from 2012. This is the weakest OPSEC posture and the most common.

Incremental variation. The base string is modified systematically: darkwolf, darkwolf94, darkw0lf, d4rkwolf. Each variant is traceable to the root. People who believe variations provide cover are often wrong — the variation pattern itself becomes a signature.

Contextual segmentation. A person uses different handles for different contexts: darkwolf1994 for gaming, d.wolff for professional contexts, project_daemon for development work. This is better OPSEC but rarely complete. The handles leak into each other — a Stack Overflow answer links to a GitHub profile that links to a professional account.

Platform migration. When a platform is breached or shut down, users migrate and often carry their handles. The migration itself creates a trail: the same handle appearing on Forum B two weeks after Forum A closed is a strong temporal correlation.

The Investigation Structure

Username-to-identity correlation runs across four layers. Each layer increases confidence and introduces different evidence types. A rigorous investigation does not skip to the deepest layer — it builds a chain from surface to attribution, with documented reasoning at each step.

Layer 0 — Surface enumeration
    The handle itself, cross-platform presence, account ages
        ↓
Layer 1 — Structural analysis
    Variation patterns, registration timing, platform overlap
        ↓
Layer 2 — Behavioural correlation
    Writing style, timezone, topic focus, metadata artefacts
        ↓
Layer 3 — Hard identity anchors
    Email addresses, phone numbers, real names, addresses — found in account data
        ↓
Attribution — Confidence-weighted conclusion

Most investigations do not need Layer 3 to produce actionable findings. Layers 0–2 frequently produce strong circumstantial attribution. Layer 3 is confirmation, not origin.

Layer 0 — Surface Enumeration

The Starting Point

You have a single handle. The investigation begins by establishing its full cross-platform presence. Sherlock, Maigret, and standard search engines surface the first accounts associated with the handle. Finding the same handle on another platform is nowhere near enough to conclude it belongs to the same person.

What to Record at Layer 0

For each platform where the handle is confirmed present:

Field            What to note
Platform         Name, category (gaming / professional / forum / code repo)
Account URL      Direct link to the profile
Account age      Creation date if visible, or inferred from earliest post
Activity status  Last active date if visible
Public data      What the profile exposes by default (bio, location, linked accounts)
Identifiers      Any email, phone, real name in public fields

At this stage you are building a map, not drawing conclusions. A handle present on 12 platforms is not automatically twelve accounts held by the same person — structural analysis is needed to assess whether the accounts are connected.
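
The Layer 0 map can be kept as a simple record per platform. A minimal sketch, assuming nothing beyond the fields in the table above (all names here are my own):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Layer0Record:
    """One row of the Layer 0 enumeration map: a handle on one platform."""
    platform: str                       # e.g. "Reddit"
    category: str                       # gaming / professional / forum / code repo
    account_url: str
    account_age: Optional[str] = None   # creation date if visible, else inferred
    activity_status: Optional[str] = None
    public_data: list = field(default_factory=list)   # bio, location, linked accounts
    identifiers: list = field(default_factory=list)   # emails, phones, real names

# The footprint is just a list of records; conclusions come in later layers.
footprint = [
    Layer0Record("Reddit", "forum", "https://reddit.com/u/darkwolf1994",
                 account_age="2014-03"),
]
```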

Variant Generation

After mapping the primary handle, generate likely variants and search those as well.

Systematic variant patterns:

Original: darkwolf1994

Direct variants:
  darkwolf94
  darkwolf
  darkwolf_1994
  dark.wolf1994
  dark_wolf1994

Leetspeak variants:
  d4rkwolf1994
  darkw0lf1994
  d4rkw0lf

Number manipulation:
  darkwolf93 / darkwolf95 / darkwolf96
  darkwolf19 / darkwolf94
  darkwolf2 / darkwolf_v2

Suffix/prefix patterns:
  thedarkwolf
  realdarkwolf
  darkwolfreal
  darkwolfx
  xdarkwolf
  darkwolf_official

Run Sherlock/Maigret on each variant. Cross-reference results for platform overlap — the same person often uses different variants on different platforms, but the presence of two variants with accounts on the same platform is suspicious (usually a second account, often after a ban).
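
The variant families above are mechanical enough to generate programmatically before feeding them to Sherlock or Maigret. A sketch (the pattern set is illustrative, not exhaustive):

```python
from typing import Optional

# Single-character leetspeak substitutions
LEET = {"a": "4", "e": "3", "i": "1", "o": "0"}

def variants(base: str, year: Optional[str] = None) -> set:
    """Generate common username variants from a base handle.

    Covers the pattern families listed above: trailing-digit stripping,
    year suffixes, leetspeak, and prefix/suffix habits.
    """
    out = {base}
    root = base.rstrip("0123456789")            # darkwolf1994 -> darkwolf
    out.add(root)
    if year and len(year) == 4:
        out |= {root + year, root + year[2:], f"{root}_{year}"}
    for ch, sub in LEET.items():                # one substitution at a time
        if ch in root:
            out.add(base.replace(ch, sub))
            out.add(root.replace(ch, sub))
    out |= {f"the{root}", f"real{root}", f"{root}real",
            f"{root}x", f"x{root}", f"{root}_official"}
    return out

# variants("darkwolf1994", year="1994") includes "d4rkwolf1994",
# "darkw0lf1994", "darkwolf94", "thedarkwolf", ...
```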

Layer 1 — Structural Analysis

Surface enumeration tells you where the handle exists. Structural analysis tells you whether the accounts are connected and starts building a temporal and contextual picture.

Account Age Analysis

Account creation dates are often publicly visible or inferable. The pattern matters.

An account created in 2012 on a gaming forum, 2014 on Reddit, 2016 on GitHub, and 2019 on a professional forum is consistent with a single person’s online evolution — joining platforms as they became relevant to their life stage. A cluster of accounts created within 48 hours of each other suggests either a fresh identity setup (new OPSEC posture) or a migration event (platform shutdown prompted a move).

What to look for:

  • Clustering: multiple accounts created close together (migration or fresh setup)
  • Gaps: accounts created years apart may represent different life phases — or different people
  • Temporal inconsistency: an account claiming to be from 2009 with post IDs that don’t match the claimed age

Reddit, for example, encodes account creation date in the user’s profile. Older accounts on breached platforms often expose creation dates in the leaked data. Forum software typically shows join date on the profile page.
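
Clustering in creation dates is easy to test mechanically once the dates are collected. A sketch of a helper that flags accounts created within a shared window (the 48-hour threshold mirrors the example above and is adjustable):

```python
from datetime import datetime, timedelta

def creation_clusters(accounts, window_hours=48):
    """Group account creation dates falling within `window_hours` of each
    other — a possible migration event or fresh identity setup.
    `accounts` is a list of (platform, datetime) pairs."""
    if not accounts:
        return []
    ordered = sorted(accounts, key=lambda a: a[1])
    clusters, current = [], [ordered[0]]
    for acct in ordered[1:]:
        if acct[1] - current[-1][1] <= timedelta(hours=window_hours):
            current.append(acct)
        else:
            clusters.append(current)
            current = [acct]
    clusters.append(current)
    return [c for c in clusters if len(c) > 1]   # only multi-account clusters

accounts = [
    ("ForumA", datetime(2013, 6, 1, 10)),
    ("Reddit", datetime(2016, 2, 14, 9)),
    ("GitHub", datetime(2016, 2, 15, 20)),   # 35h after Reddit: same cluster
    ("Steam",  datetime(2011, 8, 3, 12)),
]
# creation_clusters(accounts) flags the Reddit/GitHub pair as one cluster
```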

Platform Category Mapping

The combination of platforms an account appears on reveals the subject’s profile. A single person is typically consistent across platform categories relevant to their age, profession, and interests.

Presence on: a specific gaming franchise forum + a Linux distribution forum + a cybersecurity subreddit + a development code repository is internally consistent. Presence on: a children’s gaming platform + a professional derivatives trading forum + a Norwegian language learning site is inconsistent and may indicate handle reuse by different people, or deliberate platform diversity as a cover.

Build a platform category map:

Gaming:         Steam (2011), Xbox Live (2012), GameFAQs (2013)
Development:    GitHub (2015), Stack Overflow (2015), GitLab (2018)
Professional:   LinkedIn (2017)
Forums:         r/netsec (2014), Hacker News (2016)
Dark web:       [redacted forum] (2016) — same handle, confirmed below

This map is the first structural argument for or against a unified identity.

The Migration Trail

When a platform shuts down, users migrate. The migration itself is evidence.

Documented example: The Silk Road migration trail

When Silk Road was seized in October 2013, a significant portion of its vendor community migrated to Agora, Black Market Reloaded, and later AlphaBay. Vendors who had been careful to use unique handles on Silk Road made a critical error: they carried their reputation ratings with them, and reputation in dark web markets is tied to consistent identity. Analysts tracking specific product types could follow vendor handles from platform to platform because the vendors themselves advertised the connection (“Previously trading as X on Silk Road, same products, verified by escrow”).

The migration trail is also relevant outside dark web contexts. When MySpace declined, users moved to Facebook — and the usernames, profile photos, and linked email addresses they used in the transition created a documented bridge between old and new identities.

How to exploit the migration trail in an investigation:

  1. Identify the platforms the subject appears on and their creation dates
  2. Research what platforms closed or declined just before each new account appeared
  3. Check if the subject mentions their previous platform presence in their early posts on the new platform (“Been lurking since [Platform X], finally made an account here”)
  4. Check archive.org and cached versions of old platform profiles

Layer 2 — Behavioural Correlation

Behavioural evidence is harder to fake than structural evidence, and patterns emerge whether the subject intends them to or not.

Writing Style Analysis (Stylometry)

Every person writes in a recognisable way. The combination of vocabulary range, sentence length variation, punctuation habits, common misspellings, preferred transitional phrases, and capitalisation patterns creates a fingerprint that is consistent across contexts.

Stylometric analysis requires sufficient text volume (typically 500+ words per sample) and computational tools. For investigative purposes, a lighter approach works: identifying consistent wording and spelling features and testing for their presence across candidate accounts. A lighter analysis produces lower-confidence results, which means the need for supporting evidence from other layers rises proportionally.

Indicators to document:

Punctuation habits:

  • Does the subject use double spaces after periods or a space before a period?
  • Do they use em dashes or en dashes or double-hyphens?
  • Do they use Oxford commas consistently? (This can also indicate nationality.)
  • Do they omit apostrophes (its vs it’s, cant vs can’t)?
  • Do they over-use ellipses?
  • Do they use the same literary devices and stylistic errors?

Capitalisation patterns:

  • Full sentences with capitalised first word, or stream-of-consciousness lowercase?
  • All-caps for emphasis, or italics/asterisks?
  • Capitalisation of proper nouns that others don’t capitalise (specific games, software, concepts)

Vocabulary and phrasing:

  • Unusual word choices that appear across multiple accounts
  • Non-native language patterns that persist (a French speaker will often produce specific calque constructions in English that are consistent and hard to fake)
  • Technical jargon from a specific field used in non-technical contexts

Post length and structure:

  • Does this person write short, punchy posts or long elaborated ones?
  • Do they use headers and structure in long-form posts?
  • Do they use bullet points or run everything together?
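
The indicators above can be reduced to a crude feature vector for side-by-side comparison of candidate accounts. This is a lightweight stand-in for real stylometry, not a substitute for it; the feature set is illustrative:

```python
import re

def style_features(text: str) -> dict:
    """Crude stylometric features from one text sample, covering a few
    of the habit categories listed above."""
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s]
    words = text.split()
    return {
        "avg_sentence_words": len(words) / max(len(sentences), 1),
        "lowercase_starts": sum(1 for s in sentences if s[0].islower())
                            / max(len(sentences), 1),
        "ellipsis_rate": text.count("...") / max(len(words), 1),
        "missing_apostrophes": len(re.findall(
            r"\b(cant|dont|its a|im|wont)\b", text)),
        "double_hyphens": text.count("--"),
    }

a = style_features("cant believe it... im done with this. its a mess...")
b = style_features("I cannot agree. The analysis is thorough and well argued.")
# a shows apostrophe omission and heavy ellipses; b shows neither
```

Comparing the resulting dictionaries across samples gives a rough match signal; as noted above, a lighter analysis like this demands proportionally more supporting evidence from other layers.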

Timezone and Activity Pattern Analysis

Every account has an activity pattern. Forum posts, code commits, social media activity, and gaming sessions all carry timestamps. Over time, these timestamps reveal when the subject is active — which maps to their timezone, sleep schedule, and work hours.

How to extract activity patterns:

For forum accounts: collect timestamps of every post visible. Plot frequency by hour of day (UTC). The distribution will cluster around waking hours in a specific timezone.

For Reddit: Reddit Investigator (or manual review of post timestamps) maps posting frequency over time. Heavy activity at 09:00–18:00 UTC-5 is consistent with US Eastern working hours. Heavy activity at 20:00–02:00 UTC is consistent with a European evening poster.

For GitHub: commit timestamps are exact and publicly visible. A developer who commits primarily between 14:00 and 23:00 UTC is almost certainly in the Americas or working odd hours in Europe.

For dark web forums: same analysis, with the note that sophisticated actors use Tor’s latency to blur timezone signals — but forum session lengths and posting cadence still leak information.
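
The hour-of-day analysis described above is a simple histogram once timestamps are collected. A sketch, assuming the timestamps are already normalised to UTC:

```python
from collections import Counter
from datetime import datetime

def active_hours(timestamps_utc, top_n=8):
    """Return the most active UTC hours. The top hours approximate the
    subject's waking/working window in some timezone."""
    hist = Counter(ts.hour for ts in timestamps_utc)
    return [hour for hour, _ in hist.most_common(top_n)]

# Synthetic example: posts clustered at 19:00-22:00 UTC over ten days,
# consistent with a European evening poster as described above.
posts = [datetime(2020, 1, day, hour)
         for day in range(1, 11) for hour in (19, 20, 21, 22)]
```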

Combining timezone analysis with structural data:

A Reddit account posting consistently in UTC+1 evening hours, combined with a GitHub account with commits in the same UTC+1 window, combined with forum posts on a French cybersecurity forum — is strong evidence of a French-timezone subject regardless of what language they post in.

What breaks this: deliberate schedule manipulation, VPN use that shifts apparent timezone, or collaborative accounts (multiple people sharing a handle — uncommon but documented in activist and criminal contexts).

Topic Focus and Expertise Signals

A person’s genuine expertise and interests bleed through across accounts even when they are trying to maintain separate personas.

If someone is genuinely expert in industrial control system security, they will reveal that knowledge in conversations even when operating under a different handle. The technical depth, the specific references, the tools they mention, the vulnerabilities they discuss with familiarity — these are hard to fake and hard to suppress.

Build an expertise map for each candidate account:

  • What domains does this account post about?
  • At what depth? (Observer vs. practitioner vs. expert)
  • What tools, products, or frameworks do they reference?
  • What is their stance toward specific technical debates? (Opinions on specific tools or approaches persist across accounts)
  • What do they know that is niche enough to narrow the pool of people who would know it?

An account that demonstrates deep familiarity with a specific proprietary enterprise software package, combined with another account that mentions working at a specific type of company where that software is used, is narrowing rapidly toward attribution.

Cross-Platform Mention Analysis

People talk about themselves. Even when operating under a pseudonym, people mention things that happened to them: a job change, a move, a health event, a device purchase, a trip. Across multiple accounts over multiple years, these mentions accumulate into a biographical sketch.

Systematic collection of self-referential statements across all candidate accounts:

Account A (gaming forum, 2013):
  "just started a new job so haven't been playing as much"
  "just moved to a new city, still unpacking"
  "my dog keeps knocking over my headset"

Account B (development forum, 2015):
  "works at a startup now, long hours"
  "dog just knocked over my coffee"
  "remote work setup is finally sorted"

Account C (security forum, 2018):
  "contracting now, much better hours"
  "dog woke me up again at 4am"
  "back in the same city I left in 2013 apparently"

This is the mosaic effect applied to username correlation: three fragments that individually prove nothing, but together describe a consistent life trajectory and a persistent detail (the dog) that links all three.

Build a biographical timeline from these mentions and test it for internal consistency. Inconsistencies may indicate different people or fabricated backstory. Consistency across years and platforms — especially when the details are mundane and unremarkable — is strong evidence of a single identity.
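
Two small helpers capture this collection step: one merges mentions into a chronological timeline, the other tests which accounts share a persistent detail (the dog, in the example above). Both are sketches with my own field layout:

```python
def biographical_timeline(mentions):
    """Chronological merge of self-referential statements.
    `mentions` is a list of (account, year, statement) tuples."""
    return sorted(mentions, key=lambda m: m[1])

def recurring_details(mentions, keywords):
    """Which accounts mention each persistent detail?"""
    return {kw: sorted({acct for acct, _, text in mentions
                        if kw in text.lower()})
            for kw in keywords}

mentions = [
    ("Account A", 2013, "my dog keeps knocking over my headset"),
    ("Account A", 2013, "just moved to a new city, still unpacking"),
    ("Account B", 2015, "dog just knocked over my coffee"),
    ("Account C", 2018, "dog woke me up again at 4am"),
]
# recurring_details(mentions, ["dog"]) links all three accounts via the dog
```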

Layer 3 — Hard Identity Anchors

Hard anchors are direct links between a pseudonymous handle and verifiable real-world identity data. They are the most evidentially significant findings but often the hardest to reach through open sources.

Email Address Extraction

Email addresses are the single most valuable identity anchor in username correlation. They link accounts, they appear in breach data tied to real names, and they often follow patterns that reveal real identity.

Sources for email-to-handle correlation:

Breach databases: HaveIBeenPwned indexes email addresses from public breaches. Supplementary sources — Dehashed (paid), LeakCheck (paid), Snusbase — provide the reverse lookup: given an email, what breaches contain it, and what other data (username, phone, address, real name) appeared alongside it.

Given a username, the search path is:

  1. Find accounts where the email is exposed in public profile fields (many older forums showed emails before privacy defaults changed)
  2. Check if the email is mentioned in any posts (people used to post email addresses in forum signatures routinely before 2010)
  3. Search breach data for the username — some breach datasets include username + email pairs
  4. If an email is found, run it through HaveIBeenPwned and supplementary sources for cross-breach data

Git repository commits: GitHub, GitLab, and Bitbucket embed the committer’s configured email in every commit. The commit history of a public repository is a gold mine for email extraction.
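
In practice this means running `git log --all --format='%an <%ae>'` against a clone of the repository and deduplicating the output. A sketch that parses that output (the sample log text is fabricated to match the worked example later in this article):

```python
import re

def commit_emails(git_log_output: str) -> dict:
    """Map each email in `git log --format='%an <%ae>'` output to the
    names it appeared with. A second configured email on one repo
    (e.g. an early personal address) shows up immediately."""
    pairs = re.findall(r"^(.*?) <([^>]+)>$", git_log_output, re.MULTILINE)
    by_email = {}
    for name, email in pairs:
        by_email.setdefault(email, set()).add(name)
    return by_email

sample = """\
project daemon <p.daemon.dev@protonmail.com>
project daemon <p.daemon.dev@protonmail.com>
James K <james.k.92@gmail.com>
"""
# commit_emails(sample) surfaces the Gmail address alongside the ProtonMail one
```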

Documented example: Ross Ulbricht / Dread Pirate Roberts

The investigation that led to Ross Ulbricht’s arrest as the operator of Silk Road began with a single username. The handle “altoid” appeared in early-2011 posts on the Shroomery and BitcoinTalk forums advertising a “new kind of online marketplace”: pre-launch promotion for what became Silk Road. In October 2011, the same “altoid” account posted on BitcoinTalk again, this time recruiting an “IT pro in the Bitcoin community” and directing replies to a personal Gmail address: rossulbricht@gmail.com. A separate slip reinforced the attribution: a Stack Overflow question about connecting to a Tor hidden service had been posted under Ulbricht’s real name before the account’s username was changed to “frosty”.

The opsec failure was complete: the same pseudonymous handle that promoted Silk Road had solicited help under a real-name email address. Investigators surfaced the connection in mid-2013, and Ulbricht was arrested in October 2013.

The lesson is not that Ulbricht was careless in a unique way. It is that username reuse across contexts — combined with a single moment of not switching context — produces an irreversible link.

Phone Number Correlation

Phone numbers appear in breach data, in account recovery fields that some platforms expose, and in “people search” databases that aggregate carrier records.

Given a confirmed email from Layer 3 extraction, reverse-lookup services (Spokeo, Whitepages, BeenVerified for US; European aggregators for EU) will often return a phone number associated with that email. That phone number can then be run through social media lookup tools — Telegram’s reverse phone lookup (accessible via the app’s search before privacy changes), WhatsApp status lookup — to find additional accounts.

The chain: username → email → phone → additional accounts.

Real Name in Account Data

Many platforms exposed real names in their public profile fields at some point in their history, even if those fields are now hidden or removed. Two sources recover this data:

Web archives: The Wayback Machine (archive.org) and Cachedview.nl capture historical versions of profile pages. A profile that now shows only a username may have shown a real name in 2014 when the platform’s privacy defaults were different.

Breach data: When a platform is breached and its user database leaked, the schema often includes whatever real data the platform collected. A forum that required a real name on registration will have that name in its leaked database, tied to the username.

Documented example: The Sony PlayStation Network Breach

The 2011 PlayStation Network breach exposed 77 million user accounts. The leaked data included not just emails and usernames but real names, addresses, and dates of birth — the full registration data Sony had collected. For any user who had maintained a separate PSN handle as part of a pseudonymous identity, that breach connected their handle to their real name irrevocably, years before they discovered the breach had occurred.

Structure Mapping — The Attribution Graph

Across a full investigation, the evidence collected produces an attribution graph: a network of nodes (handles, emails, platforms, identifiers) connected by edges (confirmed links, probable links, speculative links).

Mapping this graph visually serves two purposes: it makes the reasoning transparent, and it reveals gaps and contradictions that need resolution before confidence can be assessed.

Node Types

● Username / Handle
◆ Email address
■ Phone number
▲ Real name
★ IP address (from logs, if available)
○ Platform account (containing username + available data)

Edge Types

━━━  Confirmed link (same account, same breach record, same commit)
╍╍╍  Probable link (strong stylometric match, consistent timeline)
┄┄┄  Speculative link (platform overlap, shared topic interest only)

Example Graph Construction

Starting from handle project_daemon:

Surface enumeration returns:

  • Reddit: u/project_daemon (active 2014–2019)
  • GitHub: project-daemon (active 2015–2020)
  • Defunct security forum: project_daemon (active 2013–2016)
  • Stack Overflow: project_daemon (created 2015)

GitHub email extraction returns:

  • p.daemon.dev@protonmail.com (most commits)
  • james.k.92@gmail.com (3 early commits before switching to ProtonMail)

Breach data lookup on james.k.92@gmail.com returns:

  • Adobe breach (2013): email + username jknight1992 + password hash
  • LinkedIn breach (2016): email + full name James Knight + employer Meridian Security Consulting

Structural analysis:

  • Reddit timezone: posting primarily 19:00–23:00 UTC → GMT+0 or GMT+1
  • GitHub commits: primarily 20:00–00:00 UTC → consistent
  • Forum posts reference “working in London” in 2014

Biographical mentions across accounts:

  • Reddit (2015): “just moved from Leeds to London for work”
  • Forum (2014): “contracting now, doing ICS assessments”
  • Stack Overflow profile bio: “security engineer, UK”

Resulting graph:

project_daemon ━━━ GitHub: project-daemon ━━━ james.k.92@gmail.com
      |                                                |
      ┄┄┄ Reddit u/project_daemon                     ▼
      |                                    jknight1992 (Adobe breach)
      ┄┄┄ Stack Overflow: project_daemon            |
                                                     ▼
                                          James Knight (LinkedIn breach)
                                          Meridian Security Consulting
                                          London / Leeds biographical match

The chain from pseudonym to real identity runs through two breaches and a single careless early commit. The confidence level is high — but must be documented as a conclusion, not assumed.

Likelihood Correlation Framework

Not all evidence is equal. A likelihood framework forces explicit reasoning about confidence and prevents overconfident attribution.

This is where investigator discipline matters most. The framework exists to counter the natural tendency to treat accumulating signals as proof. Every signal must be weighed independently, and the confidence of the overall attribution is constrained by the weakest confirmed link in the chain — not by how many signals point in the same direction. The evidence should lead to the conclusion; the investigator should not be hunting for facts to support a narrative they have already constructed.

Use a five-level framework:

Level  Label            Meaning
5      Confirmed        Direct link: same account, same breach record, real name in profile
4      High confidence  Multiple independent convergent lines of evidence; would hold up to scrutiny
3      Probable         Strong circumstantial case; one or two lines of evidence could be coincidence, but the combination is unlikely
2      Possible         Suggestive but inconclusive; could be explained by coincidence
1      Speculative      Single weak signal; noted but not treated as evidence

Applying the Framework

Every link in the attribution graph receives a level. The confidence of the final attribution is constrained by the weakest link in the chain.

In the James Knight example above:

Link                                           Evidence                                                          Level
project_daemon → GitHub project-daemon         Same handle, same platform category, accounts created same month  4
GitHub project-daemon → james.k.92@gmail.com   Extracted from git commit log (direct)                            5
james.k.92@gmail.com → jknight1992 (Adobe)     Breach record (direct)                                            5
jknight1992 → James Knight (LinkedIn)          Breach record (direct)                                            5
James Knight → London / ICS security profile   Biographical mentions consistent with LinkedIn employer           3

Overall attribution confidence: Level 4. The weakest link in the core chain, project_daemon to the GitHub account, rests on convergent circumstantial evidence rather than a shared record; the Level 3 biographical match corroborates the conclusion but is not part of the chain itself.

To reach Level 5, you need: a breach record or document that places the string project_daemon in the same record as james.k.92@gmail.com, or a public statement by the subject connecting the handles.
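
The weakest-link rule is trivial to encode, which makes it harder to quietly violate. A sketch, using the core chain from the worked example:

```python
def attribution_confidence(links):
    """Overall confidence is capped by the weakest link in the chain,
    not by the count of converging signals.
    `links` is a list of (description, level) pairs, level in 1..5."""
    return min(level for _, level in links)

chain = [
    ("project_daemon -> GitHub project-daemon", 4),
    ("GitHub -> james.k.92@gmail.com (commit log)", 5),
    ("email -> jknight1992 (Adobe breach)", 5),
    ("jknight1992 -> James Knight (LinkedIn breach)", 5),
]
# attribution_confidence(chain) -> 4
```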

Common Errors That Inflate Confidence

False positive accumulation: Each individual signal is treated as evidence, but signals from the same upstream source (e.g., two breach databases that both sourced from the same original leak) are counted twice. Track sources, not instances.

Confirmation bias in stylometry: Once you have a candidate identity, you will find stylometric matches more easily because you are looking for them. Stylometric analysis should be conducted before a candidate identity is identified where possible.

Handle collision: Common handles are held by multiple different people across platforms. darkwolf1994 may be a coincidental reuse by two people with similar naming habits. Base rates matter: a short, generic handle that appears on hundreds of platforms almost certainly belongs to multiple people, while a long, distinctive handle appearing on a dozen platforms is far more plausibly a single person — though even then, some of those accounts may belong to someone else.

Temporal fallacy: Two accounts with the same handle created in the same year may both be attempts to claim a popular name before someone else takes it. Test for timeline consistency, not just co-occurrence.

Documented Cases: Pattern Lessons

The GeoHot / George Hotz Pattern

George Hotz, who gained notoriety for jailbreaking the iPhone in 2007 and the PlayStation 3 in 2010, operated under the handle geohot consistently across every platform he used. His case illustrates the inverse lesson: coherent identity management where no separation was attempted. For public figures who want to maintain a pseudonymous secondary identity, the GeoHot pattern — a consistent public handle tied directly to real identity — is the opposite of what they need. But it shows how immediately an analyst can map a public persona when no separation exists.

The Silk Road Vendor Tracking Pattern

DEA and FBI investigators tracking Silk Road vendors used username correlation as their primary method before any technical compromise of the platform. The methodology:

  1. A vendor’s handle on Silk Road appeared consistently with specific product descriptions and shipping times that indicated a particular geographic region
  2. The same handle, or a close variant, appeared on harm-reduction forums where users reviewed vendor quality — these forums were not anonymised, and posts included IP metadata and often email registration data
  3. The bridging event: a vendor, proud of their reputation, linked their Silk Road feedback profile to a clear-web forum post under their regular handle

Multiple arrests followed this exact pattern: vendor handle → clear-web forum presence → real-world identity. The technical anonymity of Tor was not compromised. The operational security was.

The Sabu / Hector Monsegur Attribution Chain

The full chain of Monsegur’s identification follows the Layer 0 to Layer 3 progression precisely:

Layer 0: Handle Sabu appeared on multiple IRC networks and in press coverage of LulzSec operations.

Layer 1: Account age analysis revealed Sabu had a long history on hacktivist IRC channels predating LulzSec, consistent with someone embedded in the community since at least 2007.

Layer 2: Writing style, technical expertise (web application attacks, social engineering), and timezone-consistent activity (active primarily in Eastern US hours) were documented across IRC logs.

Layer 3: An earlier handle — connected to Sabu through stylometric overlap and platform migration — had been used in a context where a real IP address was logged. That IP address resolved to a housing project in New York. Cross-referencing with public records identified a small number of residents. Monsegur was identified as a likely match based on known online activity and background. The FBI approached him, and he cooperated.

The IP log was the anchor that turned a probable attribution into a confirmed one.

What Breaks Attribution Chains

Understanding where attribution chains fail is as important as knowing how to build them.

Consistent OPSEC from the start. If every account was created from Tor, uses a unique handle generated for each platform, carries no biographical detail, and has no email address in common with any other account — the chain cannot be built. This standard is rarely maintained in practice. The effort required to be consistently anonymous over years is higher than almost anyone sustains.

Handle recycling by platforms. Some platforms recycle abandoned handles. An account that appears to be the same subject may be a different person who claimed the handle after the original user deleted their account. Check account age and activity history before assuming continuity.

Deliberate misinformation. A sophisticated subject may plant false biographical details to mislead. If someone consistently claims to be from a specific city but their timezone analysis contradicts it, the biographical claim is unreliable. Weight observable metadata (timezone, writing style) over stated claims.

Collective handles. Some accounts are operated by multiple people — administrative accounts, hacktivist collective accounts, customer service personas. Attribution to an individual from a collective account is impossible without additional evidence.

Persona exhaustion. A subject who has been burned before may abandon a complete online identity and start fresh. The fresh identity leaves no trail into the old one — unless the bridge event occurs: the moment they mention the old handle, link to old work, or reuse a structural element from the old persona.

The OPSEC Failure Taxonomy

Across documented cases, attribution chains break at predictable points. Understanding the taxonomy of failures helps both investigators (who are looking for them) and subjects (who are trying to avoid them).

Category 1: The Bridge Post. A single post that connects two identities. Ross Ulbricht’s “altoid” post directing replies to his real-name Gmail address. A vendor linking their Silk Road profile in a clear-web forum post. An activist posting under their real name in a thread they started under a pseudonym. Bridge posts are often made under time pressure, frustration, or technical error (logged into the wrong account). They are nearly impossible to fully erase once made.

Category 2: The Credential Carry. Reusing a password across a pseudonymous account and a real-identity account. When either account is breached, the password hash appears in the dataset. If the hash is cracked, the same password linked to the real-name account confirms the connection. Password managers and unique passwords per account are the countermeasure. Few people maintain this discipline consistently.

Category 3: The Platform Migration. As documented above — carrying a handle or reputation from a closed pseudonymous context into a new one. The continuity that makes migration convenient is the same continuity that makes attribution possible.

Category 4: The Metadata Artefact. A photo uploaded under a pseudonymous account that contains GPS coordinates in its EXIF data. A document uploaded to a repository with Author metadata that contains a real name. A code commit with a real-name email configured from before the developer understood the implications.

Category 5: The Temporal Slip. Activity at a time that is inconsistent with the claimed identity. If someone claims to be based in Australia but their posting activity peaks at 09:00–17:00 UTC (consistent with European working hours), the claim is contradicted by observable behaviour.

Category 6: The Content Anchor. Unique knowledge that narrows the subject pool to a handful of people — a specific internal workplace incident, a medical condition, a very specific childhood event. Even when communicated pseudonymously, sufficiently unique content can identify a person to anyone who knows them in real life. Content anchors are not exploitable by all analysts, but they are exploitable by people who know the subject.

Applying This to Your Own Exposure

The same methodology described in this article is used commercially to build profiles of private individuals. Data brokers aggregate cross-platform identity links as part of their standard product. Employers run informal versions of this process before hiring decisions. Journalists run it as background research. Estranged parties run it after separations.

Running this methodology against your own handles — treating yourself as the subject — reveals what is findable before someone else finds it.

The specific questions to answer:

Which handles are linked across platforms? Run Sherlock and Maigret on every handle you have used in the past decade, including ones you consider abandoned.

What email addresses appear in commit history? If you have contributed to public repositories, run the git log extraction against your own commits.

What does HaveIBeenPwned return for every email address you have used? The breach data for your email addresses contains whatever real data was in those platforms’ databases — often including your name, address, and phone number.

What biographical content is scattered across your accounts? The 2011 post mentioning your employer, the 2014 post mentioning your neighbourhood, the 2017 post mentioning your car — assembled together, they describe you.

What does your posting timezone reveal? If your pseudonymous accounts post consistently in a timezone that matches your real location, that is a data point.

The investigation that maps this exposure before someone with adverse intent does is the difference between assumed privacy and confirmed exposure. If this kind of exposure affects your organisation, a Corporate Audit maps the full surface — contact us.
