Two controls were supposed to make credential theft irrelevant. Multi-factor authentication meant that stolen passwords alone were no longer sufficient. KYC liveness checks meant that stolen identity documents alone were no longer sufficient. Both assumptions are now demonstrably broken. Each control has been defeated not by nation-state capabilities but by commercially available AI, free open-source software, and breach databases that supply the raw material at scale.
Understanding how each layer was defeated requires understanding where the source data comes from. Every component of these attacks is assembled from publicly available or previously leaked information. The technique is the application. The data is the fuel. For a wider view of how credential leaks and breach data are weaponised, see our Credential Leaks & Breach Response hub.
The OSINT Foundation: Where the Raw Material Comes From
Before any synthetic voice is generated, before any face model is built, before any phishing panel is deployed, an attacker sources the components. This sourcing phase is pure OSINT — no system access required, no tools beyond a browser.
Voice samples are sourced from LinkedIn video introductions, company YouTube channels, earnings call recordings published on investor relations pages, conference talk recordings, podcast appearances, and media interviews. A 3–5 second clip is sufficient for modern voice synthesis tools to clone with 85% accuracy. Executives, sales leaders, and anyone with a public-facing role have typically published far more than that. A single earnings call contains hours of clean audio.
Facial data is sourced from LinkedIn profile photos, Instagram, Facebook, company team pages, and press photography. At scale, it is sourced from breach databases: the Odido breach released passport scans for 5 million individuals. The IDMerit leak exposed approximately one billion KYC verification records including facial imagery from 26 countries. These are not hypothetical sources — they are actively indexed and traded on dark web forums. iProov identified 31 new crews selling packaged KYC bypass toolkits in 2024 alone, and found a dedicated dark web operation farming identity data specifically for verification spoofing.
Organisational context for vishing pretexts is sourced from LinkedIn (job titles, team structures, who reports to whom), company websites (executive biographies, office locations, technology partnerships), press releases (recent system migrations, new software deployments, organisational changes), and job postings (which reveal the internal technology stack in detail). A vishing caller who opens with a reference to the company's recent Okta migration or mentions a colleague by name is not guessing — they read the job posting and the LinkedIn profile.
Layer 1: The Voice
AI voice synthesis has crossed a threshold that makes voice-based authentication and phone-based identity confirmation unreliable as standalone controls. Modern cloning tools require 3–5 seconds of audio, achieving 85% voice accuracy. The output replicates not just pitch and tone but cadence, hesitation patterns, filler words, and accent — the subtle signals human listeners use to judge authenticity.
The documented impact is not theoretical. In 2019, a UK energy firm lost €220,000 after an employee received instructions from what sounded exactly like the company's CEO. In early 2025, a European energy conglomerate lost $25 million when a cloned CFO voice issued live wire transfer instructions — pauses, tone, and context all matching. In February 2025, the group UNC6040 cloned a CFO voice to infiltrate a Canadian insurance company, resulting in $12 million in unauthorised transfers and exfiltration of sensitive financial data.
At the corporate vishing level used by ShinyHunters and Scattered Spider, voice cloning is combined with caller ID spoofing and a pre-researched pretext. The caller knows the target's name, their manager's name, the name of the IT ticketing system, and which SSO platform is in use. That context is assembled entirely from open sources before any call is placed. Deepfake-enabled vishing surged 1,600% in Q1 2025 compared to the end of 2024.
Layer 2: The Session — Bypassing MFA in Real Time
Multi-factor authentication was designed under an assumption that credentials and authentication factors would be captured in isolation — at different times, via different channels. Adversary-in-the-middle (AiTM) phishing invalidates that assumption by sitting between the target and the legitimate service during live authentication.
Tycoon 2FA, dismantled by Europol in March 2026 after generating tens of millions of phishing emails monthly, was the most operationally significant example. The mechanism: a phishing link redirects the target to a proxy server that forwards all traffic to the real Microsoft or Google login page. The user authenticates — entering their password and approving their MFA prompt — entirely successfully. The proxy captures the session cookie that the authentication service issues after successful MFA completion. The attacker then uses that cookie to access the account with no further authentication required. MFA was not bypassed — it completed correctly. It was simply irrelevant to the outcome.
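Why capturing the cookie is enough can be sketched in a few lines. The toy service below is an illustration, not any real identity provider's implementation: after a successful password-plus-MFA exchange it issues an opaque session token, and every subsequent request is authorised on that token alone.

```python
import secrets

# Toy session store: maps opaque token -> user identity.
# Stands in for the cookie an identity provider issues after MFA completes.
SESSIONS = {}

def login(username, password, mfa_code):
    """Simulate a full, *successful* password + MFA exchange."""
    assert password == "correct-password" and mfa_code == "123456"
    token = secrets.token_hex(16)     # the session cookie value
    SESSIONS[token] = username
    return token

def authorise(cookie):
    """Every later request is checked against the cookie alone:
    no password, no MFA prompt."""
    return SESSIONS.get(cookie)

# The victim authenticates through the AiTM proxy; MFA succeeds end to end.
cookie = login("victim@example.com", "correct-password", "123456")

# The proxy captured `cookie` in transit. Replaying it grants the
# attacker the same access, with no further authentication required.
assert authorise(cookie) == "victim@example.com"
```

Real services add expiry, device checks, and anomaly detection, but unless the token is cryptographically bound to the client, possession of the cookie is possession of the session — which is exactly what the proxy obtains.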
By mid-2025, Tycoon 2FA had reached over 500,000 organisations and accounted for 60% of all Microsoft-blocked phishing attempts. Its dismantling removes one platform, not the technique. The AiTM methodology is widely documented and the infrastructure for running it is rebuilt rapidly.
Layer 3: The Face — KYC Video Injection
KYC liveness detection was designed to distinguish a live person from a static photograph. The requirement that a user blink, turn their head, or track a moving target was intended to make spoofing with printed images impractical. It solved that problem. It did not anticipate the video injection attack.
The attack pipeline does not attempt to fool liveness detection with a photograph. Instead, it replaces the camera feed entirely before the KYC system receives it.
On Android, the camera API is a software interface. An application requesting camera access receives a video stream — but the operating system does not inherently guarantee that stream originates from the physical camera sensor. A virtual camera application installs itself as a camera driver and intercepts API calls before they reach hardware. The KYC application requests the camera; it receives the virtual camera's feed instead.
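The interception point can be modelled abstractly. This is a conceptual sketch in Python, not Android code — names like `CameraRegistry` are invented for illustration — but it captures the trust gap: the KYC app asks the OS for "the camera" and accepts whatever stream comes back.

```python
class PhysicalCamera:
    """The device's real sensor."""
    def read_frame(self):
        return "frame-from-sensor"

class VirtualCamera:
    """Installed as a camera driver; serves pre-rendered
    face-swap video instead of sensor data."""
    def __init__(self, injected_video):
        self.frames = iter(injected_video)
    def read_frame(self):
        return next(self.frames)

class CameraRegistry:
    """Stands in for the OS camera service: apps receive whichever
    driver is registered, with no guarantee of physical origin."""
    def __init__(self):
        self.driver = PhysicalCamera()
    def install(self, driver):
        self.driver = driver
    def open_camera(self):
        return self.driver

os_registry = CameraRegistry()

# The attacker installs the virtual camera before onboarding starts.
os_registry.install(VirtualCamera(["deepfake-blink", "deepfake-turn"]))

# The KYC app's request is identical either way; it cannot tell
# from the API alone that the frames never touched a sensor.
camera = os_registry.open_camera()
assert camera.read_frame() == "deepfake-blink"
```

The fix described later — hardware attestation — amounts to closing exactly this gap: making the OS vouch cryptographically that the stream originated from a physical sensor.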
iProov's Red Team demonstrated this using two free tools: Faceswap, an open-source desktop application that applies generative AI to produce real-time face-swapped video, and Virtual Camera: Live Assist, a free Android app that replaces the device camera feed with an incoming video stream. The face model is generated from photographs sourced via the OSINT phase — LinkedIn profiles, company websites, or breach databases. The result: the KYC system receives a video of a face blinking, turning, and tracking on command. All passive and active liveness checks pass. The attacker is not in the room.
iProov recorded a 783% increase in injection attacks targeting mobile web KYC applications in 2024, a 2,665% spike in virtual camera tool usage for identity fraud, and a 300% surge in real-time face-swap attacks. The cost of the full toolchain: effectively zero. The tools are free, the documentation is public, and the attack requires no programming knowledge.
The Neobank Exposure
The practical consequence of this attack chain is most visible in neobank account opening. bunq, operating under a Dutch banking licence issued by DNB, opens accounts in five minutes via its mobile app. The process requires a passport or ID scan and a selfie — verified remotely, with no branch visit, no address documentation, and no secondary identity check. Revolut uses the same core model: document scan plus selfie liveness check, fully remote. These onboarding processes were designed to remove friction. They achieved that goal precisely by reducing the number of identity signals required to one document and one face.
The injection attack maps directly onto this process. A stolen or fabricated ID document plus a face model built from the victim's photos, fed through a virtual camera during the selfie step, passes both checks. Synthetic identity fraud — using a manufactured identity that passes KYC because it has no fraud history — cost US lenders $3.1 billion in 2023 and is growing at over 20% annually. The fraud workflow circulating on dark web forums involves generating a synthetic face with open-source tools, pairing it with a document template purchased for under $20, and completing the selfie step via virtual camera injection. Account access is then used for money laundering, fraudulent transfers, or sold as a verified account.
The vulnerability here is structural: an onboarding process optimised for conversion and accessibility has a narrow attack surface, but that surface is now well-documented and cheaply exploitable.
What Detection Actually Requires
Each layer has a genuine countermeasure. None of them is enabled by default.
Against voice cloning: Designated call-back protocols to verified numbers (not numbers provided during the call), live code word systems rotated per session, and organisational policies that prohibit credential changes over inbound phone calls regardless of caller identity. Voice alone cannot be trusted as an authentication factor.
Against AiTM: FIDO2 hardware security keys are the only control that structurally defeats session cookie theft. Unlike TOTP codes, a FIDO2 key cryptographically binds authentication to the specific domain being authenticated — a proxy server relaying traffic from a phishing domain cannot obtain a valid FIDO2 response for the real domain. Google reported zero successful phishing attacks against its 85,000+ employees after FIDO2 deployment. Microsoft attributes its strongest authentication outcomes to the same control.
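The domain binding works because the authenticator response carries the origin the browser actually connected to, and the relying party must check it. Below is a minimal sketch of that server-side check — simplified for illustration, since a real WebAuthn verification also validates the rpIdHash, signature, and authenticator data.

```python
import base64
import json

EXPECTED_ORIGIN = "https://login.example.com"   # the real service

def make_client_data(origin, challenge):
    """What the victim's browser produces during authentication.
    The browser fills in `origin` itself; page content (or a proxy
    relaying it) cannot override it."""
    blob = json.dumps({"type": "webauthn.get",
                       "origin": origin,
                       "challenge": challenge}).encode()
    return base64.urlsafe_b64encode(blob)

def verify_client_data(client_data_b64, expected_challenge):
    """Server-side check of the WebAuthn clientDataJSON blob."""
    client_data = json.loads(base64.urlsafe_b64decode(client_data_b64))
    if client_data["origin"] != EXPECTED_ORIGIN:
        return False    # AiTM proxy: origin is the phishing domain
    if client_data["challenge"] != expected_challenge:
        return False    # stale or replayed response
    return True

# A legitimate login from the real origin passes.
assert verify_client_data(
    make_client_data("https://login.example.com", "c1"), "c1")

# The same flow relayed through a look-alike phishing domain fails,
# no matter how faithfully the proxy forwards every byte.
assert not verify_client_data(
    make_client_data("https://login.examp1e.com", "c1"), "c1")
```

This is the structural difference from TOTP: a six-digit code carries no information about where it was typed, so a proxy can relay it intact. The FIDO2 response is bound to the origin, so relaying it is useless.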
Against KYC video injection: Hardware attestation (verifying the video stream originates from a real camera sensor via device integrity checks), depth-sensor liveness using IR mapping (a flat video feed cannot produce a 3D facial depth map), and emulator detection through sensor fingerprinting (real devices produce specific accelerometer and gyroscope patterns that emulators replicate imperfectly). These controls exist. They are not universally implemented.
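As one illustration of sensor fingerprinting, real MEMS accelerometers exhibit constant thermal noise even when a device lies perfectly still, while emulators typically report the same canned value every sample. The threshold below is invented for the sketch; production systems combine many such signals rather than relying on any one.

```python
from statistics import pvariance

NOISE_FLOOR = 1e-6   # illustrative threshold, not a production value

def looks_emulated(accel_samples):
    """Flag an accelerometer stream whose per-axis variance is
    implausibly flat. Real sensors never report identical readings."""
    axes = list(zip(*accel_samples))          # -> (xs, ys, zs)
    return all(pvariance(axis) < NOISE_FLOOR for axis in axes)

# A real device at rest: gravity on the z-axis plus sensor noise.
real = [(0.001, -0.002, 9.812), (0.003, 0.001, 9.809),
        (-0.002, 0.002, 9.814), (0.001, -0.001, 9.811)]

# A typical emulator: the same reading on every sample.
fake = [(0.0, 0.0, 9.81)] * 4

assert not looks_emulated(real)
assert looks_emulated(fake)
```

The same logic generalises to gyroscope drift, touch-event timing, and battery telemetry: each is cheap for a real device to produce and tedious for an emulator to fake convincingly in aggregate.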
The Hardware Key Caveat
FIDO2 is genuinely the strongest available control against AiTM attacks — but the ShinyHunters and Scattered Spider methodology exposes the precise point where it remains vulnerable: not the key itself, but the process for managing it.
In July 2025, the PoisonSeed campaign demonstrated that social engineering can bypass FIDO2 protections by targeting the provisioning and removal flow rather than the authentication flow. The attack does not try to defeat the cryptographic challenge. Instead, it calls an IT administrator and uses a vishing pretext — a system migration, a security upgrade, a compliance requirement — to persuade them to remove the hardware key from the account and re-enrol with a TOTP authenticator instead. Once the account has been degraded from FIDO2 to TOTP, AiTM interception works normally.
This is precisely the ShinyHunters playbook. Scattered Spider's documented tactics include calls to IT helpdesk staff requesting credential resets, SSO reconfiguration, and MFA method changes — all under convincing pretexts assembled from public OSINT. The hardware key is only as strong as the policy governing when it can be removed and who has the authority to approve that removal.
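That policy gap can be expressed as a simple invariant: removing or downgrading a phishing-resistant factor is a privileged operation that must never complete on the strength of an inbound call alone. The sketch below is a hypothetical guard, not any vendor's API; the tier ordering and approval rule are assumptions for illustration.

```python
# Phishing-resistance ordering: higher is stronger.
TIER = {"sms": 0, "totp": 1, "fido2": 2}

def change_mfa_method(current, requested, out_of_band_approved):
    """Allow upgrades freely; require independent out-of-band
    approval (e.g. a call-back to the employee's known number,
    never the inbound caller) for any downgrade."""
    if TIER[requested] >= TIER[current]:
        return True                   # upgrade or same tier: no friction
    return out_of_band_approved       # downgrade needs a second channel

# PoisonSeed-style request: helpdesk asked by phone to swap FIDO2 for TOTP.
assert change_mfa_method("fido2", "totp", out_of_band_approved=False) is False

# The same change after a verified call-back succeeds.
assert change_mfa_method("fido2", "totp", out_of_band_approved=True) is True

# Enrolling a hardware key never needs extra friction.
assert change_mfa_method("totp", "fido2", out_of_band_approved=False) is True
```

The point of encoding the rule rather than leaving it to helpdesk judgment is precisely that the vishing pretext is designed to sound like a legitimate exception; an enforced invariant has no exceptions to talk someone into.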
Where This Is Going
The current attack chain is effective because the compute required for real-time voice synthesis and face-swap generation has become accessible on consumer hardware. The next phase removes remaining friction.
On-device neural processing units — now standard in flagship Android and iOS devices — are fast enough to run real-time deepfake generation locally, with no perceptible latency and no cloud dependency. Xanthorox AI, documented in 2025, already automates both voice cloning and call delivery without manual preparation. As multimodal AI matures, the combination of voice, video, and contextual text in a single attack session — a caller who sounds right, shows the right face on a video call, and references accurate internal context — will not require specialist skills to deploy.
On the KYC side, as depth-sensor liveness and hardware attestation become more common, the attack surface shifts toward the earlier part of the pipeline: compromising the device itself (via malware that intercepts at a lower level than the camera API) or targeting the identity document supply chain rather than the biometric check. The arms race is already underway.
The implication is not that these controls are useless — it is that each operates on a half-life. Controls that were robust in 2023 required supplementing in 2024 and defending differently in 2025. Assuming any single control is permanently sufficient is the error the attack chain was designed to exploit.
Sources
- DeepStrike — Vishing Statistics 2025: AI Deepfakes & the $40B Voice Scam Surge
- American Bar Association — The Rise of the AI-Cloned Voice Scam
- ScamWatchHQ — The $200 Million Deepfake Disaster (2025)
- Microsoft Security Blog — Inside Tycoon2FA: How a Leading AiTM Phishing Kit Operated at Scale
- BleepingComputer — Europol-Coordinated Action Disrupts Tycoon2FA Phishing Platform
- The Register — Deepfake Cyberattacks Proliferated in 2024, iProov Claims (783% injection increase)
- Biometric Update — iProov Shows How Face-Swapping App Can Fool Liveness Detection on Financial Apps
- iProov — Dark Web Identity Farming Operation Discovery (31 new KYC bypass crews, 2024)
- CloudSEK — KYC Verification Evasions: Virtual Cameras & App Emulators
- Regula Forensics — Video Injection Attacks in Remote Identity Verification
- bunq Help Centre — How to Open an Account in 5 Minutes
- Sardine — Neobank Fraud Prevention: Synthetic Identity Fraud ($3.1B, 2023)
- Zyphe — How Fraudsters Are Beating Your KYC With $20 Deepfakes
- The Hacker News — PoisonSeed Attack: Social Engineering Against FIDO Keys (July 2025)
- Biometric Update — Hackers Successfully Use Social Engineering Attacks Against FIDO Keys
- EclecticIQ — ShinyHunters Calling: Financially Motivated Data Extortion Group