Video calls have been compromised by deepfakes. (AFP via Getty Images)

For years, the video call solved a basic trust problem. If an email looked suspicious, a person could get on camera and confirm the request. In business, family disputes, online dating, remote work and even legal settings, seeing and hearing someone became a practical shortcut for believing them.

Deepfakes are breaking that shortcut. The old assumption that seeing someone on screen is enough is no longer safe. I have written before about why “seeing is believing” is dead for visual evidence. Live video calls bring that same problem into the moment when people are making decisions.

The Arup case shows the problem. In its 2024 financial statement, Arup said criminals used “fake voice, signatures and images” to execute a fraud, while also stating that its networks were not compromised and that no personal or project information was accessed. Arup’s financial statement described the incident as a social engineering-led attack, not a traditional network breach.

If the video call was the step used to reassure the target, then video itself was not the solution. It was part of the delivery mechanism.

After-the-fact digital forensics can help reconstruct what happened, preserve evidence, test files, analyze devices and support litigation. But a digital forensic report does not pull back a wire transfer, unsay a disclosure or undo a false admission made during a live call. Further, in a digital forensic examination, an examiner may have the original file, metadata, device history, and surrounding evidence. In a live-call attack, the synthetic media may never become a preserved original on the victim’s device. The call happens, the instruction is given, and the harm moves immediately.

Real-time detection matters because it can interrupt trust before action.
It can tell the user, the platform, or the organization that something about the face, voice, timing, or stream deserves verification through another channel.

Yes, detection is not perfect. A detector produces a score, a flag or a warning. It should not be treated as absolute proof that a person is real or fake, for the same reason AI detection tools cannot carry the whole burden of authentication. But the warning does not have to make the final decision. Sometimes it only needs to make everyone slow down, verify the request through another channel or bring in an expert before the conversation turns into evidence, money leaves an account or sensitive data is disclosed.

Cloud Detection: The Second Data Problem

The natural next step is video call plus deepfake detection. If video alone can be faked, then video needs a technical warning layer.

But the architecture matters. If detection requires a live face and voice stream to be sent to a cloud API, the organization has solved one risk by creating another. It has moved biometric data, call content, or derived signals away from the endpoint and into another system.

This is not just a privacy-law issue. Faces and voices are among the most sensitive data a system can process. GDPR treats certain biometric data as a special category of personal data, and Illinois’ biometric privacy law includes voiceprints and scans of face geometry in its definition of biometric identifiers. GDPR Article 9 and 740 ILCS 14/10 show why moving biometric signals around is not just a technical decision.

If a deepfake detector sends a live face or voice stream to a provider, the user has to trust that provider’s security, logging, retention and deletion practices. If that data is saved, there is always a breach risk. On-premises detection may reduce vendor exposure, but the stream still leaves the endpoint and lands on another system that must be secured.

On-device detection is cleaner.
The call can be analyzed where it is already being seen and heard. The detector can flag risk without sending the raw stream to a vendor, a cloud endpoint, or a separate internal server.

On-Device Detection: Technical Challenges

The privacy argument does not make the engineering easy. A detector that runs on a laptop or phone has to live with limited processing power, battery pressure, heat, latency, and whatever else the device is doing during the call.

Larger models may catch more subtle artifacts, but they are harder to run live on consumer devices. Smaller models may be fast enough for a video call, but they have to be efficient without becoming useless against better fakes. That is why on-device detection is an important step, but not a finished endpoint. The work is not just about building a detector. It is about making detection fast enough, light enough and private enough that it can run where people actually need it: inside the live call experience.

That is the right frame for on-device detection. It is not a courtroom conclusion. It is not a replacement for digital forensics. It is a local, real-time signal that something is wrong and that the user should stop before money, access, testimony or sensitive information moves.

The Best Security Is The Layer People Do Not Have To Think About

The adoption argument may be just as important as the privacy argument. Security that depends on perfect human behavior fails because humans are busy, rushed and social. Poor security habits survive because better habits ask too much from the person at the wrong moment.

Cloud deepfake detection can become another step someone has to enable, route, approve and trust before or during a call. Those extra steps are often the first things to disappear when people are busy, embarrassed, pressured or trying to move quickly. That may work in a security operations center.
It is less likely to work in a scam against an elderly person, a video call on a dating app or website, an employee receiving an urgent executive request, a lawyer preparing a remote deposition or a witness in a high-pressure video meeting.

On-device detection can be closer to invisible security. It can be built into the call experience, run by default and stay out of the user’s way unless something looks wrong. When risk rises, the user does not need to remember a separate process. The system can surface the one action humans can actually perform in the moment: pause and verify another way.
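
For readers who want to see what "stay out of the way unless something looks wrong" could mean in practice, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any real product: the class name `CallRiskMonitor`, the smoothing weight and the two thresholds are hypothetical, and the per-frame scores are made up. A real detector would supply those scores from a model running on the device.

```python
from dataclasses import dataclass


@dataclass
class CallRiskMonitor:
    """Smooths per-frame detector scores and decides when to prompt the user.

    All parameter values are illustrative assumptions, not tuned settings.
    """
    alpha: float = 0.3        # weight given to the newest score
    warn_above: float = 0.7   # smoothed score that raises the prompt
    clear_below: float = 0.4  # smoothed score that clears the prompt
    smoothed: float = 0.0
    warning: bool = False

    def update(self, score: float) -> bool:
        """Fold in one score in [0, 1]; return True while the user should pause."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        # Exponential moving average: one noisy frame does not trip the prompt.
        self.smoothed = self.alpha * score + (1 - self.alpha) * self.smoothed
        # Two thresholds (hysteresis) keep the prompt from flickering on and off.
        if not self.warning and self.smoothed >= self.warn_above:
            self.warning = True
        elif self.warning and self.smoothed <= self.clear_below:
            self.warning = False
        return self.warning


def advice(warning: bool) -> str:
    """The prompt never declares the caller fake; it asks for a second channel."""
    if warning:
        return "Pause. Verify this request through another channel before acting."
    return ""


if __name__ == "__main__":
    monitor = CallRiskMonitor()
    # Hypothetical scores: a calm start, then a sustained run of suspicious frames.
    for score in [0.1, 0.2, 0.9, 0.95, 0.9, 0.95, 0.9]:
        flagged = monitor.update(score)
        print(f"score={score:.2f} smoothed={monitor.smoothed:.2f} {advice(flagged)}")
```

The design choice worth noticing is that the score never becomes a verdict. A single odd frame is absorbed by the smoothing, a sustained run raises one prompt, and the only thing the prompt asks for is the action the article recommends: pause and verify another way.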