Deepfake Detection in the 2026 AI Safety Report

Haydar Talib
Feb 6, 2026

This article originally appeared on LinkedIn
Earlier this week Yoshua Bengio posted the 2026 International AI Safety Report, the work of 100 leading AI experts representing dozens of nations worldwide. The 200+ page report summarizes the risks and threats that general-purpose AI (i.e. massive AI models) poses to society. The authors compile evidence from a wide range of sources across many subtopics, aiming to provide, in a single digestible document, a complete and accurate picture of AI capabilities as they stand today, so that policymakers worldwide can make informed decisions on new laws governing AI development.
This edition of the Deepfake Field Notes will review the portions of the AI Safety Report that pertain to deepfakes and deepfake detection. Before I get into my critiques, I want to highlight that the AI Safety Report is a laudable effort, representing some of the world's greatest minds on AI exercising their moral conscience toward doing good for society. The scope and ambition of such an undertaking are incredible, and all contributors deserve recognition for devoting their time to this project.
The Risks & Dangers of AI
Since the report is on AI safety, much of the content is about how AI can misbehave (alignment problem), lead to disproportionate power dynamics (financial and social inequality), or be abused to nefarious ends (e.g. cyber attacks or deepfakes).
Based on my read, it's clear that most contributors' expertise leans heavily toward AI model development itself, and so the bulk of the recommendations and accompanying ideas revolve around model alignment (at the risk of oversimplifying). For those unfamiliar with alignment as it pertains to AI, the simplest definition is developing AI technologies that uphold human values. At this point the astute reader will burst out with all kinds of questions, and you can be assured that the people building AI, who are trying to tackle alignment, have those very same questions. But I digress.
Alignment, in any case, can be thought of as an inward-facing problem, in the sense that it is something AI developers need to think about in order to adjust how they build AI; this relates directly to one of the original 10 Key Ideas on Deepfakes, which called for AI developers to develop their tech responsibly.
The problem of deepfakes is about AI running rampant in the world. Strictly speaking, it is the scenario where safeguards are absent or have been bypassed. This makes it an outward-facing problem: the AI outputs are now interacting with us, and they're up to no good.
Regardless of how well we collectively address AI alignment at the source, we have to work on the outward-facing problem as well. That is, we have to assume that AI abuses will always take place, whether it's deepfakes used for misinformation, blackmail, reputational harm, identity theft, or any other fraud and criminal activity.
The Deepfake Threat & Mitigations, Per the Report
Section 2.1.1 of the report, titled AI-generated content and criminal activity, describes such abuses in disturbing detail, with numbers. The majority of reported deepfakes are of a sexually abusive nature, disproportionately targeting women and children. While the reported numbers may not seem massive today (an average of 250 cases per month), the reported figures likely understate reality by orders of magnitude. And the problem is growing.
Still in section 2.1.1, under Mitigations (emphasis mine) - "Certain AI and machine learning tools can be trained to detect anomalies in images and videos and thus to identify fake images, but their effectiveness remains limited (317)."
It's from this very first mention of deepfake detection in the report that I began to realize how it misses the mark on the subject.
"While not foolproof on their own, new research shows that a combination of these mitigations within a broader ecosystem of standards and policies can compensate for their respective limitations and help users detect AI-generated content more reliably (324*)."
The mitigations now include the idea of watermarking deepfake content as a method for assuring media provenance.
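The distinction the report is reaching for here is worth making concrete: provenance asks "can this media prove where it came from?", while detection inspects the content itself with no such cooperation from the source. Below is a deliberately toy sketch of my own (not a scheme from the report; real systems use C2PA-style manifests or robust embedded watermarks, which survive re-encoding in ways this does not) of metadata-based provenance using an HMAC tag:

```python
import hashlib
import hmac

# Hypothetical signing key held by the media generator/publisher.
SECRET = b"demo-key"

def attach_provenance(content: bytes) -> dict:
    """Toy provenance record: tag the content's hash with an HMAC.
    Anyone holding the key can later confirm the content is unchanged
    and was tagged by the key holder."""
    digest = hashlib.sha256(content).hexdigest()
    tag = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
    return {"sha256": digest, "tag": tag}

def verify_provenance(content: bytes, record: dict) -> bool:
    """Recompute the hash and tag; any tampering breaks verification."""
    digest = hashlib.sha256(content).hexdigest()
    expected = hmac.new(SECRET, digest.encode(), hashlib.sha256).hexdigest()
    return digest == record["sha256"] and hmac.compare_digest(expected, record["tag"])
```

The point of the sketch is the asymmetry it exposes: provenance only helps when media carries intact metadata from a cooperating source, which is exactly why it cannot substitute for content detection.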
Section 3.2 Risk management practices introduces something called "Defence in Depth," a concept familiar to anyone interested in security; however, the report discusses it entirely in the context of defense layers within the AI model creation and operation process. In other words, it treats it as an entirely inward-facing process.
We have to get to section 3.3 Technical safeguards and monitoring before any mention of deepfake detection appears again, as "AI-generated content detection." Only three short paragraphs are devoted to this subtopic, two of which discuss watermarking. Watermarking already had its own subsection titled "AI system provenance techniques help trace the uses and impacts of systems," and strictly speaking watermarking is not AI-generated content detection.
Ok, but there's still one more paragraph on deepfake detection, right? It's gotta be good!
Here is that paragraph in its entirety (emphasis mine) -
"Researchers are also working to develop AI-generated content detectors (1203, 1204, 1205*) to help identify AI-generated content in the wild, even when no watermark or metadata is available. However, these identification techniques have a limited success rate."
This is the end of any commentary or analysis on deepfake detection in the entire report. The authors have seemingly ignored or missed an entire, fully complementary approach to defending against AI abuses; one that can deliver results today and that continues to see significant scientific advancement and commercial growth.
Are There ANY Valuable Takeaways on Deepfake Detection?
At this point, to give the report a proper benefit of the doubt regarding its dismissal of deepfake detection, I read through all of the works referenced wherever deepfake detection was mentioned. And as I read those papers, I continued down the rabbit hole to several other works cited in support of large claims. Maybe my own biases, as an expert who has worked on identification, authentication and deepfake detection for the past decade, had clouded my judgment somehow? Maybe the works cited in the report, which I'd not been aware of before, were blind spots in my own knowledge?
Nope.
An extensive written deep dive into each of the references (and their references in turn) would be too much for this one article. The main issue is not even the quality of the individual papers selected, but rather the choice of papers in the first place.
No seminal works from the deepfake detection domain appear. Focusing on audio deepfake detection as an example, I noticed that no papers from the many years of ASVspoof were referenced, nor any papers from major speech technology conferences where much deepfake detection work has appeared over the past ten years (ICASSP, Interspeech or Odyssey).
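For readers unfamiliar with how that community measures progress: ASVspoof and the speech conferences mentioned above typically report an equal error rate (EER), the operating point where false acceptances of spoofed audio equal false rejections of genuine audio. A minimal sketch of the computation (my own simplified version; the official evaluation toolkits compute this more carefully, with interpolation):

```python
import numpy as np

def equal_error_rate(scores, labels):
    """EER: the point where the false acceptance rate (spoofs scored as
    genuine) equals the false rejection rate (genuine scored as spoofs).
    scores: higher = more likely genuine; labels: 1 = genuine, 0 = spoof."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.unique(scores)):
        far = np.mean(scores[labels == 0] >= t)  # spoofs accepted at threshold t
        frr = np.mean(scores[labels == 1] < t)   # genuine rejected at threshold t
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A system that separates the two classes perfectly scores an EER of 0; one that ranks them backwards scores 1. Benchmarks built around this metric are precisely the body of evidence the report's citations skip.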
Of the five papers cited, only two put forward any kind of performance analysis. In both cases the studies are quite limited in scope, and it would be a gross extrapolation to say they demonstrate a "limited success rate" for deepfake detection methods. This is not a slight against the original works; their authors could not have foreseen how their papers would be referenced here.
To get into some detail, I will focus a little bit on the paper that had a more thorough analysis of deepfake detection technologies, the citation for which is -
317 N. A. Chandra, R. Murtfeldt, L. Qiu, A. Karmakar, H. Lee, E. Tanumihardja, K. Farhat, B. Caffee, S. Paik, C. Lee, J. Choi, A. Kim, O. Etzioni, Deepfake-Eval-2024: A Multi-Modal In-the-Wild Benchmark of Deepfakes Circulated in 2024, arXiv [cs.CV] (2025); http://arxiv.org/abs/2503.02857.
The authors evaluated both open source models and a handful of modern commercial systems, comparing them to human "specialists" in their ability to determine whether media (video, audio or image) is AI-generated/manipulated or real.
The key conclusions are -
Commercial systems largely outperformed open source models, reaching up to 89% deepfake detection accuracy
Commercial systems are therefore better suited for the real world, likely built with datasets that better capture real world conditions
The second conclusion is of course interesting to us, since we covered it in Key Idea 5 of the 10 Key Ideas on Deepfakes.
Key Idea 5 – the datasets used to develop deepfake countermeasures must be realistic, rich and abundant
Regarding performance, near-90% deepfake detection accuracy is in fact a realistic, reasonable level for a modern deepfake detection system. While there continues to be room for improvement, performance like this, working at scale, would be a major boon to countering AI abuses in the world, especially when contrasted against average human-level performance: at best, humans notice that a piece of media is AI-generated 20% of the time (per the AI Safety Report, pg. 45).
It is possible today to build a commercially viable deepfake detection system, one that can more or less achieve 90% accuracy. Why "more or less"? Because rather than fixate on a static moment-in-time snapshot of a performance measure, we should recall Key Idea 11 -
Key Idea 11 - you must keep up with the evolving quality of generative AI by continually improving deepfake detection and voice authentication technologies
Dear Policymakers,
To conclude, the 2026 International AI Safety Report, while an impressive and ambitious accomplishment, is unfortunately lacking in the deepfake detection department. If this only meant that deepfake detection was overlooked, the damage would at least be relegated to blind-spot territory. Unfortunately, a blanket statement of "limited success" made with little supporting evidence is detrimental to the continued evolution of deepfake detection, which is a critical piece of security in the "defense in depth" narrative.
Instead, I find that the proposals in one of the papers referenced by the AI Safety Report offer more constructive advice to policymakers. That paper, entitled Generative AI models should include detection mechanisms as a condition for public release, was co-authored by Yoshua Bengio in 2023.
Some of the key passages are -
"A detection tool has a certain cost, both in its development and in its deployment to users. But we should note that AI-generated content detection is emerging as a commercial field in its own right [...]"
"Any detector tool can be expected to make errors, both false positives (identification of human-generated text as AI-generated) and false negatives (identification of AI-generated texts as human-generated). Decisions will have to be made as to what level of these errors is acceptable."
"State agencies could also fund research on detection tools, which then could be made available to companies; arguably states have some responsibility in providing AI safety ‘infrastructure’ of this kind, especially if they enact rules that require such infrastructure. When considering cost, it is also important to bear in mind the cost of not having a reliable detection tool, both on individual users in specific domains (e.g., the additional costs for teachers, in checking for AI-generated work) and more general on society (e.g., the destabilization of democracies through AI-generated disinformation)."
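That second passage, on deciding what level of errors is acceptable, maps directly onto how detection systems are tuned in practice: choose the operating threshold whose false-positive rate is tolerable for the application, then accept whatever false-negative rate comes with it. A hedged sketch of that policy choice (my own illustration, with hypothetical scores; labels: 1 = AI-generated, 0 = human-generated):

```python
import numpy as np

def pick_threshold(scores, labels, max_fpr=0.01):
    """Return the lowest score threshold whose false-positive rate
    (human media flagged as AI-generated) stays within max_fpr,
    along with the false-negative rate paid for that choice.
    scores: higher = more likely AI-generated."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    for t in np.sort(np.unique(scores)):
        fpr = np.mean(scores[labels == 0] >= t)  # humans wrongly flagged
        if fpr <= max_fpr:
            fnr = np.mean(scores[labels == 1] < t)  # deepfakes missed
            return t, fpr, fnr
    return None
```

A teacher screening essays might demand a near-zero false-positive rate and tolerate more misses; a platform triaging content for human review can afford the opposite. The detector is the same; the acceptable-error decision is the policy.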
No further notes.