Dublin Student Presents AI Deception Research At Conference

EHS junior Mrinal Agarwal presented a paper on a new method for AI deception detection at the NeurIPS 2025 conference.

Michael Wittner, Patch Staff

Posted Tue, Feb 17, 2026 at 4:32 pm PT

DUBLIN, CA — An Emerald High School junior recently presented artificial intelligence research to researchers and academics at NeurIPS 2025, one of the world’s leading machine learning conferences.

Mrinal Agarwal served as the lead author on the paper WOLF: Werewolf-based Observations for LLM Deception and Falsehoods, which introduces new benchmarks for studying misinformation in large language models like ChatGPT. His new framework uses the social deduction party game “Werewolf,” in which some players are secretly designated “werewolves” who must eliminate other players. Players use debate and vote to eliminate suspected werewolves.

Agarwal’s paper describes how to use a similar setup with LLMs to find out whether information presented is correct. “Rather than asking a model once whether a statement is true or false, Werewolf creates a system in which models are forced to have long drawn out conversations where they have incentives to mislead, withhold information, redirect suspicion, while appearing honest,” he explained.

“Every statement is logged and analyzed: the speaker records whether they were being deceptive and why, while other agents judge whether they believe the statement and how suspicious it seems. This lets us pinpoint when deception happens, what kind it is (lying outright vs. omission or misdirection).”
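
The logging scheme Agarwal describes can be pictured as one record per statement. The Python sketch below is illustrative only and is not taken from the WOLF paper: the class names, field names, deception labels, the 0-to-1 suspicion scale, and the "believed by a majority of listeners" rule are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical labels, loosely based on the categories named in the quote.
DECEPTION_KINDS = {"outright_lie", "omission", "misdirection"}


@dataclass
class Judgment:
    """One listener's reaction to a statement (assumed schema)."""
    listener: str
    believes: bool
    suspicion: float  # assumed scale: 0.0 (none) to 1.0 (maximal)


@dataclass
class StatementRecord:
    """One logged statement plus the speaker's private self-report."""
    turn: int
    speaker: str
    text: str
    deceptive: bool
    kind: Optional[str] = None  # one of DECEPTION_KINDS when deceptive
    reason: str = ""
    judgments: List[Judgment] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A deceptive statement must carry a recognized label.
        if self.deceptive and self.kind not in DECEPTION_KINDS:
            raise ValueError(f"unknown deception kind: {self.kind!r}")


def undetected_deceptions(log: List[StatementRecord]) -> List[StatementRecord]:
    """Return deceptive statements that a majority of listeners believed."""
    missed = []
    for rec in log:
        if not rec.deceptive or not rec.judgments:
            continue
        believers = sum(1 for j in rec.judgments if j.believes)
        if believers * 2 > len(rec.judgments):
            missed.append(rec)
    return missed


# Made-up example data, not results from the paper.
log = [
    StatementRecord(
        1, "agent_a", "I was asleep all night.", True,
        kind="outright_lie", reason="hide werewolf role",
        judgments=[
            Judgment("agent_b", True, 0.2),
            Judgment("agent_c", True, 0.4),
            Judgment("agent_d", False, 0.7),
        ],
    ),
    StatementRecord(
        2, "agent_b", "I suspect agent_a.", False,
        judgments=[Judgment("agent_a", False, 0.5)],
    ),
]

missed = undetected_deceptions(log)
print([rec.turn for rec in missed])
```

Keeping the speaker's self-report and the listeners' judgments in the same record is what would let an analysis line up when a deception occurred against whether anyone noticed it.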

According to Agarwal, research has shown that AI can be skilled at misleading consumers but not at detecting deception. In most of his experiments, deceptive agents avoided being identified by other models.

“As AI systems are increasingly used in multi-agent settings, like automated negotiations, moderation systems, or decision support, the ability to detect strategic dishonesty is lagging behind the ability to produce it. Werewolf gives us a controlled way to study that imbalance over time before implementing these systems into the real world, rather than treating deception as a basic one-off classification problem,” he said.

WOLF is currently just a benchmark, though Agarwal is working on a website to showcase the project and allow users to run models through it. He is also working on a separate paper focused on LLM security, and has developed a training-free method that monitors how a model reacts to inputs in order to detect attempts to manipulate a model’s behavior.

Agarwal is also president of the EHS math club, a competitive debater, and a qualifier for the American Invitational Mathematics Examination.
