Google’s new diffusion AI agent mimics human writing to improve enterprise research

Google researchers have developed a new framework for AI research agents that outperforms leading systems from rivals OpenAI, Perplexity, and others on key benchmarks.

The new agent, called Test-Time Diffusion Deep Researcher (TTD-DR), is inspired by the way humans write: drafting, searching for information, and making iterative revisions.

The system uses diffusion mechanisms and evolutionary algorithms to produce more comprehensive and accurate research on complex topics.

For enterprises, this framework could power a new generation of bespoke research assistants for high-value tasks that standard retrieval augmented generation (RAG) systems struggle with, such as generating a competitive analysis or a market entry report.

According to the paper’s authors, these real-world business use cases were the primary target for the system.

Deep research (DR) agents are designed to tackle complex queries that go beyond a simple search. They use large language models (LLMs) to plan, use tools like web search to gather information, and then synthesize the findings into a detailed report with the help of test-time scaling techniques such as chain-of-thought (CoT), best-of-N sampling, and Monte-Carlo Tree Search.

However, many of these systems have fundamental design limitations. Most publicly available DR agents apply test-time algorithms and tools without a structure that mirrors human cognitive behavior. Open-source agents often follow a rigid linear or parallel process of planning, searching, and generating content, making it difficult for the different phases of the research to interact with and correct each other.

Example of linear research agent (source: arXiv)

This can cause the agent to lose the global context of the research and miss critical connections between different pieces of information.
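The rigid pipeline the paper critiques can be sketched in a few lines. This is an illustrative sketch, not the paper's code: `plan_fn`, `search_fn`, and `write_fn` are hypothetical stand-ins for LLM and tool calls. The point is structural: each phase runs exactly once, so later phases cannot feed back into earlier ones.

```python
# Hypothetical sketch of a rigid linear DR agent: plan once, search once,
# write once. No phase can revise an earlier phase's output.
def linear_research_agent(query, plan_fn, search_fn, write_fn):
    plan = plan_fn(query)                          # 1. fixed plan, never revisited
    evidence = [search_fn(step) for step in plan]  # 2. one search pass per plan step
    return write_fn(query, plan, evidence)         # 3. single-shot report generation
```

Because the report is generated only at the end, any gap discovered while writing cannot trigger a new search, which is exactly the loss of global context described above.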

As the paper’s authors note, “This indicates a fundamental limitation in current DR agent work and highlights the need for a more cohesive, purpose-built framework for DR agents that imitates or surpasses human research capabilities.”

Unlike the linear process of most AI agents, human researchers work iteratively. They typically start with a high-level plan, create an initial draft, and then engage in multiple revision cycles. During these revisions, they search for new information to strengthen their arguments and fill in gaps.

The Google researchers observed that this human process could be emulated with the mechanism of a diffusion model augmented with a retrieval component. (Diffusion models are often used in image generation. They begin with a noisy image and gradually refine it until it becomes a detailed image.)

As the researchers explain, “In this analogy, a trained diffusion model initially generates a noisy draft, and the denoising module, aided by retrieval tools, revises this draft into higher-quality (or higher-resolution) outputs.”

TTD-DR is built on this blueprint. The framework treats the creation of a research report as a diffusion process, where an initial, “noisy” draft is progressively refined into a polished final report.

TTD-DR uses an iterative approach to refine its initial research plan (source: arXiv)

This is achieved through two core mechanisms. The first, which the researchers call “Denoising with Retrieval,” starts with a preliminary draft and iteratively improves it. In each step, the agent uses the current draft to formulate new search queries, retrieves external information, and integrates it to “denoise” the report by correcting inaccuracies and adding detail.
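The denoising loop described above can be sketched as follows. This is a minimal sketch under stated assumptions, not Google's implementation: `gen_queries`, `retrieve`, and `revise` are hypothetical placeholders for the LLM and search-tool calls, and the fixed step count stands in for whatever stopping criterion the real agent uses.

```python
# Minimal sketch of "Denoising with Retrieval": the current draft drives new
# search queries, and retrieved evidence is folded back in to revise
# ("denoise") the draft on each pass.
def denoise_with_retrieval(draft, gen_queries, retrieve, revise, steps=3):
    for _ in range(steps):
        queries = gen_queries(draft)               # draft-conditioned search queries
        evidence = [retrieve(q) for q in queries]  # gather external information
        draft = revise(draft, evidence)            # integrate evidence, fix inaccuracies
    return draft
```

The key design choice is that search is conditioned on the evolving draft rather than on the original query alone, so each revision cycle targets the report's current weaknesses.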

The second mechanism, “Self-Evolution,” ensures that each component of the agent (the planner, the question generator, and the answer synthesizer) independently optimizes its own performance. In comments to VentureBeat, Rujun Han, research scientist at Google and co-author of the paper, explained that this component-level evolution is crucial because it makes the “report denoising more effective.” This is akin to an evolutionary process where each part of the system gets progressively better at its specific task, providing higher-quality context for the main revision process.

Each of the components in TTD-DR uses evolutionary algorithms to sample and refine multiple responses in parallel, then combines them into a final answer (source: arXiv)
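The sample-and-combine pattern shown in the figure can be sketched generically. This is a hedged illustration of the evolutionary idea, not the paper's algorithm: `sample`, `score`, and `merge` are hypothetical stand-ins for a component's generator, a fitness/quality judge, and a synthesis step.

```python
# Illustrative sketch of per-component "Self-Evolution": generate several
# candidate outputs, rank them by a fitness score, keep the best survivors,
# and merge them into a single higher-quality answer.
def self_evolve(prompt, sample, score, merge, n=4, keep=2):
    candidates = [sample(prompt) for _ in range(n)]       # parallel variants
    ranked = sorted(candidates, key=score, reverse=True)  # fitness ranking
    return merge(ranked[:keep])                           # combine the fittest
```

Running this loop inside each component (planner, question generator, answer synthesizer) is what supplies higher-quality context to the outer denoising process.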

“The intricate interplay and synergistic combination of these two algorithms are crucial for achieving high quality research outcomes,” the authors state. This iterative process directly results in reports that are not just more accurate, but also more logically coherent. As Han notes, since the model was evaluated on helpfulness, which includes fluency and coherence, the performance gains are a direct measure of its ability to produce well-structured business documents.

According to the paper, the resulting research companion is “capable of generating helpful and comprehensive reports for complex research questions across diverse industry domains, including finance, biomedical, recreation, and technology,” putting it in the same class as deep research products from OpenAI, Perplexity, and Grok.

To build and test their framework, the researchers used Google’s Agent Development Kit (ADK), an extensible platform for orchestrating complex AI workflows, with Gemini 2.5 Pro as the core LLM (though it can be swapped for other models).

They benchmarked TTD-DR against leading commercial and open-source systems, including OpenAI Deep Research, Perplexity Deep Research, Grok DeepSearch, and the open-source GPT-Researcher.

The evaluation focused on two main areas. For generating long-form comprehensive reports, they used the DeepConsult benchmark, a collection of business and consulting-related prompts, alongside their own LongForm Research dataset. For answering multi-hop questions that require extensive search and reasoning, they tested the agent on challenging academic and real-world benchmarks like Humanity’s Last Exam (HLE) and GAIA.

The results showed TTD-DR consistently outperforming its competitors. In side-by-side comparisons with OpenAI Deep Research on long-form report generation, TTD-DR achieved win rates of 69.1% and 74.5% on two different datasets. It also surpassed OpenAI’s system on three separate benchmarks that required multi-hop reasoning to find concise answers, with performance gains of 4.8%, 7.7%, and 1.7%.

TTD-DR outperforms other deep research agents on key benchmarks (source: arXiv)

While the current research focuses on text-based reports using web search, the framework is designed to be highly adaptable. Han confirmed that the team plans to extend the work to incorporate more tools for complex enterprise tasks.

A similar “test-time diffusion” process could be used to generate complex software code, create a detailed financial model, or design a multi-stage marketing campaign, where an initial “draft” of the project is iteratively refined with new information and feedback from various specialized tools.

“All of these tools can be naturally incorporated in our framework,” Han said, suggesting that this draft-centric approach could become a foundational architecture for a wide range of complex, multi-step AI agents.
