By Maya Boeye, AI Researcher & Analyst - DiligentIQ
DiligentIQ's new Agent Mode is our latest step in making Large Language Models (LLMs) work better for you. The Agent Mode feature adds an intelligent decision-making layer to query processing. At its core, Agent Mode acts as an intermediary, refining user queries and determining the most effective path for information retrieval and analysis.
When activated, Agent Mode executes a two-step process:
1. Rewriting the user's query into a form better suited for retrieval.
2. Deciding whether to answer using the provided deal documents (standard RAG) or to retrieve information from the web.
This dynamic approach chooses between local document analysis and web-based information retrieval. To implement it, we use LangGraph, a library developed by the creators of LangChain. LangGraph provides a flexible framework for defining each step in query processing, such as query rewriting and retrieval, as an individual node within a graph. Importantly, not all nodes are executed in every interaction: at certain points in the graph, we ask the LLM to determine the course of action, which then dictates the path taken. While this branching decision is currently the sole decision point, the framework allows for future expansion, so additional decision nodes can be added as we continue refining and enhancing the Agent Mode feature.
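To make the node-and-branch pattern concrete, here is a minimal plain-Python sketch of the idea. It is not DiligentIQ's implementation or the actual LangGraph API; every function and node name (`rewrite_query`, `route`, `rag_search`, `web_search`) is illustrative, and the routing heuristic stands in for the LLM decision made in production.

```python
# Minimal sketch of the node/decision-edge pattern described above.
# All names are hypothetical; in production an LLM makes the routing call.

def rewrite_query(state):
    # Step 1: refine the raw user query for retrieval (always executed).
    state["query"] = state["raw_query"].strip().rstrip("?") + "?"
    return state

def route(state):
    # Decision node: a trivial stand-in for the LLM's path selection.
    return "web_search" if "latest news" in state["query"] else "rag_search"

def rag_search(state):
    state["answer"] = f"[RAG over deal documents] {state['query']}"
    return state

def web_search(state):
    state["answer"] = f"[web retrieval] {state['query']}"
    return state

NODES = {"rag_search": rag_search, "web_search": web_search}

def run_agent(raw_query):
    state = {"raw_query": raw_query}
    state = rewrite_query(state)      # rewrite node runs every time
    next_node = route(state)          # branch chosen at runtime
    return NODES[next_node](state)    # only the selected retrieval node runs

result = run_agent("What were Q2 revenues in the deal documents")
print(result["answer"])
```

The key property this sketch shares with the graph approach is that only the chosen retrieval node executes on any given run, which is what lets additional decision nodes be added later without touching the existing paths.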
In September 2024, I, along with AI Researchers Lila Karl and Michael Van Demark, conducted a comprehensive evaluation of the Agent Mode feature's effectiveness during the query rewrite step. We tested 30 prompts with Agent Mode activated and the same 30 prompts with it deactivated, across three advanced language models: ChatGPT-4 Omni (August), ChatGPT-4 Omni (May), and Claude V3.5 Sonnet. The queries were chosen because they require only standard RAG on deal documents, allowing us to assess whether the agent would unnecessarily opt for web searching or scraping when the provided deal documents were sufficient. As anticipated, the agent consistently chose the standard RAG approach, demonstrating its ability to make correct routing decisions. Our test results and complete statistical analysis can be found in this Agent Mode Testing Report.
Performance Comparison: Agent Mode On vs. Off
Following the validation of Agent Mode's decision-making, we compared the models' response quality with and without Agent Mode to measure its impact on performance. The responses were rated on accuracy, problem solving, relevance, and sourcing. The findings varied slightly by model, but the overarching takeaway was clear: the introduction of this new decision-making layer did not harm overall performance.
These results demonstrate that even with an additional layer of query rephrasing and decision-making, the accuracy and relevance of responses remained stable across all models tested. Agent Mode was able to identify the correct path and refine the user query without negatively impacting the models' overall ability to return valuable information.
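A conclusion like "performance remained stable" typically rests on a paired comparison: the same prompts are rated with the feature on and off, and the rating differences are tested for significance. The sketch below illustrates that kind of check with a paired t-statistic over hypothetical 1-to-5 ratings; the numbers are invented for illustration and are not the study's data.

```python
# Illustrative paired comparison, not the report's actual data or method.
from math import sqrt
from statistics import mean, stdev

# Hypothetical accuracy ratings (1-5) for the same 10 prompts,
# scored with Agent Mode off versus on.
off = [4, 5, 4, 4, 3, 5, 4, 4, 5, 4]
on  = [4, 5, 4, 3, 4, 5, 4, 4, 5, 4]

diffs = [a - b for a, b in zip(off, on)]
n = len(diffs)

# Paired t-statistic: mean difference over its standard error.
t = mean(diffs) / (stdev(diffs) / sqrt(n))

print(f"mean difference: {mean(diffs):.2f}, t = {t:.2f}")
```

For 9 degrees of freedom, the two-tailed 5% critical value is about 2.26, so a |t| well below that, as in this toy data, would indicate no statistically significant difference between the two conditions.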
General LLM Performance Insights
Beyond the impact of the Agent Mode feature, our testing provided valuable insights into the general performance of these cutting-edge language models. ChatGPT-4 Omni (August) consistently demonstrated superior performance across most metrics when Agent Mode was off, excelling particularly in accuracy and sourcing. The May version of ChatGPT-4 Omni, while slightly behind its August successor, still showed strong overall performance, especially in relevance to user queries.
Claude V3.5 Sonnet presented a distinct performance profile. While the ChatGPT-4 models excelled in accuracy and sourcing, Claude V3.5 Sonnet showed strengths in other areas: it made greater use of problem solving and had a propensity for generating additional insights beyond the provided information. This suggests that Claude V3.5 Sonnet could be particularly valuable for tasks that require creative thinking or complex reasoning. For more detail on each model's performance, see the full report linked above.
The introduction of Agent Mode represents an advancement in how AI processes and responds to queries. By incorporating a two-step decision-making process, powered by tools like LangGraph, we've created a more flexible system that dynamically chooses the best path for each task. Our testing revealed that the models maintained their strong performance, with no statistically significant drops in accuracy, sourcing, relevance, or problem solving when this new feature was activated. This study demonstrates that Agent Mode successfully interprets and rephrases queries while selecting the appropriate path for each task, all without diminishing the precision of the models' output.
Moving forward, Agent Mode is poised to balance innovation and reliability, allowing for future growth and more intelligent AI interactions. With the flexibility to expand decision nodes and the continued refinement of the underlying technology, we are confident that Agent Mode will be a critical tool in pushing AI boundaries, without sacrificing performance or precision.