Imagine having to analyze a 3,500-page document with only the first 500 pages available to you. That's essentially what professionals face when using standalone Large Language Models (LLMs) like ChatGPT and Claude, which can typically process only 5-10 files at a time and cap the size of each file as well. In today's data-driven business environment, this limitation isn't just inconvenient; it's a bottleneck that can lead to incomplete analysis and missed insights.
“People may not realize that when you use the web or mobile apps from OpenAI and Anthropic, you get ease of use and great features compared to API access, but different rules and guardrails get applied. These directly affect the responses you get from the models. In simpler cases the differences won't be noticeable, but the more complex the content, the more varied the documents, and the more sophisticated your Q&A, the more the impact will show, in both subtle and obvious ways.” - Ed Brandman, Founder & CEO, DiligentIQ
Our latest research at DiligentIQ tackles this challenge head-on. DiligentIQ Researcher Lila Karl and I just completed an extensive study comparing how LLMs perform both within and outside our platform, and the results are exciting. Using publicly available Sonos documents to create a “mock” VDR with 57 files covering a range of content, we uncovered compelling evidence that while standalone LLMs are powerful tools, they are just scratching the surface of what's possible when integrated with advanced platforms like DiligentIQ, which combine API access to multiple models from different LLM providers, specialized retrieval indexing known as hybrid search, and vector stores that extend capacity beyond the limits of token context windows.
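To make the retrieval idea concrete, here is a minimal sketch of how hybrid search can blend keyword matching with vector similarity so that only the most relevant chunks of a large corpus ever enter a model's context window. Everything in it (the embed() stub, the scoring weights, the chunk format) is an illustrative assumption, not DiligentIQ's actual implementation; a production system would use a real embedding model and a dedicated vector store.

```python
# Illustrative hybrid retrieval: combine keyword overlap with vector
# similarity so only the best-matching chunks enter the context window.
# embed() is a hypothetical stand-in for a real embedding model.
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashing-trick embedding; a real system would call an embedding API."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit-normalized, so dot product = cosine

def keyword_score(query: str, chunk: str) -> float:
    """Fraction of query terms that appear verbatim in the chunk."""
    q_terms = set(query.lower().split())
    return len(q_terms & set(chunk.lower().split())) / len(q_terms) if q_terms else 0.0

def hybrid_retrieve(query: str, chunks: list[str],
                    top_k: int = 5, alpha: float = 0.6) -> list[str]:
    """Score every chunk by a weighted blend of vector and keyword relevance."""
    q_vec = embed(query)
    scored = []
    for chunk in chunks:
        cosine = sum(a * b for a, b in zip(q_vec, embed(chunk)))
        scored.append((alpha * cosine + (1 - alpha) * keyword_score(query, chunk), chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only the top_k chunks are sent to the LLM, so the corpus can be far
    # larger than any single model's token context window.
    return [chunk for _, chunk in scored[:top_k]]
```

The point of the blend is that keyword matching catches exact terms (ticker symbols, clause numbers) that embeddings can blur, while vector similarity catches paraphrases that keywords miss.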
We conducted an extensive analysis using our mock VDR, testing the models across three key categories: Sustainability, Consumer Goods, and Financial Analysis. What makes this study particularly interesting is its dual focus on both quantitative metrics (like processing speed) and qualitative factors (such as accuracy and relevance).
Quality: The Platform Advantage
DiligentIQ really shone in response quality. The platform demonstrated consistently superior performance in several key areas:
Accuracy: DiligentIQ-integrated models achieved "Good Performance" ratings across nearly all tests, with only one moderate accuracy rating in the Financial category.
Thoroughness: Responses were notably more structured and detailed compared to standalone LLM outputs.
Document Handling: DiligentIQ’s ability to process entire VDRs simultaneously (compared to ChatGPT’s 10-file and Claude’s 5-file limits) resulted in more comprehensive analyses.
Specialized Performance Across Categories
Different models displayed unique strengths. Leveraging both LLMs, DiligentIQ excelled in governance, sustainability reporting, and financial detail, capturing operational nuances often missed by standalone models. Standalone ChatGPT-4 Omni provided creative insights, especially in social impact analysis, while standalone Claude V3.5 Sonnet delivered fast, concise, but slightly less comprehensive environmental assessments.
Speed
While you might expect integrated platform solutions to add processing overhead, the results painted a more nuanced picture. Claude V3.5 Sonnet emerged as the speed champion, consistently completing tasks in the 20-30 second range. However, this came with an important caveat: the faster completion times occasionally resulted in less comprehensive responses. ChatGPT-4 Omni presented an interesting contrast. Despite quick initial response times, it showed longer overall processing times, often exceeding 40-50 seconds.
To reinforce the significance of an integrated approach, consider again the bottleneck scenario we opened with: if your analysis of a massive document set is constrained by a standalone LLM's file-processing limits, crucial insights can be missed. DiligentIQ directly addresses these limitations by letting users engage with entire VDRs at once, eliminating the need for selective file inclusion and reducing potential gaps in analysis.
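As a rough illustration of why a retrieval-based platform avoids per-file caps, here is a hedged sketch of ingesting every document in a VDR folder into uniformly sized chunks that a retriever like the one sketched earlier could index. The chunk size, the plain-text assumption, and the folder path are all illustrative choices, not a description of DiligentIQ's pipeline.

```python
# Illustrative VDR ingestion: walk a folder, split every readable file
# into fixed-size chunks, and tag each chunk with its source file.
# Chunk size and the plain-text assumption are illustrative only.
from pathlib import Path

def ingest_vdr(root: str, chunk_words: int = 300) -> list[dict]:
    chunks = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            words = path.read_text(errors="ignore").split()
        except OSError:
            continue  # skip unreadable files rather than failing the whole run
        for start in range(0, len(words), chunk_words):
            chunks.append({
                "source": str(path),  # provenance, so answers can cite files
                "text": " ".join(words[start:start + chunk_words]),
            })
    return chunks

# Hypothetical usage: index a local folder of exported VDR documents.
# chunks = ingest_vdr("mock_vdr/")
```

Because every file is reduced to chunks in a single index, there is no 5- or 10-file ceiling: a 57-file VDR and a 570-file VDR are handled the same way, with retrieval deciding what the model actually sees.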
This doesn't mean standalone LLMs don't have their place. For general queries and broad overviews, they remain highly capable tools. However, when it comes to detailed analysis of large document sets or specialized business applications, the platform advantage becomes clear.
These findings have significant implications for businesses considering AI implementation strategies. While standalone LLMs offer impressive capabilities, the addition of specialized platforms like DiligentIQ can significantly enhance their practical business value, particularly in situations requiring detailed analysis of large document sets or specialized domain knowledge.
The key takeaway? The future of business AI might not lie in choosing between standalone LLMs or integrated platforms, but in understanding when and how to leverage each for maximum benefit. As these technologies continue to evolve, this nuanced understanding will become increasingly valuable for organizations looking to optimize their AI strategies.