By Maya Boeye, AI Researcher/Analyst - DiligentIQ
In the fast-paced world of AI, it's important to understand how our tools are evolving. Last month, AI researchers Lila Karl and Michael Van Demark conducted a comprehensive analysis comparing the May and August versions of the ChatGPT-4 Omni model. Their findings reveal both significant improvements and ongoing challenges that matter to anyone looking to use AI to drive insights.
The August 2024 Chart Test was conducted in the DiligentIQ Data Chat environment, working with a bakery sales dataset from Kaggle in xlsx format. We use OpenAI's Assistants API to generate data visualizations in our Data Chat environment. The Data Chat environment can be used to “talk” to complex data in spreadsheets, similar to building formulas in Excel to get results. The Bakery Sales File contains two tabs, both of which the models had to interpret and integrate to create the results in Lila and Michael’s August 2024 Chart Testing report. Both the May and August versions of the ChatGPT-4 Omni model were tested using the same methodology, which is outlined in that report.
We separated the charts into three categories: successful, unsuccessful, and failed. Successful charts worked consistently and passed data validation checks. Unsuccessful charts represented the data accurately but struggled with formatting or visual clarity. Failed charts either did not pass data validation or could not be generated by the LLM at all.
Successful Charts
Stacked Bar Chart, Area Chart, Correlation Chart, Heat Map, Line Chart, Bar Chart, Histogram
Unsuccessful Charts
Bubble Chart, Scatter Plot Chart, Dendrogram, and Box Plot all struggle with formatting issues.
Failed Charts
Ridgeline Plot, Pie Chart, Venn Diagram, Radar Chart
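The three categories described above can be read as a simple decision rule. The sketch below is only an illustration of that rule as stated in this post; the function name, the enum, and the three boolean inputs are hypothetical and are not taken from the testing report.

```python
from enum import Enum


class ChartOutcome(Enum):
    SUCCESSFUL = "successful"
    UNSUCCESSFUL = "unsuccessful"
    FAILED = "failed"


def categorize_chart(generated: bool, data_valid: bool, formatting_ok: bool) -> ChartOutcome:
    """Map one test result to a category (hypothetical helper).

    generated     -- the LLM produced a chart at all
    data_valid    -- the chart passed the data validation check
    formatting_ok -- the chart had no formatting or visual clarity problems
    """
    # No chart, or a chart with wrong data, counts as failed.
    if not generated or not data_valid:
        return ChartOutcome.FAILED
    # Accurate data with formatting problems counts as unsuccessful.
    if not formatting_ok:
        return ChartOutcome.UNSUCCESSFUL
    return ChartOutcome.SUCCESSFUL


# Example: accurate data drawn with the wrong markers.
print(categorize_chart(generated=True, data_valid=True, formatting_ok=False).value)
```

Checking data validity before formatting reflects the ordering in the definitions: a chart with incorrect data is failed regardless of how it looks.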
One of the most exciting takeaways from this report is the noticeable leap in efficiency seen in the August model. The team’s comparison showed reductions in processing times across various chart types. In industries where time is of the essence, this improvement can translate into more timely and informed decision-making.
The August model displayed notable improvement over the May model in generating correlation charts and heatmaps, consistently delivering results across rounds with more accuracy and efficiency than its predecessor.
While the August model did outperform the May model in many cases, it’s important to note that there are still hurdles to overcome. Lila and Michael identified persistent issues in accuracy when generating pie charts, Venn diagrams, radar charts, and ridgeline plots; see example below.
We observed some advancement in the August model generating Scatter Plot Charts, but challenges remain. While the earlier May model only plotted a subset of the data requested, the August model successfully included all desired data points. However, as you can see in the image below, both versions still displayed the data points as X’s rather than using the appropriate dot markers.
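To make the marker issue concrete, here is a minimal sketch of how marker shape is chosen in matplotlib. It assumes the plotting code the model writes uses matplotlib, which this post does not state; the helper name and the sample values are hypothetical and are not drawn from the bakery dataset.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_points(x, y, marker="o"):
    """Scatter x against y with an explicit marker; returns (fig, ax, points)."""
    fig, ax = plt.subplots()
    # marker="o" draws filled dots; marker="x" draws X-shaped points.
    points = ax.scatter(x, y, marker=marker)
    ax.set_xlabel("Quantity")
    ax.set_ylabel("Total sales")
    return fig, ax, points


# Hypothetical values for illustration only.
quantity = [120, 85, 60]
total_sales = [300.0, 210.5, 95.0]

fig, ax, points = plot_points(quantity, total_sales, marker="o")
plt.close(fig)
```

In this sketch the difference between dots and X's is a single argument, which is why a formatting problem of this kind can coexist with correctly plotted data.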
By adjusting specific elements of prompts, such as clarifying data points or specifying chart formats, our team mitigated certain issues in the May model and, to a lesser extent, in the August model. For example, when generating bubble charts, the initial prompts produced X-shaped points instead of circles. After we refined the prompt to explicitly request “circles instead of X's,” the May model responded with a correctly formatted bubble chart (see below).
Initial Prompt: "Create a bubble chart based on the data I uploaded. Include top 5 articles, total sales, and quantity."
Refined Prompt: "Create a bubble chart based on the data I uploaded with circles instead of X's. Include top 5 articles, total sales, and quantity."
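The refinement shown in the two prompts above amounts to adding one formatting instruction to the first sentence. As a rough sketch, a small helper could apply the same adjustment to other prompts; `refine_prompt` is a hypothetical name and is not part of the Data Chat environment or the testing report.

```python
def refine_prompt(prompt: str, instruction: str) -> str:
    """Append a formatting instruction to the first sentence of a prompt."""
    first, sep, rest = prompt.partition(".")
    if not sep:
        # No sentence break found: add the instruction at the end.
        return f"{prompt} {instruction}"
    return f"{first} {instruction}.{rest}"


initial = (
    "Create a bubble chart based on the data I uploaded. "
    "Include top 5 articles, total sales, and quantity."
)
refined = refine_prompt(initial, "with circles instead of X's")
print(refined)
```

Keeping the instruction separate from the base prompt makes it easier to rerun the same adjustment against each model version and compare the results.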
Despite some improvements from prompt engineering, in some cases the August model continued to display improper formatting even after we applied the same prompt adjustments that worked for the May model. The results of this latest testing reinforce the need for ongoing prompt optimization and testing. Further examples of the prompt engineering used during this test can be found in the August 2024 Chart Testing Report, which shows what worked and what didn't.
The ChatGPT-4 Omni August model’s improvements in chart accuracy and processing speed are particularly exciting for the field of private equity (PE). While PE professionals rely heavily on raw data for their investment decisions, pairing AI-driven visualizations with that data adds a valuable layer of clarity. Visualizations, when used alongside the underlying data, help facilitate data analysis and improve how complex financial information is communicated, without taking away from the importance of the data itself.
These advancements are also a big step forward in reducing the manual effort involved in producing data visualizations, freeing up analysts to spend more time on high-level insights. By letting AI handle some of the more time-consuming tasks, professionals can focus on the bigger picture. Ultimately, AI will add value by streamlining processes and enhancing communication, all while allowing PE firms to stay grounded in their data-driven decision-making.
At DiligentIQ, we are continuously striving to refine our product, ensuring that we not only meet but exceed the needs of our clients. The August model represents a significant step forward, but it also serves as a reminder that AI is a journey, not a destination. We are committed to ongoing research and development, and we will continue to share our progress as we work towards perfecting LLM-driven charting.
For those interested in diving deeper into the details, I encourage you to review the full August 2024 Chart Testing report by Lila Karl and Michael Van Demark. Their report provides a thorough examination of the models’ current capabilities, an overview of the prompt engineering used for this project, additional LLM chart outputs, and speed data.
As we continue to enhance our tools, we remain focused on delivering the best possible solutions for our clients. Stay tuned for more updates as we push the boundaries of what AI can achieve in the realm of private equity.