Charting with Gen AI: Insights from August 2024

By Maya Boeye, AI Researcher/Analyst - DiligentIQ

In the fast-paced world of AI, it's crucial to grasp how our tools are evolving. Last month, AI researchers Lila Karl and Michael Van Demark conducted a comprehensive analysis comparing the May and August versions of the ChatGPT-4 Omni model. Their findings reveal both significant improvements and ongoing challenges that matter to anyone looking to use AI to drive insights.

The August 2024 Chart Test was conducted in the DiligentIQ Data Chat environment, working with a bakery sales dataset from Kaggle (an xlsx workbook). Data Chat uses OpenAI's Assistants API to generate data visualizations, and it lets users “talk” to complex spreadsheet data, much as they would build formulas in Excel to get results. The Bakery Sales file contains two tabs, and the models had to interpret and integrate both to produce the results in Lila and Michael’s August 2024 Chart Testing report. The May and August versions of the ChatGPT-4 Omni model were tested using the same methodology, which is outlined in that report.
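To make “interpret and integrate” concrete, here is a minimal sketch of the kind of join-and-aggregate step a two-tab workbook requires, for example when a prompt asks for the top 5 articles by total sales. Everything here is an assumption: the two tabs are modeled as a list of (article, quantity) sales rows plus a dictionary of unit prices, the `top_articles` helper is hypothetical, and the sample values are made up rather than taken from the Kaggle file.

```python
from collections import defaultdict

# Hypothetical stand-ins for the workbook's two tabs (made-up values);
# the real bakery file's column names and layout may differ.
sales_rows = [
    ("BAGUETTE", 3), ("CROISSANT", 5), ("BAGUETTE", 2),
    ("TRADITIONAL BAGUETTE", 4), ("PAIN AU CHOCOLAT", 2),
    ("CROISSANT", 1), ("COOKIE", 6), ("BRIOCHE", 1),
]
unit_prices = {
    "BAGUETTE": 0.90, "TRADITIONAL BAGUETTE": 1.20, "CROISSANT": 1.10,
    "PAIN AU CHOCOLAT": 1.20, "COOKIE": 0.70, "BRIOCHE": 3.50,
}


def top_articles(rows, prices, n=5):
    """Join sales rows to unit prices, then rank articles by total sales."""
    quantity = defaultdict(int)
    for article, qty in rows:
        quantity[article] += qty
    # A KeyError here means an article in the sales tab has no price,
    # i.e. the two tabs do not integrate cleanly.
    totals = [
        (article, qty, round(qty * prices[article], 2))
        for article, qty in quantity.items()
    ]
    # Highest total sales first; Python's sort is stable, so ties keep
    # their first-seen order.
    totals.sort(key=lambda t: t[2], reverse=True)
    return totals[:n]


for article, qty, sales in top_articles(sales_rows, unit_prices):
    print(f"{article:<22} qty={qty:<3} sales={sales:.2f}")
```

Computing a reference table like this independently is one way to check the numbers an LLM-generated chart displays.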

We separated the charts into three categories: successful, unsuccessful, and failed. Successful charts worked consistently and passed data validation checks. Unsuccessful charts represented the data accurately but struggled with formatting or visual clarity. Failed charts either did not pass data validation or could not be generated by the LLM at all.
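The three-way categorization above can be written as a simple decision rule. This is a hypothetical illustration only: the `ChartResult` fields and the `categorize` function are names introduced here, not part of the testing report, and the sketch assumes each attempt records whether a chart was produced, whether it passed data validation, and whether its formatting was acceptable.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SUCCESSFUL = "successful"
    UNSUCCESSFUL = "unsuccessful"
    FAILED = "failed"


@dataclass
class ChartResult:
    # Hypothetical record of one chart-generation attempt.
    generated: bool          # did the LLM produce a chart at all?
    passed_validation: bool  # do the plotted values match the source data?
    formatting_ok: bool      # are markers, labels, and colors acceptable?


def categorize(result: ChartResult) -> Category:
    """Map a chart result onto the article's three categories."""
    # No chart, or a chart whose data fails validation, counts as failed.
    if not result.generated or not result.passed_validation:
        return Category.FAILED
    # Accurate data, but formatting or visual-clarity problems.
    if not result.formatting_ok:
        return Category.UNSUCCESSFUL
    return Category.SUCCESSFUL


# Accurate data drawn with the wrong markers:
print(categorize(ChartResult(True, True, False)).value)  # prints "unsuccessful"
```

Checking validation before formatting mirrors the definitions: accurate data is a precondition for a chart to count as anything other than failed. Consistency across test rounds, which the article also requires for success, is not modeled here.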

Successful Charts

Stacked Bar Chart, Area Chart, Correlation Chart, Heat Map, Line Chart, Bar Chart, Histogram

Unsuccessful Charts

Bubble Chart, Scatter Plot Chart, Dendrogram, and Box Plot all struggled with formatting issues.

Failed Charts

Ridgeline Plot, Pie Chart, Venn Diagram, Radar Chart


From May to August ➡️ What’s Improved?

One of the most exciting takeaways from this report is the noticeable leap in efficiency seen in the August model. The team’s comparison showed reductions in processing times across various chart types. In industries where time is of the essence, this improvement can translate into more timely and informed decision-making.

The average time it took for the models to generate their responses decreased from 49.47 seconds to 34.86 seconds.
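To put the speed gain in relative terms, the two averages can be compared directly. This is plain arithmetic on the published averages, not a reanalysis of the underlying timing data; the `percent_reduction` helper is introduced here for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


may_avg_s = 49.47     # May model average response time (seconds)
august_avg_s = 34.86  # August model average response time (seconds)

saved = may_avg_s - august_avg_s
print(f"Time saved per response: {saved:.2f} s")  # prints 14.61 s
print(f"Reduction: {percent_reduction(may_avg_s, august_avg_s):.1f}%")  # prints 29.5%
```

In other words, an average August response took about 70% as long as a May response (34.86 / 49.47 ≈ 0.705).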

The August model displayed notable improvement over the May model in generating correlation charts and heatmaps, consistently delivering results across rounds with more accuracy and efficiency than its predecessor.

May (left) vs. August (right) Correlation Charts. The May model's chart did not pass the data validation check, while the August model's chart passed it.
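The exact validation procedure is documented in the full report rather than here, so the following is only a sketch of one way such a check could work for a correlation chart: recompute the Pearson coefficient directly from the source columns and compare it with the value the chart displays, within a tolerance. The function names, the sample numbers, and the 0.01 tolerance are all assumptions for illustration.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def validate_correlation(xs, ys, charted_value: float, tol: float = 0.01) -> bool:
    """True if the coefficient shown on the chart matches the recomputed one."""
    return abs(pearson(xs, ys) - charted_value) <= tol


# Hypothetical example: a generated chart labels this correlation as 0.42.
quantity = [120, 135, 150, 110, 160]
revenue = [240.0, 275.5, 301.0, 219.0, 322.5]
print(validate_correlation(quantity, revenue, charted_value=0.42))  # prints False
```

Recomputing from the raw columns, rather than trusting the model's own intermediate numbers, is what makes the check independent of the LLM.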

Where Challenges Remain

While the August model outperformed the May model in many cases, there are still hurdles to overcome. Lila and Michael identified persistent accuracy issues when generating pie charts, Venn diagrams, radar charts, and ridgeline plots; see the example below.

May (left) vs. August (right) Radar Charts. The May model failed to generate standard radar charts. The August model showed some improvement with the help of prompt engineering, but it still had significant problems with formatting, consistency, and color differentiation, which undermine the charts' reliability for accurate data interpretation.

The August model showed some progress in generating scatter plot charts, but challenges remain. The May model plotted only a subset of the requested data, while the August model included all of the requested data points. However, as the comparison below shows, both versions still rendered the data points as X's rather than the appropriate dot markers.

May (left) vs. August (right) Scatter Plot Charts. Neither model was able to correct its data point format.

Prompt Engineering

By refining specific elements of prompts, such as clarifying data points or specifying chart formats, our team resolved certain issues in the May model and partially resolved them in the August model. For example, when generating bubble charts, the initial prompts produced X-shaped points instead of circles. After the prompt was refined to explicitly request “circles instead of X's,” the May model returned a correctly formatted bubble chart, as shown below.

Initial Prompt: "Create a bubble chart based on the data I uploaded. Include top 5 articles, total sales, and quantity."

Refined Prompt: "Create a bubble chart based on the data I uploaded with circles instead of X's. Include top 5 articles, total sales, and quantity."

The May ChatGPT-4 Omni model’s response to the initial prompt (left) vs. the refined prompt (right).
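The refinement amounts to adding one explicit formatting constraint to an otherwise unchanged request. As a hypothetical illustration (the `build_chart_prompt` helper is not part of Data Chat or the report), both prompts can be generated from the same template, which makes it easy to vary one constraint at a time.

```python
from typing import Optional


def build_chart_prompt(chart_type: str, fields: str,
                       constraint: Optional[str] = None) -> str:
    """Compose a chart request; `constraint` is an optional formatting instruction."""
    request = f"Create a {chart_type} based on the data I uploaded"
    if constraint:
        request += f" {constraint}"
    return f"{request}. Include {fields}."


FIELDS = "top 5 articles, total sales, and quantity"

initial = build_chart_prompt("bubble chart", FIELDS)
refined = build_chart_prompt("bubble chart", FIELDS, "with circles instead of X's")
print(initial)
print(refined)
```

Holding everything but the constraint fixed makes it clear which change produced a different chart, which helps when the same adjustment behaves differently across model versions.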

Prompt engineering helped, but in some cases the August model still produced improperly formatted charts even with the same prompt adjustments that had worked for the May model. The results of this latest testing reinforce the need for ongoing prompt optimization and testing. Further examples of the prompt engineering used during this test are in the August 2024 Chart Testing Report; check it out to see what worked and what didn't.

The Impact on Private Equity

The ChatGPT-4 Omni August model’s improvements in chart accuracy and processing speed are particularly exciting for private equity (PE). PE professionals rely heavily on raw data for their investment decisions, and pairing AI-driven visualizations with that data adds a valuable layer of clarity. Used alongside the underlying data, visualizations make analysis easier and improve how complex financial information is communicated, without diminishing the importance of the data itself.

These advancements are also a big step forward in reducing the manual effort involved in producing data visualizations, freeing up analysts to spend more time on high-level insights. By letting AI handle some of the more time-consuming tasks, professionals can focus on the bigger picture. Ultimately, AI will add value by streamlining processes and enhancing communication, all while allowing PE firms to stay grounded in their data-driven decision-making.

Looking Ahead

At DiligentIQ, we are continuously striving to refine our product, ensuring that we not only meet but exceed the needs of our clients. The August model represents a significant step forward, but it also serves as a reminder that AI is a journey, not a destination. We are committed to ongoing research and development, and we will continue to share our progress as we work towards perfecting LLM-driven charting.

For those interested in diving deeper into the details, I encourage you to review the full August 2024 Chart Testing report by Lila Karl and Michael Van Demark. Their report provides a thorough examination of the models’ current capabilities, an overview of the prompt engineering used for this project, additional LLM chart outputs, and speed data.

As we continue to enhance our tools, we remain focused on delivering the best possible solutions for our clients. Stay tuned for more updates as we push the boundaries of what AI can achieve in the realm of private equity.