By Maya Boeye, AI Researcher/Analyst
In data analytics, the ability to create effective and insightful visualizations is crucial. At DiligentIQ, we leverage the Assistants API from OpenAI to enhance our Data Chat capabilities, making it easier to generate high-quality charts and visualizations from datasets in xlsx files. Join us in exploring how to harness Generative AI for data visualization.
In July 2024, we conducted a Chart Test using the DiligentIQ App with GPT-4o (GPT-4 Omni) in a Restricted Data Chat environment, working with this Kaggle xlsx dataset. The Data Chat feature on the DiligentIQ App lets users "talk" to the data in spreadsheets, similar to building formulas in Excel to get results. Restricted Chat is a feature that limits the scope of a conversation to specific categories, tags, or documents; for this test, we restricted the chat to the Bakery Sales File linked above. The Bakery Sales File contains two tabs, and the model had to interpret and integrate both to create the data visualizations you see in the report linked below.
Through this testing, we gained key insights into prompt engineering for chart creation and identified several chart types that consistently produced high-quality visualizations: pie chart, area chart, bar chart, stacked bar chart, box plot, correlation chart, dendrogram, heat map, histogram, line chart, ridgeline plot, and Venn diagram.
Key Takeaway ➡️ Prompt Engineering Matters
To get the best charting responses, we recommend being explicit in your prompts. When requesting a chart, include specific details about the data you want to visualize. For instance, if you ask, "Create a stacked bar chart showing monthly sales by article," you might get lucky and receive the exact chart you're looking for. But less descriptive prompts leave a lot of room for interpretation. The model will generate what it believes you want, and it may read your request in unexpected ways, such as putting products on the x-axis, sales dollars on the y-axis, and stacking the bars by month. Instead, use a prompt like:
"Create a stacked bar chart based on the data provided. Show monthly sales by article for the top 10 articles. Use month/year on the x-axis, quantity sold on the y-axis, and stack the bars by different articles."
This level of specificity minimizes ambiguity and makes it far more likely that the model creates the chart you want on the first attempt. It also keeps the model from making incorrect assumptions about your data, such as misinterpreting axis labels or including too many data points, which can clutter the chart.
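One way to make this habit repeatable is to assemble chart prompts from a small set of required fields, so that the axes, the stacking, and the top-N scope are never left implicit. The Python below is a minimal sketch of that idea, not how the DiligentIQ App builds prompts; the `ChartRequest` class, its field names, and `build_chart_prompt` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChartRequest:
    """Hypothetical container for the details an explicit chart prompt should state."""
    chart_type: str        # e.g. "stacked bar chart"
    measure: str           # what to show, e.g. "monthly sales by article"
    x_axis: str            # what goes on the x-axis
    y_axis: str            # what goes on the y-axis
    stack_by: str = ""     # optional: what the stacked segments represent
    top_n: int = 0         # optional: limit the chart to the top N items
    top_n_of: str = ""     # optional: what the top N refers to, e.g. "articles"


def build_chart_prompt(req: ChartRequest) -> str:
    """Assemble a plain-English prompt that spells out every chart detail."""
    sentences = [f"Create a {req.chart_type} based on the data provided."]

    # State the measure and, if requested, cap the number of items shown.
    scope = f"Show {req.measure}"
    if req.top_n > 0 and req.top_n_of:
        scope += f" for the top {req.top_n} {req.top_n_of}"
    sentences.append(scope + ".")

    # Pin down both axes so the model does not have to guess them.
    axes = f"Use {req.x_axis} on the x-axis and {req.y_axis} on the y-axis"
    if req.stack_by:
        axes += f", and stack the bars by {req.stack_by}"
    sentences.append(axes + ".")

    return " ".join(sentences)


# Example: rebuild a prompt close to the one recommended above.
request = ChartRequest(
    chart_type="stacked bar chart",
    measure="monthly sales by article",
    x_axis="month/year",
    y_axis="quantity sold",
    stack_by="article",
    top_n=10,
    top_n_of="articles",
)
prompt = build_chart_prompt(request)
print(prompt)
```

Because the chart type, measure, and both axes are required fields, a request cannot be built without them, which is the same discipline the explicit prompt applies by hand.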
Integrating Gen AI into data visualization offers real opportunities to deepen our insights. By leveraging the Assistants API from OpenAI, DiligentIQ has made it easier to generate precise and insightful charts. In today's data-driven world, the ability to visualize data quickly and accurately is crucial. Gen AI streamlines this process and broadens access to sophisticated analysis tools, enabling more stakeholders to make informed decisions based on clear visual data.
Download the full report here to review all of the successful prompts and their resulting data visualizations, along with the challenges related to suboptimal model responses. If you are interested in replicating or building upon this work, you can access the dataset we used through the Kaggle link provided at the top of this post. Your results will likely vary: we have our own engineering, guardrails, and system prompts, and we use a combination of LLMs (including vision models) from OpenAI, Anthropic, and Cohere. We also leverage Textract from AWS, which adds ML capabilities to how we "understand" documents. We hope this is helpful, and we welcome feedback. A special thank you to Matthieu Gimbert for the invaluable dataset that made this test possible. Happy charting!