Doing Data Better: Learnings from the ESG Use Case

Tags: Blog Post
Published: June 10, 2024

Introduction to ESG Reporting and the Challenge of Report Generation at Scale

Environmental, Social, and Governance (ESG) reporting has become increasingly important for companies to assess and communicate their impact on these critical factors. However, generating ESG reports is often a time-consuming and resource-intensive process. Many VCs produce personalised reports for every company in their portfolio every quarter, resulting in enormous administrative overhead, and ESG teams struggle to produce these reports at scale while ensuring accuracy and consistency. This specific use case came about from a request from an employee at a VC we work with, who wanted a way to deliver compliant reporting at scale without it taking up 30+ days every quarter.

Why AI for ESG Reporting?

AI offers a promising solution to the challenges of ESG reporting. By leveraging AI, companies can:

1. Create Personalised Narratives: AI-powered natural language processing can generate personalised narratives based on the specific data and context of each company, providing more meaningful and relevant insights. An AI can also perform detailed analysis that a human would not have the time to do.

2. End-to-End Automation: With AI, companies can automate the entire ESG reporting process, from data collection to report generation, saving significant time and resources.

However, AI systems are prone to hallucinations and inaccuracies, which make them unsuitable for ESG reporting on their own. If you simply handed ChatGPT your data and asked it to produce a report, you might get something that looks accurate, but you would be taking a massive risk with critical reporting, and it would be nearly impossible to track down the source of any errors. You can’t easily roll back or debug an LLM to see whether it made a mistake because it used out-of-date data, or because it misunderstood the relationship between different tables in your data.

The Importance of Data Supply Chains

While AI holds great potential for ESG reporting, it is crucial to ensure the accuracy and reliability of AI-generated reports. This is where data supply chains come into play. A robust data supply chain ensures:

1. Data Provenance: Companies can trust the reports and trace the origin of every figure, verifying that it is accurate and current.

2. Consistency: By presenting data to AI in a consistent manner, data supply chains enable reliable and comparable reports across different companies and time periods.

3. Supply Status: Data supply chains provide metadata, including provenance, so you can confirm that the data the AI is ingesting is up to date.

4. Attribution: Attribution lets you check the source of each piece of information in the report, that is, who entered it into the system and on what date, so you know exactly where the data you’re using comes from. By using a data supply chain with attribution, you can demonstrate compliance with the EU SFDR framework and plan for compliance with the EU AI Act.
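
For illustration, here is what a single supplied data point carrying this kind of metadata might look like. This is a minimal sketch; the field names are hypothetical, not the actual gather360 schema.

```python
# Hypothetical example of a data point with provenance and attribution
# metadata attached by the data supply chain (illustrative field names).
esg_data_point = {
    "metric": "scope_1_emissions_tco2e",
    "value": 128.4,
    "company_id": "PORTCO-017",
    "reporting_period": "2024-Q1",
    # Attribution: who entered the figure, and when
    "entered_by": "jane.doe@example.com",
    "entered_at": "2024-04-08T09:32:00Z",
    # Supply status / provenance: where it came from and whether it is current
    "source_system": "supplier_upload",
    "supply_status": "up_to_date",
}
```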

Combining Data Supply Chains and AI for Optimal Results

When data supply chains and AI are used together, they create a powerful solution for ESG reporting:

1. Ensuring Accuracy: Data supply chains ensure the accuracy and reliability of the data fed into the AI models, resulting in trustworthy and verifiable reports.

2. Maintaining Consistency: Even if the underlying data changes, the way it is presented to AI remains consistent, ensuring the comparability and integrity of the generated reports.

How We Built The ESG Reporting Solution

1.  Define the Business Problem

-   Consolidate data across a portfolio of companies into one data model, and use AI to generate detailed reports for the portfolio and for each company, while ensuring the AI's decision-making can be attributed.

2.  Define the Data Model Needed

-   For ESG, data requirements were obtained from our contact and defined as an information need in gather360.

-   These data requirements were published into a recipe in gather360.

-   Suppliers were set up, and the supply of data was orchestrated and managed via gather360.

3.  Setting Up the Streamlit App

-   The Streamlit app is initialised with a page title and layout.

-   Session state variables are defined to store conversation history, credentials, report content, and other relevant data.
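
A minimal sketch of that setup (the page title and session-state keys are illustrative):

```python
import streamlit as st

# Initialise the page before any other Streamlit call
st.set_page_config(page_title="ESG Reporting Assistant", layout="wide")

# Session state survives reruns, so conversation history, credentials,
# and generated reports persist while the user interacts with the app
defaults = {
    "messages": [],        # conversation history with the assistant
    "creds": {},           # gather360 / Snowflake credentials
    "openai_api_key": "",  # OpenAI API key
    "report": None,        # most recently generated report (Markdown)
    "thread_id": None,     # OpenAI thread ID for follow-up questions
    "assistant_id": None,  # OpenAI assistant ID for follow-up questions
}
for key, value in defaults.items():
    if key not in st.session_state:
        st.session_state[key] = value
```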

4.  Configuring Credentials

-   The Streamlit sidebar is used to input gather360 credentials (user, password, account, warehouse, database, schema) and the OpenAI API key.

-   The `update_creds()` function is defined to update the session state with the entered credentials.
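
A sketch of the credential wiring; the widget keys are illustrative:

```python
import streamlit as st

def update_creds() -> None:
    """Copy the sidebar inputs into session state (sketch of our helper)."""
    st.session_state.creds = {
        "user": st.session_state.sf_user,
        "password": st.session_state.sf_password,
        "account": st.session_state.sf_account,
        "warehouse": st.session_state.sf_warehouse,
        "database": st.session_state.sf_database,
        "schema": st.session_state.sf_schema,
    }

with st.sidebar:
    st.text_input("User", key="sf_user", on_change=update_creds)
    st.text_input("Password", type="password", key="sf_password", on_change=update_creds)
    st.text_input("Account", key="sf_account", on_change=update_creds)
    st.text_input("Warehouse", key="sf_warehouse", on_change=update_creds)
    st.text_input("Database", key="sf_database", on_change=update_creds)
    st.text_input("Schema", key="sf_schema", on_change=update_creds)
    st.text_input("OpenAI API key", type="password", key="openai_api_key")
```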

5.  Fetching Data from Snowflake

-   The `fetch_data_from_snowflake()` function establishes a connection to Snowflake using the provided credentials.

-   It executes a SQL query to fetch survey data from the specified table.

-   The fetched data is converted into a Pandas DataFrame.
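
A sketch of that function, using the Snowflake Python connector; the table name is a placeholder for the table the gather360 recipe populates:

```python
import pandas as pd
import snowflake.connector

def fetch_data_from_snowflake(creds: dict, table: str = "ESG_SURVEY_RESPONSES") -> pd.DataFrame:
    """Connect with the sidebar credentials and pull the survey data."""
    conn = snowflake.connector.connect(
        user=creds["user"],
        password=creds["password"],
        account=creds["account"],
        warehouse=creds["warehouse"],
        database=creds["database"],
        schema=creds["schema"],
    )
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT * FROM {table}")
        # fetch_pandas_all() turns the result set straight into a DataFrame
        return cur.fetch_pandas_all()
    finally:
        conn.close()
```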

6.  Processing Survey Data

-   The `process_survey_data()` function takes the survey data and a question-answer mapping as input.

-   It filters the data based on each question and counts the responses for each allowed answer.

-   The function calculates the percentage of each answer and returns the processed results.
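
A sketch, assuming illustrative column names (`question_id`, `answer`):

```python
import pandas as pd

def process_survey_data(df: pd.DataFrame, qa_mapping: dict) -> dict:
    """Count and percentage each allowed answer for every question.

    `qa_mapping` maps a question ID to its list of allowed answers.
    """
    results = {}
    for question, allowed_answers in qa_mapping.items():
        answers = df.loc[df["question_id"] == question, "answer"]
        counts = answers.value_counts()
        total = int(counts.sum()) or 1  # guard against division by zero
        results[question] = {
            answer: {
                "count": int(counts.get(answer, 0)),
                "percentage": round(100 * counts.get(answer, 0) / total, 1),
            }
            for answer in allowed_answers
        }
    return results
```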

7.  Comparing Company to Cohort

-   The `compare_company_to_cohort()` function separates the company data from the cohort data based on the company ID.

-   It applies the `process_survey_data()` function to both the company and cohort data.

-   The function returns a comparison of the company results against the cohort results.
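
A sketch, reusing `process_survey_data()` from above and assuming a `company_id` column:

```python
import pandas as pd

def compare_company_to_cohort(df: pd.DataFrame, company_id: str, qa_mapping: dict) -> dict:
    """Split one company out of the cohort and process both sides."""
    company_df = df[df["company_id"] == company_id]
    cohort_df = df[df["company_id"] != company_id]
    return {
        "company": process_survey_data(company_df, qa_mapping),
        "cohort": process_survey_data(cohort_df, qa_mapping),
    }
```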

8.  Generating Company Report

-   The `generate_company_report()` function is triggered when the user selects a company and clicks the "Generate Report for Selected Company" button.

-   It fetches the survey data from Snowflake and processes it using the `compare_company_to_cohort()` function.

-   The function prepares a detailed report prompt, including the comparison results, calculation details, and report instructions.

-   It creates an OpenAI assistant and thread, and sends the report prompt to the assistant for generating the report.

-   The generated report is displayed in the Streamlit app and stored in the session state.
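
A condensed sketch of that flow, using the OpenAI Assistants API (beta at the time of writing); the model name and prompt wording are illustrative:

```python
import time
from openai import OpenAI

def run_assistant(client: OpenAI, prompt: str) -> str:
    """Create an assistant and thread, run the prompt, and return the reply."""
    assistant = client.beta.assistants.create(
        name="ESG Report Writer",
        instructions="Write factual ESG reports using ONLY the figures supplied.",
        model="gpt-4-turbo",  # assumption: any capable chat model works here
    )
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(thread_id=thread.id, role="user", content=prompt)
    run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
    # Poll until the run reaches a terminal state
    while run.status not in ("completed", "failed", "cancelled", "expired"):
        time.sleep(1)
        run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    return messages.data[0].content[0].text.value  # newest message first

def generate_company_report(api_key: str, comparison: dict) -> str:
    """Turn the company-vs-cohort comparison into a narrative report."""
    client = OpenAI(api_key=api_key)
    prompt = (
        "Write an ESG report for the company below. Use ONLY the figures "
        "provided; do not estimate or invent numbers.\n\n"
        f"Company vs cohort comparison:\n{comparison}"
    )
    return run_assistant(client, prompt)
```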

9.  Generating Summary Report

-   The `generate_report()` function is triggered when the user clicks the "Generate Summary Report for All Companies" button.

-   It follows a similar process to the company report generation but processes the survey data for all companies.

-   The function prepares a summary report prompt and sends it to the OpenAI assistant for generating the report.

-   The generated summary report is displayed in the Streamlit app and stored in the session state.
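
The summary path differs mainly in that it processes the whole portfolio rather than a single company. A sketch, reusing the helpers from the previous steps:

```python
import pandas as pd
from openai import OpenAI

def generate_report(api_key: str, df: pd.DataFrame, qa_mapping: dict) -> str:
    """Portfolio-wide summary; reuses run_assistant() and process_survey_data()."""
    client = OpenAI(api_key=api_key)
    portfolio_results = process_survey_data(df, qa_mapping)  # all companies, no split
    prompt = (
        "Write a portfolio-level ESG summary report. Use ONLY the figures "
        f"provided; do not invent numbers.\n\nPortfolio results:\n{portfolio_results}"
    )
    return run_assistant(client, prompt)
```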

10.  Downloading the Report

-   If a report is generated, a "Download Report as Markdown" button is displayed in the Streamlit sidebar.

-   Clicking the button allows the user to download the generated report in Markdown format.
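
This maps directly onto Streamlit's built-in download button:

```python
import streamlit as st

# Only offer the download once a report exists in session state
if st.session_state.report:
    st.sidebar.download_button(
        label="Download Report as Markdown",
        data=st.session_state.report,
        file_name="esg_report.md",
        mime="text/markdown",
    )
```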

11.  Asking Follow-up Questions

-   If a report is generated, the user can ask follow-up questions related to the report.

-   The user's question is sent to the OpenAI assistant, and the generated response is displayed in the Streamlit app.
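
A sketch of the follow-up loop, assuming the thread and assistant IDs were stored in session state when the report was generated:

```python
import time
import streamlit as st
from openai import OpenAI

if st.session_state.report:
    question = st.text_input("Ask a follow-up question about the report")
    if question:
        client = OpenAI(api_key=st.session_state.openai_api_key)
        # Posting to the same thread keeps the report in the assistant's context
        client.beta.threads.messages.create(
            thread_id=st.session_state.thread_id, role="user", content=question
        )
        run = client.beta.threads.runs.create(
            thread_id=st.session_state.thread_id,
            assistant_id=st.session_state.assistant_id,
        )
        while run.status not in ("completed", "failed", "cancelled", "expired"):
            time.sleep(1)
            run = client.beta.threads.runs.retrieve(
                thread_id=st.session_state.thread_id, run_id=run.id
            )
        messages = client.beta.threads.messages.list(thread_id=st.session_state.thread_id)
        st.markdown(messages.data[0].content[0].text.value)
```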

Key Learnings and Insights

1.  Maturity of AI Frameworks: We realised that current AI frameworks are still insufficiently mature to be relied upon completely. Reliability is paramount, and our solution is only possible when we can be confident in the accuracy and attestability of the results. This highlighted the importance of combining AI with robust validation mechanisms to ensure output quality.

2.  Judicious Use of Language Models: We learned to use large language models (LLMs) judiciously, leveraging them for their strengths in generating natural language narratives while avoiding over-reliance on them for data processing tasks. For instance, while LLMs excel at creating comprehensive reports, they can sometimes misunderstand technical details or API documentation, necessitating human oversight and specialised algorithms for data handling.

3.  Importance of Data Quality: The success of our solution heavily depends on the quality and consistency of the data fed into the AI models. By ensuring a reliable data supply chain, we can trust the outputs generated by the AI and provide accurate and verifiable reports to our clients. This emphasises the need for rigorous data provenance and consistency protocols to maintain data integrity.

4.  User Interface Design: An intuitive and user-friendly interface is critical for the adoption of AI solutions. Streamlit proved to be an excellent choice for creating interactive and accessible user interfaces, enabling users to input data easily and generate reports efficiently. This underscored the importance of considering the end-user experience in AI application development.

Try It Out for Yourself

We invite you to experience the power of our AI-powered ESG reporting solution firsthand. By combining the strengths of data supply chains and AI, we have created a tool that streamlines the ESG reporting process, enabling companies to generate accurate, personalised, and scalable reports efficiently.

Visit our web app to see the solution in action and explore how it can transform your company's ESG reporting capabilities.

Conclusion

Our journey in building an AI-powered ESG reporting solution has been a fantastic learning experience in building reliable AI systems.

We’ve always believed that data is the critical component to AI project success, and now we’re starting to build up the evidence to prove that. Our ESG solution only works because it’s built on top of data supply chains; without them, this solution simply does not work.

This approach of combining data supply chains and AI has the potential to revolutionise not only ESG reporting but also all domains where accurate, consistent, and scalable data-driven reporting is essential.

We’re on a mission to solve the AI data-readiness problem within organisations. This is only the beginning. We want to work with other people who are passionate about this problem to help figure out the optimal solution so that we can democratise the benefits of AI. You can join our community or reach out to me via Twitter or LinkedIn to share your ideas on how we can do data better.