Data Supply Chain for AI Framework V1.1

Tags: Blog Post
Published: June 12, 2024

Intro

We've just published V1 of our AI-driven Data Ops Framework, built on our experience in AI and data operations.

Our goal is to develop the framework collaboratively with the community, creating a set of principles that directly address the challenges of integrating AI with Data Ops. Adhering to these principles will help you build solutions more effectively and avoid common pitfalls.

V1 is directly derived from our client projects and our experiences, both positive and negative, in getting AI to work with their data. We welcome your feedback so we can develop the principles further as a community.

Principles

  1. Accelerate Data Analysis with AI to speed up delivery.
    • Leverage AI to accelerate the design, build, and execution of data analysis, reporting and insight processes.
    • Utilise AI-generated code to streamline development.
  2. Standardise AI Deployment and Integration to ensure compatibility and interoperability between AI solutions.
    • Establish standardised processes for AI solutions to access and interact with data.
    • Define clear expectations and standards for AI-data interfaces to ensure consistency and reduce learning curves.
    • Standardise data formats and schemas (see the interface sketch after this list).
  3. Separate Data Operations from AI to optimise efficiency and reliability in AI deployment.
    • Separate data management from AI models to reduce complexity and ensure reusability.
    • Implement a data supply chain protocol approach to data operations so that the information an AI needs to operate is always available (see the supplier sketch after this list).
  4. Provide Rich and Relevant Data Context to maximise the accuracy and value of AI analysis.
    • Ensure AI is provided with well-formed, contextually rich, and historically maintained data.
    • Combine data model context and historical data.
    • Constrain AI to operate only on relevant and available data to ensure explainability and attribution (see the context sketch after this list).
  5. Monitor Data Supply Status to ensure that the analysis is up-to-date and accurate.
    • Provide AI with real-time information about data availability and freshness.
    • Establish mechanisms for AI to consume and utilise supply status information effectively.
    • Define best practices for monitoring and maintaining a continuous data supply (see the freshness sketch after this list).
  6. Ensure Responsible and Trustworthy AI by continuously assessing performance, accuracy, and fairness.
    • Establish clear boundaries and ethical guidelines for AI operation.
    • Prioritise the development of explainable and interpretable AI models.
    • Implement mechanisms for continuous monitoring and feedback.
  7. Don’t Use AI Models to Run Code to ensure reliability and performance.
    • Utilise non-AI infrastructure optimised for executing analysis to ensure performance and cost control.
    • Provide best practices for executing code efficiently and managing infrastructure resources.
    • Ensure the underlying infrastructure is scalable and elastic to handle varying workloads and data volumes (see the execution sketch after this list).
  8. Ensure Data Quality and Reliability to keep AI solutions accurate, maintainable, and cost-effective.
    • Implement data validation and error-handling mechanisms.
    • Establish data quality standards and best practices for data preprocessing and cleansing.
    • Continuously monitor and assess the quality of data used for AI model training and inference.
    • Enable collaboration with key systems, stakeholders, and information owners to accelerate data quality remediation.
    • Capture and log all key decisions (see the validation sketch after this list).
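
Sketches

The sketches below illustrate several of the principles in runnable Python. They are minimal examples built on simplifying assumptions, not reference implementations; the class, function, and dataset names are ours for illustration and not part of the framework.

For principle 2, the interface sketch shows a single typed contract through which every AI solution requests and receives data, so interfaces stay consistent and learning curves stay short. DataRequest, DataResponse, and DataCatalogue are hypothetical names, and the in-memory catalogue stands in for a real data service.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataRequest:
    """Contract every AI solution uses to ask for data."""
    dataset: str               # logical dataset name, not a physical path
    columns: tuple[str, ...]   # explicit column list, never "select *"
    as_of: str                 # ISO-8601 snapshot date for reproducibility

@dataclass
class DataResponse:
    """Contract every data service uses to answer."""
    rows: list[dict]
    schema: dict[str, str]     # column name -> observed type
    snapshot: str              # the snapshot actually served

class DataCatalogue:
    """Single, standardised entry point between AI solutions and data."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict]] = {}

    def register(self, name: str, rows: list[dict]) -> None:
        self._tables[name] = rows

    def fetch(self, request: DataRequest) -> DataResponse:
        rows = self._tables[request.dataset]
        projected = [{c: r[c] for c in request.columns} for r in rows]
        schema = ({c: type(projected[0][c]).__name__ for c in request.columns}
                  if projected else {})
        return DataResponse(rows=projected, schema=schema, snapshot=request.as_of)

catalogue = DataCatalogue()
catalogue.register("sales", [{"region": "EMEA", "revenue": 120, "cost": 80}])
response = catalogue.fetch(DataRequest("sales", ("region", "revenue"), "2024-06-01"))
print(response.schema)  # {'region': 'str', 'revenue': 'int'}
```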
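
For principle 3, the supplier sketch separates data operations from the AI layer behind a small protocol: the AI consumes data through DataSupplier and never manages connections, retries, or lineage itself. The names and the stubbed warehouse query are illustrative.

```python
from typing import Protocol

class DataSupplier(Protocol):
    """The only surface the AI layer may use to obtain data."""
    def supply(self, dataset: str) -> list[dict]: ...

class WarehouseSupplier:
    """Data-ops side: owns connections, retries, caching, and lineage."""
    def supply(self, dataset: str) -> list[dict]:
        # A real implementation would query the warehouse; stubbed here.
        return [{"region": "EMEA", "revenue": 120}, {"region": "APAC", "revenue": 95}]

def run_analysis(supplier: DataSupplier, dataset: str) -> float:
    """AI side: consumes data through the protocol, never fetches it itself."""
    rows = supplier.supply(dataset)
    return sum(r["revenue"] for r in rows)

print(run_analysis(WarehouseSupplier(), "sales"))  # 215
```

Because the AI side depends only on the protocol, the same analysis runs unchanged against any supplier, which is what makes the data operations reusable.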
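
For principle 4, the context sketch assembles well-formed context by combining data-model metadata with historical data, scoped to the columns the AI is allowed to see. Passing context as structured JSON in a prompt is one approach among several; the field names are illustrative.

```python
import json

def build_context(schema: dict[str, str], description: str,
                  history: list[dict], allowed_columns: set[str]) -> str:
    """Combine data-model context with historical data, scoped to what exists."""
    return json.dumps({
        "description": description,
        "schema": {k: v for k, v in schema.items() if k in allowed_columns},
        "history": [{k: v for k, v in row.items() if k in allowed_columns}
                    for row in history],
        "rules": "Answer only from the schema and history above; "
                 "say 'not available' for anything else.",
    }, indent=2)

context = build_context(
    schema={"month": "date", "orders": "int", "refunds": "int"},
    description="Monthly order volumes for the EU storefront.",
    history=[{"month": "2024-04", "orders": 1800, "refunds": 42}],
    allowed_columns={"month", "orders"},
)
print(context)  # refunds is excluded from both schema and history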
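
For principle 5, the freshness sketch gives the AI machine-readable supply status, assuming each dataset records its last successful load. The SUPPLY_STATUS structure and the 24-hour threshold are illustrative; in practice the pipeline scheduler would feed this information.

```python
from datetime import datetime, timedelta, timezone

SUPPLY_STATUS = {  # in practice, maintained by the pipeline scheduler
    "sales": {"last_loaded": datetime.now(timezone.utc) - timedelta(hours=2)},
    "inventory": {"last_loaded": datetime.now(timezone.utc) - timedelta(days=3)},
}

def freshness(dataset: str, max_age: timedelta) -> dict:
    """Return a machine-readable status the AI can include in its answer."""
    age = datetime.now(timezone.utc) - SUPPLY_STATUS[dataset]["last_loaded"]
    return {"dataset": dataset,
            "age_hours": round(age.total_seconds() / 3600, 1),
            "fresh": age <= max_age}

for name in ("sales", "inventory"):
    print(freshness(name, max_age=timedelta(hours=24)))
# The AI can then caveat or decline analysis on stale datasets.
```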
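
For principle 7, the execution sketch shows the division of labour: the AI produces code as text, and separate non-AI infrastructure runs it deterministically and cheaply. A bare subprocess with a timeout stands in for what production would require, namely a proper sandbox with resource limits and review.

```python
import subprocess
import sys

generated_code = """
rows = [120, 95, 310]
print(sum(rows) / len(rows))
"""

def execute(code: str, timeout_s: int = 5) -> str:
    """Run generated code in a separate interpreter process with a timeout.

    Raises subprocess.TimeoutExpired if the code overruns its budget.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.strip()

print(execute(generated_code))  # 175.0 — deterministic, cheap, repeatable
```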
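
For principle 8, the validation sketch combines row-level quality rules, error handling, and logging of key decisions, assuming a simple rule table; the rules and logger configuration are illustrative.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("data-quality")

RULES = {
    "revenue": lambda v: isinstance(v, (int, float)) and v >= 0,
    "region": lambda v: isinstance(v, str) and v != "",
}

def validate(rows: list[dict]) -> list[dict]:
    """Keep rows that pass every rule; log each rejection as a key decision."""
    clean = []
    for i, row in enumerate(rows):
        failures = [c for c, ok in RULES.items() if not ok(row.get(c))]
        if failures:
            log.info("rejected row %d: failed %s", i, failures)  # decision trail
        else:
            clean.append(row)
    return clean

rows = [{"region": "EMEA", "revenue": 120}, {"region": "", "revenue": -5}]
print(validate(rows))  # only the first row survives
```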