
EU AI Act Article 10: Data and data governance

Article 10 of the EU AI Act requires high-risk AI providers to govern training, validation, and testing data — including quality criteria, examination for bias and protected attributes, and representativeness for the intended purpose. Datasets must be appropriate to the geographical, contextual, behavioural or functional setting in which the system is intended to be used (Art. 10(4)). Enforceable from 2 August 2026.

Source: Regulation (EU) 2024/1689 (EU AI Act), CELEX:32024R1689.

In development — License Compliance Checker partially addresses Article 10 — it surfaces training-data licence and provenance risk via the dataset risk registry (top-50 known datasets flagged with critical / high / medium tiers). Full Article 10 coverage — dataset lineage across runs, bias examination, statistical-property characterisation — is the scope of TraceForge, which is in development. LCC today is the first half of the toolchain.

What Article 53 actually says

1. Providers of general-purpose AI models shall:
  (a) draw up and keep up-to-date the technical documentation of the model, including its training and testing process and the results of its evaluation, which shall contain, at a minimum, the information set out in Annex XI for the purpose of providing it, upon request, to the AI Office and the national competent authorities;
  (c) put in place a policy to comply with Union law on copyright and related rights, and in particular to identify and comply with, including through state-of-the-art technologies, a reservation of rights expressed pursuant to Article 4(3) of Directive (EU) 2019/790;
  (d) draw up and make publicly available a sufficiently detailed summary about the content used for training of the general-purpose AI model, according to a template provided by the AI Office.

Paragraphs: 53(1)(a) · 53(1)(c) · 53(1)(d)

Application date

2025-08-02

Status: ENFORCED

Penalty band

Up to €15M or 3% of global annual turnover

Sanction route: Article 101(1)

Article 53 has been enforceable since 2 August 2025 — GPAI providers placing models on the EU market today owe technical documentation, a copyright-compliance policy aligned with Directive (EU) 2019/790 Article 4(3), and a public training-data summary on the AI Office template. Non-compliance is sanctionable by the Commission under Article 101 at up to €15M or 3% of global annual turnover. The operational reality is that the copyright-policy and training-data-summary obligations require auditable evidence at the dataset level, not assertions in a model card.

Practical compliance with License Compliance Checker

License Compliance Checker maps to each enforceable Article 53(1) obligation as follows:

  • 53(1)(a): Generates per-model SBOM-style training-data manifests fit for inclusion in the Annex XI technical documentation pack
  • 53(1)(c): Detects rights-reservation signals (TDM opt-outs, robots.txt, ai.txt) across training corpora to evidence the Art. 4(3) Directive 2019/790 policy
  • 53(1)(c): Flags incompatible-licence content (NC, ND, viral copyleft) before it enters a training run, with provenance trail
  • 53(1)(d): Produces the public training-content summary in the AI Office template format, regenerated on each dataset revision
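The rights-reservation detection described above can be sketched as follows. This is a minimal illustration, not LCC's actual implementation: the crawler list and the treatment of a full `Disallow: /` as a TDM opt-out signal are both assumptions.

```python
# User agents commonly associated with AI training crawlers.
# Illustrative, not exhaustive; a real Art. 4(3) check would maintain
# a curated, versioned list.
AI_CRAWLER_AGENTS = {"gptbot", "ccbot", "google-extended", "anthropic-ai"}

def tdm_opt_out_signals(robots_txt: str) -> set:
    """Return the AI-crawler user-agents that a robots.txt fully disallows.

    A `Disallow: /` inside an AI-crawler user-agent group is treated here
    as a rights-reservation signal for text-and-data-mining purposes.
    """
    signals = set()
    agents = []        # user-agents named in the current group header
    in_rules = False   # whether we have started reading the group's rules
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:                      # a new group begins
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("disallow", "allow"):
            in_rules = True
            if field == "disallow" and value == "/":
                signals.update(a for a in agents if a in AI_CRAWLER_AGENTS)
    return signals
```

The same loop extends naturally to ai.txt files, whose syntax mirrors robots.txt.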

Install in 30 seconds

```shell
pip install license-compliance-checker
```

Frequently asked questions

Direct answers to common questions about Article 10 and how License Compliance Checker addresses it. Regulatory citations reference EUR-Lex CELEX:32024R1689.

What does EU AI Act Article 53 require?
Providers of general-purpose AI models must keep up-to-date Annex XI technical documentation, put a copyright-compliance policy in place aligned with Directive (EU) 2019/790 Article 4(3), and publish a sufficiently detailed training-data summary using the AI Office template. Source: Regulation (EU) 2024/1689 Article 53(1).
Is LCC a substitute for legal review?
No. LCC produces audit evidence — SBOMs, license-conflict reports, training-data risk registries — that legal counsel reviews. The tool does not provide legal opinions or substitute for qualified counsel.
What ecosystems and file formats does LCC scan?
Eight package ecosystems (Python, Node.js, Go, Rust, Ruby, Java, .NET, HuggingFace) plus AI model files in GGUF and ONNX formats — covering Ollama and llama.cpp deployments. The full feature list is in the documentation.
Does LCC detect AI model licenses, not just code dependencies?
Yes. LCC includes an AI license registry covering RAIL, OpenRAIL, Llama, Gemma, Mistral, BigScience and other AI-specific licenses. It also detects HuggingFace model references in Python, YAML, and JSON code (e.g. `from_pretrained`, `model=`).
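That kind of reference detection can be sketched with two regular expressions. This is a minimal illustration under assumed patterns; LCC's real detection rules are richer than this.

```python
import re

# Illustrative patterns for spotting Hugging Face "org/name" model
# references in source text; these exact patterns are assumptions.
HF_REF_PATTERNS = [
    # AutoModel.from_pretrained("org/model-name")
    re.compile(r"""from_pretrained\(\s*["']([\w.-]+/[\w.-]+)["']"""),
    # model="org/name" (keyword argument) or "model": "org/name" (JSON/YAML)
    re.compile(r"""model["']?\s*[=:]\s*["']([\w.-]+/[\w.-]+)["']"""),
]

def find_model_refs(source: str) -> set:
    """Return the org/name model identifiers referenced in a source string."""
    refs = set()
    for pattern in HF_REF_PATTERNS:
        refs.update(pattern.findall(source))
    return refs
```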
Can LCC generate the AI Office training-data summary template?
LCC produces inputs for that summary — a per-model training-data manifest with provenance and licensing. The final AI Office template completion is a documentation task; LCC supplies the structured data needed to fill it. Treat the output as evidence, not as the certified summary itself.
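For illustration only, one plausible shape for a single manifest entry of the kind described above. Every field name here is an assumption, not LCC's published schema.

```python
import json

# Hypothetical training-data manifest entry: provenance, licensing,
# and risk tier for one dataset, ready to feed a documentation template.
manifest_entry = {
    "dataset": "example-corpus-v2",           # hypothetical dataset name
    "source_url": "https://example.org/corpus",
    "license": "CC-BY-4.0",
    "risk_tier": "medium",                    # critical / high / medium
    "rights_reservation_detected": False,     # TDM opt-out signal seen?
    "content_hash": "sha256:...",             # integrity anchor for audit
    "collected_at": "2025-01-15",
}

print(json.dumps(manifest_entry, indent=2))
```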
What is the penalty for non-compliance with Article 53?
Up to €15M or 3% of global annual turnover, whichever is higher, imposed by the European Commission under Article 101(1). Note that GPAI fines are Commission-imposed under Art. 101 — distinct from the Article 99 fines that member-state market-surveillance authorities impose for high-risk-system violations.
Is LCC really free? What is the catch?
LCC is Apache 2.0 licensed, free for any use including commercial. There is no telemetry, no remote calls, no enterprise tier locked behind paywalls. The "catch" is that you run it on your own infrastructure and review the output yourself.
Does LCC scan transitive dependencies?
Yes, when a lock file is present (poetry.lock, package-lock.json, etc.). Without a lock file, LCC scans declared direct dependencies only and warns about the transitive gap.
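The lock-file path can be sketched for npm's package-lock.json (lockfileVersion 2/3), whose `packages` map already flattens the transitive tree. A minimal illustration, not LCC's implementation:

```python
import json

def packages_from_lockfile(lock_json: str) -> dict:
    """Map package name -> version for every entry in a package-lock.json
    'packages' map (lockfileVersion 2/3), transitive deps included.

    Sketch only: real lock files also carry integrity hashes and
    sometimes a per-entry 'license' field.
    """
    lock = json.loads(lock_json)
    found = {}
    for path, meta in lock.get("packages", {}).items():
        if not path:   # the "" key is the root project itself
            continue
        # Keys look like "node_modules/foo" or
        # "node_modules/foo/node_modules/bar" for nested installs.
        name = path.split("node_modules/")[-1]
        found[name] = meta.get("version")
    return found
```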

When the findings land on a governance desk

Tools surface problems. Programmes solve them.

Training-data accountability traces upstream through providers and downstream through deployers. The Liability Ledger framework maps that supply chain end-to-end and will pair with the full LCC + TraceForge toolchain once both tools ship.

Framework: Liability Ledger — at AskAjay.ai, the advisory arm of AI Exponent LLC.

Explore the Liability Ledger framework →