AI tools in practical testing – what is really permitted in terms of use and training?
What if the AI application you are building today poses risks tomorrow? Personal data may suddenly appear in the model, or the origin of training data may be unclear. As a product owner, data scientist or project manager, your daily decisions can have legal consequences. With clarity from the start, you ensure efficient processes and avoid pitfalls such as data protection breaches or copyright issues. Discover how to make your AI projects legally compliant and which external tools are legally recommended.
Content
- Personal data and AI: what can and cannot be done? The GDPR traffic light system for companies
- AI input and output: Copyright dos and don'ts
- Practical tips for the legal use of AI
- Key practical tips
- AI in practice: act in an informed manner and stay on the right side of the law
- Our AI services at a glance
Personal data and AI: what can and cannot be done? The GDPR traffic light system for companies
AI systems often work with large amounts of data, much of which is personal. Therefore, when developing and using AI systems, the provisions of the General Data Protection Regulation (GDPR) must always be observed. The AI Act supplements the GDPR by imposing specific requirements on AI systems, but does not replace it. Both regulations apply in parallel and must be considered together when AI systems process personal data within the meaning of Art. 4 No. 1 GDPR. A key question, therefore, is which personal data is processed by the AI system. The use of exclusively anonymous data is not subject to the GDPR. Therefore, it should be examined whether personal data can be avoided or anonymised.
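Whether personal data can be avoided or reduced before it ever reaches an AI pipeline is ultimately a technical design question. The following Python sketch is a minimal illustration of the difference between pseudonymising direct identifiers and removing them; the field names, the keyed-hash approach and the identifier list are assumptions made for the example, and neither step by itself guarantees anonymity in the legal sense.

```python
import hashlib

# Illustrative customer record; the field names are assumptions for this sketch.
record = {
    "customer_id": "C-10293",
    "name": "Jane Doe",
    "email": "jane.doe@example.com",
    "purchase_total": 249.90,
    "postcode": "20095",
}

# Fields treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"customer_id", "name", "email"}


def pseudonymise(rec: dict, secret: str) -> dict:
    """Replace direct identifiers with a keyed hash. Pseudonymised data remains
    personal data under the GDPR, because the mapping can be reversed by
    whoever holds the secret or a lookup table."""
    out = dict(rec)
    for field in DIRECT_IDENTIFIERS:
        digest = hashlib.sha256((secret + str(rec[field])).encode()).hexdigest()
        out[field] = digest[:16]
    return out


def strip_identifiers(rec: dict) -> dict:
    """Drop direct identifiers entirely. Whether the result is truly anonymous
    still depends on the re-identification risk of the remaining fields
    (e.g. rare postcodes) and must be assessed case by case."""
    return {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}


print(pseudonymise(record, secret="rotate-me-regularly"))
print(strip_identifiers(record))
```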
Finding the right legal basis
Processing personal data always requires a legal basis. The AI Act itself does not provide one. The following legal bases may be considered for the use of AI:
- Consent (Art. 6(1)(a) GDPR): This is possible in principle, but difficult in practice, as consent must be informed, can be revoked at any time, and is subject to transparency requirements that are hard to fulfil in complex AI systems. It is often not technically feasible to delete data from trained models in the event of revocation.
- Legitimate interest (Art. 6(1)(f) GDPR): This is often a relevant basis for using AI, for example to increase efficiency, improve quality or prevent fraud. It requires a careful balancing of the company's interests against the fundamental rights of the data subjects, and this should be documented (e.g. as part of a legitimate interest assessment).
- Special categories of personal data (Art. 9 GDPR): Processing sensitive data (e.g. health data, biometric data, data on applicants or employees) requires special legitimation. Legitimate interest pursuant to Art. 6(1)(f) GDPR is not sufficient for this.
Managing external service providers
When using external AI tools (e.g. via the cloud or an API), processing on behalf of a controller pursuant to Art. 28 GDPR often applies. This requires an effective data processing agreement (DPA) to be concluded. The DPA ensures that the provider only processes data on behalf of the client, and not for its own purposes, in particular not for training its underlying AI models. Many AI-as-a-Service (AIaaS) platforms (such as ChatGPT or DeepL) offer paid versions with a DPA, whereas the free versions may use the input for training purposes. The enterprise versions 'ChatGPT Teams/Enterprise' and 'Microsoft Azure OpenAI Services' generally offer DPAs, although caution should be exercised with regard to the latter's preview features. DeepL Pro is considered GDPR-compliant because it does not use data for training purposes and its servers are located in the EU. Meta AI is not recommended for business purposes under data protection law due to the lack of a DPA and its unclear data processing.
Legal certainty for data transfers to third countries
Another critical issue is the transfer of data to third countries when processing takes place outside the EU/EEA, as is often the case with US-based AI providers. According to Article 46 of the GDPR, appropriate safeguards must be in place, such as standard contractual clauses (SCCs). However, simply concluding SCCs is not enough; an adequate level of data protection must also be ensured, and a transfer impact assessment (TIA) must be carried out. Additionally, the access options of foreign authorities must be evaluated in light of the relevant legal framework (e.g. US legislation such as FISA 702 and Executive Order 12333). While the EU-US Data Privacy Framework can provide a basis for adequacy, it relies on self-certification by US companies.
Personal data in the model: an underestimated risk
One challenge for providers of AI systems is establishing whether personal data is stored in the trained AI model. The Hamburg data protection authority, for example, argues that the data used is implicitly anonymised through generalisation. However, other authorities disagree, pointing to model inversion attacks, which can allow training data to be reconstructed. The European Data Protection Board (EDPB) therefore calls for a risk-based assessment that considers both the technical probability of data extraction and the potential impact on individuals.
Assessing risks
A data protection impact assessment (DPIA) pursuant to Art. 35 GDPR is generally mandatory when using new technologies such as generative AI, especially where processing is large-scale or data sets are combined. The DPIA methodology can largely be transferred to the risk analysis under Article 9 of the AI Act, as the DPIA explicitly serves to protect the rights and freedoms of natural persons.
Conclusion/note
The use of AI tools in business requires careful examination under data protection law. The 'GDPR traffic light' is green if there is a clear legal basis; an effective DPA has been concluded where relevant; third-country transfers are secured (SCCs, TIA); the GDPR's principles (data minimisation, transparency) are observed; the risks of data storage in the model have been assessed; and a DPIA has been carried out where required.
AI input and output: Copyright dos and don'ts
Copyright plays an important role in both the input and output of AI systems.
What data is used to train AI?
AI models are often trained using external content based on data obtained through text and data mining (TDM). Although Section 44b of the German Copyright Act (UrhG), which implements Article 4 of the DSM Directive, provides a possible legal basis for mining large amounts of data, this provision only regulates the process of data collection, not the AI training itself. The suitability of TDM data for training depends heavily on factors such as data quality, timeliness, scope and relevance. High-quality, structured, clean data sets are essential as poor-quality or inconsistent data negatively affects the quality of AI models.
Licensed data sets are an attractive alternative in this regard as they are usually legally compliant, high quality, structured, and efficient to use. While they guarantee companies clear usage options, they are often costly and limited in their adaptability. TDM, on the other hand, allows for greater flexibility and customisation, and is usually more cost-effective, but often involves more effort in data cleansing and carries greater legal risk.
Overall, it is clear that combining licensed data sets with supplementary TDM data is often the best strategy for making the most of their respective advantages.
One way to protect your own content from TDM is to declare an opt-out usage restriction. For works that are accessible online, this opt-out must be in machine-readable form. It is unclear whether a robots.txt file is sufficient for this purpose; TDM-specific protocols are considered a better approach.
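A common, if imperfect, way to signal such a reservation in practice is a robots.txt file that blocks crawlers known to collect training data. The Python sketch below simply writes such a file; the crawler tokens listed (GPTBot, Google-Extended, CCBot) are examples in use at the time of writing, and, as noted above, it remains unsettled whether robots.txt alone meets the machine-readability requirement.

```python
from pathlib import Path

# User-agent tokens of crawlers known to collect data for AI training at the
# time of writing (illustrative, not exhaustive; the list changes over time).
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]

# Build a reservation that blocks these crawlers from the whole site.
rules = [f"User-agent: {agent}\nDisallow: /\n" for agent in AI_CRAWLERS]

# The file must be served at the web root as /robots.txt to be found.
Path("robots.txt").write_text("\n".join(rules), encoding="utf-8")
print(Path("robots.txt").read_text(encoding="utf-8"))
```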
Who owns the AI output?
For something to be eligible for copyright protection, it must be a personal intellectual creation, i.e. a creation by a human. This prerequisite is generally not met by output generated entirely by AI; such content is therefore generally in the public domain and does not enjoy copyright protection. However, when using such public-domain output, there remains a residual risk that protected works were used in training the AI and are reproduced in the output. For your own AI output to be protected by copyright, it must undergo substantial human editing that goes beyond purely technical or minor adjustments and demonstrates independent creative effort. Since purely AI-generated content does not give rise to copyright in the first place, exclusive usage rights cannot be sold in it. However, contractual usage agreements can be concluded that contain, for example, warranties or guarantees that the content is free of third-party rights.
Risks associated with copying styles, plagiarism, etc.
Copying a style, such as the typical 'Ghibli look', does not in itself violate copyright law, because styles are not protected. However, it becomes impermissible if the output incorporates specific protected elements of the original work. For specialist texts, which largely reproduce facts, the threshold for protection is higher. Nevertheless, an AI-assisted paraphrase that lacks creative input or proper citation of sources can constitute plagiarism. Anyone using works under licences such as 'CC BY-ND' should also be aware that these licences do not permit the distribution of edited or adapted versions.
Practical check with examples
In practice, it is always necessary to check which licence conditions apply to AI tools and the content used.
Canva
Designs created with a Canva Pro account can be transferred to and used by a client or employer, provided the licence requirements are complied with. The client does not need their own Canva account, although having one makes rights management easier.
Suno
Music created with a free Suno account is restricted to non-commercial use. A Pro or Premier plan is required for commercial use; otherwise, the rights remain with Suno.
YouTube
Finally, you may not use YouTube content in your own projects, either as music or as a transcript, without the express licence of the rights holder. Only pieces from the YouTube Audio Library or works under appropriate Creative Commons licences are permitted, and even then, only within the scope of the respective conditions.
Verifiability in the event of a dispute
While it is not trivial in practice to prove that content was created using AI, it is possible. The burden of proof lies with the party asserting copyright or ancillary copyright. Evidence that can be used includes platform-internal usage logs, stored prompt histories, file metadata (such as EXIF information or digital watermarks), stylometric or forensic reports, and other circumstantial evidence (such as unusual creation behaviour). While a comparison with the original training data would be highly informative, this is rarely feasible because these datasets usually remain proprietary. Under the court's free assessment of the evidence as a whole, absolute certainty is not required; a conviction based on the overall picture is sufficient.
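Some of this circumstantial evidence can be gathered with very simple tooling. The Python sketch below reads embedded image metadata with Pillow; which fields an AI tool actually writes (if any) varies by product, so the tags mentioned in the comments are assumptions to verify for the tool in question.

```python
import json
import sys

from PIL import Image               # Pillow
from PIL.ExifTags import TAGS


def describe_metadata(path: str) -> dict:
    """Collect EXIF data and other embedded metadata from an image file.
    Some generators leave traces in fields such as 'Software' or in
    tool-specific text chunks (e.g. a 'parameters' entry in PNG info),
    but the absence of such traces proves nothing on its own."""
    img = Image.open(path)
    exif = {TAGS.get(tag_id, tag_id): value for tag_id, value in img.getexif().items()}
    return {"format": img.format, "exif": exif, "other_info": dict(img.info)}


if __name__ == "__main__":
    print(json.dumps(describe_metadata(sys.argv[1]), default=str, indent=2))
```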
Practical tips for the legal use of AI
As explained in our guide, Using AI Safely in 9 Steps, legally compliant AI use is an ongoing process requiring proactive action. Companies should start addressing challenges such as compliance with the AI Act now.
Key practical tips
Establish AI governance
- Ideally, establish an AI governance system supported by interdisciplinary teams.
Take an inventory
- Identify and document how and where AI systems are used or planned to be used within the company, i.e. create an AI register (a minimal sketch of such a register entry follows below).
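What an AI register entry looks like in practice is a design decision. The following Python sketch is one illustrative structure; the field names and example values are assumptions, not a normative template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AIRegisterEntry:
    """One entry of a company AI register; the fields are an illustrative
    minimum, not a normative template."""
    system_name: str
    business_owner: str
    purpose: str
    provider: str                    # internal development or external AIaaS vendor
    processes_personal_data: bool
    deployment_status: str           # e.g. "planned", "pilot", "in production"
    last_reviewed: date = field(default_factory=date.today)


entry = AIRegisterEntry(
    system_name="Support ticket triage",
    business_owner="Customer Service",
    purpose="Route incoming tickets to the right team",
    provider="External AIaaS provider",
    processes_personal_data=True,
    deployment_status="pilot",
)
print(entry)
```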
Guidelines
- Create a company-specific AI policy.
Determine the scope of application and risk classification
- Review the personal scope of application of the AI Act and determine the risk category of each AI use case according to the AI Act's risk taxonomy (see the sketch below).
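For a first triage, the AI Act's risk categories can be mirrored directly in the AI register. The sketch below assumes the four broad categories of the Act's taxonomy; the example use cases and their classification are illustrative assumptions that still require a legal assessment in each case.

```python
from enum import Enum


class AIActRisk(Enum):
    """The AI Act's risk taxonomy, reduced to its four broad categories."""
    PROHIBITED = "prohibited practice"
    HIGH_RISK = "high-risk system"
    LIMITED = "limited risk / transparency obligations"
    MINIMAL = "minimal risk"


# Illustrative triage of internal use cases; every entry still needs a
# documented legal assessment, this mapping is only a starting point.
use_case_triage = {
    "CV screening in recruitment": AIActRisk.HIGH_RISK,
    "Customer-facing support chatbot": AIActRisk.LIMITED,
    "Internal spell-checking assistant": AIActRisk.MINIMAL,
}

for use_case, risk in use_case_triage.items():
    print(f"{use_case}: {risk.value}")
```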
Use existing processes
- Use existing documentation, such as the record of processing activities (RoPA), to facilitate compliance with the AI Act.
- The data protection impact assessment (DPIA) methodology can also be used for risk analysis.
Documentation
- Complete the necessary technical documentation and records.
AI literacy
- Ensure that the company has a sufficient level of AI literacy (as defined in Art. 4 of the AI Act), for example by providing training and raising awareness.
- It is also advisable to appoint an AI officer to act as the central contact person for AI issues and promote AI within the company.
Human oversight
- Plan and implement human oversight for high-risk AI systems.
Continuous adaptation
- Establish continuous monitoring of legal developments and adapt your compliance measures in an agile manner.
- In addition to the EU level, national implementing legislation (e.g. the draft law implementing the AI Act) will also need to be monitored on an ongoing basis in future.
- Violations of the AI Act that are not already subject to fines under the Act itself may be subject to separate fines under national implementing legislation.
AI in practice: act in an informed manner and stay on the right side of the law
While the use of AI tools offers companies immense potential, it is inextricably linked to legal requirements. The AI Act, the GDPR and copyright law form the basic legal framework.
To minimise risks and ensure legal compliance, it is crucial to:
- Select the right tools: Carefully review AI tools in terms of how they work, their data processing, the conclusion of data processing agreements, and server locations, especially where personal data is involved. Consider recommendations for GDPR-compliant tools, such as Google Gemini Business, ChatGPT (Teams/Enterprise) and DeepL Pro. Avoid tools that are not suitable for business purposes, such as Meta AI.
- Proactively review and continuously adapt: Conduct the necessary risk assessments, such as a DPIA. Ensure AI literacy and implement human oversight for high-risk systems. Ensure compliance with transparency obligations and copyright rules. Leverage synergies with existing compliance processes.
The legal landscape for AI is constantly evolving. Companies that act in an informed manner, understand the legal framework, and adapt their processes accordingly can safely exploit the opportunities offered by AI and secure their future viability.
Schedule your initial consultation
Describe your situation to us in a no-obligation phone call, and our lawyers will work with you to find the best solution.
Our AI services at a glance
- Regulatory mapping: Identifying relevant legal requirements through detailed mapping in accordance with various national specifications and EU data regulations.
- Data and AI governance: Development and customisation of governance structures, identification of requirements, and preparation for the AI Regulation.
- Training courses: Workshops on the scope and implementation of the AI Act and on building AI literacy in accordance with Art. 4 of the AI Act for managers, product teams and developers.
- AI inventory: Support in creating an overview of all AI systems within the company and determining whether a system should be classified as AI.
- Contract drafting: Drafting contracts in connection with AI projects, such as development contracts, AI-as-a-Service (AIaaS) contracts, and more.
- Advice on external AI applications: Providing advice and guidance on the use of external AI applications, as well as reviewing third-party applications.
- Anonymisation and pseudonymisation: Design of and advice on anonymisation and pseudonymisation concepts.
- Risk assessments: Advice on risk assessments in the context of data protection and fundamental rights, and on impact assessments in relation to AI systems.
- Advice on copyright: Advice on the copyright implications of GenAI, including rights to data input and the protectability of prompts and output.
- Legally compliant data use: Advice on the legal use of big data, machine learning and generative AI in relation to data protection law, trade secrets and database rights.
- Advice on AI development: Comprehensive advice on contract management, compliance, and other legal aspects of AI development projects.