GENERATIVE AI CONTRACTS — LEGAL CHECKLIST
Negotiating B2B generative AI contracts involves careful consideration of various legal, technical, and business aspects. This article highlights some legal matters that arise when negotiating such contracts and how parties may navigate such issues.
What is Generative AI?
In simple terms, generative AI has been described as “a form of AI that can autonomously generate new content, such as text, images, audio, and video” (see Zhihan Lv, Generative artificial intelligence in the metaverse era). It does this using algorithms and models that generate content based on patterns in the data used to train them. Large Language Models (LLMs) are a specific type of generative AI trained on massive datasets to understand and generate natural language output. Because of the volume of data needed to train generative AI models, generative AI has raised and continues to raise several concerns around IP rights and data protection (among others). At the time of writing this piece, at least a couple of class action suits have been filed against OpenAI and Google regarding their generative AI platforms, ChatGPT and Bard (see P.M. & ors. v. OpenAI LP & ors. and J.L. v. Alphabet Inc.). These suits allege mass violations of individuals’ privacy rights (including obtaining personal data without consent through techniques like web scraping), infringement of the IP rights of creators/owners of online works, and unfair and unlawful competitive practices, among others, and highlight the various concerns with the training of LLMs.
In the context of the above, parties to B2B generative AI contracts must identify potential challenges in the training and use of AI models and negotiate adequate protections to address such challenges. As we discuss these matters, we will focus on two main types of contracts that generative AI companies enter into with partners and customers — (i) data sharing contracts (DSC) and (ii) service agreements.
a. Scope of Contract — As noted above, generative AI models rely on large amounts of training data to autonomously generate new content. Generative AI service providers gather this data in several ways, including through the processing of already existing datasets. These datasets could include books, video content, voice recordings, images, and other data a partner provides to the service provider for training their models. The parties must therefore negotiate a DSC that defines clearly the purpose for which the datasets are collected, how the data is to be provided, the length of time during which the data will be used for training the models, and the timeline for return of the data once the training is completed. The party providing the data will want specific confirmation in the agreement that the data provided is only to be used to train the service provider’s AI models and not for any other purpose.
b. IP Ownership and Licensing — We will discuss this matter in two contexts: first the DSC, and then the service agreement. Regarding the DSC, the contract must specify that the party providing the training data owns and retains the IP rights (including copyrights and trademarks) in the training data. The DSC then specifies that during the agreement term, such party grants a license (with standard limitations to the scope) to the service provider to use, modify, or otherwise process the training data to train the AI model. On the other hand, the IP in the trained models (i.e., the result of using the training data to improve the AI model) vests in the service provider. The contract must also specify that the service provider has an unlimited right to commercialize the trained models. This is necessary to ensure that the service provider is entitled to license the trained models to third parties without any challenge or interference from the party providing the data. Service providers should also retain rights to their pre-existing works (i.e., all IP developed by the service provider independently, including their platforms, software, or methods for training the models).
Similarly, in service agreements in which the service provider provides its generative AI model to B2B customers for use in their products, the different IP rights held by the service provider and customer must be clearly specified. In most cases, the customer would provide some form of input into the service provider’s platform to generate content. This input could consist of images, text, or voice prompts, among other kinds of data. The service agreement should specify that the customer owns all rights and interests in such customer content (including IP rights) or that a third party has licensed or assigned such rights to the customer. This is where the question of ownership of the underlying content is crucial. Suppose the customer obtained its content from third parties, e.g., authors or voice actors. In that case, the customer must ensure that the third party has adequately consented to the use of the content, that the third party has assigned its IP or other rights in the content, or that the work properly constitutes a “work made for hire” as defined in the US Copyright Act (17 U.S.C. § 101). In this regard, content obtained by the customer through practices such as web scraping will pose significant risks for the service provider, as the customer may be unable to demonstrate that it has sufficient rights to provide such content to the service provider. We will discuss in subsequent paragraphs how the service provider may protect itself from such circumstances through contractual mechanisms.
Also, in service agreements, the service provider should retain all right, title, and interest in its platforms and software, which it provides to the customer for access to the generative AI services. The service provider will grant the customer a license (with standard limitations as to the scope) to use the services per the terms of the agreement and subject to payment of relevant fees. IP and other rights in the service output (i.e., content generated from the customer’s use of the services) will typically be owned by the customer. However, the service provider must clarify in the contract that service output may be similar across users, given the nature of generative AI services.
c. Representations and Warranties — Both types of contracts include standard mutual representations and warranties (R&Ws) of the parties, including that each party has the right to enter the contract, that each party has obtained relevant consents and approvals, and that the contract contains valid and enforceable obligations of each party. A vital representation that the service provider must obtain from the customer in a service agreement or, in the case of the DSC, from the provider of the training data, is that it has the relevant consents, releases, or permissions necessary to share the customer content or training data with the service provider. This is especially relevant where third-party data is included in the customer content or training data. Given that R&Ws provide a right of action against the party making the representations if they are breached, the inclusion of this representation shifts the risk to the customer or party providing the data. It gives the service provider a cushion if proper consent was not given for the use of such third-party data. Although the fundamentals of this R&W should not be compromised, parties may negotiate how this consent, release, or permission is to be obtained. In certain instances where it is impractical to obtain written consent, release, or permission from all third parties, service providers may be amenable to removing the requirement that such consent, release, or permission be in writing. Thus, actions or conduct from which one can infer consent could be a reasonable compromise.
d. Exclusivity — In DSCs, exclusivity may be a crucial negotiation point for both parties. The training data provider may be wary of the trained models being accessed by its competitors. To safeguard against this, parties negotiate exclusivity terms into the DSCs. The exclusivity provisions typically state that the service provider, for a certain period, would only provide the trained models to the data provider for its internal business purposes and would not provide the trained models to a specified number of named competitors. These terms must spell out the scope of the exclusivity (i.e., geographical locations, list of competitors, timeline for exclusivity). The scope of exclusivity is essential, as the service provider would want to avoid any instances in which its operations are unduly limited because of the DSC. For instance, the service provider would want to avoid provisions prohibiting it from engaging in “all forms of business interactions” with an extensive list of competitors, even outside the field where the data provider operates.
e. Limitation of Liability and Disclaimers — In light of the broad and unpredictable range of liability that could arise from generative AI platforms, service providers seek protections in the form of liability limitations within the contract, especially in the context of service agreements. While the customer would generally be amenable to limiting damages to direct damages and excluding indirect, special, and consequential damages, parties would typically bargain for certain major breaches to be excluded from this limitation, especially if the breach results from willful misconduct or gross negligence. Such exclusions include breaches of confidentiality, representations, and indemnification provisions. In any case, the service provider must ensure these exclusions are mutual and not one-sided. Separately, similar to other kinds of service agreements, service providers push for monetary liability limits for any breaches arising from the agreement. This protects the service provider from incurring liability that is significantly out of proportion to the effort it puts into, and the benefit it obtains from, the contract. The monetary limit is a commercial point negotiated between the parties, and negotiations can start at 2x the fees paid in the 12 months prior to the incident. Service providers also include standard disclaimers of warranties for the generative AI services to exclude exposure to liability for warranties such as fitness for a particular purpose and merchantability, among others.
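For readers who want to see the cap arithmetic, the following is a minimal Python sketch of a fee-multiple liability cap of the kind described above. The function names, the 365-day lookback, and the sample figures are illustrative assumptions and are not drawn from any particular contract; an actual clause would define the fee period, the multiplier, and the excluded breaches in its own terms.

```python
from datetime import date, timedelta
from decimal import Decimal


def liability_cap(payments, incident_date, multiplier=Decimal("2"), lookback_days=365):
    """Hypothetical cap: multiplier x fees paid in the lookback window before the incident.

    payments is a list of (payment_date, amount) tuples.
    """
    window_start = incident_date - timedelta(days=lookback_days)
    fees_in_window = sum(
        (amount for paid_on, amount in payments if window_start <= paid_on < incident_date),
        Decimal("0"),
    )
    return multiplier * fees_in_window


def recoverable_amount(direct_damages, cap, excluded_from_cap=False):
    """Direct damages are capped unless the breach is carved out of the limitation."""
    return direct_damages if excluded_from_cap else min(direct_damages, cap)


# Illustrative figures only.
payments = [
    (date(2021, 1, 1), Decimal("5000")),   # outside the 12-month window
    (date(2023, 1, 15), Decimal("10000")),
    (date(2023, 6, 15), Decimal("10000")),
]
cap = liability_cap(payments, incident_date=date(2023, 9, 1))
capped = recoverable_amount(Decimal("100000"), cap)
uncapped = recoverable_amount(Decimal("100000"), cap, excluded_from_cap=True)
```

With these sample figures, only the two 2023 payments fall within the window, so the cap is twice their total. A claim under a carved-out breach (for example, a breach of confidentiality, if the parties so agree) bypasses the cap entirely in this sketch.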
f. Indemnification — This is another significant risk-allocation mechanism within generative AI contracts, similar to other service agreements. In service agreements, the parties may agree on mutual indemnification obligations for breaches of obligations under the agreement. More specifically, customers typically seek indemnification against third-party allegations that the service provider’s IP infringes or violates a third party’s IP. The service provider would typically seek indemnification against any claims that the customer-provided data or content infringes any third-party IP or any other rights (including property rights). This protection is significant, especially considering the IP infringement issues that arise in the context of generative AI tools. Suppose the customer does not have the consents, releases, and permissions it represented that it had, and the affected parties institute an action against the service provider over the AI models that were trained with such data. In that case, this provision will enable the service provider to seek recovery, not just for damages but also for costs and expenses incurred in defending the claim.
g. Data Protection — DSCs and service agreements for generative AI must contain adequate provisions regarding data protection. This is especially relevant in the wake of the claims against generative AI companies for massive data privacy breaches, including failure to obtain the consent of the owners of personal data or to establish any other lawful basis for processing their data, failure to notify data subjects of the reasons for the use, and the inability of data subjects to exercise their “opt-out” rights with respect to trained models, as provided by data protection laws such as the EU’s General Data Protection Regulation 2016 (GDPR) and the California Consumer Privacy Act 2018 (CCPA). Accordingly, the parties typically agree as a baseline that each party shall comply with applicable data protection laws and regulations in processing personal data and will also have adequate technical and organizational security measures to prevent unauthorized processing, loss, or destruction of personal data. Customers who provide training data or other forms of customer data, including personally identifiable information, typically require that the service provider execute a data protection addendum (DPA) per the GDPR or other form of data protection agreement required by applicable law. However, in such agreements, service providers aim to avoid taking on more onerous obligations than applicable laws require. They may also consider imposing monetary limits on liability for breaches of the data protection agreement.
h. Termination — In generative AI contracts, outlining the duration of the contract and the manner in which it may be terminated is crucial. In DSCs, and even in service agreements, the service provider may desire a ‘lock-in’ period during which the parties cannot terminate the contract for convenience, to ensure that the service provider maximizes its access to the training data or, in the case of service agreements, other benefits. This lock-in period can be anywhere from 6 months to a couple of years. In such cases, parties should ensure that this lock-in does not prevent a termination where there are gross infractions or material breaches, for instance, if the training data is later found to contain data that were included without the relevant consent. Recognizing this, parties can negotiate limited exceptions to the lock-in period in which a party may terminate for a specified material breach. Once the parameters of the lock-in period are established, the parties should also specify the rules for termination after the lock-in period. Essentially, the parties may terminate for convenience upon giving notice (typically 30–90 days before the conclusion of the relevant term). The parties may also terminate immediately for a material breach that is not cured within a specified cure period. Parties also include the right to terminate if the other party undergoes a bankruptcy or insolvency event. In addition, the obligations of the parties as regards the training data following termination (including deletion, return, or preservation of confidentiality) must be clearly specified.
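The timing rules described above (lock-in period, notice window, and cure period) can be sketched as simple date arithmetic. This is only an illustration under stated assumptions: the function names, the day counts, and the treatment of an uncured material breach as an exception to the lock-in are hypothetical, and a real contract would define each of these in its own terms.

```python
from datetime import date, timedelta


def lock_in_end(contract_start, lock_in_days):
    """First date on which termination for convenience is no longer barred."""
    return contract_start + timedelta(days=lock_in_days)


def may_terminate(today, contract_start, lock_in_days, uncured_material_breach=False):
    """Termination is barred during the lock-in unless a negotiated breach exception applies."""
    if uncured_material_breach:
        return True
    return today >= lock_in_end(contract_start, lock_in_days)


def latest_notice_date(term_end, notice_days):
    """Last day to give notice of termination before the relevant term ends."""
    return term_end - timedelta(days=notice_days)


def cure_deadline(breach_notice_date, cure_days):
    """Date by which a noticed material breach must be cured."""
    return breach_notice_date + timedelta(days=cure_days)


# Illustrative dates only.
start = date(2024, 1, 1)
during_lock_in = may_terminate(date(2024, 6, 1), start, lock_in_days=365)
breach_exception = may_terminate(
    date(2024, 6, 1), start, lock_in_days=365, uncured_material_breach=True
)
notice_by = latest_notice_date(date(2025, 12, 31), notice_days=90)
cure_by = cure_deadline(date(2024, 6, 1), cure_days=30)
```

Keeping the breach exception as an explicit flag mirrors the drafting point in the text: the lock-in restricts termination for convenience, while termination for a specified material breach remains available.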
In conclusion, negotiating generative AI contracts requires an in-depth consideration of various legal, business, and technical issues. This piece highlights some of the negotiating points in generative AI contracts, and the writers note that there are other matters that are not addressed here. This piece does not constitute legal advice, and readers are encouraged to seek proper legal counsel when negotiating contracts of this nature.