Making an AI Singing Model?

Aug 14, 2023 | Law Student Blog

By Emma Zhuang, Washington University in St. Louis School of Law, LLM candidate, 2023

The Rise of the “AI Singer”
The emergence of AI singers marks a significant milestone in the realm of music. These virtual vocalists are brought to life through the training of AI models using vast collections of human vocal recordings. By assimilating patterns, nuances, and vocal techniques from such datasets, these models can generate new vocal performances that closely resemble those sung by actual human singers.

With the rapid advancement of AI technology, it has become possible to create an “AI singer” voice model using just a few audio materials and a capable NVIDIA graphics card within a remarkably short span of time. This accessibility has fostered a vibrant online community that shares models and resources, guided by the principle of “for learning purposes only.”

However, amid the flourishing landscape of the AI field, a copyright conflict has already surfaced in the United States, igniting a war over AI-generated songs.

The “Muted” Discord Community
AI Hub is a thriving community on the Discord app where users share their self-trained So-Vits-SVC/RVC voice models, making it a popular hub for AI speech synthesis models. This community boasts over 140,000 active Discord users. Within the voice-model channel, one can find an array of voice models, including beloved anime characters, singers from different eras and countries, and even AI Rap Battles featuring rappers.

Recently, however, Discord received a copyright infringement notice from the Recording Industry Association of America (RIAA) urging the closure of AI Hub’s server and the disclosure of certain users’ identities. Discord has not complied directly with the RIAA’s request to shut down the server, but the links identified as “copyright infringements” in the notice are no longer accessible.

The RIAA, composed of multiple American record companies, serves as a trade organization representing the U.S. recording industry. Alongside establishing audio standards, managing music licensing, collecting royalties, and certifying sales, its primary mission is to combat copyright infringement and take legal action against piracy.

The RIAA’s request to shut down AI Hub seems to stem from the alleged “sharing of copyrighted audio recordings,” as mentioned in the notice. Based on available information, training these “AI singer” voice models requires a significant amount of original audio material. Therefore, it is possible that certain shared posts on AI Hub, containing original audio files used for training purposes, have triggered the copyright conflict.

Copyright Issues Involved in Obtaining Original Sound Materials
Obtaining original audio materials typically involves two methods: utilizing music recordings and regular speech files. In both cases, storing the music recordings or speech files on an AI model’s servers is necessary for the production and extraction of the original audio materials.

From a copyright law perspective, storing music recordings or speech files on an AI model’s server can be considered a form of reproduction. If the individual responsible for the reproduction also engages in subsequent acts of distribution or other forms of public use without the necessary authorization from the relevant rights holders, there is a legal risk of copyright infringement. However, if the reproduction is limited to the act itself and does not involve any additional public use, establishing a violation of the rights holder’s reproduction right may prove challenging.

Moreover, if the AI creator records the speech files through unauthorized performances, doing so may infringe upon the copyright holder’s reproduction right and potentially violate the performer’s rights as well.

Copyright Issues Involved in Obtaining an AI Singing Model
Training a vocal timbre conversion model using original audio materials involves the internal processing of the data to obtain relevant vocal characteristics and timbre, which are then stored as model parameters. The question arises as to whether this action infringes upon the corresponding rights holder’s adaptation right.

Firstly, from a copyright law perspective, the adaptation right refers to the right to modify a work and create a new work with some originality. However, vocal timbre, voice characteristics, singing techniques, and other personal attributes are considered individual traits. Singing style and techniques, which differ from the original expression fixed on the medium, do not fall under the protected subject matter of copyright law. As these materials themselves do not constitute works, the processing and training of collected audio data to simulate a similar vocal timbre to a specific individual do not employ expressive elements from the original work.

Therefore, the act of obtaining an AI vocal model should not involve the infringement of adaptation rights concerning the sound data or the adaptation of the original musical work.

In addition to copyright, does an AI singer give rise to potential risks of infringing upon other rights?
Legal precedents such as Midler v. Ford Motor Co. (1988) shed light on the deliberate replication of a professional singer’s distinct voice for commercial gain. In that case, Ford aired a commercial featuring a “sound alike” singer performing one of Bette Midler’s songs. The court held that intentionally imitating a singer’s unique voice to promote a product constitutes a misappropriation of the singer’s identity. The court underscored that an individual’s voice is as distinctive and personal as their physical appearance and is an unmistakable reflection of their identity.

Likewise, in Waits v. Frito-Lay, Inc. (1992), the court confronted the matter of imitating a professional singer’s singular voice to endorse a product. Frito-Lay had broadcast a commercial employing a vocalist who closely replicated Tom Waits’ voice without securing his authorization. The court deemed this calculated mimicry of Waits’ distinctive voice for promotional purposes impermissible. The court explained that if a party’s use of an individual’s identity lacks substantive informational value and instead capitalizes on the identity, it could be deemed a wrongful act.

Consequently, AI-generated singing could potentially encroach upon the inherent right to personality of the original performer. From a certain perspective, an individual’s voice can be regarded as integral to their personal rights. Famous singers have voices that have become firmly ingrained in public awareness. These distinct tonal qualities have seamlessly woven themselves into their very identity.

However, when AI emulates a comparable voice, the risk of listeners confusing the AI-generated rendition with that of the original artist escalates significantly. In such a scenario, the AI’s vocal reproduction could potentially undermine the fundamental facets of the original performer’s personal identity, possibly culminating in a legal assertion of infringement upon their right to personality.

Works Protected by Copyright
In the context of copyright law, the definition of a “work” typically encompasses an original intellectual creation. If AI-generated content were deemed eligible for copyright protection, it would logically open the door to similar considerations for other non-human entities, such as animals, as illustrated by the “monkey selfie case.” This incident unfolded during a British photographer’s endeavor to capture monkey behavior in Indonesia. Unexpectedly, a curious monkey seized the photographer’s camera and inadvertently snapped a selfie by mimicking the photographer’s actions. After the photographer reclaimed the camera, he shared the accidental self-portrait with a friend, who subsequently shared it online, triggering a significant online reaction. This occurrence led to a legal dispute, with an animal welfare organization initiating a lawsuit in a U.S. court. Their argument centered on the notion that since the monkey, in its role as an inadvertent photographer, captured the selfie, it should be granted copyright ownership over the self-portrait.

The court ultimately ruled that the monkey did not qualify as an author under copyright law, consequently rendering these photographs ineligible for copyright protection. The Compendium of U.S. Copyright Office Practices, Third Edition, updated to reflect the latest standards, notably includes an example of works that do not meet the criteria of human authorship – the case of “photos taken by a monkey.” This case effectively dismisses the notion of non-human entities assuming authorship and thereby obtaining copyright rights for creative works.

The foundational purpose of copyright law revolves around stimulating and rewarding human ingenuity. Hence, the stipulation of “originality” as a prerequisite for copyright safeguarding is intrinsically rooted in human intellectual processes. Works eligible for copyright protection are inherently expected to be the result of human intellectual endeavors.

Applying copyright protection to AI-generated content might inadvertently blur the line between human and non-human contributions, thus challenging the very essence of copyright law. By including both AI and animals in the realm of copyright protection, the core principle of encouraging human creativity, which underpins copyright law, could be undermined.

Does AI Singing Infringe the Copyright of the Original Songs?
When AI performs covers of classic songs, it reproduces the content of the original songs and may therefore infringe upon the reproduction right in those songs. AI singing requires storing the original songs on AI model servers and separating the instrumental tracks from the vocals of the original song (performed by the original singer). These steps may infringe upon the reproduction rights of the copyright holders of the lyrics and music composition, the producers of the sound recordings, and the performers.

Furthermore, if AI singing uses the instrumental audio of a classic song (or audio from other rights holders), it may infringe upon the rights of the music recording producers of that classic song (or other rights holders).

Is AI-related Law Still Too Far Away?
Nowadays, the emergence and rapid development of various forms of artificial intelligence signify significant technological progress and a leap in new concepts. Naturally, we hold the hope that artificial intelligence will contribute to a better world and enhance our lives. However, it is essential to establish order and regulations to govern any evolving phenomenon effectively.

At present, the competitive landscape among different artificial intelligence technologies has resulted in a state of disorder within the AI industry. Therefore, it becomes crucial for legal authorities to promptly enact comprehensive laws and regulations that foster norms and restore order within the AI industry. Such measures will not only safeguard rights but also nurture innovation while providing clear guidelines for the responsible application of AI technology. We eagerly anticipate the timely implementation of a legal framework by the relevant authorities, one that aligns with the current AI landscape. This will help create a stable and orderly environment for the sustainable development of the AI industry.
