OpenAI, a leading artificial intelligence company, has unveiled a voice cloning model that needs only a 15-second audio sample to operate. The new technology, called Voice Engine, promises to open up new possibilities in a variety of industries, from education to healthcare.
According to OpenAI, Voice Engine can create a synthetic voice from a short audio clip of a person. The AI-generated voice can then read text prompts aloud in the speaker's own language or in several other languages. The company says that small-scale deployments with a limited set of partners are helping to inform its approach, its safeguards and its thinking about how Voice Engine could be used ethically and beneficially across industries.
So far, a handful of companies have had limited access to Voice Engine, including Age of Learning, HeyGen, Dimagi, Livox and Lifespan. Age of Learning, for example, has been using the technology to generate pre-written voice-over content and to deliver “real-time personalized responses”, written by GPT-4, to students.
OpenAI has been developing Voice Engine since late 2022, and the model already powers the preset voices in the company's text-to-speech API and ChatGPT’s Read Aloud feature. According to Jeff Harris, a member of OpenAI’s product team for Voice Engine, the model was trained on a combination of licensed and publicly available data.
However, OpenAI has been cautious with the distribution of this technology. Currently, it is only available to about 10 developers, and the company has established strict usage policies to ensure its ethical application. These policies include requiring explicit, informed consent from the original speaker, prohibiting impersonation of individuals or organizations without consent, and adding watermarks to audio clips to track their origin.
AI text-to-speech generation continues to evolve, with notable offerings from companies such as Podcastle and ElevenLabs, but concerns about its ethical use remain. The U.S. Federal Communications Commission, for example, has banned robocalls that use AI-generated voices after people received unsolicited calls featuring an AI clone of President Joe Biden’s voice.
OpenAI suggests several steps to limit the risks associated with tools like Voice Engine, including gradually introducing policies that protect how people’s voices are used in AI and developing systems to track the origin of AI-generated content.