OpenAI Voice Engine – What is it and how does it work?

OpenAI Voice Engine

OpenAI recently unveiled a new voice cloning tool, signaling that it is prepared to enter the voice assistant market. Sam Altman, OpenAI’s CEO, has said that safety concerns will keep the technology from being released to the general public.


Voice Engine is a text-to-speech platform developed by OpenAI. It can create a synthetic voice from a 15-second audio clip, although access to the platform is restricted. On request, the AI-generated voice can read text prompts aloud in the speaker’s own language or in several other languages.

According to OpenAI representatives, these small-scale deployments are informing the company’s strategy, security measures, and ideas about how Voice Engine might be used to benefit people across various industries.

Companies with access include Age of Learning, an education technology company; HeyGen, a visual storytelling platform; Livox, a maker of AI communication apps; Dimagi, a maker of frontline health software; and the health system Lifespan.

When was OpenAI Voice Engine created, and which companies offer tools and technologies for AI voice cloning?

According to OpenAI, development of Voice Engine began in late 2022, and the technology has been used to power the preset voices in its text-to-speech API and ChatGPT’s Read Aloud feature.

Jeff Harris, a member of OpenAI’s Voice Engine product team, said the model was trained on a “mix of licensed and publicly available data.” According to OpenAI, the model will be accessible to only about ten developers.

Text-to-audio generation is an area of generative AI that is still developing. Most tools concentrate on instrumental or natural sounds; fewer have focused on voice generation, partly because of the concerns OpenAI raised.

Among the names in the space are companies like ElevenLabs and Podcastle, which offer AI voice cloning tools and technologies that the Vergecast examined a year ago.

How is the voice synthesized?

Surprisingly, Voice Engine is not trained or fine-tuned on user data. This is partly due to the ephemeral way the model, which combines a transformer with a diffusion process, generates speech.

Harris says, “We use a small audio sample and text to generate authentic speech that resembles the actual speaker.” The audio used is deleted as soon as the request is complete.

As he put it, the model analyzes the speech sample and the text to be read aloud at the same time, producing a matching voice without building a separate model for each speaker.

Voice Engine promises higher-quality speech and has been priced at $15 per one million characters. Although the pricing no longer appears in marketing materials, it works out to less than $1 per hour of generated speech and undercuts ElevenLabs, which charges $11 per month for 100,000 characters.
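To make the price comparison concrete, here is a rough back-of-the-envelope sketch in Python. The two prices come from the figures above; the reading rate (150 words per minute) and average word length (6 characters, including the space) are assumptions chosen for illustration, so the hourly numbers are estimates rather than published rates. The ElevenLabs figure is also a monthly plan price, so the comparison is only approximate.

```python
# Prices quoted in the article, in US dollars.
VOICE_ENGINE_USD = 15.0          # per 1,000,000 characters
VOICE_ENGINE_CHARS = 1_000_000
ELEVENLABS_USD = 11.0            # per 100,000 characters per month
ELEVENLABS_CHARS = 100_000

# ASSUMPTIONS (not from the article): a narrator reads about 150 words
# per minute, and a word averages about 6 characters including the space.
WORDS_PER_MINUTE = 150
CHARS_PER_WORD = 6


def usd_per_char(price_usd: float, chars: int) -> float:
    """Price of a single character of input text."""
    return price_usd / chars


def usd_per_hour(price_usd: float, chars: int) -> float:
    """Estimated price of one hour of speech at the assumed reading rate."""
    chars_per_hour = WORDS_PER_MINUTE * CHARS_PER_WORD * 60
    return usd_per_char(price_usd, chars) * chars_per_hour


voice_engine_hourly = usd_per_hour(VOICE_ENGINE_USD, VOICE_ENGINE_CHARS)
elevenlabs_hourly = usd_per_hour(ELEVENLABS_USD, ELEVENLABS_CHARS)

print(f"Voice Engine: ${voice_engine_hourly:.2f} per hour of speech")
print(f"ElevenLabs:   ${elevenlabs_hourly:.2f} per hour of speech")
print(f"Price ratio:  {elevenlabs_hourly / voice_engine_hourly:.1f}x")
```

Under these assumptions an hour of speech is roughly 54,000 characters, which puts Voice Engine at about $0.81 per hour, consistent with the "less than $1 per hour" figure. A faster or slower reading rate would shift the hourly estimate but not the per-character ratio between the two services.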

Voice Engine does not offer controls for adjusting tone, pitch, or rhythm, although any expressiveness in the 15-second voice sample carries through to the speech generated from it.

When can you expect the general release of OpenAI Voice Engine?

Depending on how the preview goes and how the public reacts to Voice Engine, OpenAI may make the tool available to a larger developer base. For now, though, the company is hesitant to make any firm commitments.

However, Harris offered a glimpse of what is planned for Voice Engine, saying that OpenAI is testing a security feature that requires users to read randomly generated text aloud to prove that they are present and aware of how their voice is being used.

According to Harris, this could give OpenAI the assurance it needs to make Voice Engine available to more people, or it may be only the beginning.

He stated that further development of the voice-matching technology will depend mostly on the lessons learned from the pilot, any safety concerns that are uncovered, and the mitigations that are in place. He added that OpenAI does not want people to confuse artificial voices with real human voices.
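The read-aloud check Harris describes can be pictured as a simple challenge-and-response step. The sketch below is purely illustrative and is not OpenAI's implementation: the word list, the function names, and the idea of comparing against a text transcript are all assumptions, and the speech-to-text step that would produce the transcript is left out entirely.

```python
import re
import secrets

# Hypothetical word list; a real system would draw from a far larger vocabulary.
WORDS = [
    "amber", "bridge", "candle", "delta", "ember", "forest",
    "garden", "harbor", "island", "jungle", "kettle", "lantern",
]


def make_challenge(num_words: int = 5) -> str:
    """Build a random phrase for the speaker to read aloud.

    secrets.choice is used so the phrase cannot be predicted in advance.
    """
    return " ".join(secrets.choice(WORDS) for _ in range(num_words))


def normalize(text: str) -> list:
    """Lowercase the text and keep only its words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def matches_challenge(challenge: str, transcript: str) -> bool:
    """Return True if the transcript contains the challenge words in order.

    `transcript` is assumed to come from a separate speech-to-text step
    run on the user's fresh recording (not shown here).
    """
    return normalize(challenge) == normalize(transcript)


challenge = make_challenge()
print("Please read aloud:", challenge)
print("Verified:", matches_challenge(challenge, challenge.upper() + "."))
```

Because the phrase is generated fresh for each request, a pre-recorded clip of someone else's voice is unlikely to contain it, which is the point of asking the speaker to read it live.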

Voice talent as a source of income

The talent industry has long been battling the existential threat posed by generative AI, so it wouldn’t exactly be caught off guard. More and more voice actors are being asked to sign away the rights to their voices so that businesses can use artificial intelligence (AI) to make synthetic versions that could eventually replace the real actors.

Some AI voice platforms, such as Replica Studios and ElevenLabs, are attempting to find a middle ground by offering synthetic voices while paying the people whose voices they are based on.

OpenAI will not set up such labor union agreements or marketplaces, at least not in the near term. Instead, users will only need to obtain the “explicit consent” of the individuals whose voices are being cloned, disclose which voices are generated by AI, and promise not to use the voices of minors, the deceased, or prominent politicians in the speech they generate.
