15.ai
Type of site | Artificial intelligence, speech synthesis, machine learning, deep learning |
---|---|
Available in | English |
Founder(s) | 15 |
Commercial | No |
Registration | None |
Launched | Initial release: March 12, 2020; last stable release: v24.2.1 |
15.ai was a free-to-use artificial intelligence web application that generated text-to-speech voices of fictional characters from various media sources.[1][2][3][4] Created by a pseudonymous developer under the alias 15,[5][6][7][8] the project used a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate emotive character voices.[9][10]
In early 2020, 15.ai appeared online as a proof of concept for the democratization of voice acting and dubbing.[8][11] Its free availability, ease of use without user accounts, and improvements over existing text-to-speech implementations made it popular.[2][1][3] Some critics and voice actors questioned the legality and ethics of making such technology so readily accessible.[12]
The site was embraced by Internet fandoms such as My Little Pony, Team Fortress 2, and SpongeBob SquarePants.[5][13][8]
Several commercial alternatives appeared in the following years.[6][7] In January 2022, the company Voiceverse NFT plagiarized 15.ai's work as part of its platform.[14][15]
The ethical implications of voice cloning (also known as audio deepfakes) in content creation led to a re-evaluation of the service by the developer, with concerns being raised regarding copyright and the unauthorized use of character voices.[8] In September 2022, a year after its last stable release, 15.ai was taken offline.[6]
Features
The platform required no user registration or account creation to generate voices.[16][17][7][8] Users could generate speech by entering text and selecting a character voice (optionally specifying an emotional contextualizer and/or phonetic transcriptions), with the system producing three variations of the audio with different emotional deliveries.[9] The platform operated completely free of charge, though the developer reported spending thousands of dollars monthly to maintain the service.[8]
Available characters included GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and other characters from My Little Pony: Friendship Is Magic, SpongeBob, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, Carl Brutananadilewski from Aqua Teen Hunger Force, Steven Universe, Dan from Dan Vs., and Sans from Undertale.[16][17]
The nondeterministic nature of the deep learning model ensured that each generation would have slightly different intonations, similar to multiple takes from a voice actor.[9][5] The application supported manually altering the emotion of a generated line using emotional contextualizers (a term coined by this project): sentences or phrases that convey the emotion of the take and serve as a guide for the model during inference.[5][13] Emotional contextualizers were representations of the emotional content of a sentence, deduced via transfer-learned emoji embeddings using DeepMoji, a deep neural network sentiment analysis algorithm developed by the MIT Media Lab in 2017.[18][19] DeepMoji was trained on 1.2 billion emoji occurrences in Twitter data from 2013 to 2017, and outperformed human subjects in correctly identifying sarcasm in Tweets and other online modes of communication.[20][21][22]
15.ai used a multi-speaker model—hundreds of voices were trained concurrently rather than sequentially, decreasing the required training time and enabling the model to learn and generalize shared emotional context, even for voices with no exposure to that context.[23] Consequently, the characters in the application were powered by a single trained model, as opposed to multiple single-speaker models.[24] The lexicon used by 15.ai was scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words were automatically deduced using phonological rules learned by the deep learning model.[5]
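The difference between one multi-speaker model and many single-speaker models can be illustrated with a toy sketch. The following is not 15.ai's architecture, which the cited sources do not describe in detail; the class name, dimensions, and the use of a simple linear layer are all assumptions made for illustration. It only shows the general idea that most parameters are shared across voices, while each voice contributes a small embedding vector that conditions the shared computation.

```python
import random


class MultiSpeakerToyModel:
    """Toy stand-in: one shared weight matrix plus a per-speaker embedding table."""

    def __init__(self, speakers, text_dim=4, speaker_dim=2, out_dim=3, seed=0):
        rng = random.Random(seed)
        in_dim = text_dim + speaker_dim
        # Shared parameters: the same weights are used for every speaker.
        self.shared = [
            [rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)
        ]
        # Speaker-specific parameters: only a small embedding vector per voice.
        self.embeddings = {
            name: [rng.uniform(-1, 1) for _ in range(speaker_dim)]
            for name in speakers
        }
        self.text_dim = text_dim

    def forward(self, text_features, speaker):
        """Condition the shared layer on the chosen speaker's embedding."""
        if len(text_features) != self.text_dim:
            raise ValueError("unexpected text feature size")
        x = list(text_features) + self.embeddings[speaker]
        return [sum(w * v for w, v in zip(row, x)) for row in self.shared]


model = MultiSpeakerToyModel(["GLaDOS", "Twilight Sparkle"])
features = [1.0, 0.0, 0.0, 0.0]  # hypothetical encoded text
out_glados = model.forward(features, "GLaDOS")
out_twilight = model.forward(features, "Twilight Sparkle")
```

In this sketch the same input text yields different outputs for different speakers, yet adding a voice only adds one embedding vector rather than a whole new model.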
The application supported a simplified phonetic transcription known as ARPABET, to correct mispronunciations and account for heteronyms—words that are spelled the same but are pronounced differently (such as the word read, which can be pronounced as either /ˈrɛd/ or /ˈriːd/ depending on its tense). It followed the CMU Pronouncing Dictionary's ARPABET conventions.[5]
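How a phonetic override can resolve a heteronym is sketched below. This is a minimal illustration, not the site's actual parser: the curly-brace override syntax, the `to_phonemes` helper, and the four-word lexicon are assumptions introduced here, with pronunciations written in CMU-style ARPABET (stress digits attached to vowels).

```python
import re

# Tiny illustrative lexicon in CMU Pronouncing Dictionary style.
# The first listed pronunciation is treated as the default.
LEXICON = {
    "i": ["AY1"],
    "have": ["HH AE1 V"],
    "read": ["R IY1 D", "R EH1 D"],  # heteronym: present vs. past tense
    "it": ["IH1 T"],
}

# Assumed override syntax: an ARPABET string wrapped in curly braces.
TOKEN = re.compile(r"\{([^}]+)\}|([A-Za-z']+)")


def to_phonemes(text):
    """Return one ARPABET string per token, honoring explicit overrides."""
    result = []
    for override, word in TOKEN.findall(text):
        if override:
            # The user supplied the pronunciation directly; normalize spacing.
            result.append(" ".join(override.upper().split()))
        else:
            candidates = LEXICON.get(word.lower())
            # Unknown words are marked rather than guessed in this sketch.
            result.append(candidates[0] if candidates else f"<unk:{word}>")
    return result


default_take = to_phonemes("I read it")
past_tense_take = to_phonemes("I have {R EH1 D} it")
```

Here `default_take` uses the lexicon's first entry for "read", while `past_tense_take` uses the pronunciation given in braces, which is the kind of correction a phonetic transcription makes possible.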
Background
Artificial intelligence in speech synthesis
In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating high-fidelity human-like speech.[26][27][25] Tacotron 2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to produce high-quality speech; when trained on 2 hours of speech, the model was able to produce intelligible speech with mediocre quality, and when trained on 36 minutes of speech, the model was unable to produce intelligible speech.[28][29]
For years, reducing the amount of data required to train a realistic high-quality text-to-speech model has been a primary goal of researchers in the field of deep learning speech synthesis.[30][31] The developer of 15.ai claimed that as little as 15 seconds of data is sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.[32]
Development
15.ai was designed and created by an anonymous research scientist known by the alias 15.[5][6][7] In his blog Marginal Revolution, economist Tyler Cowen cited the developer of 15.ai as an example of underrated talent in AI.[33]
Developing and running 15.ai cost several thousand dollars per month, initially funded by the developer's personal finances after a successful startup exit.[8] The algorithm used by the project was dubbed DeepThroat.[8] The project and algorithm were conceived as part of MIT's Undergraduate Research Opportunities Program, and had been in development since 2018.[11][5][34] The model used by 15.ai was inspired by a 2019 paper that introduced transfer learning to text-to-speech models.[11][35]
The developer also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan.[8] This project was a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence.[37][38] The Friendship Is Magic voices on 15.ai were trained on a large dataset crowdsourced by the project: audio and dialogue from the show and related media[8]—including all nine seasons of Friendship Is Magic, the 2017 movie, spinoffs, leaks, and various other content voiced by the same voice actors—were parsed, hand-transcribed, and processed to remove background noise.
The first public release of 15.ai was unveiled in March 2020, with the service experiencing intermittent availability as the developer conducted ongoing research and development work.[citation needed] The tool gained heavy attention in mainstream media in early 2021, with multiple gaming news outlets covering its capabilities.[3][1][2] 15.ai saw further attention in 2022 when it was discovered that Voiceverse NFT had used outputs from the tool.[14]
In late 2022, 15.ai was taken offline. As of December 2024, the website is still inactive.
Reception
15.ai was met with a largely positive reception from users and mainstream media. Liana Ruppert of Game Informer described it as "simplistically brilliant"[2] and José Villalobos of LaPS4 wrote that it "works as easy as it looks."[16][a] Lauren Morton of Rock, Paper, Shotgun called the tool "fascinating,"[4] and Yuki Kurosawa of AUTOMATON deemed it "revolutionary."[5][b] Users praised the ability to easily create audio of popular characters that sounded believable to those unaware it had been synthesized. Zack Zwiezen of Kotaku reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain".[1] Natalie Clayton of PC Gamer wrote that "SpongeBob SquarePants' shrill, nasally voice works shockingly well".
The website's impact extended beyond English-speaking media. Yoshiyuki Furushima of Den Fami Nico Gamer wrote that "it's amazing that [character lines and skits] are all synthetically generated", and Eugenio Moto of Yahoo! Finance reported that "while the results are already exceptional, they can certainly get better."
In popular culture
Fandom content creation
15.ai was frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic fandom, the Team Fortress 2 fandom, the Portal fandom, and the SpongeBob SquarePants fandom, with numerous videos and projects containing speech from 15.ai having gone viral.[1][2] The platform is credited as the impetus behind the popularization of AI voice cloning in content creation, demonstrating the potential for accessible, high-quality voice synthesis technology.[8]
The My Little Pony: Friendship Is Magic fandom saw a resurgence in video and musical content creation as a result, inspiring a new genre of fan-created content assisted by artificial intelligence. Some fanfictions were adapted into fully voiced "episodes": The Tax Breaks is a 17-minute-long animated video rendition of a fan-written story published in 2014 that uses voices generated from 15.ai with sound effects and audio editing, emulating the episodic style of the early seasons of Friendship Is Magic.[39][40]
Viral videos from the Team Fortress 2 fandom featuring voices from 15.ai include Spy is a Furry (which gained over 3 million views on YouTube across multiple videos[yt 1][yt 2][yt 3]) and The RED Bread Bank, both of which inspired Source Filmmaker animated video renditions.[5] Other fandoms used voices from 15.ai to produce viral videos. As of July 2022, the viral video Among Us Struggles (with voices from Friendship Is Magic) had over 5.5 million views on YouTube.[yt 4] YouTubers, TikTokers, and Twitch streamers also used 15.ai for their videos, such as FitMC's video on the history of 2b2t (one of the oldest running Minecraft servers) and datpon3's TikTok video featuring the main characters of Friendship Is Magic, which have 1.4 million and 510 thousand views, respectively.[yt 5][tt 1]
Some users created AI virtual assistants using 15.ai and external voice control software. One user on Twitter created a personal desktop assistant inspired by GLaDOS using 15.ai-generated dialogue in tandem with voice control system VoiceAttack.[5][13]
References
- Notes
- ^ a b c d e f Zwiezen, Zack (January 18, 2021). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Archived from the original on January 17, 2021. Retrieved January 18, 2021.
- ^ a b c d e Ruppert, Liana (January 18, 2021). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
- ^ a b c Clayton, Natalie (January 19, 2021). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. Archived from the original on January 19, 2021. Retrieved January 19, 2021.
- ^ a b Morton, Lauren (January 18, 2021). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
- ^ a b c d e f g h i j k l Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる". AUTOMATON. Archived from the original on January 19, 2021. Retrieved January 19, 2021.
- ^ a b c d "15.AI: Everything You Need to Know & Best Alternatives". ElevenLabs. February 7, 2024. Archived from the original on July 15, 2024. Retrieved November 18, 2024.
- ^ a b c d "Free 15.ai Character Voice Cloning and Alternatives". Resemble.ai. October 17, 2024. Retrieved November 18, 2024.
- ^ a b c d e f g h i j k "Everything You Need to Know About 15.ai: The AI Voice Generator". Play.ht. September 12, 2024. Retrieved November 18, 2024.
- ^ a b c "15.ai – Natural and Emotional Text-to-Speech Using Neural Networks". Hashdork. May 15, 2024. Archived from the original on July 4, 2024. Retrieved November 18, 2024.
- ^ "Demystifying 15.ai: How AI Generates Ultra-Realistic Text-to-Speech Voices". TheLinuxCode. December 27, 2023. Archived from the original on December 27, 2023. Retrieved November 18, 2024.
- ^ a b c Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". DeepLearning.AI. Archived from the original on August 7, 2020. Retrieved April 5, 2020.
- ^ Lopez, Ule (January 16, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Wccftech. Archived from the original on January 16, 2022. Retrieved June 7, 2022.
- ^ a b c Yoshiyuki, Furushima (January 18, 2021). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に". Denfaminicogamer. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
- ^ a b Williams, Demi (January 18, 2022). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. Archived from the original on January 18, 2022. Retrieved January 18, 2022.
- ^ Wright, Steve (January 17, 2022). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on January 17, 2022. Retrieved January 17, 2022.
- ^ a b c d Villalobos, José (January 18, 2021). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras". LaPS4. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
- ^ a b Moto, Eugenio (January 20, 2021). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras". Yahoo! Finance. Archived from the original on March 8, 2022. Retrieved January 20, 2021.
- ^ Felbo, Bjarke (2017). "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm". Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 1615–1625. arXiv:1708.00524. doi:10.18653/v1/D17-1169. S2CID 2493033.
- ^ Corfield, Gareth (August 7, 2017). "A sarcasm detector bot? That sounds absolutely brilliant. Definitely". The Register. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
- ^ "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter". MIT Technology Review. August 3, 2017. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
- ^ "Emojis help software spot emotion and sarcasm". BBC. August 7, 2017. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
- ^ Lowe, Josh (August 7, 2017). "Emoji-Filled Mean Tweets Help Scientists Create Sarcasm-Detecting Bot That Could Uncover Hate Speech". Newsweek. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
- ^ Valle, Rafael (2020). "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens". arXiv:1910.11997 [eess].
- ^ Cooper, Erica (2020). "Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings". arXiv:1910.10838 [eess].
- ^ a b van den Oord, Aäron; Li, Yazhe; Babuschkin, Igor (November 12, 2017). "High-fidelity speech synthesis with WaveNet". DeepMind. Archived from the original on June 18, 2022. Retrieved June 5, 2022.
- ^ Hsu, Wei-Ning (2018). "Hierarchical Generative Modeling for Controllable Speech Synthesis". arXiv:1810.07217 [cs.CL].
- ^ Habib, Raza (2019). "Semi-Supervised Generative Modeling for Controllable Speech Synthesis". arXiv:1910.01709 [cs.CL].
- ^ "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". August 30, 2018. Archived from the original on November 11, 2020. Retrieved June 5, 2022.
- ^ Shen, Jonathan; Pang, Ruoming; Weiss, Ron J.; Schuster, Mike; Jaitly, Navdeep; Yang, Zongheng; Chen, Zhifeng; Zhang, Yu; Wang, Yuxuan; Skerry-Ryan, RJ; Saurous, Rif A.; Agiomyrgiannakis, Yannis; Wu, Yonghui (2018). "Natural TTS Synthesis by Conditioning WaveNet on Mel-Spectrogram Predictions". arXiv:1712.05884 [cs.CL].
- ^ Chung, Yu-An (2018). "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis". arXiv:1808.10128 [cs.CL].
- ^ Ren, Yi (2019). "Almost Unsupervised Text to Speech and Automatic Speech Recognition". arXiv:1905.06791 [cs.CL].
- ^ Phillips, Tom (January 17, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on January 17, 2022. Retrieved January 17, 2022.
- ^ Cowen, Tyler (May 12, 2022). "The most underrated talent in AI?". Marginal Revolution (blog). Archived from the original on June 19, 2022. Retrieved November 27, 2024.
- ^ Button, Chris (January 19, 2021). "Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool". Byteside. Archived from the original on June 25, 2024. Retrieved November 18, 2024.
- ^ Jia, Ye (2019). "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis". arXiv:1806.04558.
- ^ Branwen, Gwern (March 6, 2020). ""15.ai", 15, Pony Preservation Project". Gwern.net. Gwern. Archived from the original on March 18, 2022. Retrieved June 17, 2022.
- ^ Scotellaro, Shaun (March 14, 2020). "Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices". Equestria Daily. Archived from the original on June 23, 2021. Retrieved June 11, 2022.
- ^ "Pony Preservation Project (Thread 108)". 4chan. Desuarchive. February 20, 2022. Retrieved February 20, 2022.
- ^ Scotellaro, Shaun (May 15, 2022). "Full Simple Animated Episode – The Tax Breaks (Twilight)". Equestria Daily. Archived from the original on May 21, 2022. Retrieved May 28, 2022.
- ^ "The Terribly Taxing Tribulations of Twilight Sparkle". Fimfiction.net. April 27, 2014. Archived from the original on June 30, 2022. Retrieved April 28, 2022.
- YouTube (referenced for view counts and usage of 15.ai only)
- ^ "SPY IS A FURRY". YouTube. January 17, 2021. Archived from the original on June 13, 2022. Retrieved June 14, 2022.
- ^ "Spy is a Furry Animated". YouTube. Archived from the original on June 14, 2022. Retrieved June 14, 2022.
- ^ "[SFM] – Spy's Confession – [TF2 15.ai]". YouTube. January 15, 2021. Archived from the original on June 30, 2022. Retrieved June 14, 2022.
- ^ "Among Us Struggles". YouTube. September 21, 2020. Retrieved July 15, 2022.
- ^ "The UPDATED 2b2t Timeline (2010–2020)". YouTube. March 14, 2020. Archived from the original on June 1, 2022. Retrieved June 14, 2022.
- TikTok
- ^ "She said " 👹 "". TikTok. Archived from the original on February 21, 2022. Retrieved July 15, 2022.