15.ai: Difference between revisions

From Wikipedia, the free encyclopedia
| launch_date = '''Initial release''': {{start date and age|2020|03|12}}<br/>'''Last stable release''': v24.2.1
| type = [[Artificial intelligence]], [[speech synthesis]], [[machine learning]], [[deep learning]]
| website = {{URL|https://15.ai}}
| language = English
}}
{{Artificial intelligence}}
'''15.ai''' was a free-to-use [[artificial intelligence]] [[web application]] that generated [[text-to-speech]] voices of fictional characters from various media sources.<ref name="kotaku">{{cite web
|url= https://kotaku.com/this-website-lets-you-make-glados-say-whatever-you-want-1846062835
|title= Website Lets You Make GLaDOS Say Whatever You Want
|archive-url= https://web.archive.org/web/20210118213308/https://www.rockpapershotgun.com/2021/01/18/put-words-in-game-characters-mouths-with-this-fascinating-text-to-speech-tool/
|url-status= live
}}</ref> Created by a [[pseudonym]]ous developer under the alias '''15''',<ref name="automaton">{{cite web
|url= https://automaton-media.com/articles/newsjp/20210119-149494/
|title= ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる
|last= Kurosawa
|first= Yuki
|date= 2021-01-19
|website= [[:ja:AUTOMATON|AUTOMATON]]
|access-date= 2021-01-19
|quote=
|archive-date= 2021-01-19
|archive-url= https://web.archive.org/web/20210119103031/https://automaton-media.com/articles/newsjp/20210119-149494/
|url-status= live
}}</ref><ref name="elevenlabs">{{cite web
|url=https://elevenlabs.io/blog/15-ai
|title=15.AI: Everything You Need to Know & Best Alternatives
|date=2024-09-12
|access-date=2024-11-18
}}</ref> the project used a combination of [[audio synthesis]] algorithms, [[speech synthesis]] [[deep neural networks]], and [[sentiment analysis]] models to generate emotive character voices.<ref name="hashdork">{{cite web
|url=https://hashdork.com/15-ai/
|title=15.ai – Natural and Emotional Text-to-Speech Using Neural Networks

In early 2020, 15.ai appeared online as a [[proof of concept]] of the [[democratization of technology|democratization]] of [[voice acting]] and [[dubbing]].<ref name="play.ht"/><ref name="thebatch">
{{cite web |last=Ng |first=Andrew |date=2020-04-01 |title=Voice Cloning for the Masses |url=https://blog.deeplearning.ai/blog/the-batch-ai-against-coronavirus-datasets-voice-cloning-for-the-masses-finding-unexploded-bombs-seeing-see-through-objects-optimizing-training-parameters |url-status=dead |archive-url=https://web.archive.org/web/20200807111844/https://blog.deeplearning.ai/blog/the-batch-ai-against-coronavirus-datasets-voice-cloning-for-the-masses-finding-unexploded-bombs-seeing-see-through-objects-optimizing-training-parameters |archive-date=2020-08-07 |access-date=2020-04-05 |website=[[DeepLearning.AI]] |quote=}}
</ref> Its gratis nature, ease of use without [[user account|user accounts]], and improvements over existing text-to-speech implementations made it popular.<ref name="gameinformer"/><ref name="kotaku" /><ref name="pcgamer" /> Some critics and [[voice actor]]s questioned the [[15ai#Copyrighted material in deep learning|legality]] and [[Ethics of artificial intelligence|ethicality]] of making such technology so readily accessible.<ref name="wccftech">{{cite web |last=Lopez |first=Ule |date=2022-01-16 |title=Troy Baker-backed NFT firm admits using voice lines taken from another service without permission |url=https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ |url-status=live |archive-url=https://web.archive.org/web/20220116194519/https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/ |archive-date=2022-01-16 |access-date=2022-06-07 |website=Wccftech}}</ref>

The site was embraced by Internet [[fandom]]s such as [[My Little Pony: Friendship Is Magic fandom|''My Little Pony'']], ''[[Team Fortress 2]]'', and ''[[SpongeBob SquarePants]]''.<ref name="automaton"/><ref name="Denfaminicogamer">{{cite web
|url= https://news.denfaminicogamer.jp/news/210118f
|title= 『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に
|last= Furushima
|first= Yoshiyuki
|date= 2021-01-18
|website= Denfaminicogamer
|access-date= 2021-01-18
|quote=
|archive-date= 2021-01-18
|archive-url= https://web.archive.org/web/20210118051321/https://news.denfaminicogamer.jp/news/210118f
|url-status= live
}}</ref><ref name="play.ht"/>


Several commercial alternatives appeared in the following years.<ref name="elevenlabs"/><ref name="resemble"/> In January 2022, the company Voiceverse NFT [[plagiarism|plagiarized]] 15.ai's work as part of their platform.<ref name="nme">{{cite web
}}</ref>

In September 2022, a year after its last stable release, 15.ai was taken offline.<ref name="elevenlabs"/> As of November 2024, the website was still offline, with the creator's most recent post being dated February 2023.<ref>{{Cite tweet |number=1628834708653068290 |user=fifteenai |title=If all goes well, the next update should be the culmination of a year and a half of nonstop work put into a huge number of fixes and major improvements to the algorithm. Just give me a bit more time – it should be worth it.}}</ref>

== Features ==
The platform required no [[user registration]] or [[user (computing)|account creation]] to generate voices.<ref name="LaPS4"/><ref name="yahoofin"/><ref name="resemble"/><ref name="play.ht"/> Users could generate speech by entering text and selecting a character voice (optionally specifying an emotional contextualizer and/or phonetic transcriptions), with the system producing three variations of the audio with different emotional deliveries.<ref name="hashdork"/> The platform operated completely [[freeware|free of charge]], though the developer reported spending thousands of dollars monthly to maintain the service.<ref name="play.ht"/>
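The interaction described above (text plus a chosen character, an optional emotional contextualizer, and three candidate deliveries per request) can be sketched as follows. This is a hypothetical illustration only: 15.ai never published an API, so every name here (`generate_speech`, `Take`, the emotion labels) is an assumption made for clarity, not the site's actual interface.

```python
# Hypothetical sketch of the generation flow described above.
# All identifiers are illustrative; 15.ai's real interface was an
# undocumented web form, not a public API.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Take:
    """One candidate audio delivery of a line."""
    character: str
    text: str
    emotion: str  # the emotional delivery used for this take

def generate_speech(text: str, character: str,
                    contextualizer: Optional[str] = None) -> List[Take]:
    """Return three candidate deliveries, mirroring the documented
    behaviour of three variations per request."""
    # If the user supplies an emotional contextualizer, every take uses it;
    # otherwise the deliveries vary across takes.
    emotions = ([contextualizer] * 3 if contextualizer
                else ["neutral", "joy", "anger"])
    return [Take(character, text, e) for e in emotions]

takes = generate_speech("The cake is a lie.", "GLaDOS")
print(len(takes))  # 3
```

The three-take design lets a user pick the delivery that best fits their intent rather than forcing a single emotional reading of the input text.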


Available characters included [[GLaDOS]] and [[Wheatley (Portal)|Wheatley]] from ''[[Portal (series)|Portal]]'', characters from ''[[Team Fortress 2]]'', [[Twilight Sparkle]] and other [[List of My Little Pony: Friendship Is Magic characters|characters]] from ''[[My Little Pony: Friendship Is Magic]]'', [[SpongeBob SquarePants (character)|SpongeBob]], [[Daria Morgendorffer]] and [[Jane Lane (Daria)|Jane Lane]] from ''[[Daria]]'', the [[Tenth Doctor|Tenth Doctor Who]], [[HAL 9000]] from ''[[2001: A Space Odyssey (film)|2001: A Space Odyssey]]'', the Narrator from ''[[The Stanley Parable]]'', [[Carl Brutananadilewski]] from ''[[Aqua Teen Hunger Force]]'', [[Steven Universe (character)|Steven Universe]], Dan from ''[[Dan Vs.]]'', and [[Sans (Undertale)|Sans]] from ''[[Undertale]]''.<ref name="LaPS4">{{cite web
|url= https://www.laps4.com/noticias/descubre-15-ai-un-sitio-web-en-el-que-podras-hacer-que-glados-diga-lo-que-quieras/
|title= Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras
In 2016, with the proposal of [[DeepMind]]'s [[WaveNet]], deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating high-fidelity human-like speech.<ref name="arxiv1">{{cite arXiv |last=Hsu |first=Wei-Ning |eprint=1810.07217 |title=Hierarchical Generative Modeling for Controllable Speech Synthesis |class=cs.CL |date=2018 }}</ref><ref name="arxiv2">{{cite arXiv |last=Habib |first=Raza |eprint=1910.01709 |title=Semi-Supervised Generative Modeling for Controllable Speech Synthesis |class=cs.CL |date=2019 }}</ref><ref name="deepmind">{{cite web|url=https://www.deepmind.com/blog/high-fidelity-speech-synthesis-with-wavenet|title=High-fidelity speech synthesis with WaveNet|last1=van den Oord|first1=Aäron|last2=Li|first2=Yazhe|last3=Babuschkin|first3=Igor|date=2017-11-12|website=[[DeepMind]]|access-date=2022-06-05|archive-date=2022-06-18|archive-url=https://web.archive.org/web/20220618205838/https://www.deepmind.com/blog/high-fidelity-speech-synthesis-with-wavenet|url-status=live}}</ref> Tacotron2, a neural network architecture for speech synthesis developed by [[Google AI]], was published in 2018 and required tens of hours of audio data to produce intelligible speech; when trained on 2 hours of speech, the model was able to produce intelligible speech with mediocre quality, and when trained on 36 minutes of speech, the model was unable to produce intelligible speech.<ref name="tacotron">{{cite web|url=https://google.github.io/tacotron/publications/semisupervised/index.html|title=Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"|date=2018-08-30|access-date=2022-06-05|archive-date=2020-11-11|archive-url=https://web.archive.org/web/20201111222714/https://google.github.io/tacotron/publications/semisupervised/index.html|url-status=live}}</ref><ref name="arxiv3">{{cite arXiv |eprint=1712.05884 |title=Natural TTS Synthesis by Conditioning WaveNet on 
Mel-Spectrogram Predictions |class=cs.CL |date=2018 |last1=Shen |first1=Jonathan |last2=Pang |first2=Ruoming |last3=Weiss |first3=Ron J. |last4=Schuster |first4=Mike |last5=Jaitly |first5=Navdeep |last6=Yang |first6=Zongheng |last7=Chen |first7=Zhifeng |last8=Zhang |first8=Yu |last9=Wang |first9=Yuxuan |last10=Skerry-Ryan |first10=RJ |last11=Saurous |first11=Rif A. |last12=Agiomyrgiannakis |first12=Yannis |last13=Wu |first13=Yonghui }}</ref>


For years, reducing the amount of data required to train a realistic high-quality text-to-speech model has been a primary goal of scientific researchers in the field of deep learning speech synthesis.<ref>{{cite arXiv |last=Chung |first=Yu-An |eprint=1808.10128 |title=Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis |class=cs.CL |date=2018 }}</ref><ref>{{cite arXiv |last=Ren |first=Yi |eprint=1905.06791 |title=Almost Unsupervised Text to Speech and Automatic Speech Recognition |class=cs.CL |date=2019 }}</ref> The developer of 15.ai claims that as little as 15 seconds of data is sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.<ref name="eurogamer">{{cite web |last=Phillips |first=Tom |date=2022-01-17 |title=Troy Baker-backed NFT firm admits using voice lines taken from another service without permission |url=https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission |url-status=live |archive-url=https://web.archive.org/web/20220117164033/https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission |archive-date=2022-01-17 |access-date=2022-01-17 |website=[[Eurogamer]] |quote=}}</ref>
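The scale of the claimed reduction can be made concrete with simple arithmetic. Taking 10 hours as an illustrative low end of the "tens of hours" required by Tacotron-style models (the exact figure is an assumption here) against the developer's claimed 15 seconds:

```python
# Rough arithmetic on the training-data figures cited above.
# tacotron_hours = 10 is an assumed low end of "tens of hours";
# claimed_seconds = 15 is the 15.ai developer's claim.
tacotron_hours = 10
tacotron_seconds = tacotron_hours * 3600  # 36,000 seconds of audio
claimed_seconds = 15

reduction = tacotron_seconds / claimed_seconds
print(f"{reduction:.0f}x less audio")  # 2400x less audio
```

Even at the conservative 10-hour baseline, the claim amounts to a roughly three-orders-of-magnitude reduction in required training audio per voice.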

=== Copyrighted material in deep learning ===
{{Main|Artificial intelligence and copyright}}
[[Authors Guild, Inc. v. Google, Inc.|A landmark case]] between [[Google]] and the [[Authors Guild]], decided in 2013 and affirmed on appeal in 2015, ruled that [[Google Books]]—a service that searches the full text of printed copyrighted books—was [[Transformative use#Second Circuit—Authors Guild|transformative]], thus meeting all requirements for fair use.<ref>- F.2d – (2d Cir. 2015) (temporary cites: 2015 U.S. App. LEXIS 17988; [https://salsa3.salsalabs.com/o/50260/images/AGvGoogle.pdf Slip opinion]{{Dead link|date=September 2024 |bot=InternetArchiveBot |fix-attempted=yes }}, October 16, 2015)</ref> This case set an important legal precedent for the field of deep learning and artificial intelligence: using copyrighted material to train a [[discriminative model]] or a ''non-commercial'' [[generative model]] was deemed legal. The legality of ''commercial'' generative models trained using copyrighted material is still under debate; due to the black-box nature of machine learning models, any allegations of copyright infringement via direct competition would be difficult to prove.<ref>{{cite journal |last1=Li |first1=Y. |last2=Li |first2=J. |title=Does Black-Box Machine Learning Shift the US Fair Use Doctrine? |year=2021 |journal=SSRN |url=https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998805 |doi=10.2139/ssrn.3998805 |doi-broken-date=November 29, 2024 |ssrn=3998805 |access-date=November 18, 2024 |archive-date=January 25, 2022 |archive-url=https://web.archive.org/web/20220125142022/https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3998805 |url-status=live }}</ref>


== Development ==
|date= 2022-05-12
|website= [[Marginal Revolution (blog)]]
|access-date= 2024-11-27
|url-status= live
}}</ref>


Developing and running 15.ai cost several thousand dollars per month, initially funded by the developer's personal finances after a successful [[startup]] exit.<ref name="play.ht"/> The algorithm used by the project was dubbed '''DeepThroat'''.<ref name="play.ht"/> The project and algorithm were conceived as part of MIT's [[Undergraduate Research Opportunities Program]], and had been in development since 2018.<ref name="thebatch"/><ref name="automaton"/><ref>{{cite web
|url=https://www.byteside.com/2021/01/15-ai-deepmoji-glados-spongebob-characters-ai-text-to-speech/
|title=Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool
|quote= }}</ref> The ''Friendship Is Magic'' voices on 15.ai were trained on a large dataset [[crowdsource]]d by the project: audio and dialogue from the show and related media<ref name="play.ht"/>—including [[List of My Little Pony: Friendship Is Magic episodes|all nine seasons of ''Friendship Is Magic'']], [[My Little Pony: The Movie (2017 film)|the 2017 movie]], [[Pony Life|spinoffs]], [[data breach|leaks]], and various other content voiced by the same voice actors—were [[audio signal processing|parsed]], [[transcription (linguistics)|hand-transcribed]], and [[noise reduction|processed]] to remove background noise.


The first public release of 15.ai was unveiled in March 2020, with the service experiencing intermittent availability as the developer conducted ongoing [[research and development]] work.{{citation needed|date=December 2024}} The tool gained heavy attention in [[mainstream media]] in early 2021, with multiple gaming news outlets covering its capabilities.<ref name="pcgamer"/><ref name="kotaku"/><ref name="gameinformer"/> 15.ai saw further attention in 2022 when it was discovered that the company Voiceverse NFT had used outputs from the tool without permission.<ref name="nme"/>


== Reception ==
15.ai was met with a largely positive reception from users and [[mainstream media]]. Liana Ruppert of ''[[Game Informer]]'' described it as "simplistically brilliant"<ref name="gameinformer"/> and José Villalobos of ''[[:es:LaPS4|LaPS4]]'' wrote that it "works as easy as it looks."<ref name="LaPS4"/>{{efn|Translated from original quote written in Spanish: ''"La dirección es 15.AI y funciona tan fácil como parece."''<ref name="LaPS4"/>}} Lauren Morton of ''[[Rock, Paper, Shotgun]]'' called the tool "fascinating,"<ref name="rockpapershotgun"/> and Yuki Kurosawa of ''[[:ja:AUTOMATON|AUTOMATON]]'' deemed it "revolutionary."<ref name="automaton"/>{{efn|Translated from original quote written in Japanese: ''"しかし15.aiが画期的なのは「データが30秒しかない文字でも、ほぼ100%の発音精度を達成できること」そして「ごくわずかなデータのみを使って、自然な感情のこもった音声を数百以上生成できること」だという。"''<ref name="automaton"/>}} Users praised the ability to easily create audio of popular characters that sounded believable to listeners unaware the voices had been synthesized. Zack Zwiezen of ''[[Kotaku]]'' reported that "[his] girlfriend was convinced it was a new voice line from [[GLaDOS]]' voice actor, [[Ellen McLain]]".<ref name="kotaku"/> Natalie Clayton of ''[[PC Gamer]]'' wrote that "[[SpongeBob SquarePants]]' shrill, nasally voice works shockingly well".<ref name="pcgamer"/>


The website's impact extended beyond English-speaking media. Yoshiyuki Furushima of ''[[Den Fami Nico Gamer]]'' wrote that "it's amazing that [character lines and skits] are all synthetically generated", and Eugenio Moto of ''[[Yahoo! Finance]]'' reported that "while the results are already exceptional, they can certainly get better."


== In popular culture ==
|url-status= live
|archive-url= https://web.archive.org/web/20220521132423/https://www.equestriadaily.com/2022/05/full-simple-animated-episode-tax-breaks.html
}}</ref><ref>{{Cite web |date=27 April 2014 |title=The Terribly Taxing Tribulations of Twilight Sparkle |url=https://www.fimfiction.net/story/185725 |url-status=live |archive-url=https://web.archive.org/web/20220630170105/https://www.fimfiction.net/story/185725 |archive-date=30 June 2022 |access-date=28 April 2022 |website=Fimfiction.net}}</ref>


Viral videos from the ''Team Fortress 2'' fandom featuring voices from 15.ai include ''Spy is a [[furry fandom|Furry]]'' (which gained over 3 million views on YouTube across multiple videos<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=TAmhr6Was3E|title=SPY IS A FURRY|work=[[YouTube]]|date=January 17, 2021 |access-date=June 14, 2022|archive-date=June 13, 2022|archive-url=https://web.archive.org/web/20220613094918/https://www.youtube.com/watch?v=TAmhr6Was3E|url-status=live}}</ref><ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=lwQn7ISVV_8|title=Spy is a Furry Animated|work=[[YouTube]]|access-date=June 14, 2022|archive-date=June 14, 2022|archive-url=https://web.archive.org/web/20220614203255/https://www.youtube.com/watch?v=lwQn7ISVV_8|url-status=live}}</ref><ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=r0FLyW86owo|title=[SFM] – Spy's Confession – [TF2 15.ai]|work=[[YouTube]]|date=January 15, 2021 |access-date=June 14, 2022|archive-date=June 30, 2022|archive-url=https://web.archive.org/web/20220630170113/https://www.youtube.com/watch?v=r0FLyW86owo|url-status=live}}</ref>) and ''The RED Bread Bank'', both of which inspired [[Source Filmmaker]] animated video renditions.<ref name="automaton"/> Other fandoms used voices from 15.ai to produce viral videos. 
{{As of|July 2022}}, the viral video ''[[Among Us]] Struggles'' (with voices from ''Friendship Is Magic'') had over 5.5 million views on YouTube;<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=UPE3vnLY3TE|title=Among Us Struggles|work=[[YouTube]]|date=September 21, 2020 |access-date=July 15, 2022}}</ref> [[YouTubers]], [[TikTokers]], and [[Twitch (service)|Twitch]] streamers also used 15.ai for their videos, such as FitMC's video on the history of [[2b2t]]&mdash;one of the oldest running ''[[Minecraft]]'' servers&mdash;and datpon3's TikTok video featuring the main characters of ''Friendship Is Magic'', which have 1.4 million and 510 thousand views, respectively.<ref group="yt">{{cite web|url=https://www.youtube.com/watch?v=1V1O2gTdqHw|title=The UPDATED 2b2t Timeline (2010–2020)|work=[[YouTube]]|date=March 14, 2020 |access-date=June 14, 2022|archive-date=June 1, 2022|archive-url=https://web.archive.org/web/20220601085855/https://www.youtube.com/watch?v=1V1O2gTdqHw|url-status=live}}</ref><ref group="tt">{{cite web|url=https://www.tiktok.com/@datpon3/video/6813618431217241350|title=She said " 👹 "|work=[[TikTok]]|access-date=July 15, 2022|archive-date=February 21, 2022|archive-url=https://web.archive.org/web/20220221225053/https://www.tiktok.com/@datpon3/video/6813618431217241350|url-status=live}}</ref>


Some users created AI [[virtual assistant]]s using 15.ai and external voice control software. One user on Twitter created a personal desktop assistant inspired by [[GLaDOS]] using 15.ai-generated dialogue in tandem with voice control system VoiceAttack.<ref name="automaton"/><ref name="Denfaminicogamer"/>

=== Troy Baker / Voiceverse NFT plagiarism scandal ===
{{Main|Troy Baker#Partnership scandal}}
On January 14, 2022, it was discovered that Voiceverse NFT, a company with which video game and [[anime]] [[dubbing|dub]] [[voice actor]] [[Troy Baker]] had announced a partnership, had plagiarized voice lines generated from 15.ai as part of its marketing campaign.<ref name="nme"/><ref name="stevivor"/> [[logging (software)|Log files]] showed that Voiceverse had generated audio of characters from ''[[My Little Pony: Friendship Is Magic]]'' using 15.ai, then pitched the audio up to make it sound unlike the original voices in order to market its own platform—in violation of 15.ai's terms of service.<ref name="eurogamer">{{cite web
|url= https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission
|title= Troy Baker-backed NFT firm admits using voice lines taken from another service without permission
|last= Phillips
|first= Tom
|date= 2022-01-17
|website= [[Eurogamer]]
|access-date= 2022-01-17
|quote=
|archive-date= 2022-01-17
|archive-url= https://web.archive.org/web/20220117164033/https://www.eurogamer.net/articles/2022-01-17-troy-baker-backed-nft-firm-admits-using-voice-lines-taken-from-another-service-without-permission
|url-status= live
}}</ref><ref name="wccftech">{{cite web
|url= https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/
|title= Troy Baker-backed NFT firm admits using voice lines taken from another service without permission
|last= Lopez
|first= Ule
|date= 2022-01-16
|website= Wccftech
|access-date= 2022-06-07
|url-status= live
|archive-date= 2022-01-16
|archive-url= https://web.archive.org/web/20220116194519/https://wccftech.com/voiceverse-nft-service-uses-stolen-technology-from-15ai/
}}</ref> Voiceverse claimed that someone in their marketing team used the voice without properly crediting 15.ai, and in response, 15 tweeted "Go fuck yourself."<ref name="nme" /><ref name="stevivor"/><ref name="eurogamer"/><ref group="tweet">{{Cite tweet |user=fifteenai |number=1482088782765576192|date = January 14, 2022 |title=Go fuck yourself.}}</ref>

== Legacy ==
===Impact on voice cloning technology===
15.ai introduced several technical innovations in [[voice cloning]].<ref name="automaton"/> While traditional text-to-speech systems like [[Google]]'s Tacotron2 required tens of hours of audio data to produce intelligible speech in 2017,<ref name="tacotron"/><ref name="arxiv3"/> 15.ai claimed to achieve high-quality voice cloning with as little as 15 seconds of training data.<ref name="eurogamer"/><ref name="play.ht"/> This reduction in required training data represented a breakthrough in the field of speech synthesis.<ref name="hashdork"/><ref name="play.ht"/>

The project also introduced the concept of "emotional contextualizers" for controlling speech emotion through [[sentiment analysis]].<ref name="automaton"/><ref name="Denfaminicogamer"/><ref name="hashdork"/>

===Reactions from voice actors and pundits===
[[File:Andrew Ng at TechCrunch Disrupt SF 2017.jpg|thumb|[[Andrew Ng]] in 2017]]
Some voice actors have publicly decried the use of voice cloning technology. Cited reasons include concerns about [[copyright infringement]], [[right to privacy]], [[Deepfake#Concerns|impersonation and fraud]], unauthorized use of an actor's voice in [[Deepfake pornography|pornography]] or [[explicit content]], and the potential of [[technological unemployment|AI being used to make voice actors obsolete]].<ref name="elevenlabs"/><ref name="wccftech"/><ref name="play.ht"/><ref name="hashdork"/>

In his 2020 assessment of 15.ai in [[artificial intelligence]] [[newsletter]] ''[[DeepLearning.AI#The Batch|The Batch]]'', computer scientist [[Andrew Ng]] wrote:
{{Quote|"Voice cloning could be enormously productive. In [[Cinema of the United States|Hollywood]], it could revolutionize the use of virtual actors. In cartoons and audiobooks, it could enable voice actors to participate in many more productions. In online education, kids might pay more attention to lessons delivered by the voices of favorite personalities. And how many YouTube how-to video producers would love to have a synthetic [[Morgan Freeman]] narrate their scripts?"<ref name="thebatch"/>}}

However, he also wrote:

{{Quote|"...but synthesizing a human actor's voice without consent is arguably unethical and possibly illegal. And this technology will be catnip for deepfakers, who could scrape recordings from [[social networking service|social network]]s to impersonate private individuals."<ref name="thebatch"/>}}


== See also ==
;Notes
{{reflist}}
;Tweets
{{reflist|group=tweet|35em}}
;YouTube (referenced for view counts and usage of 15.ai only)
{{reflist|group=yt|35em}}


==External links==
* [https://ghostarchive.org/archive/iA306 Archived frontend]
* {{Official website|15.ai}}
* {{Twitter | id= fifteenai | name= 15 }}
* [https://www.youtube.com/watch?v=QLGlrY7cooY ''The Tax Breaks (Twilight) (15.ai)'']


{{Differentiable computing}}

Latest revision as of 08:47, 17 December 2024


15.ai was a free-to-use artificial intelligence web application that generated text-to-speech voices from fictional characters from various media sources.[1][2][3][4] Created by a pseudonymous developer under the alias 15,[5][6][7][8] the project used a combination of audio synthesis algorithms, speech synthesis deep neural networks, and sentiment analysis models to generate emotive character voices.[9][10]

In early 2020, 15.ai appeared online as a proof of concept of the democratization of voice acting and dubbing.[8][11] Its gratis nature, ease of use without user accounts, and improvements over existing text-to-speech implementations made it popular.[2][1][3] Some critics and voice actors questioned the legality and ethicality of making such technology so readily accessible.[12]

The site was embraced by Internet fandoms such as My Little Pony, Team Fortress 2, and SpongeBob SquarePants.[5][13][8]

Several commercial alternatives appeared in the following years.[6][7] In January 2022, the company Voiceverse NFT plagiarized 15.ai's work as part of its platform.[14][15]

The ethical implications of voice cloning (also known as audio deepfakes) in content creation led to a re-evaluation of the service by the developer, with concerns being raised regarding copyright and the unauthorized use of character voices.[8] In September 2022, a year after its last stable release, 15.ai was taken offline.[6]

Features

HAL 9000, known for his sinister robotic voice, was one of the available characters on 15.ai.[1]

The platform required no user registration or account creation to generate voices.[16][17][7][8] Users could generate speech by entering text and selecting a character voice (optionally specifying an emotional contextualizer and/or phonetic transcriptions), with the system producing three variations of the audio with different emotional deliveries.[9] The platform operated completely free of charge, though the developer reported spending thousands of dollars monthly to maintain the service.[8]
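The workflow described above can be pictured as a small request-and-response loop (a hypothetical sketch only: 15.ai's interface and internals were never published as code, and every name and field below is an assumption):

```python
import random

def generate_takes(text, character, contextualizer=None, n_takes=3, seed=None):
    """Hypothetical sketch of the 15.ai workflow: one request (text, a
    character, an optional emotional contextualizer) yields several
    takes whose deliveries differ, modeled here as a per-take seed."""
    rng = random.Random(seed)
    takes = []
    for _ in range(n_takes):
        takes.append({
            "character": character,
            "text": text,
            "emotion_hint": contextualizer,   # optional guide for delivery
            "intonation_seed": rng.random(),  # each take varies slightly
        })
    return takes
```

In this toy model the three takes share the same text and character but carry different seeds, mirroring how the real service returned three variations with different emotional deliveries.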

Available characters included GLaDOS and Wheatley from Portal, characters from Team Fortress 2, Twilight Sparkle and other characters from My Little Pony: Friendship Is Magic, SpongeBob, Daria Morgendorffer and Jane Lane from Daria, the Tenth Doctor from Doctor Who, HAL 9000 from 2001: A Space Odyssey, the Narrator from The Stanley Parable, Carl Brutananadilewski from Aqua Teen Hunger Force, Steven Universe, Dan from Dan Vs., and Sans from Undertale.[16][17]

The nondeterministic nature of the deep learning model ensured that each generation would have slightly different intonations, similar to multiple takes from a voice actor.[9][5] The application supported manually altering the emotion of a generated line using emotional contextualizers (a term coined by this project), a sentence or phrase conveying the emotion of the take that serves as a guide for the model during inference.[5][13] Emotional contextualizers were representations of the emotional content of a sentence deduced via transfer learned emoji embeddings using DeepMoji, a deep neural network sentiment analysis algorithm developed by the MIT Media Lab in 2017.[18][19] DeepMoji was trained on 1.2 billion emoji occurrences in Twitter data from 2013 to 2017, and outperformed human subjects in correctly identifying sarcasm in Tweets and other online modes of communication.[20][21][22]
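The conditioning mechanism above can be sketched in a few lines (a toy stand-in: the hash-based "embedding" below is not DeepMoji, and all shapes are assumptions). A sentiment model maps the contextualizer text to a fixed-size emotion vector, which is broadcast across time and concatenated onto the TTS encoder's output frames:

```python
import hashlib
import numpy as np

def sentiment_embedding(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a DeepMoji-style encoder: deterministically maps
    text to a fixed-size emotion vector (NOT the real model)."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)

def condition_on_emotion(encoder_frames: np.ndarray, emotion: np.ndarray) -> np.ndarray:
    """Concatenate the emotion vector onto every time step of the
    encoder output - the usual way a global conditioning vector
    enters a sequence-to-sequence TTS model."""
    t = encoder_frames.shape[0]
    return np.concatenate([encoder_frames, np.tile(emotion, (t, 1))], axis=1)
```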

15.ai used a multi-speaker model—hundreds of voices were trained concurrently rather than sequentially, decreasing the required training time and enabling the model to learn and generalize shared emotional context, even for voices with no exposure to that context.[23] Consequently, the characters in the application were powered by a single trained model, as opposed to multiple single-speaker models.[24] The lexicon used by 15.ai was scraped from a variety of Internet sources, including Oxford Dictionaries, Wiktionary, the CMU Pronouncing Dictionary, 4chan, Reddit, and Twitter. Pronunciations of unfamiliar words were automatically deduced using phonological rules learned by the deep learning model.[5]
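The multi-speaker setup can be pictured as one shared network plus a per-character embedding table (a schematic toy, not 15.ai's architecture; real multi-speaker systems such as Mellotron condition far more elaborately):

```python
import numpy as np

class SharedVoiceModel:
    """Toy multi-speaker model: all characters share ONE set of weights;
    only a small per-speaker embedding differs, so structure learned
    from one voice (e.g. an emotional pattern) is available to all."""
    def __init__(self, speakers, emb_dim=4, out_dim=2, seed=0):
        rng = np.random.default_rng(seed)
        self.embeddings = {s: rng.standard_normal(emb_dim) for s in speakers}
        self.shared = rng.standard_normal((emb_dim, out_dim))  # one model, not one per voice

    def synthesize(self, speaker, n_frames=4):
        # every speaker passes through the same shared weights
        return np.tile(self.embeddings[speaker] @ self.shared, (n_frames, 1))
```

Because the weights are shared, adding a new voice only adds one embedding row rather than a whole new model, which is the training-time saving the paragraph describes.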

The application supported a simplified phonetic transcription known as ARPABET to correct mispronunciations and account for heteronyms—words that are spelled the same but are pronounced differently (such as the word read, which can be pronounced as either /ˈrɛd/ or /ˈriːd/ depending on its tense). It followed the CMU Pronouncing Dictionary's ARPABET conventions.[5]
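The heteronym problem is easy to see with CMU-dictionary-style phone sets for read (the lookup below is illustrative only; 15.ai's exact input syntax is not documented here):

```python
# CMU Pronouncing Dictionary-style ARPABET entries for the heteronym
# "read"; the digit marks lexical stress (1 = primary stress).
ARPABET = {
    ("read", "present"): ["R", "IY1", "D"],  # /ˈriːd/
    ("read", "past"):    ["R", "EH1", "D"],  # /ˈrɛd/
}

def transcribe(word: str, sense: str) -> str:
    """Return the space-separated ARPABET string a user could supply to
    force one pronunciation over the other."""
    return " ".join(ARPABET[(word, sense)])
```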

Background

Artificial intelligence in speech synthesis

A stack of dilated causal convolutional layers used in DeepMind's WaveNet.[25]

In 2016, with the proposal of DeepMind's WaveNet, deep-learning-based models for speech synthesis began to gain popularity as a method of modeling waveforms and generating high-fidelity human-like speech.[26][27][25] Tacotron2, a neural network architecture for speech synthesis developed by Google AI, was published in 2018 and required tens of hours of audio data to produce intelligible speech; when trained on 2 hours of speech, the model was able to produce intelligible speech with mediocre quality, and when trained on 36 minutes of speech, the model was unable to produce intelligible speech.[28][29]

For years, reducing the amount of data required to train a realistic high-quality text-to-speech model had been a primary goal of researchers in the field of deep learning speech synthesis.[30][31] The developer of 15.ai claimed that as little as 15 seconds of data was sufficient to clone a voice up to human standards, a significant reduction in the amount of data required.[32]

Development

15.ai was designed and created by an anonymous research scientist known by the alias 15.[5][6][7] In his blog Marginal Revolution, economist Tyler Cowen cited the developer of 15.ai as an example of underrated talent in AI.[33]

Developing and running 15.ai cost several thousand dollars per month, initially funded by the developer's personal finances after a successful startup exit.[8] The algorithm used by the project was dubbed DeepThroat.[8] The project and algorithm were conceived as part of MIT's Undergraduate Research Opportunities Program, and had been in development since 2018.[11][5][34] The model used by 15.ai was inspired by a 2019 paper that introduced transfer learning to text-to-speech models.[11][35]

The Pony Preservation Project from 4chan's /mlp/ board has been integral to the development of 15.ai.[36]

The developer also worked closely with the Pony Preservation Project from /mlp/, the My Little Pony board of 4chan.[8] This project was a "collaborative effort by /mlp/ to build and curate pony datasets" with the aim of creating applications in artificial intelligence.[37][38] The Friendship Is Magic voices on 15.ai were trained on a large dataset crowdsourced by the project: audio and dialogue from the show and related media[8]—including all nine seasons of Friendship Is Magic, the 2017 movie, spinoffs, leaks, and various other content voiced by the same voice actors—were parsed, hand-transcribed, and processed to remove background noise.

The first public release of 15.ai was unveiled in March 2020, with the service experiencing intermittent availability as the developer conducted ongoing research and development work.[citation needed] The tool gained heavy attention in mainstream media in early 2021, with multiple gaming news outlets covering its capabilities.[3][1][2] 15.ai saw further attention in 2022 when it was discovered that Voiceverse NFT had used outputs from the tool.[14]

Reception

15.ai was met with a largely positive reception from users and mainstream media. Liana Ruppert of Game Informer described it as "simplistically brilliant"[2] and José Villalobos of LaPS4 wrote that it "works as easy as it looks."[16][a] Lauren Morton of Rock, Paper, Shotgun called the tool "fascinating,"[4] and Yuki Kurosawa of AUTOMATON deemed it "revolutionary."[5][b] Users praised the ability to easily create audio of popular characters that sound believable to those unaware they had been synthesized. Zack Zwiezen of Kotaku reported that "[his] girlfriend was convinced it was a new voice line from GLaDOS' voice actor, Ellen McLain".[1] Natalie Clayton of PC Gamer wrote that "SpongeBob SquarePants' shrill, nasally voice works shockingly well".

The website's impact extended beyond English-speaking media. Yoshiyuki Furushima of Den Fami Nico Gamer wrote that "it's amazing that [character lines and skits] are all synthetically generated", and Eugenio Moto of Yahoo! Finance reported that "while the results are already exceptional, they can certainly get better."

Fandom content creation

15.ai was frequently used for content creation in various fandoms, including the My Little Pony: Friendship Is Magic fandom, the Team Fortress 2 fandom, the Portal fandom, and the SpongeBob SquarePants fandom, with numerous videos and projects containing speech from 15.ai having gone viral.[1][2] The platform is credited as the impetus behind the popularization of AI voice cloning in content creation, demonstrating the potential for accessible, high-quality voice synthesis technology.[8]

The My Little Pony: Friendship Is Magic fandom saw a resurgence in video and musical content creation as a result, inspiring a new genre of fan-created content assisted by artificial intelligence. Some fanfictions were adapted into fully voiced "episodes": The Tax Breaks is a 17-minute-long animated video rendition of a fan-written story published in 2014 that uses voices generated from 15.ai with sound effects and audio editing, emulating the episodic style of the early seasons of Friendship Is Magic.[39][40]

Viral videos from the Team Fortress 2 fandom featuring voices from 15.ai include Spy is a Furry (which gained over 3 million views on YouTube across multiple videos[yt 1][yt 2][yt 3]) and The RED Bread Bank, both of which inspired Source Filmmaker animated video renditions.[5] Other fandoms used voices from 15.ai to produce viral videos. As of July 2022, the viral video Among Us Struggles (with voices from Friendship Is Magic) had over 5.5 million views on YouTube;[yt 4] YouTubers, TikTokers, and Twitch streamers also used 15.ai for their videos, such as FitMC's video on the history of 2b2t—one of the oldest running Minecraft servers—and datpon3's TikTok video featuring the main characters of Friendship Is Magic, which have 1.4 million and 510 thousand views, respectively.[yt 5][tt 1]

Some users created AI virtual assistants using 15.ai and external voice control software. One user on Twitter created a personal desktop assistant inspired by GLaDOS using 15.ai-generated dialogue in tandem with voice control system VoiceAttack.[5][13]

See also

Notes

  1. ^ Translated from original quote written in Spanish: "La dirección es 15.AI y funciona tan fácil como parece."[16]
  2. ^ Translated from original quote written in Japanese: "しかし15.aiが画期的なのは「データが30秒しかない文字でも、ほぼ100%の発音精度を達成できること」そして「ごくわずかなデータのみを使って、自然な感情のこもった音声を数百以上生成できること」だという。"[5]

References

Notes
  1. ^ a b c d e f Zwiezen, Zack (January 18, 2021). "Website Lets You Make GLaDOS Say Whatever You Want". Kotaku. Archived from the original on January 17, 2021. Retrieved January 18, 2021.
  2. ^ a b c d e Ruppert, Liana (January 18, 2021). "Make Portal's GLaDOS And Other Beloved Characters Say The Weirdest Things With This App". Game Informer. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
  3. ^ a b c Clayton, Natalie (January 19, 2021). "Make the cast of TF2 recite old memes with this AI text-to-speech tool". PC Gamer. Archived from the original on January 19, 2021. Retrieved January 19, 2021.
  4. ^ a b Morton, Lauren (January 18, 2021). "Put words in game characters' mouths with this fascinating text to speech tool". Rock, Paper, Shotgun. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
  5. ^ a b c d e f g h i j k l Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる". AUTOMATON. Archived from the original on January 19, 2021. Retrieved January 19, 2021.
  6. ^ a b c d "15.AI: Everything You Need to Know & Best Alternatives". ElevenLabs. February 7, 2024. Archived from the original on July 15, 2024. Retrieved November 18, 2024.
  7. ^ a b c d "Free 15.ai Character Voice Cloning and Alternatives". Resemble.ai. October 17, 2024. Retrieved November 18, 2024.
  8. ^ a b c d e f g h i j k "Everything You Need to Know About 15.ai: The AI Voice Generator". Play.ht. September 12, 2024. Retrieved November 18, 2024.
  9. ^ a b c "15.ai – Natural and Emotional Text-to-Speech Using Neural Networks". Hashdork. May 15, 2024. Archived from the original on July 4, 2024. Retrieved November 18, 2024.
  10. ^ "Demystifying 15.ai: How AI Generates Ultra-Realistic Text-to-Speech Voices". TheLinuxCode. December 27, 2023. Archived from the original on December 27, 2023. Retrieved November 18, 2024.
  11. ^ a b c Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". DeepLearning.AI. Archived from the original on August 7, 2020. Retrieved April 5, 2020.
  12. ^ Lopez, Ule (January 16, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Wccftech. Archived from the original on January 16, 2022. Retrieved June 7, 2022.
  13. ^ a b c Yoshiyuki, Furushima (January 18, 2021). "『Portal』のGLaDOSや『UNDERTALE』のサンズがテキストを読み上げてくれる。文章に込められた感情まで再現することを目指すサービス「15.ai」が話題に". Denfaminicogamer. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
  14. ^ a b Williams, Demi (January 18, 2022). "Voiceverse NFT admits to taking voice lines from non-commercial service". NME. Archived from the original on January 18, 2022. Retrieved January 18, 2022.
  15. ^ Wright, Steve (January 17, 2022). "Troy Baker-backed NFT company admits to using content without permission". Stevivor. Archived from the original on January 17, 2022. Retrieved January 17, 2022.
  16. ^ a b c d Villalobos, José (January 18, 2021). "Descubre 15.AI, un sitio web en el que podrás hacer que GlaDOS diga lo que quieras". LaPS4. Archived from the original on January 18, 2021. Retrieved January 18, 2021.
  17. ^ a b Moto, Eugenio (January 20, 2021). "15.ai, el sitio que te permite usar voces de personajes populares para que digan lo que quieras". Yahoo! Finance. Archived from the original on March 8, 2022. Retrieved January 20, 2021.
  18. ^ Felbo, Bjarke (2017). "Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm". Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. pp. 1615–1625. arXiv:1708.00524. doi:10.18653/v1/D17-1169. S2CID 2493033.
  19. ^ Corfield, Gareth (August 7, 2017). "A sarcasm detector bot? That sounds absolutely brilliant. Definitely". The Register. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
  20. ^ "An Algorithm Trained on Emoji Knows When You're Being Sarcastic on Twitter". MIT Technology Review. August 3, 2017. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
  21. ^ "Emojis help software spot emotion and sarcasm". BBC. August 7, 2017. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
  22. ^ Lowe, Josh (August 7, 2017). "Emoji-Filled Mean Tweets Help Scientists Create Sarcasm-Detecting Bot That Could Uncover Hate Speech". Newsweek. Archived from the original on June 2, 2022. Retrieved June 2, 2022.
  23. ^ Valle, Rafael (2020). "Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens". arXiv:1910.11997 [eess].
  24. ^ Cooper, Erica (2020). "Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings". arXiv:1910.10838 [eess].
  25. ^ a b van den Oord, Aäron; Li, Yazhe; Babuschkin, Igor (November 12, 2017). "High-fidelity speech synthesis with WaveNet". DeepMind. Archived from the original on June 18, 2022. Retrieved June 5, 2022.
  26. ^ Hsu, Wei-Ning (2018). "Hierarchical Generative Modeling for Controllable Speech Synthesis". arXiv:1810.07217 [cs.CL].
  27. ^ Habib, Raza (2019). "Semi-Supervised Generative Modeling for Controllable Speech Synthesis". arXiv:1910.01709 [cs.CL].
  28. ^ "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". August 30, 2018. Archived from the original on November 11, 2020. Retrieved June 5, 2022.
  29. ^ Shen, Jonathan; Pang, Ruoming; Weiss, Ron J.; Schuster, Mike; Jaitly, Navdeep; Yang, Zongheng; Chen, Zhifeng; Zhang, Yu; Wang, Yuxuan; Skerry-Ryan, RJ; Saurous, Rif A.; Agiomyrgiannakis, Yannis; Wu, Yonghui (2018). "Natural TTS Synthesis by Conditioning WaveNet on Mel-Spectrogram Predictions". arXiv:1712.05884 [cs.CL].
  30. ^ Chung, Yu-An (2018). "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis". arXiv:1808.10128 [cs.CL].
  31. ^ Ren, Yi (2019). "Almost Unsupervised Text to Speech and Automatic Speech Recognition". arXiv:1905.06791 [cs.CL].
  32. ^ Phillips, Tom (January 17, 2022). "Troy Baker-backed NFT firm admits using voice lines taken from another service without permission". Eurogamer. Archived from the original on January 17, 2022. Retrieved January 17, 2022.
  33. ^ Cowen, Tyler (May 12, 2022). "The most underrated talent in AI?". Marginal Revolution (blog). Archived from the original on June 19, 2022. Retrieved November 27, 2024.
  34. ^ Button, Chris (January 19, 2021). "Make GLaDOS, SpongeBob and other friends say what you want with this AI text-to-speech tool". Byteside. Archived from the original on June 25, 2024. Retrieved November 18, 2024.
  35. ^ Jia, Ye (2019). "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis". arXiv:1806.04558 [cs.CL].
  36. ^ Branwen, Gwern (March 6, 2020). ""15.ai"⁠, 15, Pony Preservation Project". Gwern.net. Gwern. Archived from the original on March 18, 2022. Retrieved June 17, 2022.
  37. ^ Scotellaro, Shaun (March 14, 2020). "Neat "Pony Preservation Project" Using Neural Networks to Create Pony Voices". Equestria Daily. Archived from the original on June 23, 2021. Retrieved June 11, 2022.
  38. ^ "Pony Preservation Project (Thread 108)". 4chan. Desuarchive. February 20, 2022. Retrieved February 20, 2022.
  39. ^ Scotellaro, Shaun (May 15, 2022). "Full Simple Animated Episode – The Tax Breaks (Twilight)". Equestria Daily. Archived from the original on May 21, 2022. Retrieved May 28, 2022.
  40. ^ "The Terribly Taxing Tribulations of Twilight Sparkle". Fimfiction.net. April 27, 2014. Archived from the original on June 30, 2022. Retrieved April 28, 2022.
YouTube (referenced for view counts and usage of 15.ai only)
  1. ^ "SPY IS A FURRY". YouTube. January 17, 2021. Archived from the original on June 13, 2022. Retrieved June 14, 2022.
  2. ^ "Spy is a Furry Animated". YouTube. Archived from the original on June 14, 2022. Retrieved June 14, 2022.
  3. ^ "[SFM] – Spy's Confession – [TF2 15.ai]". YouTube. January 15, 2021. Archived from the original on June 30, 2022. Retrieved June 14, 2022.
  4. ^ "Among Us Struggles". YouTube. September 21, 2020. Retrieved July 15, 2022.
  5. ^ "The UPDATED 2b2t Timeline (2010–2020)". YouTube. March 14, 2020. Archived from the original on June 1, 2022. Retrieved June 14, 2022.
TikTok
  1. ^ "She said " 👹 "". TikTok. Archived from the original on February 21, 2022. Retrieved July 15, 2022.