Draft:SynthesizerV: Difference between revisions

From Wikipedia, the free encyclopedia

{{Short description|Singing voice synthesizer software}}
{{Draft topics|media|east-asia|stem}}
{{AfC topic|product}}


Synthesizer V, also referred to as SynthV, is a [[Speech synthesis|vocal synthesis engine]] developed by Dreamtonics. A Technical Preview was announced on August 8, 2018, and distribution began on August 19, 2018.<ref>{{cite web | url=https://mixdownmag.com.au/reviews/review-dreamtonics-synthesiser-v-studio-pro/ | title=Review: Dreamtonics Synthesizer V Studio Pro | date=25 July 2024 }}</ref><ref>{{cite web |url=https://resource.dreamtonics.com/download/ |title=index - powered by h5ai v0.29.2 (https://larsjung.de/h5ai/) |website=resource.dreamtonics.com |access-date=2021-01-10 |archive-date=2021-01-12 |archive-url=https://web.archive.org/web/20210112022759/https://resource.dreamtonics.com/download/}}</ref><ref>https://twitter.com/khuasw/status/1027354095227523072?s=20</ref>
The full Production Release was officially launched on December 24, 2018.<ref>https://twitter.com/khuasw/status/1077189890180149248?s=20</ref>
'''Synthesizer V Studio''' was unveiled on June 25, 2020 in a press release alongside the voice databases Kotonoha Akane & Aoi and Saki.<ref>https://twitter.com/ahsoft/status/1276351253073629185</ref> [[AI]] support, branded Synthesizer V AI, was released for Synthesizer V Studio as an update on December 25, 2020, alongside an updated version of Saki known as Saki AI.
{{Infobox software
| name = Synthesizer V
| author = Kanru Hua
| developer = Dreamtonics
| released = August 19, 2018
| latest release version = Synthesizer V Studio 1.11.2 Update
| latest release date = September 12, 2024
| operating system = [[Microsoft Windows]], [[macOS]], [[Linux]]
| language = Japanese, English, Spanish, Chinese
| genre = Voice synthesizer software
| license = [[Proprietary software|Proprietary]]
| website = {{URL|https://dreamtonics.com/synthesizerv/}}
}}
Voice databases that were not recorded with the AI method are known as "Standard voice databases", which are generally recorded at the Dreamtonics studio in Tokyo.<ref>https://twitter.com/OfficialVolor/status/1392293297847037954</ref><ref>{{cite web | url=https://www.youtube.com/watch?v=ET6Kdw15L_8 | title=Q&A Livestream 1 - Synthesizer V SOLARIS project &#124; Eclipsed Sounds | website=[[YouTube]] | date=22 May 2021 }}</ref><ref name=":2" />


==History==
Kanru Hua stated on Twitter that the first line of code, ''the proto-proto libllsm'' (written in March 2015), eventually became part of Synthesizer V.<ref>{{Cite web |url=https://dreamtonics.com/synthesizerv/ |title=Synthesizer V 官方網站 |date=15 April 2020 |access-date=2021-04-11 |archive-date=2021-05-31 |archive-url=https://web.archive.org/web/20210531143432/https://dreamtonics.com/synthesizerv/}}</ref><ref>https://twitter.com/khuasw/status/1112006856283570177?s=20</ref>


Development for Synthesizer V began in 2017. Kanru Hua released a demo<ref>{{cite web | url=https://soundcloud.com/kanru-hua/synthesizer-v-demo | title=SoundCloud - Hear the world's sounds }}</ref> using three vocals known tentatively at the time as ''ENG-F1'', ''JA-F1'' & ''MAN-M1''. Synthesizer V made its first official debut on December 1, 2017.<ref>https://twitter.com/khuasw/status/936463709203062784?s=20</ref>


In February 2018, Kanru Hua posted a listening test to receive feedback on a new singing pitch model for Synthesizer V.<ref>https://twitter.com/khuasw/status/969065287155888128?s=20</ref>
In August 2018, Kanru Hua released a "Technical Preview" version of Synthesizer V.<ref>https://twitter.com/khuasw/status/1027354095227523072?s=20</ref>


On August 20, 2018, Kanru Hua released a survey asking for user feedback on the Technical Preview to be used for future improvements.<ref>https://twitter.com/khuasw/status/1031446551732707328?s=20</ref>


On December 24, 2018, Dreamtonics released the Production Release edition of Synthesizer V for sale,<ref>https://twitter.com/khuasw/status/1077189890180149248?s=20</ref> with substantial improvements over the Technical Preview edition.<ref>https://twitter.com/khuasw/status/1078547103561920512?s=20</ref>


In March 2019, Kanru Hua opened an application for licensed purchasers of Synthesizer V to take part in early testing of the macOS edition.<ref>https://twitter.com/khuasw/status/1103616971822661632?s=20</ref> This version was officially released on March 12, 2019,<ref>https://twitter.com/khuasw/status/1105509415162007552?s=20</ref> and all voices released up to that point were given macOS versions.<ref>https://twitter.com/khuasw/status/1110499169486045184?s=20</ref>


On May 31, 2019, Kanru Hua, together with Dreamtonics, announced in a tweet that he was accepting applications for [[C++]] software engineers to work on the next iteration of Synthesizer V,<ref>https://twitter.com/khuasw/status/1134381595270320129?s=20</ref> which later became known as '''Synthesizer V Release 2''' or "SVR2".<ref>https://twitter.com/khuasw/status/1154988155256262657?s=20</ref> In 2020, this next iteration became known as Synthesizer V Studio.


On April 9, 2020, it was announced that the second generation of Synthesizer V would be released soon and that a demo of Chiyu using the new engine was forthcoming.<ref>{{cite web | url=https://www.weibo.com/5056093502/ICxlpBPw4 | title=Sina Visitor System }}</ref> The demo was released on April 11.<ref>https://www.bilibili.com/video/BV1ba4y1x7pg</ref>
Synthesizer V Studio Pro and Synthesizer V Studio Basic were formally announced on June 26 by [[AH-Software]] in a press release, as well as the voice databases Kotonoha Akane & Aoi and Saki.<ref>{{cite web | url=https://www.ah-soft.com/press/synth-v/ | title=新世代歌声合成ソフトウェアが登場!「Synthesizer Vシリーズ」 2020年7月30日発売|AHS(AH-Software) }}</ref>


Animen's ANiCUTE store for international customers opened on July 12, and Synthesizer V Studio Pro, with the voice databases Genbu & AiKO, became available for purchase on July 15.<ref>https://twitter.com/dreamtonics_en/status/1283386080251830272</ref> Special discounts were made available to VIP members and purchasers of the first-generation Synthesizer V editor.
On August 2, Dreamtonics opened a beta-test application for [[Virtual Studio Technology|VST]] and [[Audio Units]] versions of Synthesizer V Studio to anyone that purchased the software.<ref>https://twitter.com/dreamtonics_en/status/1289851729668747271</ref>


On February 4, 2022, AH-Software reported that the Synthesizer V series had been selling much better than expected in the year since it gained AI support.<ref>{{cite web | url=https://www.ah-soft.com/press/synth-v/20220204.html | title=待望の男性歌声データベース2種類がついに登場!『Synthesizer V AI Ryo』 『Synthesizer V AI Kevin』本日発売開始|AHS(AH-Software) }}</ref>


On February 28, 2023, Dreamtonics announced that Synthesizer V Studio would soon add [[Cantonese|Cantonese Chinese]] as its fourth supported language. This would allow the engine to support both voice libraries dedicated to the language as well as [[Bilingual|Cross-lingual Singing Synthesis]]. The company also announced the future support of [[Rap|rap]] vocals, showing a demo of a new male vocalist rapping in Mandarin Chinese and English. Support for Japanese rap was expected in the future.<ref>{{cite web | url=https://www.bilibili.com/video/BV1zs4y1f7QJ/ | title=「歌声技术」Synthesizer V AI 技术预览:粤语与说唱合成 (2023)_哔哩哔哩_bilibili }}</ref><ref>{{cite web | url=https://www.youtube.com/watch?v=mcJU0Wq-u7w | title=Technical Demo - Cantonese Singing Synthesis (And More!) | website=[[YouTube]] | date=28 February 2023 }}</ref>


On March 2, Dreamtonics posted a response to fans' concerns about the implementation of Cantonese Chinese, noting that it was checking and fixing the issues in the demonstration clips reported by the user base. It also noted that Synthesizer V Studio supported lyric input in [[Jyutping]], the 1993 Cantonese romanization scheme, which is not equivalent to the [[X-SAMPA]] phonetic transcription shown above the lyric notes in the editor; likewise, the X-SAMPA transcription of a Chinese character is not equivalent to its [[Pinyin]] reading.<ref>{{cite web | url=https://t.bilibili.com/768416404500119586 | title=动态-哔哩哔哩 }}</ref>


On March 15, after receiving feedback on making the song more consistent with Cantonese songwriting conventions, Dreamtonics replaced the [[Bilibili]] version of the debut video with one incorporating corrections to the male vocal's and Feng Yi's demos.<ref>http://www.bilibili.com/video/BV1zs4y1f7QJ - "Dreamtonics 已于 3 月 15 日将 Synthesizer V AI 粤语歌声合成技术预览的测试曲目更换为更加符合粤语歌曲创作习惯的版本,感谢各位创作者的关心与鞭策。未来 Dreamtonics 将陆续发布更多关于粤语歌声合成与跨语言合成的信息,敬请期待。"</ref><ref>{{cite web | url=https://t.bilibili.com/773207552189005831 | title=动态-哔哩哔哩 }}</ref>


The [[Rap|rap]] feature for English and Mandarin Chinese and the Cantonese Chinese cross-lingual singing synthesis were planned for full implementation in version 1.9.0, with a beta version released on April 18. Dreamtonics said that after receiving user feedback it had focused on refining pronunciation for a better user experience. When the language is set to Cantonese, all Chinese lyrics are sung with Cantonese pronunciation; misread lyrics can be corrected by typing the romanized form directly in Jyutping. Although the phoneme set is largely based on Mandarin Chinese, several phonemes unique to Cantonese were incorporated.<ref>{{cite web | url=https://dreamtonics.com/synthesizer-v-studio-1-9-0b1-update-rap-cantonese-and-more/ | title=Synthesizer V Studio 1.9.0b1 Update: Rap, Cantonese and More &#124; Dreamtonics株式会社 | date=18 April 2023 }}</ref><ref>{{cite web | url=https://www.soundonsound.com/reviews/dreamtonics-synthesizer-v | title=Dreamtonics Synthesizer V }}</ref>


==Development of Neural Networks==
Following the ''Synthesizer V Studio 1.2.0 Update'' on February 19, 2021, Kanru Hua posted a thread on his personal Twitter account explaining how the optimized [[Neural network|neural network]] inference in recent Synthesizer V updates works.<ref>https://twitter.com/khuasw/status/1362799523156746240</ref>


The following day, Kanru Hua elaborated more on the subject. Synthesizer V Studio 1.2 uses [[Just-in-time compilation|JIT-compiled]] quantized sparse [[Matrix multiplication|Matrix-vector multiplication]] (MVM) kernels.<ref>https://twitter.com/khuasw/status/1363069116467138564</ref>
In his own words "artificial neural network boils down to a bunch of really simple arithmetic operations, e.g. a + b * x1 + c * x2 + ... But when you (purposefully) compose millions of these together, they can be turned into really complicated machines."<ref>https://twitter.com/khuasw/status/1363073616821116931</ref>
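
The composition he describes can be illustrated with a minimal Python sketch (the weights below are arbitrary illustrative values, not Dreamtonics' parameters):

```python
# A "neuron" is just a + b*x1 + c*x2: a constant plus weighted inputs.
def neuron(a, b, c, x1, x2):
    return a + b * x1 + c * x2

# Composing a few of these already forms a tiny two-layer network;
# real models compose millions of such operations.
def tiny_network(x1, x2):
    h1 = neuron(0.1, 0.5, -0.3, x1, x2)   # hidden unit 1
    h2 = neuron(-0.2, 0.8, 0.4, x1, x2)   # hidden unit 2
    return neuron(0.0, 1.0, 1.0, h1, h2)  # output unit combines the hidden units

print(tiny_network(1.0, 2.0))
```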


He notes that to build a voice, he picks specialized values for the "a, b and c" that best represent the voice and plugs them into millions of equations; these "a, b, c" values are called parameters. Rather than writing each equation by hand, linear algebra is used to organize them into matrices and vectors, notations that allow simple math to be done in large batches. Many neural network models are composed largely of matrix-matrix multiplications.
In the case of Synthesizer V, the [[Bottleneck (engineering)|bottleneck]] is matrix-vector multiplication, mainly used in the network that generates waveform samples, known as the "neural vocoder".<ref>https://twitter.com/khuasw/status/1363073620445192195</ref>
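
A matrix-vector multiplication is simply many such weighted-sum equations evaluated in one batch: each matrix row holds the weights of one equation. A plain-Python sketch (illustrative values only):

```python
# Matrix-vector multiplication: each output element is one
# "b*x1 + c*x2 + ..." style equation, with the weights stored as one
# row of the matrix W and the shared inputs as the vector x.
def mvm(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[0.5, -0.3],
     [0.8,  0.4]]
x = [1.0, 2.0]
print(mvm(W, x))  # two weighted sums computed in one batch
```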
One challenge he notes is that, besides the network being large, it needs to "run tens of thousands of times per second" to synthesize high-quality audio in real time.<ref>https://twitter.com/khuasw/status/1363073621741084680</ref>


Because of this, modern [[CPU|CPUs]] are needed, as they run at several gigacycles per second ("[[GHz]]"), which is on a similar order of magnitude to the number of operations required per second. He notes that the margin is very tight, and since not every CPU cycle can do "useful work", this presents additional challenges.<ref>https://twitter.com/khuasw/status/1363073622860992520</ref>
The goal is therefore to make the MVM operation run as fast as possible on modern CPU systems.<ref>https://twitter.com/khuasw/status/1363073624010235905</ref>


The following day, he elaborated further into the usage of Sparse Matrix-Vector Multiplication (SpMVM) for Synthesizer V's neural networks.<ref>https://twitter.com/khuasw/status/1363431631759925259</ref>
Kanru Hua states that of the millions of parameters, many are redundant and can be thrown out without hurting sound quality, resulting in what is called a ''[[sparse matrix]]''.<ref>https://twitter.com/khuasw/status/1363431631759925259</ref><ref>https://twitter.com/khuasw/status/1363431633404108803</ref>
Some parameters, however, are truly important and cannot be thrown away; if too many are removed, quality eventually drops: "The synthesized voice will sound more and more like from a walkie-talkie until it completely degrades into noise."<ref>https://twitter.com/khuasw/status/1363431634670817287</ref>


The goal is to remove the least-contributing parameters carefully, pruning as many as possible without hurting quality; done properly, over three-fourths of the parameters can typically be thrown out.<ref>https://twitter.com/khuasw/status/1363431636012986371</ref>
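
Magnitude-based pruning is one common way to realize this; the sketch below keeps only the largest 25% of the weights by absolute value, mirroring the "over three-fourths" figure above. (The actual pruning criterion used in Synthesizer V is not public; this is a generic, assumed approach with made-up weights.)

```python
# Magnitude pruning: zero out the smallest-magnitude weights,
# keeping only the most influential ones.
def prune(weights, keep_fraction=0.25):
    n_keep = max(1, int(len(weights) * keep_fraction))
    # Threshold = magnitude of the n_keep-th largest weight.
    threshold = sorted((abs(w) for w in weights), reverse=True)[n_keep - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

w = [0.9, -0.05, 0.02, -0.7, 0.01, 0.3, -0.04, 0.08]
sparse_w = prune(w)
print(sparse_w)  # three-fourths of the parameters are now zero
```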
When executing the sparse neural network, the program must skip the parameters that were removed, and this skipping adds overhead that can be expensive; even so, sparsity can boost execution speed up to four times the initial speed.<ref>https://twitter.com/khuasw/status/1363431637506170882</ref>
"Going sparse is an effective way to compress a neural network. If done right, it can still speed up execution by a few times, although this would require highly optimized code for SpMVM."<ref>https://twitter.com/khuasw/status/1363431640085635075</ref>
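
The skip-the-zeros execution can be sketched with a compressed sparse row (CSR) style layout, a standard storage format for SpMVM. (Illustrative only; Dreamtonics' actual kernels are JIT-compiled and heavily optimized, and their layout is not public.)

```python
# CSR-like storage: only surviving weights are kept, each paired with the
# column index it applies to, so removed parameters are skipped entirely.
def to_csr(W):
    values, cols, row_ptr = [], [], [0]
    for row in W:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))  # where each row's weights end
    return values, cols, row_ptr

# SpMVM: the index lookups are the overhead mentioned above;
# the payoff is that zero weights cost nothing at all.
def sparse_mvm(values, cols, row_ptr, x):
    return [sum(values[k] * x[cols[k]]
                for k in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(row_ptr) - 1)]

W = [[0.9, 0.0, 0.0],
     [0.0, -0.7, 0.0]]
vals, cols, ptr = to_csr(W)
print(sparse_mvm(vals, cols, ptr, [1.0, 2.0, 3.0]))  # same result as dense MVM
```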


Over the following three days, Kanru Hua posted three additional threads further elaborating on the intricacies of developing the neural networks.<ref>https://twitter.com/khuasw/status/1363828518686138375</ref><ref>https://twitter.com/khuasw/status/1364164654231031815</ref><ref>https://twitter.com/khuasw/status/1364774324599615494</ref>
After making the matrices sparse, the parameters are quantized to integers and scaled down before the MVM so that the results stay within the representable range. If an addition or multiplication goes out of range, the value wraps around to the other end of the range (an integer overflow), and the synthesized voice can sound akin to a mistuned radio or degrade into complete noise.<ref>https://twitter.com/khuasw/status/1364164660132372481</ref>
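
The wraparound behavior he warns about can be demonstrated with a 16-bit integer model (the bit width is chosen purely for illustration; Synthesizer V's actual quantization format is not public):

```python
# Simulate signed 16-bit arithmetic: values outside [-32768, 32767]
# wrap to the other end of the range, which is why an unscaled
# accumulation can turn synthesized audio into noise.
def wrap_int16(x):
    return ((x + 2**15) % 2**16) - 2**15

print(wrap_int16(32767))  # in range: unchanged
print(wrap_int16(32768))  # one past the max wraps to the minimum
print(wrap_int16(40000))  # overflow: a large positive sum becomes negative
```

Scaling the quantized values down before the MVM keeps every intermediate sum inside this range, avoiding the wraparound.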


He states that the neural networks used in Synthesizer V AI come in many different sizes, some of which can be made sparse and some of which cannot. The software is able to run on every x86 CPU since the [[Pentium 4]] processors developed in 2004.<ref>https://twitter.com/khuasw/status/1364774332874985474</ref>
{{Authority control}}


[[:Category:2018 software]]
[[:Category:Music production software]]
[[:Category:Japanese inventions]]

Latest revision as of 12:26, 5 January 2025

  • Comment: In addition to the comments above, this article fails to meet the encyclopaedic writing standards and structure required. A possibly related draft (Draft:Synthesizer V) also exhibits similar issues. QEnigma talk 19:17, 4 January 2025 (UTC)

Synthesizer V, also referred to as SynthV is a vocal synthesis engine developed by Dreamtonics. A Technical Preview was announced August 8, 2018 and distribution began on August 19, 2018.[1] [2].[3] The full Production Release officially released on December 24, 2018.[4] Synthesizer V Studio was unveiled June 25, 2020 in a press release alongside voice databases Kotonoha Akane & Aoi and Saki.[5] AI support for Synthesizer V AI was released for Synthesizer V Studio as an update on December 25 alongside an update for Saki known as Saki AI.

Synthesizer V
Original author(s)Kanru Hua
Developer(s)Dreamtonics
Initial releaseAugust 19, 2018; 6 years ago (2018-08-19)
Stable release
Synthesizer V Studio 1.11.2 Update / September 12, 2024; 3 months ago (2024-09-12)
Operating systemMicrosoft Windows
macOS
Linux
Available inJapanese, English, Spanish, Chinese
TypeVoice synthesizer software
LicenseProprietary
Websitedreamtonics.com/synthesizerv/

Voice databases that were not recorded with the AI method are known as "Standard voice databases", which are generally recorded at the Dreamtonics studio in Tokyo.[6][7][8]

History

[edit]

Kanru Hua stated on Twitter, the first line of code, the proto-proto libllsm (written in March 2015), eventually became part of Synthesizer V.[9][10]

Development for Synthesizer V began in 2017. Kanru Hua released a demo[11] using three vocals known tentatively at the time as ENG-F1, JA-F1 & MAN-M1. Synthesizer V made its first official debut on December 1, 2017.[12]

In February 2018, Kanru Hua posted a listening test to recieve feedback on a new singing pitch model for Synthesizer V.[13] In August 2018, Kanru Hua released a "Technical Preview" version of Synthesizer V[14]

On August 20, 2018, Kanru Hua released a survey asking for user feedback on the Technical Preview to be used for future improvements.[15]

On December 24, 2018, Dreamtonics released the Production Release edition of Synthesizer V for sale[16] with substantial improvements over the Technical Preview edition. [17]

In March 2019, Kanru Hua announced an application for licensed purchasers of Synthesizer V for early testing of the macOS edition.[18] This version officially released on March 12, 2019 [19] and all currently released voices were given macOS versions. [20]

On May 31, 2019 Kanru Hua, with DREAMTONICS, announced in a tweet that he was accepting applications for C++ software engineers to work on the next iteration of Synthesizer V [21] which later became known as Synthesizer V Release 2 or "SVR2" [22]. This would later on in 2020 become known as Synthesizer V Studio.

On April 9, 2020, it was announced that the second generation of Synthesizer V would be released soon, along with a forthcoming demo of Chiyu using the new engine.[23] The demo was released on April 11.[24] Synthesizer V Studio Pro and Synthesizer V Studio Basic were formally announced on June 26 by AH-Software in a press release, alongside the voice databases Kotonoha Akane & Aoi and Saki.[25]

Animen's ANiCUTE store for international customers opened on July 12, and Synthesizer V Studio Pro, with the voice databases Genbu and AiKO, became available for purchase on July 15.[26] Special discounts were offered to VIP members and to purchasers of the first-generation Synthesizer V editor.

On August 2, Dreamtonics opened a beta-test application for VST and Audio Units versions of Synthesizer V Studio to anyone who had purchased the software.[27]

On February 4, 2022, AH-Software stated that the Synthesizer V series had sold far better than expected in the year since AI support was introduced.[28]

On February 28, 2023, Dreamtonics announced that Synthesizer V Studio would soon add Cantonese Chinese as its fourth supported language, allowing the engine to support both dedicated Cantonese voice libraries and cross-lingual singing synthesis. The company also announced future support for rap vocals, showing a demo of a new male vocalist rapping in Mandarin Chinese and English; support for Japanese rap was expected in the future.[29][30]

On March 2, Dreamtonics posted a response to fans' concerns about the implementation of Cantonese Chinese, noting that it was checking and fixing the issues in the demonstration clips as reported by the user base. It also noted that Synthesizer V Studio supported lyric input in Jyutping, the Cantonese romanization scheme published in 1993. Jyutping is not equivalent to the X-SAMPA phonetic notation shown above the lyric notes in the editor, nor is the X-SAMPA transcription of a Chinese character equivalent to its Pinyin reading.[31]

On March 15, after receiving feedback on making the song more consistent with Cantonese songwriting conventions, Dreamtonics replaced the Bilibili version of the debut video with one that incorporated corrections to the male vocal's and Feng Yi's demos.[32][33]

The rap feature for English and Mandarin Chinese, along with Cantonese cross-lingual singing synthesis, was planned for full implementation in version 1.9.0, with a beta version released on April 18. Dreamtonics said that, after receiving valuable feedback, it had focused on refining pronunciation for a better user experience. When the language is set to Cantonese, all Chinese lyrics are sung with Cantonese pronunciation; if lyrics are misread, users can correct them by typing the romanized form directly in Jyutping. Although the phoneme set is largely based on Mandarin Chinese, several phonemes unique to Cantonese were incorporated.[34][35]

Development of Neural Networks


Following the Synthesizer V Studio 1.2.0 update on February 19, 2021, Kanru Hua posted a thread on his personal Twitter account about how optimized neural network inference works in the recent Synthesizer V updates.[36]

The following day, Kanru Hua elaborated further: Synthesizer V Studio 1.2 uses JIT-compiled, quantized, sparse matrix-vector multiplication (MVM) kernels.[37] In his own words, "artificial neural network boils down to a bunch of really simple arithmetic operations, e.g. a + b * x1 + c * x2 + ... But when you (purposefully) compose millions of these together, they can be turned into really complicated machines."[38]

He notes that to build a voice, he picks specialized values for the "a, b and c" that best represent the voice and plugs them into millions of such equations; these values are called parameters. Rather than writing each equation by hand, linear algebra is used to organize the parameters into matrices and vectors, notations that make it easy to perform simple arithmetic in large batches. Many neural network models are dominated by matrix-matrix multiplication; in Synthesizer V, the bottleneck is matrix-vector multiplication, used mainly in the network that generates waveform samples, known as the "neural vocoder".[39] One challenge he notes is that, besides being large, the network needs to "run tens of thousands of times per second" to synthesize high-quality audio in real time.[40]
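
The relationship Hua describes between "a + b * x1 + c * x2" style equations and matrix-vector multiplication can be sketched in plain Python. This is an illustration only; the weights and inputs below are made-up values, not Synthesizer V's actual parameters or kernels.

```python
# Each row of the matrix encodes one "a + b*x1 + c*x2" style equation:
# the bias is "a" and the row entries are "b, c, ...".

def mvm(matrix, vector, bias):
    """Dense matrix-vector multiply plus bias: y[i] = bias[i] + sum_j matrix[i][j] * vector[j]."""
    return [
        b + sum(w * x for w, x in zip(row, vector))
        for row, b in zip(matrix, bias)
    ]

# Row 0 represents 1.0 + 2.0*x1 + 3.0*x2; row 1 represents 0.5 - 1.0*x1 + 4.0*x2.
weights = [[2.0, 3.0], [-1.0, 4.0]]
bias = [1.0, 0.5]
x = [1.0, 2.0]

print(mvm(weights, x, bias))  # → [9.0, 7.5]
```

A real neural vocoder performs this operation over much larger matrices, tens of thousands of times per second, which is why the optimized kernels discussed here matter.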

This is why modern CPUs are needed: they run at several "gigacycles" per second (GHz), the same order of magnitude as the number of operations per second required, though he notes the margin is very tight. Because not every CPU cycle can do "useful work", this presents additional challenges.[41] The goal is to make the MVM operation run as fast as possible on modern CPU systems.[42]

The following day, he elaborated further on the use of sparse matrix-vector multiplication (SpMVM) in Synthesizer V's neural networks.[43] Kanru Hua states that, of the millions of parameters, many are redundant and can be thrown out without hurting sound quality, resulting in what is called a sparse matrix.[44][45] Some parameters are truly important and cannot be discarded; if too many are removed, quality eventually drops. "The synthesized voice will sound more and more like from a walkie-talkie until it completely degrades into noise."[46]

The goal is to carefully remove the least-contributing parameters, discarding as many as possible without hurting quality. Done properly, typically over three-fourths of the parameters can be thrown out.[47]
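
One common way to decide which parameters contribute least is magnitude pruning: zero out the smallest-magnitude weights. The sketch below illustrates that general idea in Python with made-up values; the source does not specify which pruning criterion Synthesizer V actually uses.

```python
# Toy magnitude pruning: keep only the largest-magnitude fraction of
# parameters and zero out the rest, producing a sparse matrix.

def prune(weights, keep_fraction):
    """Zero out the smallest-magnitude parameters, keeping roughly keep_fraction of them."""
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    k = max(1, int(len(magnitudes) * keep_fraction))
    threshold = magnitudes[k - 1]
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

weights = [[0.9, 0.01], [-0.02, 0.8]]
pruned = prune(weights, 0.25)  # throw out three-fourths, as in the text
print(pruned)  # → [[0.9, 0.0], [0.0, 0.0]]
```

With three-fourths of the entries zeroed, a well-implemented sparse kernel only has to touch a quarter of the original parameters.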

When executing the sparse neural network, the program must skip the parameters that were removed, and this skipping adds overhead that can be expensive. Done well, however, sparsity can still boost execution speed up to four times the initial speed.[48] "Going sparse is an effective way to compress a neural network. If done right, it can still speed up execution by a few times, although this would require highly optimized code for SpMVM."[49]
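
The "skipping" can be made cheap by storing the matrix in a compressed form that lists only the surviving parameters. The sketch below uses the standard compressed sparse row (CSR) layout as an illustration; the source does not state which storage format Synthesizer V's kernels use.

```python
# CSR stores only nonzero values, their column indices, and per-row offsets,
# so the multiply loop never even sees the pruned (zero) parameters.

def to_csr(dense):
    """Convert a dense matrix to compressed sparse row (values, cols, row_ptr)."""
    values, cols, row_ptr = [], [], [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                cols.append(j)
        row_ptr.append(len(values))
    return values, cols, row_ptr

def spmvm(values, cols, row_ptr, x):
    """Sparse matrix-vector multiply: iterate only over stored nonzeros."""
    return [
        sum(values[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]

dense = [[2.0, 0.0, 3.0],
         [0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
values, cols, row_ptr = to_csr(dense)
print(spmvm(values, cols, row_ptr, [1.0, 2.0, 3.0]))  # → [11.0, 0, 2.0]
```

A production kernel would add JIT compilation and SIMD on top of this layout, which is where the "highly optimized code for SpMVM" comes in.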

Over the course of the following three days, Kanru Hua posted three additional threads further elaborating on the intricacies of developing the neural networks.[50][51][52] After the matrices are made sparse, the parameters are quantized to integers, with values scaled down before the MVM so that results stay within range. If an addition or multiplication goes out of range, the result wraps back to the other end of the range, causing an overflow; the synthesized voice can then sound akin to a mistuned radio or degrade into complete noise.[53]
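
The scaling and wrap-around behavior can be sketched in Python. Signed 8-bit integers and the scale value here are illustrative assumptions; the source does not specify the exact bit width or scales Synthesizer V uses.

```python
# Quantization maps floats into a small integer range via a scale factor;
# an accumulator that exceeds that range wraps around, corrupting the audio.

def quantize_int8(values, scale):
    """Map floats into the signed 8-bit range [-128, 127] via a per-tensor scale."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def wrap_int8(n):
    """Simulate 8-bit overflow: out-of-range results wrap to the other end."""
    return (n + 128) % 256 - 128

q = quantize_int8([0.5, -1.0, 1.5], 0.0125)
print(q)                  # → [40, -80, 120]  (all safely in range)
print(wrap_int8(120 + 50))  # → -86  (170 overflows 127 and wraps negative)
```

This wrap-around is why the scale must be chosen so intermediate sums stay in range: a single overflowed accumulator flips sign, which in audio manifests as the radio-static artifacts described above.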

He states that the neural networks used in Synthesizer V AI come in many different sizes; some can be made sparse and some cannot. The software is able to run on every x86 CPU since the Pentium 4 processor, developed in 2004.[54]

Category:2018 software Category:Music production software Category:Japanese inventions

  1. ^ "Review: Dreamtonics Synthesizer V Studio Pro". 25 July 2024.
  2. ^ "index - powered by h5ai v0.29.2". resource.dreamtonics.com. https://resource.dreamtonics.com/download/. Archived from the original on 2021-01-12 (https://web.archive.org/web/20210112022759/https://resource.dreamtonics.com/download/). Retrieved 2021-01-10.
  3. ^ https://twitter.com/khuasw/status/1027354095227523072?s=20
  4. ^ https://twitter.com/khuasw/status/1077189890180149248?s=20
  5. ^ https://twitter.com/ahsoft/status/1276351253073629185
  6. ^ https://twitter.com/OfficialVolor/status/1392293297847037954
  7. ^ "Q&A Livestream 1 - Synthesizer V SOLARIS project | Eclipsed Sounds". YouTube. 22 May 2021.
  8. ^ Cite error: The named reference :2 was invoked but never defined (see the help page).
  9. ^ "Synthesizer V 官方網站". 15 April 2020. Archived from the original on 2021-05-31. Retrieved 2021-04-11.
  10. ^ https://twitter.com/khuasw/status/1112006856283570177?s=20
  11. ^ "SoundCloud - Hear the world's sounds".
  12. ^ https://twitter.com/khuasw/status/936463709203062784?s=20
  13. ^ https://twitter.com/khuasw/status/969065287155888128?s=20
  14. ^ https://twitter.com/khuasw/status/1027354095227523072?s=20
  15. ^ https://twitter.com/khuasw/status/1031446551732707328?s=20
  16. ^ https://twitter.com/khuasw/status/1077189890180149248?s=20
  17. ^ https://twitter.com/khuasw/status/1078547103561920512?s=20
  18. ^ https://twitter.com/khuasw/status/1103616971822661632?s=20
  19. ^ https://twitter.com/khuasw/status/1105509415162007552?s=20
  20. ^ https://twitter.com/khuasw/status/1110499169486045184?s=20
  21. ^ https://twitter.com/khuasw/status/1134381595270320129?s=20
  22. ^ https://twitter.com/khuasw/status/1154988155256262657?s=20
  23. ^ "Sina Visitor System".
  24. ^ https://www.bilibili.com/video/BV1ba4y1x7pg
  25. ^ "新世代歌声合成ソフトウェアが登場!「Synthesizer Vシリーズ」 2020年7月30日発売|AHS(AH-Software)".
  26. ^ https://twitter.com/dreamtonics_en/status/1283386080251830272
  27. ^ https://twitter.com/dreamtonics_en/status/1289851729668747271
  28. ^ "待望の男性歌声データベース2種類がついに登場!『Synthesizer V AI Ryo』 『Synthesizer V AI Kevin』本日発売開始|AHS(AH-Software)".
  29. ^ "「歌声技术」Synthesizer V AI 技术预览:粤语与说唱合成 (2023)_哔哩哔哩_bilibili".
  30. ^ "Technical Demo - Cantonese Singing Synthesis (And More!)". YouTube. 28 February 2023.
  31. ^ "动态-哔哩哔哩".
  32. ^ http://www.bilibili.com/video/BV1zs4y1f7QJ - "Dreamtonics 已于 3 月 15 日将 Synthesizer V AI 粤语歌声合成技术预览的测试曲目更换为更加符合粤语歌曲创作习惯的版本,感谢各位创作者的关心与鞭策。未来 Dreamtonics 将陆续发布更多关于粤语歌声合成与跨语言合成的信息,敬请期待。"
  33. ^ "动态-哔哩哔哩".
  34. ^ "Synthesizer V Studio 1.9.0b1 Update: Rap, Cantonese and More | Dreamtonics株式会社". 18 April 2023.
  35. ^ "Dreamtonics Synthesizer V".
  36. ^ https://twitter.com/khuasw/status/1362799523156746240
  37. ^ https://twitter.com/khuasw/status/1363069116467138564
  38. ^ https://twitter.com/khuasw/status/1363073616821116931
  39. ^ https://twitter.com/khuasw/status/1363073620445192195
  40. ^ https://twitter.com/khuasw/status/1363073621741084680
  41. ^ https://twitter.com/khuasw/status/1363073622860992520
  42. ^ https://twitter.com/khuasw/status/1363073624010235905
  43. ^ https://twitter.com/khuasw/status/1363431631759925259
  44. ^ https://twitter.com/khuasw/status/1363431631759925259
  45. ^ https://twitter.com/khuasw/status/1363431633404108803
  46. ^ https://twitter.com/khuasw/status/1363431634670817287
  47. ^ https://twitter.com/khuasw/status/1363431636012986371
  48. ^ https://twitter.com/khuasw/status/1363431637506170882
  49. ^ https://twitter.com/khuasw/status/1363431640085635075
  50. ^ https://twitter.com/khuasw/status/1363828518686138375
  51. ^ https://twitter.com/khuasw/status/1364164654231031815
  52. ^ https://twitter.com/khuasw/status/1364774324599615494
  53. ^ https://twitter.com/khuasw/status/1364164660132372481
  54. ^ https://twitter.com/khuasw/status/1364774332874985474