The author of Moresampler decided to do some shitposting here, having lost his Utaforum password and being unwilling to reset it due to the lack of HTTPS encryption, in the very last minutes before flying to Hokkaido for a family trip, at which point he'll probably (temporarily) disappear from the internet (if Twitter doesn't count), because eastern Hokkaido is WILD.
TL;DR: SHIRO is a tool I built for phoneme alignment, which means: given recorded speech and a phoneme transcription, it generates label tracks like this,
How does this relate to UTAU and singing synthesis in general? Because you can generate an oto.ini out of such label tracks, and that's exactly what Moresampler does internally.
I basically took the code from Moresampler and made it into a standalone toolkit.
Installation. Download SHIRO from https://github.com/Sleepwalking/SHIRO/releases. You also need a Lua interpreter: http://luabinaries.sourceforge.net/download.html.
Unzip and make the directory look like this,
Example: doing phoneme alignment on an Arpasing voicebank. I've used Klad's vb too much, so this time I randomly picked Uchuu's vb... OUCH, sorry, I broke it by mistake; let me switch to Kiinane's Arpasing vb. It doesn't really matter for demonstration purposes, though.
Launch a cmd prompt from this directory.
There's already an Arpabet phoneme definition file and an Arpabet phone map under examples/, so I'll skip a few steps on how to generate these JSON files. You can refer to the GitHub readme tutorial when you want to do this for another language.
Preparation step. SHIRO also relies on an index.csv file that stores the names of the wav files and their corresponding phoneme transcriptions. However, SHIRO's phonemes are space-delimited instead of underscore-delimited.
So batch-replace _ with a space. You should also remove all file extensions. It should look like this,
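For instance, you could end up with something like the following fragment (the file names and phoneme strings here are purely made up for illustration; real Arpasing entries will differ):

```shell
# Hypothetical index.csv: file name without extension, a comma, then the
# space-delimited phoneme transcription (entries invented for illustration).
cat > index.csv <<'EOF'
_ah_ax_eh,ah ax eh
_b_ah_d,b ah d
EOF
```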
First step - feature extraction.
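The invocation is roughly as follows; the script name is from the SHIRO distribution, but the flags and paths here are my recollection of the readme, so treat them as assumptions and verify them there before running:

```shell
# Extract acoustic features for every wav listed in index.csv.
# -d: voicebank directory, -x: bundled feature extractor, -r: sample rate
# (flag meanings assumed from memory of the readme; double-check there)
lua shiro-fextr.lua index.csv -d path/to/voicebank -x ./extractors/extractor-xxcc -r 44100
```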
Second step - create a dummy segmentation from the index file.
Remember to check the output file, unaligned.json. If the script fails, it'll print the error message into this file.
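A sketch of this step, under the assumption that shiro-mkseg.lua takes the phone map and voicebank location as flags (flag names and the examples/ file name are recalled from the readme, not guaranteed):

```shell
# Build a dummy (unaligned) segmentation from index.csv.
# -m: Arpabet phone map JSON, -d: voicebank directory, -e: audio extension
# (all flags are assumptions; check the SHIRO readme for the real ones)
lua shiro-mkseg.lua index.csv -m examples/arpabet-phonemap.json -d path/to/voicebank -e wav > unaligned.json
```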
Third step - training.
We don't have a pre-built model for Arpasing voicebanks yet. There is an Arpabet model built on the CMU Arctic speech database, but it doesn't have the glottal stop /q/ phoneme, so it'll crash on Arpasing.
So first create an empty model, then do a flat-start initialization (-FT) and train it on the voicebank.
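In command form this step might look like the following. The -FT flag is from the description above, but the tool names (shiro-mkhsmm, shiro-init, shiro-rest) and the remaining flags and file names are reconstructed from my memory of the readme, so verify them against the actual documentation:

```shell
# Create an empty model from a model definition, flat-start it, then train it.
shiro-mkhsmm -c examples/model-config.json > empty.hsmm      # config path is illustrative
shiro-init -m empty.hsmm -s unaligned.json -FT > flatstart.hsmm   # -FT: flat-start init
shiro-rest -m flatstart.hsmm -s unaligned.json -n 50 > trained.hsmm  # -n: iterations (assumed)
```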
It won't be as fast as Moresampler, since this is an unoptimized demo version.
Also, there are plenty of parameters to tweak: the number of iterations, pruning, the training algorithm, model definitions, etc.
Final step - alignment.
Now we use the trained model to align the speech, using unaligned.json as a reference.
When this is done, use shiro-seg2lab.lua to convert the JSON segmentation file into Audacity label files.
Errata: add -p 10 -d 50 (or some number greater than that if you're dealing with low-BPM vocals) after shiro-align, or you'll get crappy alignment.
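Putting the final step together (the -p 10 -d 50 flags are from the errata above; the rest of the invocation is assumed from the readme and should be checked):

```shell
# Align the recordings against the trained model, with unaligned.json as reference.
shiro-align -m trained.hsmm -s unaligned.json -p 10 -d 50 > aligned.json
# Convert the JSON segmentation into per-wav Audacity label (.txt) files.
lua shiro-seg2lab.lua aligned.json
```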
Yeah, we're done. Now go to the voicebank directory; you should be able to find a bunch of txt files.
But an Audacity-to-oto.ini converter hasn't been written yet.
I kindly request that anyone interested in building a universal OTO generator take up the challenge of writing such a converter. Which is why I came here to shitpost in the last minutes before flying to eastern Hokkaido and living a secluded life at a place where there are fewer human beings than Mikus.