Conversation
pplantinga
left a comment
There was a problem hiding this comment.
Overall I think this will make a valuable addition to the toolkit, tokenized TTS is a great project. This is quite a large PR and hard to review in its entirety but I have a few initial thoughts.
- I tried to run LibriTTS using "lite" configuration, and the training time was surprisingly reasonable (3 minutes or so) per epoch, but the validation time was very long, >2 hours per epoch. I wonder if it has anything to do with the warning message:
.../site-packages/torch/nn/functional.py:5193: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead. - I don't see any readme/documentation/tutorial file that explains tokotron and what this model is trying to accomplish. We may want to at the very least have a README with a link to a paper explaining the model and any results.
- I'm wondering if some of these changes (like
eval.pyorpreparation.py) may make sense to move to a separate PR. They seem like bigger changes that are a bit unrelated to Tokotron.
Again, I believe this will be a nice addition to the toolkit, but it still requires some work and more review due to the size.
| default: any | ||
| the default value | ||
| apply: bool | ||
| if set to true, the value is expected to |
There was a problem hiding this comment.
It could be confusing what value means here since value is also the name of one of the arguments. Come to think of it, perhaps index would be a better name for the argument.
There was a problem hiding this comment.
Actually, in this case it is not an index... The way choice is used is as follows.
spk_emb_discrete_src: !apply:speechbrain.utils.hparams.choice
value: !ref <ssl_model_type>
choices:
wavlm: flexthink/discrete_wavlm_spk_rec_ecapatdn_lite
hubert: flexthink/discrete_hubert_spk_rec_ecapatdn_lite
wav2vec2: flexthink/discrete_wav2vec2_spk_rec_ecapatdn_liteSo the value indicates the value to be mapped using choices... Perhaps it could be renamed to key by analogy to dictionary keys but it is not a numeric index.
The most typical use case for this is this:
- You have an hparam indicating, for instance, the type of base model to use
- Multiple other params will be automatically selected based on this choice
Without choice you'd have to create different hparams files for all the permutations and combinations.
| return result | ||
|
|
||
|
|
||
| class ArchiveTrainLogger: |
There was a problem hiding this comment.
This could clearly be useful, do you know if it works in a DDP setting?
There was a problem hiding this comment.
If not, it might be that @main_process_only should be added to the wrting functions.
| archive_path : str | ||
| The path to the archive. It will be created if it does not exist | ||
| and opened for appending as needed if it does |
There was a problem hiding this comment.
What happens if someone runs a second experiment from the same location as the first experiment. Would it overwrite or just append the results?
There was a problem hiding this comment.
I'm a little fuzzy on the value-add for this file on top of our current metric engines. What is this file accomplishing?
There was a problem hiding this comment.
This file isn't really hparams. I guess its fine to put it here, but perhaps we could consider using another directory called tokens or data or something.
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
What does this PR do?
Introduces a simple TTS architecture based on discrete speech representations from self-supervised models
PR review
Reviewer checklist