Tokotron: Tokenized TTS by flexthink · Pull Request #2696 · speechbrain/speechbrain

flexthink · 2024-09-24T13:17:23Z

What does this PR do?

Introduces a simple TTS architecture based on discrete speech representations from self-supervised models.

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review


pplantinga

Overall, I think this will make a valuable addition to the toolkit; tokenized TTS is a great project. This is quite a large PR and hard to review in its entirety, but I have a few initial thoughts.

I tried to run LibriTTS using the "lite" configuration, and the training time was surprisingly reasonable (3 minutes or so per epoch), but the validation time was very long, >2 hours per epoch. I wonder if it has anything to do with the warning message: .../site-packages/torch/nn/functional.py:5193: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
I don't see any readme/documentation/tutorial file that explains Tokotron and what this model is trying to accomplish. At the very least, we may want a README with a link to a paper explaining the model and any results.
I'm wondering if some of these changes (like eval.py or preparation.py) may make sense to move to a separate PR. They seem like bigger changes that are a bit unrelated to Tokotron.

Again, I believe this will be a nice addition to the toolkit, but it still requires some work and more review due to the size.

speechbrain/lobes/models/eval/utmos.py

speechbrain/nnet/loss/guidedattn_loss.py

pplantinga · 2024-10-30T18:40:00Z

speechbrain/utils/hparams.py

    default: any
        the default value
+    apply: bool
+        if set to true, the value is expected to


It could be confusing what value means here since value is also the name of one of the arguments. Come to think of it, perhaps index would be a better name for the argument.

Actually, in this case it is not an index... The way choice is used is as follows.

    spk_emb_discrete_src: !apply:speechbrain.utils.hparams.choice
        value: !ref <ssl_model_type>
        choices:
            wavlm: flexthink/discrete_wavlm_spk_rec_ecapatdn_lite
            hubert: flexthink/discrete_hubert_spk_rec_ecapatdn_lite
            wav2vec2: flexthink/discrete_wav2vec2_spk_rec_ecapatdn_lite

So the value indicates the value to be mapped using choices... Perhaps it could be renamed to key by analogy to dictionary keys but it is not a numeric index.

The most typical use case is this:

You have an hparam indicating, for instance, the type of base model to use

Multiple other params will be automatically selected based on this choice

Without choice you'd have to create different hparams files for all the permutations and combinations.
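To make the mapping concrete, here is a minimal sketch of what such a helper could look like, assuming it is a plain dictionary lookup with an optional fallback. It is an illustration only and not the implementation from this PR: the `apply` argument is left out, and the real signature may differ.

```python
def choice(value, choices, default=None):
    """Illustrative stand-in: map `value` to an entry of `choices`.

    value: the key to look up (for example, the selected SSL model type)
    choices: a dict from possible keys to the objects they select
    default: returned when `value` is not among the keys
    """
    return choices.get(value, default)


# The mapping from the YAML snippet in the comment above.
SPK_EMB_SOURCES = {
    "wavlm": "flexthink/discrete_wavlm_spk_rec_ecapatdn_lite",
    "hubert": "flexthink/discrete_hubert_spk_rec_ecapatdn_lite",
    "wav2vec2": "flexthink/discrete_wav2vec2_spk_rec_ecapatdn_lite",
}

# One hparam (the model type) drives the selection of a dependent one.
spk_emb_discrete_src = choice("hubert", SPK_EMB_SOURCES)
```

Seen this way, the argument behaves like a dictionary key rather than a numeric index, which is the point made in the reply above.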

pplantinga · 2024-10-30T18:41:16Z

speechbrain/utils/train_logger.py

    return result
+
+
+class ArchiveTrainLogger:


This could clearly be useful, do you know if it works in a DDP setting?

If not, it might be that @main_process_only should be added to the writing functions.
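As a rough illustration of the idea, the sketch below guards a logger's write method so that only rank 0 executes it. It is a simplified stand-in, not SpeechBrain's own decorator: it assumes the rank is exposed through the `RANK` environment variable (as `torchrun` does), and `SketchLogger` is a hypothetical class used only for the example.

```python
import functools
import os


def is_main_process():
    """Return True for rank 0, or when no RANK variable is set."""
    # Assumption: the launcher exports RANK for every worker process.
    return int(os.environ.get("RANK", "0")) == 0


def main_process_only(function):
    """Simplified stand-in: call `function` on the main process only."""

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        if is_main_process():
            return function(*args, **kwargs)
        # Other ranks skip the call, so they never touch the output file.
        return None

    return wrapper


class SketchLogger:
    """Hypothetical logger that keeps its lines in memory."""

    def __init__(self):
        self.lines = []

    @main_process_only
    def log(self, line):
        self.lines.append(line)
```

With a guard like this on every method that writes, several DDP workers would not try to append to the same archive at once.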

speechbrain/utils/train_logger.py

pplantinga · 2024-10-30T18:55:15Z

speechbrain/utils/train_logger.py

+    archive_path : str
+        The path to the archive. It will be created if it does not exist
+        and opened for appending as needed if it does


What happens if someone runs a second experiment from the same location as the first experiment? Would it overwrite or just append the results?
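For reference, the sketch below shows how "create if missing, append otherwise" behaves with Python's standard `zipfile` module. It assumes a ZIP archive, which this excerpt does not confirm for `ArchiveTrainLogger`, and the function names and entry names (`run1/...`, `run2/...`) are hypothetical.

```python
import os
import tempfile
import zipfile


def append_to_archive(archive_path, name, text):
    """Add one text entry, creating the archive if it does not exist."""
    # Mode "a" opens an existing ZIP for appending, or creates a new one.
    with zipfile.ZipFile(archive_path, mode="a") as archive:
        archive.writestr(name, text)


def list_entries(archive_path):
    """Return the entry names stored in the archive, in insertion order."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.namelist()


def demo():
    """Write from two simulated runs into the same archive path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logs.zip")
        append_to_archive(path, "run1/train.txt", "epoch 1")
        append_to_archive(path, "run2/train.txt", "epoch 1")
        return list_entries(path)
```

Under these assumptions a second run adds entries next to those of the first instead of overwriting them, so distinct entry names per run would be needed to tell the runs apart.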

speechbrain/lobes/models/discrete/Tokotron.py

pplantinga · 2024-10-31T14:05:02Z

speechbrain/inference/eval.py

I'm a little fuzzy on the value-add for this file on top of our current metric engines. What is this file accomplishing?

pplantinga · 2024-10-31T14:07:13Z

recipes/LJSpeech/TTS/tokotron/hparams/arpabet.txt

This file isn't really hparams. I guess it's fine to put it here, but perhaps we could consider using another directory called tokens or data or something.

Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>

flexthink added 4 commits September 18, 2024 14:14

Tokotron: Initial import: single-speaker

4449984

Tokotron: Add multispeaker support

ba3f452

Tokotron: Update batch size

92532ad

Tokotron: Add multispeaker

97868fd

flexthink requested a review from mravanelli September 24, 2024 13:17

flexthink added 7 commits October 3, 2024 13:48

Tokotron: Update to pass consistency and recipe tests

7f7b1e4

Tokotron: Cosmetic changes to pass pre-commit

7e283b5

Merge branch 'develop' into tokotron

c9b916d

Tokotron: Fixes

5cb0e39

Tokotron: Fixes

57f706e

Tokotron: Fix an example

cdc66cd

Tokotron: Add a docstring example to test Tokotron globally

d036ad4

flexthink marked this pull request as ready for review October 4, 2024 00:21

flexthink and others added 5 commits October 7, 2024 22:46

Tokotron: Fixes for train loggers

94fac68

Tokotron: Fixes

6f76c61

Merge branch 'develop' into tokotron

5bf766c

Tokotron: Add filtering / select_n for the validation set, fixes

c2e41e2

Merge branch 'tokotron' of https://github.com/flexthink/speechbrain i…

ee140ca

…nto tokotron

mravanelli requested a review from pplantinga October 8, 2024 20:05

mravanelli assigned flexthink Oct 8, 2024

mravanelli added the enhancement label (New feature or request) Oct 8, 2024

flexthink and others added 2 commits October 9, 2024 14:47

Tokotron: Sub-sampling for validation, logging improvements

012d030

Merge branch 'develop' into tokotron

2683a80

pplantinga requested changes Oct 31, 2024


flexthink and others added 4 commits November 12, 2024 10:35

Update speechbrain/lobes/models/eval/utmos.py

26b9edb

Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>

Update speechbrain/nnet/loss/guidedattn_loss.py

00252c5

Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>

Update speechbrain/lobes/models/discrete/Tokotron.py

14b4a6a

Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>

Update speechbrain/utils/train_logger.py

132f982

Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>

flexthink mentioned this pull request Mar 4, 2025

Tokotron: Tokenized TTS (lite version - minimal dependencies) #2849

Draft

13 tasks


flexthink wants to merge 22 commits into speechbrain:develop from flexthink:tokotron


3 participants
