PHANTOM
🇮🇳 IN
Skip to content

Tokotron: Tokenized TTS#2696

Open
flexthink wants to merge 22 commits intospeechbrain:developfrom
flexthink:tokotron
Open

Tokotron: Tokenized TTS#2696
flexthink wants to merge 22 commits intospeechbrain:developfrom
flexthink:tokotron

Conversation

@flexthink
Copy link
Collaborator

@flexthink flexthink commented Sep 24, 2024

What does this PR do?

Introduces a simple TTS architecture based on discrete speech representations from self-supervised models

  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@flexthink flexthink marked this pull request as ready for review October 4, 2024 00:21
@mravanelli mravanelli requested a review from pplantinga October 8, 2024 20:05
@mravanelli mravanelli added the enhancement New feature or request label Oct 8, 2024
Copy link
Collaborator

@pplantinga pplantinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this will make a valuable addition to the toolkit, tokenized TTS is a great project. This is quite a large PR and hard to review in its entirety but I have a few initial thoughts.

  • I tried to run LibriTTS using "lite" configuration, and the training time was surprisingly reasonable (3 minutes or so) per epoch, but the validation time was very long, >2 hours per epoch. I wonder if it has anything to do with the warning message: .../site-packages/torch/nn/functional.py:5193: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  • I don't see any readme/documentation/tutorial file that explains tokotron and what this model is trying to accomplish. We may want to at the very least have a README with a link to a paper explaining the model and any results.
  • I'm wondering if some of these changes (like eval.py or preparation.py) may make sense to move to a separate PR. They seem like bigger changes that are a bit unrelated to Tokotron.

Again, I believe this will be a nice addition to the toolkit, but it still requires some work and more review due to the size.

default: any
the default value
apply: bool
if set to true, the value is expected to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be confusing what value means here since value is also the name of one of the arguments. Come to think of it, perhaps index would be a better name for the argument.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in this case it is not an index... The way choice is used is as follows.

spk_emb_discrete_src: !apply:speechbrain.utils.hparams.choice
    value: !ref <ssl_model_type>
    choices:
        wavlm: flexthink/discrete_wavlm_spk_rec_ecapatdn_lite
        hubert: flexthink/discrete_hubert_spk_rec_ecapatdn_lite
        wav2vec2: flexthink/discrete_wav2vec2_spk_rec_ecapatdn_lite

So the value indicates the value to be mapped using choices... Perhaps it could be renamed to key by analogy to dictionary keys but it is not a numeric index.

The most typical use case for this is this:

  • You have an hparam indicating, for instance, the type of base model to use
  • Multiple other params will be automatically selected based on this choice

Without choice you'd have to create different hparams files for all the permutations and combinations.

return result


class ArchiveTrainLogger:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could clearly be useful, do you know if it works in a DDP setting?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, it might be that @main_process_only should be added to the wrting functions.

Comment on lines +498 to +500
archive_path : str
The path to the archive. It will be created if it does not exist
and opened for appending as needed if it does
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if someone runs a second experiment from the same location as the first experiment. Would it overwrite or just append the results?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little fuzzy on the value-add for this file on top of our current metric engines. What is this file accomplishing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file isn't really hparams. I guess its fine to put it here, but perhaps we could consider using another directory called tokens or data or something.

flexthink and others added 4 commits November 12, 2024 10:35
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Co-authored-by: Peter Plantinga <plantinga.peter@proton.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants