
Added LMAC-TD [ICASSP'25] #2838

Draft
fpaissan wants to merge 5 commits into speechbrain:develop from fpaissan:icass25_lmactd


@fpaissan
Collaborator

What does this PR do?

Adds LMAC-TD, a novel method for explaining speech and audio models.

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@helemanc

helemanc commented Jul 9, 2025

Here's a detailed description of all the changes in this PR:

Update ESC50 recipe: Add LMAC-TD support

This PR enhances the ESC50 recipe with LMAC-TD (Time Domain) interpretation capabilities and modernizes the data preparation script.

Changes to esc50_prepare.py:

  • Updated repository URL to point to the current ESC-50 repository (karoldvl/ESC-50)
  • Replaced custom SpeechBrain logger with standard Python logging
  • Simplified data fetching by removing unnecessary parameters
  • Fixed file path handling in archive extraction
  • Cleaned up JSON file writing
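
For reviewers unfamiliar with the logger change, the standard-library pattern it refers to typically looks like the sketch below. This is an illustration only, not the code from `esc50_prepare.py`; the helper `report_prepared` and its message are hypothetical.

```python
import logging

# Module-level logger from the standard library. Handlers and log levels
# are left to whichever application imports this module.
logger = logging.getLogger(__name__)


def report_prepared(n_files: int) -> str:
    """Hypothetical helper: log how many files were prepared."""
    message = f"Prepared {n_files} audio files"
    logger.info(message)
    return message
```

Using `logging.getLogger(__name__)` keeps the script free of any SpeechBrain-specific logging setup, so it can be imported or run on its own.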

Changes to interpret/README.md:

  • Added comprehensive LMAC-TD documentation section
  • Included performance metrics table with LMAC-TD results across different alpha values
  • Added training instructions for LMAC-TD with ESC50 dataset
  • Documented WHAM! noise augmentation option for improved interpretations
  • Added citation and references to LMAC-TD ICASSP paper and companion website

Changes to interpret/eval.py:

  • Added support for LMAC-TD evaluation alongside existing LMAC and L2I methods
  • Implemented conditional model loading based on interpretation method
  • Added LMAC-TD-specific single sample processing with time-domain output
  • Updated import statements and class references for LMAC-TD
  • Enhanced evaluation workflow to handle different interpreter architectures
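
One common way to implement method-dependent loading is a small dispatch table, sketched below. Everything here is an assumption made for illustration: the method keys (`lmac`, `l2i`, `lmactd`), the loader functions, and their return values are hypothetical and are not taken from `eval.py`.

```python
from typing import Callable, Dict


# Hypothetical loaders standing in for the real model-construction code.
def load_lmac() -> str:
    return "LMAC interpreter"


def load_l2i() -> str:
    return "L2I interpreter"


def load_lmactd() -> str:
    return "LMAC-TD interpreter"


# Hypothetical mapping from method name to loader.
INTERPRETER_LOADERS: Dict[str, Callable[[], str]] = {
    "lmac": load_lmac,
    "l2i": load_l2i,
    "lmactd": load_lmactd,
}


def load_interpreter(method: str) -> str:
    """Pick the loader for the requested interpretation method."""
    try:
        loader = INTERPRETER_LOADERS[method.lower()]
    except KeyError:
        known = ", ".join(sorted(INTERPRETER_LOADERS))
        raise ValueError(f"Unknown method '{method}'. Expected one of: {known}") from None
    return loader()
```

A table like this fails loudly on an unrecognized method name instead of silently falling through to a default branch.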

Changes to interpret/hparams/lmactd_cnn14.yaml:

  • Created LMAC-TD specific hyperparameters configuration with settings presented in the paper
  • Updated experiment name and method settings for LMAC-TD
  • Reduced the model architecture size (fewer layers and heads, smaller FFN dimensions)
  • Updated STFT parameters (n_fft increased to 2048)
  • Note for users with limited VRAM: You can reduce memory usage by decreasing N_encoder_out, out_channels, and d_ffn parameters of the SBTransformerBlocks
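
To make the effect of the STFT change concrete, the sketch below computes what `n_fft = 2048` implies for a one-sided spectrogram. The 16 kHz sample rate and the assumption that the analysis window equals `n_fft` are illustrative and are not taken from `lmactd_cnn14.yaml`; the function name `stft_summary` is hypothetical.

```python
def stft_summary(n_fft: int, sample_rate: int) -> dict:
    """Basic quantities implied by an FFT size for a one-sided STFT."""
    return {
        # A real-valued signal yields n_fft // 2 + 1 non-redundant bins.
        "freq_bins": n_fft // 2 + 1,
        # Spacing between adjacent frequency bins, in Hz.
        "bin_width_hz": sample_rate / n_fft,
        # Duration covered by one window, assuming window length == n_fft.
        "window_ms": 1000.0 * n_fft / sample_rate,
    }


# Assumed sample rate of 16 kHz, for illustration only.
summary = stft_summary(n_fft=2048, sample_rate=16000)
```

Under these assumptions each frame has 1025 frequency bins spaced 7.8125 Hz apart and spans 128 ms. A larger `n_fft` gives finer frequency resolution at the cost of coarser time resolution and more values per frame.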

Changes to interpret/train_lmactd.py:

  • Added LMACTD class
  • Updated training recipe comments and documentation for LMAC-TD
  • Simplified mask processing
  • Enhanced computation steps for time-domain interpretation
  • Updated author information and maintained compatibility with existing workflow

New additions to speechbrain/lobes/models/Cnn14.py:

  • Added CNN14PSI_encoderdecoder class
  • Implemented progressive channel reduction architecture
  • Provided comprehensive documentation and usage examples

Updates to speechbrain/lobes/models/dual_path.py:

  • Added configurable activation function support to Dual_Path_Model
  • Implemented sigmoid activation option alongside existing ReLU
  • Enhanced model flexibility for different interpretation methods
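
The practical difference between the two activations is the range of the values they produce: sigmoid keeps every value strictly between 0 and 1, while ReLU only clips negatives and leaves positive values unbounded. The sketch below shows that behavior in plain Python. It does not reproduce the `Dual_Path_Model` code; the names `ACTIVATIONS` and `apply_activation`, and the choice of ReLU as the default, are assumptions.

```python
import math
from typing import Callable, Dict, List


def _relu(x: float) -> float:
    return max(0.0, x)


def _sigmoid(x: float) -> float:
    # Numerically stable form that avoids overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Hypothetical registry of selectable activations.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": _relu,
    "sigmoid": _sigmoid,
}


def apply_activation(scores: List[float], activation: str = "relu") -> List[float]:
    """Apply the chosen activation element-wise to raw scores."""
    if activation not in ACTIVATIONS:
        raise ValueError(f"Unsupported activation: {activation}")
    fn = ACTIVATIONS[activation]
    return [fn(s) for s in scores]


raw = [-2.0, 0.0, 3.0]
relu_out = apply_activation(raw, "relu")
sigmoid_out = apply_activation(raw, "sigmoid")
```

Here `relu_out` is `[0.0, 0.0, 3.0]`, while every entry of `sigmoid_out` lies strictly between 0 and 1, with the middle entry equal to 0.5.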

File cleanup:

  • Removed obsolete lmac_sepformerstuff.py file (1,700 lines deleted)

Together, these changes add time-domain interpretation to the ESC50 recipe, keep data preparation working against the current ESC-50 dataset repository, and remove obsolete code. LMAC-TD produces its explanations for audio classifiers directly in the time domain.

@fpaissan could you update the PR description with this and mark as ready for review when the checks pass?

TParcollet added this to the v1.1.0 milestone on Oct 9, 2025