Added LMAC-TD [ICASSP'25] by fpaissan · Pull Request #2838 · speechbrain/speechbrain

fpaissan · 2025-02-27T16:46:29Z

What does this PR do?

Adds LMAC-TD, a novel method for explaining speech and audio models.

Before submitting

Did you read the contributor guideline?
Did you make sure your PR does only one thing, instead of bundling different changes together?
Did you make sure to update the documentation with your changes? (if necessary)
Did you write any new necessary tests? (not for typos and docs)
Did you verify new and existing tests pass locally with your changes?
Did you list all the breaking changes introduced by this pull request?
Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist

Is this pull request ready for review? (if not, please submit in draft mode)
Check that all items from Before submitting are resolved
Make sure the title is self-explanatory and the description concisely explains the PR
Add labels and milestones (and optionally projects) to the PR so it can be classified
Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
Review the self-review checklist to ensure the code is ready for review

helemanc · 2025-07-09T15:20:07Z

Here's a detailed description of all the changes in this PR:

Update ESC50 recipe: Add LMAC-TD support

This PR enhances the ESC50 recipe with LMAC-TD (Time Domain) interpretation capabilities and modernizes the data preparation script.

Changes to esc50_prepare.py:

Updated repository URL to point to the current ESC-50 repository (karoldvl/ESC-50)
Replaced custom SpeechBrain logger with standard Python logging
Simplified data fetching by removing unnecessary parameters
Fixed file path handling in archive extraction
Cleaned up JSON file writing
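Moving to standard Python logging usually means a module-level `logging.getLogger(__name__)` and plain `json.dump` for the manifests. The sketch below shows that pattern. It assumes a hypothetical `write_manifest` helper and an illustrative per-clip schema (`wav`, `classname`, `fold`). It is not the exact code or schema in `esc50_prepare.py`.

```python
import json
import logging
import os
import tempfile

# Standard-library logger in place of a framework-specific one.
logger = logging.getLogger(__name__)


def write_manifest(entries, json_path):
    """Write {utt_id: metadata} entries to a JSON manifest and log the count.

    Hypothetical helper: the schema is illustrative, not the exact one
    produced by esc50_prepare.py.
    """
    os.makedirs(os.path.dirname(os.path.abspath(json_path)), exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
    logger.info("Wrote %d entries to %s", len(entries), json_path)
    return json_path


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Illustrative entry named after an ESC-50 clip (fold 1, class "dog").
    example = {
        "1-100032-A-0": {"wav": "audio/1-100032-A-0.wav", "classname": "dog", "fold": 1},
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = write_manifest(example, os.path.join(tmp, "train.json"))
        with open(path, encoding="utf-8") as f:
            print(json.load(f))
```

Because the logger is named after the module, applications can silence or redirect it through the normal `logging` configuration.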

Changes to interpret/README.md:

Added comprehensive LMAC-TD documentation section
Included a performance metrics table with LMAC-TD results for different alpha values
Added training instructions for LMAC-TD with ESC50 dataset
Documented WHAM! noise augmentation option for improved interpretations
Added citation and references to LMAC-TD ICASSP paper and companion website
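Noise augmentation of this kind usually mixes a noise clip into the input at a chosen signal-to-noise ratio. The sketch below shows generic SNR-controlled mixing. It is an assumption about the technique, not the recipe's WHAM! loader. It uses plain Python lists in place of tensors and truncates the noise to the clean length, where a real pipeline might pick a random offset into the noise clip.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a non-empty sequence."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[: len(clean)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; SNR is undefined")
    # SNR_dB = 20 * log10(rms_clean / rms_noise), solved for the target noise RMS.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [c + scale * n for c, n in zip(clean, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    mixture = mix_at_snr(clean, noise, snr_db=5.0)
    residual = [m - c for m, c in zip(mixture, clean)]
    print(20 * math.log10(rms(clean) / rms(residual)))  # ~5.0
```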

Changes to interpret/eval.py:

Added support for LMAC-TD evaluation alongside existing LMAC and L2I methods
Implemented conditional model loading based on interpretation method
Added LMAC-TD-specific single-sample processing with time-domain output
Updated import statements and class references for LMAC-TD
Enhanced evaluation workflow to handle different interpreter architectures
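Loading the model conditionally on the interpretation method can be written as a small dispatch table. The sketch below is hypothetical throughout. The loader functions, the dictionary they return and the method keys are placeholders, and `eval.py` may branch differently, for example with `if`/`else` on an hparams field. The only claim carried over from the PR is that the LMAC-TD path produces time-domain output.

```python
from typing import Callable, Dict


# Hypothetical loaders: in a real recipe each would build the interpreter
# and classifier modules from its hparams file.
def load_lmac() -> dict:
    return {"method": "lmac", "time_domain_output": False}


def load_l2i() -> dict:
    return {"method": "l2i", "time_domain_output": False}


def load_lmactd() -> dict:
    return {"method": "lmactd", "time_domain_output": True}


LOADERS: Dict[str, Callable[[], dict]] = {
    "lmac": load_lmac,
    "l2i": load_l2i,
    "lmactd": load_lmactd,
}


def load_interpreter(method: str) -> dict:
    """Dispatch to the loader for `method`, failing loudly on unknown names."""
    try:
        loader = LOADERS[method]
    except KeyError:
        raise ValueError(
            f"Unknown interpretation method {method!r}; expected one of {sorted(LOADERS)}"
        ) from None
    return loader()
```

A table keeps each method's setup in one place. Supporting another interpreter then means adding one entry instead of another branch in the evaluation loop.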

Changes to interpret/hparams/lmactd_cnn14.yaml:

Created an LMAC-TD-specific hyperparameter configuration with the settings presented in the paper
Updated experiment name and method settings for LMAC-TD
Reduced the model size (fewer layers and attention heads, smaller FFN dimensions)
Updated STFT parameters (n_fft increased to 2048)
Note for users with limited VRAM: You can reduce memory usage by decreasing N_encoder_out, out_channels, and d_ffn parameters of the SBTransformerBlocks
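The change to `n_fft = 2048` can be checked with a little arithmetic. A one-sided STFT has `n_fft // 2 + 1` frequency bins spaced `sample_rate / n_fft` Hz apart. The PR does not state the sample rate, so the sketch treats it as a parameter and computes values for 16 kHz and 44.1 kHz only as examples.

```python
def stft_bins(n_fft, sample_rate):
    """Return (number of one-sided frequency bins, bin spacing in Hz)."""
    if n_fft <= 0 or sample_rate <= 0:
        raise ValueError("n_fft and sample_rate must be positive")
    return n_fft // 2 + 1, sample_rate / n_fft


if __name__ == "__main__":
    # Example sample rates only; the recipe's actual rate is set in its hparams.
    for sr in (16000, 44100):
        bins, spacing = stft_bins(2048, sr)
        print(f"sr={sr}: {bins} bins, {spacing:.4f} Hz apart")
```

For a fixed hop length, doubling `n_fft` roughly doubles the bins per frame. That raises the memory used by any layer that consumes the full spectrogram, which is one reason the VRAM note above matters.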

Changes to interpret/train_lmactd.py:

Added LMACTD class
Updated training recipe comments and documentation for LMAC-TD
Simplified mask processing
Enhanced computation steps for time-domain interpretation
Updated author information and maintained compatibility with existing workflow
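Mask-based interpreters like LMAC apply a mask in [0, 1] element-wise to a representation of the input and often look at the masked part and its complement. The sketch below assumes that pattern and shows only the masking step. Which representation LMAC-TD masks, and how the result is decoded back to audio, is not stated in this PR. Plain lists stand in for tensors, and `split_by_mask` is a hypothetical name.

```python
def split_by_mask(representation, mask):
    """Return (masked, complement) for an element-wise mask with values in [0, 1].

    The two parts always sum back to the original representation.
    """
    if len(representation) != len(mask):
        raise ValueError("representation and mask must have the same length")
    if any(m < 0.0 or m > 1.0 for m in mask):
        raise ValueError("mask values must lie in [0, 1]")
    masked = [m * x for m, x in zip(mask, representation)]
    complement = [(1.0 - m) * x for m, x in zip(mask, representation)]
    return masked, complement


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0]
    m = [1.0, 0.0, 0.25]
    kept, removed = split_by_mask(x, m)
    print(kept, removed)
```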

New additions to speechbrain/lobes/models/Cnn14.py:

Added CNN14PSI_encoderdecoder class
Implemented progressive channel reduction architecture
Provided comprehensive documentation and usage examples
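A progressive channel reduction can be described as a schedule of layer widths that shrinks by a fixed factor at each stage. The sketch below is a hypothetical helper for computing such a schedule. The PR does not list the actual widths or reduction factor of `CNN14PSI_encoderdecoder`, so the starting width of 2048 and the factor of 2 in the example are assumptions.

```python
from typing import List, Tuple


def channel_schedule(in_channels: int, out_channels: int, factor: int = 2) -> List[int]:
    """Divide the width by `factor` at each stage until `out_channels` is reached.

    If `out_channels` is not an exact step of the schedule, the final
    stage jumps straight to `out_channels`.
    """
    if factor < 2:
        raise ValueError("factor must be >= 2")
    if not 0 < out_channels <= in_channels:
        raise ValueError("need 0 < out_channels <= in_channels")
    channels = [in_channels]
    while channels[-1] // factor >= out_channels:
        channels.append(channels[-1] // factor)
    if channels[-1] != out_channels:
        channels.append(out_channels)
    return channels


def layer_shapes(channels: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive widths into (in, out) shapes for successive layers."""
    return list(zip(channels[:-1], channels[1:]))


if __name__ == "__main__":
    widths = channel_schedule(2048, 64)
    print(widths)
    print(layer_shapes(widths))
```

Halving the width at each stage keeps the per-layer cost falling as the decoder approaches the output.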

Updates to speechbrain/lobes/models/dual_path.py:

Added configurable activation function support to Dual_Path_Model
Implemented sigmoid activation option alongside existing ReLU
Enhanced model flexibility for different interpretation methods
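A configurable mask activation is typically a name-to-function lookup. In `Dual_Path_Model` it would select a PyTorch module. The sketch below uses scalar Python functions, and `get_activation` is a hypothetical name rather than the argument the PR adds. The choice matters for interpretation because sigmoid bounds mask values to (0, 1), while ReLU is unbounded above and can amplify parts of the input.

```python
import math
from typing import Callable, Dict


def relu(x: float) -> float:
    """Unbounded above: mask values can exceed 1."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function; output lies in [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)


ACTIVATIONS: Dict[str, Callable[[float], float]] = {"relu": relu, "sigmoid": sigmoid}


def get_activation(name: str) -> Callable[[float], float]:
    """Look up a mask activation by name (case-insensitive)."""
    try:
        return ACTIVATIONS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported activation {name!r}; choose from {sorted(ACTIVATIONS)}"
        ) from None
```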

File cleanup:

Removed obsolete lmac_sepformerstuff.py file (1,700 lines deleted)

These changes add new time-domain interpretation capabilities while ensuring compatibility with the current ESC-50 dataset repository and improving code maintainability. The LMAC-TD method provides enhanced interpretability for audio classifiers with direct time-domain explanations.

@fpaissan could you update the PR description with this and mark as ready for review when the checks pass?

helemanc and others added 5 commits February 27, 2025 17:43

Added lmactd things (a515287)
Updated code for lmactd (254b97b)
Updated example in CNN14PSI_encoder_decoder (3bebaaa)
Updated citation to lmactd (022aefc)
Update lmactd_cnn14.yaml (5fe2a01)

TParcollet added this to the v1.1.0 milestone Oct 9, 2025

fpaissan wants to merge 5 commits into speechbrain:develop from fpaissan:icass25_lmactd


3 participants
