Large audio tagging models are usually trained or pre-trained on AudioSet, a dataset that covers a large number of sound classes and acoustic environments. Knowledge distillation has emerged as a method to compress such models without compromising their effectiveness. Audio tagging has many different applications, some of which require specialization to a narrow domain of sounds. For these scenarios, it is beneficial to distill the large audio tagger with respect to the specific subset of sounds of interest. We present a method to prune a general dataset with respect to a target dataset. By distilling with such a specialized pruned dataset, we obtain a compressed model with better classification accuracy in the specific target domain than with target-agnostic distillation.
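The abstract does not spell out the pruning criterion, so the following is only a minimal sketch of the general idea: score the clips of a general dataset (e.g. AudioSet) by how relevant the teacher's predictions are to the target classes, keep the highest-scoring fraction, and distill the student on that subset. The function and parameter names (prune_general_dataset, target_class_ids, keep_fraction) and the relevance score itself are illustrative assumptions, not the method from the paper.

```python
import torch
import torch.nn.functional as F


def prune_general_dataset(teacher, general_loader, target_class_ids,
                          keep_fraction=0.2, device="cpu"):
    """Keep the fraction of general-dataset clips that the teacher deems most
    relevant to the target classes. The relevance score used here (probability
    mass on the target classes) is an assumption, not the paper's criterion."""
    teacher.eval()
    scores = []
    with torch.no_grad():
        for audio, _ in general_loader:  # loader must not shuffle, so positions
            probs = torch.sigmoid(teacher(audio.to(device)))  # map to dataset indices
            scores.append(probs[:, target_class_ids].sum(dim=1).cpu())
    scores = torch.cat(scores)
    n_keep = int(keep_fraction * len(scores))
    return scores.topk(n_keep).indices  # indices into the general dataset


def distillation_loss(student_logits, teacher_logits):
    """Multi-label distillation: binary cross-entropy of the student against
    the teacher's sigmoid outputs used as soft targets."""
    with torch.no_grad():
        soft_targets = torch.sigmoid(teacher_logits)
    return F.binary_cross_entropy_with_logits(student_logits, soft_targets)
```

A subset built from the returned indices (e.g. via torch.utils.data.Subset) would then serve as the training set for distilling the compressed student model.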

 

Citation:

A. Werning and R. Haeb-Umbach, “Target-Specific Dataset Pruning for Compression of Audio Tagging Models,” in Proc. European Signal Processing Conference (EUSIPCO), Aug. 2024, pp. 61–65, doi: 10.23919/EUSIPCO63174.2024.10714950.

 

More information:

Open source: https://doi.org/10.23919/EUSIPCO63174.2024.10714950