I. Deep Generative Modeling
SQ-VAE

[PMLR] [arXiv] [code]
TL;DR: Training vector quantization efficiently and stably with a variational Bayes framework.
(ICML2022)
ARELBO

[Elsevier] [arXiv]
TL;DR: Generalizing parameterizations of the data variance in Gaussian VAEs to prevent oversmoothing by the decoder.
(Neurocomputing2022)
SAN

[arXiv]
TL;DR: Deriving metrizable conditions for GANs from the perspective of sliced optimal transport and modifying the maximization problems.
FP-Diffusion

[arXiv] [code]
TL;DR: Improving density estimation of diffusion models with a theoretically supported regularizer based on the underlying equation that describes the temporal evolution of scores.
(ICML2023)
Consistency-type Models

[arXiv]
TL;DR: Establishing the theoretical equivalence of three consistency concepts in diffusion models, including FP-Diffusion.
(ICML2023 SPIGM workshop)
II. Multimodal NLP & Commonsense AI
PeaCoK

[arXiv]
PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)
III. Music & Cinematic Technologies
hFT-Transformer

[arXiv] [code]
Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
(ISMIR2023)
CLIPSep

[OpenReview]
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
(ICLR2023)
Automatic Music Tagging

[arXiv]
An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification
(ICASSP2023)
Vocal Dereverberation

[arXiv] [demo]
Unsupervised Vocal Dereverberation with Diffusion-based Generative Models
(ICASSP2023)
Mixing Style Transfer

[arXiv] [code] [demo]
Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
(ICASSP2023)
Music Transcription

[arXiv] [code] [demo]
DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
(ICASSP2023)
Singing Voice Vocoder

[arXiv] [demo]
Hierarchical Diffusion Models for Singing Voice Neural Vocoder
(ICASSP2023)
Distortion Effect Removal

[poster] [arXiv] [demo]
Distortion Audio Effects: Learning How to Recover the Clean Signal
(ISMIR2022)
Automatic Music Mixing

[poster] [arXiv] [code] [demo]
Automatic Music Mixing with Deep Learning and Out-of-Domain Data
(ISMIR2022)
Automatic DJ Transition

[arXiv] [code] [demo]
Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
(ICASSP2022)
Sound Separation

[video] [site]
Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years
DCASE Challenge

[DCASE Challenge 2023]
Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes
Sound Event Localization and Detection

[arXiv]
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
(ICASSP2022)