I. Deep Generative Modeling


TL;DR: Training vector quantization efficiently and stably within a variational Bayes framework.


TL;DR: Generalizing parameterizations of the data variance in Gaussian VAEs to prevent oversmoothing by the decoder.



TL;DR: Deriving metrizable conditions for GANs from the perspective of sliced optimal transport and modifying the maximization problems accordingly.


TL;DR: Improving density estimation of diffusion models by regularizing with the underlying equation that describes the temporal evolution of scores, with theoretical support.


Consistency-type Models


TL;DR: Establishing the theoretical equivalence of three consistency concepts for diffusion models, including FP-Diffusion.
(ICML2023 SPIGM workshop)

TL;DR: Solving blind inverse problems without supervision using Denoising Diffusion Restoration Models.
(ICML2023 Oral)

Downstream applications:

II. Multimodal NLP & Commonsense AI



PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)


ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
(EMNLP2022 Findings)

III. Music & Cinematic Technologies


Automatic Piano Transcription with Hierarchical Frequency-Time Transformer




CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Automatic Music Tagging


An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification

Vocal Dereverberation

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models

Mixing Style Transfer

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Music Transcription

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Singing Voice Vocoder

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Distortion Effect Removal

Distortion Audio Effects: Learning How to Recover the Clean Signal

Automatic Music Mixing

Automatic Music Mixing with Deep Learning and Out-of-Domain Data

Sound Separation


Music Source Separation with Deep Equilibrium Models

Automatic DJ Transition

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

Singing Voice Conversion

Robust One-Shot Singing Voice Conversion

Sound Separation

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years


Music Demixing Challenge 2021

DCASE Challenge

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Sound Event Localization and Detection


Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training (ICASSP2022)


