
I. Deep Generative Modeling


[PMLR] [arXiv] [code]

TL;DR: Training vector quantization efficiently and stably within a variational Bayes framework.


[Elsevier] [arXiv]

TL;DR: Generalizing the parameterization of the data variance in Gaussian VAEs to prevent the decoder from producing oversmoothed outputs.



TL;DR: Deriving metrizability conditions for GANs from the perspective of sliced optimal transport, and modifying the maximization problem accordingly.


[arXiv] [code]

TL;DR: Improving the density estimation of diffusion models, with theoretical support, by regularizing them with the underlying equation that describes the temporal evolution of the scores.


Consistency-type Models


TL;DR: Establishing the theoretical equivalence of three consistency concepts for diffusion models, including FP-Diffusion.
(ICML2023 SPIGM workshop)



[arXiv] [code]

TL;DR: Solving blind inverse problems in an unsupervised manner with Denoising Diffusion Restoration Models.
(ICML2023 Oral)

Downstream applications:

II. Multimodal NLP & Commonsense AI



PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)


[EMNLP] [arXiv] [code]

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
(EMNLP2022 Findings)

III. Music & Cinematic Technologies


[arXiv] [code]

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer




CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Automatic Music Tagging


An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification

Vocal Dereverberation

[arXiv] [demo]

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models

Mixing Style Transfer

[arXiv] [code] [demo]

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Music Transcription

[arXiv] [code] [demo]

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Singing Voice Vocoder

[arXiv] [demo]

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Distortion Effect Removal

[poster] [arXiv] [demo]

Distortion Audio Effects: Learning How to Recover the Clean Signal

Automatic Music Mixing

[poster] [arXiv] [code] [demo]

Automatic Music Mixing with Deep Learning and Out-of-Domain Data

Sound Separation


Music Source Separation with Deep Equilibrium Models

Automatic DJ Transition

[arXiv] [code] [demo]

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

Singing Voice Conversion

[arXiv] [demo]

Robust One-Shot Singing Voice Conversion

Sound Separation

[video] [site]

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years


[site] [frontiers]

Music Demixing Challenge 2021

DCASE Challenge

[DCASE Challenge 2023]

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Sound Event Localization and Detection


Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training (ICASSP2022)


Yuki Mitsufuji (