I. Deep Generative Modeling


Consistency Trajectory Model (CTM)

[arXiv] [project page]

A unified framework that enables diverse samplers and state-of-the-art 1-step generation
(ICLR2024)

SAN

[arXiv] [code] [project page]

Enhancing GAN with metrizable discriminators
(ICLR2024)

Applications:
[Vocoder]

MPGD

[arXiv] [project page]

A fast, efficient, training-free, and controllable diffusion-based generation method
(ICLR2024)

HQ-VAE

[OpenReview] [arXiv]

Generalizing hierarchical VQ-VAEs with a Bayesian framework
(TMLR2024)

FP-Diffusion

[PMLR] [code]

Improving density estimation of diffusion
(ICML2023)

GibbsDDRM

[PMLR] [code]

Achieving blind inversion using DDPM
(ICML2023 Oral)

Applications:
[DeReverb] [SpeechEnhance]

Consistency-type Models

[arXiv]

A theoretically unified framework for "consistency" in diffusion models
(ICML2023 SPIGM workshop)

SQ-VAE

[PMLR] [arXiv] [code]

Improving codebook utilization and training stability
(ICML2022)

AR-ELBO

[Elsevier] [arXiv]

Mitigating oversmoothness in VAEs
(Neurocomputing2022)

II. Multimodal NLP & Commonsense AI


CPD Challenge 2023

[CPD Challenge 2023]

Commonsense Persona-grounded Dialogue Challenge

PeaCoK

[ACL] [arXiv] [code]

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
(ACL2023, Outstanding Paper Award)

ComFact

[EMNLP] [arXiv] [code]

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge
(EMNLP2022 Findings)

III. Music & Cinematic Technologies


STARSS23

[arXiv] [Dataset]

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
(NeurIPS2023)

BigVSAN Vocoder

[arXiv] [code] [demo]

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network
(ICASSP2024)

Instrument-Agnostic Transcription

[arXiv]

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription
(ICASSP2024)

Vocal Restoration

[arXiv]

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance
(ICASSP2024)

Zero-/Few-shot SELD

[arXiv]

Zero- and Few-shot Sound Event Localization and Detection
(ICASSP2024)

CLIPSep

[OpenReview] [arXiv] [code] [demo]

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
(ICLR2023)

hFT-Transformer

[arXiv] [code]

Automatic Piano Transcription with Hierarchical Frequency-Time Transformer
(ISMIR2023)

Audio Restoration: ViT-AE

[IEEE] [arXiv] [demo]

Extending Audio Masked Autoencoders Toward Audio Restoration
(WASPAA2023)

Diffiner

[ISCA] [arXiv] [code]

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
(INTERSPEECH2023)

Automatic Music Tagging

[arXiv]

An Attention-based Approach To Hierarchical Multi-label Music Instrument Classification
(ICASSP2023)

Vocal Dereverberation

[arXiv] [demo]

Unsupervised Vocal Dereverberation with Diffusion-based Generative Models
(ICASSP2023)

Mixing Style Transfer

[arXiv] [code] [demo]

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects
(ICASSP2023)

Music Transcription

[arXiv] [code] [demo]

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
(ICASSP2023)

Singing Voice Vocoder

[arXiv] [demo]

Hierarchical Diffusion Models for Singing Voice Neural Vocoder
(ICASSP2023)

Distortion Effect Removal

[poster] [arXiv] [demo]

Distortion Audio Effects: Learning How to Recover the Clean Signal
(ISMIR2022)

Automatic Music Mixing

[poster] [arXiv] [code] [demo]

Automatic Music Mixing with Deep Learning and Out-of-Domain Data
(ISMIR2022)

Sound Separation

[IEEE]

Music Source Separation with Deep Equilibrium Models
(ICASSP2022)

Automatic DJ Transition

[arXiv] [code] [demo]

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
(ICASSP2022)

Sound Event Localization and Detection

[IEEE] [arXiv]

Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
(ICASSP2022)

Singing Voice Conversion

[arXiv] [demo]

Robust One-Shot Singing Voice Conversion

Sound Separation

[video] [site]

Glenn Gould and Kanji Ishimaru 2021: A collaboration with AI Sound Separation after 60 years

MDX21

[site] [frontiers]

Music Demixing Challenge 2021

DCASE Challenge

[DCASE Challenge 2023]

Sound Event Localization and Detection Evaluated in Real Spatial Sound Scenes

Contact

Yuki Mitsufuji (yuhki.mitsufuji@sony.com)