Publications

Record Once, Post Everywhere: Automatic Shortening of Audio Stories for Social Media

Bryan Wang, Zeyu Jin, Gautham J. Mysore

ACM Symposium on User Interface Software and Technology (UIST), 2022

paper | webpage

Emotion Embedding Spaces for Matching Music to Stories

Minz Won, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore, Xavier Serra

International Society for Music Information Retrieval Conference (ISMIR), 2021 paper

Best Student Paper Award

Audio-Based Music Structure Analysis: Current Trends, Open Challenges, and Applications

Oriol Nieto, Gautham J. Mysore, Cheng-i Wang, Jordan B. L. Smith, Jan Schlüter, Thomas Grill, Brian McFee

Transactions of the International Society for Music Information Retrieval (TISMIR), 2020 paper

Controllable Neural Prosody Synthesis

Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore

Interspeech, 2020 paper

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

Interspeech, 2020 paper | webpage

Nominated for a Best Student Paper Award

Recent Advances in Music Signal Processing

Meinard Müller, Bryan Pardo, Gautham J. Mysore, Vesa Välimäki

IEEE Signal Processing Magazine, 2019 paper

B-Script: Transcript-based B-roll Video Editing with Recommendations

Bernd Huber, Hijung Shin, Bryan Russell, Oliver Wang, Gautham J. Mysore

ACM Conference on Human Factors in Computing Systems (CHI), 2019

paper | webpage | blog post

VoiceAssist: Guiding Users to High-Quality Voice Recordings

Prem Seetharaman, Gautham J. Mysore, Bryan Pardo, Paris Smaragdis, Celso Gomes

ACM Conference on Human Factors in Computing Systems (CHI), 2019 paper

Blind Estimation of the Speech Transmission Index for Speech Quality Prediction

Prem Seetharaman, Gautham J. Mysore, Paris Smaragdis, Bryan Pardo

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018 paper

FFTNet: A Real-time Speaker-dependent Neural Vocoder

Zeyu Jin, Adam Finkelstein, Gautham J. Mysore, Jingwan Lu

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018 paper

Crowdsourced Pairwise-comparison for Source Separation Evaluation

Mark Cartwright, Bryan Pardo, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2018 paper

LoopMaker: Automatic Creation of Music Loops from Pre-recorded Music

Zhengshan Shi, Gautham J. Mysore

ACM Conference on Human Factors in Computing Systems (CHI), 2018 paper

MedleyAssistant – A System for Personalized Music Medley Creation

Zhengshan Shi, Gautham J. Mysore

ACM IUI Workshop on Intelligent Music Interfaces for Listening and Creation (MILC), 2018 paper

Re-visiting the Music Segmentation Problem with Crowdsourcing

Cheng-i Wang, Gautham J. Mysore, Shlomo Dubnov

International Society for Music Information Retrieval Conference (ISMIR), 2017

paper

Dynamic Non-negative Models for Audio Source Separation

Paris Smaragdis, Gautham J. Mysore, Nasser Mohammadiha

Book Chapter in Audio Source Separation, Springer, 2018

Temporal extensions of Nonnegative Matrix Factorization

Cédric Févotte, Paris Smaragdis, Nasser Mohammadiha, Gautham J. Mysore

Book Chapter in Audio Source Separation and Speech Enhancement, Wiley, 2018

AutoDub: Automatic Redubbing for Voiceover Editing

Shrikant Venkataramani, Paris Smaragdis, Gautham J. Mysore

ACM Symposium on User Interface Software and Technology (UIST), 2017 paper

VoCo: Text-based Insertion and Replacement in Audio Narration

Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein

Proceedings of SIGGRAPH, 2017 paper | webpage | demo video | Adobe MAX video

Extensive Press Coverage

Eulerian Video Magnification and Analysis

Neal Wadhwa, Hao-Yu Wu, Abe Davis, Michael Rubinstein, Eugene Shih, Gautham J. Mysore, Justin G. Chen, Oral Buyukozturk, John V. Guttag, William T. Freeman, and Frédo Durand

Communications of the ACM, January 2017 paper

Analysis of Prosody Increment Induced by Pitch Accents for Automatic Emphasis Correction

Yang Zhang, Gautham J. Mysore, Floraine Berthouzoz, Mark Hasegawa-Johnson

Speech Prosody, 2016 paper

Equalization Matching of Speech Recordings in Real-World Environments

François G. Germain, Gautham J. Mysore, Takako Fujioka

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016 paper

Structural Segmentation with the Variable Markov Oracle and Boundary Adjustment

Cheng-i Wang, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016 paper

Fast and Easy Crowdsourced Perceptual Audio Evaluation

Mark Cartwright, Bryan Pardo, Gautham J. Mysore, Matt Hoffman

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016 paper

CUTE: A Concatenative Method for Voice Conversion using Exemplar based Unit Selection

Zeyu Jin, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016 paper

Capture-Time Feedback for Recording Scripted Narration

Steve Rubin, Floraine Berthouzoz, Gautham J. Mysore, Maneesh Agrawala

ACM Symposium on User Interface Software and Technology (UIST), 2015 paper | webpage | video | audio results

Speaker and Noise Independent Online Single Channel Speech Enhancement

François G. Germain, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 paper

Speech Dereverberation using a Learned Speech Model

Dawen Liang, Matthew D. Hoffman, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 paper

Efficient Manifold Preserving Audio Source Separation using Locality Sensitive Hashing

Minje Kim, Paris Smaragdis, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015 paper

Lamello: Passive Acoustic Sensing for Tangible Input Components

Valkyrie Savage, Andrew Head, Björn Hartmann, Dan Goldman, Gautham J. Mysore, Wilmot Li

ACM Conference on Human Factors in Computing Systems (CHI), 2015

paper | video

Can We Automatically Transform Speech Recorded on Common Consumer Devices in Real-World Environments into Professional Production Quality Speech? - A Dataset, Insights, and Challenges

Gautham J. Mysore

IEEE Signal Processing Letters, August 2015 paper | webpage | dataset

The Visual Microphone: Passive Recovery of Sound from Video

Abe Davis, Michael Rubinstein, Neal Wadhwa, Gautham J. Mysore, Frédo Durand, William T. Freeman

SIGGRAPH, 2014 paper | webpage | video

Extensive Press Coverage

Stopping Criteria for Non-negative Matrix Factorization Based Supervised and Semi-Supervised Source Separation

François G. Germain, Gautham J. Mysore

IEEE Signal Processing Letters, October 2014 paper

Exploiting Long-Term Temporal Dependencies in NMF using Recurrent Neural Networks with Application to Source Separation

Nicolas Boulanger-Lewandowski, Gautham J. Mysore, Matthew Hoffman

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014 paper

Speech Decoloration based on the Product-of-Filters Model

Dawen Liang, Daniel P. W. Ellis, Matthew Hoffman, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014 paper

Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view

Paris Smaragdis, Cédric Févotte, Gautham J. Mysore, Nasser Mohammadiha, Matthew Hoffman

IEEE Signal Processing Magazine Special Issue on Source Separation and Applications, May 2014 paper

A Generative Product of Filter Model of Audio

Dawen Liang, Matthew Hoffman, Gautham J. Mysore

International Conference on Learning Representations (ICLR), 2014 paper | code

ISSE: An Interactive Source Separation Editor

Nicholas J. Bryan, Gautham J. Mysore, Ge Wang

ACM Conference on Human Factors in Computing Systems (CHI), 2014

paper | webpage | demo video | demos | code

Source Separation of Polyphonic Music with Interactive User-Feedback on a Piano Roll Display

Nicholas J. Bryan, Gautham J. Mysore, Ge Wang

International Society for Music Information Retrieval Conference (ISMIR), 2013 paper

Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures

Zafar Rafii, François G. Germain, Dennis L. Sun, Gautham J. Mysore

International Society for Music Information Retrieval Conference (ISMIR), 2013 paper

Content-Based Tools for Editing Audio Stories

Steve Rubin, Floraine Berthouzoz, Gautham J. Mysore, Wilmot Li, Maneesh Agrawala

ACM Symposium on User Interface Software and Technology (UIST), 2013

Speaker and Noise Independent Voice Activity Detection

François G. Germain, Dennis L. Sun, Gautham J. Mysore

Interspeech, 2013 paper

Best Student Paper Award

An Efficient Posterior Regularized Latent Variable Model for Interactive Source Separation

Nicholas J. Bryan, Gautham J. Mysore

International Conference on Machine Learning (ICML), 2013

AES Student Design Competition Gold Award

paper | webpage | demo video | demos | code

Universal Speech Models for Speaker Independent Single Channel Source Separation

Dennis L. Sun, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013 paper

Interactive Refinement of Supervised and Semi-supervised Sound Source Separation Estimates

Nicholas J. Bryan, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013 paper | webpage | sound examples | code

UnderScore: Musical Underlays for Audio Stories

Steve Rubin, Floraine Berthouzoz, Gautham J. Mysore, Wilmot Li, Maneesh Agrawala

ACM Symposium on User Interface Software and Technology (UIST), 2012

paper | webpage | video | code

Language Informed Bandwidth Expansion

Jinyu Han, Gautham J. Mysore, Bryan Pardo

IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2012 paper

Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments

Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis

Interspeech, 2012 paper

Variational Inference in Non-negative Factorial Hidden Markov Models for Efficient Audio Source Separation

Gautham J. Mysore, Maneesh Sahani

International Conference on Machine Learning (ICML), 2012 paper

Following Musical Sources by Example

Paris Smaragdis, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012 paper

Clustering and Synchronizing Multi-camera Video via Landmark Cross-correlation

Nicholas J. Bryan, Paris Smaragdis, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012 paper

Noise-Robust Dynamic Time Warping Using PLCA Features

Brian King, Paris Smaragdis, Gautham J. Mysore

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012 paper

A Non-negative Approach to Language Informed Speech Separation

Gautham J. Mysore, Paris Smaragdis

International Conference on Latent Variable Analysis and Signal Separation (LVA / ICA), 2012 paper | webpage | sound examples

Audio Imputation Using the Non-negative Hidden Markov Model

Jinyu Han, Gautham J. Mysore, Bryan Pardo

International Conference on Latent Variable Analysis and Signal Separation (LVA / ICA), 2012 paper | webpage

Sound Recognition in Mixtures

Juhan Nam, Gautham J. Mysore, Paris Smaragdis

International Conference on Latent Variable Analysis and Signal Separation (LVA / ICA), 2012 paper

Online PLCA for Real-Time Semi-supervised Source Separation

Zhiyao Duan, Gautham J. Mysore, Paris Smaragdis

International Conference on Latent Variable Analysis and Signal Separation (LVA / ICA), 2012 paper

A Convolutive Spectral Decomposition Approach to the Separation of Feedback from Target Speech

Gautham J. Mysore, Paris Smaragdis

IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2011 paper

A Non-negative Approach to Semi-supervised Separation of Speech from Noise with the use of Temporal Dynamics

Gautham J. Mysore, Paris Smaragdis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2011 paper

Non-negative Hidden Markov Modeling of Audio with Application to Source Separation

Gautham J. Mysore, Paris Smaragdis, Bhiksha Raj

International Conference on Latent Variable Analysis and Signal Separation (LVA / ICA), 2010 paper | webpage | sound examples

Best Student Paper Award

A Super-Resolution Spectrogram Using Coupled PLCA

Juhan Nam, Gautham J. Mysore, Joachim Ganseman, Kyogu Lee, Jonathan S. Abel

Interspeech, 2010 paper | webpage

Evaluation of a Score-Informed Source Separation System

Joachim Ganseman, Paul Scheunders, Gautham J. Mysore, Jonathan S. Abel

International Society for Music Information Retrieval Conference (ISMIR), 2010

paper | webpage

Source Separation by Score Synthesis

Joachim Ganseman, Paul Scheunders, Gautham J. Mysore, Jonathan S. Abel

International Computer Music Conference (ICMC), 2010 paper | webpage

“Separation by Humming”: User Guided Sound Extraction from Monophonic Mixtures

Paris Smaragdis, Gautham J. Mysore

IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2009 paper

Relative Pitch Estimation of Multiple Instruments

Gautham J. Mysore, Paris Smaragdis

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009 paper | webpage

Probabilistic Factorization of Non-Negative Data with Entropic Co-occurrence Constraints

Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj, Gautham J. Mysore

International Conference on Independent Component Analysis and Signal Separation (ICA), 2009 paper

Singer-Dependent Falsetto Detection for Live Vocal Processing Based on Support Vector Classification

Gautham J. Mysore, Ryan J. Cassidy, Julius O. Smith III

IEEE Asilomar Conference on Signals, Systems, and Computers, 2006 paper

SCUBA: The Self-Contained Unified Bass Augmenter

Juan-Pablo Cáceres, Gautham J. Mysore, Jeffrey Treviño

International Conference on New Interfaces for Musical Expression (NIME), 2005 paper

Ph.D. Thesis

A Non-negative Framework for Joint Modeling of Spectral Structure and Temporal Dynamics in Sound Mixtures

Advisor: Julius O. Smith III

Reading Committee: Paris Smaragdis, Malcolm Slaney, Robert Tibshirani

Stanford University. June 2010

thesis | webpage

Patents

Sound Quality Prediction and Interface to Facilitate High Quality Voice Recordings  

Prem Seetharaman, Gautham J. Mysore, Bryan Pardo

U.S. Patent #11138989 issued in October 2021

Transcript-based Insertion of Secondary Video Content into Primary Video Content

Bernd Huber, Hijung Shin, Bryan Russell, Oliver Wang, Gautham J. Mysore

U.S. Patent #11049525 issued in June 2021

Real-time Speaker Dependent Neural Vocoder

Zeyu Jin, Adam Finkelstein, Gautham J. Mysore, Jingwan Lu

U.S. Patent #10770063 issued in September 2020

Time Interval Sound Alignment

Brian King, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #10638221 issued in April 2020

Generating Audio Loops from an Audio Track

Zhengshan Shi, Gautham J. Mysore

U.S. Patent #10460763 issued in October 2019

Automatic Voiceover Correction System

Shrikant Venkataramani, Paris Smaragdis, Gautham J. Mysore

U.S. Patent #10453475 issued in October 2019

Text-based Insertion and Replacement in Audio Narration

Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, Adam Finkelstein

U.S. Patent #10347238 issued in July 2019

Variable Sound Decomposition Masks

Gautham J. Mysore, Paris Smaragdis

U.S. Patent #10262680 issued in April 2019

Sound Rate Modification

Brian King, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #10249321 issued in April 2019

Sound Processing using a Product-of-Filters Model

Dawen Liang, Matthew Douglas Hoffman, Gautham J. Mysore

U.S. Patent #10176818 issued in January 2019

Reverberation Matching of Speech

Ramin Anushiravani, Paris Smaragdis, Gautham J. Mysore

U.S. Patent #10069028 issued in September 2018

Intuitive Music Visualization using Efficient Structural Segmentation

Cheng-i Wang, Gautham J. Mysore

U.S. Patent #10074350 issued in September 2018

Irregular Pattern Identification using Landmark based Convolution

Minje Kim, Paris Smaragdis, Gautham J. Mysore

U.S. Patent #10002622 issued in June 2018

Performance Metric Based Stopping Criteria for Iterative Algorithms

François G. Germain, Gautham J. Mysore

U.S. Patent #9866954 issued in January 2018

Automatic Emphasis of Spoken Words

Yang Zhang, Gautham J. Mysore, Floraine Berthouzoz

U.S. Patent #9852743 issued in December 2017

Irregularity Detection in Music

Minje Kim, Gautham J. Mysore, Paris Smaragdis, Peter Merrill

U.S. Patent #9734844 issued in August 2017

Non-negative Matrix Factorization Regularized by Recurrent Neural Networks for Audio Processing

Nicolas Boulanger-Lewandowski, Gautham J. Mysore, Matthew Hoffman

U.S. Patent #9721202 issued in August 2017

Dereverberation Using a Learned Speech Model

Dawen Liang, Matthew D. Hoffman, Gautham J. Mysore

U.S. Patent #9607627 issued in March 2017

Acoustic Matching and Splicing of Sound Tracks

François G. Germain, Gautham J. Mysore

U.S. Patent #9601124 issued in March 2017

Automatic Detection of Dense Ornamentation in Music

Minje Kim, Gautham J. Mysore, Paris Smaragdis, Peter Merrill

U.S. Patent #9514722 issued in December 2016

Sound Feature Priority Alignment

Brian King, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #9451304 issued in September 2016

Pattern Matching of Sound Data using Hashing

Minje Kim, Paris Smaragdis, Gautham J. Mysore

U.S. Patent #9449085 issued in September 2016

General Sound Decomposition Models

Dennis L. Sun, Gautham J. Mysore

U.S. Patent #9437208 issued in September 2016

Sound Alignment Using Timing Information

Brian King, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #9355649 issued in May 2016

Multichannel Sound Source Identification and Localization

Minje Kim, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #9351093 issued in May 2016

Joint Sound Model Generation Techniques

Dennis L. Sun, Gautham J. Mysore

U.S. Patent #9318106 issued in April 2016

Sound Alignment User Interface

Brian King, Gautham J. Mysore, Paris Smaragdis

U.S. Patent #9201580 issued in December 2015

Sound Mixture Recognition

Gautham J. Mysore, Paris Smaragdis, Juhan Nam

U.S. Patent #9165565 issued in October 2015

Feature Estimation in Sound Sources

Paris Smaragdis, Gautham J. Mysore

U.S. Patent #8965832 issued in February 2015

User-Guided Audio Selection from Complex Sound Mixtures

Paris Smaragdis, Gautham J. Mysore

U.S. Patent #8954175 issued in February 2015

Clustering and Synchronizing Content

Nicholas J. Bryan, Paris Smaragdis, Gautham J. Mysore

U.S. Patent #8924345 issued in December 2014

Language Informed Source Separation

Gautham J. Mysore, Paris Smaragdis

U.S. Patent #8843364 issued in September 2014

Semi-supervised Source Separation using Non-negative Techniques

Gautham J. Mysore, Paris Smaragdis

U.S. Patent #8812322 issued in August 2014

Noise Robust Template Matching

Gautham J. Mysore, Paris Smaragdis, Brian King

U.S. Patent #8775167 issued in July 2014

System and Method for Acoustic Echo Cancellation using Spectral Decomposition

Paris Smaragdis, Gautham J. Mysore

U.S. Patent #8724798 issued in May 2014

Non-negative Hidden Markov Modeling of Signals

Gautham J. Mysore, Paris Smaragdis

U.S. Patent #8554553 issued in October 2013

Method and Apparatus for Relative Pitch Tracking of Multiple Arbitrary Sounds

Paris Smaragdis, Gautham J. Mysore

U.S. Patent #8380331 issued in February 2013
