User profiles for Zeqian Ju
Zeqian Ju, University of Science and Technology of China. Verified email at mail.ustc.edu.cn. Cited by 490
TeleMelody: Lyric-to-melody generation with a template-based two-stage method
Lyric-to-melody generation is an important task in automatic songwriting. Previous lyric-to-melody
generation systems usually adopt end-to-end models that directly generate melodies …
NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies, and …
AUDIT: Audio editing by following instructions with latent diffusion models
Audio editing is applicable for various purposes, such as adding background sound effects,
replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-…
MedDialog: Large-scale medical dialogue datasets
Medical dialogue systems are promising in assisting in telemedicine to increase access to
healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate …
MusicBERT: Symbolic music understanding with large-scale pre-training
Symbolic music understanding, which refers to the understanding of music from the symbolic
data (e.g., MIDI format, but not audio), covers many music applications such as genre …
NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models
While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …
PromptTTS 2: Describing and generating voices with text prompt
Speech conveys more information than just text, as the same word can be uttered in various
voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods …
On the generation of medical dialogues for COVID-19
Under the pandemic of COVID-19, people experiencing COVID19-related symptoms or
exposed to risk factors have a pressing need to consult doctors. Due to hospital closure, a lot of …
On the generation of medical dialogs for COVID-19
Under the pandemic of COVID-19, people experiencing COVID19-related symptoms have a
pressing need to consult doctors. Because of the shortage of medical professionals, many …
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis.
While previous work based on large language models (LLMs) shows impressive …