
Zeqian Ju

University of Science and Technology of China
Verified email at mail.ustc.edu.cn
Cited by 490

TeleMelody: Lyric-to-melody generation with a template-based two-stage method

Z Ju, P Lu, X Tan, R Wang, C Zhang, S Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Lyric-to-melody generation is an important task in automatic songwriting. Previous lyric-to-melody
generation systems usually adopt end-to-end models that directly generate melodies …

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies, and …

AUDIT: Audio editing by following instructions with latent diffusion models

Y Wang, Z Ju, X Tan, L He, Z Wu… - Advances in Neural …, 2023 - proceedings.neurips.cc
Audio editing is applicable for various purposes, such as adding background sound effects,
replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-…

MedDialog: Large-scale medical dialogue datasets

G Zeng, W Yang, Z Ju, Y Yang, S Wang… - Proceedings of the …, 2020 - aclanthology.org
Medical dialogue systems are promising in assisting in telemedicine to increase access to
healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate …

MusicBERT: Symbolic music understanding with large-scale pre-training

M Zeng, X Tan, R Wang, Z Ju, T Qin, TY Liu - arXiv preprint arXiv …, 2021 - arxiv.org
Symbolic music understanding, which refers to the understanding of music from the symbolic
data (e.g., MIDI format, but not audio), covers many music applications such as genre …

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

PromptTTS 2: Describing and generating voices with text prompt

Y Leng, Z Guo, K Shen, X Tan, Z Ju, Y Liu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech conveys more information than just text, as the same word can be uttered in various
voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods …

On the generation of medical dialogues for COVID-19

W Yang, G Zeng, B Tan, Z Ju, S Chakravorty… - arXiv preprint arXiv …, 2020 - arxiv.org
Under the pandemic of COVID-19, people experiencing COVID19-related symptoms or
exposed to risk factors have a pressing need to consult doctors. Due to hospital closure, a lot of …

On the generation of medical dialogs for COVID-19

…, Z Li, B Tan, G Zeng, W Yang, X He, Z Ju… - Proceedings of the 59th …, 2021 - par.nsf.gov
Under the pandemic of COVID-19, people experiencing COVID19-related symptoms have a
pressing need to consult doctors. Because of the shortage of medical professionals, many …

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

D Xin, X Tan, K Shen, Z Ju, D Yang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis.
While previous work based on large language models (LLMs) shows impressive …