Google Scholar

User profiles for Zeqian Ju

Zeqian Ju

University of Science and Technology of China

Verified email at mail.ustc.edu.cn

Cited by 490

TeleMelody: Lyric-to-melody generation with a template-based two-stage method

Z Ju, P Lu, X Tan, R Wang, C Zhang, S Wu… - arXiv preprint arXiv …, 2021 - arxiv.org

Lyric-to-melody generation is an important task in automatic songwriting. Previous lyric-to-melody
generation systems usually adopt end-to-end models that directly generate melodies …

Cited by 31

NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers

K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is
important to capture the diversity in human speech such as speaker identities, prosodies, and …

Cited by 87

AUDIT: Audio editing by following instructions with latent diffusion models

Y Wang, Z Ju, X Tan, L He, Z Wu… - Advances in Neural …, 2023 - proceedings.neurips.cc

Audio editing is applicable for various purposes, such as adding background sound effects,
replacing a musical instrument, and repairing damaged audio. Recently, some diffusion-…

Cited by 20

MedDialog: Large-scale medical dialogue datasets

G Zeng, W Yang, Z Ju, Y Yang, S Wang… - Proceedings of the …, 2020 - aclanthology.org

Medical dialogue systems are promising in assisting in telemedicine to increase access to
healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate …

Cited by 136

MusicBERT: Symbolic music understanding with large-scale pre-training

M Zeng, X Tan, R Wang, Z Ju, T Qin, TY Liu - arXiv preprint arXiv …, 2021 - arxiv.org

Symbolic music understanding, which refers to the understanding of music from the symbolic
data (e.g., MIDI format, but not audio), covers many music applications such as genre …

Cited by 101

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

While recent large-scale text-to-speech (TTS) models have achieved significant progress,
they still fall short in speech quality, similarity, and prosody. Considering speech intricately …

Cited by 10

PromptTTS 2: Describing and generating voices with text prompt

Y Leng, Z Guo, K Shen, X Tan, Z Ju, Y Liu, Y Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Speech conveys more information than just text, as the same word can be uttered in various
voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods …

Cited by 9

On the generation of medical dialogues for COVID-19

W Yang, G Zeng, B Tan, Z Ju, S Chakravorty… - arXiv preprint arXiv …, 2020 - arxiv.org

Under the pandemic of COVID-19, people experiencing COVID19-related symptoms or
exposed to risk factors have a pressing need to consult doctors. Due to hospital closure, a lot of …

Cited by 29

On the generation of medical dialogs for COVID-19

…, Z Li, B Tan, G Zeng, W Yang, X He, Z Ju… - Proceedings of the 59th …, 2021 - par.nsf.gov

Under the pandemic of COVID-19, people experiencing COVID19-related symptoms have a
pressing need to consult doctors. Because of the shortage of medical professionals, many …

Cited by 17

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

D Xin, X Tan, K Shen, Z Ju, D Yang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis.
While previous work based on large language models (LLMs) shows impressive …

Cited by 1
