
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2024-05-24
Jiaxing YU, Songruoyao WU, Guanting LU, Zijin LI, Li ZHOU, Kejun ZHANG. Suno: potential, prospects, and trends[J]. Frontiers of Information Technology & Electronic Engineering, in press. https://doi.org/10.1631/FITEE.2400299
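Suno: potential, prospects, and trends
1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2 Department of Music Artificial Intelligence and Music Information Technology, Central Conservatory of Music, Beijing 100031, China
3 School of Arts and Communication, China University of Geosciences (Wuhan), Wuhan 430074, China
4 Yangtze River Delta Smart Oasis Innovation Center, Zhejiang University, Jiaxing 314100, China
Abstract: Suno has attracted wide attention for its outstanding music generation capability. It demonstrates the progress of music artificial intelligence technology, opens up new possibilities for music creation, and marks a milestone in the development of AI music generation. This paper introduces the technical background of AI music generation, summarizes a general technical framework for it, analyzes the strengths and limitations of Suno, and discusses future trends in music artificial intelligence.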

