AI Music Lecture Series | To Build an Experience: From Technical Paradigms to User-Centered Audio AI System Design

Theme: To Build an Experience: From Technical Paradigms to User-Centered Audio AI System Design
Language: Chinese
Speaker: Yongyi Zang (Director of Research and Labs, Smule)
Host: Jin Ping
Date & Time: Oct. 31, 2025 (Fri.), 10:30
Venue: Recital Hall 102, Academic Office Building (MUS) (2 International University Park Road, Longgang District, Shenzhen)

Admission Free.
Please scan the QR code to register.
*Registration is considered successful upon submission; no separate confirmation will be sent. Please stay informed of event updates and present the event poster at the entrance for campus access.
Notice: Children under 1.2 m are not admitted.
Abstract
From the telegraph to the phonograph, the history of technological progress reveals a fundamental question: the balance between novelty and reliability. In today's audio AI field, this question resurfaces in new forms. Science pursues technical novelty and reproducibility, engineering focuses on robustness and scalability, and art emphasizes uniqueness and authenticity of expression; yet these three operate in isolation, struggling to communicate.

In his influential essay "The Bitter Lesson," Richard Sutton argues that, given sufficient data and compute, hand-crafted feature engineering is often less effective than end-to-end learning. Audio AI follows this pattern: treating audio as images (diffusion models) or as text sequences (language models), paired with strong encoders and decoders, sequence modeling methods, and control mechanisms, yields impressive technical metrics. But the real challenges lie elsewhere. Do our control mechanisms actually work? Are models truly aligned with human perception? Are they "listening" or "guessing"? When reward models assign high scores yet users report audio quality issues, where does the problem lie?

More fundamentally, scientists say "achieved state-of-the-art performance metrics," engineers say "88% speed improvement," and artists say "poor sonic quality." These three languages face an essential communication barrier. If, instead of starting from technology, we start from experience, first clarifying what experience we want to build and then translating that requirement into languages that science, engineering, and art can each understand and execute, perhaps we can truly create impactful technology products.
About the Speaker
Yongyi Zang
Director of Research and Labs, Smule Inc.
Yongyi Zang is Director of Research and Labs at Smule Inc., a leading global social music platform. Based in Seattle, Washington, Mr. Zang leads the company's research initiatives at the intersection of music technology, machine learning, and social interaction. He received his undergraduate degree from the University of Rochester, where he conducted research under the mentorship of Professor Zhiyao Duan in the Audio Information Research Laboratory. His work has been recognized with the Best Paper Award at ISMIR 2025 (the conference of the International Society for Music Information Retrieval), one of the premier venues in music technology research. Prior to his current role, Mr. Zang gained substantial industry experience in China's popular music sector and built a significant presence as a digital content creator, bridging academic research with practical applications in the music industry. His research focuses on developing novel approaches to music understanding, generation, and interactive music experiences.