
MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has unveiled an AI system that produces human-like vocal imitations of sounds without being trained on examples of human vocal imitation. Inspired by cognitive science, the model mimics how humans communicate through vocalizations, imitating sounds such as ambulance sirens, animal calls, and motorboats.
According to MIT News, the co-lead authors — MIT CSAIL PhD students Kartik Chandra SM ’23 and Karima Ma, and undergraduate researcher Matthew Caren — note that computer graphics researchers have long recognized that realism is rarely the ultimate goal of visual expression. For example, an abstract painting or a child’s crayon doodle can be just as expressive as a photograph.
“Over the past few decades, advances in sketching algorithms have led to new tools for artists, advances in AI and computer vision, and even a deeper understanding of human cognition,” notes Chandra. “In the same way that a sketch is an abstract, non-photorealistic representation of an image, our method captures the abstract, non-phonorealistic ways humans express the sounds they hear. This teaches us about the process of auditory abstraction.”
The AI uses a model of the human vocal tract to replicate the way vibrations from the voice box are shaped by the throat, tongue, and lips. It also factors in how humans emphasize distinctive sound features, such as imitating a motorboat’s engine rumble.
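That description points to the classic source-and-filter view of voicing. For readers who want a concrete picture, the snippet below is a minimal sketch of that idea, not the CSAIL model itself: an impulse-train “voice box” source is shaped by resonant filters standing in for the throat and mouth. All function names and parameter values (pitch, formant frequencies, bandwidths) are illustrative assumptions.

```python
# Minimal source–filter sketch (illustrative only; not the researchers' actual model).
# A "voice box" source signal is shaped by resonant filters that stand in for the
# throat, tongue, and lip configuration. Parameter values are assumptions.
import numpy as np
from scipy.signal import lfilter

SAMPLE_RATE = 16_000  # Hz

def glottal_source(duration_s: float, pitch_hz: float) -> np.ndarray:
    """Crude voice-box source: an impulse train at the given pitch."""
    n = int(duration_s * SAMPLE_RATE)
    source = np.zeros(n)
    period = int(SAMPLE_RATE / pitch_hz)
    source[::period] = 1.0
    return source

def formant_filter(signal: np.ndarray, freq_hz: float, bandwidth_hz: float) -> np.ndarray:
    """Shape the source with one resonance (a two-pole filter), a stand-in for
    how the vocal-tract cavity emphasizes certain frequencies."""
    r = np.exp(-np.pi * bandwidth_hz / SAMPLE_RATE)
    theta = 2 * np.pi * freq_hz / SAMPLE_RATE
    a = [1.0, -2 * r * np.cos(theta), r ** 2]  # resonator denominator (poles)
    return lfilter([1.0], a, signal)

# Example: a low, rough vocalization, loosely in the spirit of an engine rumble.
source = glottal_source(duration_s=1.0, pitch_hz=90)            # low pitch
shaped = formant_filter(source, freq_hz=500, bandwidth_hz=120)   # first resonance
shaped = formant_filter(shaped, freq_hz=1200, bandwidth_hz=200)  # second resonance
shaped /= np.abs(shaped).max()  # normalize to [-1, 1] for playback or saving
```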
In listening tests, human judges favored the AI’s imitations 25% of the time overall, and as much as 75% of the time for specific sounds such as a motorboat. Researchers see potential applications in sound design, virtual reality, and language learning.
The system still faces challenges with certain sounds, like buzzing or speech, and researchers aim to refine it further. Their work, supported by the Hertz Foundation and NSF, was presented at SIGGRAPH Asia in December.