In the dimly lit backrooms of bars, where the air is thick with cigarette smoke and the remnants of last night’s beer and piss, there’s a peculiar magic happening. It’s here, amidst the cacophony of mistuned guitars and off-beat drums, that real musicianship is born. Not the sanitized, quantized beats that dominate our digital bedrooms, or the large, empty halls of classical music rehearsals. No, just the raw, unfiltered essence of human expression.
Now, imagine a (not naming names here) generative music startup in the Bay Area. The musicians, with their hardened fingers and road-worn instruments, replaced by researchers whose hands are more accustomed to keyboards than fretboards. Happy faces, grindsets and Twitter (not X, fuck that) influencers.
Fuck that. I mean, don’t get me wrong, but still kinda fuck that.
The work being done in ML audio is nothing short of miraculous. We’re teaching machines to compose symphonies, to mimic long-dead singers, to create soundscapes that have never before graced human ears. It’s a brave new world of auditory wonder. And what do we want to use it for? Simple and controllable crap generation.
There’s a huge disconnect, a missing link in the chain of creation. The researchers, brilliant as they are, often lack the gritty, lived experience of music. They’ve studied music theory, sure. They can play an instrument or two, perhaps even quite well. But have they ever had to adapt a set list on the fly because the crowd is three drinks away from a bar brawl? Have they felt the pulse of a city through the soles of their feet as they trudge from gig to gig, instrument cases doubling as makeshift pillows?
These experiences, these hard-won insights into the very soul of music, are conspicuously absent from much of the current research. It’s as if we’re trying to teach machines to paint without ever having mixed pigments or felt the resistance of brush against canvas (which we’re also doing tbh).
Now, I’m not suggesting we need to send every ML researcher on a grueling tour of Europe’s dive bars (though that would make for an interesting grant proposal). But we do need to find a way to infuse this research with the raw, unfiltered, experimental essence of musical creation.
Perhaps what we need is a new kind of collaboration. Imagine a research team where the PhD in computer science works side by side with the grizzled noise rock veteran of a thousand underground gigs. Where the algorithms are fine-tuned not just by metrics and datasets, but by the intuition of those who’ve lived and breathed music in its purest form.
This isn’t just about making better ML models, though that would certainly be a welcome side effect. It’s about preserving the soul of music as we hurtle towards an increasingly digital, dehumanized future. It’s about ensuring that as we teach machines to create, we don’t lose sight of what makes music fundamentally human.
So, to my fellow musicians who’ve paid their dues in smoky bars and dingy clubs, I say this: your experience matters. Your insights are valuable. The world of ML audio research needs you, even if it sometimes doesn’t know it yet.
And to the brilliant minds pushing the boundaries of what’s possible in ML audio, I extend an invitation. Step out of the lab, away from the comfort of your datasets and models. Come down to the bars, the clubs, the street corners where music lives and breathes. Let’s bridge this gap together, and in doing so, create something truly revolutionary.
After all, isn’t that what both science and art are all about?