Getting started with Machine Learning for Audio
Incorporating deep learning into your own VSTs and sound-mangling plugins is great fun! If I had to start learning machine learning all over again for use in audio plugins and audio programming, this is the order I'd do it in. Here are some resources you can check out to dip your toe into this daunting field. (No generating AI music in this post!)
I'll post the links at the bottom, and there are a few special mentions along the way, but this thread boils down to 5 steps:
Structure
1. Classifiers
2. Generation
3. Adaptation
4. Implementation
5. Additional Resources
1. Classifiers
Teachable Machine is a classifier you can build with just a few examples, and it represents the entire data pipeline of deep learning. You can play with the tool to see how accurate it is, or you can export a tflite model for use in different fields. Personally, I wouldn't get too stuck into TensorFlow Lite at this stage. Audio is CRIMINALLY underrepresented in tutorials and, while the resources for building your own classifier exist, they're generally too deep to get started with, leading to loads of READING before you get to use anything.
Now, you're probably saying, "yeah cool, but when would I ever use a classifier?" Actually, there are thousands of applications, especially in acoustics, biological sciences, ecological studies, and even sound design and dialogue cleaning... the list goes on.
For example, I used Teachable Machine to build a classifier that detects and removes "ums" and "ahs" by pairing it with some Python libraries, running it to get timestamps and then doing some stitching. If you're not sure about that stuff, just use it to sanity-check your work! There's a rough sketch of the idea below.
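To make that concrete, here's a minimal sketch of the "um" remover, assuming a Teachable Machine audio model exported to TFLite. The file names, sample rate, and class index are all placeholders I've made up; check your own export's input details before trusting any of it.

```python
# Sketch only: run a Teachable Machine TFLite export over fixed-size
# windows, keep the windows it doesn't flag as "um", and re-stitch.
# Paths, sample rate, and label index are assumptions, not gospel.
import numpy as np
import tensorflow as tf
from pydub import AudioSegment

MODEL = "um_classifier.tflite"   # hypothetical Teachable Machine export
RATE = 44100                     # assumed sample rate; check your model
UM_CLASS = 1                     # assumed label index for "um"; check yours

interpreter = tf.lite.Interpreter(model_path=MODEL)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
win = int(inp["shape"][-1])      # window length the model actually expects

audio = AudioSegment.from_file("take_01.wav").set_frame_rate(RATE).set_channels(1)
samples = np.array(audio.get_array_of_samples(), dtype=np.float32) / 32768.0

clean = AudioSegment.empty()
for start in range(0, len(samples) - win, win):
    window = samples[start:start + win].reshape(inp["shape"])
    interpreter.set_tensor(inp["index"], window)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    if np.argmax(scores) != UM_CLASS:        # keep anything that isn't an "um"
        ms = start * 1000 // RATE
        clean += audio[ms:ms + win * 1000 // RATE]

clean.export("take_01_clean.wav", format="wav")
```

Even if you never ship something like this, running a loop like that over your recordings is a great way to dummy-check what your classifier actually learned.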
From here, you'll probably want to start generating, or at least modifying, audio. So we'll stick with established methods, and I'll say your next step should be training a realtime model.
2. Generation
You have two avenues for training a VST to play with: Google's DDSP or Neutone's RAVE models. Both have the same sort of tech stack. These build realtime instruments without breaking your brain building lookup tables and really getting messy with C++.
One thing to note with this field: it is VERY VERY DIY, and you will bump into a thousand unreadable errors because someone forgot to note which version of a library they were using. Many of the projects and resources are a little rough around the edges, so be patient, ask questions where possible, and understand that most people CAN build these things. Once you get the pipeline working, it's a simple rinse and repeat with new inputs for new results.
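As an example of what you end up with: RAVE's pipeline exports a TorchScript file you can load with plain PyTorch for offline experiments before dropping it into a host like Neutone. A minimal sketch, assuming a hypothetical export called flute_rave.ts:

```python
# Sketch: offline inference with an exported RAVE model.
# "flute_rave.ts" is a made-up name for a TorchScript export.
import torch

model = torch.jit.load("flute_rave.ts").eval()
x = torch.randn(1, 1, 4 * 48000)   # 1 batch, 1 channel, ~4 s at 48 kHz
with torch.no_grad():
    y = model(x)                   # resynthesized / timbre-transferred audio
print(y.shape)
```

Swap the random tensor for real audio and you've got the rinse-and-repeat loop: new input in, new sound out.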
3. Adaptation
Okay, so you've built an instrument and recorded your wife's flute playing for hours to build a sampler, but you're not CRAZY about the sound. The next step I'll recommend here is checking out GuitarML, for building a lil effects unit of your very own. This follows much the same setup, where you're placing audio files in trainable spots, but has the added complexity of building a JUCE project on top of it, which is a great step in developing your work into a realtime-capable plugin. Lots of this work is done with tools like CMake and the Projucer, which may be new to you but are an essential part of audio development nowadays!
This is, again, pretty easy to set up and infinitely cool when you get it going, so just get started! I will say the models are somewhat optimized specifically for guitar pedals, but I've been able to get some great results out of them anyway. The sketch below shows the core idea.
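Under the hood, GuitarML-style modelling boils down to teaching a small recurrent network to map a dry input signal to the processed output. This toy PyTorch sketch shows that idea with fake data; the real training scripts add proper datasets, losses, and export into the JUCE plugin, and every name here is made up for illustration.

```python
# Toy sketch of LSTM-based amp/pedal modelling: learn dry -> wet.
import torch
import torch.nn as nn

class AmpModel(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, samples, 1)
        h, _ = self.lstm(x)
        return self.head(h)          # predicted wet signal

model = AmpModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
dry = torch.randn(8, 2048, 1)        # stand-in for your recorded input
wet = torch.tanh(dry * 3.0)          # fake "amp" target, just for the demo

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(dry), wet)
    loss.backward()
    opt.step()
```

Record real input/output pairs through your favourite pedal, swap them in for the fake tensors, and you're doing essentially what the GuitarML trainers do.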
4. Implementation
Okay, so you've done all the web training stuff, you've fiddled around with a classifier, you've made some huge datasets, and you're good to go. Now it's time to get stuck into the hard stuff.
Not all ML can be done using models trained by others, and if you're serious about getting into this field, check out the ML courses by Google, and by Hugging Face, which has a brand new one on audio:
https://huggingface.co/learn/audio-course/chapter0/introduction
https://cloud.google.com/use-cases/generative-ai
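To give you a taste of where the Hugging Face course takes you: their transformers library wraps pretrained audio models in a one-line pipeline. A quick sketch (the model ID is one public example, not a recommendation; swap in whatever suits your task):

```python
# Sketch: audio classification with a pretrained model via transformers.
from transformers import pipeline

classifier = pipeline(
    "audio-classification",
    model="MIT/ast-finetuned-audioset-10-10-0.4593",
)
print(classifier("some_clip.wav"))   # list of {label, score} dicts
```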
For books, I’d also recommend this
5. Additional Resources
There are piles and piles and piles of additional reading, and many people in the space working hard to develop great resources, plugins, tools, frameworks and research papers.
Unless you want to take things a LOT further, there's already plenty out there to play, generate, and make wacky-sounding SFX with, none of which requires a PhD and years of research into the difference between an LSTM and a GAN and the layers of MAGIC DUST that go into this tech.
Here are the resources I've mentioned in this post:
Teachable Machine: https://teachablemachine.withgoogle.com/
Google DDSP: https://github.com/magenta/ddsp
RAVE (the models behind Neutone): https://github.com/acids-ircam/RAVE
GuitarML: https://github.com/GuitarML
JUCE: https://juce.com/
Hugging Face audio course: https://huggingface.co/learn/audio-course/chapter0/introduction
If you liked this post, I'd appreciate a follow. You can check out some of my own stuff on YouTube https://youtube.com/@dweaveraudio?si=siB0qDx-YqAxU9NJ or follow me @dweaveraudio on Twitter!