AI Music Glossary
Clear definitions for the terms you'll encounter at the intersection of artificial intelligence and music — from model architecture to rights frameworks.
AI Fundamentals (2 terms)
Generative AI
aka GenAI
A class of machine learning models that learn the statistical patterns of a training dataset and use those patterns to produce new content (audio, images, text, video) that resembles but is not identical to the training data. In music, generative AI models can produce melodies, harmonies, rhythms, vocal performances, and complete arrangements from text prompts or musical inputs.
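Modern generative music models are large neural networks, but the core idea of learning statistical patterns and then sampling new material from them can be shown with a much older technique. The sketch below is a toy first-order Markov chain, not how any production system works: it counts which note follows which in a small hypothetical training set, then samples new melodies that reuse those transitions and so resemble, but need not copy, the training melodies. The function names and note data are illustrative assumptions.

```python
import random
from collections import defaultdict


def train_markov(melodies):
    """Record every note-to-note transition seen in the training melodies."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            transitions[current].append(nxt)
    return dict(transitions)


def generate(transitions, start, length, rng):
    """Sample a new melody by repeatedly picking a previously observed next note."""
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:  # dead end: no note ever followed this one in training
            break
        melody.append(rng.choice(options))
    return melody


# Hypothetical training data: three short melodies written as note names.
training = [
    ["C", "D", "E", "C"],
    ["E", "F", "G", "E"],
    ["C", "E", "G", "C"],
]
model = train_markov(training)
print(generate(model, "C", 8, random.Random(42)))
```

Because the model can only choose transitions it has already seen, its output stays within the style of its training melodies, the same dependence described under Training Data.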
Latent Diffusion
aka LDM
A generative AI architecture that works in two stages. First, an autoencoder is trained to compress data (audio, images) into a lower-dimensional 'latent space'; then a diffusion model is trained to generate new points in that space by starting from random noise and gradually denoising it. For audio, the encoder compresses waveforms (or spectrograms) into compact latent representations, the diffusion model learns the patterns of music in that compressed representation, and the decoder turns each generated latent back into audio. Working in the latent space rather than on raw samples makes generation much less computationally expensive. Used in tools including Stable Audio (Stability AI) and several research systems.
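The denoising loop at the heart of this process can be sketched in a few lines. This is a minimal sketch of DDPM-style reverse sampling in a small latent space, under two loud assumptions: the "training data" is a single hypothetical latent vector, and the trained denoising network is replaced by an oracle that computes the noise exactly from that vector. A real LDM learns the noise predictor from many examples, gets its latents from a trained encoder, and decodes the result back to audio. The linear noise schedule (1e-4 to 0.02 over 1,000 steps) follows the common DDPM setup; all function names are hypothetical.

```python
import math
import random


def make_schedule(steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule: per-step betas and their cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return betas, alpha_bars


def oracle_noise_predictor(x_t, t, alpha_bars, target):
    """Hypothetical stand-in for a trained denoising network.

    Assumes the training data is the single latent `target`, so the noise in
    x_t can be computed exactly. A real LDM learns this with a neural network.
    """
    ab = alpha_bars[t]
    return [(x - math.sqrt(ab) * m) / math.sqrt(1.0 - ab) for x, m in zip(x_t, target)]


def sample_latent(target, rng, steps=1000):
    """DDPM-style reverse process: start from Gaussian noise, denoise step by step."""
    betas, alpha_bars = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise in latent space
    for t in reversed(range(steps)):
        eps = oracle_noise_predictor(x, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        x = [(xi - coef * ei) / math.sqrt(1.0 - betas[t]) for xi, ei in zip(x, eps)]
        if t > 0:  # inject fresh noise on every step except the last
            sigma = math.sqrt(betas[t])
            x = [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]
    return x  # a real LDM would now pass this latent through its decoder


latent_target = [0.5, -1.0, 2.0, 0.0]  # hypothetical compressed "audio" latent
print(sample_latent(latent_target, random.Random(0)))
```

Because the oracle knows the single target, every run lands on that same latent whatever the starting noise; a trained model instead learns a whole distribution of latents, so different starting noise yields different outputs.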
Technical (2 terms)
Stem Separation
aka source separation
An AI audio processing technique that isolates individual instrument or vocal tracks from a finished stereo mix, without access to the original multitrack session. Models are trained on large collections of mixes paired with their individual stems, learning to predict how much of the combined signal belongs to each source. Output quality has improved dramatically since 2019: Meta's open-source Demucs, for example, separates vocals, drums, bass, and "other" by default, and offers an experimental six-stem model that adds guitar and piano, with acceptable bleed in most genres.
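One common way to predict how much of the mix belongs to each source is a soft mask over the time-frequency bins of the mix's spectrogram. The sketch below assumes a network has already estimated a magnitude for each source in each bin (the numbers are hypothetical stand-ins), converts those estimates into ratio masks that sum to 1 per bin, and splits the mixture accordingly. It illustrates mask-based separation in general; it is not Demucs's actual method, which uses a more elaborate architecture.

```python
def ratio_masks(source_estimates):
    """Turn per-source magnitude estimates into soft masks that sum to 1 per bin."""
    names = list(source_estimates)
    n_bins = len(source_estimates[names[0]])
    masks = {name: [] for name in names}
    for i in range(n_bins):
        total = sum(source_estimates[name][i] for name in names)
        for name in names:
            # Share of this bin's energy assigned to the source; split evenly if all estimates are 0.
            share = source_estimates[name][i] / total if total > 0 else 1.0 / len(names)
            masks[name].append(share)
    return masks


def apply_masks(mixture, masks):
    """Split each mixture bin's magnitude between the sources according to their masks."""
    return {name: [m * x for m, x in zip(mask, mixture)] for name, mask in masks.items()}


# Hypothetical magnitudes for four time-frequency bins of a mix.
mixture = [1.0, 0.8, 0.6, 0.0]
# In a real separator, a trained network would predict these per-source estimates.
estimates = {"vocals": [0.9, 0.2, 0.0, 0.0], "drums": [0.1, 0.6, 0.6, 0.0]}
stems = apply_masks(mixture, ratio_masks(estimates))
print(stems)
```

Because the masks sum to 1 in every bin, the separated stems always add back up to the original mix; errors in the estimates show up as energy assigned to the wrong stem, which is what listeners hear as bleed.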
AI Mastering
aka automated mastering
The use of machine learning algorithms to apply mastering processing (EQ, compression, limiting, stereo widening, loudness normalisation) to an unmastered audio mix, without a human engineer. AI mastering services analyse the spectral and dynamic characteristics of the submitted mix, compare them against a target profile, and then apply corrective processing to move the mix toward it. Services like LANDR, eMastered, and Matchering differ in where that target comes from: LANDR uses its own trained model, while Matchering explicitly matches the profile of a reference track you specify.
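The reference-matching idea behind tools like Matchering can be illustrated with its simplest ingredient, loudness. This is a hypothetical sketch, not Matchering's algorithm or API: it only scales a mix so its RMS level matches a reference and then hard-clips peaks at a ceiling, whereas real tools also match frequency response and stereo width and use proper limiters. The function names and the sine-tone test signals are made up for illustration.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_loudness(target, reference, ceiling=0.99):
    """Scale `target` so its RMS level matches `reference`, then clip peaks.

    Returns the processed samples and the gain that was applied.
    """
    target_level = rms(target)
    if target_level == 0.0:  # silent input: nothing to match
        return list(target), 1.0
    gain = rms(reference) / target_level
    matched = [s * gain for s in target]
    # Crude peak safety: hard-clip at the ceiling (real limiters are far smoother).
    return [max(-ceiling, min(ceiling, s)) for s in matched], gain


# Hypothetical signals at 48 kHz: a quiet unmastered mix and a louder reference master.
mix = [0.1 * math.sin(2 * math.pi * 440 * n / 48000) for n in range(480)]
reference = [0.5 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(480)]
mastered, gain = match_loudness(mix, reference)
print(round(gain, 2), round(rms(mastered), 3), round(rms(reference), 3))
```

Matching RMS alone raises the mix to the reference's average level, but if the required gain pushes peaks past the ceiling, clipping reduces the final level and adds distortion, which is why real mastering chains use a limiter rather than a hard clip.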
Ethics (1 term)
Training Data
aka training corpus
The dataset of existing content (recordings, scores, MIDI files, audio samples) that an AI model learns from before it can generate new music. The quality, diversity, and licensing status of training data profoundly shape what a model can produce and what legal liabilities attach to using it. A model trained primarily on Western pop music will reflect those aesthetic norms; one trained on a diverse global corpus is likely to be more stylistically flexible. The question of whether training on copyrighted recordings without a license constitutes infringement is the central legal dispute in AI music as of 2025.
65+ More Terms Coming
We're adding definitions across all five categories. Missing a term you rely on?
What Our Guides Think
Start with the music terms — tempo, timbre, counterpoint — before the AI terms. A musician who understands the vocabulary of music will understand AI music tools far faster than someone who starts with the models.
The only terms that matter right now: training data, latent diffusion, and stem separation. Know those three and you'll understand 80% of the AI music tool landscape.
I notice the glossary doesn't yet have 'griot', 'maqam', 'raga', or 'gamelan'. The vocabulary of global music traditions belongs in any serious dictionary of music and AI.
Precise language is the foundation of clear thinking. When you use 'AI' to mean everything from a $5 Spotify algorithm to a frontier language model, you can't reason carefully about what's actually happening.
Ready to amplify your creativity?
Where Human Creativity Meets Artificial Intelligence