Audio Embedding Pipelines

Audio is any sound within the range of human hearing. Audio embedding is the process of converting audio files (mp3, wav, etc.) into vector representations. Here, we list some of our built-in pipelines for generating audio embeddings.

Pipelines

Audio tasks have advanced considerably with convolutional neural networks, applied either to the raw 1-dimensional waveform or to a 2-dimensional spectrogram of it. Just as with CNNs used for image embedding, most audio embedding models include some form of preprocessing, such as cropping clips to a fixed length and downsampling them to a fixed sample rate. Towhee maintains the following audio embedding pipelines:

audio-embedding-vggish

This pipeline contains a pre-trained model based on VGGish. VGGish is a supervised model trained on AudioSet, a large-scale audio classification dataset.
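The cropping and downsampling mentioned above can be sketched in plain Python. This is a minimal illustration, not the pipeline's actual preprocessing: the helper names `crop` and `downsample` are hypothetical, the 16 kHz target is an assumption based on the input rate VGGish-style models typically expect, and block averaging stands in for a proper anti-aliasing resampler.

```python
import math


def crop(samples, sample_rate, duration_s):
    """Keep the first `duration_s` seconds, zero-padding if the clip is shorter."""
    n = int(round(sample_rate * duration_s))
    cropped = list(samples[:n])
    cropped.extend([0.0] * (n - len(cropped)))
    return cropped


def downsample(samples, factor):
    """Naive integer-factor downsampling (hypothetical helper).

    Each block of `factor` samples is averaged into one output sample.
    Averaging acts as a crude low-pass filter; real resamplers use
    better filters. Trailing samples that do not fill a block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


# One second of a 440 Hz tone sampled at 48 kHz (synthetic test signal).
sr = 48_000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

clip = crop(tone, sr, 0.5)   # fixed-length 0.5 s clip
low = downsample(clip, 3)    # 48 kHz -> 16 kHz (assumed target rate)
print(len(clip), len(low))
```

In practice a library resampler would replace `downsample`; the point here is only that the model sees a clip of fixed length at a fixed sample rate.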

audio-embedding-clmr

This pipeline contains a pre-trained model based on CLMR (Contrastive Learning of Musical Representations). CLMR is a self-supervised, encoder-based model trained with a contrastive objective, and it works well for music fingerprinting. Its performance on generic audio clips is untested.
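Once clips are embedded, fingerprint-style matching usually reduces to comparing vectors. The sketch below assumes embeddings are plain lists of floats and uses cosine similarity, a common choice rather than something these pipelines prescribe. The functions `cosine_similarity` and `best_match`, the track names, and the 3-dimensional vectors are all made up for illustration; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, catalog):
    """Return (name, score) for the catalog embedding most similar to `query`."""
    scored = ((name, cosine_similarity(query, vec)) for name, vec in catalog.items())
    return max(scored, key=lambda pair: pair[1])


# Toy catalog of known tracks (hypothetical embeddings).
catalog = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.0, 0.2, 0.95],
}
query = [0.8, 0.2, 0.1]  # embedding of an unknown clip

name, score = best_match(query, catalog)
print(name, round(score, 3))
```

For large catalogs, a linear scan like `best_match` is typically replaced by an approximate nearest-neighbor index.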