18 minute read

Banner image taken from a photo by Mark Stoop on Unsplash.

This is an old data science challenge proposed by Zindi. It might be a little incomplete, but it could be interesting as an example of how to deal with audio files.



Southern Africa is home to around 960 bird species, some of which are found nowhere else on the globe. These creatures fill many important ecosystem niches, and can be found in every habitat in the region, from the lush Afromontane forests of the Drakensberg to the shifting dunes of the Kalahari. Unfortunately, some species are under threat due to habitat loss, climate change or disease. It is important to monitor the health of bird populations across the region, both for the conservation of the birds themselves and as a key indicator of overall ecosystem health.

Unlike larger animals, birds can be hard to observe with camera traps, and so most monitoring efforts involve volunteers identifying birds in the wild or tagging birds caught in nets before releasing them. The objective of this competition is to create a model capable of identifying birds by their calls. This could enable automatic identification of birds based on audio collected by remote microphones, drastically reducing the human input required for population monitoring.

To keep things simple, this competition focuses on 40 birds whose calls are frequently heard in Southern Africa. The training data consists of 1857 audio files, recorded by hundreds of contributors and shared through xeno-canto. The goal is to use these recordings to build a classification model able to predict which bird is calling in a given audio clip.

*Southern Africa is the area south of the Zambezi, Kunene and Okavango rivers. This includes Namibia, Botswana, Zimbabwe, South Africa, Lesotho, Swaziland and southern and central Mozambique.


The data consists of mp3 files with unique IDs as file names, split into train and test sets and available as zip files in the downloads section. The labels for the training set are contained in train.csv, corresponding to one of the 40 species of bird listed below. Your task is to predict the labels for the test set, following the format in sample_submission.csv.

In cases where more than one species is calling (many recordings contain faint background noise) the labels correspond to the most prominent call, and your predictions should do likewise.

We are grateful to the many citizen scientists and researchers who shared the recordings which made this competition possible. The full list of authors can be found on the Zindi web site or in the file of the challenge (authors.csv).

Files available:

  • Train.csv - has the common name of the bird and corresponding unique mp3 ID for the training files.
  • Test.csv - has the unique mp3 IDs you will be testing your model on.
  • SampleSubmission.csv - is an example of what your submission file should look like. The order of the rows does not matter, but the mp3 names must be correct. Your submission should contain the probability that each mp3 belongs to each species (with values between 0 and 1 inclusive).
  • Train.zip - mp3 files with unique IDs. Common names of the birds are in Train.csv. You will use these files to train your model. 1857 files.
  • Test.zip - mp3 files with unique IDs. You will use these files to test your model and predict the common name of the main bird in each recording. 911 files.
  • StarterNotebook.ipynb - credits to Johnowhitaker for this starter notebook and a few tips!
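As a quick sanity check of the submission format, a uniform "no-information" baseline can be built by giving every class the same probability. A minimal sketch on a hypothetical miniature submission with only 3 of the 40 classes (the real file has 40 species columns):

```python
import pandas as pd
import numpy as np

# Hypothetical miniature submission: 2 test IDs, 3 of the 40 classes
sub = pd.DataFrame({'ID': ['019OYB', '01S9OX'],
                    'Ring-necked Dove': [0, 0],
                    'Black Cuckoo': [0, 0],
                    'Crested Barbet': [0, 0]})

birds = sub.columns[1:]            # class columns (the 1st column is the ID)
sub[birds] = 1.0 / len(birds)      # uniform probability for every class

# each row of probabilities should sum to 1
assert np.allclose(sub[birds].sum(axis=1), 1.0)
```

Such a baseline scores exactly ln(40) ≈ 3.689 under the log loss metric used here, which is a useful reference point for any real model.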

Visualizations of some of the bird sounds you will encounter in this challenge.


Some of these recordings are under a Creative Commons Attribution-NonCommercial-NoDerivs 2.5 license, meaning that you cannot sell or distribute modified copies of the calls. If you would like to share example calls, please download them directly from xeno-canto and give proper attribution to the author.

Evaluation metric

The evaluation metric for this challenge is Log Loss.

Some files contain more than one bird call; the goal is to predict the ‘foreground species’ calling the loudest, so your model will need to be robust to background noise. There are 40 classes (birds). Values should be probabilities between 0 and 1 inclusive.

2. Audio Feature Extraction in Python

Different type of audio features and how to extract them.

Audio files cannot be understood directly by models; we need to convert them into an understandable format, and this is where feature extraction comes in. It is the process of transforming raw audio into a compact, informative representation. Audio feature extraction is required for data science tasks such as classification, prediction and recommendation.
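As a toy illustration of what "features" means here, two classic low-level descriptors, RMS energy and zero-crossing rate, can be computed with plain NumPy on a synthetic tone (librosa's `feature.rms` and `feature.zero_crossing_rate` provide framed versions of the same ideas):

```python
import numpy as np

sr = 22050                                # sample rate in Hz
t = np.arange(sr) / sr                    # one second of timestamps
y = np.sin(2 * np.pi * 440 * t)           # synthetic 440 Hz tone

rms = np.sqrt(np.mean(y ** 2))            # loudness-like energy measure
zcr = np.mean(np.abs(np.diff(np.sign(y))) > 0)  # fraction of sign changes

# a pure sine has RMS ≈ 1/sqrt(2), and a 440 Hz tone crosses zero
# about 880 times per second
```

Real bird recordings would be loaded with `librosa.load` first; the point is only that each clip gets summarised by a handful of numbers a model can consume.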

Here is a summary of this blog post

The audio signal is a three-dimensional signal in which the three axes represent time, amplitude and frequency.

Generate features:
There are many ways to tackle this challenge. Try deep learning on the audio, generate a spectrogram and treat this as an image classification task, use some signal processing tricks to look for close matches, try to extract meaningful features such as dominant frequencies…. It’s up to you :)

The starter notebook shows how to visualize different properties of the waveform, and some features you could use.

For this example, I’ll generate a square spectrogram and save as an image file - not a very elegant approach but let’s see where it gets us.

from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Import all the needed libraries. We’ll use librosa for analyzing and extracting features from an audio signal, and IPython.display to play audio directly in a Jupyter notebook.

import pandas as pd
import numpy as np

import IPython.display as ipd
from matplotlib import pyplot as plt
import seaborn as sns

import librosa # package for music and audio processing & feature extraction
import os, shutil, glob

Set the path

path_colab = 'drive/My Drive/zindi/'
path_jupyt = './'

# set to True with colab or False with jupyter
colab = False
path = path_colab if colab else path_jupyt

Data insights & look at the submission

sub = pd.read_csv(path + 'SampleSubmission.csv')

# retrieve all the class names in a list (the 1st col is the id)
birds = sub.columns[1:]

# add a col with all files' paths 
sub['file_path'] = path + 'Test/' + sub['ID'] + '.mp3'
       ID  Ring-necked Dove  Black Cuckoo  Red-chested Cuckoo  ...  Cape Bunting          file_path
0  019OYB                 0             0                   0  ...             0  ./Test/019OYB.mp3
1  01S9OX                 0             0                   0  ...             0  ./Test/01S9OX.mp3
2  02CS12                 0             0                   0  ...             0  ./Test/02CS12.mp3
3  02LM3W                 0             0                   0  ...             0  ./Test/02LM3W.mp3
4  0C3A2V                 0             0                   0  ...             0  ./Test/0C3A2V.mp3

5 rows × 42 columns

Let’s listen to a sound to get a sense of what we’re working with :)
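With IPython.display, a clip can be played inline in the notebook. A sketch using a synthetic tone; in practice you would pass one of the paths from `sub['file_path']` instead:

```python
import numpy as np
import IPython.display as ipd

sr = 22050
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)           # one-second 440 Hz test tone

audio = ipd.Audio(data=y, rate=sr)        # renders an inline audio player
# ipd.Audio('./Test/019OYB.mp3') would play a real clip from the test set
```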