Microsoft Project15 & University of Oxford Capstone Project with Elephant Listening Project Team 4

Oxford’s AI Group 4 Project 15 Writeup

Who are we?


- Abhishekh Baskaran
- Bas Geerdink
- Chandan Konuri
- Henrietta Ridley
- Jay Padmanabhan
- Paulo Campos
- Vishweshwar Manthani

Goal

The goal of the project was to count the number of elephants in a sound file.

To do so, we detected whether rumbles belong to the same elephant or not.

Literature

Poole, J. H. (1999). Signals and assessment in African elephants: evidence from playback experiments. Animal Behaviour, 58(1), 185-193.

Jarne, C. (2019). A method for estimation of fundamental frequency for tonal sounds inspired on bird song studies. MethodsX, 6, 124-131.

Stoeger, A. S., et al. (2012). Visualizing sound emission of elephant vocalizations: evidence for two rumble production types. PLoS ONE, 7(11), e48907.

O’Connell-Rodwell, C. E., et al. (2000). Seismic properties of Asian elephant (Elephas maximus) vocalizations and locomotion. Journal of the Acoustical Society of America, 108(6), 3066-3072.

Heffner, R. S., & Heffner, H. E. (1982). Hearing in the elephant (Elephas maximus): absolute sensitivity, frequency discrimination, and sound localization. Journal of Comparative and Physiological Psychology, 96(6), 926-944.

Elephant Listening Project, Cornell University: https://elephantlisteningproject.org/

Project 15, Microsoft: https://microsoft.github.io/project15/

Introduction

Sound files can be analysed by transforming them into a 2D image: a spectrogram of time (seconds) against frequency (hertz). The third dimension, sound intensity (decibels), can be shown as colour or greyscale.
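As an illustration of this transformation, here is a minimal spectrogram sketch. The team used Librosa for the FFT; this version performs the short-time Fourier transform directly with NumPy so that each step is visible. The sample rate, FFT size (`n_fft`) and hop length are illustrative assumptions, and the function name `spectrogram_db` is hypothetical.

```python
import numpy as np


def spectrogram_db(signal, sample_rate, n_fft=4096, hop=1024):
    """Return (times, freqs, power_db) for a mono signal.

    power_db has one row per frequency bin and one column per time frame,
    so it can be drawn directly as an image (time on x, frequency on y).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, Hann-windowed frames.
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum of each frame; rfft keeps the non-negative frequencies.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Convert to decibels; the small constant avoids log10(0).
    power_db = 10 * np.log10(power + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, d=1 / sample_rate)
    # Time stamp of each frame's centre, in seconds.
    times = (np.arange(n_frames) * hop + n_fft / 2) / sample_rate
    return times, freqs, power_db.T


# Example: a 10-second synthetic 20 Hz tone sampled at 1 kHz.
sr = 1000
t = np.arange(10 * sr) / sr
times, freqs, spec = spectrogram_db(np.sin(2 * np.pi * 20 * t), sr)
peak_hz = freqs[spec.mean(axis=1).argmax()]  # strongest frequency bin
```

With a 1 kHz sample rate and `n_fft=4096`, each frequency bin is about 0.24 Hz wide, which is fine enough to separate rumbles in the 10–50 Hz range; a longer FFT gives finer frequency resolution at the cost of coarser time resolution.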

Elephants communicate with rumbles, which typically have a frequency of 10–50 Hz and last 2–6 seconds.

A single elephant rumble has many harmonics: components at whole-number multiples of the base (fundamental) frequency.

An elephant can be distinguished from other elephants by the base frequency of its rumbles. If two rumbles overlap slightly or are separated in time and have different base frequencies, they probably belong to different animals.
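The comparison of base frequencies could look like the sketch below. It is a simplified illustration, not the project's method: the helpers `base_frequency` and `probably_different_elephants`, the 50% peak threshold and the 2 Hz tolerance are hypothetical choices. The base frequency is taken as the lowest strong component in the 10–50 Hz band, because a harmonic can be louder than the fundamental.

```python
import numpy as np


def base_frequency(freqs, magnitude, f_min=10.0, f_max=50.0, rel_threshold=0.5):
    """Estimate the base (fundamental) frequency of one rumble.

    Returns the lowest frequency in [f_min, f_max] whose magnitude reaches
    rel_threshold times the strongest peak in that band. Taking the lowest
    strong component, rather than the strongest one, avoids picking a harmonic.
    """
    band = (freqs >= f_min) & (freqs <= f_max)
    band_freqs, band_mag = freqs[band], magnitude[band]
    strong = band_mag >= rel_threshold * band_mag.max()
    return float(band_freqs[strong][0])  # band_freqs is sorted ascending


def probably_different_elephants(f0_a, f0_b, tolerance_hz=2.0):
    """Hypothetical rule: base frequencies further apart than the tolerance
    are taken to come from different animals."""
    return abs(f0_a - f0_b) > tolerance_hz


# Synthetic 4-second rumble: 15 Hz base frequency with a louder 30 Hz harmonic.
sr = 1000
t = np.arange(4 * sr) / sr
rumble = (0.6 * np.sin(2 * np.pi * 15 * t)
          + 1.0 * np.sin(2 * np.pi * 30 * t)
          + 0.4 * np.sin(2 * np.pi * 45 * t))
freqs = np.fft.rfftfreq(len(rumble), d=1 / sr)
f0 = base_frequency(freqs, np.abs(np.fft.rfft(rumble)))
different = probably_different_elephants(f0, 18.0)
```

Here the strongest peak is the 30 Hz harmonic, yet the estimate is the 15 Hz fundamental, and a second rumble at 18 Hz is flagged as a probable second elephant.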

Data

We received a set of sound files (.wav) and metadata that pointed us to the segments in which elephants were likely to be producing rumbles.

Challenges:
- A large data set
- Joining the files might be difficult
- The labels (annotations) don’t mention the number of elephants

Data Pipeline

Segmenting data: based on the metadata files, we create segments of a few seconds that contain the relevant information
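A sketch of the segmenting step, assuming the metadata has already been parsed into (begin, end) times in seconds. The real metadata format, the one-second padding and the name `cut_segments` are assumptions made for illustration.

```python
import numpy as np


def cut_segments(signal, sample_rate, annotations, padding_s=1.0):
    """Cut one audio segment per annotated rumble.

    annotations: iterable of (begin_s, end_s) pairs in seconds, e.g. read from
    a metadata file. padding_s adds context on both sides; segments are
    clipped to the bounds of the recording.
    """
    segments = []
    for begin_s, end_s in annotations:
        start = max(0, int((begin_s - padding_s) * sample_rate))
        stop = min(len(signal), int(np.ceil((end_s + padding_s) * sample_rate)))
        segments.append(signal[start:stop])
    return segments


# A 60-second recording at 1 kHz with two hypothetical annotations.
sr = 1000
recording = np.zeros(60 * sr)
segments = cut_segments(recording, sr, [(10.0, 14.0), (0.5, 3.0)])
```

The second annotation starts half a second into the recording, so its padding is clipped at the start of the file.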

Spectrograms: each data segment is transformed into a 2D image of time against frequency (10–50 Hz) using the fast Fourier transform (FFT), low-pass/high-pass filters, and frequency filters
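The filtering part of this step can be sketched as a simple FFT band-pass filter. This is an illustrative stand-in for the low-pass/high-pass filters mentioned above, not the filters the team used; `fft_bandpass` is a hypothetical name, and a hard "brick-wall" cut-off like this can cause ringing on real recordings, where a smooth filter design is usually preferable.

```python
import numpy as np


def fft_bandpass(signal, sample_rate, low_hz=10.0, high_hz=50.0):
    """Keep only the low_hz..high_hz band by zeroing FFT bins outside it.

    Zeroing bins below low_hz acts as a high-pass filter and zeroing bins
    above high_hz acts as a low-pass filter; both are applied in one step.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))


# A 20 Hz "rumble" mixed with 200 Hz noise and a constant (0 Hz) offset.
sr = 1000
t = np.arange(4 * sr) / sr
rumble = np.sin(2 * np.pi * 20 * t)
noisy = rumble + 0.5 * np.sin(2 * np.pi * 200 * t) + 0.3
filtered = fft_bandpass(noisy, sr)
```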

Noise reduction: noise is removed from each spectrogram, which is then converted into a simple monochrome (black-and-white) image
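The team used the noisereduce library for this step. The sketch below swaps in a much simpler technique, per-frequency median subtraction followed by a fixed threshold, to show how a spectrogram becomes a black-and-white image. The 6 dB threshold and the name `denoise_and_binarize` are assumptions.

```python
import numpy as np


def denoise_and_binarize(spec_db, threshold_db=6.0):
    """Turn a dB spectrogram (rows = frequency, columns = time) into a
    black-and-white mask.

    The median of each frequency row is used as that row's noise floor, so
    stationary background noise (for example a constant hum) is removed.
    Pixels more than threshold_db above their row's floor become True (white).
    """
    noise_floor = np.median(spec_db, axis=1, keepdims=True)
    cleaned = spec_db - noise_floor
    return cleaned > threshold_db


# Synthetic spectrogram: -60 dB background, a constant hum in row 5,
# and a rumble occupying rows 10-14 and columns 30-59.
spec_db = np.full((40, 100), -60.0)
spec_db[5, :] = -30.0
spec_db[10:15, 30:60] = -20.0
mask = denoise_and_binarize(spec_db)
```

Because the noise floor is a median, a rumble that fills more than half of a segment's time frames at a given frequency would be treated as background; a noise profile taken from rumble-free audio avoids this.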

Contour detection: a contour detection algorithm is applied to each monochrome image to separate the distinct ‘objects’, which in our case are the elephant rumbles
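In the project this step uses OpenCV's `cv2.findContours`. To keep the idea self-contained, the sketch below finds the separate objects with a plain breadth-first flood fill (connected-component labelling) instead of contour tracing. It is an illustrative substitute: it uses 4-connectivity and may group diagonally touching pixels differently from OpenCV. `find_objects` is a hypothetical name.

```python
from collections import deque

import numpy as np


def find_objects(mask):
    """Return the separate white 'objects' in a boolean image.

    Each object is a list of (row, col) pixels connected through their
    up/down/left/right neighbours (4-connectivity), found by flood fill.
    """
    seen = np.zeros_like(mask, dtype=bool)
    objects = []
    n_rows, n_cols = mask.shape
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or seen[r, c]:
                continue  # background pixel, or already part of an object
            queue, pixels = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny, nx] and not seen[ny, nx]):
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            objects.append(pixels)
    return objects


# Two separate rectangular "rumbles" in a 20 x 50 image.
mask = np.zeros((20, 50), dtype=bool)
mask[2:5, 5:15] = True
mask[10:13, 30:40] = True
objects = find_objects(mask)
```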

Boxing: for each contour (a potential elephant rumble), we calculate its size (height and width) by drawing a box around it
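Given the pixels of one object, its box can be expressed in physical units: width in seconds and height in hertz. This sketch assumes that spectrogram rows follow an ascending `freqs` array and columns follow `times`; `bounding_box` and the dictionary keys are hypothetical. Image libraries often put row 0 at the top, so a vertically flipped image would need its row indices reversed first.

```python
def bounding_box(pixels, times, freqs):
    """Return the box around one object in physical units.

    pixels: list of (row, col) positions, where rows index `freqs` (Hz)
    and columns index `times` (seconds).
    """
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    box = {
        "t_start": times[min(cols)],
        "t_end": times[max(cols)],
        "f_low": freqs[min(rows)],
        "f_high": freqs[max(rows)],
    }
    box["width_s"] = box["t_end"] - box["t_start"]      # duration of the rumble
    box["height_hz"] = box["f_high"] - box["f_low"]     # frequency span
    return box


# Frames every 0.5 s and 1 Hz frequency bins from 10 to 50 Hz.
times = [i * 0.5 for i in range(20)]
freqs = [10.0 + i for i in range(41)]
pixels = [(r, c) for r in range(5, 10) for c in range(2, 8)]
box = bounding_box(pixels, times, freqs)
```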

Counting: within each spectrogram, we compare the boxes that identify the rumbles with each other. Based on a few business rules, we count the number of distinct elephants whose rumbles appear in the image
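The writeup does not list the business rules, so the rule below is purely hypothetical and only illustrates the kind of logic involved. It treats the lower edge of each box as that rumble's base frequency and counts a new elephant whenever neighbouring base frequencies are more than `tolerance_hz` apart. `count_elephants` and the 2 Hz tolerance are assumptions.

```python
def count_elephants(boxes, tolerance_hz=2.0):
    """Count distinct callers among the rumble boxes of one spectrogram.

    Hypothetical rule: boxes whose lowest frequency (taken as the base
    frequency) lies within tolerance_hz of the previous box in sorted order
    are assumed to come from the same elephant.
    """
    base_freqs = sorted(box["f_low"] for box in boxes)
    if not base_freqs:
        return 0
    count = 1
    for previous, current in zip(base_freqs, base_freqs[1:]):
        if current - previous > tolerance_hz:
            count += 1  # a gap in base frequency suggests a new animal
    return count


boxes = [{"f_low": 14.0}, {"f_low": 22.0}, {"f_low": 15.0}]
n_elephants = count_elephants(boxes)
```

Because neighbours are chained, rumbles at 14, 15.5 and 17 Hz would count as one elephant even though the outer two differ by 3 Hz; a fuller rule set would also consider the timing and overlap of the boxes.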

Samples

Source Code

The source code is made available at: https://github.com/AI-Cloud-and-Edge-Implementations/Project15-G4

All code is written in Python and runs either on premises or in the cloud (Azure)

We used the following frameworks to process and analyze the data:
- boto3 for connecting to Amazon Web Services (AWS)
- NumPy, Pandas, SciPy and Matplotlib for statistical analysis and visualization
- Librosa for the FFT
- noisereduce for noise reduction
- SoundFile for reading and writing audio files
- OpenCV for contour detection

An explanatory video is available at:

Results

We analysed 3935 elephant sounds:
- 112 spectrograms were identified as containing 0 elephants
- 3277 spectrograms were identified as containing 1 elephant
- 505 spectrograms were identified as containing 2 elephants
- 40 spectrograms were identified as containing 3 elephants

Results of the Boxing algorithm

The boxing algorithm was evaluated by Liz Rowland of Cornell University

The reported accuracy of the model is:
- 97.29% for the training dataset (3180 cases)
- 99.29% for the testing dataset (758 cases)

This suggests that the model is useful for counting elephants.

Combined with other models (such as elephant detection), this model enables many interesting use cases, for example visualizing elephant movements and detecting poaching

Project 15 Architecture

Building ML Models

Aim
To use the processed spectrogram data as input to a convolutional neural network (CNN) that automatically classifies how many elephants are present

Why are we doing this?
- To automate the workflow end to end
- To improve accuracy by reducing human error
- To save time, enabling researchers to focus their attention on complex problems

Our Approach
Transfer learning takes models that have been pre-trained on large datasets and fine-tunes them for a specific problem. The approach is increasingly popular because it trains faster, often performs better, and needs less data, and we found that it worked well for our problem.

Model Summary

The models were implemented in Keras with a TensorFlow backend.

To evaluate our models, we compared the accuracy and loss of our two most promising architectures:
- ResNet50: accuracy 0.9620, loss 0.1622
- VGGNet: accuracy 0.9477, loss 0.3252

Model – Resnet50

The following configuration was found to be optimal for the classification task with ResNet50:
- Epochs: 25
- Batch Size: 100
- Weights = “imagenet”
- Intermediate dense layers:
  - Nodes: 4 layers of 256,128,64 respectively
  - activation = ‘relu’
  - Dropout = 0.5
  - BatchNormalization()
- Final dense layer:
  - Nodes: 3
  - activation = ‘softmax’
- Optimizer: Adam with a learning rate of 0.001
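The configuration can be sketched in Keras as follows. This is a minimal sketch, not the team's code: the 224×224×3 input size, the frozen backbone, the order of Dense, BatchNormalization and Dropout within each block, and the categorical cross-entropy loss are all assumptions, and the head uses the three listed sizes (256, 128, 64). `build_resnet50_classifier` is a hypothetical name.

```python
import tensorflow as tf


def build_resnet50_classifier(input_shape=(224, 224, 3), n_classes=3, weights="imagenet"):
    """ResNet50 backbone plus a dense classification head."""
    # Pre-trained convolutional base without ImageNet's 1000-class top layer;
    # global average pooling turns its feature maps into one vector per image.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    base.trainable = False  # assumption: train only the new head at first
    layers = [tf.keras.Input(shape=input_shape), base]
    for units in (256, 128, 64):  # intermediate dense layers
        layers += [
            tf.keras.layers.Dense(units, activation="relu"),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.Dropout(0.5),
        ]
    # Final layer: one probability per class.
    layers.append(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model = tf.keras.Sequential(layers)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",  # assumes one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Training would then use the reported settings, e.g. `model.fit(x_train, y_train, epochs=25, batch_size=100)`, where `x_train` and `y_train` are placeholders for spectrogram images and one-hot labels. Passing `weights=None` builds the same architecture without downloading the ImageNet weights; fine-tuning in the stricter sense would mean unfreezing some backbone layers after the head has been trained.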

Further Research

Machine learning on spectrograms using labelled data

Automatic classification and better acoustic analysis (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0048907)

Further fine-tuning of the boxing algorithm might lead to even better results, for example:
- Fixing the time axis in the spectrograms
- Increasing the frequency range
- Other (better) noise reduction techniques

Conclusions

Elephant counting based on base frequency analysis is possible

The team delivered a ready-to-use software library that counts elephants with high accuracy (97% on selected cases)

The software can be used in the IoT Hub (Project 15) or on-premise

The application can be integrated into other software

A machine learning model (VGG or Resnet50) could be used to count the elephants instead of the rule-based boxing algorithm

Further research is needed to improve the results and, for example, to extend the approach to other species

Thanks

Many thanks to all the people who helped with the project by providing insights, performing reviews, and participating in meetings:
- Peter Wrege (Cornell University)
- Liz Rowland (Cornell University)
- Lee Stott (Microsoft)
- Sarah Maston (Microsoft)
- The organizers of the “Artificial Intelligence – Cloud and Edge Implementations” course:
  - Ajit Jaokar (University of Oxford)
  - Peter Holland (University of Oxford)
