
Oxford’s AI Group 4 Project 15 Writeup


 


Who are we?


  • Abhishekh Baskaran

  • Bas Geerdink

  • Chandan Konuri

  • Henrietta Ridley

  • Jay Padmanabhan

  • Paulo Campos

  • Vishweshwar Manthani

Goal


The goal of the project was to count the number of elephants in a sound file.


[Figure: example spectrogram of an audio segment]


To do so, we detected whether rumbles belong to the same elephant or not.


 


Literature



  • Poole, Joyce H. (1999). Signals and assessment in African elephants: evidence from playback experiments. Animal Behaviour, 58(1), 185-193

  • Jarne, Cecilia (2019). A method for estimation of fundamental frequency for tonal sounds inspired on bird song studies. MethodsX, 6, 124-131

  • Stoeger, Angela S. et al (2012). Visualizing Sound Emission of Elephant Vocalizations: Evidence for Two Rumble Production Types. PLoS ONE, 7(11), e48907

  • O’Connell-Rodwell, C.E. et al (2000). Seismic properties of Asian elephant (Elephas maximus) vocalizations and locomotion. Journal of the Acoustical Society of America, 108(6), 3066-3072

  • Heffner, R. S., & Heffner, H. E. (1982). Hearing in the elephant (Elephas maximus): Absolute sensitivity, frequency discrimination, and sound localization. Journal of Comparative and Physiological Psychology, 96(6), 926-944

  • Elephant Listening Project, Cornell University: https://elephantlisteningproject.org/

  • Project 15, Microsoft: https://microsoft.github.io/project15/


 


Introduction



  • Sound files can be analysed by transforming them into a 2D image: a spectrogram of time (seconds) vs frequency (Hertz). The third dimension is sound intensity (decibels), which can be shown as colour or grayscale

  • Elephants produce rumbles to communicate, with a typical frequency of 10–50 Hz and a duration of 2–6 seconds

  • One elephant rumble has many harmonics: overtones at integer multiples of its base (fundamental) frequency

  • An elephant can be identified by its base frequency. If two slightly overlapping or separated rumbles have different base frequencies, they probably belong to separate animals
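As an illustration, here is a minimal sketch of how such a spectrogram can be produced with Librosa and Matplotlib. The file name segment.wav and the STFT parameters are illustrative assumptions, not the project’s exact settings.

```python
# A minimal sketch: load a .wav segment and plot its low-frequency
# spectrogram. File name and STFT parameters are illustrative.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("segment.wav", sr=None)           # keep native sample rate
S = np.abs(librosa.stft(y, n_fft=8192, hop_length=1024))
S_db = librosa.amplitude_to_db(S, ref=np.max)          # intensity in decibels

fig, ax = plt.subplots()
img = librosa.display.specshow(S_db, sr=sr, hop_length=1024,
                               x_axis="time", y_axis="hz", ax=ax)
ax.set_ylim(10, 50)                                    # elephant rumble band
fig.colorbar(img, ax=ax, format="%+2.0f dB")
plt.savefig("spectrogram.png")
```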


Data


We received a set of sound files (.wav) and metadata that pointed us to the segments where elephants were likely to produce rumbles.


Challenges:



  • Big data set

  • Joining the files might be a challenge

  • Labels / annotations don’t mention the number of elephants


[Figure: data from the Cornell Elephant Listening Project]


[Figure: data set overview]


Data Pipeline



  1. Segmenting data: based on the metadata files, we create segments of a few seconds that contain the interesting information

  2. Spectrograms: each data segment is transformed into a 2D image of time vs frequency (10-50 Hz), using an FFT, lowpass/highpass filters, and frequency filters

  3. Noise reduction: noise is removed from each spectrogram and the result is transformed into a simple monochrome (black and white) image

  4. Contour detection: each monochrome image is evaluated with a contour detection algorithm to distinguish the separate ‘objects’, which in our case are the elephant rumbles

  5. Boxing: for each contour (potential elephant rumble) we calculate the size (height and width) by drawing a box around the contour

  6. Counting: we compare the boxes that identify the rumbles to each other in each spectrogram. Based on a few business rules, we count the number of unique elephant rumbles in each image (a sketch of steps 3–6 follows this list)
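As referenced above, here is a minimal sketch of steps 3–6 with OpenCV, assuming the spectrogram has already been saved as a grayscale image spec.png. The threshold values and the counting rule are illustrative assumptions; the project’s exact business rules are not spelled out in this writeup.

```python
# A minimal sketch of pipeline steps 3-6. Thresholds and the counting
# rule are illustrative assumptions, not the project's exact rules.
import cv2

spec = cv2.imread("spec.png", cv2.IMREAD_GRAYSCALE)

# Step 3: suppress background noise by binarising with Otsu's threshold
_, mono = cv2.threshold(spec, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Step 4: detect contours, i.e. the candidate rumbles
contours, _ = cv2.findContours(mono, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Step 5: box every sufficiently large contour; (x, y, w, h) per box,
# where y encodes the rumble's position on the frequency axis
boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50]

# Step 6: count unique rumbles; as a (hypothetical) rule, two boxes
# whose bottom edges (~ base frequency) differ by more than a few
# pixels are treated as rumbles from different elephants
rows = sorted(y + h for (x, y, w, h) in boxes)
elephants, last = 0, None
for r in rows:
    if last is None or r - last > 5:
        elephants += 1
    last = r
print(f"Estimated number of elephants: {elephants}")
```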


Samples


[Figure: sample spectrograms]


[Figure: sample spectrograms, continued]


Source Code



  • The source code is made available at: https://github.com/AI-Cloud-and-Edge-Implementations/Project15-G4 

  • All code is written in Python and runs on premises or in the cloud (Azure)

  • We used the following frameworks to process and analyze the data (a data-access sketch follows this list):

    • boto3 for connecting to Amazon AWS

    • NumPy, Pandas, SciPy and Matplotlib for statistical analysis and visualization

    • Librosa for FFT

    • noisereduce for noise reduction

    • SoundFile for reading and writing audio files

    • OpenCV for contour detection
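As referenced above, a minimal sketch of the data access with boto3; the bucket and key names are hypothetical placeholders, not the project’s actual storage layout.

```python
# A minimal sketch of fetching one recording from AWS S3 with boto3.
import boto3

s3 = boto3.client("s3")
s3.download_file(Bucket="elephant-recordings",   # hypothetical bucket
                 Key="recordings/rec_001.wav",   # hypothetical key
                 Filename="rec_001.wav")         # local destination
```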



  • An explanatory video can be found at: 


Results



  • We analysed 3935 elephant sounds:

    • 112 spectrograms were identified as containing 0 elephants

    • 3277 spectrograms were identified as containing 1 elephant

    • 505 spectrograms were identified as containing 2 elephants

    • 40 spectrograms were identified as containing 3 elephants




 


Results of the Boxing algorithm



  • The boxing algorithm was evaluated by Liz Rowland of Cornell University

  • The reported accuracy of the model is:

    • 97.29 % for the Training dataset (3180 cases)

    • 99.29 % for the Testing dataset (758 cases)

    • These results indicate that the model is useful for counting elephants



  • In combination with other models (e.g. elephant detection), many interesting use cases can be built, for example visualizing elephant movements and detecting poaching


 


Project 15 Architecture


[Figure: Project 15 open platform architecture]


[Figure: team 4 solution architecture]


Building ML Models



  • Aim

    Use the processed spectrogram data as input to a CNN to automatically categorise how many elephants are present

  • Why are we doing this?

    • To automate the workflow end to end

    • To improve accuracy by reducing human error

    • To save time, enabling researchers to focus their attention on complex problems

  • Our Approach

    Transfer learning takes advantage of models that have been pre-trained on large datasets and then fine-tunes them to a specific problem. This approach has become popular for several reasons (shorter training time, better performance, less need for large amounts of data) and we found it to work well (see the Keras sketch in the Model – ResNet50 section below).


 


Model Summary



  • Implemented using Keras with a TensorFlow backend.

  • To evaluate the performance of our models, we looked at the following measures for our two most promising architectures:

    • ResNet50: accuracy 0.9620, loss 0.1622

    • VGGNet: accuracy 0.9477, loss 0.3252





 


[Figure: model summary]


 


Model – ResNet50


  • The following configuration was found to be optimal for the classification task on ResNet50 (a minimal Keras sketch follows this list):

    • Epochs: 25

    • Batch size: 100

    • Weights: “imagenet”

    • Intermediate dense layers:

      • Nodes: three layers of 256, 128, and 64 respectively

      • Activation: ‘relu’

      • Dropout: 0.5

      • BatchNormalization()



    • Final dense layer:

      • Nodes: 3

      • Activation: ‘softmax’



    • Optimizer: Adam with a learning rate of 0.001
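A minimal Keras sketch of this configuration; the 224×224 input size and the exact head layout are assumptions, since the writeup does not fix them.

```python
# A minimal sketch of the ResNet50 transfer-learning setup above.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

def build_model(num_classes=3, input_shape=(224, 224, 3)):
    # Pre-trained ImageNet backbone, classification top removed
    base = ResNet50(weights="imagenet", include_top=False,
                    input_shape=input_shape, pooling="avg")
    base.trainable = False  # freeze the backbone; train only the new head

    model = models.Sequential([base])
    # Intermediate dense layers: 256 -> 128 -> 64, each with ReLU,
    # BatchNormalization and Dropout(0.5)
    for nodes in (256, 128, 64):
        model.add(layers.Dense(nodes, activation="relu"))
        model.add(layers.BatchNormalization())
        model.add(layers.Dropout(0.5))
    # Final softmax layer over the elephant-count classes
    model.add(layers.Dense(num_classes, activation="softmax"))

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
# model.fit(train_images, train_labels, epochs=25, batch_size=100)
```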




 




Further Research



  • Machine learning on spectrograms using labelled data 

  • Automatic classification and better acoustic analysis (https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0048907)

  • Further fine-tuning of the boxing algorithm might lead to even better results, e.g.

    • Fixing the time axis in the spectrograms

    • Increasing the frequency range

    • Other (better) noise reduction techniques




Conclusions



  • Elephant counting based on base frequency analysis is possible

  • The team delivered a ready-to-use software library that counts elephants with high accuracy (97% on selected cases)

  • The software can be used in the IoT Hub (Project 15) or on premises

  • The application can be integrated into other software

  • A machine learning model (VGGNet or ResNet50) could be used to count the elephants instead of the rule-based boxing algorithm

  • Further research is needed to improve the results, for example by broadening the approach to other species


 


Thanks



  • Many thanks to all the people who helped with the project by providing insights, performing reviews, and participating in meetings:

    • Peter Wrege (Cornell University)

    • Liz Rowland (Cornell University)

    • Lee Stott (Microsoft)

    • Sarah Maston (Microsoft)

  • Thanks to the organizers of the “Artificial Intelligence – Cloud and Edge Implementations” course:

    • Ajit Jaokar (University of Oxford)

    • Peter Holland (University of Oxford)




 


 


 
