


PROJECT TITLE: A Baseline Elephant Infrasound Detection architecture using IoT Edge, Cloud and Machine learning frameworks for elephant behavioural monitoring.


Group 2:

Authors: Shawn Deggans, Aneeq Ur Rehman, Tommaso Pappagallo, Vivek Shaw, Giulia Ciardi, Dnyaneshwar Kulkarni






The goal of this project is to create a platform that allows researchers to continuously build and refine machine learning models that can recognize elephant behaviour from infrasound. This platform is based on Microsoft’s Project 15 architecture, an open-source solution platform built to aid conservation and sustainability projects using IoT devices.


We aim to apply this technology stack to the tracking and understanding of elephant behaviour to contribute to an effective monitoring program. Our solution aims to achieve this goal by:


  1. Systematically collecting data related to elephant vocalization via IoT devices.

  2. Detecting threats and inferring behavioural patterns from elephant vocalization, using custom machine learning and signal processing modules deployed on the IoT device through the Azure IoT stack.

Elephants’ low-frequency vocalizations (infrasound) are produced by flow-induced, self-sustaining oscillations of laryngeal tissue [1]. This anatomy allows elephants to communicate in a frequency range (14-24 Hz) that lies largely below the range of human hearing. It also means that a standard microphone, whose normal range is 20 Hz-20 kHz, is not an adequate detector for elephant vocalization. Specialized microphones can capture sound from 1 Hz upward. Acquiring these specialized microphones and infrasound detection tools is beyond the scope of this project. Instead, in the proposed architecture below we outline how such a tool could be created and used in the field by rangers and researchers.


The design of this architecture can be built upon in the future to deploy more advanced custom modules and to perform more advanced analytics on the collated vocalizations. We propose an end-to-end architecture that meets these needs, so that researchers and domain experts can analyse and monitor elephant behaviour, identify potential threats and take timely action.





The above follows the overall proposed flow of the device-to-cloud communication, as well as the IoT Edge’s internal device logic.


1. Capture Infrasound using infrasound detection arrays on elephants:

The infrasound capture device continuously monitors for infrasound. These infrasound signals are captured and sent to the device for analysis. We are interested in exploring one type of infrasound capture device:

  • Infrasound detection array

The infrasound detection array is the Spring 2020 capstone project of team members James Berger, Tiffany Graham, Josh Wewerka, and Tyler Wyse. Project details can be explored in [2].

This project was designed to detect infrasound direction using an array of infrasound microphones, but we would use the infrasound microphones in a low-powered device that could communicate over a long-range radio network, such as LoRaWAN. This network of devices would serve as the primary means for tracking elephants and recording their vocalization.

We envision this device as an IoT Edge device capable of supporting Azure Storage Account Blob Storage and custom IoT Edge Modules. These are containerized machine learning modules capable of converting audio into spectrograms, converting spectrograms into images marked with bounding boxes, and delivering telemetry to IoT Hub.


2. Send infrasound data to IoT Device:

The 64-bit Raspberry Pi captures the infrasound audio signals and breaks them into segments based on detected beginning-of-signal and end-of-signal indicators. Initially, this segmentation could mean we miss “conversations,” because it is the equivalent of capturing a word from each sentence in human speech. Elephants do not necessarily speak in words and sentences, so this is unlikely to keep us from meeting the goals of the system.
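The segmentation step described above can be sketched with a simple energy threshold. This is only an illustration under stated assumptions: the function names, the frame length, the threshold and the synthetic 20 Hz test tone are all hypothetical, and the project's actual beginning-of-signal and end-of-signal detection may work differently.

```python
import math


def frame_rms(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of `frame_len` samples."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def segment_signal(samples, frame_len, threshold, min_frames=2):
    """Return (start, end) sample indices of runs of frames at or above `threshold`.

    `end` is exclusive. Runs shorter than `min_frames` frames are discarded.
    """
    energies = frame_rms(samples, frame_len)
    segments = []
    run_start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and run_start is None:
            run_start = i                      # beginning-of-signal
        elif energy < threshold and run_start is not None:
            if i - run_start >= min_frames:    # end-of-signal
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    # Close a run that is still open when the recording ends.
    if run_start is not None and len(energies) - run_start >= min_frames:
        segments.append((run_start * frame_len, len(energies) * frame_len))
    return segments


# Synthetic clip (not real data): 1 s silence, 2 s of a 20 Hz tone, 1 s silence.
SAMPLE_RATE = 200  # hypothetical sample rate in Hz
silence = [0.0] * SAMPLE_RATE
tone = [math.sin(2 * math.pi * 20 * n / SAMPLE_RATE) for n in range(2 * SAMPLE_RATE)]
clip = silence + tone + silence

segments = segment_signal(clip, frame_len=50, threshold=0.1)
print(segments)
```

Each returned pair marks one candidate clip that could be written out as a separate audio file. A fixed threshold is the simplest choice; in the field it would likely need to adapt to background noise.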


3. Storing data via Azure Blob Storage: 

These audio files will be saved to an Azure Blob Storage module on the device, in an audio clips folder [3].


4. Azure Functions and Spectrogram conversion via the FileWatcher Module. 

We develop a custom IoT Edge module known as the filewatcher module. This module uses an Azure Function to serve as a file watcher for the file storage. [4]


When it detects the file has been received in the blob storage account, it will convert the audio file to a spectrogram.
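The file-watching idea can be illustrated locally with a small polling routine. This is a stand-in sketch, not the Azure Function used in the project: it watches an ordinary folder rather than a blob container, and the function name, the `.wav` extension filter and the file names are assumptions made for the example.

```python
import os
import tempfile


def find_new_files(folder, seen, extensions=(".wav",)):
    """Return sorted names of matching files in `folder` not yet in `seen`.

    Newly found names are added to `seen` so that each file is reported once.
    """
    new_files = []
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(extensions) and name not in seen:
            seen.add(name)
            new_files.append(name)
    return new_files


# A temporary folder stands in for the audio clips folder.
with tempfile.TemporaryDirectory() as inbox:
    seen = set()
    open(os.path.join(inbox, "clip_001.wav"), "wb").close()
    first_poll = find_new_files(inbox, seen)    # picks up clip_001.wav

    open(os.path.join(inbox, "clip_002.wav"), "wb").close()
    open(os.path.join(inbox, "notes.txt"), "wb").close()
    second_poll = find_new_files(inbox, seen)   # only the new .wav file

    third_poll = find_new_files(inbox, seen)    # nothing new

print(first_poll, second_poll, third_poll)
```

Every name returned by a poll would be handed to the spectrogram conversion step. An event-driven trigger avoids the polling delay, which is one reason to prefer a function bound to the storage account in the deployed system.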


A MATLAB script converts the raw audio files to spectrograms. The details of this script are available in our GitHub repository. We chose MATLAB because it gave us higher-resolution images and a higher-fidelity power spectrum, both of which improve our elephant rumble detection process.


These spectrograms are stored in an image inbox in the blob storage.
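For readers without MATLAB, the audio-to-spectrogram conversion can be sketched in plain Python. This is not the MATLAB script from the repository: it is a minimal short-time Fourier transform using only the standard library, and the function name, window length, hop size and synthetic test tone are assumptions chosen for the example.

```python
import cmath
import math


def spectrogram(samples, sample_rate, window_len, hop, max_freq):
    """Return (freqs, frames) for a Hann-windowed short-time DFT.

    frames[t][k] is the power at frequency freqs[k] in the t-th window.
    Only bins up to `max_freq` Hz are computed, which keeps the naive DFT
    affordable when the band of interest is infrasound.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (window_len - 1))
              for n in range(window_len)]
    bin_width = sample_rate / window_len
    n_bins = int(max_freq / bin_width) + 1
    freqs = [k * bin_width for k in range(n_bins)]
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(window_len)]
        row = []
        for k in range(n_bins):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / window_len)
                      for n in range(window_len))
            row.append(abs(acc) ** 2)
        frames.append(row)
    return freqs, frames


# Synthetic 2 s clip containing a 20 Hz tone (not real elephant data).
SAMPLE_RATE = 200  # hypothetical sample rate in Hz
signal = [math.sin(2 * math.pi * 20 * n / SAMPLE_RATE) for n in range(400)]

freqs, frames = spectrogram(signal, SAMPLE_RATE, window_len=100, hop=50, max_freq=50)
peak_bin = max(range(len(freqs)), key=lambda k: frames[0][k])
peak_freq = freqs[peak_bin]
print(len(frames), "windows; strongest frequency in first window:", peak_freq, "Hz")
```

Rendering `frames` as an image, with time on one axis and frequency on the other, gives the kind of picture the detection model consumes. Longer windows give finer frequency resolution at the cost of time resolution, which matters when rumbles sit only a few hertz apart.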


5. Elephant Rumble Detection Module: 

Another Azure Function listens to the image inbox for new images and keeps track of the images uploaded there.


When new images appear, the function feeds the images to the rumble detection model.


The rumble detection model is trained in Custom Vision AI, using the spectrogram images generated from the raw audio files.


Custom Vision AI is a good place to prototype and build object detection and image classification models.


An example of steps to build a custom computer vision model and deploy to IoT Edge can be found here via DevOps (


The rumble detection model returns the image label (rumble or no rumble) and the bounding box coordinates. We use OpenCV to draw the coordinates on the image. Images are then saved to the Bounding Box Images folder for more backend processing within Azure ML to help feed the ML loop.
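As an illustration of the post-processing step, the sketch below converts detection results into pixel coordinates that could then be passed to OpenCV's `cv2.rectangle`. It assumes the predictions follow the shape documented for the Custom Vision prediction API, where `boundingBox` holds `left`, `top`, `width` and `height` normalized to the range 0 to 1. The helper names, the probability cut-off and the sample predictions are hypothetical.

```python
def to_pixel_box(bounding_box, image_width, image_height):
    """Convert a normalized box (left, top, width, height) to pixel corners."""
    left = bounding_box["left"] * image_width
    top = bounding_box["top"] * image_height
    right = left + bounding_box["width"] * image_width
    bottom = top + bounding_box["height"] * image_height

    def clamp(value, upper):
        # Keep the corner inside the image after rounding to whole pixels.
        return max(0, min(upper, round(value)))

    return ((clamp(left, image_width), clamp(top, image_height)),
            (clamp(right, image_width), clamp(bottom, image_height)))


def select_rumble_boxes(predictions, image_width, image_height, min_probability=0.5):
    """Return pixel boxes for predictions tagged 'rumble' above the cut-off."""
    boxes = []
    for prediction in predictions:
        if (prediction["tagName"] == "rumble"
                and prediction["probability"] >= min_probability):
            boxes.append(to_pixel_box(prediction["boundingBox"],
                                      image_width, image_height))
    return boxes


# Made-up predictions for a 640x480 spectrogram image.
predictions = [
    {"tagName": "rumble", "probability": 0.91,
     "boundingBox": {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25}},
    {"tagName": "rumble", "probability": 0.12,
     "boundingBox": {"left": 0.1, "top": 0.1, "width": 0.2, "height": 0.2}},
]

boxes = select_rumble_boxes(predictions, image_width=640, image_height=480)
print(boxes)
# Each (top_left, bottom_right) pair can be drawn with
# cv2.rectangle(image, top_left, bottom_right, color, thickness).
```

Keeping the coordinate conversion separate from the drawing call makes it easy to test without an image library installed on the device.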

We implement this logic flow as part of a DevOps pipeline. For more detail on the pipeline components, please see technical design.


6. Elephant Spectrogram Feature Extraction and Analysis:

In addition to the above, we also extract features from the created spectrograms in our Azure storage accounts.

For feature extraction, we calculated 19 different variables covering flow statistics of the audio (mean, variance, etc.) and common signal characteristics (rms, peak2peak, etc.). We used MATLAB here because it enabled a wider array of spectral features to be extracted.


In summary, for each sample we found the frequency of the peak magnitude across the overall power spectrum, the 1st formant, the 2nd formant, and the max, min, mean and finish of the fundamental. In addition, we computed the power in the low frequency range (0-30 Hz), the mid frequency range (30-120 Hz) and the high frequency range (>120 Hz). We also located the frequency of the spectral kurtosis peak and the flow cumulative sum range.
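A few of these features can be sketched in plain Python. The example below covers only a small subset (rms, peak-to-peak, peak frequency and the three band powers) and is not the MATLAB code used in the project. The function names, the treatment of the band edges as half-open intervals and the synthetic two-tone signal are assumptions.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (freqs, power) for the one-sided spectrum, using a naive DFT."""
    n = len(samples)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2 / n)
    return freqs, power


def extract_features(samples, sample_rate):
    """Compute a handful of time-domain and spectral features for one clip."""
    freqs, power = power_spectrum(samples, sample_rate)

    def band_power(low_hz, high_hz):
        # Assumption: each band includes its lower edge and excludes its upper edge.
        return sum(p for f, p in zip(freqs, power) if low_hz <= f < high_hz)

    peak_bin = max(range(len(power)), key=lambda k: power[k])
    return {
        "rms": math.sqrt(sum(x * x for x in samples) / len(samples)),
        "peak2peak": max(samples) - min(samples),
        "peak_freq_hz": freqs[peak_bin],
        "power_low": band_power(0, 30),
        "power_mid": band_power(30, 120),
        "power_high": band_power(120, float("inf")),
    }


# Synthetic 1 s clip (not real data): a 20 Hz tone plus a weaker 60 Hz tone.
SAMPLE_RATE = 400  # hypothetical sample rate in Hz
clip = [math.sin(2 * math.pi * 20 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 60 * n / SAMPLE_RATE)
        for n in range(SAMPLE_RATE)]

features = extract_features(clip, SAMPLE_RATE)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```

For this test signal most of the power falls in the low band and the remainder in the mid band, mirroring how a rumble's fundamental and its harmonics would be split across the three ranges.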


7. Analysis in the ML Workspace using stored data: 

Our results offer some starting points for further research, particularly on learning any underlying structure in our population, such as age groups, from unlabelled data.


By unlabelled data, we mean that the raw audio files have no annotations or additional information associated with them that would classify or distinguish elephant rumbles by maturity group.


We classify elephant rumbles into maturity groups by:


1. Leveraging the extracted features, such as the 1st and 2nd formant frequencies, rms, peak-to-peak envelopes, and the max, min and finish frequencies of each sample. We aim to classify the samples into Maturity Group 1 or 2 by applying the feature thresholds proposed in [5]. We also inspect the features visually by plotting them with various data visualisation techniques. An example of a violin plot is shown below.


The violin plot clearly shows that at certain thresholds, we can cluster elephants into two different maturity groups.


2. Using unsupervised methods such as PCA to generate principal components and then performing K-means clustering. This method showed clear signs of clustering when the points were projected onto the principal components, as seen in the output below.


When comparing methods 1 and 2, the classified maturity groups agreed for over 75% of the data. To check whether this agreement is statistically significant, however, we would need to perform a t-test.
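The comparison between the two methods can be illustrated with a toy example. The sketch below is not the project's analysis: the feature values are invented, the 15 Hz threshold is hypothetical rather than taken from [5], and the PCA step is omitted so that the code runs on the standard library alone. It shows a threshold rule, a minimal two-cluster K-means, and an agreement score that ignores which cluster happens to be called 0 or 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_two_clusters(points, iterations=20):
    """Minimal 2-means; returns a label (0 or 1) for each point.

    Centroids start at the first point and the point farthest from it,
    which keeps the result deterministic.
    """
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centroids = [list(first), list(second)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [0 if sq_dist(p, centroids[0]) <= sq_dist(p, centroids[1]) else 1
                  for p in points]
        for cluster in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == cluster]
            if members:
                centroids[cluster] = [sum(col) / len(members)
                                      for col in zip(*members)]
    return labels


def threshold_groups(values, threshold):
    """Method 1 stand-in: group 0 below the threshold, group 1 otherwise."""
    return [0 if v < threshold else 1 for v in values]


def agreement(labels_a, labels_b):
    """Fraction of samples where two binary labelings agree, up to a label swap."""
    same = sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)
    return max(same, 1 - same)


# Invented rows of [first formant in Hz, rms]; NOT measurements from the project.
samples = [[12.0, 0.30], [13.0, 0.35], [12.5, 0.28], [16.0, 0.33],
           [19.0, 0.60], [20.0, 0.65], [18.5, 0.58], [21.0, 0.62]]

threshold_labels = threshold_groups([row[0] for row in samples], threshold=15.0)
kmeans_labels = kmeans_two_clusters(samples)
agreement_score = agreement(threshold_labels, kmeans_labels)
print(threshold_labels, kmeans_labels, agreement_score)
```

Because cluster numbering is arbitrary, the agreement is taken as the better of the direct and swapped matchings. With real features of very different scales, standardizing each column before clustering would be a sensible extra step.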





The above is the MLOps pipeline that forms part of our workflow. We use the continuous integration and continuous deployment features of Azure DevOps to manage our workflows.


The following are some of the steps in this pipeline.


1. Creation of Azure Resources:

Resource Group:


A resource group is needed to hold all the resources we create in Azure.


IoT Hub:


IoT Hub is used as the primary IoT gateway. From here we manage device provisioning, IoT Edge device creation, telemetry routing, and security. We will use SAS token security and employ a regular SAS key rotation policy to keep devices secure.
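To make the SAS token mechanism concrete, the sketch below builds a token in the format IoT Hub expects: an HMAC-SHA256 signature over the URL-encoded resource URI and the expiry time, keyed with the base64-decoded shared access key. The hub name, device ID and key are placeholders, and the function name is ours. In practice the device SDKs usually generate these tokens for you.

```python
import base64
import hashlib
import hmac
import urllib.parse


def generate_sas_token(resource_uri, key_b64, expiry_epoch, policy_name=None):
    """Build a 'SharedAccessSignature' token string for the given resource.

    `key_b64` is the base64-encoded shared access key and `expiry_epoch` is
    the expiry time in seconds since the Unix epoch.
    """
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    to_sign = f"{encoded_uri}\n{expiry_epoch}".encode("utf-8")
    signature = base64.b64encode(
        hmac.new(base64.b64decode(key_b64), to_sign, hashlib.sha256).digest()
    ).decode("utf-8")

    fields = {"sr": resource_uri, "sig": signature, "se": str(expiry_epoch)}
    if policy_name is not None:
        fields["skn"] = policy_name
    # urlencode applies the same quote_plus encoding to each value.
    return "SharedAccessSignature " + urllib.parse.urlencode(fields)


# Placeholder values only; a real key comes from the device registration.
RESOURCE_URI = "example-hub.azure-devices.net/devices/edge-device-01"
FAKE_KEY = base64.b64encode(b"not-a-real-key").decode("utf-8")
EXPIRY = 1700000000  # in practice: int(time.time()) + lifetime in seconds

token = generate_sas_token(RESOURCE_URI, FAKE_KEY, EXPIRY)
print(token)
```

A token is only valid until its expiry, so short lifetimes limit the damage from a leaked token. Rotating the underlying key, as the policy above proposes, invalidates every token signed with the old key.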


IoT Edge device:

This is created in the IoT Hub and acts as our IoT device.


IoT Edge RunTime Agent:

This will be deployed on a Raspberry Pi OS VM container.


Azure Blob Storage:

Blob Storage holds images saved on the IoT Edge Module. This means that it also has folders that represent audio clips, image inbox, and bounding box images. This represents cold storage of data that could be studied later or needs to be archived.


Azure Container Registry: 

Container Registry holds the Docker images that represent the IoT Edge Modules. The IoT Edge runtime on each device pulls these images as directed by the deployment set in IoT Hub, so modules deploy automatically to devices in the field. All the custom modules mentioned previously contain Dockerfiles, and the resulting images are pushed to Azure Container Registry as part of the pipeline.


Azure DevOps: 

Azure DevOps is one of the most important pieces of our MLOps process. From here, code is checked in and pushed out to the Container Registry. It also provides version control for our code base and manages the pipelines. The Dockerfiles for both custom modules are located here. DevOps offers continuous integration and continuous deployment: it builds the code for the two custom modules and generates a deployment manifest, which is deployed to the IoT Edge device through the IoT Hub.


2. Import source code in DevOps:

We import our code files and repos in DevOps from our source repository.


3. Create a CI pipeline in DevOps:

The pipeline is then created with all environment variables defined. These include the container registry and IoT Hub details needed to build our modules and push them to the Azure Container Registry. We enable continuous integration.


4. Enable CD pipeline in DevOps:

We then create a release pipeline and push our deployment to the IoT Hub and hence the IoT Edge device [6].
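The deployment manifest pushed by the release pipeline is a JSON document. The sketch below assembles only a fragment of one, covering the custom modules and a catch-all route to IoT Hub. A complete manifest also needs the schema version, runtime settings and the `edgeAgent` and `edgeHub` system modules, which are left out here. The registry address, module names and image tags are hypothetical.

```python
import json


def build_module_entry(image, version="1.0"):
    """Return the manifest entry for one Docker-based custom module."""
    return {
        "version": version,
        "type": "docker",
        "status": "running",
        "restartPolicy": "always",
        "settings": {"image": image, "createOptions": "{}"},
    }


def build_manifest_fragment(registry, modules):
    """Build the custom-module and routing parts of a deployment manifest.

    `modules` maps a module name to the image tag to deploy.
    """
    return {
        "modulesContent": {
            "$edgeAgent": {
                "properties.desired": {
                    "modules": {
                        name: build_module_entry(f"{registry}/{name}:{tag}")
                        for name, tag in modules.items()
                    }
                }
            },
            "$edgeHub": {
                "properties.desired": {
                    # Forward every module message to IoT Hub.
                    "routes": {"allToIoTHub": "FROM /messages/* INTO $upstream"}
                }
            },
        }
    }


# Hypothetical registry and module names for the two custom modules.
manifest = build_manifest_fragment(
    "exampleregistry.azurecr.io",
    {"filewatcher": "0.0.1", "rumbledetection": "0.0.1"},
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

In a pipeline, the image tag would normally come from the build number, so that each release points the device at the images produced by that specific build.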


The device is now ready to work in production.




  • Azure Custom Vision AI Portal

  • Azure DevOps.

  • Azure Machine learning.

  • Azure Blob Storage

  • Azure IoT Edge

  • Azure IoT Hub

  • Azure Container Registry

  • Docker

  • Matlab

  • Python

  • Jupyter notebooks

  • Digital Signal Processing

  • Data visualisation



  • Procurement of physical device/microphone to detect elephant rumbles.

  • Unlabelled data: lack of ground truth available to classify elephant rumble type.

  • Spectrogram resolutions

    • Need to be good enough for training the object detection model in the Custom Vision AI portal.

    • We applied a variety of data augmentation methods using the Augmentor software package [7]. Some transformations yielded better results than others, but this needs to be explored in more detail.

  • Noise separation: separating elephant rumbles from background noise.

  • After research, we understand that analysing infrasound captures only a limited part of the overall communication amongst elephants. The sounds are important, but like humans, elephants communicate in more ways than just their voices. Elephants also use body language (visual communication), chemical communication, and tactile communication. We understand that this means the audio-only component will not be able to capture the full picture. It will, however, add to the Elephant Sound Database.

  • The MATLAB packages locked us into a specific vendor and made it impossible for us to use the libraries in an IoT Edge container. This was due to licensing restrictions. We used MATLAB as the spectrogram resolutions were promising.



  • Polishing baseline code to a more production ready code base.

  • More literature review and access to labelled data for building machine learning models.

  • Use of semi-supervised approaches to develop a clean and annotated dataset for rumble classification. Validating this dataset with domain experts to serve as a gold standard.

  • Use of denoising auto-encoders for background noise cancellation in audio files.

  • Use deep learning (CNNs and graph CNNs) for extracting spectrogram data when building models instead of feature engineering. Comparison of both approaches.

  • Use of advanced unsupervised methods to get more insights on the data extracted from spectrograms.

  • Investigate sustainability and scalability of the proposed architecture in more detail.

  • Investigate paid and open-source datasets available for ML.

  • Improve the existing Custom Vision AI model by:

    • Incorporating more training data and researching data augmentation strategies for spectrogram data.

    • Using better data augmentation methods for spectrograms, such as the SpecAugment package [8]. Having a unified platform in Python for improving spectrogram resolutions would be preferred.

    •  Algorithm development around bounding box detection of fundamental frequency and harmonics in the rumbles.













Project 15: Empowering all to preserve endangered species & habitats.

Date: April 22, 2021

Time: 08:00 AM – 09:00 AM (Pacific)

Location: Global

Format: Livestream

Topic: DevOps and Developer Tools

What is this session about?
If, just like us, you want to save animals and preserve our ecosystems, come celebrate Earth Day 2021 with us by learning how we are helping and how you could too!
Project 15 from Microsoft is an effort that includes an open source software platform that helps non-governmental organizations (NGOs) reduce their cost, complexity, and time to deployment by connecting scientific teams with the needed technology to empower them to solve those environmental challenges.
We will also be highlighting community efforts, Internet of Things (IoT) and Machine Learning for sustainability, Open-Source GitHub repository, and existing projects by students!
