Speech Recognition for Singlish


This article is contributed. See the original author and article here.

Mithun Prasad, PhD, Senior Data Scientist at Microsoft and Manprit Singh, CSA at Microsoft


 


Speech is an essential form of communication that generates a lot of data. As more systems provide a modal interface with speech, it becomes critical to be able to analyze human-to-computer interactions. Market trends suggest that voice is the future of UI, a claim further bolstered by the recent pandemic, which has people looking to embrace contactless surfaces.


 


Interactions between agents and customers in a contact center remain dark data that is often untapped. We believe the ability to transcribe speech in local dialects and slang should be central to a call center advanced analytics road map such as the one proposed in this McKinsey recommendation. To enable this, we want to bring together the best of the current speech transcription landscape and present it in a coherent platform that businesses can leverage to get a head start on local speech-to-text adaptation use cases.


 


There is tremendous interest in Singapore to understand Singlish.


 


Singlish is a local form of English in Singapore that blends words borrowed from the cultural mix of communities.


miprasad_0-1658626696072.png


An example of what Singlish looks like


 


A speech recognition system that could interpret and process the unique vocabulary used by Singaporeans (including Singlish and dialects) in daily conversations would be very valuable. Such an automatic speech transcription system could be deployed at various government agencies and companies to assist frontline officers in acquiring relevant and actionable information while they focus on interacting with customers or service users to address their queries and concerns.


 


Efforts are under way to transcribe emergency calls made to the Singapore Civil Defence Force (SCDF), while AI Singapore has launched Speech Lab to channel efforts in this direction. With the release of the IMDA National Speech Corpus, local AI developers can now customize AI solutions with locally accented speech data.


 


IMDA National Speech Corpus


The Infocomm Media Development Authority of Singapore has released a large dataset, which is:


 


• A three-part speech corpus, each part with 1,000 hours of recordings of phonetically balanced scripts from ~1,000 local English speakers.


• Audio recordings with words describing people, daily life, food, location, brands, commonly found in Singapore. These are recorded in quiet rooms using a combination of microphones and mobile phones to add acoustic variety.


• Text files which have transcripts. Of note are certain terms in Singlish such as ‘ar’, ‘lor’, etc.


 


This is a boon for the open AI community in accelerating efforts towards speech adaptation. With such efforts, the local AI community and businesses are poised for major breakthroughs in Singlish in the coming years.


 


We used the IMDA National Speech Corpus as a starting point to see how adding customized audio snippets from locally accented speakers drives up transcription accuracy. The chart below gives an overview of the uplift. Without any customization, accuracy on the holdout set was 73%. As more human-annotated speech snippets were added, accuracy rose, which confirms that the right datasets can drive it up.


 


miprasad_2-1658626909593.png


 


On the left is the uplift in accuracy. The right correspondingly shows the Word Error Rate dropping as more audio snippets are added.


 


Keeping humans in the loop


 


The speech recognition models learn from humans, based on “human-in-the-loop learning”. Human-in-the-Loop Machine Learning is when humans and Machine Learning processes interact to solve one or more of the following:



  • Making Machine Learning more accurate

  • Getting Machine Learning to the desired accuracy faster

  • Making humans more accurate

  • Making humans more efficient


 


An illustration of what a human in the loop looks like is as follows. 


miprasad_3-1658626952260.png


 


In a nutshell, human-in-the-loop learning gives AI the right calibration at appropriate junctures. An AI model starts learning a task, and its progress can eventually plateau. Timely intervention by a human in the loop can give the model the right nudge.

“Transfer learning will be the next driver of ML success.” - Andrew Ng, in his Neural Information Processing Systems (NIPS) 2016 tutorial


 


Not everybody has access to large volumes of call center logs and conversation recordings collected from a majority of local speakers, which are key sources of data for training localized speech transcription AI. In the absence of significant amounts of locally accented data with ground truth annotations, and given our belief that transfer learning is a powerful driver for accelerating AI development, we leverage existing models and maximize their ability to understand local accents.


 


miprasad_4-1658627000430.png


 


 


The framework allows extensive room for human-in-the-loop learning and can connect with AI models from both cloud providers and open source projects. The components of the framework include:



  1. The speech to text model can be any kind of Automatic Speech Recognition (ASR) engine or Custom Speech API, which can run on cloud or on premise. The platform is designed to be agnostic to the ASR technology being used. 

  2. Search for ground truth snippets. In a lot of cases when the result is available, a quick search of the training records can point to the number of records trained, etc. 

  3. Breakdown on Word Error Rates (WER): The industry standard for measuring Automatic Speech Recognition (ASR) systems is the Word Error Rate, defined as follows:


WER = (S + D + I) / N


where S is the number of words substituted, D is the number of words deleted, I is the number of words inserted by the ASR engine, and N is the total number of words in the human-labelled transcript.


 


A simple example is shown below, where there is 1 deletion, 1 insertion, and 1 substitution in a total of 5 words in the human-labelled transcript.


 


miprasad_6-1658627713891.png


 


Word Error Rate comparison between ground truth and transcript (Source: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech-evaluate-data)


 


So, the WER of this result will be 3/5, which is 0.6. Most ASR engines will return the overall WER numbers, and some might return the split between the insertions, deletions and substitutions. 
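
As a minimal sketch of how such a split can be computed (not the platform's actual implementation), the Python below runs a standard word-level edit distance and keeps track of substitutions, deletions, and insertions along the way. The names `WerBreakdown` and `wer_breakdown` and the sample sentences are hypothetical, chosen only for illustration; the sample is not the example from the figure above.

```python
from typing import NamedTuple


class WerBreakdown(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, with N = words in the reference transcript.
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words


def wer_breakdown(reference: str, hypothesis: str) -> WerBreakdown:
    """Compare a human-labelled transcript with ASR output, word by word."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = (total errors, S, D, I) for ref[:i] versus hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match, no new error
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitute = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            delete = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insert = (t + 1, s, d, n + 1)
            # Tuples compare by total errors first, so the minimum is optimal.
            dp[i][j] = min(substitute, delete, insert)

    _, s, d, n = dp[-1][-1]
    return WerBreakdown(s, d, n, len(ref))


# Made-up example: "don't" is deleted, "kaypoh" becomes "people", "ok" is inserted.
result = wer_breakdown("please don't be so kaypoh", "please be so people ok")
print(result, result.wer)  # S=1, D=1, I=1 over 5 reference words
```

When several alignments have the same total number of errors, the split between S, D, and I can differ between tools; this sketch simply picks one deterministic minimum.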


 


Our platform, however, provides a detailed split between the insertions, substitutions and deletions.



  4. The platform has ready interfaces that allow human annotators to plug in audio files with relevant labeled transcriptions, to augment the data.

  5. It ships with dashboards that show detailed substitutions, such as how often the term ‘kaypoh’ was transcribed as ‘people’.


The crux of the platform is the ability to control transcription accuracy by getting a detailed overview of how often the engine has trouble transcribing certain vocabulary, and by allowing humans to give the model the right nudges.
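
A substitution tally of the kind such a dashboard might display could be produced roughly as follows. This is a sketch under stated assumptions, not the platform's code: it aligns words with Python's `difflib.SequenceMatcher`, which approximates rather than guarantees a minimum-edit alignment, and the function name `substitution_counts` and the sample transcripts are made up for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import Iterable, Tuple


def substitution_counts(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """Count (reference word, transcribed word) substitutions across transcripts."""
    counts: Counter = Counter()
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only count replaced spans of equal length, so that words
            # can be paired one-to-one without guessing the alignment.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                counts.update(zip(ref[i1:i2], hyp[j1:j2]))
    return counts


# Hypothetical (ground truth, ASR output) pairs.
samples = [
    ("don't be so kaypoh", "don't be so people"),
    ("he is very kaypoh", "he is very people"),
    ("ok lor", "ok law"),
]
for (truth, heard), n in substitution_counts(samples).most_common():
    print(f"{truth!r} transcribed as {heard!r}: {n} time(s)")
```

Sorting the tally by frequency points annotators at the vocabulary where additional labelled snippets are likely to help most.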


 


References and useful links



  1. https://yourstory.com/2019/03/why-voice-is-the-future-of-user-interfaces-1z2ue7nq80?utm_pageloadtype=scroll

  2. https://www.mckinsey.com/business-functions/operations/our-insights/how-advanced-analytics-can-help-contact-centers-put-the-customer-first

  3. https://www.straitstimes.com/singapore/automated-system-transcribing-995-calls-may-also-recognise-singlish-shanmugam

  4. https://www.aisingapore.org/2018/07/ai-singapore-harnesses-advanced-speech-technology-to-help-organisations-improve-frontline-operations/

  5. https://livebook.manning.com/book/human-in-the-loop-machine-learning/chapter-1/v-6/17

  6. https://www.youtube.com/watch?v=F1ka6a13S9I

  7. https://ruder.io/transfer-learning/

  8. https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus


 


*** This work was performed in collaboration with Avanade Data & AI and Microsoft.


 

MTC Weekly Roundup – July 22



Hey there, MTC’ers! It’s been a busy week, so let’s jump right on in and look at what’s been happening in the Community this past week.


 


MTC Moments of the Week


 


This week, Community Events made a triumphant return with a double hitter!


 


Earlier this month, @Alex Simons published a blog post announcing the general availability of Microsoft Entra Permissions Management, and this past Tuesday, July 19, we had our first Entra AMA featuring @Nick Wryter, @Laura Viarengo, and @Mrudula Gaidhani.


 


Then, on Thursday, we had Tech Community Live: Endpoint Manager edition, which featured four AMA live streams all about the latest Endpoint Manager capabilities, including Windows Autopilot, Endpoint Analytics, and more! Thank you to everyone who attended :)


 


On the blogs this week, @Rafal Sosnowski published a post announcing the sunset of Windows Information Protection (WIP) and sharing resources on its successor, Microsoft Purview Data Loss Prevention (DLP), which you can try for free by enabling the free trial from the Microsoft Purview compliance portal.


 


Cecilia_Bergstedt_0-1658531799277.jpeg


 


I also want to shout out @Sergei Baklan for helping @Jammin2082 with their Morse code translator in Excel. What a cool way to use Excel!


 


 


Unanswered Questions – Can you help them out?


 


Every week, users come to the MTC seeking guidance or technical support for their Microsoft solutions, and we want to help highlight a few of these each week in the hopes of getting these questions answered by our amazing community!


 


This week, @Florian Hein shared a scenario they’ve run into involving links to SharePoint pages not opening from within Teams. Have you experienced this before?


 


Cecilia_Bergstedt_1-1658531799281.png


 


 


Meanwhile, new contributor @eliekarkafy is looking for guidance in building documentation for an Azure Governance Framework. If you have recommendations or a template to share, hop in and help a fellow MTC’er!


 


Next Week – Mark your calendars!


Lesson Learned #230: Microsoft Reactor - Azure SQL Developer and DBA Best Practices (Spanish Version)


We had the great opportunity to deliver a session on Azure SQL Developer and DBA Best Practices as part of the Microsoft Reactor program, together with our colleague Pablo Javier Fernandez, Cloud Solution Architect – Data & AI LATAM SQL Advanced Cloud Expert.


 


In this video, you will see how a DBA and a developer work together to find:


 



  • How to identify which application is consuming the resources.

  • How to review the database metrics. 

  • How to implement a maintenance plan using runbooks.


 


Microsoft Reactor connects you with the developers and startups that share your goals. You can learn new skills, meet new peers, and find career mentorship. Virtual events are running around the clock, so join us anytime, anywhere!


 


You can find additional information below:


 


Lesson Learned #221:Hands-On-Labs: Activity Monitor in my Azure SQL Managed Instance – Microsoft Tech Community


Lesson Learned #220:Hands-On-Labs: Activity Monitor in my Elastic Database Pool – Microsoft Tech Community


Lesson Learned #219:Hands-On-Labs: What do I need to do in case of high CPU wait time – Microsoft Tech Community


Lesson Learned #218:Hands-On-Labs: What do I need to do in case of high LOG_RATE_GOVERNOR wait time – Microsoft Tech Community


Lesson Learned #207: Hands-On-Labs: 40613-Database ‘xyz’ on server ‘xyz2′ is not currently available – Microsoft Tech Community


Global Azure 2022 – No encuentro donde esta el problema de la query (Spanish Version Delivered) – Microsoft Tech Community


Lesson Learned #196: Latency and execution time in Azure SQL Database and Azure SQL Managed Instance – Microsoft Tech Community


Blog – Automating DB maintenance for all SQL Databases in a single server using Azure Data Factory pipeline (microsoft.com)


(and many others…)


 


Watch this video (Spanish version)


 


 


 


Enjoy!

Explore data governance with Microsoft on the Uncovering Hidden Risks podcast


The risk landscape for organizations has changed significantly in the past few years. Traditional ways of identifying and mitigating risks simply do not work. They focus primarily on external threats when risks from within the organization are just as prevalent and harmful. Additionally, regulations change frequently, and it is difficult for security and compliance leaders to keep up on these changes.


 


The Uncovering Hidden Risks podcast will explore the need for enterprises to quickly move to a more holistic approach to data protection and reduce their overall risk. The show will cover an array of topics, across data governance, risk management, and compliance. It will address industry trends and customer pain points.


 


In each episode Erica Toelle, Sr. Product Marketing Manager for Microsoft Purview, partners with a Microsoft guest host to interview a guest leader in the data governance and compliance industry. These experts have a unique and deep understanding of the challenges organizations face, and the people, processes, and technology used to address them.


 


We are excited to have you listen in on our conversations as we discuss a range of interesting topics, from trends and best practices to real-life strategies for developing a holistic data governance and risk management program.


 


The Uncovering Hidden Risks podcast will launch on Wednesday, July 27th! Subscribe now to get the first two episodes!


 


You can catch our podcast trailer and subscribe on https://www.uncoveringhiddenrisks.com


 





Here is a preview of our first two episodes, launching on Wednesday, July 27th:


 


Episode 1: Transitioning to a holistic approach to data protection


Guest Bret Arsenault, CVP, CISO at Microsoft joins us on this week’s episode of Uncovering Hidden Risks to discuss how a holistic approach to data protection can deliver better results across your organization and the three steps that can get you there. Erica Toelle and Talhah Mir host this week’s episode to chat with Bret about current trends in the data protection space, what data protection issues are top of mind, and how teams should start on their data protection strategy.


 


Episode 2: Three Ways to Prepare for the Future of Data Governance and Collaboration


Guest Jeff Teper, Corporate Vice President of Microsoft 365 Collaboration, including Teams, SharePoint, and OneDrive, joins Erica Toelle and Chris McNulty on this week’s episode of Uncovering Hidden Risks. Jeff leads product, design, and engineering teams for Microsoft 365, including Teams, SharePoint, OneDrive, Viva, and more, which empower people and organizations worldwide to collaborate at work, home, and school. Erica and Chris speak with Jeff about empowering users to do more through collaboration technology, a zero-trust model for collaboration, and how we can make powerful things simple.


 


We look forward to exploring with you!

Accelerating sales cycle for faster deal closure with mixed reality



One of the key aspects that companies want to improve is their sales cycle. Organizations look for innovations that can provide real-time solutions to address their business problems. Keeping up with changing customer demands and rising competition is crucial in this technological era, especially for industries like manufacturing, oil and gas, and retail.


 


At Softweb Solutions, we understand the criticality of this situation, now more than ever in this pandemic era. We have worked with interactive, immersive technologies that have changed the paradigm of the manufacturing sector. Among these, mixed reality (MR) is paving the way to next-level business development experiences.


 


When we talk about MR, we ought to mention Microsoft’s HoloLens, which has revolutionized the way industries design and practice sales processes. Having been a Microsoft partner for over a decade, Softweb Solutions always looks forward to leveraging Microsoft’s tools and services to foster the business growth of our clients. With MR solutions that assist in faster quote generation, better product cataloging, remote training and real-time assistance, Softweb Solutions has a proven track record of offering services that promote our clients’ business growth.


 


We have been working with MR since Microsoft introduced it in 2017, catering to clients from a vast range of industries with immersive solutions. Let’s have a walk-through of one such instance where we provided MR solutions using HoloLens 2.


 


Transforming sales process with Softweb Solutions’ HoloLens 2 app


Tinsley Equipment Company LLC is an organization based in Texas, USA, that offers bulk material handling equipment for a range of industries across the Americas. Given the size of the equipment and the need to show the products at the jobsite for retrofit and green field applications, the MR platform was a perfect fit. To stand apart from their competition, the Tinsley Equipment team wanted to provide their customers with a real-time quote for the equipment under consideration. They wanted to move the pricing discussion earlier in the sales cycle so that the team could immediately work with customers on options and alternatives, a process that otherwise takes several weeks or months.


 


“This MR solution has aided our customers to better understand crucial equipment details that drive price differences to either move forward with the project or table it until another time. This is a great service to save time and has helped Tinsley to develop a reputation of transparency and honesty that, we are told, many customers haven’t seen in some time,” said Warren Ferguson of Tinsley Equipment Company.


 


Warren Ferguson brought the MR concept to Softweb Solutions, which followed its thorough analysis process to get in-depth insights into Tinsley’s business processes, how they operate and the problems they were looking to solve. The Softweb team worked for several months to develop and scale the application. With features that help Tinsley gain maximum benefit from the solution, it has become a mainstream tool for the company.


 


At Softweb Solutions, our MR capabilities offer limitless possibilities to companies like Tinsley who are focused on providing value to their clients. Moving on from traditional simulation techniques, our AR product visualization solution enhances the sales experience by offering an immersive 360-degree perspective of the subject or the equipment. We provided accurate 3D visualizations for interactive product catalog displays that allowed the technical and sales team to collaborate efficiently.


 


With Softweb Solutions’ AR CPQ solution, Tinsley can shorten time to competency. The technician wears the HoloLens headset, which allows them to view the equipment details and get accurate measurements.


 


Tinsley1.jpg


Some of the key features of the MR solution include:



  • Product management

    • Manage product details remotely and distribute them as asset bundle files for remote deployment



  • Quotation management

    • Select products from the lists

    • Add additional information

    • Preview and generate quotes

    • View quotes

    • CRM integration for single-point management



  • Dynamic product configuration

    • Guide customers on product standards and customization opportunities

    • Offer an innovative customer experience by letting customers choose their product specifications



  • 3D holograms

    • Showcase multiple products from the asset library

    • Give an immersive view of how the product(s) will look and fit at the customer’s location



  • Real-time quotations

    • Generate proposals through an integrated CRM/product management system

    • Reduce time-to-quote

    • Increase quote accuracy



  • Showcase augmented products

    • Create a rich, immersive and interactive user experience

    • Allow your customers to connect with your products at a 1:1 or 1:8 scale




The HoloLens 2 app allows the technicians to get spatial information with precision. They can present a variety of their products in the form of 3D holograms according to their customers’ configuration using our Augmented CPQ solution. With our AR CPQ app, the sales team can get the details of the equipment in real-time. They can quickly provide a viable quote and close the deal. This results in 2X faster deal closure and up to 25% increase in productivity.


 


The transformational capabilities of our MR solution for improved sales performance


By utilizing elements from both augmented reality (AR) and virtual reality (VR), MR offers a unique immersive experience that allows the sales team to present their products in an interactive manner. The engineering team can get a detailed understanding and correct measurements of the equipment with 3D holograms. Moreover, by using our virtual product configurator solution, you can save time, process information faster and get data with greater accuracy. However, the advantages of MR are not limited to sales and quotations. Depending on your use case, the technology can be used to address your business requirements. To learn more about how Softweb Solutions can help you to transform your business, you can visit our website.