Many of you may be wondering whether building a near real-time analytics solution is really possible or just another buzzword in the world of analytics. In traditional data analytics solutions, you have to go through time-consuming curation processes to get data into a shape that end users can consume.
Now, with advanced systems and technologies, it is practical to ingest and query raw operational data in near real time without impacting the OLTP (online transaction processing) system. Integrating Azure Cosmos DB with Azure Data Explorer makes this possible with the following solution architecture:
Benefits of this architecture
- Operational data is readily available for analysis, as opposed to waiting days to get the data.
- Querying data without impacting the OLTP system’s performance.
- Fast interactive queries over fresh, large data sets.
- Drill-downs from analytic aggregates always point to fresh data.
There are numerous potential benefits of this architecture from a business growth perspective. To give you an idea of its value proposition, which applies to most organizations across diverse industries, here are a few examples of near real-time scenarios:
- In an e-commerce system: contextual recommendations and promotions based on a customer’s current purchase enriched with purchase history, and consumer behavior trend analysis.
- In the manufacturing industry: analysis of data from IoT devices to respond to operational events, predictive maintenance, and improvements in production safety.
- In the energy and utilities industry: analysis of readings from smart meters, decisions to instantly buy and sell capacity, and power generation analysis.
Similarly, in health, finance, and many other industries, there are plenty of scenarios where you could make better business decisions if you were able to analyse data in near real time.
The next obvious question concerns data redundancy and the cost impact of this solution. You can optimize the cost by managing data retention policies across all the services. For example, Cosmos DB is the operational hot store where data is kept for only a few days; Azure Data Explorer is the analytical warm store where frequently accessed data is kept; and the remaining older data is exported to cold storage, which is Azure Storage in this solution. Data retention is easy to configure in all of these services, so you can change it as your requirements change.
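The hot, warm, and cold tiering described above can be illustrated with a minimal Python sketch. The retention windows (3 days and 90 days), the function name `stores_holding`, and the store labels are assumptions made for illustration only; the article gives no concrete values, and in practice retention is configured in each service rather than in application code.

```python
from datetime import datetime, timedelta, timezone
from typing import List

# Hypothetical retention windows: the article only says "a few days" for the
# hot store and gives no figure for the warm store.
HOT_RETENTION = timedelta(days=3)
WARM_RETENTION = timedelta(days=90)


def stores_holding(record_time: datetime, now: datetime) -> List[str]:
    """Return the stores expected to hold a record written at record_time."""
    age = now - record_time
    if age <= HOT_RETENTION:
        # Fresh data sits in the operational store and, once ingested,
        # in the analytical store as well.
        return ["cosmos_db", "data_explorer"]
    if age <= WARM_RETENTION:
        # Expired from the hot store, still queryable in the warm store.
        return ["data_explorer"]
    # Anything older is assumed to have been exported to cold storage.
    return ["azure_storage"]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    for days in (1, 30, 365):
        print(days, stores_holding(now - timedelta(days=days), now))
```

Shortening either window in this model reduces how long data is duplicated across stores, which is the cost lever the paragraph describes.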
Demonstration of the solution with a hands-on lab
To help you understand the end-to-end flow of a near real-time analytics solution, a hands-on lab with step-by-step guidance and working code samples has been put together so you can try it on your own with simulated data. Here is a brief overview of what the lab covers:
- Simulate data using a data generator component that pumps data into Cosmos DB.
- Leverage the Cosmos DB change feed to trigger an Azure Function that pushes every change in Cosmos DB to Azure Event Hub.
- Use the streaming capability of Azure Data Explorer to ingest the streamed data via Azure Event Hub.
- Run interactive queries using KQL (Kusto Query Language), with a glimpse of advanced scenarios such as forecasting, anomaly detection, and time series analysis.
- In the last module of the lab, you will have a lot of fun building a near real-time dashboard using ADX dashboards.
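To make the change feed step in the list above more concrete, here is a minimal Python sketch of the kind of transformation a function might apply to changed documents before forwarding them as events. It is not the lab's code. The helper names `to_event_body` and `to_event_batch` and the sample fields `deviceId` and `reading` are hypothetical, and the actual sending through an Event Hub client is left out so that the example runs without any Azure dependencies.

```python
import json
from typing import Any, Dict, Iterable, List

# System properties that Cosmos DB adds to every stored document.
# `_ts` (last-modified time in epoch seconds) is deliberately kept, on the
# assumption that it is useful as an event timestamp downstream.
COSMOS_SYSTEM_PROPERTIES = {"_rid", "_self", "_etag", "_attachments"}


def to_event_body(document: Dict[str, Any]) -> str:
    """Serialize one changed document as a JSON event body."""
    payload = {
        key: value
        for key, value in document.items()
        if key not in COSMOS_SYSTEM_PROPERTIES
    }
    return json.dumps(payload, sort_keys=True)


def to_event_batch(changed_documents: Iterable[Dict[str, Any]]) -> List[str]:
    """Convert a batch of change feed documents into JSON event bodies."""
    return [to_event_body(doc) for doc in changed_documents]


if __name__ == "__main__":
    # Hypothetical document shape, as the change feed might deliver it.
    sample = {
        "id": "1",
        "deviceId": "d-7",
        "reading": 21.5,
        "_rid": "x",
        "_self": "y",
        "_etag": "z",
        "_attachments": "attachments/",
        "_ts": 1700000000,
    }
    for body in to_event_batch([sample]):
        print(body)
```

Keeping the transformation as a pure function separate from the sending code makes it easy to test with simulated data, which fits the lab's data generator approach.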
The lab is publicly available on GitHub.
Try it out and share your feedback!
A near real-time analytics solution can be built in multiple ways using different Azure services; this lab describes one possible scenario. Similar outcomes can be achieved using other Azure services that are not covered in this lab.