Friday Five: Azure File Sync, GitHub Tips, More!


This article is contributed. See the original author and article here.

krols.jpg


Upgrading Radial Gauge from UWP to WinUI 3


Diederik Krols lives in Antwerp, Belgium. He is a principal consultant at U2U Consult, where he leads, designs, and develops C# and XAML apps for the enterprise and the store. He has been a Windows Development MVP since 2014. Diederik runs the XamlBrewer blog on WordPress and the XamlBrewer repositories on GitHub. Follow him on Twitter @diederikkrols.


robbos.jpg


Maturity levels of using GitHub Actions Securely


Rob Bos is a Developer Technologies MVP and DevOps consultant from The Netherlands. Rob typically works on anything DevOps-related to improve flow. As a Global DevOps Bootcamp team member, he loves to automate large setups for the yearly event and uses any tool to get things done. For more on Rob, check out his Twitter @robbos81


hal.jpg


Surfaces of the Future Past


Hal Hostetler is an Office Apps and Services MVP who has been in the MVP program since 1996. Now retired, Hal is a Certified Professional Broadcast Engineer and remains the regional engineer for Daystar Broadcasting and a senior consultant for Roland, Schorr, & Tower. He lives in Tucson, Arizona. For more on Hal, check out his Twitter @TVWizard


silviodibenetto.jpg


AZURE FILE SYNC V14.1


Silvio Di Benedetto is founder and CEO at Inside Technologies. He is a Digital Transformation helper, and Microsoft MVP for Cloud Datacenter Management. Silvio is a speaker and author, and collaborates side-by-side with some of the most important IT companies including Microsoft, Veeam, Parallels, and 5nine to provide technical sessions. Follow him on Twitter @s_net.


tommy morgan.jpg


Weekly Update December 2021 – New UI libraries for ACS, To Do Tasks API, New SEFAUtil PowerShell, Bye Bye Command Line


Tom Morgan is a Microsoft Teams Platform developer and Microsoft MVP with more than 10 years of experience in the software development industry. For the last 8 years, Tom has worked at Modality Systems, with responsibility for delivery of the Modality Systems product portfolio. Tom is passionate about creating great software that people will find useful. He enjoys blogging and speaking about Microsoft Teams development, Office365, Bot Framework, Cognitive Services and AI, and the future of the communications industry. He blogs at thoughtstuff.co.uk and tweets at @tomorgan.

Azure Databricks: 2021 Year in Review


This article is contributed. See the original author and article here.

As we approach the new year, we often find ourselves reflecting on what we accomplished in the year that is coming to a close.  Global pandemic aside, 2021 was a busy year for Azure Databricks.  The platform expanded into new core use-cases while also providing additional capabilities and features to support mature and proven use-cases and patterns. Below you’ll find some of the highlights from 2021.



Lakehouse Platform


In 2021, the Lakehouse architecture really picked up steam. Early in the year, Lakehouse was mentioned in Gartner’s Hype Cycle for Data Management, validating it as an architecture pattern being evaluated and leveraged across many different organizations and leading many data and analytics companies and services to adopt the pattern and provide guidance around it.


 


MikeCornellDatabricks_0-1640811130885.png


 


The Azure Databricks Lakehouse Platform provides end-to-end, best-in-class support for data engineering, stream processing, data science and machine learning, and SQL and business intelligence, all on top of transactional, open storage in Azure Data Lake Store.


 


The foundation for the Azure Databricks Lakehouse is Delta Lake.  Delta Lake is an open storage format that brings reliability to curated data in the data lake with ACID transactions, data versioning, time travel, and more.  2021 saw a slew of new features introduced to Delta Lake and to the Azure Databricks integration with Delta Lake.
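
As a quick illustration of Delta Lake's versioning and time travel, here is a minimal PySpark sketch (the table name is hypothetical, and the snippet assumes an environment with Delta Lake available, such as an Azure Databricks notebook):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Write curated data as a Delta table; every write is an ACID transaction
# and creates a new table version.
df = spark.range(0, 100).withColumnRenamed("id", "value")
df.write.format("delta").mode("overwrite").saveAsTable("events_demo")

# Time travel: read the table as of an earlier version.
v0 = spark.read.format("delta").option("versionAsOf", 0).table("events_demo")

# Inspect the transaction history (version, timestamp, operation, ...).
spark.sql("DESCRIBE HISTORY events_demo").show(truncate=False)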




Photon


Also in 2021, Photon was made available in Azure Databricks. Photon is the native vectorized query engine on Azure Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. 


 


MikeCornellDatabricks_1-1640811130933.png


 


It’s also the same engine that was used to set an official data warehousing performance record. Photon is the default engine for Databricks SQL and is also available as a runtime option on clusters in Azure Databricks.  Try it out to see how it improves performance on your existing workloads.



New User Experiences


2021 also brought new user experiences to Azure Databricks.  Previously, data engineers, data scientists, and data analysts all shared the same notebook-based experience in the Azure Databricks workspace.  In 2021, two new experiences were added.  


 


MikeCornellDatabricks_2-1640811130917.png


 


Databricks Machine Learning was added to cater more to the needs of data scientists and ML engineers with easy access to notebooks, experiments, the feature store, and the MLflow model registry.  Databricks SQL was added to cater more to the needs of business analysts, SQL analysts, and database admins with a familiar SQL-editor interface, query catalog, dashboards, access to query history, and other admin tools. An important characteristic of the three distinct user experiences is that all of them share a common metastore with database, table, and view definitions, consistent data security, and consistent programming languages and APIs to Delta Engine.



Databricks SQL


One of the biggest additions to Azure Databricks in 2021 was Databricks SQL.  Databricks SQL was announced earlier in the year during the Data and AI Summit and went GA in Azure Databricks in December.  


 


MikeCornellDatabricks_3-1640811130943.png


 


Databricks SQL provides SQL users and analysts with a familiar user experience and best-in-class performance for querying data stored in Azure Data Lake Store.  Users can query tables and views in the SQL editor, build basic visualizations, bring those visualizations together in dashboards, schedule their queries and dashboards to refresh, and even create alerts based on query results.
  


MikeCornellDatabricks_4-1640811130900.png


 


Databricks SQL also provides SQL and database admins with the tools and controls necessary to manage the environment and keep it secure.  Administrators can monitor SQL endpoint usage, review query history, look at query plans, and control data access down to row and column level with familiar GRANT/DENY ACLs.
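
For illustration only, here is a minimal sketch of what such access controls look like, expressed as SQL and issued through spark.sql for brevity (the schema, table, and group names are hypothetical, and the statements assume table access control is enabled on the workspace):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Grant a group read access to one table and explicitly deny another.
spark.sql("GRANT USAGE ON DATABASE sales TO `analysts`")
spark.sql("GRANT SELECT ON TABLE sales.orders TO `analysts`")
spark.sql("DENY SELECT ON TABLE sales.customers_pii TO `analysts`")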


 


MikeCornellDatabricks_5-1640811130903.png


 


Along with Databricks SQL came stronger and easier integration with Power BI.  Over the course of 2021, the Power BI connector for Azure Databricks went GA and got several major performance improvements including Cloud Fetch for faster retrieval of larger datasets into Power BI.  With the inclusion of Power BI in the Azure Databricks Partner Connect portal, users can now connect their Databricks SQL endpoint to Power BI with just a few clicks.



Data Engineering


2021 also saw several new features and enhancements to help make data engineers more efficient.


MikeCornellDatabricks_6-1640811130915.png


 


One of those is Delta Live Tables.  Delta Live Tables provides a framework for building data processing pipelines.  Users define transformations and data quality rules, and Delta Live Tables manages task orchestration, dependencies, monitoring, and error handling.  Transformations and data quality rules can be defined using basic declarative SQL statements.  Delta Live Tables can also be combined with the Databricks Auto Loader to provide simple, consistent incremental processing of incoming data.  The Auto Loader also saw several enhancements including:



 


MikeCornellDatabricks_7-1640811130946.png
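
To make the Delta Live Tables definition style described above more concrete, below is a minimal sketch of a pipeline in Python (the dlt module is only available inside a Delta Live Tables pipeline; the landing path, table names, and quality rule are hypothetical):

import dlt
from pyspark.sql.functions import col

# `spark` is provided by the Delta Live Tables runtime.

# Bronze: incrementally ingest raw JSON files with Auto Loader (cloudFiles).
@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")
    )

# Silver: declare a data quality expectation; violating rows are dropped.
@dlt.table(comment="Cleansed orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return dlt.read_stream("orders_raw").where(col("order_id").isNotNull())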


 


Another new capability that was added in 2021 is multi-task job orchestration.  Prior to this capability, Databricks jobs could only reference one code artifact (i.e., a notebook) per job. This meant that an external job orchestration tool was needed to string together multiple notebooks and manage dependencies.  Multi-task job orchestration allows multiple notebooks and dependencies to be orchestrated and managed all from a single job. It also enables some additional future capabilities like the ability to reuse a job cluster across multiple tasks and even calculate a single DAG across tasks.
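
As a rough sketch of what a multi-task job definition looks like when submitted to the Jobs API, the Python dictionary below describes two notebook tasks sharing one job cluster, with the second depending on the first (the job name, notebook paths, and cluster settings are hypothetical, and the exact payload shape depends on the Jobs API version you target):

# Hypothetical payload for POST /api/2.1/jobs/create.
job_definition = {
    "name": "orders-daily-load",
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "9.1.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/etl/ingest_orders"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Repos/etl/transform_orders"},
        },
    ],
}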



Databricks Machine Learning


Finally, 2021 brought some notable new features to an already industry-leading data science and machine learning platform in Azure Databricks.


MikeCornellDatabricks_8-1640811130950.png


 


One of those features, announced during the Data and AI Summit, is Databricks AutoML.  Databricks AutoML takes an open, glass-box approach to applying machine learning to a selected dataset.  It covers prepping the dataset, training models, and recording the hyperparameters, metrics, and models using MLflow experiments, and it even generates a Python notebook with the source code for each trial run (including feature importance!).  This allows data scientists to easily build on top of the models and code generated by Databricks AutoML.  Databricks AutoML automatically distributes trial runs across a selected cluster so that trials run in parallel.
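
Kicking off an AutoML run from a notebook is only a few lines; a minimal sketch could look like the following (the feature table, target column, timeout, and the summary attribute names are assumptions based on the AutoML Python API):

from pyspark.sql import SparkSession
from databricks import automl

spark = SparkSession.builder.getOrCreate()
train_df = spark.table("demo.churn_features")  # hypothetical feature table

# Train and evaluate candidate models; every trial is logged to MLflow
# together with a generated notebook containing its source code.
summary = automl.classify(train_df, target_col="churn", timeout_minutes=30)

# The summary points to the best trial's model and generated notebook.
print(summary.best_trial.model_path)
print(summary.best_trial.notebook_url)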


 


MikeCornellDatabricks_9-1640811130956.png


 


Another feature introduced to Databricks Machine Learning in Azure Databricks was the Databricks Feature Store.  The Databricks Feature Store is built on top of Delta Lake and is stored in Azure storage, giving it all the benefits of Delta Lake like an open format, built-in versioning, time travel, and built-in lineage.  When used in an MLflow model, the Databricks Feature Store injects the feature information and feature lookup code right into the MLflow model artifact.  This takes the burden off of the data scientist and MLOps engineer.  The Databricks Feature Store even includes a sync to online storage like Azure Database for MySQL for low-latency feature lookups for real-time inference.
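
A minimal sketch of registering a feature table with the Feature Store client is shown below (the database, table, and column names are hypothetical, and the exact method signature varies slightly between client versions):

from pyspark.sql import SparkSession
from databricks.feature_store import FeatureStoreClient

spark = SparkSession.builder.getOrCreate()
features_df = spark.table("demo.customer_feature_calcs")  # hypothetical source of computed features

fs = FeatureStoreClient()

# Register a Delta-backed feature table keyed by customer_id.
fs.create_table(
    name="shop.customer_features",
    primary_keys=["customer_id"],
    df=features_df,
    description="Aggregated customer features",
)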


 


Both Databricks AutoML and the Databricks Feature Store are available in the Databricks Machine Learning experience in the Azure Databricks workspace.



More to Come in 2022


While 2021 was a busy year for Azure Databricks, there are already some highly anticipated features and capabilities expected in 2022.  Those include:



  • The Databricks Unity Catalog will make it easier to manage and discover databases, tables, security, lineage, and other artifacts across multiple Azure Databricks workspaces

  • Managed Delta Sharing will allow for secure sharing of Delta Lake datasets stored in Azure Data Lake Store to internal consumers and external consumers all with an open, vendor/tool agnostic standard (already available in Power BI!)

  • Databricks Serverless SQL will allow for nearly-instant SQL compute startup with minimal management likely resulting in lower costs for BI and SQL workloads

  • New low-code/no-code capabilities will enable and empower both data scientists and citizen data scientists to explore, visualize, and even prepare data with just a few clicks.



Get Started


Ready to dive in and get hands on with these new features? Register for a free trial, attend this 3-part training series, and join an Azure Databricks Quickstart Lab where you can get your questions answered by Databricks and Microsoft experts.

Automatically stop unused Azure Data Explorer Clusters


This article is contributed. See the original author and article here.

The Azure Data Explorer team is constantly focused on reducing COGS and making sure users pay only for the value they get.


As part of this initiative, we’re now adding a new automatic capability to stop unused clusters.


If you created a cluster and did not ingest any data into it, or if you ingested data but have not run any queries or ingested new data for days, we will automatically stop that cluster.


Stopping the cluster reduces cost significantly as it releases the compute resources which are the bulk of the overall cluster cost.


Once a cluster is stopped, you will need to actively start it when you need it again. It will not start automatically when a new query is sent or an ingest operation is performed.


When a cluster is stopped, it does not lose any of the already ingested data, since the storage resources are not released. This means that once you start the cluster, it will be ready for use in minutes, with all databases, tables, and functions fully operational.


It is possible to manage cluster auto-stop either using SDKs or using the Azure Portal. For more details on automatic stop of inactive Azure Data Explorer clusters, read this – https://docs.microsoft.com/en-us/azure/data-explorer/auto-stop-clusters
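
As an illustration, here is a sketch of toggling this behavior from Python with the azure-mgmt-kusto management SDK (the resource names are hypothetical, and the exact property name for auto-stop may differ between SDK versions; ARM exposes it as enableAutoStop):

from azure.identity import DefaultAzureCredential
from azure.mgmt.kusto import KustoManagementClient
from azure.mgmt.kusto.models import ClusterUpdate

client = KustoManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Opt a cluster out of (or back into) automatic stop.
poller = client.clusters.begin_update(
    resource_group_name="my-resource-group",
    cluster_name="myadxcluster",
    parameters=ClusterUpdate(enable_auto_stop=False),
)
poller.result()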


GabiLehner_0-1640699117848.png


 


This new feature is another piece in the cost reduction effort the Azure Data Explorer team is making.


 


You’re welcome to add more proposals and ideas and vote for them here – https://aka.ms/adx.ideas


ADX dashboards team

Azure Marketplace new offers – Volume 181


This article is contributed. See the original author and article here.

We continue to expand the Azure Marketplace ecosystem. For this volume, 125 new offers successfully met the onboarding criteria and went live. See details of the new offers below:

Get it now in our marketplace


AQUILA By IMEXHS.png

AQUILA By IMEXHS: Launched by IMEXHS to optimize patient treatment and improve diagnostic accuracy, the AQUILA cloud-based medical imaging solution streamlines workflow and ensures interoperability and connectivity of all your medical IoT devices and images.


Atvero.png

Atvero: This solution from nittygritty.net is a cloud-based project information management solution for design and construction companies. It features native integration with Microsoft 365, including SharePoint and Teams. Atvero provides an immutable project record compliant with industry standards.


Auror - Crime Intelligence Platform.png

Auror – Crime Intelligence Platform: Protect your retail stores from repetitive theft and security incidents with this SaaS offering from Auror. Using real-time alerts and analytics this platform tracks criminal activity and equips your team with data-driven reports and actionable insights.


Autopilot.png

Autopilot: Create consistent and standardized workflows in your inbox and get approvals and documents processed quickly with Autopilot’s add-in for Microsoft 365. This app uses validated electronic forms and preconfigured processes to efficiently manage routine tasks. 


CentOS 7.9 Server with LAMP.png

CentOS 7.9 Server with LAMP: LAMP is an acronym for its original four open-source components: the Linux operating system, the Apache HTTP Server, the MariaDB relational database management system, and the PHP programming language. This app is only available in Spanish.


CentOS 7.9 Server with Prometheus and Grafana.png

CentOS 7.9 Server with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with CentOS 7.9. Prometheus implements a highly dimensional data model. Grafana allows you to query, visualize, alert on, and understand your metrics. This app is only available in Spanish.


CentOS 8.5.png

CentOS 8.5: Customized and optimized for production environments on Microsoft Azure, this pre-installed disk image from Cognosys provides a secure, stable, manageable, and reproducible server on CentOS 8.5.


Cloud Management Platform CMP.png

Cloud Management Platform (TN): Utilizing AI-driven anomaly detection and forecasting capabilities, Caloudi’s Cloud Management Platform simplifies your organization’s complex cost and usage data. Increase efficiency and lower costs while minimizing cumbersome calculations. This solution is available in Taiwan.


Cloud Management Platform CMP2.png

Cloud Management Platform (US): Utilizing AI-driven anomaly detection and forecasting capabilities, Caloudi Cloud Management Platform simplifies your organization’s complex cost and usage data. Increase efficiency and lower costs while minimizing cumbersome calculations. 


Cloudwrxs - SAP Hana Express.png

Cloudwrxs – SAP HANA Express: CLOUDWRXS installs an SAP HANA Express 2.0 database based on sources provided in a storage account. Using this solution, one virtual machine is deployed, and the HANA Express 2.0 database is installed on it. The size of this VM can be configured as needed.


Debian 11 Server with Prometheus and Grafana.png

Debian 11 Server with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with Debian Bullseye. Prometheus implements a highly dimensional data model. Grafana is an open-source platform for monitoring and observability. This app is only available in Spanish.


Desk365.png

Desk365: Kani Technologies’ solution is a modern helpdesk for the Microsoft 365 workplace that helps you deliver outstanding customer service through channels like Microsoft Teams, email, web forms/widgets, and more. Automate repetitive work and save time with Desk365’s intuitive, feature-rich web app.


Drupal LDAP _ Active Directory Authentication.png

Drupal LDAP / Active Directory Authentication: The Drupal LDAP module allows users to log into Drupal using an LDAP or Active Directory server. Users can then authenticate against various LDAP implementations such as Microsoft Active Directory, OpenLDAP, OpenDS, FreeIPA, Synology, and other directory systems.


Hardenend Windows 2019 Server with IIS Application.png

Hardened Windows Server 2019 with IIS Application: This image of Windows Server 2019 from readymind has the IIS Application Server features installed and unwanted services disabled to ensure a better security level, and it also implements an automatic post-deployment configuration.


Joomla OAuth Client OpenID Connect SSO using Azure Active Directory.png

Joomla OAuth Client OpenID Connect SSO using Azure Active Directory: This Single Sign-On plugin enables secure login into Joomla using Azure Active Directory as OAuth and OpenID Connect provider. Other OAuth providers such as Azure B2C and Microsoft 365 along with other custom providers can also be used to configure this plugin.


miniOrange Reverse Proxy Server.png

miniOrange Reverse Proxy Server: Strengthen the security of your cloud applications with miniOrange Reverse Proxy services. By installing features like URL rewriting, SSL offloading, load balancing, and rate limiting you can secure your web servers with end-to-end encryption.


NextCloud-Ready with Debian Bullseye.png

NextCloud-Ready with Debian Bullseye: Nextcloud is open-source file sync and share software for individuals and large enterprises and service providers. Nextcloud provides a safe, secure, and compliant file synchronization and sharing solution on servers you control. This app is only available in Spanish.


NextCloud-Ready with openSUSE LEAP 15.2.png

NextCloud-Ready with openSUSE LEAP 15.2: Nextcloud is open-source file sync and share software for individuals and large enterprises and service providers. Nextcloud also puts server control in your hands. This app is only available in Spanish and includes installation instructions for openSUSE LEAP 15.2.


NextCloud-Ready with SUSE 15 SP2.png

NextCloud-Ready with SUSE 15 SP2: Nextcloud is open-source file sync and share software for individuals and large enterprises and service providers. Nextcloud also puts server control in your hands. This app is only available in Spanish and includes installation instructions for SUSE Linux Enterprise Server 15 SP2.


NextCloud-Ready with Ubuntu Server 20.04 LTS.png

NextCloud-Ready with Ubuntu Server 20.04 LTS: Nextcloud is open-source file sync and share software for individuals and large enterprises and service providers. Nextcloud also puts server control in your hands. This app is only available in Spanish and includes installation instructions for Ubuntu Server 20.04 LTS.


openSUSE LEAP 15.2 with Prometheus and Grafana.png

openSUSE LEAP 15.2 with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with openSUSE LEAP 15.2. Prometheus implements a highly dimensional data model. Grafana is a platform for monitoring and observability. This app is only available in Spanish.


Qorus Integration Engine 4.1.x on Oracle Linux 7.png

Qorus Integration Engine 4.1.x on Oracle Linux 7: This agile, low-cost, and low-code enterprise integration platform for back-office IT business process automation features high process quality through fast configuration-based development of interfaces with automated fault-tolerant execution. It offers transparent operations and comprehensive data management.


RCL Digital Identity.png

RCL Digital Identity: The RCL Digital Identity solution enables private and government agencies to verify a user’s real-life identity and then issue a digital identity with W3C verifiable credentials to access secured services. This solution uses several Microsoft Azure services.


Red Hat Enterprise Linux (RHEL) 7.8.png

Red Hat Enterprise Linux (RHEL) 7.8: This image offered by ProComputers.com provides a minimal version of Red Hat Enterprise Linux 7.8 with an auto-extending root filesystem and cloud-init included. It contains just enough packages to run within Azure, bring up an SSH Server, and allow users to log in.


Red Hat Enterprise Linux (RHEL) 7.9.png

Red Hat Enterprise Linux (RHEL) 7.9: This is a ready-to-use minimal Red Hat Enterprise Linux 7.9 image based on the official RHEL 7.9 Binary DVD ISO image. No separate RHEL subscription is required, as the image is integrated with the Microsoft Azure-hosted Red Hat Update Infrastructure (RHUI).


Red Hat Enterprise Linux (RHEL) 8.5.png

Red Hat Enterprise Linux (RHEL) 8.5: This image offered by ProComputers.com provides a minimal Red Hat Enterprise Linux 8.5 image based on the official RHEL 8.5 Binary DVD ISO image. It contains just enough packages to run within Azure, bring up an SSH Server, and allow users to log in.


Rookout Live Logger.png

Rookout Live Logger: Rookout Live Logger empowers engineers by giving them full visibility into their code through real-time access to on-demand log data. This gives developers the freedom to switch logs on and off without having to write more code and restart the application, saving time and cost.


SparkBeyond Discovery.png

SparkBeyond Discovery: SparkBeyond Discovery is a data science platform for supervised machine learning that autonomously analyzes complex data and generates innovative insights which can help data professionals deepen their understanding of the problem space and improve model performance.


SUSE Server15 SP2 with Prometheus and Grafana.png

SUSE Server15 SP2 with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with SUSE Linux Enterprise Server 15 SP2. Prometheus implements a highly dimensional data model. Grafana is a platform for monitoring and observability. This app is only available in Spanish.


SysKit Point Hosted.png

SysKit Point: SysKit Point is a governance solution for Microsoft 365 that provides a centralized inventory of all assets and simplifies access reporting and management. It empowers all stakeholders and users to take cloud collaboration and security to the next level.


Ubuntu 18.04LTS Server with Prometheus and Grafana.png

Ubuntu 18.04 LTS Server with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with Ubuntu 18.04 LTS Server. Prometheus implements a highly dimensional data model. Grafana is a platform for monitoring and observability. This app is only available in Spanish.


Ubuntu 20.04LTS Server with Prometheus and Grafana.png

Ubuntu 20.04 LTS Server with Prometheus and Grafana: This Grafana and Prometheus instance is installed on a readymind virtual machine image with Ubuntu 20.04 LTS Server. Prometheus implements a highly dimensional data model. Grafana is a platform for monitoring and observability. This app is only available in Spanish.



Go further with workshops, proofs of concept, and implementations


Analytics System- 4-Week Implementation.png

Analytics System: 4-Week Implementation: The experts from CES IT will help you navigate and implement the Microsoft Azure Analytics and Microsoft Power BI platforms so you can make data-driven decisions and drive better business outcomes. This offer is available only in Russian.


API Management Deployment- 2-Week Implementation.png

API Management Deployment: 2-Week Implementation: In this hands-on workshop, the consultants from Protopia will tailor the integration and deployment of Microsoft Azure APIs to your technical and business requirements. Aligned with the Cloud Adoption Framework (CAF), this implementation is most suitable for enterprise security.


Birlasoft Azure Landing Zones- 4-Week Workshop.png

Azure Landing Zones: 4-Week Workshop: In this workshop, Birlasoft will show you how to set up Azure Landing Zones using templates to automate the provisioning of infrastructure components on your production and non-production environments.


Business Analytics & AI- 30-Day Proof of Concept.png

Business Analytics & AI: 30-Day Proof of Concept: Cognizant’s customized proof of concept will demonstrate how your business hosted on Microsoft Azure can scale securely and grow exponentially by integrating data-driven insights using Microsoft Power BI.


Data Analytics Envisioning Package- 1-Day Workshop.png

Data Analytics Envisioning Package: 1-Day Workshop: Did you know there is hidden potential in your data? Cosmo Consult’s team of experts will harness your organization’s existing data assets and modernize your business intelligence and data analytics using Microsoft Azure. This offer is available only in German.


Data Lake for RGM- 3-Week Pilot Implementation.png

Data Lake for RGM: 3-Week Pilot Implementation: Tredence will roadmap, design, and build a data lake architecture on Microsoft Azure so you can modernize your business intelligence. Make informed revenue growth management decisions across different regional markets.


Demand Forecasting- 3-Week Pilot Implementation.png

Demand Forecasting: 3-Week Pilot Implementation: Tredence will implement its demand forecasting solution using Microsoft Azure services to increase transparency and efficiency in your supply chain planning and operations. Optimize inventory and save time and cost.


Digital Xperience Service- 10-Week Implementation.png

Digital Xperience Service: 10-Week Implementation: Digital Xperience Service is Xpand IT’s collaborative solution that ensures your team, app, website, or business is supported in a holistic way. Learn how you can benefit from this multidisciplinary digital initiative.


End-to-End Azure Data Factory Pipelining- 2-Week Proof of Concept.png

End-to-End Azure Data Factory Pipelining: 2-Week Proof of Concept: In this proof of concept, CLEAR PEAKS consultants will assess and test if Microsoft Azure Data Factory is the right solution for your organization’s business needs by creating an end-to-end data pipeline from data source to data warehouse.


Factory Control Tower- 4-Hour Workshop.png

Factory Control Tower: 4-Hour Workshop: Learn how you can implement smart manufacturing strategies by remotely monitoring and analyzing the performance of your machines and industrial assets in this 4-hour workshop by Cluster Reply. Bring the power of Azure IoT Edge and Hub to your factory floor.


Forecaster- 4-Week Implementation.png

Forecaster: 4-Week Implementation: Quickly and accurately forecast key business metrics with APN’s solution based on Microsoft Azure services. The implementation will help increase the accuracy of your organization’s budgets, sales, demand, and inventory forecasting.


Hybrid Cloud Security- 2-Week Workshop.png

Hybrid Cloud Security: 2-Week Workshop: In this workshop from Netwoven, you will learn how to mitigate risks and increase the security posture of your hybrid cloud by harnessing the power of Microsoft Azure Security Center. Quickly find and react to security breaches while reducing operational costs.


Ideation to Application on Azure- 3-Month Proof of Concept.png

Ideation to Application on Azure: 3-Month Proof of Concept: Delegate will deliver a proof of concept of its IDEAS process and toolbox so you can create long-term value for your customer and help your organization adopt and execute smart IT solutions using a host of Microsoft Azure services.


Inventory Allocation- 10-Week Implementation.png

Inventory Allocation: 10-Week Implementation: With this service offer from Tredence, your organization will learn how to improve customer satisfaction by assigning scarce inventory to the appropriate customer/market. Tredence’s solution uses Azure Machine Learning, Azure SQL and Azure Databricks.


Labor Planning- 8-Week Implementation.png

Labor Planning: 8-Week Implementation: In this offer, Tredence will implement its ML-driven framework using a host of Microsoft Azure services to help you accurately predict and plan for your labor needs. Optimize the workforce required in your distribution centers and reduce overhead costs.


ML Ops Framework Setup - 4-Week Implementation.png

MLOps Framework Setup: 4-Week Implementation: Get end-to-end visibility, pipeline traceability, and root cause analysis into your ML pipelines with minimal manual effort using ML Works. This industrialized machine learning operations platform from Tredence uses native Microsoft Azure components.


Modern Data Platform- 1-Month Proof of Concept.png

Modern Data Platform: 1-Month Proof of Concept: With this consulting offer from One point, you will start designing your modern data platform using Azure Synapse Analytics. Accelerate the time-to-market of your data initiatives without compromising on quality. This offer is available only in French.


On Shelf Availability- 3-Week Pilot Implementation.png

On-Shelf Availability: 3-Week Pilot Implementation: Leveraging their On-Shelf Availability solution built using native Microsoft Azure components, the experts at Tredence will assist CPGs in identifying stock-outs and inventory issues. Reduce lost sales opportunities and get access to reliable marketing data.


Order Fulfillment- 12-Week Pilot Implementation.png

Production Planning: 12-Week Pilot Implementation: See how you can align your organization’s production schedule with market demand using Tredence’s pilot implementation. The solution uses Microsoft Azure App Services and Azure Data Factory to track and streamline production.


Rapid Planning- 4-Week Implementation.png

Rapid Planning: 4-Week Implementation: The consultants from Abylon will help automate your financial planning and forecasting by implementing the Rapid Planning tool. This Microsoft Excel add-on allows you to export and import data from data warehouses and edit and validate it within Excel.


Supply Risk Monitoring- 12-Week Pilot Implemenation.png

Supply Risk Monitoring: 12-Week Pilot Implementation: By implementing Tredence’s supply risk monitoring solution you can proactively detect, identify, and resolve supply-chain disruptions. Your business can avoid shortages and stock-outs while accurately forecasting labor needs.


Trade Promo Optimization- 3-Week Pilot Implementation.png

Trade Promo Optimization: 3-Week Pilot Implementation: In this engagement, Tredence will help your business improve its trade spend ROI and optimize trade promotion opportunities with its ML-driven trade promotion optimization solution on Microsoft Azure.


Unified Marketing Platform- 4-Week Pilot Implementation.png

Unified Marketing Platform: 4-Week Pilot Implementation: This implementation from Tredence will help optimize marketing decisions with a scientific ML-driven solution. Plan, execute and measure your marketing spend with a unified marketing platform on Microsoft Azure.


ViBiCloud Merah Putih- 5-Week Implementation.png

ViBiCloud Merah Putih: 5-Week Implementation: Reduce infrastructure overload and simplify your deployment process with this 5-week implementation by ViBiCloud Merah Putih. Integrated with Azure Arc and Azure Kubernetes Service, this hybrid cloud solution is powered by Azure Stack HCI.


Xpress Azure Stack Hub- 10-Week Implementation.png

Xpress Azure Stack Hub: 10-Week Implementation: The experts from RFC will migrate your legacy environments to a hybrid cloud model and deploy Microsoft Azure Stack Hub in partnership with an OEM team. Learn how your enterprise can increase scalability, optimize performance, and reduce cost.



Contact our partners



ACI Fraud Management



Acotel Agent Smart Router Saas



AI Event Monitoring



Allscripts TouchWorks EHR



Argo Workflow Controller packaged by Bitnami



Argo Workflow Executor packaged by Bitnami



Articulo



Attendi Speech Service



Azure App Service: 2-Hour Briefing



Azure Sentinel: Fusion Core



BizTalk to Azure Migration: 10-Day Assessment



Booxi



CAMCUBE: Intelligent IT Solutions for Healthcare



Cloud Adoption Framework: 1-Week Assessment



Cloud Ctrl: Simplify Cloud Spend



Cloud Readiness Consulting Service: 1-Month Assessment



Cloud Security Consulting Service: 3-Week Assessment



CloudFit Managed Client Experience for CMMC



CMC – Azure Virtual Desktop



Cold Chain Management Platform



CrateOM



Crayon Data’s maya.ai: Powering the Edge of Relevance



Data Center Modernization: 2-Week Assessment



Data Governance



Data Science: 8-Day Assessment



Data Strategy: 4-Day Assessment



Datapath Intelligent Form Capture



EBO Intelligent Agent



EMODS: Energy Monitoring & Optimization for Decision Support



Heimdal Remote Desktop



Het Dataloket



Hypori



Identity Maturity Assessment: 3- to 4-Day Assessment



IDSync



Informatica Intelligent Cloud Services



KPMG Tax Reimagined



KPN Data Services Hub



KPN Secure Networking for End-to-End Connectivity



Logicalis Cloud Governance Forecast Framework



MC Virtual Care



Migration to Azure: 4-Week Assessment



Mi-Zentro



MLOps Framework Setup: 2-Week Assessment



ML!PA Smart Product



OneEAI Platform



Orbital Insight Supply Chain Analytics



PRCP Managed Services Logicalis Spain



Pricing – Hero Intelligence



Procesox



Public Transport Solution with Power BI: 5-Day Assessment



RabbitMQ Messaging Topology Operator



Secure Connected Factory



Sega Defense



SOC as a Service



TCS Intelligent Power Plant



Teams Voice Management Apps



UI Bakery



UIBSbe Booking Engine



UIBScommerce Add-on Shopping Cart



UIBScommerce E-commerce System



UIBStable Restaurant Reservations System



Unified Technologies Lighthouse: 1-Day Assessment



Uptake Sustainability Bundle



WAPPLES SA v6.0 on Azure



Web AR with Dynamics 365 Commerce



YooniK Web Authentication



Zero Trust Secure Network Access



ZiFaaS to Azure Integration Service: 1-Day Briefing



Extracting SAP data using OData – Part 7 – Delta extraction using SAP Extractors and CDS Views


This article is contributed. See the original author and article here.







Before implementing data extraction from SAP systems please always verify your licensing agreement.

 


Seven weeks passed in the blink of an eye, and we are at the end of the summer with OData-based extraction using Synapse Pipelines. Each week I published a new episode that reveals best practices on copying SAP data to the lake, making it available for further processing and analytics. Today’s episode is a special one. Not only is it the last one in the series, but I’m also going to show you some cool features around data extraction that pushed me into writing the whole series. Ever since I started working on the series, it was the main topic I wanted to describe. Initially, I planned to cover it as part of my Your SAP on Azure series that I’ve been running for the last couple of years. But as there are many intriguing concepts in OData-based extraction, and I wanted to show you as much as I can, I decided to run a separate set of posts. I hope you enjoyed it and learnt something new.


 


Last week I described how you could design a pipeline to extract only new and changed data using timestamps available in many OData services. By using filters, we can select only a subset of information, which makes the processing much faster. But the solution I shared works fine only for some services, where the timestamp is available as a single field. For others, you have to enhance the pipeline and make the complex expressions even more complicated.


 


There is a much better approach. Instead of storing the watermark in the data store and then using it as filter criteria, you can convince the SAP system to manage the delta changes for you. This way, without writing any expression to compare timestamps, you can extract recently updated information.


 


The concept isn’t new. SAP Extractors have been available for as long as I can remember and are commonly used in SAP Business Warehouse. Nowadays, in recent SAP system releases, there are even analytical CDS views that support data extraction scenarios, including delta management! And the most important information is that you can expose both SAP extractors and CDS Views as OData services, making them ideal data sources.


 


EXPOSE EXTRACTORS AND CDS VIEWS AS ODATA


 









There is a GitHub repository with source code for each episode. Learn more:


https://github.com/BJarkowski/synapse-pipelines-sap-odata-public



 


The process of exposing extractors and CDS views as OData is pretty straightforward. I think a bigger challenge is identifying the right source of data. 


 


You can list available extractors in your system in transaction RSA5. Some of them may require further processing before use.


 


image001.png


 


When you double click on the extractor name, you can list exposed fields together with the information if the data source supports delta extraction.


 


image003.png


 


In the previous episode, I mentioned that there is no timestamp information in OData service API_SALES_ORDER_SRV for entity A_SalesOrderItem. Therefore, each time we had to extract a full dataset, which was not ideal. The SAP extractor 2LIS_11_VAITM, which I’m going to use today, should solve that problem.


 


I found it much more difficult to find CDS views that support data extraction and delta management. There is a View Browser Fiori application that lists available CDS Views in the system, but it lacks some functionality to make full use of it – for example, you can’t set filters on annotations. The only workaround I found was to enter @Analytics.dataextraction.enabled:true in the search field. This way you can at least identify CDS Views that can be used for data extraction. But to check whether they support delta management, you have to inspect the view properties manually.


 


image004.png


 


Some CDS Views are still using the timestamp column to identify new and changed information, but as my source system is SAP S/4HANA 1909, I can benefit from the enhanced Change Data Capture capabilities, which use the SLT framework and database triggers to identify delta changes. I think it’s pretty cool. If you consider using CDS Views to extract SAP data, please check fantastic blog posts published by Simon Kranig. He nicely explains the mechanics of data extraction using CDS Views.


https://blogs.sap.com/2019/12/13/cds-based-data-extraction-part-i-overview/


 


I’ll be using the extractor 2LIS_11_VAITM to get item details and the I_GLAccountLineItemRawData CDS view to read GL documents. To expose the object as an OData service, create a new project in transaction SEGW:


image006.png


 


Then select Data Model and open the context menu. Choose Redefine -> ODP Extraction.


 


image007.png


 


Select the object to expose. If you want to use an extractor, select DataSources / Extractors as the ODP context and provide the name in the ODP Name field:


 


image008.png


 


To expose a CDS View, we need to identify the SQL Name. I found it the easiest to use the View Browser and check the SQLViewName annotation:


image010.png


 


Then in the transaction SEGW create a new project and follow the exact same steps as for exposing extractors. The only difference is the Context, which should be set to ABAP Core Data Services.


 


image012.png


 


Further steps are the same, no matter if you work with an extractor or CDS view. Click Next. The wizard automatically creates the data model and OData service, and you only have to provide the description.


 


image014.png


 


Click Next again to confirm. In the pop-up window select all artefacts and click Finish.


 


image016.png


 


The last step is to Generate Runtime Object which you can do from the menu: Project -> Generate. Confirm model definition and after a minute your OData service will be ready for registration.


 


image018.png


 


Open the Activate and Maintain Services report (/n/iwfnd/maint_service) to activate created OData services. Click Add button and provide the SEGW project name as Technical Service Name:


 


image019.png


 


Click Add Selected Services and confirm your input. You should see a popup window saying the OData service was created successfully. Verify the system alias is correctly assigned and the ICF node is active:


 


image021.png


 


OData service is now published and we can start testing it.


 


EXTRACTING DATA FROM DELTA-ENABLED ODATA SERVICES


 


Let’s take a closer look at how data extraction works in a delta-enabled OData service. 


You have probably noticed during service creation that extractors and CDS views give you two entities to use:



  • Representing the data source model, with the name starting with EntityOf<objectName>, FactsOf<objectName> or AttrOf<objectName> depending on the type of extractor or view 

  • Exposing information about current and past delta tokens, with the name starting with DeltaLinksOf<objectName>


By default, if you send a request to the first service, you will retrieve a full dataset, just like with any other OData service we covered in previous episodes. The magic happens if you add a special request header:


 


 

Prefer: odata.track-changes

 


 


It tells the system that you want it to keep track of delta changes for this OData source. Then, as a result, in the response content, together with the initial full dataset, you can find an additional field __delta with the link you can use to retrieve only new and changed information.


 


image023.png


The additional header subscribes you to the delta queue, which tracks data changes. If you follow the __delta link, which is basically the OData URL with extra query parameter !deltatoken, you will retrieve only updated information and not the full data set.
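
Outside of Synapse, you can see the whole flow with a few lines of Python and the requests library (the host, service, and entity names below are hypothetical; the response layout follows the OData v2 JSON format, where the payload sits under the d element):

import requests

base = "https://sap-host:44300/sap/opu/odata/sap/ZSD_VAITM_SRV"
session = requests.Session()
session.auth = ("user", "password")
headers = {"Prefer": "odata.track-changes", "Accept": "application/json"}

# Initial request: full dataset plus a __delta link holding the delta token.
payload = session.get(f"{base}/EntityOf2LIS_11_VAITM", headers=headers).json()
delta_link = payload["d"].get("__delta")

# Subsequent run: follow the delta link to get only new, changed, and
# deleted records since the previous extraction.
changes = session.get(delta_link, headers={"Accept": "application/json"}).json()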


image025.png


In the SAP system, there is a transaction ODQMON that lets you monitor and manage subscriptions to the delta queue.


image027.png


You can query the second entity, with the name starting with DeltaLinksOf<EntityName>, to receive a list of the current and past delta tokens.


image029.png


We will use both entities to implement a pipeline in Synapse. Firstly, we will check if there are already open subscriptions. If not, then we’ll proceed with the initial full data extraction. Otherwise, we will use the latest delta token to retrieve changes made since the previous extraction.


 


IMPLEMENTATION


 


Open Synapse Studio and create a new pipeline. It will be triggered by the metadata pipeline based on the ExtractionType field. Previously we used the keywords Delta and Full to distinguish which pipeline should be started. We will use the same logic, but we’ll define a new keyword Deltatoken to identify delta-enabled OData services.


 


I have added both exposed OData services to the metadata store together with the entity name. We won’t implement any additional selection or filtering here (and I’m sure you know how to do it if you need it), so you can leave the fields Select and Filter empty. Don’t forget to enter the batch size, as it’s going to be helpful in the case of large datasets.


 


image031.png


 


Excellent. As I mentioned earlier, to subscribe to the delta queue, we have to pass an additional request header. Unfortunately, we can’t do it at the dataset level (like we would do for REST type connection), but there is a workaround we can use. When you define an OData linked service, you have an option of passing additional authentication headers. The main purpose of this functionality is to provide API Key for services that require this sort of authentication. But it doesn’t stop us from re-using this functionality to pass our custom headers.


 


There is just one tiny inconvenience that you should know about. As the field is meant to store an authentication key, the value is protected against unauthorized access. It means that every time you edit the linked service, you have to retype the header value, exactly as you would do with a password. Therefore, if you ever have to edit the linked service again, remember to provide the header value again.


 


Let’s make changes to the Linked Service. We need to create a parameter that we will use to pass the header value:


 


 

"Header": {
	"type": "String"
}

 


 


Then to define authentication header add the following code under the typeProperties:


 


 

"authHeaders": {
    "Prefer": {
        "type": "SecureString",
        "value": "@{linkedService().Header}"
    }
},

 


 


For reference, below, you can find the full definition of my OData linked service.


 


 

{
    "name": "ls_odata_sap",
    "type": "Microsoft.Synapse/workspaces/linkedservices",
    "properties": {
        "type": "OData",
        "annotations": [],
        "parameters": {
            "ODataURL": {
                "type": "String"
            },
            "Header": {
                "type": "String"
            }
        },
        "typeProperties": {
            "url": "@{linkedService().ODataURL}",
            "authenticationType": "Basic",
            "userName": "bjarkowski",
            "authHeaders": {
                "Prefer": {
                    "type": "SecureString",
                    "value": "@{linkedService().Header}"
                }
            },
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "ls_keyvault",
                    "type": "LinkedServiceReference"
                },
                "secretName": "s4hana"
            }
        },
        "connectVia": {
            "referenceName": "SH-IR",
            "type": "IntegrationRuntimeReference"
        }
    }
}

 


 


The above change requires us to provide the header every time we use the linked service. Therefore we need to create a new parameter in the OData dataset to pass the value. Then we can reference it using an expression:


image035.png


image037.png
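
In text form, the dataset ends up forwarding its own parameter to the linked service roughly like this (a sketch; the parameter name follows the one created above):

Header: @dataset().Header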


 


In Synapse, every parameter is mandatory, and we can’t make them optional. As we use the same dataset in every pipeline, we have to provide the parameter value in every activity that uses the dataset. I use the following expression to pass an empty string.


 


 


 

@coalesce(null)

 


 


 


Once we have enhanced the linked service and made corrections to all activities that use the affected dataset, it’s time to add a Lookup activity to the new pipeline. We will use it to check if there are any open subscriptions in the delta queue. The request should be sent to the DeltaLinksOf entity. Provide the following expressions:


 


 

ODataURL: @concat(pipeline().parameters.URL, pipeline().parameters.ODataService, '/')
Entity: @concat('DeltaLinksOf', pipeline().parameters.Entity)
Header: @coalesce(null)

 


 


 


image039.png


 


To get the OData entity name to read delta tokens, I concatenate ‘DeltaLinksOf’ with the entity name that’s defined in the metadata store.


 


Ideally, to retrieve the latest delta token, we would pass the $orderby query parameter to sort the dataset by the CreatedAt field. But surprisingly, it is not supported in this OData service. Instead, we’ll pull all records and use an expression to read the most recent delta token.


 


Create a new variable in the pipeline and add Set Variable activity. The below expression checks if there are any delta tokens available and then assigns the latest one to the variable.


 


image041.png
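
For reference, the expression looks roughly like this (the Lookup activity name LookupDeltaToken and the DeltaToken field name are assumptions; adjust them to match your pipeline):

@if(greater(activity('LookupDeltaToken').output.count, 0), last(activity('LookupDeltaToken').output.value).DeltaToken, '')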


Add the Copy Data activity to the pipeline. The ODataURL and Entity parameters on the Source tab use the same expression as in other pipelines, so you can copy them and I won’t repeat it here. As we want to enable the delta capabilities, provide the following value as the header:


 


 

odata.track-changes

 


 


 Change the Use Query setting to Query. The following expression checks the content of the deltatoken variable. If it’s not empty, its value is concatenated with the !deltatoken query parameter and passed to the SAP system. Simple and working!


 


 

@if(empty(variables('deltatoken')), '', concat('!deltatoken=''', variables('deltatoken'), ''''))

 


 


image043.png


 


Don’t forget to configure the target datastore in the Sink tab. You can copy all settings from one of the other pipelines – they are all the same.


 


We’re almost done! The last thing is to add another case in the Switch activity on the metadata pipeline to trigger the newly created flow whenever it finds delta token value in the metadata store.


 


image045.png


 


We could finish here and start testing. But there is one more awesome thing I want to show you!


 


The fourth part of the series focuses on paging. To deal with very large datasets, we implemented a special routine to split requests into smaller chunks. With SAP extractors and CDS views exposed as OData, we don’t have to implement a similar architecture. They support server-side pagination and we just have to pass another header value to enable it.


 


Currently, in the Copy Data activity, we’re sending odata.track-changes as the header value. To enable server-side paging, we have to extend it with odata.maxpagesize=<batch_size>.
Let’s make the correction in the Copy Data activity. Replace the Header parameter with the following expression:


 


 

@concat('odata.track-changes, odata.maxpagesize=', pipeline().parameters.Batch)

 


 


 


image047.png


Server-side pagination is a great improvement compared with the solution I described in episode four.


 


EXECUTION AND MONITORING


 


I will run two tests to verify the solution works as expected. Firstly, after ensuring there are no open subscriptions in the delta queue, I will extract all records and initialize the delta load. Then I’ll change a couple of sales order line items and run the extraction process again. 


 


Let’s check it!


 


image049.png


 


The first extraction went fine. Out of the six child OData services, two were processed by the delta-token pipeline. That fits what I have defined in the database. Let’s take a closer look at the extraction details. I fetched 379 sales order line items and 23,316 general ledger line items, which seems to be the correct amount.


 


image051.png


 


In the ODQMON transaction, I can see two open delta queue subscriptions for both objects, which proves the request header was attached to the request. I changed one sales order line item and added an extra one. Let’s see if the pipeline picks them up.


 


image053.png


Wait! Three records? How is that possible if I only made two changes?


 


Some delta-enabled OData services can not only track new and changed items but also record deleted information. That’s especially useful in the case of sales orders. Unlike a posted accounting document, which can only be ‘removed’ by a reversal posting, a sales order stays open for changes much longer. Therefore, to have consistent data in the lake, we should also include deleted information.


 


But still, why did I extract three changes if I only made two? Because that’s how this extractor works. Instead of only sending the updated row, it first marks the whole row for deletion and then creates a new one with the correct data.


 


image055.png


 


So the only thing left is the validation of server-side paging. And I have to admit it was a struggle, as I couldn’t find a place in Synapse Pipelines to verify the number of chunks. Eventually, I had to use the ICM Monitor to check logs on the SAP application servers. I found an entry there suggesting the paging actually took place – can you see the !skiptoken query parameter received by the SAP system?


 


image057.png


Do you remember that when you run delta-enabled extraction, there is an additional field __delta with a link to the next set of data? Server-side paging works in a very similar way. At the end of each response, there is an extra field __skip with the link to the next chunk of data. Both solutions use tokens passed as the query parameters. As we can see, the URL contains the token, which proves Synapse used server-side pagination to read all data.


 


It seems everything is working fine! Great job!


 


EPILOGUE


 


Next week there won’t be another episode of the OData extraction series. During the last seven weeks, I covered all topics I considered essential to create a reliable data extraction process using OData services. Initially, we built a simple pipeline that could only process a single (and not containing much data) OData service per execution. It worked well but was quite annoying. Whenever we wanted to extract data from a couple of services, we had to modify the pipeline. Not an ideal solution.


 


But I would be lying if I said we didn’t improve! Things got much better over time. In the second episode, we introduced pipeline parameters that eliminated the need for manual changes. Then, another episode brought a metadata store to manage all services from a single place. The next two episodes focused on performance. I introduced the concept of paging to deal with large datasets, and we also discussed selects and filters to reduce the amount of data to replicate. The last two parts were all about delta extraction. I especially wanted to cover delta processing using extractors and CDS views as I think it’s powerful, yet not commonly known.


 


Of course, the series doesn’t cover all aspects of data extraction. But I hope this blog series gives you a strong foundation to find solutions and improvements on your own. I had a great time writing the series, and I learnt a lot! Thank you!