Find the team at Microsoft Build

This article is contributed. See the original author and article here.

Learn. Connect. Code.  That’s Microsoft Build!


 


But if your role is less “developer” and more “IT Pro”, you’ll be pleased to know you can catch a glimpse of some of our Modern Infrastructure team participating in Microsoft Build, May 25-27 2021.  Click on the title to find out the session date and time in your local timezone, and add it to your schedule builder if you’ve signed in.


 


Not registered yet? Head over to the event page and click Register Now: https://mybuild.microsoft.com/home
If you can’t make a session, you’ll get access to view the recording later, if one is made available. 


 


Ask the Expert: Microsoft Certifications – Orin Thomas


Get your Microsoft Certification questions answered by our panel of experts. We’ll cover available Microsoft Certification exams, how to keep your skills and certifications current, and tips and tricks on how to best leverage your certification.


 


Hybrid is here to stay – Pierre Roman


Cloud adoption is growing, but the cloud is more than just public cloud. Convergence with on-premises environments is a reality for most organizations, and the latest Azure Arc improvements open up new opportunities; this Table Topic is the place to talk about them. Unmute yourself, turn your camera on – no presentations here! Be prepared for chat and a fun interactive discussion! This session will not be recorded.


 


Mark Russinovich on Azure innovation and more! – Rick Claus


Join Rick and Mark in a conversation discussing Azure innovation, cloud native programming models, and other exciting technology.


 


 


You’ll also find Rick hosting Microsoft Build in the late night USA/daytime Europe time slot on the main screen player, Orin making a special guest appearance on an upcoming Hello World episode on Learn TV, and some of the team popping up as chat moderators in your sessions!


 


 


Enjoy Microsoft Build this week and let us know – what caught your eye for IT Pros at this event?

Cornell University creating a graduate course on cloud computing, to educate the next generation


This article is contributed. See the original author and article here.

Guest Blog by Ken Birman, N. Rama Rao Professor of Computer Science, Cornell University


At Cornell University, we have a long history in the area of “cloud computing” – dating back to before the subject even had that name! 


 


I’m Ken Birman, and I’ve been involved in distributed computing and scalability from the early days. Back in the 1990s, I created the first widely impactful software for fault-tolerant, scalable, consistent computing and data management, and my work was used very broadly. Even now, if you fly into Europe there is a good chance that the air traffic control system guiding your plane uses an architecture (and software) I created. For a decade my software ran the New York Stock Exchange and Swiss Stock Exchange without a single unmanaged fault – the code was designed to self-heal – and quite a few companies (including Microsoft) use my technology in their data-center management layers.


 


My students did well too: Ranveer Chandra is Microsoft’s Chief Scientist for Azure Global (and also the lead researcher for networking, and for IoT computing on smart farms). Werner Vogels is CTO at Amazon, and Dahlia Malkhi leads the development of Facebook’s Diem digital currency.


 


So, it seemed natural to create a graduate course on cloud computing, to educate the next generation of developers, entrepreneurs, and leaders.  When planning CS5412, “Topics in Cloud Computing”, I knew that hands-on experience would be key, and was fortunate to be offered tremendous support by Microsoft’s academic liaison group, along with Azure computing resources for our class projects.  At the same time, I wanted to give the students a focus and concrete scenarios, which led me to partner with Cornell’s ANSC 3510, a course focused on technology in dairy farming.  Working with Professor Julio Giordano, who created that course, we’ve created a unique structure in which two completely different classes are able to team up (using Microsoft Teams, of course!) to jointly create projects that demonstrate the power of cutting edge IoT infrastructures backed by Azure cloud ML and AI.


 




 


A few examples will help set the context (all the student groups define individualized projects, so these are really two of many).  When cows enter a dairy to be milked, a procession of as many as 500 animals will move from outside in the field into the milking area of the barn through a series of side-by-side corridors.  By installing cameras, we can photograph the animals as they pass through.  With cloud computing tools, my students are able to identify the animal (RFID tags should make this easy, but in fact they often fall off or are difficult to ping in crowded settings), assess the health of each animal, decide if the cow needs to be routed into the “cow wash” before milking etc.  One project looks at this specific question, hosting the solution on Azure IoT Edge.


 




 


A second project focused on building a highly flexible dairy dashboard for the farmer, tracking a variety of milk quality, quantity and safety properties while also providing an integrated perspective on the herd as a whole, dairy product price trends, and even allowing the farmer to zoom in and look at health statistics for individual animals.   Automated notifications alert the farmer if a cow is close to calving or needs to see a vet for some other reason.    Yet a third project looks at integrating data from diverse sources, such as weather predictions, farm-soil moisture levels and imaging to predict risk factors such as outbreaks of pathogens.  A farmer warned of trouble three weeks from now might have time to bring the crop in a little early, before the pathogen gains a toehold.


 




 


What I find exciting about these projects is that they bring my cloud computing students into real-world settings.  In teams of two or three, students from my cloud computing class partner with Professor Giordano’s students to understand the kinds of data available and to appreciate the proper way to approach technical questions seen on dairy farms.  They work together to map these questions to ML, decide how to train the models, and even collaborate to design dashboards that visualize the outcomes of the analysis.  Meanwhile, Professor Giordano’s animal science students, who come to these partnerships with a lot of knowledge about modern dairies and modern AI, learn about the challenges of putting those tools to work to create useful new technology, and are exposed to the power (and limitations) of the Azure cloud.   As a side-effect, by working with Microsoft Teams, the groups also become comfortable with cutting-edge collaboration technology.


 


My approach in CS5412 has been to design the course around three central goals. First, I do want my students to appreciate the relevant computer science foundations: topics such as state machine replication (Paxos), leader election, and fault-tolerant consensus. Second, it is important for them to learn to view the cloud as a platform with powerful, inherently scalable services such as CosmosDB, the Azure Blob Store and Azure Data Lake at their disposal, along with powerful computing frameworks like the Azure Intelligence Platform and Cognitive Search. These in turn center on fundamental practical questions, such as understanding data and computation sharding for scalability, and appreciating the roles of the IoT Hub and its integration with Azure Functions.


 


Last is the practical goal of becoming proficient with the actual tools: any student who “survives” my course emerges as a capable cloud programmer with a valuable skill set, coupled with an understanding of how the Azure documentation and prebuilt “recipes” can lead to customizable code frameworks ideally fitted to their needs.


 


It has taken us many years to evolve CS5412 to the current state, but today I feel as if my students really learn the right material, that the partnership with Animal Sciences is operating smoothly and bringing really high value to the table, and that the course is simply working very well.  My textbook, written in 2010, probably needs a thorough revision at this point, but on the other hand, modern students don’t really read textbooks!  What they do read is online material, especially coupled with “how-to” videos and even hands-on opportunities, and on this, Microsoft has excelled. 


 


The documentation available is outstanding, but beyond that, Microsoft has participated in Cornell’s Digital Agriculture Hackathons (the CIDA Hackathon), and my students find this to be a fantastic resource.  During a February weekend students come together to learn how cloud computing can solve problems in agriculture, looking at technical but also social and economic/business perspectives, with hands-on instruction by Microsoft experts.


 


CS5412 has become a real treasure here at Cornell, and is helping position our students to hit the ground running as they enter the workforce.  We couldn’t have done it without Microsoft Azure and Microsoft Teams, and are tremendously grateful for the partnership opportunity!

Azure CycleCloud – the simplest way to execute HPC on Azure


This article is contributed. See the original author and article here.

Accelerated computing.  Microsoft HPC continues to invest to deliver the broadest range of Accelerated and high-performance computing (HPC) capabilities in the public cloud. From InfiniBand-enabled Virtual Machine families for artificial intelligence and HPC, to Hyperscale services like Cray supercomputing, Azure enables customers to deliver the full spectrum of AI and machine learning applications.


 


Azure CycleCloud – the simplest way to execute HPC on Azure


Azure CycleCloud is a tool for creating, managing, operating, and optimizing HPC clusters of any scale in Azure. With Azure CycleCloud, we are making it even easier for everyone to deploy, use, and optimize HPC burst, hybrid, or cloud-only clusters. For users running traditional HPC clusters, using schedulers including SLURM, PBS Pro, Grid Engine, LSF, HPC Pack, or HTCondor, this will be the easiest way to get clusters up and running in the cloud, and to manage the compute/data workflows, user access, and costs for their HPC workloads over time.




 


With a few clicks, HPC IT administrators can deploy high-performance clusters of compute, storage, filesystem, and application capability in Azure. Azure CycleCloud’s role-based policies and governance features make it easy for their organizations to deliver the hybrid compute power where needed while avoiding runaway costs. Users can rely on Azure CycleCloud to orchestrate their job and data workflows across these clusters.


 

 


 


NVIDIA GPU Cloud with Azure


As GPUs provide outstanding performance for AI and HPC, Microsoft Azure provides a variety of virtual machines enabled with NVIDIA GPUs. Starting today, Azure users and cloud developers have a new way to accelerate their AI and HPC workflows with powerful GPU-optimized software that takes full advantage of supported NVIDIA GPUs on Azure.



Containers from the NVIDIA GPU Cloud (NGC) container registry are now supported. The NGC container registry includes NVIDIA tuned, tested, and certified containers for deep learning software such as Microsoft Cognitive Toolkit, TensorFlow, PyTorch, and NVIDIA TensorRT. Through extensive integration and testing, NVIDIA creates an optimal software stack for each framework – including required operating system patches, NVIDIA deep learning libraries, and the NVIDIA CUDA Toolkit – to allow the containers to take full advantage of NVIDIA GPUs. The deep learning containers from NGC are refreshed monthly with the latest software and component updates.


NGC also provides fully tested, GPU-accelerated applications and visualization tools for HPC, such as NAMD, GROMACS, LAMMPS, ParaView, and VMD. These containers simplify deployment and get you up and running quickly with the latest features.



To make it easy to use NGC containers with Azure, a new image called NVIDIA GPU Cloud Image for Deep Learning and HPC is available on Azure Marketplace. This image provides a pre-configured environment for using containers from NGC on Azure. Containers from NGC on Azure NCv2, NCv3, and ND virtual machines can also be run with Azure Batch AI by following these GitHub instructions. To access NGC containers from this image, simply sign up for a free account and then pull the containers into your Azure instance.


 



Resources on Microsoft Learn
 


Saving Tweety with Azure Percept


This article is contributed. See the original author and article here.

Goran Vuksic


Goran works as a Technical Manager for Stratiteq Sweden. He is a Microsoft AI MVP with 15 years of experience in IT and broad knowledge of various technologies and programming languages. He has worked on projects for notable clients, and projects he worked on have been featured many times on web sites such as Forbes, The Next Web, NVIDIA Developer, TechCrunch, Macworld, and others. In the last few years he has attended several hackathons and other competitions where his skills and work were recognized and awarded. Goran is a tech enthusiast who writes technical blog posts and likes to share his knowledge at workshops and talks. You can connect with Goran on LinkedIn and follow him on Twitter.


 


Introduction


Azure Percept is an easy-to-use platform for creating edge AI solutions. The Azure Percept Development Kit comes with an intelligent camera, Azure Percept Vision. Services like Azure Cognitive Services, Azure Machine Learning, Azure Live Video Analytics, and many others work out of the box with Azure Percept. With the development kit you can set up a proof of concept in minutes and integrate it with Azure AI and Azure IoT services. An overview of the Azure Percept Development Kit can be found here, and the same link also explains how to set up your device.


 




 


This post will show you how to create a simple project with the Azure Percept Development Kit. I will use Tweety and Sylvester LEGO minifigures from the Looney Tunes special edition. The idea is to train an AI model to recognise Tweety and Sylvester and find them amongst other LEGO minifigures. With a model that can tell where Tweety and Sylvester are, we can keep Tweety safe, for the obvious reason that Sylvester is always hungry and should not be anywhere close to Tweety.


 


Azure Percept Studio


Azure Percept Studio makes it really easy to work with your Azure Percept, and it is the single launch point for creating edge AI models and solutions you want to develop.


 




 


The interface will guide you in an intuitive way, whether you are new to AI models and want to create a prototype, want to try out some sample applications, or want to explore more advanced tools.


 




 


In the left menu, click the “Devices” option and an overview of your Percept devices will appear. For each device you can see whether it is connected, and you can click on it to view details. For this project we will use the second tab, “Vision”, where you can capture images for your project, view the device stream, and deploy a custom model to the device.


 




 


Training the model


We want to create a custom model that will be able to recognise Tweety and Sylvester. You can open Custom Vision in a new tab, log in and create a new project. Give a name to your project, create a resource, select “Object detection” and “General” domain. Custom Vision project domains are explained here in more detail.


 




 


Images can also be added to the project from Azure Percept Studio, using the option we saw earlier. If you use that option, you will be able to take pictures with the Azure Percept Vision camera. You can also add and use any images that you prepared earlier.


 




 


Once images are added, you need to tag them, marking the objects your model will recognise. In this example I created two tags named “Sylvester” and “Tweety”. The minimum number of images you need to train the model is 15, but for the model to reliably recognise objects like these you should add many more images.


 




 


After you have tagged the images, select “Train” to train the model; you can choose the “Quick training” option. In a minute or two your model will be ready and you can go back to Azure Percept Studio.


 


Testing the model


In Azure Percept Studio, click the option “Deploy a Custom Vision project” and select the model you just trained. You can also select a specific model iteration if you trained the model several times.


 




 


Once the model is deployed, you can click “View your device stream” to see the live camera feed and test the model. A notification will appear in a few seconds when your stream is ready, and you can open it in a separate tab.


 




Testing a model with a live stream is a great way to see how it actually performs: you can add different objects, position the camera at different angles, and get a real idea of the performance. If your model does not have high enough accuracy or it falsely detects other objects, you can take more pictures, add them to Custom Vision, re-train the model, and test the new iteration to see how it works.


 


Summary


In this article you have learned about Azure Percept Studio, how to train your Custom Vision model, how to deploy it to the Azure Percept device, and how to test it via a live stream. Now that you have learned more about Azure, you can claim your Azure Heroes Learner badge by scanning the QR code at the following link (please note there are only 50 badges available).


Resources


Microsoft Learn Modules on Custom Vision AI 
Explore computer vision in Microsoft Azure – Learn | Microsoft Docs
Analyze images with the Computer Vision service – Learn | Microsoft Docs
Classify images with the Custom Vision service – Learn | Microsoft Docs
Detect objects in images with the Custom Vision service – Learn | Microsoft Docs


 

The new SCAN Operator: Process Mining in Azure Data Explorer


This article is contributed. See the original author and article here.

Introduction


Azure Data Explorer (ADX) is commonly used to store and analyze event logs of business processes. These logs are generated everywhere – applications emit events capturing user actions, IoT devices send sensor data, cloud services generate monitoring logs, etc. Once the data is stored in ADX, it can be analyzed for insights. Kusto Query Language (KQL) has an extensive set of operators and functions for various types of analysis, including relational operators (filtering, joining, aggregations, etc.), time series analysis (anomaly detection, forecasting, etc.), advanced plugins (root cause analysis, user analytics), and even extensibility to embed external Python/R packages. Today we are delighted to introduce the scan operator, adding native process mining and temporal analytics to ADX!


 


Process Mining


Process mining is an emerging domain dealing with the analysis of event logs to discover the underlying process models, validate them, and potentially improve them. In this context, an event contains at least a process instance id, a timestamp, and an action. A process instance id (a.k.a. case id) might be the identifier of a specific user, device, session, or any other natural partitioning of the event log. A process model describes a sequence of actions, commonly with time constraints, to achieve a specific task. Some examples of process models:



  • Users log in to a shopping web site, browse products, add to cart, check out, pay, and log out.

  • A patient is admitted at a clinic, tested for pulse and blood pressure by the triage team, checked by the specialist, diagnosed, and dismissed.

  • A washing machine cycle: starting, weighing the laundry, filling water, heating, washing, rinsing, spinning, and finishing.


There are 3 classes of process mining techniques:



  1. Discovery – automatically extract process model(s) out of the raw event logs.

  2. Conformance checking – comparing existing process model with actual event log, detect and analyze discrepancies.

  3. Performance mining – enhance the existing process to improve performance – reduce time between steps, improve retention etc.


 


The ‘scan’ operator


The scan operator is arguably the most advanced analytics operator in ADX. It extends ADX native analytics to include process mining, user analytics, recursive calculations, and more. The user defines a process model in KQL, then feeds in a huge event log table. ADX scans it and extracts the instances (each one a sequence of events) conforming to that process model, filtering out those that deviate from it. The model definition syntax lets the user define sequential states and forward transition rules from the first to the last state. These states and rules are translated into a set of simple state machines that define the scan logic. Let’s review a few examples to understand the syntax, as well as its power and versatility for a wide range of scenarios.
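Before diving into the full examples below, here is a minimal sketch of the syntax (my own illustration, not taken from the article): a single always-true step that computes a cumulative sum over a generated input. The column names are only for illustration.

range i from 1 to 10 step 1
| scan declare (cumulative_i: long=0) with 
(
    //  one step whose condition is always true: every record extends the sequence,
    //  adding its value of i to the running total carried in the step state (s1.cumulative_i)
    step s1: true => cumulative_i = i + s1.cumulative_i;
)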


 


Searching for ‘rallies’ of top stocks


Here we analyze historical prices of top tech stocks from 2019 until today. This period was interesting: from the start of 2020 the market has been impacted by the Covid-19 pandemic, yet the overall trend of these stocks is positive. In the query below we look for the longest sequences of increasing price for each stock. The query is built from these steps:



  1. Pre-processing:

    • For each stock order the prices by time and calculate the daily change.



  2. Extracting sequences using the ‘scan’ operator:

    • A sequence starts when the price increases.

    • The sequence continues as long as the price increases or doesn’t change.

    • The sequence ends when the price decreases.



  3. Post-processing:

    • For each sequence delete the last event, as it contains the price decrease.

    • Summarize the length and delta per sequence.

    • For each stock extract the sequence with the biggest percentage increase.




 


 

TopStocksCleaned
| where Date >= datetime(2019-01-01) and Ticker in('AAPL', 'AMZN', 'GOOG', 'MSFT')
| partition by Ticker
(
    order by Date asc 
    | extend pDate=prev(Date), pAdjClose=prev(AdjClose)
    | extend delta = AdjClose - pAdjClose
    | scan with_match_id=m_id declare(down:bool=false, step:string) with 
    (
        //  if state of s1 is empty we require price increase, else continue as long as price doesn't decrease 
        step s1: delta >= 0 and (delta > 0 or isnotnull(s1.delta)) => step = 's1';
        //   exit the 'rally' when price decrease, also forcing a single match 
        step s2: delta < 0 and s2.down == false => down = true, step = 's2';
    )
)
| where step == 's1'   //  select only records with price increase
| summarize (start_date, start_AdjClose, start_delta)=arg_min(pDate, pAdjClose, delta), end_date=max(Date), trading_days=count(), total_delta=sum(delta) by Ticker, m_id
| extend delta_pct = total_delta*100.0/start_AdjClose
| summarize arg_max(delta_pct, *) by Ticker
| project Ticker, start_date, end_date, trading_days, delta_pct, start_AdjClose, total_delta
| order by delta_pct

Ticker  start_date  end_date    trading_days  delta_pct  start_AdjClose  total_delta
AAPL    2020-07-29  2020-08-07  7             20.752     93.75           19.455
AMZN    2020-04-13  2020-04-21  6             18.461     2040            376.610
MSFT    2020-02-28  2020-03-03  2             14.034     152.410         21.389
GOOG    2021-01-28  2021-02-03  4             12.422     1843.939        229.060
 


We can see that Apple had the longest and biggest rally, 7 consecutive trading days with more than a 20% increase! Let’s chart all stocks:


 


 


 

TopStocksCleaned
| where Date >= datetime(2020-01-01) and Ticker in('AAPL', 'AMZN', 'GOOG', 'MSFT')
| project Ticker, Date, AdjClose
| evaluate pivot(Ticker, any(AdjClose))
| render timechart with(ysplit=panels)

 


 




 


In the chart we can clearly see the biggest increases, highlighted in yellow.


 


Analyzing users’ mitigations for failing KQL queries


In this example we analyze sequences of users’ queries following a failed one, trying to extract interesting insights. The following query is quite complex; its steps are:



  1. Pre-processing:

    • For each user order the queries by time.

    • Filter consecutive queries with less than a second gap – these are likely automated queries.



  2. Extracting sequences using the ‘scan’ operator:

    • Start a sequence by a query that was not completed successfully.

    • Continue the sequence until successful query or time out.

    • Each step updates the sequence length and the number of retries (of exactly the same query).



  3. Post-processing:

    • Aggregate sequences by user and retries only flag.

    • Final aggregation by sequences length, final state and retries flag.




 


 

let Kusteam = toscalar((KustoAll | summarize make_list(Email)));
let etime = datetime(2021-05-20);
let stime = etime - 14d;
let max_gap = 5m;           //  max time gap between user's consecutive queries to be grouped in the same sequence
let min_gap = 1s;           //  min time gap between user's consecutive queries, ignoring automated queries
//
QueryCompletion_v2
| where User in(Kusteam)
| where Timestamp between (stime..etime)
| partition hint.strategy=native by User (
    order by Timestamp asc
    | extend nTimestamp = next(Timestamp)
    | where nTimestamp - Timestamp >= min_gap   //  filter automated queries
    | scan with_match_id=m_id declare(step:string='empty', timeout:boolean=false, retries:long=0, SequenceLen:long=0) with 
    (
        step s1: State != 'Completed' => step = 'not completed', SequenceLen = s1.SequenceLen + 1, retries = s1.retries + iff(s1.Text == Text, 1, 0);
        step s2: s2.step == 'empty' and (State == 'Completed' or Timestamp - s1.Timestamp >= max_gap) =>
                 step = 'end', timeout = Timestamp - s1.Timestamp >= max_gap, SequenceLen = s1.SequenceLen + 1, retries = s1.retries + iff(s1.Text == Text, 1, 0);
    )
)
| where not(timeout) // these records are not related to the past sequence
| extend RetriesOnly = (retries == SequenceLen - 1), FinalState = State
| summarize arg_max(SequenceLen, FinalState) by User, m_id, RetriesOnly
| summarize SequenceNum = count() by SequenceLen, FinalState, RetriesOnly
| where SequenceLen > 1 and FinalState != 'Cancelled'
| order by FinalState asc, SequenceLen asc, RetriesOnly asc

 


 


Results:

SequenceLen  FinalState  RetriesOnly  SequenceNum
2            Completed   False        306
2            Completed   True         20
3            Completed   False        66
3            Completed   True         2
4            Completed   False        25
4            Completed   True         1
5            Completed   False        11
6            Completed   False        4
2            Failed      False        8
2            Failed      True         20
3            Failed      False        2
3            Failed      True         7
4            Failed      False        2
4            Failed      True         1
5            Failed      False        2
5            Failed      True         32
6            Failed      False        13
7            Failed      False        3
11           Failed      False        1



 


Let’s first look at sequences of 2 events, i.e. a failed query followed by a successful one or a timeout. These sequences account for 354/526=67% of all sequences. We can see that 326/526=62% of the failures were resolved by a single follow-up query. Out of these, 306/326=94% of the queries completed successfully after modifying the query, while only 20/326=6% completed just by retrying the same query. Looking at sequences of 3 events, an additional 68/526=12.9% of the failed queries were resolved, but retrying the same query twice is much less effective: only 2/68=2.9% succeeded. The last thing to note is that as the sequences become longer, the chance of successful completion decreases; only a few sequences of 6 queries end successfully, and longer ones just continued to fail until the users abandoned them.


 


Summary


In this short blog we have seen only two examples of ‘scan’; however, the scope and power of this operator go far beyond these specific use cases. This operator is very effective for various scenarios, from simply filling forward table columns or calculating cumulative sums, through analyzing the events preceding some failure for root cause analysis, and up to funnel analysis, rectifying obstacles in UX workflows, improving performance, and beyond. Have a look at the scan doc, where you can learn how to use it and review more interesting examples. Note that the scan operator is new and currently in preview, but you are very welcome to try it and share your feedback with us!
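As a small illustration of the fill-forward scenario mentioned above, here is a hedged sketch (the datatable, column names, and values are hypothetical and not from the article): a single always-true step carries the last non-empty value of a column forward.

datatable (Timestamp: datetime, Sensor: string)
[
    datetime(2021-05-01 00:00), 'ok',
    datetime(2021-05-01 00:05), '',
    datetime(2021-05-01 00:10), '',
    datetime(2021-05-01 00:15), 'degraded',
    datetime(2021-05-01 00:20), ''
]
| sort by Timestamp asc
| scan declare (Sensor_filled: string='') with 
(
    //  always-true step: keep the incoming value when present,
    //  otherwise reuse the previous record's filled value (s1.Sensor_filled)
    step s1: true => Sensor_filled = iff(isempty(Sensor), s1.Sensor_filled, Sensor);
)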


 

Join Azure Data Explorer sessions at Microsoft Build 2021, Digital Event

This article is contributed. See the original author and article here.

Join us for the Microsoft Build two-day digital event to expand your skillset, find technical solutions, and innovate for the challenges of tomorrow.

conference: Build 2021 | sessionid: CON-PRT157 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Monitoring and managing your Azure Data Explorer deployments
owner: Gabi Lehner | participants: Guy Reginiano, Guy Yehudy, Pankaj Suri, Saeed Copty
session_location: Online | starttime: 2021-05-26 08:30:00 | duration: 60 | time_and_duration: Wednesday, May 26 8:30 AM – 9:30 AM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/4290398d-dfdf-467d-955a-5e2b8ecfe76c

conference: Build 2021 | sessionid: CON-PRT103 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Advanced Kusto query language topics
owner: Avner Aharoni | participants: Alexander Sloutsky, Tzvia Gitlin-Troyna
session_location: Online | starttime: 2021-05-26 08:30:00 | duration: 60 | time_and_duration: Wednesday, May 26 8:30 AM – 9:30 AM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/4d4887e9-f08d-4f88-99ac-41e5feb869e7

conference: Build 2021 | sessionid: CON-PRT130 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Data exploration and visualization with Azure Data Explorer
owner: Michal Bar | participants: Gabi Lehner
session_location: Online | starttime: 2021-05-26 12:30:00 | duration: 60 | time_and_duration: Wednesday, May 26 12:30 AM – 1:30 PM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/8a385a82-07a9-4f82-84dc-50649dee54e9

conference: Build 2021 | sessionid: CON-PRT140 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Industrial IoT analytics with Azure Time Series Insights
owner: Chris Novak, Ellick Sung
session_location: Online | starttime: 2021-05-26 16:30:00 | duration: 60 | time_and_duration: Wednesday, May 26 4:30 PM – 5:30 PM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/8e2ef0a2-af8a-4b4f-80cd-063ba6ad181d

conference: Build 2021 | sessionid: CON-PRT113 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Azure Data Explorer Dashboards
owner: Gabi Lehner | participants: Rony Liderman
session_location: Online | starttime: 2021-05-27 17:00:00 | duration: 60 | time_and_duration: Thursday, May 27 5:00 PM – 6:00 PM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/28beaa42-8b70-4e09-8a96-782fa6b8ae09

conference: Build 2021 | sessionid: CON-PRT142 | session_type: Roundtable | level: 200 | kusto_affinity: Focused
session_title: Roundtable: Ingestion data into Azure Data Explorer
owner: Tzvia Gitlin-Troyna | participants: Vladik Branevich
session_location: Online | starttime: 2021-05-27 17:00:00 | duration: 60 | time_and_duration: Thursday, May 27 5:00 PM – 6:00 PM GMT
URL: https://sessions.mybuild.microsoft.com/sessions/details/e4df3b64-76c4-4c49-8396-48e0b6303c04


Postgres with columnar compression in Hyperscale (Citus) on Azure


This article is contributed. See the original author and article here.


Citus Columnar, one of the big new features in Citus 10, is now available in preview in Hyperscale (Citus) on our Azure Database for PostgreSQL managed service! You can check out the Citus 10 superpowers post to learn more about the rest of the Citus 10 goodness.


 


Citus is an open source extension to Postgres (not a fork) that enables horizontal scale, but offers other great features, too.


This post will walk you through how to create and use columnar tables in Hyperscale (Citus). You can also take a look at the columnar docs. Citus Columnar can be used with or without the scale-out features of Citus.


 


First some background: Postgres typically stores data using the heap access method, which is row-based storage. Row-based tables are good for transactional workloads, but can cause excessive IO for some analytic queries.


 


Columnar storage is a new way to store data in a Postgres table. Columnar groups data together by column instead of by row; and compresses the data, too. Arranging data by column tends to compress well, and it also means that queries can skip over columns they don’t need. Columnar dramatically reduces the IO needed to answer a typical analytic query—often by 10X!


 


Let’s check it out!


 




 


Quick Start for Columnar in Hyperscale (Citus)


 


The quickest way to get up and running with Citus Columnar is in the cloud, by selecting Hyperscale (Citus) when provisioning an Azure Database for PostgreSQL server. You can follow the quickstart guide in the Azure docs.


 


Make sure to select PostgreSQL server version 13 (if not selectable, check the box “Enable Preview Features” in East US):


 


Figure 1: Screenshot from the Azure Portal, showing the Preview features checkbox for Hyperscale (Citus), plus where to select the PostgreSQL version.


 


Next, configure the server group:


 


Figure 2: Screenshot from the Azure Portal, showing where to click the “Configure server group”.


 


For simplicity, choose the Basic Tier:


 


Figure 3: Screenshot from the Compute + storage screen, showing the Basic tier and Standard tier radio buttons. To shard Postgres on a single node, choose Basic tier.


 


 


Click Save, and fill out the rest of the required fields. Then move to the “Networking” tab and configure it. For simplicity, click “Add current client IP address”:


 


Figure 4: For firewall rules, add current client IP address.


 


Configure anything else you’d like, then click “Review and Create”, and then “Create”.


 


After the deployment finishes, click “Go to resource”, and copy the coordinator name (hostname).


 


I’ll be using the psql client for these examples, to make use of the backslash commands (e.g. \d+, \timing on). If using another client, you can omit the commands beginning with a backslash (\), which are just informational and not necessary for functionality.


 

psql -h $COORDINATOR_NAME citus citus

 

--
-- Like all Postgres extensions, citus needs to be enabled
-- for this database.
--
CREATE EXTENSION IF NOT EXISTS citus;

--
-- Make an ordinary table, which is row-based storage, and a
-- columnar table.
--
CREATE TABLE simple_row(i INT8);
CREATE TABLE simple_columnar(i INT8) USING columnar;

--
-- Columnar tables act like row tables
--
INSERT INTO simple_row SELECT generate_series(1,100000);
INSERT INTO simple_columnar SELECT generate_series(1,100000);
SELECT AVG(i) FROM simple_row;
SELECT AVG(i) FROM simple_columnar;

 


Notice the “Access Method” when describing the table in psql:


 

\d+ simple_row
                                Table "public.simple_row"
 Column |  Type  | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------+-----------+----------+---------+---------+--------------+-------------
 i      | bigint |           |          |         | plain   |              |
Access method: heap

\d+ simple_columnar
                             Table "public.simple_columnar"
 Column |  Type  | Collation | Nullable | Default | Storage | Stats target | Description
--------+--------+-----------+----------+---------+---------+--------------+-------------
 i      | bigint |           |          |         | plain   |              |
Access method: columnar

 


The default Access Method is heap, which means it’s a plain Postgres table, which is row-based storage. The columnar table has Access Method columnar.


 


Citus Columnar Video Demo


 


We made this video demo of Citus Columnar—using Citus open source—to help you to see:



  1. How the query on the row table was so slow that I needed to skip ahead, while the query on the columnar table finished quickly.

  2. Better visualizations of partitioning, that make it easy to see how columnar and partitioning work together.

  3. The cool turtle in the background!


 


Figure 5: Two-part video demo of using Columnar with Citus open source.


Watch how Citus 10 brings columnar to Postgres!


 


 


What are the Benefits of Columnar?


 



  • Compression reduces storage requirements

  • Compression reduces the IO needed to scan the table

  • Projection Pushdown means that queries can skip over the columns that they don’t need, further reducing IO

  • Chunk Group Filtering allows queries to skip over Chunk Groups of data if the metadata indicates that none of the data in the chunk group will match the predicate. In other words, for certain kinds of queries and data sets, it can skip past a lot of the data quickly, without even decompressing it!


All of these together mean faster queries and lower costs!


 


Let’s See the Performance of Columnar in Hyperscale (Citus)


 


Here’s a microbenchmark to show off what columnar can do. This is a “columnar friendly” use case—a wide table, and a query that only reads a few of the columns.


 


This benchmark illustrates two benefits of columnar for PostgreSQL:



  • Reduced IO due to compression

  • Reduced IO because it skips over the columns not needed to answer the query


 


Configuration


 



  • Azure Database for PostgreSQL Hyperscale (Citus)

  • Citus v10.0.3

  • PostgreSQL version 13

  • 2 vCores

  • 128 GiB storage


 


Schema

CREATE TABLE perf_row(
  c00 int8, c01 int8, c02 int8, c03 int8, c04 int8, c05 int8, c06 int8, c07 int8, c08 int8, c09 int8,
  c10 int8, c11 int8, c12 int8, c13 int8, c14 int8, c15 int8, c16 int8, c17 int8, c18 int8, c19 int8,
  c20 int8, c21 int8, c22 int8, c23 int8, c24 int8, c25 int8, c26 int8, c27 int8, c28 int8, c29 int8,
  c30 int8, c31 int8, c32 int8, c33 int8, c34 int8, c35 int8, c36 int8, c37 int8, c38 int8, c39 int8,
  c40 int8, c41 int8, c42 int8, c43 int8, c44 int8, c45 int8, c46 int8, c47 int8, c48 int8, c49 int8,
  c50 int8, c51 int8, c52 int8, c53 int8, c54 int8, c55 int8, c56 int8, c57 int8, c58 int8, c59 int8,
  c60 int8, c61 int8, c62 int8, c63 int8, c64 int8, c65 int8, c66 int8, c67 int8, c68 int8, c69 int8,
  c70 int8, c71 int8, c72 int8, c73 int8, c74 int8, c75 int8, c76 int8, c77 int8, c78 int8, c79 int8,
  c80 int8, c81 int8, c82 int8, c83 int8, c84 int8, c85 int8, c86 int8, c87 int8, c88 int8, c89 int8,
  c90 int8, c91 int8, c92 int8, c93 int8, c94 int8, c95 int8, c96 int8, c97 int8, c98 int8, c99 int8
);

CREATE TABLE perf_columnar(LIKE perf_row) USING COLUMNAR;

 


Data Load

\timing on

INSERT INTO perf_row
  SELECT
    g % 00500, g % 01000, g % 01500, g % 02000, g % 02500, g % 03000, g % 03500, g % 04000, g % 04500, g % 05000,
    g % 05500, g % 06000, g % 06500, g % 07000, g % 07500, g % 08000, g % 08500, g % 09000, g % 09500, g % 10000,
    g % 10500, g % 11000, g % 11500, g % 12000, g % 12500, g % 13000, g % 13500, g % 14000, g % 14500, g % 15000,
    g % 15500, g % 16000, g % 16500, g % 17000, g % 17500, g % 18000, g % 18500, g % 19000, g % 19500, g % 20000,
    g % 20500, g % 21000, g % 21500, g % 22000, g % 22500, g % 23000, g % 23500, g % 24000, g % 24500, g % 25000,
    g % 25500, g % 26000, g % 26500, g % 27000, g % 27500, g % 28000, g % 28500, g % 29000, g % 29500, g % 30000,
    g % 30500, g % 31000, g % 31500, g % 32000, g % 32500, g % 33000, g % 33500, g % 34000, g % 34500, g % 35000,
    g % 35500, g % 36000, g % 36500, g % 37000, g % 37500, g % 38000, g % 38500, g % 39000, g % 39500, g % 40000,
    g % 40500, g % 41000, g % 41500, g % 42000, g % 42500, g % 43000, g % 43500, g % 44000, g % 44500, g % 45000,
    g % 45500, g % 46000, g % 46500, g % 47000, g % 47500, g % 48000, g % 48500, g % 49000, g % 49500, g % 50000
  FROM generate_series(1,50000000) g;

INSERT INTO perf_columnar
  SELECT
    g % 00500, g % 01000, g % 01500, g % 02000, g % 02500, g % 03000, g % 03500, g % 04000, g % 04500, g % 05000,
    g % 05500, g % 06000, g % 06500, g % 07000, g % 07500, g % 08000, g % 08500, g % 09000, g % 09500, g % 10000,
    g % 10500, g % 11000, g % 11500, g % 12000, g % 12500, g % 13000, g % 13500, g % 14000, g % 14500, g % 15000,
    g % 15500, g % 16000, g % 16500, g % 17000, g % 17500, g % 18000, g % 18500, g % 19000, g % 19500, g % 20000,
    g % 20500, g % 21000, g % 21500, g % 22000, g % 22500, g % 23000, g % 23500, g % 24000, g % 24500, g % 25000,
    g % 25500, g % 26000, g % 26500, g % 27000, g % 27500, g % 28000, g % 28500, g % 29000, g % 29500, g % 30000,
    g % 30500, g % 31000, g % 31500, g % 32000, g % 32500, g % 33000, g % 33500, g % 34000, g % 34500, g % 35000,
    g % 35500, g % 36000, g % 36500, g % 37000, g % 37500, g % 38000, g % 38500, g % 39000, g % 39500, g % 40000,
    g % 40500, g % 41000, g % 41500, g % 42000, g % 42500, g % 43000, g % 43500, g % 44000, g % 44500, g % 45000,
    g % 45500, g % 46000, g % 46500, g % 47000, g % 47500, g % 48000, g % 48500, g % 49000, g % 49500, g % 50000
  FROM generate_series(1,50000000) g;

VACUUM (FREEZE, ANALYZE) perf_row;
VACUUM (FREEZE, ANALYZE) perf_columnar;

-- checkpoint if superuser; otherwise wait for system to settle
CHECKPOINT; CHECKPOINT;

 


The row table loaded in 2274s, while the columnar table loaded in 1111s for a speedup of 2X. Load speed is not always better with columnar, but columnar does have the advantage when the system is IO-bound.


 


Compression Ratio


 


For this data, you can see a compression ratio of about 7X when using Columnar.


 

SELECT pg_total_relation_size('perf_row')::numeric/
       pg_total_relation_size('perf_columnar') AS compression_ratio;
 compression_ratio  
--------------------
 6.7394916300510456
(1 row)

 

(Note: the compression ratio varies depending on the version of the compression library. Earlier versions of zstd showed a better compression ratio of better than 8X. I have filed an issue with the zstd project in case it’s a bug that can be fixed.)


 


We can also get some information from VACUUM VERBOSE:


 

VACUUM VERBOSE perf_columnar;
INFO:  statistics for "perf_columnar":
storage id: 10000000003
total file size: 6754770944, total data size: 6733594337
compression rate: 6.03x
total row count: 50000000, stripe count: 334, average rows per stripe: 149700
chunk count: 500000, containing data for dropped columns: 0, zstd compressed: 500000

VACUUM

 


Notice that there are 334 stripes. Stripes are the unit of a data load/write. By default, each stripe can hold up to 150,000 tuples.


By default, data is compressed with zstd compression. The compression rate calculated by VACUUM VERBOSE is slightly different than what we saw above, because it considers only the average compression ratio of the data, and doesn’t account for metadata (like visibility information).


 


Queries


 


Now let’s run a couple SQL queries. We will use EXPLAIN ANALYZE so that we can see the details in addition to the overall runtime. Timings are taken from the median of three consecutive runs.


 


Notice that only 3 out of 100 columns are necessary to answer this query.


 

--
-- Parallel query actually slows down the query on the row table in this example, so disable it.
-- Columnar doesn't support parallel query.
--
SET max_parallel_workers_per_gather = 0;

EXPLAIN (ANALYZE, BUFFERS) SELECT c00, SUM(c29), AVG(c71) FROM perf_row GROUP BY c00;
                                                            QUERY PLAN                                                                                                       
----------------------------------------------------------------------------------------------------------------------------------                                           
 HashAggregate  (cost=6430556.07..6430563.57 rows=500 width=72) (actual time=449720.051..449720.570 rows=500 loops=1)                                                        
   Group Key: c00                                                                     
   Batches: 1  Memory Usage: 169kB                                                                                                                                           
   Buffers: shared hit=32 read=5555524
   I/O Timings: read=411407.314                                                                                                                                              
   ->  Seq Scan on perf_row  (cost=0.00..6055556.04 rows=50000004 width=24) (actual time=0.136..429496.825 rows=50000000 loops=1)                                            
         Buffers: shared hit=32 read=5555524                                                                                                                                 
         I/O Timings: read=411407.314                                                 
 Planning:                             
   Buffers: shared hit=4      
 Planning Time: 0.156 ms                                                                                                                                                     
 JIT:                                                                                 
   Functions: 7                                                                                                                                                              
   Options: Inlining true, Optimization true, Expressions true, Deforming true                                                                                               
   Timing: Generation 1.852 ms, Inlining 21.516 ms, Optimization 138.386 ms, Emission 67.720 ms, Total 229.474 ms                                                            
 Execution Time: 449722.605 ms                                                                                                                                               
(16 rows)                                                                                                                                                                    

EXPLAIN (ANALYZE, BUFFERS) SELECT c00, SUM(c29), AVG(c71) FROM perf_columnar GROUP BY c00;
                                                                      QUERY PLAN                                                                       
------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=399659.16..399662.16 rows=200 width=72) (actual time=20688.895..20689.190 rows=500 loops=1)
   Group Key: c00
   Batches: 1  Memory Usage: 185kB
   Buffers: shared hit=109765
   ->  Custom Scan (ColumnarScan) on perf_columnar  (cost=0.00..24659.16 rows=50000000 width=24) (actual time=6.285..10261.578 rows=50000000 loops=1)
         Columnar Chunk Groups Removed by Filter: 0
         Buffers: shared hit=109765
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.682 ms
 JIT:
   Functions: 6
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 1.349 ms, Inlining 0.000 ms, Optimization 0.234 ms, Emission 4.110 ms, Total 5.692 ms
 Execution Time: 20690.686 ms
(15 rows)

 


We see a dramatic difference in the overall buffers read, which results in a dramatic difference in the runtime:

Storage   Runtime  Columnar Speedup
Row       450s
Columnar  21s      21X

 


What are the Limitations?


 


These limitations are not set in stone, and we look forward to working on them in the future:



  • No UPDATE or DELETE support

  • No index support

  • No logical replication or logical decoding support

  • See more limitations in the columnar README


This means that, as of Citus 10 in Hyperscale (Citus), columnar should be used for append-only tables that are used for analytic queries. Even if UPDATE/DELETE are supported in the future, UPDATE & DELETE will not be as efficient as they are on row-based storage, so columnar is not a good fit for many transactional workloads. However, you can pick and choose columnar where it works best, and use row tables where they work best, to get the benefits of both.


 


Hybrid Columnar and Row tables with Range Partitioning


 


A useful way to take advantage of Citus Columnar is when you combine it with native range partitioning. Using columnar with partitioning helps to overcome the limitation on updates and deletes, by using a mix of row and columnar partitions within the same partitioned table.


Normally, range partitioning is used for time-based partitioning. Often, you have one or two recent “active” partitions that are still being updated, and then many older partitions that are rarely updated but still queried. In this case the one or two active partitions can be row-based storage to allow updates, and the older partitions can be converted to columnar storage to benefit from compression and scan speed.


 

CREATE TABLE events(ts timestamptz, i int, n numeric, s text)
  PARTITION BY RANGE (ts);

CREATE TABLE events_2021_jan PARTITION OF events
  FOR VALUES FROM ('2021-01-01') TO ('2021-02-01');

CREATE TABLE events_2021_feb PARTITION OF events
  FOR VALUES FROM ('2021-02-01') TO ('2021-03-01');

INSERT INTO events SELECT
    '2021-01-01'::timestamptz + '0.45 seconds'::interval * g,
    g,
    g*pi(),
    'number: ' || g::text
    FROM generate_series(1,10000000) g;

VACUUM (FREEZE, ANALYZE) events_2021_feb;

 

 


Later, when you’re ready to “columnarize” the older January partition, you can use this Citus-provided function to convert the access method from row to columnar storage.


 

SELECT alter_table_set_access_method('events_2021_jan', 'columnar');
VACUUM (FREEZE, ANALYZE) events_2021_jan;

-- checkpoint if superuser; otherwise wait for system to settle
CHECKPOINT; CHECKPOINT;

 


Now you can see that the January partition is columnar, and the February partition is row storage (heap).


 

\d+ events_2021_jan
                                       Table "public.events_2021_jan"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 ts     | timestamp with time zone |           |          |         | plain    |              |
 i      | integer                  |           |          |         | plain    |              |
 n      | numeric                  |           |          |         | main     |              |
 s      | text                     |           |          |         | extended |              |
Partition of: events FOR VALUES FROM ('2021-01-01 00:00:00+00') TO ('2021-02-01 00:00:00+00')
Partition constraint: ((ts IS NOT NULL) AND (ts >= '2021-01-01 00:00:00+00'::timestamp with time zone) AND (ts < '2021-02-01 00:00:00+00'::timestamp with time zone))
Access method: columnar

\d+ events_2021_feb
                                      Table "public.events_2021_feb"
 Column |           Type           | Collation | Nullable | Default | Storage  | Stats target | Description
--------+--------------------------+-----------+----------+---------+----------+--------------+-------------
 ts     | timestamp with time zone |           |          |         | plain    |              |
 i      | integer                  |           |          |         | plain    |              |
 n      | numeric                  |           |          |         | main     |              |
 s      | text                     |           |          |         | extended |              |
Partition of: events FOR VALUES FROM ('2021-02-01 00:00:00+00') TO ('2021-03-01 00:00:00+00')
Partition constraint: ((ts IS NOT NULL) AND (ts >= '2021-02-01 00:00:00+00'::timestamp with time zone) AND (ts < '2021-03-01 00:00:00+00'::timestamp with time zone))
Access method: heap

 


And they each have about half the rows. But even though the January partition has more rows, it’s much smaller due to columnar compression:


 

SELECT COUNT(*) FROM events; -- parent table scans both partitions
  count
----------
 10000000
(1 row)

SELECT COUNT(*) FROM events_2021_jan;
  count
---------
 5951999
(1 row)

SELECT COUNT(*) FROM events_2021_feb;
  count
---------
 4048001
(1 row)

SELECT pg_size_pretty(pg_relation_size('events_2021_jan'));
 pg_size_pretty
----------------
 69 MB
(1 row)

SELECT pg_size_pretty(pg_relation_size('events_2021_feb'));
 pg_size_pretty
----------------
 264 MB
(1 row)

 


What is Chunk Group Filtering?


 


Let’s run a query for a particular hour within the January partition of the events table.


 

EXPLAIN (ANALYZE,BUFFERS)
SELECT SUM(n)
FROM events_2021_jan
  WHERE ts >= '2021-01-11 01:00'::timestamptz AND
        ts < '2021-01-11 02:00'::timestamptz;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4438.09..4438.10 rows=1 width=32) (actual time=8.356..8.357 rows=1 loops=1)
   Buffers: shared hit=2922 read=2
   ->  Custom Scan (ColumnarScan) on events_2021_jan  (cost=0.00..4418.55 rows=7815 width=11) (actual time=2.998..7.703 rows=8000 loops=1)
         Filter: ((ts >= '2021-01-11 01:00:00+00'::timestamp with time zone) AND (ts < '2021-01-11 02:00:00+00'::timestamp with time zone))
         Rows Removed by Filter: 12000
         Columnar Chunk Groups Removed by Filter: 594
         Buffers: shared hit=2922 read=2
 Planning:
   Buffers: shared hit=27 dirtied=2
 Planning Time: 0.233 ms
 Execution Time: 8.380 ms
(11 rows)

 


Notice Columnar Chunk Groups Removed by Filter: 594.


 


First, we need some terminology to understand this:


 



  • Stripe: all loads into a columnar table are broken into stripes of 150000 rows (by default). The larger a stripe, the more sequential access when reading a given column.

  • Chunk Group: Stripes are broken down further into Chunk Groups of 10000 rows (by default).

  • Chunk: Each Chunk Group consists of one Chunk for each column. A Chunk is the unit of compression, and the min/max is tracked for each chunk to enable Chunk Group Filtering.

  • Chunk Group Filtering: When a query’s WHERE clause can’t possibly match any of the tuples in a Chunk, and we know that by the min/max values of the chunk, then Chunk Group Filtering will simply skip over the whole Chunk Group without decompressing any of the Chunks in the Chunk Group.


You can see above that 594 Chunk Groups were filtered out, which means that 5931999 rows were filtered out without needing to fetch or decompress the data. Only 2 Chunk Groups (20000 rows) needed to be actually fetched and decompressed, which is why the query took only milliseconds.


 


How is Citus Columnar connected to Citus?


 


Citus Columnar is a new feature that we’ve introduced in Citus 10. Now that Citus 10 is available in preview in Hyperscale (Citus), as long as you’ve turned on the preview features, you can now create Postgres tables with Hyperscale (Citus) with the new USING columnar syntax, and you’re ready to go (of course, read the docs, too!).


 


Citus is known for its ability to scale Postgres. Importantly, you can use Columnar with or without the Citus scale-out features. Columnar is a great complement to typical Citus use cases, but you can pick and choose whether to use Citus Columnar on a single node, or as part of a distributed Citus cluster.


 


You can mix and match (a sketch follows this list):



  • columnar and row tables

  • columnar and row partitions of the same table

  • local columnar tables and distributed columnar tables; you can mix local and distributed tables on a single node with the Basic tier, as well as on a distributed Hyperscale (Citus) cluster
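Here is a minimal sketch of the second point, a range-partitioned table where older, append-only data is stored columnar and the current partition stays row-based (all names are hypothetical):

-- Parent table partitioned by time.
CREATE TABLE measurements (ts timestamptz NOT NULL, value int)
    PARTITION BY RANGE (ts);

-- Older, append-only data: stored columnar for compression and fast scans.
CREATE TABLE measurements_2021_jan PARTITION OF measurements
    FOR VALUES FROM ('2021-01-01') TO ('2021-02-01')
    USING columnar;

-- Current month: a regular row (heap) partition for hot data
-- that may still be updated.
CREATE TABLE measurements_2021_feb PARTITION OF measurements
    FOR VALUES FROM ('2021-02-01') TO ('2021-03-01')
    USING heap;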


 


What about cstore_fdw?


 


If you’ve heard of the cstore_fdw extension that my teammates at Citus created a number of years ago, you can think of Citus Columnar as the next generation of cstore_fdw. If you’re using cstore_fdw today, consider migrating to Citus Columnar.


 


cstore_fdw delivered the core performance benefits of columnar storage, but Citus Columnar goes much further in terms of integration and feature compatibility.


 


Citus Columnar works with:



  • ROLLBACK

  • Write-ahead logging (WAL)

  • Physical replication

  • pg_upgrade


and also provides a more seamless user experience, similar to ordinary Postgres row tables.


 


Citus Columnar achieves this more seamless experience by using the Table Access Method API, introduced in PostgreSQL 12. This newer extension API allows tighter integration with Postgres while Citus Columnar remains a pure extension.
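Because columnar is just another table access method as far as Postgres is concerned, it shows up in the standard catalogs. For example, this query (plain Postgres, nothing Citus-specific) lists the tables in the current database that use the columnar access method:

-- List all ordinary tables whose access method is 'columnar'.
SELECT c.relname AS table_name, am.amname AS access_method
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE am.amname = 'columnar'
  AND c.relkind = 'r';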


 


Try out Citus Columnar for your analytics workloads


 


The dramatic compression and scan-speed improvements offered by Columnar in Hyperscale (Citus) let you do more with less. Take advantage by identifying large, append-only tables and evaluating whether columnar will improve performance or reduce costs.
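If you do identify such a table, Citus 10 also includes a helper for converting it in place. As I recall the UDF is called alter_table_set_access_method, but treat the exact name and signature as an assumption and confirm it in the Citus docs for your version:

-- Hypothetical example: convert an existing row table named old_events
-- to columnar in place (assumes the Citus 10 UDF
-- alter_table_set_access_method(table, method) is available).
SELECT alter_table_set_access_method('old_events', 'columnar');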


Columnar storage may also let you keep data in Postgres for longer, rather than forcing you to archive older data somewhere it can’t be queried efficiently.


 


If you want to dive even deeper into learning about Citus Columnar, I recommend:



 


Try out Citus 10—open source or in the cloud


 



Finally, special thanks to Hadi Moshayedi, who was the original author of cstore_fdw (the mature basis of Citus columnar), and is the co-author of Citus 10 columnar!



Engaging with students via playing video games

Engaging with students via playing video games

This article is contributed. See the original author and article here.

Dr Chris Headleand meeting a skeleton pirate


 









This is a guest post by Dr Chris Headleand, Associate Professor and Director of Teaching and Learning, School of Computer Science, University of Lincoln. Follow Chris on Twitter or connect with him on LinkedIn.

 


For the past two and a half years my team and I have been live-streaming video game sessions with our students. We use this as an extracurricular pastoral engagement activity, giving students an opportunity to interact with staff in an informal setting.


 


Video game live-streams are broadcasts in which players go online and stream their gameplay. They have become one of the world’s most popular forms of online entertainment, accounting for a significant share of global internet bandwidth. They are so popular that Twitch.tv (one of the leading live-streaming services) was bought by Amazon for around US$970M in 2014.


 


Most streaming services are community-driven, with anyone able to be a streamer or a viewer. Most gamers already have the equipment they need to stream, and most streamers are amateurs simply sharing their playing experience with the world. Video game broadcasting has essentially democratised reality television.


 


The video games live-streaming format has a number of advantages for student engagement initiatives. Firstly, a large number of students are already using these services for entertainment. By running initiatives on a live-streaming service we take activities to a platform the students are likely already visiting, rather than directing them to a service they wouldn’t otherwise use. Secondly, video games are inherently entertaining, so the format keeps students engaged even during breaks in interaction. Furthermore, watching people engage in play is humanising, and the process of watching someone play a game is inherently disarming. We have found that live-streaming has helped make staff more approachable. It’s a great way of taking education to the students!


 


Games are inherently engaging, which is why millions of people watch sports on television every week. The game can help keep people engaged between questions: if someone doesn’t have anything to ask, they may still stay to watch the game. This gives them a chance to hear other people’s questions, and time to formulate their own. Importantly, the game also keeps the staff team engaged. We always run our stream for a couple of hours, and students come and go as they please (and however suits their schedule). The game gives the broadcasting team something to keep them motivated between peaks in activity.


 


Looking down over the pirate town and your ship from atop a hill


 


Getting Started with Live Streaming


 


Interested in getting started with live streaming? You need some basic equipment: a reasonably powerful computer, a video camera, a microphone, and some broadcasting software. The computer matters because broadcasting can be a computationally intensive process.


 


Regarding broadcasting software, there are a number of options available. The most popular is arguably OBS (Open Broadcaster Software), and there are many useful guides on how to set it up. However, OBS does require a certain amount of technical engagement. There are now a few cloud and web-based options (such as StreamYard) which are much easier for non-technical users.


 


Once you have set up your software and hardware ready for streaming, you need to pick the game you want to play, and the team you want involved. 


 


Picking The Right Game!


 


Not every game is suitable for live-streaming as a student engagement exercise. These are the points that we personally consider when picking a game. 


 



  • Pacing: Remember that you are doing this to engage with students. The mechanics of the game should not require so much attention that it distracts the players from being able to interact with the viewers.


  • Competitive vs Cooperative: Competitive games can be really entertaining to stream, but they often require a level of focus which can limit the amount of interaction with the audience. Cooperative games are typically slower paced, which can be good for audience interaction. However, there are fewer cooperative games on offer.
     

  • Number of Players: Games are typically limited in the number of players they support. You need to consider how many people you want involved in the stream at any one time. Different perspectives can facilitate conversation, however, too many people can result in an overcrowded discussion. We have found that between four and six players tends to work well.


  • Content: In Higher Education we work with a mature audience, which allows a broad range of games to be used in this setting. However, you may consider some content too distracting, or worry that it is potentially offensive. A good solution to this problem is to consult with your students and ask them what kinds of games they would like to see you stream.


 


Sea of Thieves


 


We experimented with a number of different games before settling on Sea of Thieves by Rare Ltd, a UK-based games studio owned by Microsoft. Sea of Thieves is a multiplayer cooperative pirate simulator where players work together to complete quests; it is available for PC and Xbox.


 


Pirate Captain Headleand celebrating a successful mission


 


The game features lots of downtime built into its gameplay loop; each quest requires you to sail between islands, which can take several minutes and provides plenty of time for interacting with the audience. The content is humorous and lighthearted, and doesn’t include any mature content. Visually, the game is rendered in a cartoon-like graphic style which is easy to follow from a viewer’s perspective, even on a mobile device. Sea of Thieves currently allows up to 4 players to join a pirate crew, which meant we could invite a range of different staff and students to join the stream and be involved in these engagement opportunities. We also found that the cooperative nature of the game was a better fit for the ethos we wanted to promote than competitive games. Another big advantage for us was that the game is open-ended, with regular updates. This allowed us to better justify the expense: once purchased, we could use the same game for a large number of streams without running out of content. Furthermore, the game includes many opportunities for spontaneous, unscripted gameplay with challenges (such as megalodon sharks), which adds enjoyable unpredictability to the stream.


 


Sailing the Sea of Thieves in a three-masted ship


 


Stream Teams


 


There is always one person on the camera but a good stream requires a team. You can stream by yourself, but it can help the conversation to invite colleagues or other students to join you in playing the game and talking on the stream. You may also want to consider moderators to monitor the chat and facilitate conversation. 


 


Running Your Stream


 


You have a team, a game, and you are ready to broadcast… it’s time to run your first stream! In this section we will cover a little advice to help your first stream go smoothly. Firstly, involve your students in the process from day one. Ask them if this is a format they would be interested in engaging with, and get them involved in the design of the session.


 


Secondly, remember why you are doing this! It is easy to forget the purpose of your stream when you are midway through playing a game. Remember the intention is to give students another way to engage in a dialogue with you, so make sure you keep an eye on the chat interface and respond to questions as quickly as possible. However, remember that you won’t be the only person answering questions and engaging with the community. You will often find that when a student asks a question, another in the audience will reply before you do. Try to encourage this, and praise people for contributing to the community. 


 


Conversely, not all students will want to interact directly with the broadcasters. In the streaming community, non-interactive members of the audience are typically referred to as ‘lurkers’. We have found that this passive engagement is still positive and community-building! These viewers hear the same discussions as the active participants, so they still benefit from the discourse.


 


Relax and enjoy the game you are playing! From an engagement perspective there is something very relatable about watching people engage in play. If you enjoy yourself, you will run a much more engaging stream.


 


Insights From Our Streams


 


Our streams get a great level of engagement, and the students love them. However, our most popular streams have always been when we have had the most diverse stream team, featuring staff, students, alumni, friends from industry, and members of our professional service team. Students really value being able to get advice and insight from just outside of their normal community. 


 


We have also noticed that the streams are a great way to improve the approachability of staff. There are few things more humanising than watching someone in a position of power engaging in play. Discussions will often continue long after the stream concludes.


 


Don’t be discouraged if your actual viewership numbers are lower than you would expect. Students will typically disseminate the information they hear in the stream. We also found that many students would get together to watch the stream as a group (adding a second level of social interaction). When we have done follow-up surveys, we typically find that actual viewership was two or three times larger than the viewership numbers suggested.


 


As a final point, remember that you never know who is tuning in for the broadcast. We have had students’ families watch in the past, and students have also invited friends from other institutions. Whenever you do a public broadcast you are representing yourself and your institution – it can be useful to keep this in mind.


 


Check Out One Of Our Streams


 


As an example of one of the streams, check out this highlights reel from our second semester B Twitch AMA mainly focused on working in the video games industry.





 

Microsoft 365 PnP Community Spotlight: Garry Trinder

Microsoft 365 PnP Community Spotlight: Garry Trinder

This article is contributed. See the original author and article here.



 


Which community project do you maintain? 


CLI for Microsoft 365 


 


How does it help people?  


CLI for Microsoft 365 is a cross-platform command-line interface that helps developers and administrators manage their Microsoft 365 tenants and SharePoint Framework projects, using any operating system and any command-line shell.


 


What have you been working on lately? 


I have recently been working on making the CLI more accessible to new users and new contributors to the project.  
 
We recently released our public Docker images. Docker enables us to bundle a pre-configured version of CLI for Microsoft 365, together with all of its required dependencies, into a publicly downloadable image. You can use that image to create an isolated environment on your local machine, called a container, where you can use the CLI for Microsoft 365 without cluttering your machine with dependencies, all by executing a single command. 
 
One of the difficulties of getting started with a new project is setting up the development environment: installing dependencies and hoping that you’ve followed all the instructions correctly. We know that this can be time consuming and can also put off potential contributors, so we’ve looked at ways to make contributor onboarding as simple as possible. We now provide a definition for creating an instant development environment that contains all the dependencies needed to contribute to CLI for Microsoft 365, whether you’re making code changes or updating the documentation. We also automate the initial configuration steps when the development environment is created: we run npm install to install the necessary packages from npm, npm run build to build the CLI source, and npm link to create a symbolic link so that when you execute m365 it uses the code in the src directory. You can use our development environment in the cloud using GitHub Codespaces or locally using Visual Studio Code Remote Development Containers.
 


 


What do you do at work? 


At work I am a Solutions Architect and Development Team Lead for CPS, a Microsoft Gold Partner. I work primarily with Microsoft 365 and Microsoft Azure; in recent years I have been mainly focused on integration and automation. 


 


Why are you a part of the M365 community? 


 


I believe that by sharing our own knowledge and experiences, we can improve the quality of the solutions created by the community. I have been privileged to be part of the community for many years and have directly benefited from others sharing their knowledge with me; it is this that drives me to contribute so that others can benefit in the way that I have. 


 


What was your first community contribution? 


I submitted a pull request to add a new command to CLI for Microsoft 365 which enables users to delete a Microsoft Teams channel. 


 


One tip for someone who’d like to start contributing 


Don’t be afraid to ask questions. The Microsoft 365 community is incredibly welcoming and everyone wants to help each other; there are no stupid questions. 

Join us in the Student Zone at Microsoft Build 2021

This article is contributed. See the original author and article here.

The Student Zone at Microsoft Build is back! From May 25 through May 27, get together with students, recent grads, and other tech-savvy tinkerers to learn new skills and find fun ways to pursue your passions through tech.


 


Prepare for launch with space-themed sessions that introduce Python, C#, Visual Studio Code, and GitHub. Bring your imagination to life through art, games, movies, sports, and social good with help from tools like JavaScript, machine learning, and Microsoft 365. Or get the insider scoop on tech careers from the awesome session hosts and guest speakers at Build. It’s all waiting for you in the Student Zone.


 


Collaborate in Visual Studio Code and GitHub


Get to know these popular tools so that you can code and collaborate like a pro.


 


Build apps with Python and C#


Learn how to make web apps or explore machine learning with two of the most popular languages.


 


Learn web development skills with JavaScript, HTML, and CSS


See how languages come together in an interactive web app and learn how to build your own.


 


Learn how to become a software developer


Hear from experienced developers and learn from the career journeys they’ve taken.


 


Ask experts about transitioning from school to your career


Learn from a data scientist, web dev, mixed reality dev, program manager, and cloud advocate.


 


Ask your burning questions about chatbots


Bots seem to be everywhere, but how do they work and what can they do? Find your answers here.


 


Build inclusive technology for a diverse world


Learn what it takes to build technologies and products that meet the needs of a diverse global audience.


 


Create visual storytelling magic with technology


Learn how Grady Cofer, a visual effects supervisor at Industrial Light & Magic, uses tech in the movies.


 


Build a Space Jam: A New Legacy gathering hub using JavaScript


See how JavaScript, Microsoft Teams, Microsoft Graph, and some cartoons come together with code.


 


Hear how a tech career can take you to space


Learn how Astronaut Stanley G. Love went from teaching programming to spending 306 hours in space.


 


Discover new ways to express your artistic side through AI


See how a creative senior cloud advocate uses Python and Azure to create amazing composite images.


 


Make your mark through a community project


Learn how to contribute to shared projects using MakeCode and JavaScript.


 


Get in your zone


About 30 speakers are ready to share their knowledge with you in the Student Zone. Get the full details for each session and RSVP to pack your schedule full of goodies. See all Student Zone sessions