This article is contributed. See the original author and article here.

Bucketing data by dynamic bucket size


 


When we create chart in Power BI or any other visualization, we have a limited number of datapoints we can plot.


Power BI samples the data according to the resolution and the size of the chart vs. the number of points.


This means that we may bring from ADX hundreds of thousands of points and Power BI will sample the data we bring and plot the sampled data.


What if we want to limit the number of points ourselves using a different method?


Power BI imposes a limit of 1 million rows returned by a single query so if you have more than a million values returned, you’ll have to reduce this number.


Even if the number of results is not over the limit, it will be more efficient to return a small number of rows instead of bringing a very high number of points and letting Power BI do the sampling.


 


Function that aggregates the data in a dynamic way


 


The scenario is a line chart using a date time column on the X axis and some value on the Y that we can average.


I created a KQL function that summarizes the data to limit the number of points using a dynamic bin size.


The function averages the values in each group of points.


Other ways of representing the values can use other aggregations like count, sum or more sophisticated methods


The function uses data from the help cluster.


You can create the function in any database you have access to.


.create-or-alter function conditional_Sampling(From:datetime,To:datetime,GranularityS:string=”5s”,MaxPoints:long=1000)


{


// Easier to use strings in Power BI and convert here


let Granularity=totimespan(GranularityS) ;


// Number of points is calculated based on the timespan and the known granularity in the specific table


let Points=tolong((To-From)/Granularity) ;


// The factor is the number of points represented as one point in the chart


let Factor=iff(Points <=MaxPoints,1,tolong(Points/MaxPoints)) ;


let Span=Granularity*Factor ;


// In this table we have an observation each 5s


cluster(“help”).database(“Samples”).SamplePowerRequirementHistorizedData


| where timestamp between(From..To) 


// Aggregate all points in Span to one average value


| summarize Value=avg(value) by bin(timestamp,Span)


}


 


 


 


 


 


 

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.