This article is contributed. See the original author and article here.

Pagination is important technique in SQL that enables you to return partial results of the query by specifying a range of rows that should be returned. It is commonly used when you need to get the partial results of the queries that return a large number of rows in the result in the pages with size, 10, 25, 50, or 100.

In this article you will see how to do pagination on large amount of data in Synapse SQL.

First let’s see some pagination methods in SQL language.

OFFSET/FETCH

T-SQL language has OFFSET/FETCH clause that enables you to skip some number of rows and take few of remaining (similar to TOP clause). An example of query with OFFSET/FETCH clause is shown in the following code:

SELECT *
       FROM dbo.Supplier AS data
ORDER BY data.S_SUPPKEY
OFFSET (@offset) ROWS
FETCH NEXT (@fetch) ROWS ONLY;

Unfortunately, this method is still not available in Synapse SQL.

TOP offset+fetch

To get the rows between @offset and @offset+@fetch, we can first select TOP @offset+@fetch records from query (for example TOP 110 records if the offset is 100 and fetch is 10), then get bottom @fetch records from that row set (for example rows between 100 and 110).

The query that can return rows between @offset and @offset+@fetch is shown in the following example:

SELECT *
FROM (
       SELECT TOP (@fetch) *
       FROM
              (SELECT TOP(@offset+@fetch) *
                      FROM dbo.Supplier
                      ORDER BY S_SUPPKEY ASC
              ) AS topFO
       ORDER BY S_SUPPKEY DESC
     ) AS bottomF
ORDER BY S_SUPPKEY ASC

Subquery topFO is a set of rows that contain top @offset+@fetch rows according to the desired sort criterion. Now we need to get @fetch bottom records by reverting the order of this rowset and getting TOP @fetch rows as bottomF subquery. Note that we don’t have BOTTOM operator in TSQL so we need to temporarily revert the sort order and get TOP rows in opposite direction. These are the rows that we need but now in the wrong sort order, so we need to change sort order again in the surrounding query.

It might be strange that you need to write a query that sorts results three times (I’m calling this Salto mortale technique), but we will not have big performance impact because second and third sort worn on the smaller amount of already sorted rows.

ROW_NUMBER window

ROW_NUMBER window function can return order number of the row in the rowset by specified sort criterion. We can use this info to return only the rows that have ROW_NUMBER in the range between @offset and @offset+@fetch:

SELECT TOP (@fetch) *
FROM
( SELECT ROW_NUMBER() OVER(ORDER BY S_SUPPKEY) AS RowNum, *
       FROM dbo.Supplier AS tr) AS data
WHERE @offset < RowNum
ORDER BY data.S_SUPPKEY

Inner sub-query will return rows with positions and outer query will filter them out.

Performance test

The important question what method we should choose to implement queries with pagination. I have created procedures that select and paginate rows from tables by specified criterion and return 10-row pages with offsets 30, 300, and 2000. I have executed these tests on dbo.Customer table from TPCH 1TB database with 150.000.000 rows. Using the following code:

declare @offset int = 10;
EXEC page_row_number 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_offset_fetch 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_salto_mortale 'dbo.Customer', 'C_CUSTKEY', @offset, 10

set @offset = 300;
EXEC page_salto_mortale 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_row_number 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_offset_fetch 'dbo.Customer', 'C_CUSTKEY', @offset, 10

set @offset = 5000;
EXEC page_offset_fetch 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_salto_mortale 'dbo.Customer', 'C_CUSTKEY', @offset, 10
EXEC page_row_number 'dbo.Customer', 'C_CUSTKEY', @offset, 10

Since OFFSET/FETCH is still not available in Synapse SQL, I have run the test on Azure SQL Managed Instance and the results are displayed in the following Query Store UI:

JovanPop_0-1601670941315.png

 

 

First three results are ROW_NUMBER methods with offsets 10, 300, and 5000. It seems that this method provides worst results. Salto mortale (or TOP offset+fetch with switching order) seems providing even slightly better results than native OFFSET/FETCH but with minor perf difference.

 I also double-checked the results using Query Performance Insights library and found average duration, execution count and spent CPU time for these queries:

JovanPop_1-1601670941340.png

 

The conclusions are:

  • The best performance we are getting with TOP offset+fetch query (a.k.a. Salto mortale query), but similar perf we can expect with OFFSET/FETCH.
  • Performance don’t depend on offset. Regardless of offset most of the time is probably spent on sorting entire data set to find what is the TOP offset+fetch rows in the desired sort order or position using ROW_NUMBER by desired sort order.

If you want to try this experiment, the code of the test procedures is below:

 

CREATE OR ALTER PROCEDURE page_row_number (@table varchar(200), @column sysname, @offset varchar(20), @fetch varchar(20))
AS BEGIN
       DECLARE @sql NVARCHAR(4000) = N'
       SELECT TOP (' + @fetch + ') * /* Row number ' + @table + ' offset: '+@offset+' */
       FROM
       ( SELECT ROW_NUMBER() OVER(ORDER BY ' + @column + ') AS RowNum, *
              FROM ' + @table + ' AS tr) AS data
       WHERE ' + @offset + ' < RowNum
       ORDER BY data.' + @column + '';

       EXEC sp_executesql @tsql = @sql;
END
GO
CREATE OR ALTER PROCEDURE page_offset_fetch (@table varchar(200), @column sysname, @offset varchar(20), @fetch varchar(20))
AS BEGIN
       DECLARE @sql NVARCHAR(4000) = N'
       SELECT * /* Offset fetch ' + @table + ' offset: '+@offset + ' */
              FROM ' + @table + ' AS data
       ORDER BY data.' + @column + '
       OFFSET ('+@offset+') ROWS
       FETCH NEXT ('+@fetch+') ROWS ONLY;';

       EXEC sp_executesql @tsql = @sql;
END
GO
CREATE OR ALTER PROCEDURE page_salto_mortale (@table varchar(200), @column sysname, @offset int, @fetch int)
AS BEGIN
       DECLARE @sql NVARCHAR(4000) = N'
       SELECT * /* Salto mortale ' + @table + ' offset: '+ CAST(@offset AS VARCHAR(20)) + '*/
       FROM (
              SELECT TOP ('+CAST(@fetch AS VARCHAR(20))+') *
              FROM
                      (SELECT TOP('+CAST( (@offset+@fetch) AS VARCHAR(20))+') *
                             FROM ' + @table + '
                             ORDER BY ' + @column + ' ASC) AS topFO
              ORDER BY ' + @column + ' DESC
       ) AS topF
       ORDER BY ' + @column + ' ASC';

       EXEC sp_executesql @tsql = @sql;
END

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.