This article is contributed. See the original author and article here.

1. Overview


 


When gathering SharePoint data through Microsoft Graph Data Connect in a large tenant, you might be pulling many thousands, millions or even billions of objects. If you want to get just a subset of the results, there is a mechanism to filter the results at the source.


 


A common example is to pull the data just for one site (filtering for the site id) to get started with a large tenant. This way your results will be smaller, cheaper, and easier to handle. You can also use this to get just specific types of sites, like OneDrive or Communication sites.


 


In this blog, we’ll investigate how you can apply a filter to a dataset in Microsoft Graph Data Connect for SharePoint.


 


2. What is filtering?


 


When you request data from the Microsoft Graph Data Connect for SharePoint, you get the full dataset. For instance, if you request the Sites dataset, you get all the sites in the tenant. This is described in the yellow paths below.


 


JoseBarreto_0-1711144124348.png


 


If you provide a filter with the request, you can get a partial dataset based on a filter expression. You could, for instance, request all sites with a template id of 21 (that means a OneDrive). This is shown by the green paths above.


 


In short, filtering delivers a partial dataset based on a filter expression specified in the request. Only objects that meet the criteria are delivered.


 


The main uses for filtering include:



  • Reducing the size of the transfers

  • Excluding sensitive data from the results


 


Filtering is available only in these specific SharePoint datasets:



  • SharePoint Sites

  • SharePoint Permissions

  • SharePoint Group

  • SharePoint Files (coming soon)

  • SharePoint File Actions (coming soon)


 


Note: Microsoft Graph Data Connect also offers an option to exclude specific columns from the results, which is another mechanism to protect sensitive data.


 


3. Datasets and Columns


 


This filtering feature applies only to the SharePoint datasets listed below. You can filter by Site Id in all SharePoint datasets. You can also filter by Template Id in the Sites, Files and File Actions   datasets:


 




































Dataset



Site Id column



Template Id column



Sites



Id



RootWeb.WebTemplateId



Permissions



SiteId



N/A



Group



SiteId



N/A



Files



SiteId



WebTemplateId



File Actions



SiteId



WebTemplateId



 


For WebTemplateId, refer to this blog post for details on the codes used:
https://barreto.home.blog/2023/04/17/sharepoint-on-mgdc-faq-is-onedrive-included/


If you’re interested in filtering for additional SharePoint dataset columns, please let us know in the comments.


 


4. Expressions and Operators


 


Filters are specified using an expression. The expression usually includes a column (as described in the previous section), an operator and a constant.


 


The main operators supported include:





























































Type



Operator



Description



Equality



eq



Equal



Equality



ne



Not Equal



Equality



in



is IN a list



Relational



lt



Less Than



Relational



gt



Greater Than



Relational



le



Less than or Equal



Relational



ge



Greater than or Equal



Logical



not



Not



Logical



and



And



Logical



or



Or



 


Here are a few sample expressions you could use:




































Dataset



Description



Filtering Expression



Sites



Include only a specific site



Id eq ‘a123’



Sites



Include all sites except a specific site



Id ne ‘a123’



Sites



Include only a specific list of sites



Id in (‘a123’, ‘b456’, ‘c789’)



Sites



Include everything but OneDrive sites



RootWeb.WebTemplateId ne 21



Permissions



Include only a specific site



SiteId eq ‘a123’



 


The expression language is patterned after the Microsoft Graph filter query parameter.  You can read more about it at https://learn.microsoft.com/en-us/graph/filter-query-parameter.


 


5. Including the Filter expression in the request


 


To add a filter to your request, you must specify a DataFilter property in the JSON representation of the request. Start by selecting the Copy Data task you are using and going to the “Source” tab of the Copy Task properties.


 


JoseBarreto_1-1711144124360.png


 


 


Click on the “{ }” button on the top right (indicated above with a red arrow), which brings up the JSON definition for the request.


 


JoseBarreto_2-1711144124369.png


 


 


In the JSON definition, find the “source” section under “typeProperties”, where you can find the “dateFilterColumn”, the “startTime” and the “endTime” properties. Add a new “DataFilter” property with the expression you want to apply. Save the changes to the JSON and make sure to publish the pipeline to apply the changes.


 


In the screenshot above, you see the definition indicated by the red arrow:
acitvity >> typeProperties >> source >> DataFilter >> RootWeb.WebTemplateId eq 21


 


Be very careful. If you specify the filtering expression in the wrong place in the JSON, the request may fail, or it could just run without filtering anything from the dataset. If you’re running this in a large production tenant, it is recommended to try this first in a small test or dev tenant.


 


6. More Request Metadata


 


As a result of this extra property in your request, the results will be filtered, and you will see fewer rows/objects in your output. Another way to see this is looking at the metadata file that is produced with any Microsoft Graph Data Connect run. Look for a folder called “metadata” in the same place you assigned to receive the data (your Azure account container and folder).


 


In the job metadata file associated with this request, you will see a few additional properties:



  • IsFilterApplied – Shows true if the request included a filter expression.

  • Filter – The expression passed by in the DataFilter property.

  • NumberOfRowsExtracted – Shows the number of rows after the filter was applied.


 


Here’s a sample of a job metadata file:


 

{
    "CopyActivityId": "00000000-0000-0000-0000-000000000000",
    "JobSubmissionTime": "2024-01-10T21:27:18Z",
    "JobCompletionTime": "2024-01-10T21:32:44Z",
    "RequestStartDate": "2024-01-06T00:00:00Z",
    "RequestEndDate": "2024-01-06T00:00:00Z",
    "ColumnsRequested":"ptenant, Id, Url, RootWeb, WebCount, StorageQuota, StorageUsed, StorageMetrics, GroupId, GeoLocation, IsInRecycleBin, IsTeamsConnectedSite, IsTeamsChannelSite, TeamsChannelType, IsHubSite, HubSiteId, BlockAccessFromUnmanagedDevices, BlockDownloadOfAllFilesOnUnmanagedDevices, BlockDownloadOfViewableFilesOnUnmanagedDevices, ShareByEmailEnabled, ShareByLinkEnabled, SensitivityLabelInfo, Classification, IBMode, IBSegments, Owner, SecondaryContact, ReadLocked, ReadOnly, CreatedTime, LastSecurityModifiedDate, Operation, SnapshotDate",
    "ExtractionMode": "Full",
    "IsFilterApplied": true,
    "Filter": "RootWeb.WebTemplateId eq 21",
    "NumberOfRowsExtracted": 4,
    "TableName": "BasicDataSet_v0.SharePointSites_v1",
    "ApplicationId": "00000000-0000-0000-0000-000000000000",
    "OfficeGeo": "NAM",
    "DataFactoryName": "mgdc-synapse"
}

 


 


7. Combining Filters with Deltas


 


It is possible to combine the filtering and the delta features of the SharePoint datasets in Microsoft Graph Data Connect.


 


When you specify a filtering expression and two different dates, the dataset will be first filtered and then the differences between the two dates will be calculated.


 


For more details about Delta datasets, see this blog post:
MGDC for SharePoint FAQ: How can I use Delta State Datasets? 


 


8. Filtering Errors


 


 If you pass an expression for filtering and there is a problem with it, Microsoft Graph Data Connect for SharePoint will fail the request and no data will be returned. An error message will be returned so you can understand what happened.


 


Here are some of the common error conditions:



  • Cannot filter and sample at the same time. If you specify both a DataFilter and the Top properties, you will get an error like this:


 

Operation on target Copy_Sites failed: 
ErrorCode=UserErrorOffice365DataLoaderError,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Office365 data loading failed to execute. 
office365LoadErrorType: PermanentError. 
Not Supported: Both DataFilter and AdditionalDataSetProperties with 'top' can not be specified at the same time,
Source=Microsoft.DataTransfer.ClientLibrary,'

 



  • You specified a column name that is not one the columns supported for that dataset. If you specify a non-existent or unsupported column, you will get an error like this:


 

Operation on target Copy_Sites failed: 
ErrorCode=UserErrorOffice365DataLoaderError,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Office365 data loading failed to execute. 
office365LoadErrorType: PermanentError. 
[1]: Invalid schema specified in filters. 
Only few columns are supported for filters. 
Supported Columns: ['Id', 'RootWeb.WebTemplateId']. 
Usage Example: Id eq '00000000-0000-0000-0000-000000000000' 
or RootWeb.WebTemplateId eq 21'
,Source=Microsoft.DataTransfer.ClientLibrary,'

 



  • You specified a bad expression. If you specify a malformed expression in your DataFilter expression, you will get an error.


 


9. Conclusion


 


I hope this blog post will help you get started with Filtering. For more information about Microsoft Graph Data Connect for SharePoint, please visit the collection of links I keep at Links about SharePoint on Microsoft Graph Data Connect.

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.