How to make AI training faster

How to make AI training faster

This article is contributed. See the original author and article here.

You’re welcome to follow my GitHub repo and give it a star:https://github.com/xinyuwei-david/david-share.git


xinyuwei_0-1724472736934.png


 


Factors Affecting AI Training Time


In deep learning training, the calculation of training time involves multiple factors, including the number of epochs, global batch size, micro batch size, and the number of computing devices, among others. Below is a basic formula illustrating the relationship between these parameters (note that this is just a basic illustrative formula, mainly explaining proportional and inversely proportional relationships; actual training may require considering more factors):


xinyuwei_10-1724466430536.png


Among them—



  • Epochs refer to the number of times the model processes the entire training dataset.

  • Total Number of Samples is the total number of samples in the training dataset.

  • Global Batch Size is the total number of data samples processed in each training iteration.

  • Time per Step is the time required for each training iteration, which depends on hardware performance, model complexity, optimization algorithms, and other factors.

  • Number of Devices is the number of computing devices used for training, such as the number of GPUs.


This formula provides a basic framework, but please note that the actual training time may be influenced by many other factors, including I/O speed, network latency (for distributed training), CPU-GPU communication speed, The Frequency of Hardware Failures During GPU Training, etc. Therefore, this formula can only serve as a rough estimate, and the actual training time may vary.


 


Detailed explanations


The training time of a deep learning model is determined by multiple factors, including but not limited to the following:



  • Number of Epochs: An epoch means that the model has processed the entire training dataset once. The more epochs, the more data the model needs to process, and thus the longer the training time.

  • Global Batch Size: The global batch size is the total number of data samples processed in each training iteration. The larger the global batch size, the more data is processed in each iteration, which may reduce the number of iterations required per epoch, potentially shortening the total training time. However, if the global batch size is too large, it may lead to memory overflow.

  • Micro Batch Size: The micro batch size refers to the number of data samples processed by each computing device in each training iteration. The larger the micro batch size, the more data each device processes per iteration, which may improve computational efficiency and thus shorten training time. However, if the micro batch size is too large, it may lead to memory overflow.

  • Hardware Performance: The performance of the computing devices used (such as CPUs, GPUs) will also affect training time. More powerful devices can perform computations faster, thereby shortening training time.

  • Model Complexity: The complexity of the model (such as the number of layers, number of parameters, etc.) will also affect training time. The more complex the model, the more computations are required, and thus the longer the training time.

  • Optimization Algorithm: The optimization algorithm used (such as SGD, Adam, etc.) and hyperparameter settings like learning rate will also affect training time.

  • Parallel Strategy: The use of parallel computing strategies such as data parallelism, model parallelism, etc., will also affect training time.



There are many factors that determine the length of training time, and they need to be considered comprehensively based on the specific training task and environment.

So, in this formula


xinyuwei_11-1724468441652.png










Time per step should be understood as primarily related to the computational power of the GPU.”Time per Step,” that is, the time required for each training step, is determined by multiple factors, including but not limited to the following:

  • Hardware Performance: The performance of the computing devices used (such as CPUs, GPUs) will directly affect the speed of each training iteration. More powerful devices can perform computations faster.

  • Model Complexity: The complexity of the model (such as the number of layers, number of parameters, etc.) will also affect the time for each training iteration. The more complex the model, the more computations are required.

  • Optimization Algorithm: The optimization algorithm used (such as SGD, Adam, etc.) will also affect the time for each training iteration. Some optimization algorithms may require more complex computational steps to update the model parameters.

  • Data type used in training:Different data types used in training have significant effect on time per step. Data types include FP32, FP/BF16, FP8, etc.


Training steps


So, what determines the total training steps?”Total Training Steps” is determined by the number of training epochs and the number of steps per epoch. Specifically, it equals the number of epochs multiplied by the number of steps per epoch. This can be expressed with the following formula:
 









xinyuwei_12-1724468479243.png

 


Global Batch Size


So, what determines the Global Batch Size?

 

xinyuwei_13-1724468504561.png

global_batch_size = 
gradient_accumulation_steps 
* nnodes (node mumbers) 
* nproc_per_node (GPU in one node) 
* per_device_train_batch_si(micro bs size) 









Assume a scenario:






batch_size = 10  # Batch size  
total_num = 1000  # Total number of training data  


When training one batch of data and updating the gradient once (gradient accumulation steps = 1):


 

train_steps = total_num / batch_size = 1000 / 10 = 100  

 


This means there are 100 steps per epoch, and the gradient update steps are also 100.
When the memory is insufficient to support a batch size of 10, we can use gradient accumulation to reduce the size of each micro-batch. Suppose we set the gradient accumulation steps to 2:


 

gradient_accumulation_steps = 2  
micro_batch_size = batch_size / gradient_accumulation_steps = 10 / 2 = 5  

 


This means that for each gradient update, we accumulate data from 2 micro-batches, with each micro-batch size being 5. This reduces memory pressure, but the data size per gradient update remains 10 data points.

Result:



  • The number of training steps per epoch (train_steps) remains 100 because the total amount of data and the number of steps per epoch have not changed.

  • The gradient update steps remain 100 because each gradient update accumulates data from 2 micro-batches.


It is important to note that when using gradient accumulation, each training step handles the accumulation of gradients from multiple micro-batches, which may slightly increase the computation time per step. Therefore, if memory is sufficient, it is better to increase the batch size to reduce the number of gradient accumulations. When memory is insufficient, gradient accumulation is an effective method.

The global batch size significantly impacts the training effectiveness of the model. Generally, a larger global batch size provides more accurate gradient estimates, aiding model convergence. However, it also increases memory pressure on each device. If memory resources are limited, using a large global batch size may not be feasible.

In such cases, gradient accumulation can be used. By training with a smaller micro-batch size on each device, we reduce memory pressure while maintaining a large global batch size for accurate gradient estimates. This allows training large models on limited hardware resources without sacrificing the global batch size.

In summary, gradient accumulation is a trade-off strategy to balance global batch size and training effectiveness when memory resources are limited.



So, if we look at these two formulas:


xinyuwei_14-1724469770773.png


 


xinyuwei_15-1724469780649.png


The larger the global batch size, the shorter the total training time, provided that there is no OOM (Out of Memory) and the GPU computational power is not fully utilized.


 


The Relationship Between Data Parallelism and Batch Size












 This section essentially analyzes this formula:


global_batch_size = 
gradient_accumulation_steps 
* nnodes (The number of nodes is, in effect, the PP) 
* nproc_per_node (The number of cards per node is, in effect, the TP) 
* per_device_train_batch_si(micro bs size) 


In distributed deep learning, data parallelism is a common strategy. The training data is split into multiple small batches and distributed to different computing nodes. Each node has a copy of the model and trains on its data subset, speeding up the training process.

At the end of each training step, the model weights of all nodes are synchronized using the AllReduce operation. AllReduce aggregates gradients from all nodes and broadcasts the result back, allowing each node to update its model parameters.

If training on a single device, AllReduce is not needed as all computations occur on the same device. However, in distributed training, especially with data parallelism, AllReduce or similar operations are necessary to synchronize model parameters across devices.

Many deep learning frameworks (e.g., PyTorch, TensorFlow) use NVIDIA’s NCCL for communication across multiple GPUs. Each GPU trains on its data subset and synchronizes model weights using NCCL’s AllReduce at the end of each step.

Although AllReduce is commonly used in data parallelism, other NCCL operations may be employed depending on the framework and strategy.

Data parallelism (DP) and micro batch size are interrelated. DP involves training on multiple devices, each processing a portion of the data. Micro batch size is the number of samples each device processes per iteration. With DP, the original batch size is split into micro batches across devices. Without DP or model parallelism (MP), micro batch size equals global batch size. With DP or MP, the global batch size is the sum of all micro batches.

DP can be applied on multiple devices within a single server or across multiple servers. Setting DP to 8 means training on 8 devices, either on the same server or distributed across servers.

Pipeline parallelism (PP) is a different strategy where different model parts run on different devices. Setting DP to 8 in PP means 8 devices process data in parallel at each pipeline stage.

In summary, DP and PP can be used simultaneously on devices within a single server or across multiple servers.












 

SMB security hardening in Windows Server 2025 & Windows 11

SMB security hardening in Windows Server 2025 & Windows 11

This article is contributed. See the original author and article here.

Heya folks, Ned here again. Last November, Microsoft launched the Secure Future Initiative (SFI) to prepare for the increasing scale and high stakes of cyberattacks. SFI brings together every part of Microsoft to advance cybersecurity protection across our company and products.


Windows has focused on security options with each major release, and Windows 11 24H2 and Windows Server 2025 are no exception: they include a dozen new SMB features that make your data, your users, and your organization safer – and most are on by default. Today I’ll explain their usefulness, share some demos, and point to further details.


 


The new OSes will soon be generally available and you can preview them right now: download Windows Server 2025 and Windows 11 24H2.


 


On to the security.


 


SMB signing required by default


 


What it is


We now require signing by default for all Windows 11 24H2 SMB outbound and inbound connections and for all outbound connections in Windows Server 2025. This changes legacy behavior, where we required SMB signing by default only when connecting to shares named SYSVOL and NETLOGON and where Active Directory domain controllers required SMB signing for their clients.


 


How it helps you


SMB signing has been available for decades and prevents data tampering and relay attacks that steal credentials. By requiring signing by default, we ensure that an admin or user must opt out of this safer configuration, instead of requiring them to be very knowledgeable about SMB network protocol security and turn signing on.


 


Learn more



 


SMB NTLM blocking


 


Picture2.png


 


What it is


The SMB client now supports blocking NTLM authentication for remote outbound connections. This changes the legacy behavior of always using negotiated authentication that could downgrade from Kerberos to NTLM.


 


How it helps you


Blocking NTLM authentication prevents tricking clients into sending NTLM requests to malicious servers, which counteracts brute force, cracking, relay, and pass-the-hash attacks. NTLM blocking is also required for forcing an organization’s authentication to Kerberos, which is more secure because it verifies identities with its ticket system and better cryptography. Admins can specify exceptions to allow NTLM authentication over SMB to certain servers.


 


Learn more



 


SMB authentication rate limiter


 


What it is


The SMB server service now throttles failed authentication attempts by default. This applies to SMB sharing files on both Windows Server and Windows.


 


How it helps you


Brute force authentication attacks bombard the SMB server with multiple username and password-guesses and the frequency can range from dozens to thousands of attempts per second. The SMB authentication rate limiter is enabled by default with a 2 second delay between each failed NTLM or Local KDC Kerberos-based authentication attempt. An attack that sends 300 guesses per second for 5 minutes, for example – 90,000 attempts – would now take 50 hours to complete. An attacker is far more likely to simply give up than keep trying this method.


 


Learn more



 


SMB insecure guest auth now off by default in Windows Pro editions


 


What it is


Windows 11 Pro no longer allows SMB client guest connections or guest fallback to an SMB server by default. This makes Windows 11 Pro operate like Windows 10 and Windows 11 Enterprise, Education, and Pro for Workstation editions have for years.


 


How it helps you


Guest logons don’t require passwords & don’t support standard security features like signing and encryption. Allowing a client to use guest logons makes the user vulnerable to attacker-in-the-middle scenarios or malicious server scenarios – for instance, a phishing attack that tricks a user into opening a file on a remote share or a spoofed server that makes a client think it’s legitimate. The attacker doesn’t need to know the user’s credentials and a bad password is ignored. Only third-party remote devices might require guest access by default. Microsoft-provided operating systems haven’t enabled guest in server scenarios since Windows 2000.


 


Learn more



 


SMB dialect management


 


Picture3.png


 


What it is


You can now mandate the SMB 2 and 3 protocol versions used.


 


How it helps you


Previously, the SMB server and client only supported automatically negotiating the highest matched dialect from SMB 2.0.2 to 3.1.1. This means you can intentionally block older protocol versions or devices from connecting. For example, you can specify connections to only use SMB 3.1.1, the most secure dialect of the protocol. The minimum and maximum can be set independently on both the SMB client and server, and you can set just a minimum if desired.


 


Learn more



 


SMB client encryption mandate now supported


 


What it is


The SMB client now supports requiring encryption of all outbound SMB connections.


 


How it helps you


Encryption of all outbound SMB client connections enforces the highest level of network security and brings management parity to SMB signing. When enabled, the SMB client won’t connect to an SMB server that doesn’t support SMB 3.0 or later, or that doesn’t support SMB encryption. For example, a third-party SMB server might support SMB 3.0 but not SMB encryption. Unlike SMB signing, encryption is not required by default.


 


Learn more



 


Remote Mailslots deprecated and disabled by default


 


What it is


Remote Mailslots are deprecated and disabled by default for SMB and for DC locator protocol usage with Active Directory.


 


How it helps you


The Remote Mailslot protocol is an obsolete, simple, unreliable, IPC method first introduced in MS DOS. It is completely unsafe and has no authentication or authorization mechanisms.


 


Learn more



 


SMB over QUIC in Windows Server all editions


 


2024-08-23_08-28-33.png


 


What it is


SMB over QUIC is now included in all Windows Server 2025 editions (Datacenter, Standard, Azure Edition), not just on Azure Edition like it was in Windows Server 2022.


 


How it helps you


SMB over QUIC is an alternative to the legacy TCP protocol and is designed for use on untrusted networks like the Internet. It uses TLS 1.3 and certificates to ensure that all SMB traffic is encrypted and usable through edge firewalls for mobile and remote users without the need for a VPN. The user experience does not change at all.


 


Learn more



 


SMB over QUIC client access control


 


What it is


SMB over QUIC client access control lets you restrict which clients can access SMB over QUIC servers. The legacy behavior allowed connection attempts from any client that trusts the QUIC server’s certificate issuance chain.


 


How it helps you


Client access control creates allow and block lists for devices to connect to the file server. A client would now need its own certificate and be on an allow list to complete the QUIC connection before any SMB connection occurs. Client access control gives organizations more protection without changing the authentication used when making the SMB connection and the user experience does not change. You can also completely disable the SMB over QUIC client or only allow connection to specific servers.


 


Learn more



 


SMB alternative ports


 


What it is


You can use the SMB client to connect to alternative TCP, QUIC, and RDMA ports than their IANA/IETF defaults of 445, 5445, and 443.


 


How it helps you


With Windows Server, this allows you to host an SMB over QUIC connection on an allowed firewall port other than 443. You can only connect to alternative ports if the SMB server is configured to support listening on that port. You can also configure your deployment to block configuring alternative ports or specify that ports can only connect to certain servers.


 


Learn more



 


SMB Firewall default port changes


 


What it is


The built-in firewall rules don’t contain the SMB NetBIOS ports anymore.


 


How it helps you


The NetBIOS ports were only necessary for SMB1 usage, and that protocol is deprecated and removed by default. This change brings SMB firewall rules more in line with the standard behavior for the Windows Server File Server role. Administrators can reconfigure the rules to restore the legacy ports.


 


Learn more



 


SMB auditing improvements


 


What it is


SMB now supports auditing use of SMB over QUIC, missing third party support for encryption, and missing third party support for signing. These all operate at the SMB server and SMB client level.


 


How it helps you


It is much easier for you to determine if Windows and Windows Server devices are making SMB over QUIC connections. It is also much easier to determine if third parties support signing and encryption before mandating their usage.


 


Learn more



 


Summary


 


With the release of Windows Server 2025 and Windows 11 24H2, we have made the most changes to SMB security since the introduction of SMB 2 in Windows Vista. Deploying these operating systems fundamentally alters your security posture and reduces risk to this ubiquitous remote file and data fabric protocol used by organizations worldwide.


 


For more information on changes in Windows Server 2025, visit Windows Server Summit 2024 – March 26-28, 2024 | Microsoft Event. You will find dozens of presentations and demos on the latest features arriving this fall in our latest operating system.


 


And remember, you can try all of this right now: preview Windows Server 2025 and Windows 11 24H2.


 


Until next time,


 


– Ned Pyle

A better Phi Family is coming – multi-language support, better vision, intelligence MOEs

A better Phi Family is coming – multi-language support, better vision, intelligence MOEs

This article is contributed. See the original author and article here.


Phi3getstarted.png

 




 


After the release of Phi-3 at Microsoft Build 2024, it has received different attention, especially the application of Phi-3-mini and Phi-3-vision on edge devices. In the June update, we improved Benchmark and System role support by adjusting high-quality data training. In the August update, based on community and customer feedback, we brought Phi-3.5-mini-128k-instruct multi-language support, Phi-3.5-vision-128k with multi-frame image input, and provided Phi-3.5 MOE newly added for AI Agent. Next, let’s take a look



Multi-language support


In previous versions, Phi-3-mini had good English corpus support, but weak support for non-English languages. When we tried to ask questions in Chinese, there were often some wrong questions, such as


Lee_Stott_1-1724196256927.png

 





Obviously, this is a wrong answer


But in the new version, we can have better understanding and corpus support with the new Chinese prediction support

Lee_Stott_2-1724196257055.png

 







You can also try the enhancements in different languages, or in the scenario without fine-tuning and RAG, it is also a good model.


Code Sample:  https://github.com/microsoft/Phi-3CookBook/blob/main/code/09.UpdateSamples/Aug/phi3-instruct-demo.ipynb



Better vision



Phi-3.5-Vision enables Phi-3 to not only understand text and complete dialogues, but also have visual capabilities (OCR, object recognition, and image analysis, etc.). However, in actual application scenarios, we need to analyze multiple images to find associations, such as videos, PPTs, books, etc. In the new Phi-3-Vision, multi-frame or multi-image input is supported, so we can better complete the inductive analysis of videos, PPTs, and books in visual scenes.



As shown in this video






We can use OpenCV to extract key frames. We can extract 21 key frame images from the video and store them in an array.


images = [] 
placeholder = “” 
for i in range(1,22): 
    with open(“../output/keyframe_”+str(i)+“.jpg”, “rb”) as f:

        images.append(Image.open(“../output/keyframe_”+str(i)+“.jpg”))
        placeholder += f”n”







Combined with Phi-3.5-Vision’s chat template, we can perform a comprehensive analysis of multiple frames.

Lee_Stott_3-1724196257060.png



This allows us to more efficiently perform dynamic vision-based work, especially in edge scenarios.



Code Sample: https://github.com/microsoft/Phi-3CookBook/blob/main/code/09.UpdateSamples/Aug/phi3-vision-demo.ipynb



Intelligence MOEs



In order to achieve higher performance of the model, in addition to computing power, model size is one of the key factors to improve model performance. Under a limited computing resource budget, training a larger model with fewer training steps is often better than training a smaller model with more steps.



Mixture of Experts Models (MoEs) have the following characteristics:




  • Faster pre-training speed than dense models

  • Faster inference speed than models with the same number of parameters

  • Requires a lot of video memory because all expert systems need to be loaded into memory

  • There are many challenges in fine-tuning, but recent research shows that instruction tuning for mixed expert models has great potential.




Now there are a lot of AI Agents applications, we can use MOEs to empower AI Agents. In multi-task scenarios, the response is faster.



We can explore a simple scenario where we want to use AI to help us write Twitter based on some content and translate it into Chinese and publish it to social networks. We can combine Phi-3.5 MOEs to complete this. We can use Prompt to set and arrange tasks, such as blog content publishing, translated content, and the best answer.



“””

sys_msg = “””You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:

 Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content
 Translate: This is a tool that helps you translate into any language, using plain language as required
 Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.

To use these tools you must always respond in JSON format containing `“tool_name”` and `“input”` keyvalue pairs. For example, to answer the question, “Build Muliti Agents with MOE models” you must use the calculator tool like so:

{
    “tool_name”: “Blog”,
    “input”: “Build Muliti Agents with MOE models”
}

Or to translate the question “can you introduce yourself in Chinese” you must respond:

{
    “tool_name”: “Search”,
    “input”: “can you introduce yourself in Chinese”
}

Remember just output the final result, ouput in JSON format containing `“agentid”`,`“tool_name”` , `“input”` and `“output”`  keyvalue pairs .:

[
    {   “agentid”: “step1”,
        “tool_name”: “Blog”,
        “input”: “Build Muliti Agents with MOE models”,
        “output”: “………”
    },

    {   “agentid”: “step2”,
        “tool_name”: “Search”,
        “input”: “can you introduce yourself in Chinese”,
        “output”: “………”
    },
    {
        “agentid”: “final”
        “tool_name”: “Result”,
        “output”: “………”
    }
]

The users answer is as follows.

“”





We can see that by telling the model what skills it needs to have and the arrangement of tasks, we can assign them to different tasks through Phi-3.5 MOEs to complete related work.





Write something about Generative AI with MOEs , translate it to Chinese



The result :



[
    {   
        “agentid”: “step1”, 
        “tool_name”: “Blog”,
        “input”: “Generative AI with MOE”, 
        “output”: “Generative AI with MOE (Mixture of Experts) is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more.”
    },
    {   
        “agentid”: “step2”,
        “tool_name”: “Translate”,   
        “input”: “Generative AI with MOE is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more.”, 
        “output”: “基于生成AI的MOE(Mixture of Experts)是一种强大的方法,它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容,使其适用于各种应用,如内容创建、数据增强等。”
    },
    { 
         “agentid”: “final”,
         “tool_name”: “Result”,
         “output”: “基于生成AI的MOE(Mixture of Experts)是一种强大的方法,它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容,使其适用于各种应用,如内容创建、数据增强等。”
    }
]






If conditions permit, we can more smoothly integrate the Phi-3 MOEs model into frameworks such as AutoGen, Semantic Kernel, and Langchain.



Code Sample: https://github.com/microsoft/Phi-3CookBook/blob/main/code/09.UpdateSamples/Aug/phi3_moe_demo.ipynb



Thoughts on SLMs


 



SLMs do not replace LLMs but give GenAI a broader scenario. The update of Phi-3 allows more edge devices to have better support, including text, chat, and vision. In modern AI Agents application scenarios, we hope to have more efficient task execution efficiency. In addition to computing power, MoEs are the key to solving problems. Phi-3 is still iterating, and I hope everyone will pay more attention and give us better feedback.










Learn how to customize and optimize Copilot for Security with the custom Data Security plugin

Learn how to customize and optimize Copilot for Security with the custom Data Security plugin

This article is contributed. See the original author and article here.

This is a step-by-step guided walkthrough of how to use the custom Copilot for Security pack for Microsoft Data Security and how it can empower your organization to understand the cyber security risks in a context that allows them to achieve more. By focusing on the information and organizational context to reflect the real impact/value of investments and incidents in cyber. We are working to add this to our native toolset as well, we will update once ready.


 


Prerequisites



  • License requirements for Microsoft Purview Information Protection depend on the scenarios and features you use. To understand your licensing requirements and options for Microsoft Purview Information Protection, see the Information Protection sections from Microsoft 365 guidance for security & compliance and the related PDF download for feature-level licensing requirements. You also need to be licensed for Microsoft Copilot for Security, more information here.

  • Consider setting up Azure AI Search to ingest policy documents, so that they can be part of the process.


 


Step-by-step guided walkthrough


In this guide we will provide high-level steps to get started using the new tooling. We will start by adding the custom plugin.



  1. Go to securitycopilot.microsoft.com

  2. Download the DataSecurityAnalyst.yml file from here.

  3. Select the plugins icon down in the left corner.


JonNordstrm_0-1713791147737.png


 



  1. Under Custom upload, select upload plugin.


JonNordstrm_1-1713791147745.png


 



  1. Select the Copilot for Security plugin and upload the DataSecurityAnalyst.yml file.


JonNordstrm_2-1713791147749.png


 



  1. Click Add

  2. Under Custom you will now see the plug-in


JonNordstrm_3-1713791147750.png


 


 


 The custom package contains the following prompts


 Under DLP you will find this if you type /DLP


 


JonNordstrm_4-1713791147758.png


 


 


Under Sensitive you will find this if you type sensitive


 


JonNordstrm_5-1713791147767.png


 


Let us get started using this together with the Copilot for Security capabilities



Anomalies detection sample


The DLP anomaly is checking data from the past 30 days and inspect on a 30m interval for possible anomalies. Using a timeseries decomposition model.


 


JonNordstrm_0-1713794451225.png


 


The sensitivity content anomaly is using a slightly different model due to the amount of data. It is based on the diffpatterns function that compares week 3,4 with week 1,2.


 


JonNordstrm_1-1713794620074.png


 


Access to sensitive information by compromised accounts.


This example is checking the alerts reported against users with sensitive information that they have accessed.


 


JonNordstrm_2-1713794838205.png


 


Who has accessed a Sensitive e-mail and from where?


We allow for organizations to input message subject or message Id to identify who has opened a message. Note this only works for internal recipients.


 


JonNordstrm_3-1713794932861.png


 


You can also ask the plugin to list any emails classified as Sensitive being accessed from a specific network or affected of a specific CVE.


 


JonNordstrm_10-1713791147801.png


 


Document accessed by possible compromised accounts.


You can use the plugin to check if compromised accounts have been accessing a specific document.


 


JonNordstrm_11-1713791147806.png


 


CVE or proximity to ISP/IPTags


This is a sample where you can check how much sensitive information that is exposed to a CVE as an example. You can pivot this based on ISP as well.


 


JonNordstrm_0-1713795319975.png


 


Tune Exchange DLP policies sample.


If you want to tune your Exchange, Teams, SharePoint, Endpoint or OCR rules and policies you can ask Copilot for Security for suggestions.


 


JonNordstrm_13-1713791147819.png


 


Purview unlabelled operations


How many of the operations in your different departments are unlabelled?  Are any of the departments standing out?


 


JonNordstrm_14-1713791147842.png


 


In this context you can also use Copilot for Security to deliver recommendations and highlight what the benefit of sensitivity labels are bringing.


 


JonNordstrm_15-1713791147861.png


 


 


Applications accessing sensitive content.


What applications have been used to access sensitive content? The plugin supports asking for applications being used to access sensitive content. This can be a fairly long list of applications, you can add filters in the code to filter out common applications.


 


JonNordstrm_16-1713791147868.png


 


If you want to zoom into what type of content a specific application is accessing.


 


JonNordstrm_17-1713791147876.png


 


What type of network connectivity has been made from this application?


 


JonNordstrm_1-1713795957292.png


 


Or what if you get concerned about the process that has been used and want to validate the SHA256?


 


JonNordstrm_19-1713791147887.png


 


 


Hosts that are internet accessible accessing sensitive content


Another threat vector could be that some of your devices are accessible to the Internet and sensitive content is being processed. Check for processing of secrets and other sensitive information.


 


JonNordstrm_2-1713796212776.png


 


 


Promptbooks


Promptbooks are a valuable resource for accomplishing specific security-related tasks. Consider them as a way to practically implement your standard operating procedure (SOP) for certain incidents. By following the SOP, you can identify the various dimensions in an incident in a standardized way and summarize the outcome. For more information on prompt books please see this documentation.


 


Exchange incident sample prompt book


 


JonNordstrm_21-1713791147894.png


 


JonNordstrm_0-1713855135569.png


 


JonNordstrm_1-1713855341307.png


 


Note: The above detail is currently only available using Sentinel, we are working on Defender integration.


 


JonNordstrm_3-1713855588028.png


 


 


JonNordstrm_4-1713855701088.png


 


JonNordstrm_5-1713855792749.png


 


JonNordstrm_6-1713855936122.png


 


SharePoint sample prompt book


JonNordstrm_28-1713791147951.png


 


JonNordstrm_7-1713856107627.png


 


JonNordstrm_8-1713856185445.png


 


JonNordstrm_9-1713856281126.png


 


JonNordstrm_32-1713791147978.png


 


JonNordstrm_10-1713856446267.png


 


JonNordstrm_11-1713856606803.png


 


JonNordstrm_12-1713856723307.png


 


Posts part of this series


Comprehensive coverage and cost-savings with Microsoft Sentinel’s new data tier

Comprehensive coverage and cost-savings with Microsoft Sentinel’s new data tier

This article is contributed. See the original author and article here.

As digital environments grow across platforms and clouds, organizations are faced with the dual challenges of collecting relevant security data to improve protection and optimizing costs of that data to meet budget limitations. Management complexity is also an issue as security teams work with diverse datasets to run on-demand investigations, proactive threat hunting, ad hoc queries and support long-term storage for audit and compliance purposes. Each log type requires specific data management strategies to support those use cases. To address these business needs, customers need a flexible SIEM (Security Information and Event Management) with multiple data tiers.


 


Microsoft is excited to announce the public preview of a new data tier Auxiliary Logs and Summary Rules in Microsoft Sentinel to further increase security coverage for high-volume data at an affordable price.  


 


Auxiliary Logs supports high-volume data sources including network, proxy and firewall logs. Customers can get started today in preview with Auxiliary Logs today at no cost. We will notify users in advance before billing begins at $0.15 per Gb (US East). Initially Auxiliary Logs allow long term storage, however on-demand analysis is limited to the last 30 days.  In addition, queries are on a single table only.  Customers can continue to build custom solutions using Azure Data Explorer however the intention is that Auxiliary Logs cover most of those use-cases over time and are built into Microsoft Sentinel, so they include management capabilities. 


 


Summary Rules further enhance the value of Auxiliary Logs. Summary Rules enable customers to easily aggregate data from Auxiliary Logs into a summary that can be routed to Analytics Logs for access to the full Microsoft Sentinel query feature set. The combination of Auxiliary logs and Summary rules enables security functions such as Indicator of Compromise (IOC) lookups, anomaly detection, and monitoring of unusual traffic patterns. Together, Auxiliary Logs and Summary Rules offer customers greater data flexibility, cost-efficiency, and comprehensive coverage. 


 


Some of the key benefits of Auxiliary Logs and Summary Rules include: 



  • Cost-effective coverage: Auxiliary Logs are ideal for ingesting large volumes of verbose logs at an affordable price-point. When there is a need for advanced security investigations or threat hunting, Summary Rules can aggregate and route Auxiliary Logs data to the Analytics Log tier delivering additional cost-savings and security value.  



  • On-demand analysis: Auxiliary Logs supports 30 days of interactive queries with limited KQL, facilitating access and analysis of crucial security data for threat investigations. 



  • Flexible retention and storage: Auxiliary Logs can be stored for up to 12 years in long-term retention. Access to these logs is available through running a search job. 


 


Microsoft Sentinel’s multi-tier data ingestion and storage options 


Microsoft is committed to providing customers with cost-effective, flexible options to manage their data at scale. Customers can choose from the different log plans in Microsoft Sentinel to meet their business needs. Data can be ingested as Analytics, Basic and Auxiliary Logs. Differentiating what data to ingest and where is crucial. We suggest categorizing security logs into primary and secondary data.  



  • Primary logs (Analytics Logs): Contain data that is of critical security value and are utilized for real-time monitoring, alerts, and analytics. Examples include Endpoint Detection and Response (EDR) logs, authentication logs, audit trails from cloud platforms, Data Loss Prevention (DLP) logs, and threat intelligence.  

    • Primary logs are usually monitored proactively, with scheduled alerts and analytics, to enable effective security detections.  

    • In Microsoft Sentinel, these logs would be directed to Analytics Logs tables to leverage the full Microsoft Sentinel value. 

    • Analytics Logs are available for 90 days to 2 years, with 12 years long-term retention option. 





  • Secondary logs (Auxiliary Logs): Are verbose, low-value logs that contain limited security value but can help draw the full picture of a security incident or breach. They are not frequently used for deep analytics or alerts and are often accessed on-demand for ad-hoc querying, investigations, and search.  

    • These include NetFlow, firewall, and proxy logs, and should be routed to Basic Logs or Auxiliary Logs. 

      • Auxiliary logs are appropriate when using Log Stash, Cribl or similar for data transformation. 

      • In the absence of transformation tools, Basic Logs are recommended.  



    • Both Basic and Auxiliary Logs are available for 30 days, with long-term retention option of up to 12 years. 

    • Additionally, for extensive ML, complex hunting tasks and frequent, extensive long-term retention customers have the choice of ADX. But this adds additional complexity and maintenance overhead. 




Microsoft Sentinel’s native data tiering offers customers the flexibility to ingest, store and analyze all security data to meet their growing business needs.  


 


Use case example: Auxiliary Logs and Summary Rules Coverage for Firewall Logs 


Firewall event logs are a critical network log source for threat hunting and investigations. These logs can reveal abnormally large file transfers, volume and frequency of communication by a host, and port scanning. Firewall logs are also useful as a data source for various unstructured hunting techniques, such as stacking ephemeral ports or grouping and clustering different communication patterns.  


In this scenario, organizations can now easily send all firewall logs to Auxiliary Logs at an affordable price point. In addition, customers can run a Summary Rule that creates scheduled aggregations and route them to the Analytics Logs tier. Analysts can use these aggregations for their day-to-day work and if they need to drill down, they can easily query the relevant records from Auxiliary Logs. Together Auxiliary Logs and Summary Rules help security teams use high volume, verbose logs to meet their security requirements while minimizing costs. 


 


Yael_Bergman_0-1724143656000.png


Figure 1: Ingest high volume, verbose firewall logs into an Auxiliary Logs table. 


 


Yael_Bergman_1-1724143656002.png


Figure 2: Create aggregated datasets on the verbose logs in Auxiliary Logs plan.  


 


Customers are already finding value with Auxiliary Logs and Summary Rules as seen below: 


“The BlueVoyant team enjoyed participating in the private preview for Auxiliary logs and are grateful Microsoft has created new ways to optimize log ingestion with Auxiliary logs. The new features enable us to transform data that is traditionally lower value into more insightful, searchable data.” 


Mona Ghadiri 


Senior Director of Product Management, BlueVoyant 


 


“The Auxiliary Log is a perfect fusion of Basic Log and long-term retention, offering the best of 
both worlds. When combined with Summary Rules, it effectively addresses various use cases for ingesting large volumes of logs into Microsoft Sentinel.” 


Debac Manikandan 


Senior Cybersecurity Engineer, DEFEND 


 


Looking forward 


Microsoft is committed to expanding the scenarios covered by Auxiliary Logs over time, including data transformation and standard tables, improved query performance at scale, billing and more. We are working closely with our customers to collect feedback and will continue to add more functionality. As always, we’d love to hear your thoughts.  


 


Learn more