In deep learning training, the calculation of training time involves multiple factors, including the number of epochs, global batch size, micro batch size, and the number of computing devices, among others. Below is a basic formula illustrating the relationship between these parameters (note that this is just an illustrative formula that captures the proportional and inversely proportional relationships; actual training may require considering more factors):

Training Time ≈ Epochs × (Total Number of Samples / Global Batch Size) × Time per Step
where:
Epochs refer to the number of times the model processes the entire training dataset.
Total Number of Samples is the total number of samples in the training dataset.
Global Batch Size is the total number of data samples processed in each training iteration.
Time per Step is the time required for each training iteration, which depends on hardware performance, model complexity, optimization algorithms, and other factors.
Number of Devices is the number of computing devices used for training, such as the number of GPUs. It does not appear in the formula directly, but it determines both the achievable Global Batch Size and the Time per Step.
This formula provides a basic framework, but please note that the actual training time may be influenced by many other factors, including I/O speed, network latency (for distributed training), CPU-GPU communication speed, the frequency of hardware failures during GPU training, etc. Therefore, this formula can only serve as a rough estimate, and the actual training time may vary.
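As a quick illustration, here is a minimal back-of-the-envelope sketch of the estimate in Python. Every number below is a made-up assumption, not a measurement:

import math

epochs = 3                  # passes over the training set (assumed)
total_samples = 2_000_000   # samples in the dataset (assumed)
global_batch_size = 512     # samples consumed per training step (assumed)
time_per_step = 1.8         # seconds per step, ideally measured empirically (assumed)

steps_per_epoch = math.ceil(total_samples / global_batch_size)
total_steps = epochs * steps_per_epoch
estimated_hours = total_steps * time_per_step / 3600
print(f"{total_steps} steps, roughly {estimated_hours:.1f} hours of wall-clock time")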
Detailed explanations
The training time of a deep learning model is determined by multiple factors, including but not limited to the following:
Number of Epochs: An epoch means that the model has processed the entire training dataset once. The more epochs, the more data the model needs to process, and thus the longer the training time.
Global Batch Size: The global batch size is the total number of data samples processed in each training iteration. The larger the global batch size, the more data is processed in each iteration, which may reduce the number of iterations required per epoch, potentially shortening the total training time. However, if the global batch size is too large, it may lead to memory overflow.
Micro Batch Size: The micro batch size refers to the number of data samples processed by each computing device in each training iteration. The larger the micro batch size, the more data each device processes per iteration, which may improve computational efficiency and thus shorten training time. However, if the micro batch size is too large, it may lead to memory overflow.
Hardware Performance: The performance of the computing devices used (such as CPUs, GPUs) will also affect training time. More powerful devices can perform computations faster, thereby shortening training time.
Model Complexity: The complexity of the model (such as the number of layers, number of parameters, etc.) will also affect training time. The more complex the model, the more computations are required, and thus the longer the training time.
Optimization Algorithm: The optimization algorithm used (such as SGD, Adam, etc.) and hyperparameter settings like learning rate will also affect training time.
Parallel Strategy: The use of parallel computing strategies such as data parallelism, model parallelism, etc., will also affect training time.
There are many factors that determine the length of training time, and they need to be considered comprehensively based on the specific training task and environment.
So, in this formula, "Time per Step" should be understood as primarily related to the computational power of the GPU. The time required for each training step is determined by multiple factors, including but not limited to the following (a short measurement sketch follows this list):
Hardware Performance: The performance of the computing devices used (such as CPUs, GPUs) will directly affect the speed of each training iteration. More powerful devices can perform computations faster.
Model Complexity: The complexity of the model (such as the number of layers, number of parameters, etc.) will also affect the time for each training iteration. The more complex the model, the more computations are required.
Optimization Algorithm: The optimization algorithm used (such as SGD, Adam, etc.) will also affect the time for each training iteration. Some optimization algorithms may require more complex computational steps to update the model parameters.
Data Type Used in Training: The numeric precision used in training has a significant effect on the time per step. Common data types include FP32, FP16/BF16, and FP8; lower-precision formats generally yield faster steps on hardware that supports them.
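Because these factors interact, the most reliable way to obtain Time per Step is to measure it on the actual hardware. Here is a minimal PyTorch sketch of such a measurement; the model, batch, labels, optimizer, and loss function are assumed to already exist on the GPU:

import time
import torch

def measure_time_per_step(model, batch, labels, optimizer, loss_fn, n_steps=20):
    # Warm-up steps so one-time costs (kernel caching, allocator growth) don't skew timing.
    for _ in range(3):
        optimizer.zero_grad()
        loss_fn(model(batch), labels).backward()
        optimizer.step()
    torch.cuda.synchronize()   # GPU kernels run asynchronously; drain the queue first
    start = time.perf_counter()
    for _ in range(n_steps):
        optimizer.zero_grad()
        loss_fn(model(batch), labels).backward()
        optimizer.step()
    torch.cuda.synchronize()   # wait for all queued work before stopping the clock
    return (time.perf_counter() - start) / n_steps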
Training steps
So, what determines the total training steps? The "Total Training Steps" is determined by the number of training epochs and the number of steps per epoch: it equals the number of epochs multiplied by the number of steps per epoch. This can be expressed with the following formula:

Total Training Steps = Epochs × Steps per Epoch = Epochs × (Total Number of Samples / Global Batch Size)
Global Batch Size
So, what determines the Global Batch Size?
global_batch_size =
    gradient_accumulation_steps
    * nnodes (number of nodes)
    * nproc_per_node (number of GPUs per node)
    * per_device_train_batch_size (micro batch size)
Assume a scenario:
batch_size = 10 # Batch size
total_num = 1000 # Total number of training data
When training one batch of data and updating the gradient once (gradient accumulation steps = 1):

train_steps = total_num / batch_size  # = 1000 / 10 = 100 steps per epoch
This means there are 100 steps per epoch, and the gradient update steps are also 100. When the memory is insufficient to support a batch size of 10, we can use gradient accumulation to reduce the size of each micro-batch. Suppose we set the gradient accumulation steps to 2:

gradient_accumulation_steps = 2
micro_batch_size = batch_size / gradient_accumulation_steps  # = 10 / 2 = 5
This means that for each gradient update, we accumulate data from 2 micro-batches, with each micro-batch size being 5. This reduces memory pressure, but the data size per gradient update remains 10 data points.
Result:
The number of training steps per epoch (train_steps) remains 100 because the total amount of data and the number of steps per epoch have not changed.
The gradient update steps remain 100, because each gradient update accumulates data from 2 micro-batches of 5 samples: the model now performs 200 forward/backward passes per epoch, but still only 100 optimizer updates.
It is important to note that when using gradient accumulation, each training step handles the accumulation of gradients from multiple micro-batches, which may slightly increase the computation time per step. Therefore, if memory is sufficient, it is better to increase the batch size to reduce the number of gradient accumulations. When memory is insufficient, gradient accumulation is an effective method.
The global batch size significantly impacts the training effectiveness of the model. Generally, a larger global batch size provides more accurate gradient estimates, aiding model convergence. However, it also increases memory pressure on each device. If memory resources are limited, using a large global batch size may not be feasible.
In such cases, gradient accumulation can be used. By training with a smaller micro-batch size on each device, we reduce memory pressure while maintaining a large global batch size for accurate gradient estimates. This allows training large models on limited hardware resources without sacrificing the global batch size.
In summary, gradient accumulation is a trade-off strategy to balance global batch size and training effectiveness when memory resources are limited.
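As a concrete illustration, here is a minimal sketch of gradient accumulation in a PyTorch training loop; model, data_loader, optimizer, and loss_fn are assumed to exist, and each batch from the loader is a micro-batch of 5 samples:

accumulation_steps = 2   # micro-batches accumulated per gradient update

optimizer.zero_grad()
for step, (inputs, labels) in enumerate(data_loader):
    loss = loss_fn(model(inputs), labels)
    # Divide by the accumulation steps so the summed gradients match one large batch of 10.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one parameter update per 2 micro-batches
        optimizer.zero_grad()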
So, if we look at these two formulas:
The larger the global batch size, the shorter the total training time, provided that there is no OOM (Out of Memory) and the GPUs' compute capacity is not yet saturated; once the hardware is saturated, a larger batch simply takes proportionally longer per step and the benefit disappears.
The Relationship Between Data Parallelism and Batch Size
This section essentially analyzes this formula:
global_batch_size =
    gradient_accumulation_steps
    * nnodes (number of nodes)
    * nproc_per_node (number of GPUs per node)
    * per_device_train_batch_size (micro batch size)

Strictly speaking, this formula assumes pure data parallelism, where every GPU holds a full model replica and acts as a data-parallel worker. If tensor parallelism (TP) or pipeline parallelism (PP) is also used, the data-parallel degree becomes the total number of GPUs divided by the product of the TP and PP degrees.
In distributed deep learning, data parallelism is a common strategy. The training data is split into multiple small batches and distributed to different computing nodes. Each node has a copy of the model and trains on its data subset, speeding up the training process.
At the end of each training step, the model weights of all nodes are synchronized using the AllReduce operation. AllReduce aggregates gradients from all nodes and broadcasts the result back, allowing each node to update its model parameters.
If training on a single device, AllReduce is not needed as all computations occur on the same device. However, in distributed training, especially with data parallelism, AllReduce or similar operations are necessary to synchronize model parameters across devices.
Many deep learning frameworks (e.g., PyTorch, TensorFlow) use NVIDIA’s NCCL for communication across multiple GPUs. Each GPU trains on its data subset and synchronizes model weights using NCCL’s AllReduce at the end of each step.
Although AllReduce is commonly used in data parallelism, other NCCL operations may be employed depending on the framework and strategy.
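To make this concrete, here is a minimal runnable sketch of data parallelism with PyTorch DistributedDataParallel over NCCL; the linear model and random dataset are toy stand-ins, and the script is assumed to be launched with torchrun, which sets the rank environment variables:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Assumed launch: torchrun --nnodes=<N> --nproc-per-node=<G> train.py
dist.init_process_group(backend="nccl")     # NCCL supplies the AllReduce
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = DDP(torch.nn.Linear(128, 10).cuda(), device_ids=[local_rank])  # full replica per GPU
dataset = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))
sampler = DistributedSampler(dataset)       # each rank sees a distinct shard of the data
loader = DataLoader(dataset, batch_size=10, sampler=sampler)  # 10 = micro batch size
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

for inputs, labels in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs.cuda()), labels.cuda())
    loss.backward()        # DDP averages gradients across all ranks (AllReduce) here
    optimizer.step()       # every replica applies the same update and stays in sync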
Data parallelism (DP) and micro batch size are interrelated. DP involves training on multiple devices, each processing a portion of the data. Micro batch size is the number of samples each device processes per iteration. With DP, the original batch is split into micro batches across the devices. Without DP or model parallelism (MP), and with no gradient accumulation, the micro batch size equals the global batch size. With DP or MP, the global batch size is the sum of all micro batches processed in one update.
DP can be applied on multiple devices within a single server or across multiple servers. Setting DP to 8 means training on 8 devices, either on the same server or distributed across servers.
Pipeline parallelism (PP) is a different strategy, in which different parts of the model run on different devices. Combining DP = 8 with PP means there are 8 data-parallel replicas of the pipeline, so 8 devices process data in parallel at each pipeline stage.
In summary, DP and PP can be used simultaneously on devices within a single server or across multiple servers.
For Project Service Automation customers on US government cloud, we will have a future announcement regarding upgrade and the availability of Project Operations.
Beginning March 31st, 2025, Microsoft will no longer support PSA on commercial cloud environments. There will not be any feature enhancements, updates, bug fixes, or other updates to this offering. Any support ticket logged for the PSA commercial cloud will be closed with instructions to upgrade to Dynamics 365 Project Operations.
We strongly encourage all customers of PSA commercial cloud to start planning your upgrade process as soon as possible so you can take advantage of many new Project Operations features such as:
Integration with Project for the Web with many new advanced scheduling features
Project Operations was first released in October 2020 as a comprehensive product to manage Projects from inception to close by bringing together the strengths of Dataverse, Microsoft Dynamics 365 Finance and Supply Chain Management, and Project for the web assets.
Want to learn more about Project Operations? Check this link and navigate to our detailed documentation!
Want to try Project Operations? Click here and sign up for a 30-day trial!
Heya folks, Ned here again. Last November, Microsoft launched the Secure Future Initiative (SFI) to prepare for the increasing scale and high stakes of cyberattacks. SFI brings together every part of Microsoft to advance cybersecurity protection across our company and products.
Windows has focused on security options with each major release, and Windows 11 24H2 and Windows Server 2025 are no exception: they include a dozen new SMB features that make your data, your users, and your organization safer – and most are on by default. Today I’ll explain their usefulness, share some demos, and point to further details.
SMB signing required by default
What it is
We now require signing by default for all Windows 11 24H2 SMB outbound and inbound connections and for all outbound connections in Windows Server 2025. This changes legacy behavior, where we required SMB signing by default only when connecting to shares named SYSVOL and NETLOGON and where Active Directory domain controllers required SMB signing for their clients.
How it helps you
SMB signing has been available for decades and prevents data tampering and relay attacks that steal credentials. By requiring signing by default, we ensure that an admin or user must opt out of this safer configuration, instead of requiring them to be very knowledgeable about SMB network protocol security and turn signing on.
SMB NTLM blocking
What it is
The SMB client now supports blocking NTLM authentication for remote outbound connections. This changes the legacy behavior of always using negotiated authentication that could downgrade from Kerberos to NTLM.
How it helps you
Blocking NTLM authentication prevents tricking clients into sending NTLM requests to malicious servers, which counteracts brute force, cracking, relay, and pass-the-hash attacks. NTLM blocking is also required for forcing an organization’s authentication to Kerberos, which is more secure because it verifies identities with its ticket system and better cryptography. Admins can specify exceptions to allow NTLM authentication over SMB to certain servers.
SMB authentication rate limiter
What it is
The SMB server service now throttles failed authentication attempts by default. This applies to SMB file sharing on both Windows Server and Windows.
How it helps you
Brute force authentication attacks bombard the SMB server with multiple username and password guesses, and the frequency can range from dozens to thousands of attempts per second. The SMB authentication rate limiter is enabled by default with a 2-second delay between each failed NTLM or Local KDC Kerberos-based authentication attempt. An attack that sends 300 guesses per second for 5 minutes, for example (90,000 attempts), would now take 50 hours to complete. An attacker is far more likely to simply give up than keep trying this method.
SMB insecure guest auth now off by default in Windows Pro editions
What it is
Windows 11 Pro no longer allows SMB client guest connections or guest fallback to an SMB server by default. This makes Windows 11 Pro operate like Windows 10 and Windows 11 Enterprise, Education, and Pro for Workstation editions have for years.
How it helps you
Guest logons don’t require passwords & don’t support standard security features like signing and encryption. Allowing a client to use guest logons makes the user vulnerable to attacker-in-the-middle scenarios or malicious server scenarios – for instance, a phishing attack that tricks a user into opening a file on a remote share or a spoofed server that makes a client think it’s legitimate. The attacker doesn’t need to know the user’s credentials and a bad password is ignored. Only third-party remote devices might require guest access by default. Microsoft-provided operating systems haven’t enabled guest in server scenarios since Windows 2000.
SMB dialect management
What it is
You can now mandate the SMB 2 and 3 protocol versions used.
How it helps you
Previously, the SMB server and client only supported automatically negotiating the highest matched dialect from SMB 2.0.2 to 3.1.1. With dialect management, you can intentionally block older protocol versions or devices from connecting. For example, you can specify that connections only use SMB 3.1.1, the most secure dialect of the protocol. The minimum and maximum can be set independently on both the SMB client and server, and you can set just a minimum if desired.
SMB client encryption mandate
What it is
The SMB client now supports requiring encryption of all outbound SMB connections.
How it helps you
Encryption of all outbound SMB client connections enforces the highest level of network security and brings management parity to SMB signing. When enabled, the SMB client won’t connect to an SMB server that doesn’t support SMB 3.0 or later, or that doesn’t support SMB encryption. For example, a third-party SMB server might support SMB 3.0 but not SMB encryption. Unlike SMB signing, encryption is not required by default.
Remote Mailslots deprecated and disabled by default
What it is
Remote Mailslots are deprecated and disabled by default for SMB and for DC locator protocol usage with Active Directory.
How it helps you
The Remote Mailslot protocol is an obsolete, simple, unreliable IPC method first introduced in MS-DOS. It is completely unsafe, with no authentication or authorization mechanisms.
SMB over QUIC in all Windows Server editions
What it is
SMB over QUIC is now included in all Windows Server 2025 editions (Datacenter, Standard, Azure Edition), not just Azure Edition as in Windows Server 2022.
How it helps you
SMB over QUIC is an alternative to the legacy TCP protocol and is designed for use on untrusted networks like the Internet. It uses TLS 1.3 and certificates to ensure that all SMB traffic is encrypted and usable through edge firewalls for mobile and remote users without the need for a VPN. The user experience does not change at all.
SMB over QUIC client access control
What it is
SMB over QUIC client access control lets you restrict which clients can access SMB over QUIC servers. The legacy behavior allowed connection attempts from any client that trusts the QUIC server’s certificate issuance chain.
How it helps you
Client access control creates allow and block lists for devices to connect to the file server. A client would now need its own certificate and be on an allow list to complete the QUIC connection before any SMB connection occurs. Client access control gives organizations more protection without changing the authentication used when making the SMB connection and the user experience does not change. You can also completely disable the SMB over QUIC client or only allow connection to specific servers.
SMB alternative ports
What it is
You can use the SMB client to connect to TCP, QUIC, and RDMA ports other than their IANA/IETF defaults of 445, 443, and 5445, respectively.
How it helps you
With Windows Server, this allows you to host an SMB over QUIC connection on an allowed firewall port other than 443. You can only connect to alternative ports if the SMB server is configured to support listening on that port. You can also configure your deployment to block configuring alternative ports or specify that ports can only connect to certain servers.
SMB firewall rule hardening
What it is
The built-in firewall rules no longer contain the SMB NetBIOS ports.
How it helps you
The NetBIOS ports were only necessary for SMB1 usage, and that protocol is deprecated and removed by default. This change brings SMB firewall rules more in line with the standard behavior for the Windows Server File Server role. Administrators can reconfigure the rules to restore the legacy ports.
SMB auditing improvements
What it is
SMB now supports auditing the use of SMB over QUIC, as well as detecting missing third-party support for encryption or signing. These all operate at the SMB server and SMB client level.
How it helps you
It is much easier for you to determine if Windows and Windows Server devices are making SMB over QUIC connections. It is also much easier to determine if third parties support signing and encryption before mandating their usage.
With the release of Windows Server 2025 and Windows 11 24H2, we have made the most changes to SMB security since the introduction of SMB 2 in Windows Vista. Deploying these operating systems fundamentally alters your security posture and reduces risk to this ubiquitous remote file and data fabric protocol used by organizations worldwide.
After the release of Phi-3 at Microsoft Build 2024, it has received widespread attention, especially for the application of Phi-3-mini and Phi-3-vision on edge devices. In the June update, we improved benchmark results and system role support by adjusting the high-quality training data. In the August update, based on community and customer feedback, we brought Phi-3.5-mini-128k-instruct with multi-language support, Phi-3.5-vision-128k with multi-frame image input, and the newly added Phi-3.5-MoE for AI Agent scenarios. Next, let's take a look.
Multi-language support
In previous versions, Phi-3-mini had good English corpus support but weak support for non-English languages. When we tried to ask questions in Chinese, it often gave incorrect answers, such as:
Obviously, this is a wrong answer
But in the new version, with the added Chinese corpus support, the model understands and answers Chinese questions much better.
You can also try the enhancements in other languages; even without fine-tuning or RAG, it is a capable model.
Phi-3.5-vision enables Phi-3 to not only understand text and hold dialogues, but also to have visual capabilities (OCR, object recognition, image analysis, etc.). However, in real application scenarios we often need to analyze multiple images together to find their associations, such as across videos, PPTs, and books. In the new Phi-3.5-vision, multi-frame or multi-image input is supported, so we can better perform inductive analysis of videos, PPTs, and books in visual scenarios.
As shown in this video
We can use OpenCV to extract key frames. We can extract 21 key frame images from the video and store them in an array.
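Here is a minimal sketch of that extraction step with OpenCV; the video path is a hypothetical placeholder, and for simplicity it samples 21 evenly spaced frames rather than detecting true scene changes:

import cv2

video = cv2.VideoCapture("../input/sample_video.mp4")   # hypothetical input path
total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
num_keyframes = 21

for i in range(1, num_keyframes + 1):
    # Seek to evenly spaced positions across the whole video.
    video.set(cv2.CAP_PROP_POS_FRAMES, (i - 1) * total_frames // num_keyframes)
    ok, frame = video.read()
    if ok:
        cv2.imwrite(f"../output/keyframe_{i}.jpg", frame)
video.release()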
from PIL import Image

images = []
placeholder = ""
for i in range(1, 22):
    with open("../output/keyframe_" + str(i) + ".jpg", "rb") as f:
        # .convert("RGB") forces PIL to load the pixel data before the file closes.
        images.append(Image.open(f).convert("RGB"))
    placeholder += f"<|image_{i}|>\n"  # numbered image placeholder tag for the prompt
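The frames and placeholder string can then be fed to the model. Here is a minimal sketch following the published Hugging Face sample for Phi-3.5-vision-instruct; the question text is an assumption:

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="cuda", trust_remote_code=True, torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": placeholder + "Summarize what happens in this video."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, images, return_tensors="pt").to("cuda")
generate_ids = model.generate(**inputs, max_new_tokens=500,
                              eos_token_id=processor.tokenizer.eos_token_id)
# Drop the prompt tokens and decode only the newly generated answer.
answer = processor.batch_decode(
    generate_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)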
In order to achieve higher performance of the model, in addition to computing power, model size is one of the key factors to improve model performance. Under a limited computing resource budget, training a larger model with fewer training steps is often better than training a smaller model with more steps.
Mixture of Experts Models (MoEs) have the following characteristics:
Faster pre-training speed than dense models
Faster inference speed than models with the same number of parameters
Requires a lot of GPU memory, because all experts need to be loaded into memory
There are many challenges in fine-tuning, but recent research shows that instruction tuning for mixture-of-experts models has great potential.
There are now many AI Agent applications, and we can use MoEs to power them; in multi-task scenarios, responses are faster.
We can explore a simple scenario: we want AI to help us write tweet content based on some material, translate it into Chinese, and publish it to social networks. We can combine Phi-3.5-MoE to complete this, using a prompt to define and orchestrate the tasks, such as writing the blog content, translating it, and returning the final answer.
sys_msg = """
You are a helpful AI assistant, you are an agent capable of using a variety of tools to answer a question. Here are a few of the tools available to you:
– Blog: This tool helps you describe a certain knowledge point and content, and finally write it into Twitter or Facebook style content.
– Translate: This is a tool that helps you translate into any language, using plain language as required.
– Final Answer: the final answer tool must be used to respond to the user. You must use this when you have decided on an answer.
To use these tools you must always respond in JSON format containing `"tool_name"` and `"input"` key-value pairs. For example, to answer the question "Build Multi Agents with MOE models" you must use the Blog tool like so:
{ "tool_name": "Blog", "input": "Build Multi Agents with MOE models" }
Or to translate the question "can you introduce yourself in Chinese" you must respond:
{ "tool_name": "Translate", "input": "can you introduce yourself in Chinese" }
Remember to output only the final result, in JSON format containing `"agentid"`, `"tool_name"`, `"input"`, and `"output"` key-value pairs:
[ { "agentid": "step2", "tool_name": "Translate", "input": "can you introduce yourself in Chinese", "output": "………" }, { "agentid": "final", "tool_name": "Result", "output": "………" } ]
The user's question is as follows.
"""
We can see that by telling the model what skills it has and how the tasks are arranged, Phi-3.5-MoE can route the different subtasks to the right tools and complete the related work.
Write something about Generative AI with MOEs, translate it to Chinese

The result:
[
  {
    "agentid": "step1",
    "tool_name": "Blog",
    "input": "Generative AI with MOE",
    "output": "Generative AI with MOE (Mixture of Experts) is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more."
  },
  {
    "agentid": "step2",
    "tool_name": "Translate",
    "input": "Generative AI with MOE is a powerful approach that combines the strengths of generative models and the flexibility of MOE architecture. This hybrid model can generate high-quality, diverse, and contextually relevant content, making it suitable for various applications such as content creation, data augmentation, and more.",
    "output": "基于生成AI的MOE(Mixture of Experts)是一种强大的方法,它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容,使其适用于各种应用,如内容创建、数据增强等。"
  },
  {
    "agentid": "final",
    "tool_name": "Result",
    "output": "基于生成AI的MOE(Mixture of Experts)是一种强大的方法,它结合了生成模型的优势和MOE架构的灵活性。这种混合模型可以生成高质量、多样化且上下文相关的内容,使其适用于各种应用,如内容创建、数据增强等。"
  }
]
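To act on such output, a thin dispatcher can parse the JSON plan and route each step to a tool function. Here is a minimal sketch with hypothetical stub tools, run against the JSON text the model returns:

import json

def blog(text: str) -> str:        # hypothetical stub for the Blog tool
    return f"Draft social post about: {text}"

def translate(text: str) -> str:   # hypothetical stub for the Translate tool
    return f"[translated] {text}"

TOOLS = {"Blog": blog, "Translate": translate}

def dispatch(model_json: str) -> str:
    """Execute each tool step in the model's JSON plan and return the final output."""
    final_output = ""
    for step in json.loads(model_json):
        tool = TOOLS.get(step.get("tool_name"))
        if tool is not None:
            step["output"] = tool(step.get("input", ""))
        if step.get("agentid") == "final":
            final_output = step.get("output", "")
    return final_output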
If conditions permit, we can integrate the Phi-3.5-MoE model more smoothly into frameworks such as AutoGen, Semantic Kernel, and LangChain.
SLMs do not replace LLMs; rather, they broaden the scenarios where GenAI can run. The Phi-3 update gives more edge devices better support, including text, chat, and vision. In modern AI Agent application scenarios, we want more efficient task execution, and beyond raw computing power, MoEs are a key part of the answer. Phi-3 is still iterating, and we hope everyone will keep following along and give us feedback.
[Available for iOS only, Android support coming soon]
If you already have a Dynamics 365 Field Service, Dynamics 365 Guides, and/or Dynamics 365 Remote Assist license, you can access the new remote assistance capabilities in your mobile Teams app automatically upon release at no additional cost.
Turn any mobile device into a mixed reality collaboration platform
Today, frontline workers use Teams within the Microsoft Dynamics 365 Remote Assist and Guides applications to collaborate with spatial annotations. The Remote Assist mobile app is a popular choice for workers on the go because it’s fast and easy to get anyone on a call, show the task in front of you, and ink your space.
Now, those same workers can quickly access this core functionality directly from the Teams mobile app, as long as they have a Dynamics 365 Field Service license. For workers who are often on the move, having all their core collaboration capabilities in a single app makes the job easier. It eliminates the need to switch apps, while making sure all your collaboration capabilities from Teams are at your fingertips.
No more context switching—stay within the flow of work
Using this feature is as straightforward as joining a Teams meeting or making a call. With the front-facing camera, users can share their view with remote participants. This allows real-time collaboration, relying on 3D annotations overlaid on physical objects to enhance comprehension.
Just like with the Remote Assist app, users can move and change angles without losing track of annotations anchored to their environment. This advanced level of interaction empowers Teams mobile users to share insights and reduce miscommunications that could lead to rework.
Reduce app sprawl by eliminating the need to manage another app
If your company already leverages Teams to facilitate communication and collaboration, why not make it cover more collaborative use cases for frontline workers too? IT administrators don’t need to manage another app to enable remote assistance capabilities for their mobile workforce.
Bringing Spatial Annotations to the Teams mobile app means fewer apps for IT teams to provision, update, and audit. Companies can benefit from Teams’ ability to support end-to-end encryption, data loss prevention, and compliance certifications, adding additional security measures protecting against unauthorized access to confidential company information.
How can I access Spatial Annotations on my mobile Teams app?
The public preview for iOS users is currently rolling out, with public preview for Android users coming later this summer. General availability will come later in 2024.
Infusing mixed reality capabilities into apps workers already use, on devices they already have in their pockets, is just one way we're working to bring mixed reality to frontline workers. We're excited about this next step in democratizing mixed reality and bringing leading-edge mixed reality solutions to more people across industries.