Agentic AI in retail: How Dynamics 365 powers Commerce Anywhere

This article is contributed. See the original author and article here.

Retail Frontier Firms are evolving their operating models to keep pace with increasingly dynamic markets, using AI to support more responsive and resilient decision-making and execution across commerce channels. Rather than improving individual functions in isolation, these organizations are rethinking how commerce operates end to end, enabling AI agents to work alongside people to support faster, more consistent outcomes across the business. This evolution is accelerating as retailers navigate rising customer expectations, sustained margin pressure, volatile demand, and ongoing labor constraints: conditions that benefit from decisions being made and executed more continuously.

In Retail Frontier Firms, AI capabilities are embedded where decisions and value are created: in stores, at the digital shelf, across merchandising, pricing, fulfillment, customer service, and checkout. AI agents interpret signals from customers, inventory, suppliers, and channels and help coordinate actions across the enterprise. This supports retailers as they respond to change with greater speed, consistency, and scale across touchpoints.

This operating model is enabled by agents that share context and operate cohesively across the retail ecosystem. At the core of agentic commerce is Model Context Protocol (MCP), which provides AI agents with access to a shared, enterprise-grade understanding of products, inventory, pricing, policies, and customer intent. By grounding agents in a common business context, MCP helps support aligned, governed, and consistent decision-making across channels and functions. The future of retail is increasingly shaped by a human and AI agent operating model, connected by shared context and open protocols.

  • Model Context Protocol (MCP) unlocks hundreds of thousands of business functions for secure, real-time use by agents, developers, and applications. 
  • Agent Communication Protocol (ACP) enables agents across merchandising, supply chain, store operations, and service to collaborate end to end, helping reduce fragmentation and align execution across functions.
  • Payment and transaction agent protocols extend AI capabilities through checkout and settlement, supporting trusted, compliant transactions across in-store, online, and conversational commerce.

Together, these capabilities support a more outcome-driven operating model focused on availability, margin, conversion, service levels, and loyalty. Humans define strategy, priorities, and guardrails, while AI agents help orchestrate execution across day-to-day operations, supporting modern retail operations designed for Commerce Anywhere.

As consumer expectations continue to rise, shoppers increasingly demand seamless, continuous interactions where they move effortlessly from social-commerce discovery to mobile checkout, in-store pickup, curbside fulfillment, or voice-activated reordering. Frontier retail responds to this shift by dissolving the boundaries between channels and touchpoints, allowing commerce to adapt in real time to customer intent, location, and context. For brands, this means the ability to deliver frictionless, anticipatory commerce at scale by meeting customers wherever they are, with relevance and speed, without adding operational complexity.

The industry is rapidly shifting away from static, siloed channels toward autonomous, context-aware agents that orchestrate buying journeys seamlessly across stores, digital experiences, and conversational interfaces. Agents move beyond traditional personalization. They actively guide product discovery, shape contextual offers, negotiate availability, and coordinate fulfillment, helping to continuously optimize inventory, pricing, promotions, and supply-chain decisions behind the scenes. As personalization and automation become table stakes, agentic AI emerges as the strategic engine driving scalable growth and sustainable Commerce Anywhere.

Introducing Microsoft Dynamics 365 Commerce MCP Server

Agentic commerce introduces a new operating model in which AI agents collaborate through MCP, enabling continuous decision-making and coordinated execution across the retail value chain. The new Dynamics 365 Commerce MCP Server exposes core retail business logic including catalog, pricing, promotions, inventory, carts, orders, and fulfillment as MCP-enabled capabilities. Expected to be in preview in February 2026, this will allow retailers to build agentic commerce experiences where AI agents can securely discover, decide, and execute retail workflows across digital, physical, and conversational channels.
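
As a rough illustration of what calling an "MCP-enabled capability" involves: MCP is built on JSON-RPC 2.0, and an agent invokes a server-side tool by sending a `tools/call` request. The Python sketch below only builds such a request. The tool name and argument names are hypothetical, since this article does not list the tools the Commerce MCP Server exposes.

```python
import json
from itertools import count

# JSON-RPC uses the id to match each response to its request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    "check_inventory_availability",
    {"productId": "SKU-1001", "storeId": "STORE-042", "quantity": 2},
)
print(request)
```

In practice an MCP client library would handle transport, authentication, and tool discovery; the point here is only that each retail capability becomes a named tool with structured arguments.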

By combining the ERP, Analytics, and Commerce MCP servers, Dynamics 365 supports a more agent-driven operating model in which front-office experiences and back-office operations are connected and optimized, helping retailers operate with greater agility and readiness for Commerce Anywhere.

How retailers can begin adopting agentic commerce today

Retail leaders can begin moving toward agentic commerce by adopting AI agents in three practical ways:

  1. Starting with agents embedded in Dynamics 365
  2. Extending capabilities through custom-built agents using MCP
  3. Leveraging partner-built agents across the broader retail ecosystem

Together, these approaches allow retailers to progress at their own pace while aligning agent adoption to their operating model, business priorities, and maturity.

1. Start with agents embedded in Dynamics 365

Purpose-built agents are designed to address common retail challenges and operational friction points. Dynamics 365 agents and retail industry agents can be embedded directly into core business processes, allowing teams to realize value quickly.

Microsoft retail industry agents, like the Catalog Enrichment Agent and Personalized Shopping Agent, are examples of vertical-specific agents designed around retail data models, workflows, and decision patterns. They support scenarios like product discovery, assortment accuracy, and personalized engagement without requiring custom development.

In Dynamics 365 today, the Supplier Communications Agent is a good example of an embedded agent in action. Retailers can proactively monitor supply signals and engage suppliers in real time to confirm availability, align delivery timelines, and respond to changes earlier. This supports faster coordination, fewer surprises, and more reliable execution at scale.

2. Build custom agents using MCP

Retail operations are shaped by business logic that is unique to each organization, driven by merchandising strategies, store formats, service models, and supply-chain constraints. Microsoft Copilot Studio enables retailers to build custom AI agents that encode their own rules across replenishment, allocation, fulfillment, and store execution, aligning agent behavior directly to how the business operates.

These custom agents can operate across planning and selling in the flow of work using MCP-powered access to enterprise systems. Inside Microsoft Teams, Merchandising Managers and Planners can collaborate in real time with agents that access products, demand forecasts, supplier relationships, inventory, and pricing through Dynamics 365 ERP MCP.

On the selling side, through the Dynamics 365 Commerce MCP Server, your custom agents can extend intelligence into customer experiences. Agents can discover products, personalize offers, assess availability, reserve inventory, and complete transactions across digital, physical, and conversational channels while operating with a unified view of pricing, promotions, and fulfillment.

3. Extend agentic commerce through partners and the ecosystem

Retailers can further accelerate agent adoption by leveraging partner-built agents designed for specific retail scenarios and industries. Commerce MCP enables software development companies and system integrators to build agents more quickly by reducing integration overhead, standardizing access to retail data, and maintaining enterprise-grade trust and compliance.

Early partner solutions already demonstrate the breadth of what’s possible, from store associate productivity and clienteling to conversational commerce and business-to-business (B2B) buying experiences, including:

  • Amicis: The Store Commerce Agent is a voice-first, screen-aware assistant designed for in-the-moment store execution. It can help associates complete high-friction tasks like returns, exchanges, order lookups, and policy checks in Dynamics 365 Commerce using natural voice commands, while adapting to what’s on the POS screen.
  • Evenica: The B2B Licensee Product Request Agent uses conversational AI and image recognition to support licensees in finding beverage products. When a product is not available in the catalog, the agent can create a request case to support the product intake process.
  • Argano: The Retail Clienteling Agent offers a conversational clienteling experience by bringing together customer insights, product data, and agentic AI into a single, governed workflow. It helps retail associates improve customer relationships by delivering personalized, brand-aligned interactions before, during, and after in-store appointments.
  • Sunrise: The Commerce Companion is a suite of retail agents that help simplify everyday store operations, from inventory and fulfillment to purchasing and store processes. Using natural language, it is designed to deliver fast, accurate answers and guided actions, which can enable associates to serve customers efficiently while keeping operations moving smoothly.
  • Visionet: FashionGPT Agent can turn natural-language shopping intent into real-time retail execution across product, pricing, inventory, and promotions. It drives the end-to-end shopping journey and helps turn conversations into measurable actions across channels.

Together, embedded agents, custom-built agents, and partner solutions give retailers flexible entry points into agentic commerce supporting near-term impact while laying the foundation for a more adaptive, AI-enabled operating model across Commerce Anywhere.

Agentic retail with Dynamics 365 in action at NRF 2026

At NRF, we will demonstrate how Dynamics 365 works with Copilot and agentic capabilities to support Commerce Anywhere and more efficient, end-to-end retail operations. We will share examples of how retailers are using Dynamics 365 to evolve their operating models and advance Frontier Firm capabilities.

Visit us during NRF expo hours at Level 3, Booth 4503, and join the related theater sessions at our booth:

  • Beyond the Boutique: How Frette Uses AI to Transform Store Experience
    January 11, 2026 (Sunday), 2:00 PM ET
    Session led by Sunrise Technologies
  • Reimagine retail business processes with Agentic ERP
    January 13, 2026 (Tuesday), 2:30 PM ET

The future of retail belongs to frontier organizations that can sense, decide, and act in real time. With agentic commerce enabled by Dynamics 365, retailers gain the foundation to move faster with confidence, aligning strategy, execution, and customer experience through intelligent agents that operate seamlessly across every channel. We look forward to connecting with you in New York and exploring how agentic business applications in Dynamics 365 can support your next step forward.

The post Agentic AI in retail: How Dynamics 365 powers Commerce Anywhere appeared first on Microsoft Dynamics 365 Blog.

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.

What’s new in Microsoft Copilot Studio: November 2025

November 2025 was a busy month for Microsoft Copilot Studio, marked by major announcements at Microsoft Ignite 2025 and a wave of new features now rolling out to makers.

The post What’s new in Microsoft Copilot Studio: November 2025 appeared first on Microsoft 365 Blog.

Dynamics 365 sets the bar for agentic sales qualification on new benchmark

In October 2025, we announced the general availability of the Sales Qualification Agent (SQA) in Dynamics 365 Sales, a breakthrough in autonomous lead qualification. Sales Qualification Agent empowers sellers by helping build higher-quality opportunities while eliminating tedious, repetitive work. Sales Qualification Agent autonomously researches every lead, initiates personalized outreach, and engages prospects to understand purchase intent, ensuring that sellers spend their time meeting prospects who are ready to take the next step. With modes enabling both seller-driven and fully autonomous qualification, the agent supports a key goal for sales organizations: increasing revenue per seller.

Customers are using Sales Qualification Agent in two ways: 

  1. Helping boost revenue beyond current sales capacity
    • Responding to inbound leads within minutes instead of days, increasing response rates and in turn, qualified opportunities.
    • Engaging leads that sellers are unable to follow up on due to capacity constraints, or those deemed economically unviable to pursue.
    • Increasing pipeline quality by focusing the seller’s time on a handful of high intent, engaged leads recommended by the agent.
  2. Helping reduce sales costs
    • Reducing back-office costs related to lead research and validation, using Sales Qualification Agent in “Research only” mode to hand-off only the leads that meet the ideal customer profile criteria.
    • Automatically disqualifying low-quality leads, saving hours of seller time during the week.

Continuing to benchmark the quality of sales AI agents

Microsoft is building the future of agentic sales technology with prebuilt AI agents, such as Sales Qualification Agent, the Sales Research Agent, and the Sales Close Agent available in Dynamics 365.

At Microsoft, we’re committed to delivering quality, trust, and transparency with our agents, and that requires rigorous evaluation. As we continue to build new agents and improve existing ones for critical sales workflows, evaluation benchmarks provide a structured and transparent way for our customers to measure quality for the jobs the agent does.

Today, we’re announcing the Microsoft Sales Bench—a new collection of evaluation benchmarks designed to assess the performance of AI-powered sales agents across real-world scenarios. This framework brings together purpose-built metrics, hundreds of sales-specific scenarios, and composite scoring validated by both human and AI judges.

The Sales Bench isn’t starting from scratch. It now formalizes and expands what began with the Sales Research Bench, published on October 21, 2025, which evaluates how AI solutions answer business research questions for sales leaders.

Today, we’re extending the Microsoft Sales Bench with a second benchmark: the Microsoft Sales Qualification Bench, focused on measuring how effectively AI agents qualify leads and generate high-quality pipeline.

Introducing the Sales Qualification Bench for lead qualification

The Microsoft Sales Qualification Bench evolved from rigorous evaluations we have conducted since the Sales Qualification Agent’s public preview in April, in partnership with customers from a diverse set of industries, with the goal of objectively measuring quality as we further developed the agent. Since the preview, we have measured every update against these standards to confirm that improvements are real and repeatable.

We generated a synthetic dataset of 300 leads, modeled after companies from three different industries, with attributes such as name, company, and email ID. These attributes are representative of what sales teams typically work with before any enrichment or hygiene is performed. In addition to these typical attributes, we added key knowledge inputs such as the value proposition of the products being sold, customer case studies, and documentation for answering customer questions.
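
To make the shape of these inputs concrete, here is a minimal Python sketch of one evaluation record. The class and field names are assumptions chosen to mirror the attributes listed above; the actual dataset schema is not published in this article, and the sample values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lead:
    """Raw lead attributes, before any enrichment or hygiene."""
    name: str
    company: str
    email: str


@dataclass
class SellerKnowledge:
    """Knowledge inputs supplied alongside the leads."""
    value_proposition: str
    case_studies: list[str] = field(default_factory=list)
    documentation: list[str] = field(default_factory=list)


@dataclass
class EvaluationInput:
    """Everything a system under test receives for one lead."""
    lead: Lead
    knowledge: SellerKnowledge


# Illustrative values only; not taken from the benchmark dataset.
sample = EvaluationInput(
    lead=Lead(name="Jordan Lee", company="Contoso", email="jordan.lee@contoso.example"),
    knowledge=SellerKnowledge(value_proposition="Reduce tooling downtime"),
)
print(sample.lead.company)
```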

In addition to Sales Qualification Agent, we used the evaluation framework to measure ChatGPT by OpenAI on the same dataset. Since we didn’t have access to an autonomous agent from OpenAI, we mimicked how a human seller would use ChatGPT to recreate the three key jobs SQA performs. We provided each system (Sales Qualification Agent and ChatGPT) with the exact same lead inputs, knowledge sources, and contextual signals under controlled evaluation configurations. We used a ChatGPT Pro license with GPT-4.1. This model is the closest match to, and slightly more capable than, Sales Qualification Agent’s GPT-4.1 mini, which we intentionally chose to deliver optimal quality at a lower cost per lead than newer models. Additionally, the Pro license was chosen to optimize for quality: ChatGPT’s pricing page describes Pro as “full access to the best of ChatGPT.”1

The framework evaluates outputs from the three jobs across Sales Qualification Agent and ChatGPT:

  • Research: Company research for the lead—background, strategic priorities, financial health, and latest news.
  • Outreach: A personalized email generated based on research, to make initial contact with the lead.
  • Engagement: The agent’s conversation with a lead until it’s qualified or dispositioned.

Our scoring metrics span core quality (accuracy, relevance, completeness), trustworthiness (grounding and citations), and business-specific success criteria (e.g., relevancy of company research to highlight interest in the seller’s offerings, personalization of the initial outreach emails sent to catch the lead’s attention, accuracy of responses to the lead’s questions to drive purchase intent, and the timing of handoff to a seller when the lead is ready to engage).

Outputs were scored independently by both human reviewers and an LLM judge built with GPT-5.1, using a 1–10 scale for each metric. These metric-specific scores were then rolled up using a simple average to produce a composite quality score. The result is a rigorous benchmark presenting a composite score and dimension-specific scores to reveal where agents excel or need improvement. Our methodology, metrics, and their definitions are described in this technical blog.
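
The roll-up described above can be sketched in a few lines of Python. This is a minimal illustration that assumes each metric contributes equally and is scored on the 1–10 scale mentioned in this post; the metric names and scores are placeholders, not benchmark results.

```python
from statistics import mean


def composite_score(metric_scores: dict[str, float]) -> float:
    """Roll metric-level scores up into one composite using a simple average."""
    if not metric_scores:
        raise ValueError("at least one metric score is required")
    for name, score in metric_scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"{name} score {score} is outside the 1-10 scale")
    return mean(metric_scores.values())


# Placeholder scores for illustration; not benchmark results.
example = {"accuracy": 8, "relevance": 7, "completeness": 9}
print(composite_score(example))
```

A simple average weights every metric equally, which keeps the composite easy to interpret but means a weak score on one dimension can be masked by strong scores elsewhere. That is why the dimension-specific scores are reported alongside it.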

Results

In evaluations completed on December 4, 2025, using the Sales Qualification Bench, Sales Qualification Agent outperformed ChatGPT on each of the three jobs required for sales qualification:

  1. Research: The Sales Qualification Agent outperformed ChatGPT with 6% higher aggregate scores, leading on relevancy and completeness in research results that highlighted the lead company’s interest in the seller’s offerings.
  2. Outreach: Sales Qualification Agent demonstrated 20% better results compared to ChatGPT, generating email drafts with accurate personalization and mentions of relevant recent events that will resonate with the lead.
  3. Engagement: Sales Qualification Agent’s email responses to engage a lead over a multi-turn conversation scored 16% higher than ChatGPT’s. SQA generated emails that responded to the lead’s questions with accurate answers that develop their purchase interest and with precise discovery questions that qualify the lead before handing off to a seller.
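
The percentages above compare scores between two systems. This post does not state the exact formula, so the Python sketch below assumes the common relative-difference reading: the gap between the two scores divided by the baseline score. The input values are placeholders, not the published benchmark scores.

```python
def relative_improvement(candidate: float, baseline: float) -> float:
    """Percentage by which `candidate` exceeds `baseline` (assumed formula)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (candidate - baseline) / baseline * 100


# Placeholder composite scores; not the published benchmark values.
lift = relative_improvement(candidate=6.0, baseline=5.0)
print(round(lift, 1))
```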

In addition to performing better on these metrics, Sales Qualification Agent has the ability to run autonomously, which can help significantly reduce the time spent generating pipeline while helping sales teams build better quality pipeline.

Sales Qualification Agent scores well on these three jobs because it is optimized for sales-specific scenarios and uses the following techniques to get great results:

  1. It uses agentic Retrieval Augmented Generation (RAG) to relentlessly research each lead, ensuring greater completeness. More on this in the following section.
  2. With knowledge of what the company sells, it can contextualize every workflow to increase relevancy for both the seller and the lead.
  3. It can retrieve organizational knowledge from attached documents and internal repositories like SharePoint with greater precision, boosting accuracy of its responses when engaging with the lead.
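
The first technique, agentic RAG, differs from single-shot retrieval in that the agent keeps searching until its research brief is complete. The Python sketch below shows only that control loop, under stated assumptions: the function names are hypothetical, the search tool is a stub, and the query-rewriting step is a fixed string rather than a model call. It is not the agent's actual implementation.

```python
from typing import Callable


def agentic_research(
    topics: list[str],
    search: Callable[[str], list[str]],
    max_rounds: int = 3,
) -> dict[str, list[str]]:
    """Search repeatedly until every topic has findings or the budget runs out."""
    findings: dict[str, list[str]] = {topic: [] for topic in topics}
    for round_number in range(max_rounds):
        missing = [topic for topic, hits in findings.items() if not hits]
        if not missing:
            break  # every topic is covered, so stop early
        for topic in missing:
            # A real agent would have the model rewrite the query based on
            # what earlier rounds returned; here we simply broaden it.
            query = topic if round_number == 0 else f"{topic} latest update"
            findings[topic].extend(search(query))
    return findings


def fake_search(query: str) -> list[str]:
    """Stand-in for a web search tool; only some queries return results."""
    canned = {
        "financial health": ["Annual report summary"],
        "latest news latest update": ["Press release on new plant"],
    }
    return canned.get(query, [])


brief = agentic_research(["financial health", "latest news"], fake_search)
print(brief)
```

In this run, the first round finds nothing for "latest news", so the loop retries with a reformulated query in the second round. That retry is what lets an iterative approach fill gaps a single retrieval pass would leave empty.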

The technical blog details which metrics SQA excels at relative to ChatGPT, where it falls short, and why.

Translating evals to real-world impact

Running evals led to major Sales Qualification Agent improvements during its six-month preview. Early results prompted us to try agentic AI design patterns, especially agentic RAG, which improved our company research by allowing iterative web searches and real-time reasoning. They also led us to enhance data coverage by auto-linking existing CRM records to each lead and inferring company names from lead emails. These updates provided sellers with deeper insights, revealing strategic opportunities and risks beyond basic facts.
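
One of the data-coverage steps mentioned above, inferring a company from a lead's email, can be approximated with a simple heuristic. The Python sketch below is an assumption about how such a step might look, not the agent's actual logic: it returns the email domain unless the address belongs to a consumer mailbox provider. The provider list is a small illustrative sample.

```python
from typing import Optional

# Assumed, non-exhaustive list of consumer mailbox providers.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com"}


def infer_company_domain(email: str) -> Optional[str]:
    """Return the organization's domain from a lead email, if it looks corporate."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or "." not in domain:
        return None  # not a usable email address
    if domain in FREE_MAIL_DOMAINS:
        return None  # personal mailbox: the domain says nothing about the employer
    return domain


print(infer_company_domain("Pat.Kim@Contoso.example"))
print(infer_company_domain("pat.kim@gmail.com"))
```

The returned domain could then serve as a key for matching existing CRM account records or as a seed query for company research.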

For instance, when researching leads for a security company, Sales Qualification Agent can link news on recent cyberattacks to increased demand for its software. As highlighted in the technical blog, research synthesized by the agent makes such inferences more consistently than ChatGPT. Enhancing the agent’s research also improved the relevance and personalization of outreach emails, helping agents better engage leads and clarify their ability and intent to purchase before handing them off to sellers.

Sandvik Coromant, a leader in precision cutting tools, partnered with us to pilot Sales Qualification Agent for their Digital Commerce program. After the updates, Pia Cedendahl, Global Sales Manager for Strategic Channels/Partners and Online Sales, noted, “Sales Qualification Agent’s answers became far more on-point to our business—it’s like having a research assistant that already understands what we care about.” Sandvik Coromant saw improved lead conversion and higher engagement from their Digital Account Managers, validating the impact of our evaluation-driven approach. Pia joined Microsoft leaders at the Microsoft Ignite 2025 session, “Accelerate revenue and seller productivity with agentic CRM,” where she shared how the team saved more than 120 hours and $19,000 in just the first three weeks since launching a pilot, and forecasted a 5% increase in revenue with full rollout.

Better insights, more personalization, proven value

Equipped with agentic AI design and backed by data-driven evaluation, customers can confidently use the Sales Qualification Agent so that:

  • Sellers receive comprehensive company overviews, timely news highlights, and actionable recommendations that are consistently delivered with high quality—drawing a clear line from insight to action.
  • Sales leaders can expand their qualified pipeline cost efficiently, with the agent ensuring high lead quality.
  • Prospects benefit from more personalized outreach, enhancing their experience and supporting increased conversion rates.

What’s next

We’ll continue to refine Sales Qualification Agent using agentic design patterns, aiming to make every improvement measurable and meaningful. Stay tuned for the full evaluation results and methodology for the Sales Qualification Bench, which will be published for transparency and reproducibility. We also intend to add more evaluation frameworks and benchmarks to the Microsoft Sales Bench collection including benchmarks that cover future sales agent capabilities.


1. ChatGPT pricing page, accessed November 24, 2025.

The post Dynamics 365 sets the bar for agentic sales qualification on new benchmark appeared first on Microsoft Dynamics 365 Blog.

Sales Qualification Agent: How we evaluated and improved AI quality with benchmarks

The Sales Qualification Agent (SQA) in Dynamics 365 Sales introduces a new class of autonomous sales AI, one that does far more than assist with drafting or summarization. SQA performs multi-step reasoning, conducts live web research, generates personalized outreach, and engages prospects in multi-turn qualification conversations. These capabilities directly shape pipeline quality, seller productivity, and customer relationships. 

As agentic AI becomes deeply embedded in revenue-critical workflows, trust must be earned through transparent, repeatable, and rigorous evaluation—not anecdotal wins or point demos.

Today, we’re announcing the Microsoft Sales Bench—a collection of evaluation benchmarks designed to assess the performance of AI-powered sales agents across real-world scenarios. Adding to the Sales Research Bench already published as part of this collection to evaluate Sales Research Agent, today we are also publishing the Sales Qualification Bench to evaluate Sales Qualification Agent in Dynamics 365 Sales.

This post presents the detailed evaluation methodology and results for the agent, including a head-to-head comparison against ChatGPT using identical data, tasks, and scoring rubrics. These efforts establish the first benchmark purpose-built to measure end-to-end sales agent workflows, from research to outreach to live qualification.

SQA Architecture  

The Dynamics 365 Sales Qualification Agent (SQA) architecture is designed as an end-to-end, enterprise-grade AI system that autonomously researches leads, synthesizes insights, and generates seller-ready outreach. It combines an intelligence engine powered by large language models with iterative web and enterprise data research, tightly integrated with Dynamics 365 Sales and Microsoft Copilot Studio for orchestration. Built on secure enterprise foundations, the architecture enforces governance, compliance, and data protection while enabling scalable, trustworthy AI-driven sales workflows. 

Evaluation Metrics and Methodology 

To understand how well the Sales Qualification Agent (SQA) performs in real-world sales qualification workflows, we designed the Sales Qualification Bench, a comprehensive evaluation that mirrors how sellers actually research leads, personalize outreach, and engage with prospects. Our goal was straightforward: measure whether SQA can help reps qualify faster, personalize more effectively, and carry higher-quality customer conversations—using the same signals and information they rely on every day. 

To ensure that the evaluations accurately represent real-world conditions, we developed a testbed that closely mirrors the complexity and ambiguity found in contemporary sales environments. This allowed us to evaluate SQA end to end, from autonomous research and reasoning to grounded, actionable research briefs, outreach messages, and multi-turn qualification conversations. 

Evaluation Setup

To ensure real-world fidelity, we constructed a production-like lead evaluation environment that mirrors how SQA operates in Dynamics 365 Sales. 

Lead and Data Corpus 
  • Three synthetic but realistic seller companies (C1) across distinct industries, with unique: 
    • Product offerings 
    • Knowledge sources 
    • Ideal customer profiles 
  • 300+ lead dataset (C2) expanded into a scenario-rich corpus: 
    • Companies across 6 global regions (North America, Europe, Asia, South America, Australia, Africa) 
    • 33 industries 
    • Mixed clarity (well-known brands and long-tail companies) 
    • Structured attributes (name, role, email) 
  • CRM roles represented
    • Sales representatives 
    • Digital specialists 
    • Customer success managers 
    • Each linked to relevant accounts, opportunities, and cases 
  • Company segment coverage
    • Enterprise 
    • Mid-Market 
    • Small Business 
    • Government 
    • Education 
  • 500+ email exchanges simulating real sales interactions: 
    • Technical product questions 
    • Meeting requests 
    • Ambiguous or low-intent inquiries

Simulated Agent Workflows

All evaluations reflected real SQA behavior: 

  • Autonomous web-based research 
  • Role-aware outreach generation 
  • Multi-turn qualification conversation handling

Tasks Evaluated and Evaluation Metrics

1. Company Research

For each lead, the agent generates a structured research brief including: 

  • Business overview, strategy and priorities 
  • Financial signals 
  • Recent news relevant to the seller

Metric | Definition
Recency | Measure of how recent time-sensitive insights are relative to the current date (older insights are not as useful for sellers).
Relevance and Solution Fit | Measure of how well the insights are tied back to sellers’ offerings (relevant insights are more actionable than a regurgitation of facts) and articulate the lead company’s need for or interest in them.
Completeness | Measure of how well the insights capture all the facts that are useful to a seller.
Reliability | Measure of how consistently the agent finds useful insights for the seller (e.g., strategic priorities return current strategic priorities and not generic mission statements, news returns news articles and not generic evergreen statements about a company).
Credibility | Measure of how reputable the sources referenced by the agent are.

2. Lead Outreach

Based on its research, the agent generated a personalized email aligned to: 

  • The lead’s role 
  • The seller’s value proposition 
  • The company’s business context 
  • Value-based positioning

Metric | Definition
Clarity | Assesses how clear, precise, and jargon-free the message is, ensuring every sentence adds value.
Personalization | Measures how well the email is tailored to the specific target company, using concrete company-level details rather than generic industry language.
News-anchored opening | Checks whether the email references recent company events or updates, ensuring the outreach feels timely and current.
Relevance and Solution Fit | Measure of how well the insights are tied back to sellers’ offerings/solutions (relevant insights are more actionable than a regurgitation of facts), and articulate the lead company’s need or interest in them.
Structure | Evaluates whether the email has a clear logical flow from opening hook to problem, solution, and call to action.

3. Qualification Conversations (Engage)

The agent then autonomously engages back and forth with the lead, progressively asking them questions for customer-configured qualification criteria such as budget, need, and timeline and answering the lead’s questions such as: 

  • “What does your solution do?” 
  • “How are you priced?” 
  • “How do you compare to competitors?” 
  • “Who else uses this?” 

Metric  Definition 
Answer Quality  Assesses whether the agent provides clear, relevant, and complete answers that directly address the customer’s intent. 
Agent Comprehension  Evaluates how well the agent understands customer intent, prioritizes requests, and adapts tone and strategy based on the user’s response. 
Answer Readability  Checks that responses are natural, professional, easy to read, and fully compliant with formatting and content rules
Human handoff accuracy  Ensures the agent correctly flags when human intervention is required, such as for unanswered technical questions, legal/billing requests, meeting requests, or explicit requests for a human. 
Discovery question coverage  Measures how effectively the agent qualifies leads using indirect, strategic discovery questions across Need, Budget, Authority, and Timeline

Each metric is scored independently on a 0–10 scale, where higher scores indicate stronger performance. We used an LLM-as-a-judge approach to score outputs against the ground truth and rubric, and we manually reviewed a sampled subset of evaluations to calibrate the judges and validate scoring consistency. To reduce judge variance and mitigate hallucination risk, each sample was evaluated five times, and the mean across runs was recorded as the final score. 
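The aggregation step described above can be illustrated with a short Python sketch. It is not the SQA team's implementation: the function name `aggregate_judge_scores`, the spread calculation, and the sample scores are all hypothetical, and only the 0–10 range, the five runs, and the use of the mean come from the description above.

```python
from statistics import mean, pstdev

NUM_RUNS = 5                 # each sample is judged five times
MIN_SCORE, MAX_SCORE = 0, 10 # each metric uses a 0-10 scale


def aggregate_judge_scores(run_scores):
    """Combine repeated judge scores for one sample and one metric.

    Returns (final_score, spread): the mean across runs is the final
    score; the population standard deviation is an extra, assumed
    diagnostic for spotting samples where the judge disagrees with itself.
    """
    if len(run_scores) != NUM_RUNS:
        raise ValueError(f"expected {NUM_RUNS} runs, got {len(run_scores)}")
    for score in run_scores:
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score {score} is outside the 0-10 scale")
    return mean(run_scores), pstdev(run_scores)


# Hypothetical scores from five judge runs on a single outreach email.
final_score, spread = aggregate_judge_scores([7, 8, 8, 7, 9])
print(f"final={final_score:.2f} spread={spread:.2f}")
```

Tracking the spread alongside the mean is one way to decide which samples deserve manual review during judge calibration.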

Benchmarking Strategy with ChatGPT 

To ensure an objective and fair comparison, we replicated a standard seller workflow in the ChatGPT UI using GPT-4.1 with a Pro license, a more advanced model than the GPT-4.1-mini variant currently used by SQA. 

Standard Prompting 

This setup simulates how a seller naturally interacts with a general-purpose LLM: 

  • High-level contextual instructions only 
  • Mirrors SQA’s autonomous research-to-outreach flow 

This ensures: 

  • Workflows remain representative and unbiased 
  • Comparisons reflect real-world usability, not prompt-engineering skill 

Identical Knowledge Sources and Context 

ChatGPT was given the exact same knowledge sources as SQA, including: 

  • Full lead information and seller value proposition 
  • Seller Q&A documentation via the SharePoint connector 
  • Historical conversation context for reply generation 

This isolates differences in agent reasoning and orchestration, not data access. 

Evaluation Results  

Microsoft evaluated the Sales Qualification Agent (SQA) and ChatGPT on over 300 leads, covering research, outreach, and qualification tasks with identical knowledge sources. Evaluations completed on December 4, 2025, showed that SQA consistently outperformed ChatGPT (GPT-4.1).

  • Research: SQA was 6% more effective at relevant, thorough company research. 
  • Outreach: SQA was 20% better at personalized communication and timely event references. 
  • Engagement: SQA scored 16% higher for precise responses and targeted qualifying questions. 
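The article does not say how these percentages were derived from the 0–10 metric scores. As a minimal sketch, assuming they are relative lifts of SQA's mean score over ChatGPT's mean score, the arithmetic would look like the following. The function name and the input values are hypothetical and are not taken from the benchmark.

```python
def relative_lift(candidate_mean, baseline_mean):
    """Percentage by which the candidate's mean score exceeds the baseline's.

    Assumption: "X% better" is read as a relative difference of mean
    scores, not as a difference in scale points.
    """
    if baseline_mean <= 0:
        raise ValueError("baseline mean must be positive")
    return (candidate_mean - baseline_mean) / baseline_mean * 100.0


# Hypothetical category means on the 0-10 scale (not benchmark data).
candidate, baseline = 8.4, 7.0
print(f"relative lift: {relative_lift(candidate, baseline):.1f}%")
print(f"absolute gap:  {candidate - baseline:.1f} points")
```

Under this reading, the same gap in scale points produces a larger percentage when the baseline score is lower, so percentages are easiest to interpret alongside the underlying means.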

SQA also operates autonomously, reducing overhead and boosting pipeline quality for sales teams. 

Results by Task Category 

1. Company Research 

SQA was 6% better than ChatGPT, winning on its ability to perform more relevant and complete research that highlighted the lead company’s interest in the seller’s offerings: 

  • SQA provided more relevant results: To spend their time on the most important leads, sellers need to determine whether a lead is a good fit for their offerings. While both SQA and ChatGPT were given the same context (seller company and value proposition of the offerings), SQA consistently did better at tying its research back to this context, helping sellers determine fit. Appendix A shows an example where SQA tied the company’s strategic priorities to its need for a collaboration platform and inferred strong purchasing ability from its robust operational health and minimal leverage burden.
  • SQA synthesized results with a higher level of fidelity and completeness: The agent’s value is directly tied to its ability to eliminate tedious work for the seller. SQA produced a more detailed research synthesis (as demonstrated in Appendix A), giving the seller a single, trusted source for the insights they may need.  

These results stem from numerous experiments aimed at optimizing web research for the best outcomes at minimal cost, rather than relying on costly advanced models. Sellers get deeper insights from SQA’s agentic RAG, which reasons in real time over iterative web search results, combined with unique capabilities that increase data coverage, such as auto-linking CRM records and extracting the company name from lead emails. 

2. Personalized Outreach 

SQA was 20% better than ChatGPT, notably ahead in the level of personalization and mentions of relevant recent events that will resonate with the lead. 

  • More personalization and customer-centricity: A lead is more likely to respond to a cold outreach email that directly explains how the seller’s offering can address their needs. SQA did so effectively by starting with the lead’s situation and recent events, while ChatGPT often focused on the seller and used heavier technical jargon. A clear, actionable call to action closes the email and guides the conversation forward. Appendix B shows an example of how SQA was able to tie a recent acquisition the lead’s company made to the value proposition of the seller’s offering. 

These results are based on direct engagement with sellers: every sales team that deploys SQA gives us valuable feedback that benefits all other customers.   

3. Qualification Conversations (Engage) 

SQA was 16% better than ChatGPT. It responded with greater precision to the lead’s questions to develop purchase interest and asked pointed discovery questions to better qualify the lead before handing off to a seller. 

  • Answers accurately by correctly understanding the lead’s intent and maintaining conversation context effectively. To drive deeper buyer consideration, SQA independently answered even the most technical questions that leads had about the seller’s offerings while maintaining the context from earlier messages in the simulated conversation, delivering clear, direct, and well-structured responses. Appendix C demonstrates SQA’s ability to pull the most relevant information from provided knowledge sources (in this case, files with technical specifications) during an ongoing conversation with a lead. 
  • Handles uncertainty responsibly, handing off to a supervisor/seller when appropriate. Both SQA and ChatGPT were instructed to hand off a lead to a supervising seller when a suitable response could not be generated or when the lead was considered qualified per predefined criteria. SQA handed off accurately and at the right moment in more tests than ChatGPT.  
  • Demonstrates strong discovery coverage. To maximize the value exchange from each follow-up conversation with the lead, SQA and ChatGPT were instructed to include discovery questions in their responses to assess pre-configured qualification criteria (covering the lead’s need, budget, buying authority, and purchase timeline). SQA asked pointed discovery questions that covered more of these criteria than ChatGPT in our simulated conversations. As a result, SQA identified and handed off better-qualified leads through its engagement.
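The handoff and discovery-coverage behaviors described above can be made concrete with a small Python sketch. It only illustrates the stated rules (hand off when no suitable response can be generated or when the lead meets the predefined criteria, and cover need, budget, authority, and timeline). The names `should_hand_off`, `discovery_coverage`, and `CRITERIA` are hypothetical, as is the assumption that a lead is "qualified" once every criterion is met. None of this reflects SQA's internal logic.

```python
# The four qualification criteria named in the article.
CRITERIA = ("need", "budget", "authority", "timeline")


def discovery_coverage(criteria_asked):
    """Fraction of qualification criteria touched by discovery questions."""
    covered = set(criteria_asked) & set(CRITERIA)
    return len(covered) / len(CRITERIA)


def should_hand_off(can_answer, criteria_met):
    """Decide whether to pass the lead to a supervising seller.

    Hands off when the agent has no suitable response, or when the lead
    meets every criterion (an assumed definition of "qualified").
    """
    is_qualified = all(c in criteria_met for c in CRITERIA)
    return (not can_answer) or is_qualified


# Hypothetical conversation state after two exchanges with a lead.
asked_so_far = {"need", "budget"}
print(discovery_coverage(asked_so_far))                        # 0.5
print(should_hand_off(can_answer=True, criteria_met={"need"})) # False
print(should_hand_off(can_answer=False, criteria_met=set()))   # True
```

A benchmark harness could compare decisions like these against labeled ground truth to score handoff accuracy and discovery coverage for each simulated conversation.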

These gains are attributable to the hard lessons we have learned through close collaboration with customers to understand the diversity of needs around intent detection and knowledge retrieval across AI agents for Dynamics 365 Sales, Service, and Contact Center applications.  

Conclusion 

Sales Qualification Agent (SQA) is not a simple productivity tool—it is a complex multi-step agent directly influencing revenue outcomes. The Sales Qualification Bench represents a foundational step toward enterprise-grade trust, transparency, and continuous quality improvement for agentic AI in sales. 

Through rigorous, scenario-driven benchmarking, SQA demonstrated clear leadership across research, outreach, and qualification—while also surfacing concrete architectural and rule-quality investments for the next phase of growth. 

Upcoming plans for the Sales Qualification Bench include using the benchmark for continuous improvement of the Sales Qualification Agent, running further comparisons against a wider range of competitive offerings, and publishing the eval package so customers can reproduce it on their own data. Evaluation is not a one-time event. It is now embedded as a first-class engineering discipline in the SQA development lifecycle—ensuring the agent evolves safely, measurably, and in lockstep with seller needs. 

Appendix

Sample inputs and responses for each category  

Appendix A. Sample of research synthesis generated by SQA
Task 
Find and summarize business overview, strategic priorities, financial summary, and recent news. C1: Microsoft Teams C2: Agência Nacional de Águas e Saneamento Básico 
Research Response 
Overview Agência Nacional de Águas e Saneamento Básico (ANA) is a Brazilian federal regulatory agency in the public sector, focused on water resources management and basic sanitation. It operates nationwide with headquarters linked to the Ministry of Integration and Development. ANA is responsible for regulating the use of water resources, managing the security of dams, overseeing public irrigation, and establishing national standards for basic sanitation services. The agency serves primarily government entities and industry—a B2B context, as it sets legislation and frameworks for critical infrastructure projects. Its mission centers on sustainable water resource management for development and environmental security. This background is highly relevant to Microsoft Teams’ value proposition, as ANA’s nationwide role and collaboration among stakeholders (internal teams, other agencies, private partners) demand robust, secure digital communication and collaboration platforms like Teams, which would support ANA’s operations, regulatory projects, and document governance requirements.  

Strategic priorities ANA’s current strategic priorities include strengthening climate resilience, advancing national water governance, and digitalizing monitoring infrastructure to improve the management of Brazil’s water resources. The agency has prioritized participation in international programs, digital transformation in water resource management, and enhancing collaboration tools for more efficient regulatory processes. It seeks integrative technology for better data sharing, persistent project documentation, and cross-functional coordination among regulatory, legal, and operational teams. These needs align closely with Microsoft Teams’ capabilities: integrated communication, persistent chat channels, seamless meetings, and data governance tools are well-suited to ANA’s initiatives and pain points around digital collaboration, transparency, and security in government operations.  

Financial summary Agência Nacional de Águas e Saneamento Básico (ANA) demonstrates stable financial health as a federally funded autarchy, with annual revenues sourced exclusively from government allocations. Its updated budget stands at R$458.63 million, with R$416.56 million in current revenues and R$277.8 million in expenses paid. The agency has executed approximately 89% of its forecasted budget with minimal reliance on outside debts or capital markets, indicating low financial risk and cost-effective operations. For a seller like Microsoft Teams, ANA shows consistent funding and robust operational execution, supporting strong purchasing capacity for communication and collaboration products and services.  

News 
Agência Nacional de Águas e Saneamento Básico (ANA) launches new Digital Transformation Plan for 2025–2027: Published on 16/09/2025, ANA’s new Digital Transformation Plan prioritizes digital advancement for the next three years. This aligns with the value proposition of Microsoft Teams, as modernization initiatives may involve upgrading communication and collaboration infrastructure, making ANA potentially receptive to integrated solutions like Teams. 

ANA publishes revised Information and Communication Technology Master Plan (PDTIC) for 2024–2026: On 24 February 2025, ANA approved updates to its Master Plan for IT and Communication, emphasizing ongoing activities by its IT department. For Microsoft Teams, this is relevant as the procurement and deployment of communication platforms may form part of the plan’s priorities and budget.  

ANA releases 2025–2026 regulatory agenda with focus on digital processes and operational standards in water and sanitation: The new regulatory agenda, published 12/12/2024, includes digital governance, public-private partnerships, and operational standards for water and sanitization. Growing focus on digital solutions to support these agendas could signal interest in platforms that enable cross-team alignment, document management, and virtual collaboration, matching Microsoft Teams capabilities.  

Appendix B. Sample of an initial outreach email generated by SQA 

Task 
Write a cold sales outreach email to a lead. Don’t include a sign-off or signature in the email body. 
Lead ID: `d3f74f84-7e2b-4c16-9c37-919c2b8c2c6e` 
Responses 
Hi Akira, 

I noticed the Japan Meteorological Agency’s recent procurement notice for a cloud-based meteorological communication system, highlighting your focus on secure and efficient collaboration.  

Microsoft Teams can support JMA’s mission by consolidating chat, video calls, and file sharing into a single platform, enabling seamless cross-team collaboration on forecasts and critical data. This integrated approach aligns with your strategic priorities of disaster prevention and timely information sharing across multiple locations.  

Would you be available for a quick call in the next few days to discuss further?  
 

Appendix C. Sample of an engagement email generated by SQA 

Task 
Last Customer Reply “We’re constantly battling fragmented conversations—Slack channels get out of control, and switching to Teams for meetings is just more friction. Documents get lost between email threads and random cloud folders. Sales and marketing are rarely on the same page, especially when we’re chasing deals. Honestly, half the team just ends up texting when things get urgent. If you can solve that, I’m listening”  
Response 
Hi Liam, 

Thanks for sharing the specific challenges your team faces with fragmented conversations, switching between Slack and Teams, and document management. 

Microsoft Teams offers several features that directly address these issues: 
+ Unified chat and channels keep conversations organized by project or department, reducing fragmentation. 
+ Integrated meetings and calling allow seamless transitions from chat to video, minimizing friction between platforms. 
+ Secure file sharing and real-time co-authoring help prevent documents from getting lost across email threads and cloud folders. 
+ Deep integration with Microsoft 365 apps ensures sales and marketing teams stay aligned, with shared access to files and collaborative tools. 
+ External collaboration features allow you to work securely with guests and partners without switching accounts.  

To help tailor recommendations, could you share what budget range you have considered for improving your communication tools? Let me know if you’d like more details or have other questions about streamlining collaboration at CSU. 

The post Sales Qualification Agent: How we evaluated and improved AI quality with benchmarks appeared first on Microsoft Dynamics 365 Blog.
