Technology Archives - Page 2 of 1290 - Dr. Ware Technology Services

Announcing End of Support for Dynamics 365 Project Service Automation (PSA) on the U.S. Government cloud

by Contributed | Feb 13, 2026 | Dynamics 365, Microsoft 365, Technology

This article is contributed. See the original author and article here.

On March 19, 2024, we announced that support for Dynamics 365 Project Service Automation on the commercial cloud will end on March 31, 2025. As planned, end of support went into effect on that date.

Today we are announcing the end of support for Dynamics 365 Project Service Automation on the U.S. Government cloud (GCC) beginning March 31, 2027. Dynamics 365 Project Operations has been available in the U.S. Government cloud (GCC) since December 2025.

Beginning March 31, 2027, Microsoft will no longer support PSA on GCC environments. There will be no feature enhancements, bug fixes, or other updates to this offering. Any support ticket logged for the PSA application on GCC will be closed with instructions to upgrade to Project Operations.

We strongly encourage all PSA customers on GCC environments to start planning their upgrade as soon as possible so they can take advantage of the many new Project Operations features, such as:

Integration with Planner capabilities on Dataverse with many new advanced scheduling features

Project Budgeting and Time-phased forecasting

Date Effective price overrides

Revision and Activation on Quotes

Material usage recording in projects and tasks

Subcontract Management

Advances and retainer-based contracts

Contract not-to-exceed

Task and Progress based billing

Multi-Customer contracts

AI and Copilot-based experiences.

For Project Service Automation customers on GCC High or DoD, we will make a separate announcement regarding the availability of Dynamics 365 Project Operations.

Upgrade documentation and FAQ links 

Upgrade from Project Service Automation to Project Operations | Microsoft Learn

Project Service Automation end of life FAQ | Microsoft Learn

Feature changes from Project Service Automation to Project Operations | Microsoft Learn

Project Service Automation to Project Operations project scheduling conversion process | Microsoft Learn

Learn more about Dynamics 365 Project Operations 

Project Operations was first released in October 2020 as a comprehensive product for managing projects from inception to close, bringing together the strengths of Dataverse, Microsoft Dynamics 365 Finance, Microsoft Dynamics 365 Supply Chain Management, and Microsoft Planner.

Want to learn more about Project Operations? Check this link and navigate to our detailed documentation!  

Want to try Project Operations? Click here and sign up for a 30-day trial!  

The post Announcing End of Support for Dynamics 365 Project Service Automation (PSA) on the U.S. Government cloud appeared first on Microsoft Dynamics 365 Blog.

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.

The ultimate Microsoft 365 community event returns—your front‑row seat to the future of intelligent work

by Contributed | Feb 5, 2026 | AI, Business, Copilot for Work, Microsoft 365, Technology

This article is contributed. See the original author and article here.

This event is your front-row seat to everything new and next across Microsoft 365—with hundreds of opportunities to learn directly from product makers and connect with the best community in tech.

The post The ultimate Microsoft 365 community event returns—your front‑row seat to the future of intelligent work appeared first on Microsoft 365 Blog.


General Availability of Quality Evaluation Agent’s conversation capabilities

by Contributed | Feb 5, 2026 | Dynamics 365, Microsoft 365, Technology

This article is contributed. See the original author and article here.

Quality Evaluation Agent (QEA) in Dynamics 365 Customer Service and Dynamics 365 Contact Center is an AI-led evaluation framework that empowers teams to deliver consistent, scalable quality oversight and automate quality evaluations across customer interactions.

Beginning February 6, QEA conversation capabilities become generally available, joining case evaluation as a GA feature, as previously announced in this blog. This milestone expands QEA’s coverage and impact across customer support scenarios.

Looking Forward:

QEA continues to evolve with key upcoming enhancements across the evaluation framework. This includes multilanguage support, criteria versioning, the ability to flag critical questions, simulation capabilities, knowledge source adherence, and more.

Get started today by enabling QEA in your Dynamics 365 Customer Service or Dynamics 365 Contact Center environment.

Learn more 

Watch a quick video introduction.

For configuration steps, feature updates, and best practices, see Manage Quality Evaluation Agent | Microsoft Learn

The post General Availability of Quality Evaluation Agent’s conversation capabilities appeared first on Microsoft Dynamics 365 Blog.


Evaluating AI Agents in Contact Centers: Introducing the Multi-modal Agents Score

by Contributed | Feb 4, 2026 | Dynamics 365, Microsoft 365, Technology

This article is contributed. See the original author and article here.

As self-service becomes the first stop in contact centers, AI agents now define the frontline customer experience. Modern customer interactions span voice, text, and visual channels, where meaning is shaped not only by what is said, but by how it’s said, when it’s said, and the context surrounding it.

In customer service, this is even more pronounced. Customers reaching out for support don’t just convey information; they convey intent, sentiment, urgency, and emotion, often simultaneously across modalities. A pause or interruption on a voice call signals frustration, a blurred document image leads to downstream reasoning failures, and a flat or fragmented response erodes trust, even if the answer is correct.

In our previous blog post, we reflected on the evolution of contact centers from scripted interactions to AI-driven experiences. As the contact center landscape continues to change, the way we evaluate AI agents must change with it. Traditional approaches fall short by focusing on isolated metrics or single modalities rather than the end-to-end customer experience.

Contact centers struggle to reliably assess whether their AI agents are improving over time or across architectures, channels, and deployments. While cloud services rely on absolute measures like availability, reliability, and latency, AI agent evaluation today remains fragmented, relative, and modality-specific. What would be useful is an absolute, normalized measure of end-to-end conversational quality: one that reflects how customers actually experience interactions and answers the fundamental question: Is this agent good at handling real customer conversations?

Introducing the Multimodal Agent Score (MAS)

MAS is built on the observation that every service interaction, whether human-to-human or human-to-agent, naturally progresses through three fundamental stages (explored in more detail in Measuring What Matters: Redefining Excellence for AI Agents in the Contact Center):

Understanding the input – accurately capturing and interpreting what the customer is saying, including intent, context, and signals such as urgency or emotion.

Reasoning over that input – determining the appropriate actions, managing context across turns, and deciding how to resolve the issue responsibly.

Responding effectively – delivering clear, natural, and confident resolution in the right tone and format.

Multimodal Agent Score directly mirrors these stages. It is a weighted composite score (0-100) designed to assess end-to-end AI agent quality across modalities (voice, text, and visual), aligned to how real conversations naturally unfold.

MAS Dimensions and Parameters

Conversation Stage	MAS Quality Dimension	What It Measures	Example Parameters
Understanding	Agent Understanding Quality (AUQ)	How well the agent hears and understands the user (e.g., latency, interruptions, speech recognition accuracy)	Intent determination, interruption, missed response window
Reasoning	Agent Reasoning Quality (ARQ)	How well the agent interprets intent and resolves the user’s request	Intent resolution, acknowledgement
Response	Agent Response Quality (AReQ)	How well the agent responds, including tone, sentiment, and expressiveness	CSAT, tone stability

Computing the MAS score:

MAS is computed as a weighted aggregation of the three quality dimensions listed in the table above, using the following terms:

Q_j represents one of the three quality dimensions: Agent Understanding Quality (AUQ), Agent Reasoning Quality (ARQ), and Agent Response Quality (AReQ).

w_j represents the cost or weight of each dimension.

α_j captures the a priori probability of the respective dimension.
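The composite formula itself does not appear on this page. As a minimal sketch, assuming the simplest form consistent with the three terms above (a sum of dimension scores scaled by α_j · w_j, normalized by the sum of those coefficients and rescaled to 0-100), the aggregation could look like the code below. This assumed form, MAS = 100 · Σ_j α_j w_j Q_j / Σ_j α_j w_j, is an illustration and not the published formula. The function name `mas_score` and the example values are hypothetical.

```python
from typing import Mapping

# Hypothetical aggregation: the post's exact MAS formula is not reproduced here.
# Assumed form: MAS = 100 * sum_j(alpha_j * w_j * Q_j) / sum_j(alpha_j * w_j)
DIMENSIONS = ("AUQ", "ARQ", "AReQ")


def mas_score(
    q: Mapping[str, float],
    w: Mapping[str, float],
    alpha: Mapping[str, float],
) -> float:
    """Combine per-dimension scores Q_j (each in [0, 1]) into a 0-100 composite."""
    numerator = 0.0
    denominator = 0.0
    for dim in DIMENSIONS:
        if not 0.0 <= q[dim] <= 1.0:
            raise ValueError(f"{dim} score must be in [0, 1], got {q[dim]}")
        coeff = alpha[dim] * w[dim]  # prior probability scaled by cost/weight
        numerator += coeff * q[dim]
        denominator += coeff
    if denominator <= 0.0:
        raise ValueError("at least one dimension needs a positive alpha * w")
    # Dividing by the coefficient sum keeps the result in [0, 100].
    return 100.0 * numerator / denominator


# Illustrative inputs only; these are not values from the evaluation below.
q = {"AUQ": 0.9, "ARQ": 0.8, "AReQ": 0.7}
w = {"AUQ": 1.0, "ARQ": 1.0, "AReQ": 1.0}
alpha = {"AUQ": 0.5, "ARQ": 0.3, "AReQ": 0.2}
print(round(mas_score(q, w, alpha), 1))  # 83.0
```

Normalizing by Σ α_j w_j means the weights and priors do not need to sum to one, and a perfect agent always scores 100.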

Computing each MAS dimension:

Computing each MAS dimension (AUQ, ARQ, AReQ) involves aggregating underlying parameters into a single weighted score. Raw measurements (such as interruption, intent determination, or tone stability) are first normalized into a 0–1 score before aggregating them at the dimension level. We apply a linear normalization function clipping each raw measurement at predefined thresholds suitable for the parameter being measured (for example, maximum allowed interruption or minimum required accuracy). This maintains the sensitivity of each parameter in the relevant effective range and avoids the negative impact of measurement outliers, making MAS an absolute measure of agent quality.
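Here is a minimal sketch of the clipped linear normalization and dimension-level weighting described above. The thresholds, parameter names, and equal weights are hypothetical placeholders and not the values MAS uses. The sketch shows how clipping keeps an outlier, such as a 50% interruption rate, from pulling the score below zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterSpec:
    """Normalization thresholds for one raw measurement (values are hypothetical)."""

    worst: float  # raw value mapped to 0.0, e.g. the maximum allowed interruption rate
    best: float  # raw value mapped to 1.0, e.g. the accuracy at which no credit is lost
    weight: float = 1.0  # relative weight within the dimension


def normalize(raw: float, spec: ParameterSpec) -> float:
    """Linearly map a raw value onto [0, 1], clipping at the thresholds.

    Handles both lower-is-better (best < worst) and higher-is-better (best > worst).
    """
    if spec.best == spec.worst:
        raise ValueError("best and worst thresholds must differ")
    score = (raw - spec.worst) / (spec.best - spec.worst)
    return min(1.0, max(0.0, score))  # clipping limits the effect of outliers


def dimension_score(raw: dict[str, float], specs: dict[str, ParameterSpec]) -> float:
    """Weighted mean of the normalized parameter scores for one dimension."""
    total_weight = sum(specs[name].weight for name in raw)
    weighted = sum(normalize(value, specs[name]) * specs[name].weight for name, value in raw.items())
    return weighted / total_weight


# Hypothetical AUQ parameters and thresholds, for illustration only.
AUQ_SPECS = {
    "interruption_rate": ParameterSpec(worst=0.10, best=0.0),  # lower is better
    "intent_accuracy": ParameterSpec(worst=0.70, best=0.95),  # higher is better
}

print(normalize(0.50, AUQ_SPECS["interruption_rate"]))  # 0.0 (outlier clipped)
print(normalize(0.99, AUQ_SPECS["intent_accuracy"]))  # 1.0 (capped at the target)
print(dimension_score({"interruption_rate": 0.05, "intent_accuracy": 0.99}, AUQ_SPECS))  # 0.75
```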

MAS in Practice: Voice Agent Evaluation Example

To ground MAS in real-world conditions, we evaluated ~2,000 synthetic voice conversations across two agent configurations using identical prompts and scenarios:

Agent-1: Chained voice agent using a three-stage ASR–LLM–TTS pipeline

Agent-2: Real-time voice agent using direct speech-to-speech architecture

The evaluation dataset included noise, interruptions, accessibility effects, and vocal variability to simulate production environments.
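Comparing two architectures on "identical prompts and scenarios" requires a fixed, reproducible scenario set. The sketch below is one way to build such a set. It assumes a set of perturbation axes (noise level, caller barge-in, speech rate, and a disfluency flag that stands in for accessibility effects) and draws a seeded sample from their full grid, so both agents receive the same conversations. The axes and their values are illustrative and are not those used in the evaluation above.

```python
import itertools
import random
from dataclasses import dataclass
from typing import Optional

# Hypothetical perturbation axes; real values would come from the test plan.
NOISE_SNR_DB = (None, 20.0, 10.0, 5.0)  # None = clean audio; lower SNR = noisier
SPEECH_RATES = (0.8, 1.0, 1.3)  # relative to a typical speaking rate


@dataclass(frozen=True)
class Scenario:
    prompt_id: str
    snr_db: Optional[float]
    barge_in: bool  # caller interrupts the agent mid-response
    speech_rate: float
    disfluent: bool  # stand-in for accessibility effects such as stuttering


def build_scenarios(prompt_ids: list[str], n: int, seed: int = 0) -> list[Scenario]:
    """Sample n distinct scenarios from the full perturbation grid, reproducibly."""
    grid = [
        Scenario(p, snr, barge, rate, disfluent)
        for p, snr, barge, rate, disfluent in itertools.product(
            prompt_ids, NOISE_SNR_DB, (False, True), SPEECH_RATES, (False, True)
        )
    ]
    rng = random.Random(seed)  # a fixed seed gives every agent the same set
    return rng.sample(grid, k=min(n, len(grid)))


prompts = [f"prompt-{i:03d}" for i in range(100)]
agent_1_set = build_scenarios(prompts, n=2000, seed=42)
agent_2_set = build_scenarios(prompts, n=2000, seed=42)
print(len(agent_1_set), agent_1_set == agent_2_set)  # 2000 True
```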

Shown below is a comparison of core MAS metrics, including dimension-level scores and the overall MAS score.

Voice Evaluation Results (Excerpt)

Dimension	Parameters	Agent-1	Agent-2
AUQ	Interruption Rate (%)	0.045	0.025
AUQ	Missed Response Windows	0.00045	0.0015
ARQ	Intent Resolution	0.13	0.08
ARQ	Acknowledgement Quality	0.08	0.10
AReQ	CSAT	0.128	0.126
AReQ	Tone stability	0.16	0.14

Key Observations

MAS provides the flexibility to surface quality insights at an aggregate level while enabling deeper analysis at the individual parameter level. To better understand performance outliers and anomalous behaviors, we went beyond composite scores and analyzed agent quality parameter by parameter. This deeper inspection allowed us to attribute observed degradations to specific factors, for example:

Channel quality matters: communication channels introduce multiple challenges, such as latency, interruptions, compression, and loss of information, that penalize recognition and response quality.

Turn-taking quality is critical: Missed windows and interruptions strongly correlate with abandonment.

Tone and coherence matter: Cleaner audio and uninterrupted responses lead to higher acknowledgement and perceived empathy.

MAS reveals root causes: differences in scores clearly distinguish understanding, reasoning, and response failures, something single metrics cannot do.

Looking Forward

We will continue to refine and evolve MAS as we validate it against real-world deployments and business outcomes. As the Dynamics 365 Contact Center team, we aim to establish MAS as our quality benchmark for evaluating AI agents across channels. Over time, we also intend to make MAS broadly available, extensible, and pluggable, enabling organizations to adapt it to evaluate their contact center agents across modalities. For readers interested in the underlying methodology and mathematical foundations, a detailed research paper will be published separately.

The post Evaluating AI Agents in Contact Centers: Introducing the Multi-modal Agents Score appeared first on Microsoft Dynamics 365 Blog.


Measuring What Matters: Redefining Excellence for AI Agents in the Contact Center

by Contributed | Feb 4, 2026 | Dynamics 365, Microsoft 365, Technology

This article is contributed. See the original author and article here.

The contact center industry is at an inflection point. Measuring AI agent performance is becoming essential as contact centers shift toward autonomous resolution. Gartner predicts that by 2029, AI agents will autonomously resolve 80% of common customer service issues. Yet, despite massive investment in conversational AI, most organizations lack a coherent way to measure whether their AI agents are good. Traditional metrics such as average handle time (AHT) and customer satisfaction (CSAT) are important for tracking business results. However, they are trailing signals and don’t tell you whether an AI agent is competent, reliable, or, most importantly, improving.

This isn’t just a technical problem. It’s a business problem. Without rigorous measurement, companies can’t improve their agents, can’t demonstrate ROI, and can’t confidently deploy AI to handle their most valuable customer interactions.

What Makes a Great Customer Service Agent?

In 2017, Harvard Business Review published research that challenged everything the industry believed about customer service excellence. The study, based on data from over 1,400 service representatives and 100,000 customers worldwide, revealed a truth that runs counter to many support manuals: customers don’t want to be pampered during support interactions. They just want their problems solved with minimal effort and maximum speed. This research also shows why rigorous AI agent performance measurement is needed to benchmark these behavioral models.

The research team identified seven distinct personality profiles among customer service representatives. Two profiles stand out as particularly instructive for understanding AI agent design:

Empathizers are the agents most managers would prefer to hire. They are natural listeners who prioritize emotional connection. They validate customer feelings, express genuine concern, and focus on making customers feel heard. When a frustrated customer calls about a billing error, an Empathizer responds with warmth: “I completely understand how frustrating that must be. Let me look into this for you and make sure we get it sorted out.” Empathizers excel at building rapport and defusing tension. Managers love them: 42% of surveyed managers said they’d preferentially hire this profile.

Controllers take a fundamentally different approach. They’re direct, confident problem-solvers who take charge of interactions. Rather than asking customers what they’d like to do, Controllers tell them what they should do. When that same frustrated customer calls about a billing error, a Controller responds differently: “I see the problem. There’s a duplicate charge from October 15th. I’m removing it now and crediting your account. You’ll see the adjustment within 24 hours. Is there anything else I can help you fix today?” Controllers are decisive, prescriptive, and focused on the fastest path to resolution.

Here’s what the HBR research revealed: Controllers dramatically outperform Empathizers on virtually every quality metric that matters, including customer satisfaction, first-contact resolution, and especially customer effort scores. Yet only 2% of managers said they’d preferentially hire Controllers. This does not eliminate the need for empathetic agents; it clarifies that empathy is necessary but not sufficient.

This insight becomes even more important when we consider the context of modern customer service. Nearly a decade of investment in self-service technology means that by the time a customer reaches a human or an AI agent, they’ve already tried to solve the problem themselves. They’ve searched for the FAQ, attempted the chatbot, maybe even watched a YouTube tutorial. They’re not calling because they want to chat. They’re calling because they’re stuck, frustrated, and need someone to take charge and fix their problem.

The HBR research quantified this: 96% of customers who have a low-effort service experience intend to repurchase from that company, which translates directly into higher retention and recurring revenue. For high-effort experiences, that number drops to just 9%. Customer effort is four times more predictive of disloyalty than customer satisfaction.

The AI Advantage: Dynamic Persona Adaptation

Human agents are who they are. An Empathizer can learn Controller techniques, but their natural instincts will always pull toward emotional validation. A Controller can practice active listening, but they’ll always be most comfortable cutting to the chase. Training can shift behavior at the margins, but fundamental personality is remarkably stable.

AI agents can learn from the best human agents and adapt their style in real time based on conversation context. A well-designed agent can operate in Controller mode for straightforward technical issues, being direct and prescriptive, and shift to Empathizer mode when a customer shares difficult news. It adapts mid-conversation based on sentiment, issue complexity, and customer preferences.

This isn’t about mimicking personality types. It’s about dynamically deploying the right approach for each moment of each interaction. The best AI agents don’t choose between being helpful and being efficient. They recognize that true helpfulness often means being efficient. They adapt their communication style to what each customer needs in each moment.
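A toy, turn-level policy can show the persona switching described above in concrete form. This is a minimal sketch under several assumptions. The signal names, the -1 to 1 sentiment scale, and the -0.7 threshold are all hypothetical. A production agent would more likely condition a model's response style than apply a hard-coded rule. Following the HBR finding, ordinary frustration still gets Controller mode.

```python
from dataclasses import dataclass
from enum import Enum


class Style(Enum):
    CONTROLLER = "controller"  # direct, prescriptive, fastest path to resolution
    EMPATHIZER = "empathizer"  # acknowledge feelings before problem-solving


@dataclass(frozen=True)
class TurnSignals:
    sentiment: float  # hypothetical classifier output: -1.0 (negative) to 1.0 (positive)
    difficult_news: bool  # customer disclosed something hard (bereavement, hardship)
    prefers_brief: bool = False  # explicit customer preference, if known


def choose_style(signals: TurnSignals, distress_threshold: float = -0.7) -> Style:
    """Pick a delivery style for the next turn (illustrative rule, not a product API).

    Ordinary frustration still gets Controller mode; Empathizer mode is reserved
    for difficult news or acute distress.
    """
    if signals.difficult_news:
        return Style.EMPATHIZER
    if signals.sentiment <= distress_threshold and not signals.prefers_brief:
        return Style.EMPATHIZER
    return Style.CONTROLLER


# Re-evaluated every turn, so the style can change mid-conversation.
turns = [
    TurnSignals(sentiment=-0.4, difficult_news=False),  # annoyed about a duplicate charge
    TurnSignals(sentiment=-0.8, difficult_news=True),  # mentions a family emergency
    TurnSignals(sentiment=0.2, difficult_news=False),  # back to a routine question
]
print([choose_style(t).value for t in turns])  # ['controller', 'empathizer', 'controller']
```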

But this flexibility compounds a fundamental challenge in evaluating both human and AI agents. There is no single “best” conversation. All interactions are highly dynamic, with no fixed reference for comparison, and the most important business metrics are trailing and hard to attribute at the conversation or agent level. As a result, no single metric can capture this complexity. We need a framework that evaluates agent capabilities across contexts.

Defining Excellence: What the Best AI Agents Achieve

Before introducing a measurement framework, let’s establish the benchmarks that define world-class performance.

First-Contact Resolution (FCR) measures whether the customer’s issue was fully resolved without requiring a callback, transfer, or follow-up. The industry average sits around 70-75%. This matters because FCR correlates directly with customer satisfaction: centers with high FCR see 30% higher satisfaction scores than those struggling with repeat contacts.

Customer Satisfaction (CSAT) captures how customers feel about their interaction. The industry average, measured via post-call surveys, hovers around 78%. World-class performance means 85% or higher. Top performers in 2025 are pushing toward 90%.

Response Latency is particularly critical for voice AI. Human conversation has a natural rhythm: roughly 500 milliseconds between when one person stops speaking and another responds. AI agents that exceed this threshold feel unnatural. Research shows that customers hang up 40% more frequently when voice agents take longer than one second to respond. The target for production voice AI is 800 milliseconds or less, with leading implementations achieving sub-500 ms latency.

Average Handle Time (AHT) varies significantly by industry. Financial services averages 6-8 minutes, healthcare 8-12 minutes, technical support 12-18 minutes. The key insight is that AHT should be minimized without sacrificing resolution quality. Fast and wrong is worse than slow and right, but fast and right is the goal.
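The benchmarks above are simple aggregates over interaction records, so computing them and comparing them with targets takes little code. The sketch below uses a hypothetical record shape (`Interaction`). It makes two assumptions that the text does not specify: CSAT is the share of 4 and 5 responses on a 1-5 survey, and the 800 ms latency target applies to the 95th percentile. The FCR target of 0.75 is the upper end of the quoted industry average, not a world-class bar.

```python
import math
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Interaction:
    resolved_first_contact: bool  # no callback, transfer, or follow-up needed
    csat: Optional[int]  # post-call survey on a 1-5 scale; None if not answered
    handle_time_s: float
    response_latencies_ms: list[float] = field(default_factory=list)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (no interpolation)."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(interactions: list[Interaction]) -> dict[str, float]:
    surveyed = [i.csat for i in interactions if i.csat is not None]
    latencies = [ms for i in interactions for ms in i.response_latencies_ms]
    return {
        "fcr": sum(i.resolved_first_contact for i in interactions) / len(interactions),
        # Assumption: CSAT counts 4s and 5s as satisfied.
        "csat": sum(s >= 4 for s in surveyed) / len(surveyed) if surveyed else math.nan,
        "p95_latency_ms": percentile(latencies, 95),
        "aht_min": mean(i.handle_time_s for i in interactions) / 60,
    }


# Targets from the benchmarks above; applying 800 ms at p95 is an assumption.
TARGETS = {"fcr": (">=", 0.75), "csat": (">=", 0.85), "p95_latency_ms": ("<=", 800)}


def check_targets(summary: dict[str, float]) -> dict[str, bool]:
    return {
        name: summary[name] >= target if op == ">=" else summary[name] <= target
        for name, (op, target) in TARGETS.items()
    }


calls = [
    Interaction(True, 5, 300, [420, 610]),
    Interaction(False, 3, 540, [950]),
    Interaction(True, None, 420, [480, 500, 530]),
    Interaction(True, 4, 360, [700]),
]
summary = summarize(calls)
print(summary["fcr"], summary["p95_latency_ms"], summary["aht_min"])  # 0.75 950 6.75
print(check_targets(summary))  # {'fcr': True, 'csat': False, 'p95_latency_ms': False}
```

Reporting a high percentile instead of the mean keeps a few very slow turns from being hidden, and those slow turns are the ones the hang-up research points to.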

These benchmarks provide targets, but they are trailing signals and don’t tell us how to build agents that achieve them. For that, we need to understand the three pillars of agent quality.

The Three Pillars: Understand, Reason, Respond

Every customer interaction, whether with a human or an AI, follows the same fundamental structure. The agent must understand what the customer is saying, reason about how to help, and deliver an effective answer. Crucially, a weakness in any one pillar undermines the entire interaction. Existing LLM benchmarks are fragmented and do not provide a holistic, focused view of contact center scenarios.

Pillar One: Understand

The first challenge is accurately capturing and interpreting customer input. For voice agents, this means speech recognition that works in real-world conditions: background noise, accents, interruptions, and domain-specific terminology. For video or images, it means visual understanding that handles varying noise, object occlusion, and context-dependent interpretation. Classic benchmarks are misleading here. Models achieving 95% accuracy on clean test data often fall to 70% or below in production environments with crying babies, barking dogs, and customers calling from their cars. Interruptions and system latency further degrade understanding quality.

Beyond transcription, understanding requires intent determination. When a customer says, “I’m calling about my order. I think it was delivered to the wrong address,” the agent needs to identify both the topic (order delivery) and the specific issue (wrong address). It also needs to detect that this is a complaint requiring resolution, not just an informational query. Ideally, it should pick up on emotional cues such as frustration, urgency, or confusion, all of which should influence how it responds.

Key metrics for this pillar include word error rate for transcription accuracy, intent recognition precision and recall, and latency from when the customer stops speaking to when the agent begins responding. Interruption rates also matter. Agents that talk over customers while they’re still speaking destroy the conversational experience.
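Two of the metrics above have standard definitions that fit in a few lines. Word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Intent precision and recall for a single label come from counts of true positives, false positives, and false negatives. This is a minimal sketch. The lowercase-and-whitespace tokenization and the intent labels are simplifying assumptions. Production WER tooling usually also normalizes punctuation and numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,  # insertion: extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def intent_precision_recall(gold: list[str], predicted: list[str], intent: str) -> tuple[float, float]:
    """Precision and recall for one intent label over aligned turns."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted labels must align one-to-one")
    pairs = list(zip(gold, predicted))
    tp = sum(g == intent and p == intent for g, p in pairs)
    fp = sum(g != intent and p == intent for g, p in pairs)
    fn = sum(g == intent and p != intent for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


reference = "I think it was delivered to the wrong address"
print(round(word_error_rate(reference, "I think it was delivered to the long address"), 3))  # 0.111

# Hypothetical intent labels for four customer turns.
gold = ["wrong_address", "billing", "wrong_address", "cancel"]
pred = ["wrong_address", "wrong_address", "billing", "cancel"]
print(intent_precision_recall(gold, pred, "wrong_address"))  # (0.5, 0.5)
```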

Pillar Two: Reason

Understanding what the customer said is only the beginning. The agent must then determine the right course of action. This is where “intelligence” in artificial intelligence matters.

Effective reasoning means connecting customer intent to appropriate actions. If the customer needs their address changed, the agent should access the order management system, verify customer identity, make the change, and confirm success. If the issue is more complex (say, the package was marked delivered but never arrived), the agent needs to pull tracking information, assess whether this looks like a misdelivery, determine whether a replacement or refund is appropriate, and potentially flag the case for investigation.

This pillar also encompasses multi-turn context management. Customers don’t speak in complete, self-contained utterances. They reference previous statements, use pronouns, and assume the agent is tracking the conversation. “What about my other order?” only makes sense if the agent remembers discussing a first order. “Can you do that for my husband’s account too?” requires understanding what “that” refers to and what permissions are appropriate.

Perhaps most critically, reasoning quality includes knowing what the agent doesn’t know. A well-designed agent admits uncertainty rather than fabricating answers. This is particularly challenging with LLMs, which are trained to produce an answer no matter what. There are two parts to this problem. First, the agent should reason about what it is missing and ask for additional data; in truly autonomous agents, such interactions should go beyond slot filling or a scripted interview and be dynamic, adaptive, and contextual. Second, when the agent is stuck, it should admit it and either ask a supervisor for help or simply escalate. In either case, responsible AI guardrails and validations are key to ensuring proper agent responses and guarded interactions.
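The "ask, then escalate, never guess" behavior described above can be expressed as a small decision policy. This is a minimal sketch under stated assumptions. The `ReasoningState` fields, the 0.75 confidence floor, and the two-question limit are hypothetical. Real agents would get these signals from their planner, retrieval, and tool layers.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    RESPOND = "respond"
    ASK_CLARIFYING = "ask_clarifying_question"
    ESCALATE = "escalate_to_human"


@dataclass
class ReasoningState:
    confidence: float  # hypothetical self-estimate that the plan resolves the issue, 0-1
    grounded: bool = True  # answer is backed by retrieved knowledge or tool results
    missing_info: list[str] = field(default_factory=list)  # e.g. ["order_number"]
    clarifications_asked: int = 0


def next_action(state: ReasoningState, min_confidence: float = 0.75, max_clarifications: int = 2) -> Action:
    """Prefer asking or escalating over guessing (illustrative policy)."""
    if state.missing_info:
        # Gather what is missing, but do not interrogate the customer indefinitely.
        if state.clarifications_asked < max_clarifications:
            return Action.ASK_CLARIFYING
        return Action.ESCALATE
    if not state.grounded or state.confidence < min_confidence:
        # An unsupported or low-confidence answer risks a hallucination; hand off.
        return Action.ESCALATE
    return Action.RESPOND


print(next_action(ReasoningState(confidence=0.9, missing_info=["order_number"])).value)  # ask_clarifying_question
print(next_action(ReasoningState(confidence=0.55)).value)  # escalate_to_human
print(next_action(ReasoningState(confidence=0.92)).value)  # respond
```

The same policy outcomes can be logged per turn, which feeds the hallucination and task-completion metrics that follow.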

Key metrics include intent resolution rate, task completion rate, context retention across turns, and hallucination frequency.

Pillar Three: Respond

The final pillar is delivering the response effectively. Even perfect understanding and flawless reasoning mean nothing if the agent can’t communicate the resolution clearly.

Answer quality encompasses both content and delivery. The content must be accurate, complete, and actionable. Customers shouldn’t need to ask follow-up questions because the agent omitted critical information. They shouldn’t be confused by jargon or ambiguous phrasing.

In a multi-channel, multimodal agent world, AI agents must adapt how they deliver responses based on the channel and context. Effective delivery is about aligning the form, timing, and tone of responses to the interaction at hand. Emotional quotient (EQ) matters regardless of modality: when the tone, voice, or interaction feels mechanical, even correct content can lose its impact and undermine trust. Across channels, the objective remains consistent: ensure responses feel natural, clear, and trustworthy from the customer’s perspective.

The Controller research is relevant here. The best responses are often more direct than traditional customer service training suggests. Instead of “I’d be happy to help you with that. Let me take a look at your account and see what options might be available for addressing this situation,” top performers say “I see the problem. Here’s what I’m doing to fix it.”

Key metrics include solution accuracy, response completeness, fluency ratings, and post-response customer sentiment. For voice, prosody and expressiveness scores capture delivery quality.

To build AI agents that customers truly trust, organizations must move beyond fragmented metrics and isolated KPIs. Excellence in customer service is not the result of a single capability. It emerges from how well an agent performs across the three pillars. These pillars form the foundation of modern AI agent performance measurement.

A Composite Score as Unified Measure

We believe the future of AI agent evaluation lies in a composite approach, one that brings together these core capabilities into a unified measure of quality. No single metric can tell you whether an AI agent truly works well with real customers. Individual measures tend to over-optimize narrow behaviors while hiding the trade-offs between speed, accuracy, reasoning quality, and customer experience.

A composite score solves this problem by balancing multiple dimensions into one holistic view of agent performance. This approach reveals strengths and weaknesses at the system level rather than through isolated signals. Most importantly, a unified score enables consistent benchmarking and clearer progress tracking. It gives both executives and practitioners a metric they can confidently use to drive improvement.

We are introducing a contact center evaluation guideline and a set of metrics designed to holistically assess AI agent performance across the dimensions that matter most in real customer interactions. Rather than optimizing isolated signals, this approach evaluates how effectively an agent understands customer intent, reasons through the problem space, and delivers clear, confident, and timely resolutions.

These guidelines are intended to provide a practical foundation for teams building, deploying, and scaling AI agents in production. They enable consistent measurement, meaningful comparison, and continuous improvement over time.

This framework is intended to be open for anyone to use and evaluate. For a deeper dive into the evaluation framework, recommended metrics, and examples of how this can be applied in practice, please refer to the detailed blog: Evaluating AI Agents in Contact Centers: Introducing the Multi-modal Agents Score

The post Measuring What Matters: Redefining Excellence for AI Agents in the Contact Center appeared first on Microsoft Dynamics 365 Blog.


The Key Facets of AI Evaluation in the Contact Center

by Contributed | Feb 4, 2026 | Dynamics 365, Microsoft 365, Technology

This article is contributed. See the original author and article here.

As AI becomes central to contact center operations, powering every customer engagement channel, evaluation is no longer a back-office technical exercise. It is a critical business capability that directly affects customer experience, operational effectiveness, and business outcomes.

But evaluation is not one-dimensional. Organizations must think about when, how, by whom, and on what data AI systems are evaluated. This blog explores the key facets of AI evaluation and how they apply specifically to contact center environments.

1: Development-Stage vs. Production-Stage Evaluation

Definition: Development-time evaluation refers to tests and assessments performed during the AI software creation phase. Production-time evaluation occurs after deployment, when the AI software is live and serving real users.

Implications: Development-time evaluation provides a controlled environment to identify and address issues before AI reaches customers. This helps reduce risk and prevent costly failures. However, it cannot fully capture real-world complexity. Production-time evaluation reflects actual customer behavior and operating conditions. While it offers critical insight into true performance and experience, it must be managed carefully to avoid customer impact when issues surface.

How contact centers should think about this (example):

Use development-time evaluation to prevent issues such as incorrect intent detection, poor prompt behavior, broken escalation flows, non-compliant responses, or unacceptable latency before they ever reach customers.

Use production-time evaluation to detect and measure real customer impact, like drops in containment, rising transfers to human agents, customer frustration, regional or channel-specific issues, and performance degradation caused by real traffic patterns.
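One common way to catch a drop in containment in live traffic is a sliding-window monitor, and the sketch below takes that approach. It assumes that "contained" means the conversation ended without a transfer to a human. The window size, the baseline, and the allowed drop are hypothetical tuning choices. Waiting for a full window before alerting avoids false alarms on small samples.

```python
from collections import deque


class ContainmentMonitor:
    """Flag a sustained drop in containment over a sliding window of conversations.

    Containment here means the conversation ended without a transfer to a human.
    """

    def __init__(self, window: int, baseline: float, max_drop: float) -> None:
        self.outcomes: deque[bool] = deque(maxlen=window)
        self.threshold = baseline - max_drop

    def record(self, contained: bool) -> bool:
        """Add one outcome; return True if the windowed rate is below threshold."""
        self.outcomes.append(contained)
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # wait for a full window before alerting
        rate = sum(self.outcomes) / len(self.outcomes)
        return rate < self.threshold


monitor = ContainmentMonitor(window=100, baseline=0.80, max_drop=0.05)
history = [True] * 80 + [False] * 20  # a healthy window at exactly 80% containment
alerts = [monitor.record(c) for c in history]
degraded = [monitor.record(False) for _ in range(6)]  # a run of transfers to humans
print(any(alerts), degraded[-1])  # False True
```

The same structure works for transfer rate or frustration signals. To monitor per region or per channel, keep one monitor per segment.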

2: Manual vs. Automated Evaluation

Definition: Manual evaluation runs on demand, when a person initiates it. Automated evaluation runs on a predetermined schedule or in response to triggers.

Implications: Manual evaluation brings human judgment, context, and nuance that automation alone cannot capture. This makes it especially valuable when evaluation needs are unpredictable, when environmental changes are not captured by automated triggers, or when automated runs would be too costly. Automated evaluation complements this with consistent, scalable coverage as AI systems evolve, reliably re-evaluating systems after known changes such as releases or configuration updates delivered through CI/CD pipelines.

How contact centers should think about this:

Use automation for baseline quality, regression, and continuous monitoring

Use manual evaluation for exceptions, deep dives, and human judgment

The best strategy combines both. Automation ensures that critical changes are always assessed, while human evaluators interpret results, investigate anomalies, and adapt to unexpected conditions.
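In a CI/CD pipeline, automated re-evaluation after a release or configuration change often ends in a regression gate. The gate compares the candidate's evaluation metrics against a stored baseline and fails the build when any metric drifts beyond a tolerance. This is a minimal sketch. The metric names, tolerances, and baseline values are hypothetical. The gate only flags regressions, and a human still decides whether a flagged change is acceptable.

```python
def regression_gate(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: dict[str, float],
    lower_is_better: frozenset[str] = frozenset(),
) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    failures = []
    for metric, base in baseline.items():
        if metric not in candidate:
            failures.append(f"{metric}: missing from candidate run")
            continue
        change = candidate[metric] - base
        if metric in lower_is_better:
            change = -change  # e.g. latency: an increase is a regression
        allowed = tolerance.get(metric, 0.0)
        if change < -allowed:
            failures.append(f"{metric}: {base} -> {candidate[metric]} (allowed drift {allowed})")
    return failures


# Hypothetical evaluation results for a baseline build and a candidate build.
baseline = {"intent_accuracy": 0.91, "escalation_success": 0.97, "p95_latency_ms": 780}
candidate = {"intent_accuracy": 0.92, "escalation_success": 0.93, "p95_latency_ms": 790}
tolerance = {"intent_accuracy": 0.01, "escalation_success": 0.01, "p95_latency_ms": 50}

failures = regression_gate(baseline, candidate, tolerance, frozenset({"p95_latency_ms"}))
print(failures)  # ['escalation_success: 0.97 -> 0.93 (allowed drift 0.01)']
exit_code = 1 if failures else 0  # in a pipeline step, pass this to sys.exit()
```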

3: Evaluations Run by the Platform vs. by Customers

Definition: Evaluations can be conducted internally by the developer organization (such as Microsoft) or externally by customers using the software in their own environments. 

Implications: Developer-run and customer-run evaluations each provide distinct and necessary value. Internal evaluations establish a consistent baseline for quality, safety, and compliance. Customer-led evaluations surface real-world behaviors, operational constraints, and usage patterns that cannot be fully anticipated during development. Relying on only one limits visibility and can leave gaps in reliability or usability.

How contact centers should think about this:

Rely on platform evaluations to establish a trusted baseline. This ensures core capabilities—such as accuracy, safety and compliance, latency, escalation behavior, and failure handling—meet enterprise standards before features are rolled out broadly.

Platform providers should partner closely with customers so that customers can run their own evaluations and deeply understand AI performance within their specific domains, workflows, and operating environments. This collaboration helps surface both expected and edge-case behaviors, across positive and negative scenarios.

4: Synthetic Data vs. Production Traffic

Definition: Synthetic data refers to artificially generated datasets designed to simulate specific scenarios. Production traffic comprises actual user interactions and data generated during live operation. 

Implications:

Data fidelity and risk: Synthetic data enables safe, repeatable evaluation without exposing sensitive information or affecting real users. However, it may lack the complexity and unpredictability of production data. Production traffic delivers high-fidelity insights but carries risks of data leakage, performance degradation, or user impact.

Relevance: Synthetic data is valuable for early-stage, edge-case, or privacy-sensitive evaluations. Production traffic is essential for verifying AI system behavior under real-world conditions.

How contact centers should think about this:

Begin with synthetic data to evaluate safely and iterate quickly, especially when testing new scenarios, edge cases, or changes.

Leverage production data to validate performance at scale, ensuring AI behaves as expected under real customer traffic and operating conditions.

Treat production evaluation as a continuous monitoring and learning loop, focused on measuring impact and improving quality—rather than experimenting on live customers.
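A minimal sketch of template-based synthetic data generation, assuming hand-written templates and slot values (all hypothetical). No real customer data is involved, and seeding the random generator makes the dataset identical on every run, which is what keeps synthetic evaluation repeatable.

```python
import random

# Hypothetical templates and slot values for synthetic customer utterances
TEMPLATES = [
    "I need to {action} my {item}.",
    "Why was my {item} {problem}?",
]
SLOTS = {
    "action": ["return", "cancel", "track"],
    "item": ["order", "subscription"],
    "problem": ["delayed", "charged twice"],
}


def synthetic_utterances(n, seed=0):
    """Generate n synthetic utterances; the same seed always yields the same list."""
    rng = random.Random(seed)  # private generator, independent of global random state
    out = []
    for _ in range(n):
        template = rng.choice(TEMPLATES)
        values = {name: rng.choice(options) for name, options in SLOTS.items()}
        # str.format ignores slot values the chosen template does not use
        out.append(template.format(**values))
    return out
```

New edge cases can be covered by adding templates or slot values, then rerunning the same evaluation against the regenerated set.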

5: Evaluation After vs. During Execution

Definition: Post-execution evaluation analyzes the results after a process or test run finishes, while in-execution (real-time) evaluation monitors and assesses behavior as it unfolds. 

Implications: Post-execution evaluation enables deep analysis and long-term improvement, while in-execution evaluation allows faster detection and mitigation of issues. Using both helps contact centers balance insight with real-time protection of the customer experience.

How contact centers should think about this:

Post-conversation evaluation can provide rich information about correctness, groundedness, and resolution effectiveness across completed AI interactions.

Real-time evaluation of empathy and sentiment enables timely intervention, such as escalating to a human agent or allowing supervisor guidance during the interaction.

Together, these approaches form a core part of AI evaluation in the contact center, helping organizations balance deep analysis with real‑time protections.
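The in-execution side can be sketched as a small monitor that watches per-turn sentiment and signals when to escalate. This assumes an upstream sentiment model (not shown, hypothetical) that scores each customer turn in [-1, 1]; the window size and threshold are illustrative choices, not recommended values.

```python
from collections import deque


class LiveEscalationMonitor:
    """Tracks recent per-turn sentiment and decides when to escalate.

    Scores are assumed to come from a hypothetical upstream sentiment model
    that rates each customer turn in [-1, 1].
    """

    def __init__(self, window=3, threshold=-0.4):
        self.scores = deque(maxlen=window)  # keeps only the last `window` turns
        self.threshold = threshold

    def observe(self, score):
        """Record one customer turn; return True if the conversation should escalate."""
        self.scores.append(score)
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough turns yet to judge a trend
        return sum(self.scores) / len(self.scores) < self.threshold
```

Averaging over a short window avoids escalating on a single negative turn while still reacting within a few exchanges; the completed transcript can then be scored again after the conversation for deeper analysis.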

Final Thoughts: A Modern Evaluation Mindset

There is no single “right” way to evaluate AI systems. Instead, evaluation should be viewed as a multi-dimensional strategy that evolves alongside your AI systems.

By planning deliberately across these evaluation dimensions, organizations can build AI systems that are not only intelligent but also trustworthy, resilient, and customer-first. Evaluation is no longer optional; it is how modern organizations ensure AI delivers on its promise, every day.

Get more details:

Measuring What Matters: Redefining Excellence for AI Agents in the Contact Center

Evaluating AI Agents in Contact Centers: Introducing the Multi-modal Agents Score

The post The Key Facets of AI Evaluation in the Contact Center appeared first on Microsoft Dynamics 365 Blog.

Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.
