FAQ for analytics

These frequently asked questions (FAQ) describe the AI impact of the analytics features in Copilot Studio.

How is generative AI used for analytics?

Copilot Studio uses AI to measure the quality of generative answer responses and to create clusters, which are used to provide insights into agent performance.

The generative answers feature uses knowledge sources of your choosing to generate a response, and it also collects any feedback you provide. Analytics use large language models (LLMs) to classify the chat messages between users and agents into levels that indicate the quality of generative answer responses. Copilot Studio compiles these indicators to give makers a summary of an agent's overall performance.
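
To make that classify-then-compile flow concrete, here's a minimal Python sketch. The quality levels, the classify_exchange heuristic, and the summarize helper are all illustrative assumptions; the actual prompts and labels Copilot Studio uses aren't public.

```python
from collections import Counter

# Illustrative quality levels; the labels Copilot Studio's classifier
# actually uses aren't public.
QUALITY_LEVELS = ["good", "incomplete", "irrelevant", "not_grounded"]

def classify_exchange(user_query: str, agent_response: str, sources: list[str]) -> str:
    """Stand-in for the LLM call that labels one user-agent exchange.

    A real pipeline would prompt an LLM with the query, the response, and
    the knowledge sources the answer was grounded on. A trivial grounding
    check keeps this sketch runnable.
    """
    grounded = any(agent_response.lower() in src.lower() for src in sources)
    return "good" if grounded else "not_grounded"

def summarize(exchanges: list[tuple[str, str, list[str]]]) -> dict[str, float]:
    """Compile per-exchange labels into an overall performance summary."""
    labels = Counter(classify_exchange(q, r, s) for q, r, s in exchanges)
    total = sum(labels.values()) or 1
    return {level: labels[level] / total for level in QUALITY_LEVELS}
```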

Clustering uses LLMs to sort users' messages into groups based on shared subjects and provide each group with a descriptive name. Copilot Studio uses the names of these clusters to provide different types of insights you can use to improve your agent.
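
A rough sketch of that grouping-and-naming flow follows, with TF-IDF and k-means standing in for whatever embedding model and clustering algorithm the product actually uses (those details aren't public); scikit-learn is an assumed dependency, and name_cluster is a hypothetical placeholder for the LLM naming call.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def cluster_queries(queries: list[str], k: int = 3) -> dict[int, list[str]]:
    """Group user messages by shared subject matter.

    TF-IDF vectors plus k-means are a simple stand-in for the product's
    actual representation and clustering choices.
    """
    vectors = TfidfVectorizer(stop_words="english").fit_transform(queries)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(vectors)
    groups: dict[int, list[str]] = {}
    for query, label in zip(queries, labels):
        groups.setdefault(label, []).append(query)
    return groups

def name_cluster(members: list[str]) -> str:
    """Stand-in for the LLM call that gives each group a descriptive name.

    A real pipeline would prompt an LLM with the member queries and ask for
    a short theme label; truncating the first query keeps this runnable.
    """
    return members[0][:40]
```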

Quality of responses for generative answers

What is the intended use of quality of response analytics?

Makers use quality of response analytics to discover insights into agent usage and performance, and then act on those insights to improve the agent. Currently, analytics can help makers understand whether the quality of an agent's generative answers meets their expectations.

In addition to overall quality, quality of response analytics identifies areas where an agent performs poorly or fails to meet the maker's intended goals. The maker can then pinpoint where generative answers underperform and take steps to improve their quality.

When poor performance is identified, best practices can help improve quality. For example, after identifying a knowledge source that performs poorly, a maker can edit that source or split it into multiple, more focused sources.

What data is used to create analytics for quality of response?

Quality of response analytics are calculated using a sample of generative answer responses. The calculation requires the user query, the agent response, and the knowledge sources that the generative model used to produce the answer.

Quality of response analytics uses that information to evaluate whether a generative answer is of good quality and, if not, why. For example, the analytics can identify responses that are incomplete, irrelevant, or not fully grounded.
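
As a sketch of what one evaluation record might hold, here's a hypothetical Python data structure; the field names and reason codes are assumptions based on the description above, not the product's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class PoorQualityReason(Enum):
    # Reason codes mirror the examples in the text; the full set the
    # product reports isn't public.
    INCOMPLETE = "incomplete"
    IRRELEVANT = "irrelevant"
    NOT_FULLY_GROUNDED = "not fully grounded"

@dataclass
class ResponseEvaluation:
    """The three inputs the evaluation needs, plus its verdict."""
    user_query: str
    agent_response: str
    knowledge_sources: list[str]  # sources the model grounded the answer on
    is_good: bool = True
    reasons: list[PoorQualityReason] = field(default_factory=list)
```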

What are the limitations of quality of response analytics, and how can users minimize the impact of limitations?

  • Quality of response analytics aren't calculated from all generative responses. Instead, analytics measures a sample of user-agent sessions. Agents that don't reach a minimum number of successful generative answers don't receive a quality of response summary.

  • In some cases, analytics don't evaluate an individual response accurately. At an aggregate level, however, the results are accurate for most cases.

  • Quality of response analytics don't provide a breakdown of the specific queries that led to low-quality responses. They also don't break down which knowledge sources or topics were in use when low-quality responses occurred.

  • Analytics aren't calculated for answers that use generative knowledge.

  • One of the metrics that quality of response analytics assesses is answer completeness, which evaluates how complete the response is relative to the retrieved documents (a rough sketch follows this list).

    If a relevant document that contains additional information for the given question isn't retrieved, the completeness metric doesn't take that document into account.
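
Here's the rough sketch of the completeness limitation from the last bullet above. Plain term overlap stands in for the LLM-based judgment the product actually makes, and the completeness helper itself is hypothetical:

```python
def completeness(response: str, retrieved_docs: list[str]) -> float:
    """Fraction of terms from the *retrieved* documents that the response covers.

    Term overlap is a crude stand-in for an LLM-based completeness judgment.
    Note the limitation from the text: a relevant document that was never
    retrieved contributes nothing here, so the score can look high even when
    information the user needed is missing.
    """
    doc_terms = {w.lower() for doc in retrieved_docs for w in doc.split()}
    resp_terms = {w.lower() for w in response.split()}
    if not doc_terms:
        return 1.0
    return len(doc_terms & resp_terms) / len(doc_terms)
```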

What protections are in place within Copilot Studio for responsible AI?

Users of agents don't see analytics results; they're available to agent makers and admins only.

Makers and admins can use quality of response analytics only to see the percentage of good-quality responses and the predefined reasons for poor performance.

We tested analytics for quality of responses thoroughly during development to ensure good performance. However, on rare occasions, quality of response assessments can be inaccurate.

Clustering for insights

What is clustering's intended use?

Clustering for insights is used to discover and create contextual insights. Currently, Copilot Studio analytics uses clustering to find user queries that an agent isn't able to address and then organizes them into groups by content theme. Copilot Studio uses these groups to generate insights for addressing the unanswered queries. Copilot Studio generates clusters for this insight type daily, using all unanswered user queries from the last seven days.
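
A minimal sketch of that daily selection step, assuming a hypothetical query log with text, timestamp, and answered fields (the real storage schema isn't public):

```python
from datetime import datetime, timedelta, timezone

def queries_for_daily_clustering(query_log: list[dict]) -> list[str]:
    """Select the queries a daily clustering run would consider.

    Each query_log entry is assumed, hypothetically, to look like
    {"text": str, "timestamp": datetime, "answered": bool}.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)
    return [
        entry["text"]
        for entry in query_log
        if not entry["answered"] and entry["timestamp"] >= cutoff
    ]
```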

What data is used to create clusters and insights?

Clustering collects user queries from the last seven days that generative answers couldn't resolve.

The feature also collects any feedback makers provide through thumbs-up or thumbs-down reactions. We use this data to evaluate and improve the quality of clustering. More information on what data is collected is available in the preview terms.

What are the limitations of clustering and insights, and how can users minimize the impact of limitations?

  • Clustering and suggestion quality depend on how well user queries fit the goal of each suggestion type. If there aren't many user queries that match the insight type, or the queries are too unrelated, the clusters can be too specific or too vague to yield meaningful suggestions.

  • Clustering and insights aren't always perfect and can contain mistakes.

What operational factors and settings allow for effective and responsible use of clustering and insights?

Users of agents don't see clusters or insights; they're available to agent makers and admins only. To protect against harmful content, we apply content moderation policies during suggestion generation.
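
As a toy illustration of moderation applied at suggestion-generation time, here's a simple blocklist filter; real content moderation policies are far more sophisticated, and the moderate_suggestion helper is purely hypothetical:

```python
from typing import Optional

def moderate_suggestion(suggestion: str, blocklist: set[str]) -> Optional[str]:
    """Drop a generated suggestion that contains a blocked term.

    A blocklist is only a stand-in for the layered content moderation a
    production system applies before surfacing generated text.
    """
    lowered = suggestion.lower()
    return None if any(term in lowered for term in blocklist) else suggestion
```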