How can I use Copilot to retrieve all questions and answers from an external website that match a specific keyword?

Haritha Yuvaraj 5 Reputation points
2025-08-06T09:46:05.25+00:00

I'm working on building a Copilot agent where an external website (https://www.doubtnut.com/class-11/physics) is configured as a knowledge source. My goal is for the agent to return all questions and answers from the site that match a given keyword. 

For example, when I search using the prompt: 

"Q&A from the Kinetic Theory chapter where 'velocity' is mentioned in the question." 

Copilot only returns the top 3 results, even though there are more relevant matches available. I need it to retrieve all matching questions and their solutions where the keyword (e.g., "velocity") appears in the question. 

Could you please guide me on how to achieve this using a custom topic with a Generative Answers node, or if there’s a way to configure this through default settings?

Microsoft Copilot | Microsoft 365 Copilot | Development

1 answer

  1. Karan Shewale 1,040 Reputation points Microsoft External Staff
    2025-08-06T12:17:03.43+00:00

    Hello Haritha,

    Thank you for your detailed question about retrieving comprehensive results from external knowledge sources in Copilot Studio. Getting only the top 3 results is a consequence of the default response limits and search configuration. Here's how to address this:

    Understanding the Limitation

    Copilot Studio's Generative Answers node has built-in result limits to:

    • Ensure response quality and relevance
    • Prevent token limit exhaustion
    • Maintain reasonable response times

    Solutions to Retrieve More Results

    1. Custom Topic with Enhanced Search Configuration

    Create a custom topic with these settings:

    Topic: Comprehensive Q&A Search

    Trigger Phrases: "all questions about", "complete list", "comprehensive search"

    Variables:

    - Keyword (Text)

    - Subject (Text)

    2. Modify Generative Answers Node Settings

    In your Generative Answers node:

    Data Source Configuration:

    • Set "Number of sources" to maximum (typically 20)
    • Enable "Search across all content"

    Search Instructions:

    Search for ALL questions and answers from the knowledge source where the keyword appears in the question text. Return a comprehensive list including:

    1. Question number/identifier

    2. Complete question text

    3. Full solution/answer

    4. Organize by relevance but include all matches

    3. Advanced Prompt Engineering

    Use specific prompts that request comprehensive results:

    "Retrieve every question from [subject] where '[keyword]' appears in the question. 

    Format as: Question [number]: [full question] | Answer: [complete solution]. 

    Continue until all matching questions are listed."
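
If the subject and keyword come from topic variables, the prompt above can be assembled programmatically before it is passed on. The following is only a minimal sketch in Python; `build_search_prompt` is a hypothetical helper name, not part of Copilot Studio, and the wording simply mirrors the template above.

```python
def build_search_prompt(subject: str, keyword: str) -> str:
    """Fill the prompt template with a subject and a keyword.

    Hypothetical helper for illustration; not a Copilot Studio API.
    """
    subject = subject.strip()
    keyword = keyword.strip()
    if not subject or not keyword:
        raise ValueError("subject and keyword must both be non-empty")
    return (
        f"Retrieve every question from {subject} where '{keyword}' "
        "appears in the question. "
        "Format as: Question [number]: [full question] | "
        "Answer: [complete solution]. "
        "Continue until all matching questions are listed."
    )


prompt = build_search_prompt("Kinetic Theory", "velocity")
print(prompt)
```

Rejecting empty values up front avoids sending a prompt with a blank keyword, which would match nothing useful.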

    4. Iterative Search Approach

    Create a multi-turn conversation flow:

    • Initial search returns first batch
    • Follow-up prompts: "Show me more results" or "Continue the search"
    • Use conversation memory to track what's already been shown
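
The bookkeeping behind this multi-turn flow can be sketched as follows. This is an illustration only, assuming the full list of matches is already available from somewhere; `ResultPager` and its fields are hypothetical names, and in a real agent the "shown" state would live in topic or conversation variables.

```python
from dataclasses import dataclass, field


@dataclass
class ResultPager:
    """Hands out matches in fixed-size batches and remembers what was shown."""
    matches: list[str]
    batch_size: int = 3
    shown: set[str] = field(default_factory=set)

    def next_batch(self) -> list[str]:
        # Skip anything already returned in an earlier turn.
        remaining = [m for m in self.matches if m not in self.shown]
        batch = remaining[: self.batch_size]
        self.shown.update(batch)
        return batch

    def has_more(self) -> bool:
        return any(m not in self.shown for m in self.matches)


pager = ResultPager(matches=[f"Q{i}" for i in range(1, 8)], batch_size=3)
first = pager.next_batch()   # initial search: Q1, Q2, Q3
second = pager.next_batch()  # "Show me more results": Q4, Q5, Q6
third = pager.next_batch()   # "Continue the search": Q7
```

Once `has_more()` returns `False`, the follow-up prompt can be dropped so the user is not offered an empty batch.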

    5. Knowledge Source Optimization

    Ensure your external website data is properly indexed:

    • URL Configuration: Verify all relevant pages are included
    • Content Depth: Set crawling depth to capture all Q&A pages
    • Refresh Frequency: Schedule regular updates so the indexed content stays current

    6. Alternative Technical Approaches

    Option A: Power Automate Integration

    • Create a flow that scrapes the website comprehensively
    • Store results in SharePoint or Dataverse
    • Reference this structured data in Copilot
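
Once the Q&A pairs are stored as structured records, matching the keyword against the question text only (and not the answer) becomes a plain filter. Here is a rough sketch in Python; the field names `chapter`, `question`, and `answer` and the sample records are made up for illustration, and the real schema would depend on how the flow stores the data.

```python
import re


def find_matches(records, keyword, chapter=None):
    """Return records whose question text contains the keyword as a whole word."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    return [
        r for r in records
        if pattern.search(r["question"])
        and (chapter is None or r["chapter"].lower() == chapter.lower())
    ]


# Made-up sample records; the answers are placeholders.
records = [
    {"chapter": "Kinetic Theory",
     "question": "What is the rms velocity of a gas molecule?",
     "answer": "(solution text)"},
    {"chapter": "Kinetic Theory",
     "question": "State the law of equipartition of energy.",
     "answer": "(solution text that mentions velocity)"},
    {"chapter": "Gravitation",
     "question": "Define escape velocity.",
     "answer": "(solution text)"},
]

hits = find_matches(records, "velocity", chapter="Kinetic Theory")
print(len(hits))  # 1: only the first record has the keyword in its question
```

Because the filter runs over the whole stored set, it returns every match rather than a ranked top few. Note that a whole-word match on "velocity" will not pick up "velocities"; loosen the pattern if plural forms should count.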

    Option B: Custom Connector

    • Build a custom connector to the website's API (if available)
    • Implement pagination to retrieve all results
    • Return structured JSON with all matching Q&As
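
The pagination step could look roughly like the sketch below. It assumes a hypothetical API whose pages carry an `items` list and a `next_page` number (or `None` on the last page); the actual response shape depends entirely on what the website exposes. `fake_fetch_page` stands in for the real HTTP call so the example runs on its own.

```python
from typing import Callable, Iterator

Page = dict  # assumed shape: {"items": [...], "next_page": int or None}


def fetch_all(fetch_page: Callable[[int], Page],
              start: int = 1,
              max_pages: int = 100) -> Iterator[dict]:
    """Yield every item by following next_page until it is None."""
    page_number = start
    for _ in range(max_pages):  # hard cap guards against an endless loop
        page = fetch_page(page_number)
        yield from page["items"]
        if page.get("next_page") is None:
            return
        page_number = page["next_page"]


# Stand-in data and fetcher; a real connector would call the site's API here.
FAKE_DATA = [{"id": i, "question": f"Sample question {i}"} for i in range(1, 8)]


def fake_fetch_page(page_number: int, page_size: int = 3) -> Page:
    start = (page_number - 1) * page_size
    items = FAKE_DATA[start:start + page_size]
    has_more = start + page_size < len(FAKE_DATA)
    return {"items": items, "next_page": page_number + 1 if has_more else None}


all_items = list(fetch_all(fake_fetch_page))
print(len(all_items))  # 7
```

The `max_pages` cap is a design choice: if the API ever returns a `next_page` that loops, the generator still terminates.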

    Implementation Example

    Custom Topic: "Complete Physics Q&A Search"

    Node 1: Question (Keyword Input)

    "What keyword would you like to search for in physics questions?"

    Node 2: Generative Answers

    Data Source: External Website

    Instructions: "Search the entire knowledge base for ALL questions containing the keyword '{Keyword}'. Return every match with question number, full question text, and complete solution. Do not limit results."

    Node 3: Follow-up Options

    "Would you like to:

    1. Search for another keyword

    2. Filter by specific chapter

    3. Get additional results if available"

    Best Practices

    1. Chunk Large Responses: If responses hit token limits, split them across several paginated messages
    2. Structured Output: Request specific formatting for easier parsing
    3. Conversation Memory: Track previous searches to avoid duplicates
    4. User Control: Let users specify how many results they want
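
For the first practice, one way to split a long result list into several messages is to pack entries greedily under a size budget. The sketch below uses character count as a rough stand-in for tokens, which is an assumption; `chunk_by_budget` and the default of 1000 characters are illustrative, not Copilot Studio settings.

```python
def chunk_by_budget(entries, max_chars=1000):
    """Group entries into chunks whose combined length stays within max_chars.

    An entry longer than the budget is placed in a chunk of its own
    rather than being truncated.
    """
    chunks, current, used = [], [], 0
    for entry in entries:
        if current and used + len(entry) > max_chars:
            chunks.append(current)
            current, used = [], 0
        current.append(entry)
        used += len(entry)
    if current:
        chunks.append(current)
    return chunks


entries = ["a" * 400, "b" * 400, "c" * 400, "d" * 1200]
chunks = chunk_by_budget(entries, max_chars=1000)
print([len(chunk) for chunk in chunks])  # [2, 1, 1]
```

Each chunk can then be sent as one message, keeping every Q&A pair intact instead of cutting it off mid-answer.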

    Limitations to Consider

    • Token limits may still restrict extremely large result sets
    • External website structure affects search quality
    • Response time increases with result volume

    Thanks,  

    Karan Shewale. 

    *************************************************************************  

    If the response is helpful, please click "Accept Answer" and upvote it.
