Advanced indexing in eDiscovery

When a data source is added to an eDiscovery case, any content that is partially indexed or that had indexing errors is reindexed. This reindexing process is called advanced indexing. Content can be partially indexed or have indexing errors for many reasons, including image files, images embedded in a file, unsupported file types, or files that exceed indexing size limits.

Advanced indexing is available in eDiscovery with premium feature support and is applied when processes are run, such as generating statistics, adding items to a review set, and direct export. Unlike previous versions of eDiscovery, you don't need to wait until advanced indexing completes before conducting a search or adding items to a review set. You also don't need to manually trigger an advanced indexing update to ensure that content newly added to the target locations is reindexed.

Tip

Want to try premium eDiscovery features? See the subscription requirements for Microsoft 365 Enterprise E5 licensing.

For SharePoint files, advanced indexing only runs on items marked as partially indexed or items with indexing errors. In Exchange, email messages with image attachments aren't marked as partially indexed or as having indexing errors. This means that those messages aren't reindexed by the advanced indexing process.

Note

Optical Character Recognition (OCR) runs automatically during advanced indexing. For more information about how OCR works and how to configure OCR-related settings, see Learn about search and analytics settings in eDiscovery cases.

View advanced indexing results

On the statistics dashboard, review advanced indexing hits and choose where to apply advanced indexing. The dashboard shows an estimate of how many additional partially indexed items match the search query after they're reindexed with advanced indexing. To ensure that advanced indexing runs on all partially indexed items in scope without sampling, run the add to review set or export process with advanced indexing.

You can also review the items that advanced indexing identified in search results by using the process report. This report provides details on natively indexed items, partially indexed items, and items identified by advanced indexing.

For more granular insights, such as which specific items were exported or which locations were searched, use the CSV reports. These reports indicate whether an item was included because it matched the indexed query or because of advanced indexing. The location report provides hit counts and data volume per location, broken down by matches to the standard index versus the advanced index.
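As an illustration of working with the location report, the following Python sketch totals hit counts per location by index type. The column names (`Location`, `IndexedHits`, `AdvancedIndexHits`), the function name, and the sample rows are assumptions made for this example, not the actual report schema. Check the header row of your exported CSV and adjust the constants to match.

```python
import csv
import io

# Hypothetical column names; replace them with the headers in your report.
LOCATION_COL = "Location"
INDEXED_COL = "IndexedHits"
ADVANCED_COL = "AdvancedIndexHits"


def summarize_location_report(csv_text):
    """Return per-location hit counts and overall totals by index type."""
    per_location = {}
    totals = {"indexed": 0, "advanced": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        indexed = int(row[INDEXED_COL])
        advanced = int(row[ADVANCED_COL])
        per_location[row[LOCATION_COL]] = {
            "indexed": indexed,
            "advanced": advanced,
        }
        totals["indexed"] += indexed
        totals["advanced"] += advanced
    return per_location, totals


# Made-up sample data, for illustration only.
sample = """Location,IndexedHits,AdvancedIndexHits
mailbox-a,12,3
site-b,0,5
"""

per_location, totals = summarize_location_report(sample)
print(per_location)
print(totals)
```

A summary like this can help show which locations contribute hits only through advanced indexing, such as `site-b` in the sample data.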

Scope advanced indexing

You can scope advanced indexing to partially indexed items in locations where there are already indexed hits, partially indexed items in locations where there aren't any indexed hits, or both.

  • Partially indexed items from locations with indexed hits: This option is more targeted and limits the scope to only those locations (mailboxes, sites) where some content was successfully indexed and matched the search criteria.

    Choose this option when:

    • You want to reduce noise and focus only on locations already known to be relevant.
    • You’re conducting a narrow investigation and want to avoid reviewing irrelevant content.
    • You’re optimizing for review efficiency and cost control (for example, in large-scale eDiscovery cases).
    • You want to start with the partially indexed items in the locations most relevant to the matter, and later run a more comprehensive analysis that includes all locations (including those without indexed hits).
  • Partially indexed items from all locations searched: This option is more comprehensive and includes partially indexed items from every location searched, regardless of whether any indexed content matched.

    Choose this option when:

    • You need maximum completeness for legal defensibility or regulatory compliance.
    • You’re investigating potential data concealment or malicious behavior where relevant content might exist only in unindexed formats (for example, image-based attachments or encrypted files).
    • You’re dealing with high-risk scenarios where missing even one critical item could have legal or business consequences.
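The difference between the two scoping options can be sketched as a simple filter. This Python example is a conceptual model only, not how eDiscovery implements scoping. The `Location` class, its fields, and the `locations_in_scope` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Location:
    """Hypothetical model of a searched location (mailbox or site)."""
    name: str
    indexed_hits: int             # items that matched through the standard index
    partially_indexed_items: int  # candidates for advanced indexing


def locations_in_scope(locations, include_with_hits=True, include_without_hits=False):
    """Return names of locations whose partially indexed items are in scope."""
    in_scope = []
    for loc in locations:
        if loc.partially_indexed_items == 0:
            continue  # nothing to reindex in this location
        has_hits = loc.indexed_hits > 0
        if (has_hits and include_with_hits) or (not has_hits and include_without_hits):
            in_scope.append(loc.name)
    return in_scope


# Made-up locations, for illustration only.
searched = [
    Location("mailbox-a", indexed_hits=12, partially_indexed_items=4),
    Location("site-b", indexed_hits=0, partially_indexed_items=7),
    Location("mailbox-c", indexed_hits=3, partially_indexed_items=0),
]

# Targeted: only locations that already have indexed hits.
targeted = locations_in_scope(searched)

# Comprehensive: every location searched, with or without indexed hits.
comprehensive = locations_in_scope(searched, include_without_hits=True)

print(targeted)
print(comprehensive)
```

In this model, the targeted option skips `site-b` because it has no indexed hits, while the comprehensive option includes it.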