Get named entities extraction insights

2025-06-10

Named entities extraction uses natural language processing (NLP) to find locations, people, and brands in the audio and images of media files. It works on the text produced by transcription and optical character recognition (OCR).

Named entities use cases

Contextual advertising, for example, placing an ad for a pizza chain after footage about Italy.
Deep searching media archives for insights on people or locations to create feature stories for the news.
Creating a verbal description of footage using optical character recognition (OCR) processing to enhance accessibility for the visually impaired, for example a background storyteller in movies.
Extracting insights on brand names.

View the insight JSON with the web portal

After you upload and index a video, download insights in JSON format from the web portal.

Select the Library tab.
Select the media you want.
Select Download, and then select Insights (JSON). The JSON file opens in a new browser tab.
Find the key pairs described in the example response.

Use the API

Use a Get Video Index request. Pass &includeSummarizedInsights=false.
Find the key pairs described in the example response.
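
The request can be sketched in Python with only the standard library. This is a minimal sketch, not the official client: the base URL, the path layout, and the accessToken query parameter are assumptions based on the usual shape of the Video Indexer REST API, and the function names and argument values are hypothetical. Check them against the API reference before use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL for the Video Indexer REST API.
API_BASE = "https://api.videoindexer.ai"


def build_get_video_index_url(location, account_id, video_id, access_token):
    """Build a Get Video Index URL with summarized insights turned off."""
    query = urlencode({
        "accessToken": access_token,            # assumed parameter name
        "includeSummarizedInsights": "false",   # from the step above
    })
    # Assumed path layout: /{location}/Accounts/{accountId}/Videos/{videoId}/Index
    path = "/".join(quote(part, safe="") for part in (location, "Accounts", account_id, "Videos", video_id, "Index"))
    return f"{API_BASE}/{path}?{query}"


def fetch_video_index(location, account_id, video_id, access_token):
    """Send the GET request and return the parsed JSON body as a dict."""
    url = build_get_video_index_url(location, account_id, video_id, access_token)
    with urlopen(url) as response:
        return json.load(response)
```

Calling fetch_video_index with your own location, account ID, video ID, and access token returns the index as a dictionary that you can search for the named entity keys.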

Example response

    namedPeople: [
        {
            referenceId: "Satya_Nadella",
            referenceUrl: "https://en.wikipedia.org/wiki/Satya_Nadella",
            confidence: 1,
            description: "CEO of Microsoft Corporation",
            seenDuration: 33.2,
            id: 2,
            name: "Satya Nadella",
            appearances: [
                {
                    startTime: "0:01:11.04",
                    endTime: "0:01:17.36",
                    startSeconds: 71,
                    endSeconds: 77.4
                },
                {
                    startTime: "0:01:31.83",
                    endTime: "0:01:37.1303666",
                    startSeconds: 91.8,
                    endSeconds: 97.1
                },
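
Once the JSON is loaded, the appearances of each entity can be read programmatically. The following Python sketch assumes the namedPeople list has already been parsed into dictionaries with the keys shown in the example response. The helper names are hypothetical, and the sample data repeats only the two appearances visible in the truncated example.

```python
def timestamp_to_seconds(timestamp):
    """Convert an h:mm:ss.fff string such as "0:01:11.04" to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def summarize_appearances(entity):
    """Return the name, appearance count, and total on-screen seconds of an entity."""
    spans = [
        (timestamp_to_seconds(a["startTime"]), timestamp_to_seconds(a["endTime"]))
        for a in entity.get("appearances", [])
    ]
    return {
        "name": entity["name"],
        "count": len(spans),
        "total_seconds": sum(end - start for start, end in spans),
    }


# Sample data copied from the example response (appearances list is truncated there).
named_people = [
    {
        "name": "Satya Nadella",
        "confidence": 1,
        "appearances": [
            {"startTime": "0:01:11.04", "endTime": "0:01:17.36"},
            {"startTime": "0:01:31.83", "endTime": "0:01:37.1303666"},
        ],
    }
]

summaries = [summarize_appearances(person) for person in named_people]
```

Parsing the startTime and endTime strings, rather than the rounded startSeconds and endSeconds values, keeps the full precision of the timestamps.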

Important

Read the transparency note overview for all VI features. Each insight also has its own transparency note.

Named entities notes

Carefully consider the accuracy of the results. To promote more accurate detections, check the quality of the audio and images. Low-quality audio and images might affect the detected insights.
Named entities extraction detects insights only in audio and images. Logos in a brand name might not be detected.
When you use named entities for law enforcement, carefully consider that named entities extraction might not always detect parts of the audio. To ensure fair and high-quality decisions, always combine named entities with human oversight.
Don't use named entities for decisions that may have serious adverse impacts on individuals and groups. Machine learning models that extract text can result in undetected or incorrect text output. Your decisions based on incorrect output could have serious adverse impacts that must be avoided. You should always include human review of determinations that have the potential for serious impacts on individuals.

Components

During the named entities extraction procedure, the media file is processed as follows:

Component	Definition
Source file	The user uploads the source file for indexing.
Text extraction	The audio is sent to the Speech Services API to extract the transcription. Sampled frames are sent to the Azure AI Vision API to extract text with OCR.
Analytics	The extracted text is then sent to the Text Analytics API to extract the entities, for example, Microsoft, Paris, or a person’s name like Paul or Sarah.
Processing and consolidation	The results are then processed. Where applicable, Wikipedia links are added and brands are identified via the Video Indexer built-in and customizable branding lists.
Confidence value	The estimated confidence level of each named entity is calculated as a value between 0 and 1. The confidence score represents the certainty in the accuracy of the result. For example, an 82% certainty is represented as a 0.82 score.
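
The confidence value can be used to decide which detections to accept automatically and which to send to a human reviewer. The following Python sketch is one possible approach, not part of the service. The 0.8 threshold, the function names, and the sample entities are hypothetical and should be tuned to your own content.

```python
def split_by_confidence(entities, threshold=0.8):
    """Split entities into (accepted, needs_review) using an assumed threshold."""
    accepted, needs_review = [], []
    for entity in entities:
        # Treat a missing confidence value as 0 so it is always reviewed.
        if entity.get("confidence", 0.0) >= threshold:
            accepted.append(entity)
        else:
            needs_review.append(entity)
    return accepted, needs_review


def as_percent(score):
    """Format a 0 to 1 confidence score as a whole-number percentage."""
    return f"{score:.0%}"


# Hypothetical entities for illustration only.
entities = [
    {"name": "Paris", "confidence": 0.82},
    {"name": "Sarah", "confidence": 0.41},
    {"name": "Microsoft", "confidence": 1},
]

accepted, needs_review = split_by_confidence(entities)
```

Routing the low-confidence group to a reviewer, instead of discarding it, keeps a person involved in decisions that depend on uncertain detections.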

Sample code

See all samples for VI

Azure AI Video Indexer documentation
