Hi there,
Great question! Differentiating text from diagrams/images in scanned PDFs with Azure AI Document Intelligence (formerly Form Recognizer) depends on the approach you take.
🧠 Option 1: Layout Model (Prebuilt)
The Layout model can analyze scanned PDFs and images to extract:
Text (lines, words, tables)
Bounding box coordinates for each line or word
Information about selection marks and reading order
However, it does not directly tag images or diagrams. You can still infer non-text regions (likely diagrams or images) by looking for page areas where no text was recognized, for example large gaps between or around the extracted line bounding boxes.
👉 Use this API (the api-version query parameter is required; substitute a supported version):
https://<endpoint>/formrecognizer/documentModels/prebuilt-layout:analyze?api-version=<api-version>
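For instance, here is a minimal sketch using the azure-ai-formrecognizer Python SDK (3.2+), which wraps the same REST call. The endpoint, key, and scanned.pdf file name are placeholders:

```python
# Minimal sketch: run prebuilt-layout on a scanned PDF and dump text bounding boxes.
# <your-endpoint>, <your-key>, and scanned.pdf are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import DocumentAnalysisClient

client = DocumentAnalysisClient(
    endpoint="https://<your-endpoint>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

with open("scanned.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

for page in result.pages:
    print(f"Page {page.page_number}: {page.width} x {page.height} {page.unit}")
    for line in page.lines:
        # line.polygon is a list of points outlining the recognized text region;
        # areas of the page not covered by any of these boxes are candidate
        # diagram/image zones.
        xs = [p.x for p in line.polygon]
        ys = [p.y for p in line.polygon]
        print(f"  '{line.content}' -> x:[{min(xs)}, {max(xs)}] y:[{min(ys)}, {max(ys)}]")
```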
🧠 Option 2: Document Intelligence + Image Classifier (Hybrid Approach)
If you want to go further, combine:
Azure AI Document Intelligence for text extraction, and
Azure AI Vision (Computer Vision) or Custom Vision to classify the image areas (diagrams, logos, illustrations, etc.)
For example:
Use Document Intelligence to get page layout and text bounding boxes.
Use that info to crop the non-text zones.
Send those cropped areas to Azure Vision APIs to classify them as diagrams or images.
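Here is a rough sketch of steps 2–3, assuming you have already rendered the page to page.png and derived a candidate crop box from the layout polygons (both are placeholders here). It uses Pillow for cropping and the azure-ai-vision-imageanalysis package for classification:

```python
# Rough sketch: crop a candidate non-text region from a page image and ask
# Azure AI Vision to caption/tag it. page.png, the crop box, and the
# endpoint/key values are placeholders; the crop box would come from your own
# "no text here" heuristic over the layout polygons.
from io import BytesIO
from PIL import Image
from azure.core.credentials import AzureKeyCredential
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures

# Candidate region (in pixels) where the layout result returned no text lines.
crop_box = (100, 400, 900, 1200)  # left, upper, right, lower

region = Image.open("page.png").crop(crop_box)
buffer = BytesIO()
region.save(buffer, format="PNG")

vision = ImageAnalysisClient(
    endpoint="https://<your-vision-endpoint>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-vision-key>"),
)
analysis = vision.analyze(
    image_data=buffer.getvalue(),
    visual_features=[VisualFeatures.CAPTION, VisualFeatures.TAGS],
)

if analysis.caption:
    print("Caption:", analysis.caption.text, analysis.caption.confidence)
for tag in (analysis.tags.list if analysis.tags else []):
    print("Tag:", tag.name, tag.confidence)
```

Tags like "diagram", "technical drawing", or "illustration" in the response are a reasonable signal that the region is a figure rather than residual text.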
🧠 Option 3: Use Page Content Tags (if using PDF SDK or AI Indexing)
Some advanced pipelines (like Azure AI Search + Cognitive Skills) allow "image content detection" by chaining:
OCR Skill
Layout Skill
Image Analysis Skill
This may help tag and extract diagram zones, especially in scanned engineering or academic documents.
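As a sketch of such a pipeline, here is roughly how an OCR + image-analysis skillset could be created with the azure-search-documents package. The service name, key, skillset name, and target field names are placeholders, and the indexer must be configured with "imageAction": "generateNormalizedImages" so that /document/normalized_images exists:

```python
# Minimal sketch of an Azure AI Search skillset that chains OCR + image analysis
# over the page images of ingested documents. All names/keys are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerSkillset, OcrSkill, ImageAnalysisSkill,
    InputFieldMappingEntry, OutputFieldMappingEntry,
)

skills = [
    OcrSkill(
        name="ocr",
        context="/document/normalized_images/*",
        inputs=[InputFieldMappingEntry(name="image", source="/document/normalized_images/*")],
        outputs=[OutputFieldMappingEntry(name="text", target_name="ocrText")],
    ),
    ImageAnalysisSkill(
        name="image-analysis",
        context="/document/normalized_images/*",
        visual_features=["tags", "description"],
        inputs=[InputFieldMappingEntry(name="image", source="/document/normalized_images/*")],
        outputs=[OutputFieldMappingEntry(name="tags", target_name="imageTags")],
    ),
]

client = SearchIndexerClient(
    endpoint="https://<your-search-service>.search.windows.net",
    credential=AzureKeyCredential("<your-search-admin-key>"),
)
client.create_or_update_skillset(
    SearchIndexerSkillset(name="diagram-detection-skillset", skills=skills)
)
```

Pages whose image-analysis tags suggest figures (and whose OCR output is sparse) can then be flagged as diagram-heavy in your index.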
🧪 Tip:
When working with scanned PDFs, the Layout model runs OCR itself, so the file does not need an embedded text layer; just make sure the scan quality is good enough for reliable OCR. Where supported, the readingOrder=natural option returns lines in a more human-friendly reading order.
Let me know if you'd like a more complete Python or REST sample showing how to extract and infer these regions end to end. And if this helps, please click "Accept Answer" so others can benefit too 😊
Best Regards,
Jerald Felix