Nutrient - Extract from PDF (Preview)

Extract text and data from PDF files with the Nutrient Document Converter Extract actions. Retrieve text, extract key-value pairs, and use OCR to process scanned documents. Suited to indexing, search, content analysis, and structured data workflows.

This connector is available in the following products and regions:

Service          Class     Regions
Copilot Studio   Premium   All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Logic Apps       Standard  All Logic Apps regions except the following:
     -   Azure Government regions
     -   Azure China regions
     -   US Department of Defense (DoD)
Power Apps       Premium   All Power Apps regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Power Automate   Premium   All Power Automate regions except the following:
     -   US Government (GCC)
     -   US Government (GCC High)
     -   China Cloud operated by 21Vianet
     -   US Department of Defense (DoD)
Contact
Name Nutrient (formerly Muhimbi) Support
URL https://support.nutrient.io/hc/en-us/requests/new
Email support+low-code@nutrient.io
Connector Metadata
Publisher Muhimbi trading as Nutrient
Website https://www.nutrient.io/low-code/
Privacy policy https://www.nutrient.io/legal/privacy/
Categories Collaboration;Content and Files

Extract text and data from PDFs

Nutrient Document Converter enables you to extract text, data, or specific pages from PDF files as part of automated workflows in Power Automate. You can also extract text from images using OCR.

Available actions

Refer to the linked guides for step-by-step instructions on implementing these actions in your workflows.

Prerequisites

To use Nutrient Document Converter, you need a Free or Trial account. Refer to the comparison guide to understand the differences between these account types.

Getting started

Follow the steps below to start using the Nutrient Document Converter connector:

Known issues and limitations

Documents protected with IRM, DRM, RMS, or AIP solutions cannot be processed due to security restrictions.

For questions or assistance, contact our Support team.

Throttling Limits

Name                       Calls   Renewal Period
API calls per connection   100     60 seconds
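
If a script drives many calls through a single connection (for example, when batching documents), pacing requests on the client side can help stay under the limit above. The following is a minimal sketch of a sliding-window limiter, not part of the connector; the class name and its parameters are hypothetical, and the defaults simply mirror the 100 calls per 60 seconds stated in the table.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any period-second window (illustrative helper)."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def acquire(self):
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call expires.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `acquire()` before each request keeps the caller within the window. The `clock` and `sleep` arguments exist only so the behavior can be checked without real waiting.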

Actions

Extract key value pairs from a PDF document

Identify and extract key-value pairs from documents for processing forms or structured data workflows.

Extract text from a PDF document

Retrieve text content from PDF documents for easy indexing, search, or content analysis.

Extract text from a PDF file using OCR

Extract text from scanned documents or images using OCR technology, making them searchable and editable.

Extract key value pairs from a PDF document

Identify and extract key-value pairs from documents for processing forms or structured data workflows.

Parameters

Name Key Required Type Description
Source file name
source_file_name True string

Name of the source file including extension

Source file content
source_file_content True byte

Content of the file to convert

OCR Language
ocr_language string

The language codes for OCR and KVP extraction, separated by '+'. For example, 'eng+deu+fra' would add English, German, and French.

DPI
dpi enum

The resolution, in dots per inch (DPI), used when processing the document.

KVP Output Format
kvp_format enum

The output formats, separated by commas. KVP data can be output as JSON, CSV, and XML, for example 'json,csv,xml'.

Page Range
page_range string

The pages to process for KVP extraction. Use '1 - 5' for pages 1 to 5, or '1, 5, 6' for pages 1, 5, and 6.

Autorotate
autorotate enum

Setting this to 'Yes' will automatically rotate pages if the text does not have the correct orientation.

Trim Symbols
trim_symbols enum

Setting this to 'Yes' will remove any symbols from the start/end of values, with the exception of the hash '#' or period '.' symbols.

Include Key Bounding Box
include_key_bounding_box enum

Include the bounding box values for the key in the output

Include Value Bounding Box
include_value_bounding_box enum

Include the bounding box values for the value in the output

Include Page Number
include_page_number enum

Include the page number for the key value pair in the output

Include Confidence
include_confidence enum

Include the confidence score for the key value pair in the output. Confidence is measured between 0 (no confidence) and 100 (full confidence).

Confidence Threshold
confidence_threshold integer

The confidence threshold a key value pair must reach to be included in the output. Results under the threshold are discarded.

Include Type
include_type enum

Include the data type for the key value pair in the output

Expected Keys
expected_keys string

The JSON string containing the expected keys and synonyms

Fail on error
fail_on_error boolean

Fail on error
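
Several of the parameters above are delimited strings, so it can help to assemble them programmatically. The sketch below builds a dictionary keyed by the parameter keys from the table. It assumes that the `byte` content is carried as base64 text (the usual convention for `byte` fields in JSON bodies) and that the confidence threshold uses the same 0 to 100 scale as the confidence score; the function name `build_kvp_parameters` is hypothetical.

```python
import base64


def build_kvp_parameters(file_name, file_bytes, languages=None, pages=None,
                         formats=None, confidence_threshold=None):
    """Assemble parameter values for the key-value pair extraction action."""
    params = {
        "source_file_name": file_name,
        # Assumption: 'byte' parameters travel as base64 text.
        "source_file_content": base64.b64encode(file_bytes).decode("ascii"),
    }
    if languages:
        # Language codes are joined with '+', e.g. 'eng+deu+fra'.
        params["ocr_language"] = "+".join(languages)
    if pages:
        # Individual pages are listed with commas, e.g. '1, 5, 6'.
        params["page_range"] = ", ".join(str(p) for p in pages)
    if formats:
        # Output formats are comma separated, e.g. 'json,csv,xml'.
        params["kvp_format"] = ",".join(formats)
    if confidence_threshold is not None:
        # Assumption: the threshold shares the 0-100 confidence scale.
        if not 0 <= confidence_threshold <= 100:
            raise ValueError("confidence_threshold must be between 0 and 100")
        params["confidence_threshold"] = confidence_threshold
    return params
```

Optional parameters are left out entirely when not supplied, so the action falls back to its own defaults.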

Returns

Response data for all operations

Extract text from a PDF document

Retrieve text content from PDF documents for easy indexing, search, or content analysis.

Parameters

Name Key Required Type Description
Source file name
source_file_name True string

Name of the source file including extension

Source file content
source_file_content True byte

Content of the file to convert

Page Range
page_range string

The page range to extract text from, for example 1,5,8-12.

Fail on error
fail_on_error boolean

Fail on error
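
When the pages of interest come from an earlier step as a list of numbers, they need to be turned into a page range string of the form shown above. A minimal sketch follows; the helper name `to_page_range` is hypothetical, and it assumes page numbers start at 1.

```python
def to_page_range(pages):
    """Collapse page numbers into a compact range string such as '1,5,8-12'."""
    ordered = sorted(set(pages))
    if any(p < 1 for p in ordered):
        # Assumption: pages are numbered from 1.
        raise ValueError("page numbers start at 1")
    parts = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current run while the next page is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)
```

Duplicates are dropped and the input is sorted first, so the order of the incoming list does not matter.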

Returns

Response data for all operations

Extract text from a PDF file using OCR

Extract text from scanned documents or images using OCR technology, making them searchable and editable.

Parameters

Name Key Required Type Description
Source file name
source_file_name True string

Name of the source file including extension

Source file content
source_file_content True byte

Content of the file to OCR

Language
language enum

Language

X Coordinate
x string

X Coordinate (in Pts, 1/72 of an inch)

Y Coordinate
y string

Y Coordinate (in Pts, 1/72 of an inch)

Width
width string

Width of the OCR area (in Pts, 1/72 of an inch)

Height
height string

Height of the OCR area (in Pts, 1/72 of an inch)

Page number
page_number string

Page number (leave blank to OCR all pages)

Performance
performance enum

Performance

Blacklist / whitelist
characters_option enum

Characters option

Characters
characters string

Characters to blacklist or whitelist

Use pagination
paginate boolean

Paginate

Fail on error
fail_on_error boolean

Fail on error
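
The OCR area parameters above are expressed in points (1/72 of an inch) and typed as strings, so measurements taken in inches need converting first. The sketch below shows that arithmetic; the helper name is hypothetical, and whether the action accepts fractional values is an assumption that should be checked against real documents.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 inch, as stated in the parameter descriptions


def ocr_region_in_points(x_in, y_in, width_in, height_in):
    """Convert an OCR region given in inches to string values in points."""
    def to_points(inches):
        # Round to two decimals and drop trailing zeros ('108', not '108.0').
        return f"{round(inches * POINTS_PER_INCH, 2):g}"

    return {
        "x": to_points(x_in),
        "y": to_points(y_in),
        "width": to_points(width_in),
        "height": to_points(height_in),
    }
```

For example, a region 4 inches wide and 2 inches tall, offset 1 inch horizontally and half an inch vertically, maps to x 72, y 36, width 288, and height 144 points.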

Returns

Response data for OCRText operation

Definitions

ocr_operation_response

Response data for OCRText operation

Name Path Type Description
Out text
out_text string

Extracted OCRed text in plain text.

Base file name
base_file_name string

Name of the input file without the extension.

Result code
result_code enum

Operation result code.

Result details
result_details string

Operation result details.

operation_response

Response data for all operations

Name Path Type Description
Processed file content
processed_file_content byte

File generated by the Muhimbi converter.

Base file name
base_file_name string

Name of the input file without the extension.

Result code
result_code enum

Operation result code.

Result details
result_details string

Operation result details.
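
Downstream code that receives an operation_response as a JSON object usually needs the file content as raw bytes. The sketch below reads the four fields listed above into a small data class. It assumes `processed_file_content` arrives as base64 text; the values of the `result_code` enum are not listed in this reference, so the code passes the value through without interpreting it. The names `OperationResult` and `parse_operation_response` are hypothetical.

```python
import base64
from dataclasses import dataclass


@dataclass
class OperationResult:
    base_file_name: str
    result_code: str
    result_details: str
    content: bytes


def parse_operation_response(body):
    """Read an operation_response dictionary into an OperationResult."""
    # Assumption: the 'byte' field is base64 text; treat a missing field as empty.
    raw = body.get("processed_file_content") or ""
    return OperationResult(
        base_file_name=body.get("base_file_name", ""),
        result_code=body.get("result_code", ""),
        result_details=body.get("result_details", ""),
        content=base64.b64decode(raw),
    )
```

Keeping `result_code` and `result_details` next to the decoded content lets the caller decide how to react to a failed operation once the actual enum values are known.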