Nutrient - Extract from PDF (Preview)

Extract text and data from PDFs with the Nutrient Document Converter Extract actions. Retrieve text, extract key-value pairs, and use OCR to process scanned documents. Ideal for indexing, search, content analysis, and structured data workflows.

This connector is available in the following products and regions:

Service	Class	Regions
Copilot Studio	Premium	All Power Automate regions except: US Government (GCC); US Government (GCC High); China Cloud operated by 21Vianet; US Department of Defense (DoD)
Logic Apps	Standard	All Logic Apps regions except: Azure Government regions; Azure China regions; US Department of Defense (DoD)
Power Apps	Premium	All Power Apps regions except: US Government (GCC); US Government (GCC High); China Cloud operated by 21Vianet; US Department of Defense (DoD)
Power Automate	Premium	All Power Automate regions except: US Government (GCC); US Government (GCC High); China Cloud operated by 21Vianet; US Department of Defense (DoD)

Contact
Name	Nutrient (formerly Muhimbi) Support
URL	https://support.nutrient.io/hc/en-us/requests/new
Email	support+low-code@nutrient.io

Connector Metadata
Publisher	Muhimbi trading as Nutrient
Website	https://www.nutrient.io/low-code/
Privacy policy	https://www.nutrient.io/legal/privacy/
Categories	Collaboration;Content and Files

Extract text and data from PDFs

Nutrient Document Converter enables you to extract text, data, or specific pages from PDF files as part of automated workflows in Power Automate. You can also extract text from images using OCR.

Available actions

Refer to the linked guides for step-by-step instructions on implementing these actions in your workflows.

Prerequisites

To use Nutrient Document Converter, you need a Free or Trial account. Refer to the comparison guide to understand the differences between these account types.

Getting started

Follow the steps below to start using the Nutrient Document Converter connector:

Sign up for a 30-day trial by filling out this form.
After submitting the form, you will receive an email with your trial activation details.
Refer to the getting started video for a walkthrough of the process.
Read the Document Converter for Power Automate guide for detailed instructions.
Explore Power Automate and Logic Apps tutorials for practical examples.

Known issues and limitations

Documents protected with IRM, DRM, RMS, or AIP solutions cannot be processed due to security restrictions.

For questions or assistance, contact our Support team.

Throttling Limits

Name	Calls	Renewal Period
API calls per connection	100	60 seconds
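
A client that calls the connector in a loop can stay under the limit above by spacing its own calls. The following is a minimal sketch of a sliding-window limiter, assuming the limit is enforced over any 60-second window; the class name `SlidingWindowLimiter` and its methods are illustrative and not part of the connector.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return the seconds to wait before another call is allowed."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def record(self):
        """Register that a call is being made now."""
        self._stamps.append(self.clock())

    def acquire(self, sleep=time.sleep):
        """Block until a call is allowed, then register it."""
        delay = self.wait_time()
        if delay > 0:
            sleep(delay)
        self.record()
```

Calling `acquire()` before each request keeps a single connection at or below 100 calls per 60 seconds. How the service responds when the limit is exceeded is not described here, so retry handling is left out.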

Actions

Extract key value pairs from a PDF document	Identify and extract key-value pairs from documents for processing forms or structured data workflows.
Extract text from a PDF document	Retrieve text content from PDF documents for easy indexing, search, or content analysis.
Extract text from a PDF file using OCR	Extract text from scanned documents or images using OCR technology, making them searchable and editable.

Extract key value pairs from a PDF document

Operation ID: extract_key_value_pairs

Identify and extract key-value pairs from documents for processing forms or structured data workflows.

Parameters

Name	Key	Required	Type	Description
Source file name	source_file_name	True	string	Name of the source file including extension
Source file content	source_file_content	True	byte	Content of the file to convert
OCR Language	ocr_language		string	The language codes for OCR and KVP extraction, separated by '+'. For example, 'eng+deu+fra' would add English, German, and French.
DPI	dpi		enum	The resolution, in dots per inch (DPI), used when processing the document
KVP Output Format	kvp_format		enum	The output formats, separated by commas. KVP data can be output as JSON, CSV, and XML, for example 'json,csv,xml'.
Page Range	page_range		string	The pages to be processed for KVP extraction. Use '1 - 5' for pages 1 to 5, or '1, 5, 6' for pages 1, 5, and 6.
Autorotate	autorotate		enum	Setting this to 'Yes' will automatically rotate pages if the text does not have the correct orientation.
Trim Symbols	trim_symbols		enum	Setting this to 'Yes' will remove any symbols from the start/end of values, with the exception of the hash '#' or period '.' symbols.
Include Key Bounding Box	include_key_bounding_box		enum	Include the bounding box values for the key in the output
Include Value Bounding Box	include_value_bounding_box		enum	Include the bounding box values for the value in the output
Include Page Number	include_page_number		enum	Include the page number for the key value pair in the output
Include Confidence	include_confidence		enum	Include the confidence score for the key value pair in the output. Confidence is measured between 0 (no confidence) and 100 (full confidence).
Confidence Threshold	confidence_threshold		integer	The confidence threshold a key value pair must reach to be included in the output. Results under the threshold are discarded.
Include Type	include_type		enum	Include the data type for the key value pair in the output
Expected Keys	expected_keys		string	The JSON string containing the expected keys and synonyms
Fail on error	fail_on_error		boolean	Fail on error
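
The parameter keys above map onto a request body roughly as follows. This is a sketch only: the helper `build_kvp_request` is hypothetical, the file content is base64-encoded on the assumption that the `byte` type follows the OpenAPI convention, and the shape of `expected_keys` (a mapping from key to a list of synonyms) is an assumption, since the table only states that it is a JSON string.

```python
import base64
import json


def build_kvp_request(file_name, file_bytes, languages=("eng",),
                      formats=("json",), page_range=None,
                      confidence_threshold=None, expected_keys=None):
    """Assemble a body for extract_key_value_pairs (hypothetical helper)."""
    body = {
        "source_file_name": file_name,
        # Assumption: 'byte' means base64-encoded content, as in OpenAPI.
        "source_file_content": base64.b64encode(file_bytes).decode("ascii"),
        # Language codes are joined with '+', e.g. 'eng+deu+fra'.
        "ocr_language": "+".join(languages),
        # Output formats are joined with commas, e.g. 'json,csv'.
        "kvp_format": ",".join(formats),
    }
    if page_range is not None:
        body["page_range"] = page_range
    if confidence_threshold is not None:
        # Confidence is documented as a value between 0 and 100.
        if not 0 <= confidence_threshold <= 100:
            raise ValueError("confidence_threshold must be between 0 and 100")
        body["confidence_threshold"] = confidence_threshold
    if expected_keys is not None:
        # Assumption: the exact JSON schema is not documented in this table.
        body["expected_keys"] = json.dumps(expected_keys)
    return body
```

In Power Automate the designer fills these fields for you; the sketch is mainly useful for seeing how the '+' and ',' separators and the optional parameters fit together.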

Returns

Response data for all operations

Body: operation_response

Extract text from a PDF document

Operation ID: extract_text

Retrieve text content from PDF documents for easy indexing, search, or content analysis.

Parameters

Name	Key	Required	Type	Description
Source file name	source_file_name	True	string	Name of the source file including extension
Source file content	source_file_content	True	byte	Content of the file to convert
Page Range	page_range		string	The page range to extract text from e.g. 1,5,8-12
Fail on error	fail_on_error		boolean	Fail on error
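
A page range such as '1,5,8-12' can be validated on the client before it is sent. The sketch below expands a range string into page numbers; the function `parse_page_range` is illustrative, and how the service itself interprets unusual input (overlaps, descending ranges) is not documented here.

```python
def parse_page_range(spec):
    """Expand a string such as '1,5,8-12' into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.append(page)
    return pages
```

Whitespace around numbers is accepted, so the same function also handles the '1 - 5' style shown for the key-value pair action.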

Returns

Response data for all operations

Body: operation_response

Extract text from a PDF file using OCR

Operation ID: ocr_text

Extract text from scanned documents or images using OCR technology, making them searchable and editable.

Parameters

Name	Key	Required	Type	Description
Source file name	source_file_name	True	string	Name of the source file including extension
Source file content	source_file_content	True	byte	Content of the file to OCR
Language	language		enum	Language
X Coordinate	x		string	X Coordinate (in Pts, 1/72 of an inch)
Y Coordinate	y		string	Y Coordinate (in Pts, 1/72 of an inch)
Width	width		string	Width of the OCR area (in Pts, 1/72 of an inch)
Height	height		string	Height of the OCR area (in Pts, 1/72 of an inch)
Page number	page_number		string	Page number (leave blank to OCR all pages)
Performance	performance		enum	Performance
Blacklist / whitelist	characters_option		enum	Characters option
Characters	characters		string	Characters to blacklist or whitelist
Use pagination	paginate		boolean	Paginate
Fail on error	fail_on_error		boolean	Fail on error
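
The OCR area is given in points (1/72 of an inch), so measurements taken in millimetres need converting first. The following sketch performs that conversion and returns the values as strings, matching the parameter types above; the helper names are hypothetical, and which page corner serves as the coordinate origin is not stated in this table.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimetres to points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def ocr_region_mm(x_mm, y_mm, width_mm, height_mm, page_number=None):
    """Build the region parameters for ocr_text from millimetre values."""
    def as_text(mm):
        # Round to two decimals and drop trailing zeros.
        return f"{round(mm_to_points(mm), 2):g}"

    region = {
        "x": as_text(x_mm),
        "y": as_text(y_mm),
        "width": as_text(width_mm),
        "height": as_text(height_mm),
    }
    if page_number is not None:
        # Leaving page_number out means all pages are processed.
        region["page_number"] = str(page_number)
    return region
```

For example, a region starting 25.4 mm (one inch) from the origin has an x value of 72 points.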

Returns

Response data for OCRText operation

Body: ocr_operation_response

Definitions

ocr_operation_response

Response data for OCRText operation

Name	Path	Type	Description
Out text	out_text	string	Extracted OCRed text in plain text.
Base file name	base_file_name	string	Name of the input file without the extension.
Result code	result_code	enum	Operation result code.
Result details	result_details	string	Operation result details.

operation_response

Response data for all operations

Name	Path	Type	Description
Processed file content	processed_file_content	byte	File generated by the Muhimbi converter.
Base file name	base_file_name	string	Name of the input file without the extension.
Result code	result_code	enum	Operation result code.
Result details	result_details	string	Operation result details.
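
Both response shapes carry a result code alongside the payload, so a caller can check the code before using the output. This is a minimal sketch: the function `unpack_response` is hypothetical, the value 'Success' for `result_code` is an assumption (the enum values are not listed above), and `processed_file_content` is assumed to be base64-encoded.

```python
import base64


def unpack_response(response):
    """Return the extracted text or the decoded file from a response dict."""
    code = response.get("result_code")
    # Assumption: a successful operation reports the code 'Success'.
    if code != "Success":
        details = response.get("result_details")
        raise RuntimeError(f"operation failed ({code}): {details}")
    if "out_text" in response:
        # ocr_operation_response: plain text from OCR.
        return response["out_text"]
    # operation_response: file content, assumed to be base64-encoded.
    return base64.b64decode(response["processed_file_content"])
```

The function returns a string for `ocr_operation_response` and bytes for `operation_response`; what those bytes contain depends on the action and its output format.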
