Nutrient - Extract from PDF (Preview)

Unlock powerful PDF text and data extraction with Nutrient Document Converter Extract actions. Retrieve text and data, extract key-value pairs, and use OCR to process scanned documents. Ideal for indexing, search, content analysis, and structured data workflows.
This connector is available in the following products and regions:
Service | Class | Regions |
---|---|---|
Copilot Studio | Premium | All Power Automate regions except the following: - US Government (GCC) - US Government (GCC High) - China Cloud operated by 21Vianet - US Department of Defense (DoD) |
Logic Apps | Standard | All Logic Apps regions except the following: - Azure Government regions - Azure China regions - US Department of Defense (DoD) |
Power Apps | Premium | All Power Apps regions except the following: - US Government (GCC) - US Government (GCC High) - China Cloud operated by 21Vianet - US Department of Defense (DoD) |
Power Automate | Premium | All Power Automate regions except the following: - US Government (GCC) - US Government (GCC High) - China Cloud operated by 21Vianet - US Department of Defense (DoD) |
Contact | |
---|---|
Name | Nutrient (formerly Muhimbi) Support |
URL | https://support.nutrient.io/hc/en-us/requests/new |
Email | support+low-code@nutrient.io |
Connector Metadata | |
---|---|
Publisher | Muhimbi trading as Nutrient |
Website | https://www.nutrient.io/low-code/ |
Privacy policy | https://www.nutrient.io/legal/privacy/ |
Categories | Collaboration;Content and Files |
Extract text and data from PDFs
Nutrient Document Converter enables you to extract text, data, or specific pages from PDF files as part of automated workflows in Power Automate. You can also extract text from images using OCR.
Available actions
- Extract key-value pairs
- Extract text using OCR
- Extract data from PDFs
- Extract PDF pages
- Extract text from images
- Extract text from PDFs using Power Automate
Refer to the linked guides for step-by-step instructions on implementing these actions in your workflows.
Prerequisites
To use Nutrient Document Converter, you need a Free or Trial account. Refer to the comparison guide to understand the differences between these account types.
Getting started
Follow the steps below to start using the Nutrient Document Converter connector:
- Sign up for a 30-day trial by filling out this form.
- After submitting the form, you will receive an email with your trial activation details.
- Refer to the getting started video for a walkthrough of the process.
- Read the Document Converter for Power Automate guide for detailed instructions.
- Explore Power Automate and Logic Apps tutorials for practical examples.
Known issues and limitations
Documents protected with IRM, DRM, RMS, or AIP solutions cannot be processed due to security restrictions.
For questions or assistance, contact our Support team.
Throttling Limits
Name | Calls | Renewal Period |
---|---|---|
API calls per connection | 100 | 60 seconds |
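The limit above allows 100 calls per connection in each 60-second renewal period. When a script or flow submits many documents in a batch, a client-side limiter can help stay under that figure. The following is a minimal sketch, assuming a sliding-window interpretation of the limit (this page does not say how the service counts the window). The class `SlidingWindowLimiter` and its methods are hypothetical helpers, not part of the connector.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for a 'max_calls per period' limit (hypothetical helper)."""

    def __init__(self, max_calls=100, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self._stamps = deque()  # timestamps of calls still inside the window

    def wait_time(self):
        """Return the seconds to wait before another call fits (0.0 if it fits now)."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        # The oldest remaining call must age out before a new one fits.
        return self.period - (now - self._stamps[0])

    def record(self):
        """Register that a call was just made."""
        self._stamps.append(self.clock())
```

A caller would check `wait_time()`, sleep for that long if it is non-zero, make the call, and then invoke `record()`.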
Actions
Extract key value pairs from a PDF document | Identify and extract key-value pairs from documents for processing forms or structured data workflows. |
Extract text from a PDF document | Retrieve text content from PDF documents for easy indexing, search, or content analysis. |
Extract text from a PDF file using OCR | Extract text from scanned documents or images using OCR technology, making them searchable and editable. |
Extract key value pairs from a PDF document
Identify and extract key-value pairs from documents for processing forms or structured data workflows.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Source file name | source_file_name | True | string | Name of the source file including extension |
Source file content | source_file_content | True | byte | Content of the file to convert |
OCR Language | ocr_language | | string | The language codes for OCR and KVP extraction, separated by '+'. For example, 'eng+deu+fra' would add English, German, and French. |
DPI | dpi | | enum | Remove the blank pages in the PDF |
KVP Output Format | kvp_format | | enum | The output formats separated by commas. KVP data can be output in JSON, CSV and XML. e.g. json,csv,xml |
Page Range | page_range | | string | The pages to be processed by KVP. Use '1 - 5' for pages 1 to 5, or '1, 5, 6' for pages 1, 5, and 6. |
Autorotate | autorotate | | enum | Setting this to 'Yes' will automatically rotate pages if the text does not have the correct orientation. |
Trim Symbols | trim_symbols | | enum | Setting this to 'Yes' will remove any symbols from the start/end of values, with the exception of the hash '#' or period '.' symbols. |
Include Key Bounding Box | include_key_bounding_box | | enum | Include the bounding box values for the key in the output |
Include Value Bounding Box | include_value_bounding_box | | enum | Include the bounding box values for the value in the output |
Include Page Number | include_page_number | | enum | Include the page number for the key value pair in the output |
Include Confidence | include_confidence | | enum | Include the confidence score for the key value pair in the output. Confidence is measured between 0 (no confidence) and 100 (full confidence). |
Confidence Threshold | confidence_threshold | | integer | The confidence threshold a key value pair must reach to be included in the output. Results under the threshold are discarded. |
Include Type | include_type | | enum | Include the data type for the key value pair in the output |
Expected Keys | expected_keys | | string | The JSON string containing the expected keys and synonyms |
Fail on error | fail_on_error | | boolean | Fail on error |
Returns
Response data for all operations
- Body
- operation_response
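Several of the parameters above are plain strings with their own mini-formats: language codes joined by '+', output formats joined by commas, and expected keys passed as a JSON string. The sketch below shows one way to assemble such values in a script before handing them to a flow. The function `build_kvp_parameters` is hypothetical, and the shape of the `expected_keys` object (key name mapped to a list of synonyms) is an assumption, since this page does not define its schema. The file content is base64-encoded because `byte` conventionally denotes base64 data in OpenAPI-style definitions.

```python
import base64
import json


def build_kvp_parameters(file_name, file_bytes, languages=("eng",), pages=None,
                         formats=("json",), confidence_threshold=None,
                         expected_keys=None):
    """Assemble parameter values using the keys from the table above (hypothetical helper)."""
    params = {
        "source_file_name": file_name,
        # 'byte' fields carry base64 text rather than raw bytes.
        "source_file_content": base64.b64encode(file_bytes).decode("ascii"),
        "ocr_language": "+".join(languages),  # e.g. 'eng+deu+fra'
        "kvp_format": ",".join(formats),      # e.g. 'json,csv,xml'
    }
    if pages is not None:
        params["page_range"] = pages
    if confidence_threshold is not None:
        # Confidence is described as ranging from 0 to 100.
        if not 0 <= confidence_threshold <= 100:
            raise ValueError("confidence_threshold must be between 0 and 100")
        params["confidence_threshold"] = int(confidence_threshold)
    if expected_keys is not None:
        # Assumed shape: {"Key name": ["synonym", ...]}; the real schema is not given here.
        params["expected_keys"] = json.dumps(expected_keys)
    return params
```

For example, `build_kvp_parameters("form.pdf", data, languages=("eng", "deu"), pages="1 - 5")` yields `ocr_language` set to `eng+deu` and `page_range` set to `1 - 5`.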
Extract text from a PDF document
Retrieve text content from PDF documents for easy indexing, search, or content analysis.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Source file name | source_file_name | True | string | Name of the source file including extension |
Source file content | source_file_content | True | byte | Content of the file to convert |
Page Range | page_range | | string | The page range to extract text from e.g. 1,5,8-12 |
Fail on error | fail_on_error | | boolean | Fail on error |
Returns
Response data for all operations
- Body
- operation_response
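The `page_range` value mixes single pages and ranges, as in `1,5,8-12`. To validate such a string locally before running a flow, it can be expanded into explicit page numbers. The sketch below is a local convenience only. The function `parse_page_range` is hypothetical, and the service's own parsing rules beyond the examples shown on this page are not documented here.

```python
def parse_page_range(spec):
    """Expand a string such as '1,5,8-12' into a list of page numbers (hypothetical helper)."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int() ignores surrounding spaces
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.append(page)
    return pages
```

The helper also accepts the spaced form `1 - 5` used in the key-value pair action's description.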
Extract text from a PDF file using OCR
Extract text from scanned documents or images using OCR technology, making them searchable and editable.
Parameters
Name | Key | Required | Type | Description |
---|---|---|---|---|
Source file name | source_file_name | True | string | Name of the source file including extension |
Source file content | source_file_content | True | byte | Content of the file to OCR |
Language | language | | enum | Language |
X Coordinate | x | | string | X Coordinate (in Pts, 1/72 of an inch) |
Y Coordinate | y | | string | Y Coordinate (in Pts, 1/72 of an inch) |
Width | width | | string | Width of the OCR area (in Pts, 1/72 of an inch) |
Height | height | | string | Height of the OCR area (in Pts, 1/72 of an inch) |
Page number | page_number | | string | Page number (leave blank to OCR all pages) |
Performance | performance | | enum | Performance |
Blacklist / whitelist | characters_option | | enum | Characters option |
Characters | characters | | string | Characters to blacklist or whitelist |
Use pagination | paginate | | boolean | Paginate |
Fail on error | fail_on_error | | boolean | Fail on error |
Returns
Response data for OCRText operation
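The OCR area parameters (`x`, `y`, `width`, `height`) are expressed in points, where one point is 1/72 of an inch, and are passed as strings. When a region has been measured in millimetres, the conversion can be sketched as follows. The function names are hypothetical, and formatting the values with two decimals is an assumption, since this page does not state whether fractional points are accepted or which page corner is the coordinate origin.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimetres to points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def ocr_region_parameters(x_mm, y_mm, width_mm, height_mm, page_number=None):
    """Build the string-typed region parameters from the table above (hypothetical helper)."""
    def fmt(mm):
        # Two decimals is an assumption; adjust if the service expects integers.
        return f"{mm_to_points(mm):.2f}"

    params = {
        "x": fmt(x_mm),
        "y": fmt(y_mm),
        "width": fmt(width_mm),
        "height": fmt(height_mm),
    }
    if page_number is not None:
        # Leaving page_number out corresponds to OCRing all pages.
        params["page_number"] = str(page_number)
    return params
```

A region one inch (25.4 mm) from each edge therefore starts at 72 points on both axes.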
Definitions
ocr_operation_response
Response data for OCRText operation
Name | Path | Type | Description |
---|---|---|---|
Out text | out_text | string | Extracted OCRed text in plain text. |
Base file name | base_file_name | string | Name of the input file without the extension. |
Result code | result_code | enum | Operation result code. |
Result details | result_details | string | Operation result details. |
operation_response
Response data for all operations
Name | Path | Type | Description |
---|---|---|---|
Processed file content | processed_file_content | byte | File generated by the Muhimbi converter. |
Base file name | base_file_name | string | Name of the input file without the extension. |
Result code | result_code | enum | Operation result code. |
Result details | result_details | string | Operation result details. |
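A consumer of `operation_response` typically checks `result_code` before using `processed_file_content`. The sketch below illustrates that pattern for a response already parsed into a dictionary. The function `read_operation_response` is hypothetical, and the success value `"Success"` is an assumption, because this page does not list the enum values of `result_code`.

```python
import base64


def read_operation_response(response, success_code="Success"):
    """Return (base_file_name, file_bytes) or raise if the operation failed.

    'success_code' is an assumed enum value; confirm it against real output.
    """
    code = response.get("result_code")
    if code != success_code:
        details = response.get("result_details")
        raise RuntimeError(f"operation failed with {code!r}: {details}")
    # 'byte' fields carry base64 text, so decode to get the raw file bytes.
    content = base64.b64decode(response["processed_file_content"])
    return response.get("base_file_name"), content
```

The same check applies to `ocr_operation_response`, which returns `out_text` as plain text instead of a file.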