Azure AI Search indexer reported "Could not parse document. Document key cannot be longer than 1024 characters."

桂學文 Kevin Kuei 145 Reputation points
2025-07-31T10:16:04.45+00:00

Hi, I have an Azure AI Search service, and I used the "Import and vectorize data" wizard to create the index, indexer, data source, and skillset, and then to index content from Blob Storage.

The indexer then reported the error shown below.

Could you please advise me on how to resolve this?

I also noticed that some of the blob pathnames in my Blob Storage container are quite long (to make them easier for humans to read). Could this be contributing to the problem?

Thank you for your assistance.

Error
Document Key
localId=aHR0cHM6Ly9zdGlvdDA2MTkuYmxvYi5jb3JlLndpbmRvd3MubmV0L2N0LWlvdC0wNjE5LWttLyVFOCU4OCVBQSVFOSU4MSU4QiVFNSVBRCVBMyVFNSU4OCU4QS9kb2N1bWVudC8lRTglODglQUElRTklODElOEIlRTUlQUQlQTMlRTUlODglOEFfJUU3JTkyJUIwJUU1JUEyJTgzJUU0JUI4JThEJUU3JUEyJUJBJUU1JUFFJTlBJUU3JTlGJUE1JUU4JUE2JUJBJUU1JUIwJThEJUU4JTg4JUFBJUU3JUFFJUExJUU3JUIzJUJCJUU2JTg5JTgwJUU1JUE0JUE3JUU1JUFEJUI4JUU3JTk0JTlGJUU0JUI5JThCJUU3JTk0JTlGJUU2JUI2JUFGJUU5JTgxJUIyJUU3JTk2JTkxJUU1JUJEJUIxJUU5JTlGJUJGJUU2JThFJUEyJUU4JUE4JThFJUUyJTgwJTk0JUU4JTg3JUFBJUU2JTg4JTkxJUU2JTk1JTg4JUU4JTgzJUJEJUU4JTg4JTg3JUU1JUJGJTgzJUU3JTkwJTg2JUU4JUIzJTg3JUU2JTlDJUFDJUU3JTlBJTg0JUU1JUI5JUIyJUU2JTkzJUJFJUU2JTk1JTg4JUU2JTlFJTlDLyVFNyU5MiVCMCVFNSVBMiU4MyVFNCVCOCU4RCVFNyVBMiVCQSVFNSVBRSU5QSVFNyU5RiVBNSVFOCVBNiVCQSVFNSVCMCU4RCVFOCU4OCVBQSVFNyVBRSVBMSVFNyVCMyVCQiVFNiU4OSU4MCVFNSVBNCVBNyVFNSVBRCVCOCVFNyU5NCU5RiVFNCVCOSU4QiVFNyU5NCU5RiVFNiVCNiVBRiVFOSU4MSVCMiVFNyU5NiU5MSVFNSVCRCVCMSVFOSU5RiVCRiVFNiU4RSVBMiVFOCVBOCU4RSVFMiU4MCU5NCVFOCU4NyVBQSVFNiU4OCU5MSVFNiU5NSU4OCVFOCU4MyVCRCVFOCU4OCU4NyVFNSVCRiU4MyVFNyU5MCU4NiVFOCVCMyU4NyVFNiU5QyVBQyVFNyU5QSU4NCVFNSVCOSVCMiVFNiU5MyVCRSVFNiU5NSU4OCVFNiU5RSU5Qy5wZGY1&documentKey=aHR0cHM6Ly9zdGlvdDA2MTkuYmxvYi5jb3JlLndpbmRvd3MubmV0L2N0LWlvdC0wNjE5LWttLyVFOCU4OCVBQSVFOSU4MSU4QiVFNSVBRCVBMyVFNSU4OCU4QS9kb2N1bWVudC8lRTglODglQUElRTklODElOEIlRTUlQUQlQTMlRTUlODglOEFfJUU3JTkyJUIwJUU1JUEyJTgzJUU0JUI4JThEJUU3JUEyJUJBJUU1JUFFJTlBJUU3JTlGJUE1JUU4JUE2JUJBJUU1JUIwJThEJUU4JTg4JUFBJUU3JUFFJUExJUU3JUIzJUJCJUU2JTg5JTgwJUU1JUE0JUE3JUU1JUFEJUI4JUU3JTk0JTlGJUU0JUI5JThCJUU3JTk0JTlGJUU2JUI2JUFGJUU5JTgxJUIyJUU3JTk2JTkxJUU1JUJEJUIxJUU5JTlGJUJGJUU2JThFJUEyJUU4JUE4JThFJUUyJTgwJTk0JUU4JTg3JUFBJUU2JTg4JTkxJUU2JTk1JTg4JUU4JTgzJUJEJUU4JTg4JTg3JUU1JUJGJTgzJUU3JTkwJTg2JUU4JUIzJTg3JUU2JTlDJUFDJUU3JTlBJTg0JUU1JUI5JUIyJUU2JTkzJUJFJUU2JTk1JTg4JUU2JTlFJTlDLyVFNyU5MiVCMCVFNSVBMiU4MyVFNCVCOCU4RCVFNyVBMiVCQSVFNSVBRSU5QSVFNyU5RiVBNSVFOCVBNiVCQSVFNSVCMCU4RCVFOCU4OCVBQSVFNyVBRSVBMSVFNyVCMyVCQiVFNiU4OSU4MCVFNSVBNCVBNyVFNSVBRCVCOCVFNyU5NCU5RiVFNCVCOSU4QiVFNyU
5NCU5RiVFNiVCNiVBRiVFOSU4MSVCMiVFNyU5NiU5MSVFNSVCRCVCMSVFOSU5RiVCRiVFNiU4RSVBMiVFOCVBOCU4RSVFMiU4MCU5NCVFOCU4NyVBQSVFNiU4OCU5MSVFNiU5NSU4OCVFOCU4MyVCRCVFOCU4OCU4NyVFNSVCRiU4MyVFNyU5MCU4NiVFOCVCMyU4NyVFNiU5QyVBQyVFNyU5QSU4NCVFNSVCOSVCMiVFNiU5MyVCRSVFNiU5NSU4OCVFNiU5RSU5Qy5wZGY1

Operation
Target field 'chunk_id' is either not present, doesn't have a value set, or no data could be extracted from the document for it.Failed document: 'https://stiot0619.blob.core.windows.net/ct-iot-0619-km/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A/document/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A_%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C/%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C.pdf'

Message
Could not parse document. Document key cannot be longer than 1024 characters.

Details
Target field 'chunk_id' is either not present, doesn't have a value set, or no data could be extracted from the document for it.Failed document: 'https://stiot0619.blob.core.windows.net/ct-iot-0619-km/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A/document/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A_%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C/%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C.pdf'

My index:

{
  "@odata.etag": "\"0x8DDD01644836E64\"",
  "name": "idx-iot-0619",
  "fields": [
    {
      "name": "chunk_id",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": false,
      "key": true,
      "analyzer": "keyword",
      "synonymMaps": []
    },
    {
      "name": "parent_id",
      "type": "Edm.String",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "chunk",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "title",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "synonymMaps": []
    },
    {
      "name": "text_vector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "filterable": false,
      "retrievable": true,
      "stored": true,
      "sortable": false,
      "facetable": false,
      "key": false,
      "dimensions": 3072,
      "vectorSearchProfile": "idx-iot-0619-azureOpenAi-text-profile",
      "synonymMaps": []
    }
  ],
  "scoringProfiles": [],
  "suggesters": [],
  "analyzers": [],
  "normalizers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": [],
  "similarity": {
    "@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
  },
  "semantic": {
    "defaultConfiguration": "idx-iot-0619-semantic-configuration",
    "configurations": [
      {
        "name": "idx-iot-0619-semantic-configuration",
        "flightingOptIn": false,
        "rankingOrder": "BoostedRerankerScore",
        "prioritizedFields": {
          "titleField": {
            "fieldName": "title"
          },
          "prioritizedContentFields": [
            {
              "fieldName": "chunk"
            }
          ],
          "prioritizedKeywordsFields": []
        }
      }
    ]
  },
  "vectorSearch": {
    "algorithms": [
      {
        "name": "idx-iot-0619-algorithm",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      }
    ],
    "profiles": [
      {
        "name": "idx-iot-0619-azureOpenAi-text-profile",
        "algorithm": "idx-iot-0619-algorithm",
        "vectorizer": "idx-iot-0619-azureOpenAi-text-vectorizer"
      }
    ],
    "vectorizers": [
      {
        "name": "idx-iot-0619-azureOpenAi-text-vectorizer",
        "kind": "azureOpenAI",
        "azureOpenAIParameters": {
          "resourceUri": "https://aoai-iot-0619.openai.azure.com",
          "deploymentId": "text-embedding-3-large",
          "apiKey": "<redacted>",
          "modelName": "text-embedding-3-large"
        }
      }
    ],
    "compressions": []
  }
}

My indexer:

{
  "@odata.context": "https://as-iot-0619.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8DDD016BB8DE288\"",
  "name": "idx-iot-0619-indexer",
  "description": null,
  "dataSourceName": "idx-iot-0619-datasource",
  "skillsetName": "idx-iot-0619-skillset",
  "targetIndexName": "idx-iot-0619",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "parsingMode": "default"
    }
  },
  "fieldMappings": [
    {
      "sourceFieldName": "metadata_storage_name",
      "targetFieldName": "title",
      "mappingFunction": null
    }
  ],
  "outputFieldMappings": [],
  "cache": null,
  "encryptionKey": null
}

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

Accepted answer
  1. Nikhil Jha (Accenture International Limited) 230 Reputation points Microsoft External Staff Moderator
    2025-08-04T07:05:23.4666667+00:00

    Hello 桂學文 Kevin Kuei,

    You're right that the "Import and vectorize data" wizard streamlines onboarding: it generates the index, data source, skillset, and indexer for you. However, as you've seen, the wizard derives the document key from the blob's full path, encoded in Base64. With long or URL-encoded blob paths, the resulting string can exceed the 1024-character limit that Azure AI Search enforces on document keys.

    The wizard currently provides no built-in way to override or transform the default key. In addition, mappingFunction does not support key transformations such as hashing or GUID generation, which limits your options for long or complex blob paths.

    As suggested by community contributor Divyesh Govaerdhanan (credit to the community here 🙌), a feasible workaround is:

    1. Make chunk_id a non-key field.
      • This lets you keep the full blob path for reference without violating the key length constraint.
    2. Create a new field such as doc_id or id.
      • Populate it with a short, unique value such as a GUID or a truncated hash.
      • Generate this value outside the wizard in a preprocessing step (for example, an Azure Function, a Logic App, or a script).
      • Set this new field as your index key.
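The preprocessing step described above could look roughly like the following. This is a minimal sketch, not the wizard's or the service's own mechanism: the `short_key` and `add_doc_id` helpers and the `blob_path` field name are hypothetical, and the 32-character truncation is an arbitrary choice.

```python
import hashlib


def short_key(blob_path: str, length: int = 32) -> str:
    """Derive a fixed-length key from a blob path.

    Uses a truncated SHA-256 hex digest, so the key length no longer
    depends on how long the blob path is.
    """
    digest = hashlib.sha256(blob_path.encode("utf-8")).hexdigest()
    return digest[:length]


def add_doc_id(document: dict, path_field: str = "blob_path") -> dict:
    """Return a copy of the document with a short 'doc_id' added.

    'blob_path' is an assumed field name; use whatever field holds
    the full blob path in your own documents.
    """
    return {**document, "doc_id": short_key(document[path_field])}


if __name__ == "__main__":
    # Hypothetical document with a long, human-readable path.
    doc = {"blob_path": "https://example.blob.core.windows.net/c/" + "long-name/" * 50}
    keyed = add_doc_id(doc)
    print(len(keyed["blob_path"]), len(keyed["doc_id"]))
```

A hex digest contains only digits and the letters a to f, so it stays within the characters allowed in a document key. Truncating the digest shortens the key at the cost of a higher (though still very small) collision probability; keep the full 64 characters if that matters to you.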

    Currently, the wizard doesn't expose customization hooks for setting or transforming the document key. This limitation has been raised multiple times by users and is under review by the Azure AI Search product team.

    If manual setup seems overwhelming, a middle ground would be to:

    • Use the wizard to generate your pipeline.
    • Export the index definition from the Azure portal or API.
    • Modify the index to use doc_id as the key.
    • Re-index your data using a shorter identifier injected into the documents.
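The "modify the index" step above can be sketched as a small transformation of the exported JSON. This is only an illustration under stated assumptions: `rekey_index` is a hypothetical helper, the attribute values chosen for the new `doc_id` field are guesses rather than wizard defaults, and a skillset that writes chunks into the index may place its own requirements on the key field. As far as I know, the key of an existing index can't be changed in place, so the result would be used to create a new index.

```python
import copy


def rekey_index(index_def: dict, new_key: str = "doc_id") -> dict:
    """Return a copy of an exported index definition keyed on `new_key`.

    Every existing field is demoted to a non-key field; `new_key` is
    added as a string field if it is not already present.
    """
    updated = copy.deepcopy(index_def)  # leave the exported definition untouched
    fields = updated["fields"]
    for field in fields:
        field["key"] = False

    existing = next((f for f in fields if f["name"] == new_key), None)
    if existing is None:
        # Attribute values below are assumptions for illustration only.
        fields.insert(0, {
            "name": new_key,
            "type": "Edm.String",
            "key": True,
            "searchable": False,
            "filterable": True,
            "retrievable": True,
            "sortable": False,
            "facetable": False,
        })
    else:
        existing["key"] = True
    return updated


if __name__ == "__main__":
    exported = {
        "name": "idx-iot-0619",
        "fields": [
            {"name": "chunk_id", "type": "Edm.String", "key": True},
            {"name": "chunk", "type": "Edm.String", "key": False},
        ],
    }
    new_def = rekey_index(exported)
    print([f["name"] for f in new_def["fields"] if f["key"]])
```

Working on a deep copy keeps the exported definition available for comparison or rollback.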

    We understand that this adds complexity, and we appreciate your feedback, which helps prioritize future improvements. Let us know if you'd like help exporting the configuration or generating unique keys for your documents.


2 additional answers

  1. Divyesh Govaerdhanan 8,345 Reputation points
    2025-07-31T22:21:38.4733333+00:00

    Hello,

    Welcome to Microsoft Q&A,

    This error occurs because your document key (chunk_id) exceeds the Azure AI Search limit of 1024 characters, which is a hard constraint on key fields in an index. The chunk_id field is marked as the key ("key": true) in your index, and Azure AI Search requires key fields to be no longer than 1024 UTF-16 characters. In your case, chunk_id is generated from a long Base64-encoded, URL-encoded blob path, so the resulting string likely goes far beyond that limit.
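To see why human-readable names grow so quickly, here is a rough sketch of the arithmetic. It assumes the key is produced by Base64-encoding the percent-encoded blob URL, as described above; the URL and the `encoded_key_length` helper are hypothetical, and the service's exact encoding variant may differ by a character or two. Each CJK character is 3 bytes in UTF-8, which becomes 9 characters once percent-encoded, and those 9 characters become 12 after Base64.

```python
import base64
from urllib.parse import quote

KEY_LIMIT = 1024  # limit quoted in the indexer error message


def encoded_key_length(blob_url: str) -> int:
    """Length of the URL-safe Base64 encoding of a blob URL."""
    return len(base64.urlsafe_b64encode(blob_url.encode("utf-8")))


# Hypothetical blob URL whose file name has 40 CJK characters.
name = "航" * 40 + ".pdf"
url = "https://example.blob.core.windows.net/container/" + quote(name)

print("visible characters in the name:", len(name))
print("URL length after percent-encoding:", len(url))
print("key length after Base64:", encoded_key_length(url))
print("over the limit:", encoded_key_length(url) > KEY_LIMIT)
```

With folder names that repeat the same long title, as in the failing document, the encoded length adds up several times over.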

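    You could generate a short, unique key, or, if your documents have a built-in short and unique identifier, use that for the key field instead. Keep in mind that metadata_storage_path is the full blob URL, so it is as long as the path itself, and metadata_storage_name is only the file name, so it is unique only if no two blobs share a name.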

    If you must preserve long keys for any reason:

    • Split metadata and content indexing into two separate indices.
    • Use the long URI in a non-key field in a secondary index.
    • Link them using a short hashed ID as a join key.
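The two-index idea above could be prototyped along these lines. This is a sketch under assumptions, not a prescribed layout: the `split_for_two_indexes` helper and the `id`, `content`, and `source_uri` field names are hypothetical, and the join itself has to happen in your application, for example by looking up the metadata record by `id` after a search on the content index.

```python
import hashlib
from typing import Dict, Tuple


def split_for_two_indexes(blob_uri: str, content: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build one record per index, linked by a short hashed join key."""
    join_id = hashlib.sha256(blob_uri.encode("utf-8")).hexdigest()
    content_doc = {"id": join_id, "content": content}       # primary (content) index
    metadata_doc = {"id": join_id, "source_uri": blob_uri}  # secondary index, long URI in a non-key field
    return content_doc, metadata_doc


if __name__ == "__main__":
    long_uri = "https://example.blob.core.windows.net/container/" + "folder/" * 200 + "file.pdf"
    content_doc, metadata_doc = split_for_two_indexes(long_uri, "example text")
    print(len(metadata_doc["source_uri"]), len(content_doc["id"]))
```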

    https://learn.microsoft.com/en-us/azure/search/search-what-is-an-index#key-field


  2. Divyesh Govaerdhanan 8,345 Reputation points
    2025-08-01T16:03:36.73+00:00

    Hello 桂學文 Kevin Kuei,

    You're right, the "Import and vectorize data" wizard simplifies onboarding by generating the index, indexer, skillset, and data source automatically. However, it currently uses the full (encoded) blob path as the document key, which is a problem when the encoded path exceeds the 1024-character limit, as in your case.

    At this time, the wizard does not support hashing or otherwise transforming the chunk_id key automatically. This limitation has been noted in user feedback and in Microsoft community forums. Several requests have been raised to let users customize the key field logic in the import wizard, but this is still under review.

    Would it be possible for you to do the following?

    1. Make chunk_id a non-key field.
    2. Create a new field doc_id (or id) and:
      • Populate it with a GUID or truncated hash generated by a custom app or script.
      • Set doc_id as your index key.

    As you've observed, mappingFunction does not support hash() or GUID generation natively, so that must happen outside the wizard.
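If you prefer a GUID over a hash, one option is a name-based (version 5) UUID, sketched below. The `guid_key` helper is hypothetical and runs in your own script or app, not inside the indexer.

```python
import uuid


def guid_key(blob_url: str) -> str:
    """Deterministic GUID derived from the blob URL (UUID version 5)."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, blob_url))


if __name__ == "__main__":
    # Hypothetical blob URL.
    url = "https://example.blob.core.windows.net/container/report.pdf"
    print(guid_key(url))
    print(guid_key(url) == guid_key(url))  # same URL, same key
```

A name-based UUID is always 36 characters long and gives the same value for the same URL, so re-running the script updates existing documents instead of creating duplicates. A random `uuid.uuid4()` would also be short, but it changes on every run.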

