Hi, I have an Azure AI Search service and used the "Import and vectorize data" wizard to create an index, indexer, data source, and skillset that index documents from Blob Storage.
When the indexer ran, it reported the error shown below.
Could you please advise me on how to resolve this?
I also noticed that some of the blob pathnames in my Blob Storage container are quite long (to make them easier for humans to read). Could this be contributing to the problem?
Thank you for your assistance.
Error
Document Key
localId=aHR0cHM6Ly9zdGlvdDA2MTkuYmxvYi5jb3JlLndpbmRvd3MubmV0L2N0LWlvdC0wNjE5LWttLyVFOCU4OCVBQSVFOSU4MSU4QiVFNSVBRCVBMyVFNSU4OCU4QS9kb2N1bWVudC8lRTglODglQUElRTklODElOEIlRTUlQUQlQTMlRTUlODglOEFfJUU3JTkyJUIwJUU1JUEyJTgzJUU0JUI4JThEJUU3JUEyJUJBJUU1JUFFJTlBJUU3JTlGJUE1JUU4JUE2JUJBJUU1JUIwJThEJUU4JTg4JUFBJUU3JUFFJUExJUU3JUIzJUJCJUU2JTg5JTgwJUU1JUE0JUE3JUU1JUFEJUI4JUU3JTk0JTlGJUU0JUI5JThCJUU3JTk0JTlGJUU2JUI2JUFGJUU5JTgxJUIyJUU3JTk2JTkxJUU1JUJEJUIxJUU5JTlGJUJGJUU2JThFJUEyJUU4JUE4JThFJUUyJTgwJTk0JUU4JTg3JUFBJUU2JTg4JTkxJUU2JTk1JTg4JUU4JTgzJUJEJUU4JTg4JTg3JUU1JUJGJTgzJUU3JTkwJTg2JUU4JUIzJTg3JUU2JTlDJUFDJUU3JTlBJTg0JUU1JUI5JUIyJUU2JTkzJUJFJUU2JTk1JTg4JUU2JTlFJTlDLyVFNyU5MiVCMCVFNSVBMiU4MyVFNCVCOCU4RCVFNyVBMiVCQSVFNSVBRSU5QSVFNyU5RiVBNSVFOCVBNiVCQSVFNSVCMCU4RCVFOCU4OCVBQSVFNyVBRSVBMSVFNyVCMyVCQiVFNiU4OSU4MCVFNSVBNCVBNyVFNSVBRCVCOCVFNyU5NCU5RiVFNCVCOSU4QiVFNyU5NCU5RiVFNiVCNiVBRiVFOSU4MSVCMiVFNyU5NiU5MSVFNSVCRCVCMSVFOSU5RiVCRiVFNiU4RSVBMiVFOCVBOCU4RSVFMiU4MCU5NCVFOCU4NyVBQSVFNiU4OCU5MSVFNiU5NSU4OCVFOCU4MyVCRCVFOCU4OCU4NyVFNSVCRiU4MyVFNyU5MCU4NiVFOCVCMyU4NyVFNiU5QyVBQyVFNyU5QSU4NCVFNSVCOSVCMiVFNiU5MyVCRSVFNiU5NSU4OCVFNiU5RSU5Qy5wZGY1&documentKey=aHR0cHM6Ly9zdGlvdDA2MTkuYmxvYi5jb3JlLndpbmRvd3MubmV0L2N0LWlvdC0wNjE5LWttLyVFOCU4OCVBQSVFOSU4MSU4QiVFNSVBRCVBMyVFNSU4OCU4QS9kb2N1bWVudC8lRTglODglQUElRTklODElOEIlRTUlQUQlQTMlRTUlODglOEFfJUU3JTkyJUIwJUU1JUEyJTgzJUU0JUI4JThEJUU3JUEyJUJBJUU1JUFFJTlBJUU3JTlGJUE1JUU4JUE2JUJBJUU1JUIwJThEJUU4JTg4JUFBJUU3JUFFJUExJUU3JUIzJUJCJUU2JTg5JTgwJUU1JUE0JUE3JUU1JUFEJUI4JUU3JTk0JTlGJUU0JUI5JThCJUU3JTk0JTlGJUU2JUI2JUFGJUU5JTgxJUIyJUU3JTk2JTkxJUU1JUJEJUIxJUU5JTlGJUJGJUU2JThFJUEyJUU4JUE4JThFJUUyJTgwJTk0JUU4JTg3JUFBJUU2JTg4JTkxJUU2JTk1JTg4JUU4JTgzJUJEJUU4JTg4JTg3JUU1JUJGJTgzJUU3JTkwJTg2JUU4JUIzJTg3JUU2JTlDJUFDJUU3JTlBJTg0JUU1JUI5JUIyJUU2JTkzJUJFJUU2JTk1JTg4JUU2JTlFJTlDLyVFNyU5MiVCMCVFNSVBMiU4MyVFNCVCOCU4RCVFNyVBMiVCQSVFNSVBRSU5QSVFNyU5RiVBNSVFOCVBNiVCQSVFNSVCMCU4RCVFOCU4OCVBQSVFNyVBRSVBMSVFNyVCMyVCQiVFNiU4OSU4MCVFNSVBNCVBNyVFNSVBRCVCOCVFNyU5NCU5RiVFNCVCOSU4QiVFNyU
5NCU5RiVFNiVCNiVBRiVFOSU4MSVCMiVFNyU5NiU5MSVFNSVCRCVCMSVFOSU5RiVCRiVFNiU4RSVBMiVFOCVBOCU4RSVFMiU4MCU5NCVFOCU4NyVBQSVFNiU4OCU5MSVFNiU5NSU4OCVFOCU4MyVCRCVFOCU4OCU4NyVFNSVCRiU4MyVFNyU5MCU4NiVFOCVCMyU4NyVFNiU5QyVBQyVFNyU5QSU4NCVFNSVCOSVCMiVFNiU5MyVCRSVFNiU5NSU4OCVFNiU5RSU5Qy5wZGY1
Operation
Target field 'chunk_id' is either not present, doesn't have a value set, or no data could be extracted from the document for it.Failed document: 'https://stiot0619.blob.core.windows.net/ct-iot-0619-km/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A/document/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A_%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C/%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C.pdf'
Message
Could not parse document. Document key cannot be longer than 1024 characters.
Details
Target field 'chunk_id' is either not present, doesn't have a value set, or no data could be extracted from the document for it.Failed document: 'https://stiot0619.blob.core.windows.net/ct-iot-0619-km/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A/document/%E8%88%AA%E9%81%8B%E5%AD%A3%E5%88%8A_%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C/%E7%92%B0%E5%A2%83%E4%B8%8D%E7%A2%BA%E5%AE%9A%E7%9F%A5%E8%A6%BA%E5%B0%8D%E8%88%AA%E7%AE%A1%E7%B3%BB%E6%89%80%E5%A4%A7%E5%AD%B8%E7%94%9F%E4%B9%8B%E7%94%9F%E6%B6%AF%E9%81%B2%E7%96%91%E5%BD%B1%E9%9F%BF%E6%8E%A2%E8%A8%8E%E2%80%94%E8%87%AA%E6%88%91%E6%95%88%E8%83%BD%E8%88%87%E5%BF%83%E7%90%86%E8%B3%87%E6%9C%AC%E7%9A%84%E5%B9%B2%E6%93%BE%E6%95%88%E6%9E%9C.pdf'
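To check my suspicion about the long pathnames, I put together a rough sketch. It assumes the document key is a URL-safe base64 encoding of the percent-encoded blob URL, which is what the key in the error above appears to be (it starts with aHR0cHM6Ly9, the base64 form of "https://"). The function names and the optional suffix parameter are my own placeholders, not part of the Azure AI Search API, and the real key derivation may differ.

```python
import base64
from urllib.parse import quote

# Limit quoted in the indexer error message.
MAX_KEY_LENGTH = 1024


def percent_encode_path(path: str) -> str:
    """Percent-encode a blob path, keeping '/' as the separator."""
    return quote(path, safe="/")


def estimated_key_length(blob_url: str, suffix: str = "") -> int:
    """Estimate the key length for a blob URL.

    Assumption: the key is the URL-safe base64 encoding of the
    percent-encoded URL, plus whatever suffix the skillset appends
    per chunk (hypothetical parameter, unknown in my case).
    """
    encoded = base64.urlsafe_b64encode(blob_url.encode("utf-8")).decode("ascii")
    return len(encoded.rstrip("=")) + len(suffix)


def exceeds_key_limit(blob_url: str, suffix: str = "") -> bool:
    """True if the estimated key is longer than the 1024-character limit."""
    return estimated_key_length(blob_url, suffix) > MAX_KEY_LENGTH
```

If this assumption holds, a CJK character that takes three UTF-8 bytes becomes nine characters after percent-encoding and twelve characters after base64, so a path with around 80 such characters would already be close to the limit before the container URL or any chunk suffix is counted.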
My index:
{
"@odata.etag": "\"0x8DDD01644836E64\"",
"name": "idx-iot-0619",
"fields": [
{
"name": "chunk_id",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"retrievable": true,
"stored": true,
"sortable": true,
"facetable": false,
"key": true,
"analyzer": "keyword",
"synonymMaps": []
},
{
"name": "parent_id",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"stored": true,
"sortable": false,
"facetable": false,
"key": false,
"synonymMaps": []
},
{
"name": "chunk",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"retrievable": true,
"stored": true,
"sortable": false,
"facetable": false,
"key": false,
"synonymMaps": []
},
{
"name": "title",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"retrievable": true,
"stored": true,
"sortable": false,
"facetable": false,
"key": false,
"synonymMaps": []
},
{
"name": "text_vector",
"type": "Collection(Edm.Single)",
"searchable": true,
"filterable": false,
"retrievable": true,
"stored": true,
"sortable": false,
"facetable": false,
"key": false,
"dimensions": 3072,
"vectorSearchProfile": "idx-iot-0619-azureOpenAi-text-profile",
"synonymMaps": []
}
],
"scoringProfiles": [],
"suggesters": [],
"analyzers": [],
"normalizers": [],
"tokenizers": [],
"tokenFilters": [],
"charFilters": [],
"similarity": {
"@odata.type": "#Microsoft.Azure.Search.BM25Similarity"
},
"semantic": {
"defaultConfiguration": "idx-iot-0619-semantic-configuration",
"configurations": [
{
"name": "idx-iot-0619-semantic-configuration",
"flightingOptIn": false,
"rankingOrder": "BoostedRerankerScore",
"prioritizedFields": {
"titleField": {
"fieldName": "title"
},
"prioritizedContentFields": [
{
"fieldName": "chunk"
}
],
"prioritizedKeywordsFields": []
}
}
]
},
"vectorSearch": {
"algorithms": [
{
"name": "idx-iot-0619-algorithm",
"kind": "hnsw",
"hnswParameters": {
"metric": "cosine",
"m": 4,
"efConstruction": 400,
"efSearch": 500
}
}
],
"profiles": [
{
"name": "idx-iot-0619-azureOpenAi-text-profile",
"algorithm": "idx-iot-0619-algorithm",
"vectorizer": "idx-iot-0619-azureOpenAi-text-vectorizer"
}
],
"vectorizers": [
{
"name": "idx-iot-0619-azureOpenAi-text-vectorizer",
"kind": "azureOpenAI",
"azureOpenAIParameters": {
"resourceUri": "https://aoai-iot-0619.openai.azure.com",
"deploymentId": "text-embedding-3-large",
"apiKey": "<redacted>",
"modelName": "text-embedding-3-large"
}
}
],
"compressions": []
}
}
My indexer:
{
"@odata.context": "https://as-iot-0619.search.windows.net/$metadata#indexers/$entity",
"@odata.etag": "\"0x8DDD016BB8DE288\"",
"name": "idx-iot-0619-indexer",
"description": null,
"dataSourceName": "idx-iot-0619-datasource",
"skillsetName": "idx-iot-0619-skillset",
"targetIndexName": "idx-iot-0619",
"disabled": null,
"schedule": null,
"parameters": {
"batchSize": null,
"maxFailedItems": null,
"maxFailedItemsPerBatch": null,
"configuration": {
"dataToExtract": "contentAndMetadata",
"parsingMode": "default"
}
},
"fieldMappings": [
{
"sourceFieldName": "metadata_storage_name",
"targetFieldName": "title",
"mappingFunction": null
}
],
"outputFieldMappings": [],
"cache": null,
"encryptionKey": null
}