Serverless Synapse querying Cosmos DB analytical store

Subhash Marti 0 Reputation points
2025-08-05T11:11:46.1866667+00:00

I have a Synapse Serverless SQL pool connected to a Cosmos DB analytical store. Some of the items come through as non-JSON objects. When I copy a document from Cosmos DB and paste it into Notepad++, the data looks like valid JSON, yet the SQL pool querying the analytical store still retrieves those items as non-JSON objects.

Azure Synapse Analytics

1 answer

  1. Marcin Policht 53,675 Reputation points MVP Volunteer Moderator
    2025-08-05T11:36:18.1666667+00:00

    When you're working with a Synapse serverless SQL pool connected to the Azure Cosmos DB analytical store, encountering non-JSON objects is a common issue, especially when:

    • The source data has inconsistencies in formatting.
    • Certain items in the Analytical Store are malformed or not valid JSON.
    • There are differences in expected schema structures (e.g., missing or extra properties).
    • The Synapse view over the Cosmos DB container cannot infer a proper schema due to data anomalies.

    The Synapse serverless SQL pool queries the analytical store using a defined schema and expects every item to be a valid JSON document. If even a single record doesn't conform to the JSON standard (e.g., trailing commas, incorrect escaping, invalid characters), it is treated as a non-JSON object, and you'll see symptoms such as empty rows, null values, failures to read specific rows, or _raw-style columns containing unparsed strings.

    When you paste the document into Notepad++ and it looks like JSON, you're seeing visually plausible JSON, but Synapse requires JSON that is strictly valid per the JSON spec. Check for unescaped quotes, backslashes, or control characters; numeric values the spec doesn't allow (e.g., NaN, Infinity); and comments or trailing commas, none of which JSON supports.
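
    As a quick illustration of how strict the parser is, you can run ISJSON against a few literal strings in the serverless SQL pool; the sample documents below are invented to show typical violations:

    -- Each expression returns 0 (not valid JSON) except the last one, which returns 1.
    SELECT
        ISJSON('{"qty": NaN}')             AS uses_nan,        -- NaN is not a legal JSON number
        ISJSON('{"name": "a", }')          AS trailing_comma,  -- trailing comma before }
        ISJSON('{"path": "C:\dir\new"}')   AS bad_escape,      -- \d is not a valid JSON escape
        ISJSON('{"name": "a", "qty": 1}')  AS valid_doc;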

    To validate:

    A. Query the analytical store directly with OPENROWSET - you can try pulling rows straight from the container to inspect problematic ones (substitute your own account name, key, database, and container):

    SELECT TOP 10 *
    FROM OPENROWSET(
        'CosmosDB',
        'Account=<cosmos-account>;Database=<db>;Key=<account-key>',
        <collection>
    ) AS rows
    

    Or try:

    SELECT TOP 100 * 
    FROM [CosmosDB].[database].[collection]
    WHERE ISJSON(_raw) = 0
    

    This assumes _raw is available or you’re querying through a view over the Analytical Store.
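
    If you don't have such a view yet, a minimal sketch of one over the analytical store could look like this (the view name is hypothetical, the angle-bracket values are placeholders for your account, key, database, and container, and the view must live in a user database rather than master):

    -- Hypothetical view so later queries can use two/three-part names.
    CREATE VIEW dbo.CosmosCollection AS
    SELECT *
    FROM OPENROWSET(
        'CosmosDB',
        'Account=<cosmos-account>;Database=<db>;Key=<account-key>',
        <collection>
    ) AS rows;

    Nested objects and arrays typically surface as JSON text in varchar columns, so those are the columns you'd point ISJSON at when querying the view.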

    B. Add an ISJSON() check - you can filter for only valid JSON rows like this:

    SELECT * 
    FROM [CosmosDB].[database].[collection]
    WHERE ISJSON([your_column]) = 1
    

    This should help temporarily skip malformed records.
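
    Conversely, to see which documents need attention before fixing them at the source, you can invert the filter. This sketch assumes the container's id property is exposed as a top-level column (which it typically is) and that [your_column] holds the serialized document or nested object:

    -- List documents that fail JSON validation so they can be repaired in Cosmos DB.
    SELECT TOP 100 id, [your_column] AS raw_document
    FROM [CosmosDB].[database].[collection]
    WHERE ISJSON([your_column]) = 0;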

    If possible, fix the malformed documents in Cosmos DB directly. You can:

    • Query for invalid JSON items using Cosmos DB SDK or Azure Data Explorer.
    • Rewrite or delete problematic entries.

    In Synapse, you can use TRY_CAST or TRY_CONVERT to avoid query failures:

    SELECT 
        TRY_CAST(JSON_VALUE([your_column], '$.someProperty') AS VARCHAR(100)) AS SomeValue
    FROM [CosmosDB].[database].[collection]
    

    If the JSON structure is inconsistent, you might use Azure Data Factory or Synapse Dataflows to flatten or clean the data before querying it.
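
    Related to the schema-inference point above, you can also declare the schema explicitly in the OPENROWSET WITH clause so that automatic inference doesn't trip over anomalous documents. A minimal sketch, with placeholder property names and types:

    -- Explicit schema: only the listed properties are read, with the types you specify.
    SELECT TOP 100 rows.id, rows.someProperty
    FROM OPENROWSET(
        'CosmosDB',
        'Account=<cosmos-account>;Database=<db>;Key=<account-key>',
        <collection>
    ) WITH (
        id           VARCHAR(100) '$.id',
        someProperty VARCHAR(200) '$.someProperty'
    ) AS rows;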


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

