Wrong extent dates in ADX deduplicated materialized view

01725609 85 Reputation points
2025-07-15T10:47:17.63+00:00

Hi

I'm having the following scenario.

We have a Source Table (which can include duplicate records)

We have a materialized view which deduplicates with the following command:

.create materialized-view with (lookback=1d, lookback_column="CustomDateTime", autoUpdateSchema=true) Mv_TableName_Deduplicated on table TableName
{
TableName
    | summarize take_any(*) by DuplicateDetectionHash
}

We have ingested a lot of historical data in our source table using lightingest as suggested in the documentation, and also specifying the correct creation time so the extents are built with the correct datetime.

However, when looking at the extents in the materialized view, we see that the extents are not copied from the source table but are rather on time of ingestion into the materialized view.

Since I am not creating the Materialized view using the backfill property, how can we make sure that the extents within the deduplication materialized view have the correct datetime? because applying hot caching policies will become difficult now.

As I read the documentation, applying values for effectiveDateTime and updateExtentsCreationTime is only when we put the backfill parameter to true, which we don't want to do because the source table can have billions of records, so this would become a resource intensive operation.

Also, same question applies when having aggregated views with monthly aggregated data, where historical data being added using the same mechanism (without the actual backfill property of the materialized view).

Thank you

Azure Data Explorer
Azure Data Explorer
An Azure data analytics service for real-time analysis on large volumes of data streaming from sources including applications, websites, and internet of things devices.
{count} votes

Accepted answer
  1. Venkat Reddy Navari 5,330 Reputation points Microsoft External Staff Moderator
    2025-07-17T04:04:03.6433333+00:00

    Hi 01725609 Thank you for your detailed question and follow-up this is an important scenario and a common source of confusion when working with materialized views in Azure Data Explorer (ADX), especially when dealing with historical ingestion and deduplication logic.

    Understanding the Behavior

    You're absolutely correct: As per current official documentation, the effectiveDateTime and updateExtentsCreationTime properties are only honored when backfill=true is specified at view creation time.

    If backfill is not enabled:

    The extents in the materialized view will reflect the time of ingestion into the view, not the original creation times from the source table.

    Even if lightningest was used to ingest historical data correctly into the source table, this does not propagate to the materialized view unless backfill=true.

    Why Altering Doesn't Help

    You also pointed out that these properties are not supported in the .alter materialized-view command and you're right again. Unfortunately, effectiveDateTime is not alterable post-creation, meaning the only way to apply it is during initial creation of the view with backfill=true.

    Workaround Recommendations

    Given your concern about avoiding a full backfill due to large data volumes, here are a few practical alternatives:

    Create the View with backfill=true on a filtered dataset: If possible, segment your data (e.g. one month or week at a time) and create the materialized view using backfill=true with filtering, so only a manageable subset is included during backfill.

    Example:

    
    Copy
    .create materialized-view with (
        lookback=1d,
        lookback_column="CustomDateTime",
        autoUpdateSchema=true,
        backfill=true,
        effectiveDateTime=datetime(2025-01-01)
    ) Mv_TableName_Deduplicated on table TableName
    {
        TableName
        | where CustomDateTime >= datetime(2025-01-01) and CustomDateTime < datetime(2025-02-01)
        | summarize take_any(*) by DuplicateDetectionHash
    }
    

    You can then repeat this process incrementally for historical partitions.

    Use Update Policies or Scheduled Queries Instead: For more control over extent timestamps (and to avoid materialized view limitations), you can:

    • Create a new table with a schema matching your view
    • Use an Update Policy or scheduled KQL job to populate it from the source
    • Use ingest inline or ingest from query to maintain timestamp alignment manually.

    Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Alex Burlachenko 13,330 Reputation points Volunteer Moderator
    2025-07-15T12:11:17.55+00:00

    hi there,

    thanks for posting this question ))

    let's tackle the core issue. u're seeing wrong extent dates in ur materialized view because the view doesn't inherit the source table's extent timestamps by default. yes, its annoying, but there's a way to fix it without backfilling billions of records )

    when u create or replace the materialized view, add the 'effectiveDateTime' property. this tells ADX to use the source table's timestamps for the view's extents. no need for backfill! https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/materialized-views/materialized-view-create

    create materialized-view with (lookback=1d, lookback_column='CustomDateTime', autoUpdateSchema=true, effectiveDateTime=true) Mv_TableName_Deduplicated on table TableName { TableName | summarize take_any(*) by DuplicateDetectionHash }

    for aggregated monthly views, same principle applies. just make sure u set effectiveDateTime=true in the view properties. this might help in other tools too when u need consistent timestamp behavior )

    if u already created the view, u can alter it to add this property. no need to rebuild from scratch! its much lighter on resources.

    sometimes the lookback window needs adjustment. if ur data has big time jumps, u might need to tweak that 1d parameter. the docs explain it well here https://docs.microsoft.com/en-us/azure/data-explorer/kusto/management/materialized-views/materialized-view-overview

    when u use lightingest next time, double check the creation time parameter is set exactly how u want it. small things like that can save u headaches later ))

    for ur hot caching policies, this solution should make them work as expected. no more weird timestamp mismatches!

    let me know if this works for u

    Best regards,

    Alex

    and "yes" if you would follow me at Q&A - personaly thx.
    P.S. If my answer help to you, please Accept my answer
    

    https://ctrlaltdel.blog/


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.