The Lakeflow Declarative Pipelines event log contains all information related to a pipeline, including audit logs, data quality checks, pipeline progress, and data lineage.
The following tables describe the event log schema. Some fields contain JSON data that requires parsing to perform some queries, such as the details field. Azure Databricks supports the : operator to parse JSON fields. See : (colon sign) operator.
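For example, the following minimal sketch shows how the : operator can extract a nested value from the details column. It assumes the event log is queried through the event_log() table-valued function, with '<pipeline-id>' as a placeholder; substitute your own pipeline ID or query a published event log table instead.

```sql
-- Sketch: extract a nested JSON value from the details column with the : operator.
-- '<pipeline-id>' is a placeholder for your pipeline ID.
SELECT
  id,
  timestamp,
  event_type,
  details:flow_progress:status AS flow_status
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress';
```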
Note
Some fields in the event log are for internal use by Azure Databricks. The following documentation describes the fields that are intended for customer consumption.
For details about using the Lakeflow Declarative Pipelines event log, see Lakeflow Declarative Pipelines event log.
PipelineEvent object
Represents a single pipeline event in the event log.
Field | Description |
---|---|
id | A unique identifier for the event log record. |
sequence | A JSON string containing metadata to identify and order events. |
origin | A JSON string containing metadata for the origin of the event, for example, the cloud provider, the cloud provider region, user, and pipeline information. See Origin object. |
timestamp | The time the event was recorded, in UTC. |
message | A human-readable message describing the event. |
level | The warning level of the event. |
maturity_level | The stability of the event schema. It is not recommended to build monitoring or alerts based on EVOLVING or DEPRECATED fields. |
error | If an error occurred, details describing the error. |
details | A JSON string containing structured details of the event. This is the primary field used for analyzing events. The JSON string format depends on the event_type. See The details object for more information. |
event_type | The event type. For a list of event types and the details object type they create, see The details object. |
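A minimal query over these top-level fields might look like the following sketch, again assuming the event_log() table-valued function and a placeholder pipeline ID.

```sql
-- Sketch: list the most recent events with their top-level fields.
SELECT id, timestamp, level, event_type, message
FROM event_log('<pipeline-id>')
ORDER BY timestamp DESC
LIMIT 100;
```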
The details object
Each event has different details properties in the JSON object, based on the event_type of the event. This table lists each event_type and its associated details. The details properties are described in the Details types section.
Details type by event_type | Description |
---|---|
create_update | Captures the complete configuration that is used to start a pipeline update, including any configuration set by Databricks. For details, see Details for create_update. |
user_action | Provides details on any user action on the pipeline, including creating a pipeline and starting or canceling an update. For details, see Details for user_action event. |
flow_progress | Describes the lifecycle of a flow from starting, through running, to completed or failed. For details, see Details for flow_progress event. |
update_progress | Describes the lifecycle of a pipeline update from starting, through running, to completed or failed. For details, see Details for update_progress event. |
flow_definition | Defines the schema and query plan for any transformations occurring in a given flow. Can be thought of as the edges of the Dataflow DAG. It can be used to calculate the lineage for each flow, as well as to see the explained query plan. For details, see Details for flow_definition event. |
dataset_definition | Defines a dataset, which is either the source or the destination for a given flow. For details, see Details for dataset_definition event. |
sink_definition | Defines a given sink. For details, see Details for sink_definition event. |
deprecation | Lists features used by this pipeline that are deprecated or soon to be deprecated. For examples of the values, see Details enum for deprecation event. |
cluster_resources | Includes information about cluster resources for pipelines that are running on classic compute. These metrics are only populated for classic compute pipelines. For details, see Details for cluster_resources event. |
autoscale | Includes information about autoscaling for pipelines that are running on classic compute. These metrics are only populated for classic compute pipelines. For details, see Details for autoscale event. |
planning_information | Represents planning information related to materialized view incremental vs. full refresh. Can be used to get more details on why a materialized view is fully recomputed. For details, see Details for planning_information event. |
hook_progress | Indicates the current status of a user hook during the pipeline run. Used for monitoring the status of event hooks, for example, to send to external observability products. For details, see Details for hook_progress event. |
operation_progress | Includes information about the progress of an operation. For details, see Details for operation_progress event. |
Details types
The following objects represent the details for each event type in the PipelineEvent object.
Details for create_update
The details for the create_update event.
Field | Description |
---|---|
dbr_version | The version of the Databricks Runtime. |
run_as | The user ID that the update will run on behalf of. Typically this is either the owner of the pipeline or a service principal. |
cause | The reason for the update. Typically either JOB_TASK if run from a job, or USER_ACTION when run interactively by a user. |
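The following sketch extracts these fields for recent updates, using the same event_log('<pipeline-id>') placeholder as the earlier examples.

```sql
-- Sketch: show what triggered recent updates and which runtime they used.
SELECT
  timestamp,
  details:create_update:cause AS cause,
  details:create_update:run_as AS run_as,
  details:create_update:dbr_version AS dbr_version
FROM event_log('<pipeline-id>')
WHERE event_type = 'create_update'
ORDER BY timestamp DESC;
```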
Details for user_action event
The details for the user_action event. Includes the following fields:
Field | Description |
---|---|
user_name | The name of the user that triggered a pipeline update. |
user_id | The ID of the user that triggered a pipeline update. This is not always the same as the run_as user, which could be a service principal or other user. |
action | The action the user took, including START and CREATE. |
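As an audit-style sketch under the same placeholder assumption, the following query lists recent user actions on the pipeline.

```sql
-- Sketch: audit who created the pipeline or started and canceled updates.
SELECT
  timestamp,
  details:user_action:action AS action,
  details:user_action:user_name AS user_name
FROM event_log('<pipeline-id>')
WHERE event_type = 'user_action'
ORDER BY timestamp DESC;
```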
Details for flow_progress event
The details for a flow_progress event.
Field | Description |
---|---|
status | The new status of the flow. |
metrics | Metrics about the flow. For details, see FlowMetrics. |
data_quality | Data quality metrics about the flow and associated expectations. For details, see DataQualityMetrics. |
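A sketch for following flow status and row counts over time, assuming the same event_log('<pipeline-id>') placeholder. The extracted JSON values are strings, so they are cast for numeric use.

```sql
-- Sketch: flow status, output rows, and dropped records per flow_progress event.
SELECT
  timestamp,
  details:flow_progress:status AS flow_status,
  details:flow_progress:metrics:num_output_rows::BIGINT AS num_output_rows,
  details:flow_progress:data_quality:dropped_records::BIGINT AS dropped_records
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
ORDER BY timestamp;
```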
Details for update_progress event
The details for an update_progress event.
Field | Description |
---|---|
state | The new state of the update. Useful for calculating the duration of various stages of a pipeline update, from total duration to time spent waiting for resources, for example. |
cancellation_cause | The reason why an update entered the CANCELED state. Includes reasons such as USER_ACTION or WORKFLOW_CANCELLATION (the workflow that triggered the update was canceled). |
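The following sketch lists the state transitions of updates over time, which can then be diffed to measure how long an update spends in each state. It uses the same placeholder pipeline ID.

```sql
-- Sketch: update state transitions, for example to measure time spent
-- waiting for resources versus running.
SELECT
  timestamp,
  details:update_progress:state AS state
FROM event_log('<pipeline-id>')
WHERE event_type = 'update_progress'
ORDER BY timestamp;
```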
Details for flow_definition event
The details for a flow_definition event.
Field | Description |
---|---|
input_datasets | The inputs read by this flow. |
output_dataset | The output dataset this flow writes to. |
output_sink | The output sink this flow writes to. |
explain_text | The explained query plan. |
schema_json | Spark SQL JSON schema string. |
schema | Schema of this flow. |
flow_type | The type of flow. |
comment | User comment or description about the dataset. |
spark_conf | Spark configurations set on this flow. |
language | The language used to create this flow. Can be SCALA, PYTHON, or SQL. |
once | Whether this flow was declared to run once. |
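A basic lineage sketch built on these fields, using the same event_log('<pipeline-id>') placeholder: which inputs each flow reads and which dataset it writes to.

```sql
-- Sketch: per-flow lineage edges from flow_definition events.
SELECT
  details:flow_definition:output_dataset AS output_dataset,
  details:flow_definition:input_datasets AS input_datasets
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_definition';
```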
Details for dataset_definition event
The details for a dataset_definition event. Includes the following fields:
Field | Description |
---|---|
dataset_type | Differentiates between materialized views and streaming tables. |
num_flows | The number of flows writing to the dataset. |
expectations | The expectations associated with the dataset. |
Details for sink_definition event
The details for a sink_definition event.
Field | Description |
---|---|
format | The format of the sink. |
options | The key-value options associated with the sink. |
Details enum for deprecation event
The deprecation event has a message field. The possible values for the message include the following. This is a partial list that grows over time.
Value | Description |
---|---|
TABLE_MANAGED_BY_MULTIPLE_PIPELINES | A table is managed by multiple pipelines. |
INVALID_CLUSTER_LABELS | Using cluster labels that are not supported. |
PINNED_DBR_VERSION | Using dbr_version instead of channel in pipeline settings. |
PREVIOUS_CHANNEL_USED | Using the release channel PREVIOUS, which might go away in a future release. |
LONG_DATASET_NAME | Using a dataset name longer than the supported length. |
LONG_SINK_NAME | Using a sink name longer than the supported length. |
LONG_FLOW_NAME | Using a flow name longer than the supported length. |
ENHANCED_AUTOSCALING_POLICY_COMPLIANCE | The cluster policy complies only when Enhanced Autoscaling uses a fixed cluster size. |
DATA_SAMPLE_CONFIGURATION_KEY | Using the configuration key to configure data sampling is deprecated. |
INCOMPATIBLE_CLUSTER_SETTINGS | The current cluster settings or cluster policy are no longer compatible with Lakeflow Declarative Pipelines. |
STREAMING_READER_OPTIONS_DROPPED | Using streaming reader options that are dropped. |
DISALLOWED_SERVERLESS_STATIC_SPARK_CONFIG | Setting static Spark configurations through pipeline configuration for serverless pipelines is not allowed. |
INVALID_SERVERLESS_PIPELINE_CONFIG | The serverless pipeline configuration provided is invalid. |
UNUSED_EXPLICIT_PATH_ON_UC_MANAGED_TABLE | Specifying unused explicit table paths on UC managed tables. |
FOREACH_BATCH_FUNCTION_NOT_SERIALIZABLE | The provided foreachBatch function is not serializable. |
DROP_PARTITION_COLS_NO_PARTITIONING | Dropping the partition_cols attribute results in no partitioning. |
PYTHON_CREATE_TABLE | Using @dlt.create_table instead of @dlt.table. |
PYTHON_CREATE_VIEW | Using @dlt.create_view instead of @dlt.view. |
PYTHON_CREATE_STREAMING_LIVE_TABLE | Using create_streaming_live_table instead of create_streaming_table. |
PYTHON_CREATE_TARGET_TABLE | Using create_target_table instead of create_streaming_table. |
FOREIGN_KEY_TABLE_CONSTRAINT_CYCLE | The set of tables managed by the pipeline has a cycle in its foreign key constraints. |
PARTIALLY_QUALIFIED_TABLE_REFERENCE_INCOMPATIBLE_WITH_DEFAULT_PUBLISHING_MODE | A partially qualified table reference has different meanings in default publishing mode and legacy publishing mode. |
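To see which deprecated features a pipeline currently uses, a sketch like the following can list the distinct deprecation messages, again using the event_log('<pipeline-id>') placeholder.

```sql
-- Sketch: deprecated features reported for this pipeline.
SELECT DISTINCT message
FROM event_log('<pipeline-id>')
WHERE event_type = 'deprecation';
```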
Details for cluster_resources event
The details for a cluster_resources event. Only applicable for pipelines running on classic compute.
Field | Description |
---|---|
task_slot_metrics | The task slot metrics of the cluster. For details, see TaskSlotMetrics object. |
autoscale_info | The state of autoscalers. For details, see AutoscaleInfo object. |
Details for autoscale event
The details for an autoscale event. Autoscale events are only applicable when the pipeline uses classic compute.
Field | Description |
---|---|
status | The status of this event. |
optimal_num_executors | The optimal number of executors suggested by the algorithm before applying the min_workers and max_workers bounds. |
requested_num_executors | The number of executors after truncating the optimal number of executors suggested by the algorithm to the min_workers and max_workers bounds. |
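A sketch for comparing requested and optimal executor counts over time on a classic compute pipeline, with the same placeholder pipeline ID.

```sql
-- Sketch: autoscaling decisions over time for a classic compute pipeline.
SELECT
  timestamp,
  details:autoscale:status AS status,
  details:autoscale:optimal_num_executors::INT AS optimal_num_executors,
  details:autoscale:requested_num_executors::INT AS requested_num_executors
FROM event_log('<pipeline-id>')
WHERE event_type = 'autoscale'
ORDER BY timestamp;
```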
Details for planning_information event
The details for a planning_information event. Useful for seeing details related to the chosen refresh type for a given flow during an update. Can be used to help debug why an update is fully refreshed rather than incrementally refreshed. For more details on incremental refreshes, see Incremental refresh for materialized views.
Field | Description |
---|---|
technique_information | Refresh-related information. Includes both the refresh methodology that was chosen and the possible refresh methodologies that were considered. Useful for debugging why a materialized view failed to incrementalize. For more details, see TechniqueInformation. |
source_table_information | Source table information. Can be useful for debugging why a materialized view failed to incrementalize. For details, see TableInformation object. |
target_table_information | Target table information. For details, see TableInformation object. |
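To investigate why a materialized view was fully recomputed, a sketch like the following pulls the raw technique information for recent updates, using the same event_log('<pipeline-id>') placeholder. The returned JSON can then be parsed further with the : operator or from_json.

```sql
-- Sketch: refresh techniques considered and chosen for each planning event.
SELECT
  timestamp,
  details:planning_information:technique_information AS technique_information
FROM event_log('<pipeline-id>')
WHERE event_type = 'planning_information'
ORDER BY timestamp DESC;
```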
Details for hook_progress event
The details of a hook_progress event. Includes the following fields:
Field | Description |
---|---|
name | The name of the user hook. |
status | The status of the user hook. |
Details for operation_progress event
The details of an operation_progress event. Includes the following fields:
Field | Description |
---|---|
type | The type of operation being tracked. |
status | The status of the operation. |
duration_ms | The total elapsed time of the operation in milliseconds. Only included in the end event (where the status is COMPLETED, CANCELED, or FAILED). |
Other objects
The following objects represent additional data or enums within the event objects.
AutoscaleInfo object
The autoscale metrics for a cluster. Only applicable for pipelines running on classic compute.
Field | Description |
---|---|
state | The autoscaling status. |
optimal_num_executors | The optimal number of executors. This is the optimal size suggested by the algorithm before being truncated by the user-specified min/max number of executors. |
latest_requested_num_executors | The number of executors requested from the cluster manager by the state manager in the latest request. This is the number of executors the state manager is trying to scale to, and is updated when the state manager attempts to exit the scaling state in the event of timeouts. This field is not populated if there is no pending request. |
request_pending_seconds | The length of time the scaling request has been pending. This is not populated if there is no pending request. |
CostModelRejectionSubType object
An enum of reasons that incrementalization is rejected, based on the cost of a full refresh versus an incremental refresh, in a planning_information event.
Value | Description |
---|---|
NUM_JOINS_THRESHOLD_EXCEEDED | Fully refresh because the query contains too many joins. |
CHANGESET_SIZE_THRESHOLD_EXCEEDED | Fully refresh because too many rows in the base tables changed. |
TABLE_SIZE_THRESHOLD_EXCEEDED | Fully refresh because the base table size exceeded the threshold. |
EXCESSIVE_OPERATOR_NESTING | Fully refresh because the query definition is complex and has many levels of operator nesting. |
COST_MODEL_REJECTION_SUB_TYPE_UNSPECIFIED | Fully refresh for any other reason. |
DataQualityMetrics object
Metrics about how expectations are being met within the flow. Used in flow_progress event details.
Field | Description |
---|---|
dropped_records | The number of records that were dropped because they failed one or more expectations. |
expectations | Metrics for expectations added to any dataset in the flow's query plan. When there are multiple expectations, this can be used to track which expectations were met or failed. For details, see ExpectationMetrics object. |
ExpectationMetrics object
Metrics for a specific expectation.
Field | Description |
---|---|
name | The name of the expectation. |
dataset | The name of the dataset to which the expectation was added. |
passed_records | The number of records that pass the expectation. |
failed_records | The number of records that fail the expectation. Tracks whether the expectation was met, but does not describe what happens to the records (warn, fail, or drop the records). |
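The expectations array arrives as a JSON string inside the flow_progress details, so it can be flattened with from_json and explode. The following sketch assumes the struct schema mirrors the ExpectationMetrics fields above and uses the same event_log('<pipeline-id>') placeholder.

```sql
-- Sketch: per-expectation pass/fail counts from flow_progress events.
SELECT
  expectation.name,
  expectation.dataset,
  expectation.passed_records,
  expectation.failed_records
FROM (
  SELECT
    explode(
      from_json(
        details:flow_progress:data_quality:expectations,
        'array<struct<name: string, dataset: string, passed_records: bigint, failed_records: bigint>>'
      )
    ) AS expectation
  FROM event_log('<pipeline-id>')
  WHERE event_type = 'flow_progress'
);
```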
FlowMetrics object
Metrics about the flow, including both totals for the flow and metrics broken out by specific source. Used in flow_progress event details.
Each streaming source supports only specific flow metrics. The following table shows the metrics available for supported streaming sources:
Source | backlog bytes | backlog records | backlog seconds | backlog files |
---|---|---|---|---|
Kafka | ✓ | ✓ | | |
Kinesis | ✓ | ✓ | | |
Delta | ✓ | ✓ | | |
Auto Loader | ✓ | ✓ | | |
Google Pub/Sub | ✓ | ✓ | | |
Field | Description |
---|---|
num_output_rows | Number of output rows written by an update of this flow. |
backlog_bytes | Total backlog as bytes across all input sources in the flow. |
backlog_records | Total backlog records across all input sources in the flow. |
backlog_files | Total backlog files across all input sources in the flow. |
backlog_seconds | Maximum backlog seconds across all input sources in the flow. |
executor_time_ms | Sum of all task execution times in milliseconds of this flow over the reporting period. |
executor_cpu_time_ms | Sum of all task execution CPU times in milliseconds of this flow over the reporting period. |
num_upserted_rows | Number of output rows upserted into the dataset by an update of this flow. |
num_deleted_rows | Number of existing output rows deleted from the dataset by an update of this flow. |
num_output_bytes | Number of output bytes written by an update of this flow. |
source_metrics | Metrics for each input source in the flow. Useful for monitoring ingestion progress from sources outside Lakeflow Declarative Pipelines (like Apache Kafka, Pulsar, or Auto Loader). |
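A sketch for monitoring streaming backlog per flow_progress event, with the same event_log('<pipeline-id>') placeholder; which backlog fields are populated depends on the source, as shown in the table above.

```sql
-- Sketch: streaming backlog metrics reported by flow_progress events.
SELECT
  timestamp,
  details:flow_progress:metrics:backlog_bytes::BIGINT AS backlog_bytes,
  details:flow_progress:metrics:backlog_records::BIGINT AS backlog_records,
  details:flow_progress:metrics:backlog_seconds::DOUBLE AS backlog_seconds
FROM event_log('<pipeline-id>')
WHERE event_type = 'flow_progress'
ORDER BY timestamp;
```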
IncrementalizationIssue object
Represents issues with incrementalization that could cause a full refresh when planning an update.
Field | Description |
---|---|
issue_type | An issue type that could prevent the materialized view from incrementalizing. For details, see IssueType object. |
prevent_incrementalization | Whether this issue prevented the incrementalization from happening. |
table_information | Table information associated with issues like CDF_UNAVAILABLE, INPUT_NOT_IN_DELTA, and DATA_FILE_MISSING. |
operator_name | Plan-related information. Set when the issue type is either PLAN_NOT_DETERMINISTIC or PLAN_NOT_INCREMENTALIZABLE, identifying the operator or expression that causes the non-determinism or non-incrementalizability. |
expression_name | The expression name. |
join_type | Auxiliary information when the operator is a join. For example, JOIN_TYPE_LEFT_OUTER or JOIN_TYPE_INNER. |
plan_not_incrementalizable_sub_type | Detailed category when the issue type is PLAN_NOT_INCREMENTALIZABLE. For details, see PlanNotIncrementalizableSubType object. |
plan_not_deterministic_sub_type | Detailed category when the issue type is PLAN_NOT_DETERMINISTIC. For details, see PlanNotDeterministicSubType object. |
fingerprint_diff_before | The diff from the previous fingerprint. |
fingerprint_diff_current | The diff from the current fingerprint. |
cost_model_rejection_subtype | Detailed category when the issue type is INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL. For details, see CostModelRejectionSubType object. |
IssueType object
An enum of issue types that could cause a full refresh.
Value | Description |
---|---|
CDF_UNAVAILABLE | CDF (Change Data Feed) is not enabled on some base tables. The table_information field gives information on which table does not have CDF enabled. Use ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableChangeDataFeed' = true) to enable CDF for the base table. If the source table is a materialized view, CDF should be set to ON by default. |
DELTA_PROTOCOL_CHANGED | Fully refresh because some base tables (details in the table_information field) had a Delta protocol change. |
DATA_SCHEMA_CHANGED | Fully refresh because some base tables (details in the table_information field) had a data schema change in the columns used by the materialized view definition. Not relevant if a column that the materialized view does not use has been changed or added to the base table. |
PARTITION_SCHEMA_CHANGED | Fully refresh because some base tables (details in the table_information field) had a partition schema change. |
INPUT_NOT_IN_DELTA | Fully refresh because the materialized view definition involves some non-Delta input. |
DATA_FILE_MISSING | Fully refresh because some base table files were already vacuumed due to their retention period. |
PLAN_NOT_DETERMINISTIC | Fully refresh because some operators or expressions in the materialized view definition are not deterministic. The operator_name and expression_name fields give information on which operator or expression caused the issue. |
PLAN_NOT_INCREMENTALIZABLE | Fully refresh because some operators or expressions in the materialized view definition are not incrementalizable. |
SERIALIZATION_VERSION_CHANGED | Fully refresh because there was a significant change in the query fingerprinting logic. |
QUERY_FINGERPRINT_CHANGED | Fully refresh because the materialized view definition changed, or a Lakeflow Declarative Pipelines release caused a change in the query evaluation plans. |
CONFIGURATION_CHANGED | Fully refresh because key configurations (for example, spark.sql.ansi.enabled) that might affect query evaluation have changed. A full recompute is required to avoid inconsistent states in the materialized view. |
CHANGE_SET_MISSING | Fully refresh because it is the first compute of the materialized view. This is expected behavior for initial materialized view computation. |
EXPECTATIONS_NOT_SUPPORTED | Fully refresh because the materialized view definition includes expectations, which are not supported for incremental updates. Remove expectations or handle them outside of the materialized view definition if incremental support is needed. |
TOO_MANY_FILE_ACTIONS | Fully refresh because the number of file actions exceeded the threshold for incremental processing. Consider reducing file churn in base tables or increasing thresholds. |
INCREMENTAL_PLAN_REJECTED_BY_COST_MODEL | Fully refresh because the cost model determined that a full refresh is more efficient than incremental maintenance. Review the cost model behavior or the complexity of the query plan to allow incremental updates. |
ROW_TRACKING_NOT_ENABLED | Fully refresh because row tracking is not enabled on one or more base tables. Enable row tracking using ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableRowTracking' = true). |
TOO_MANY_PARTITIONS_CHANGED | Fully refresh because too many partitions changed in the base tables. Try to limit the number of partition changes to stay within incremental processing limits. |
MAP_TYPE_NOT_SUPPORTED | Fully refresh because the materialized view definition includes a map type, which is not supported for incremental updates. Consider restructuring the data to avoid map types in the materialized view. |
TIME_ZONE_CHANGED | Fully refresh because the session or system time zone setting changed. |
DATA_HAS_CHANGED | Fully refresh because the data relevant to the materialized view changed in a way that prevents incremental updates. Evaluate the data changes and the structure of the view definition to ensure compatibility with incremental logic. |
PRIOR_TIMESTAMP_MISSING | Fully refresh because the timestamp of the last successful run is missing. This can occur after metadata loss or manual intervention. |
MaintenanceType object
An enum of maintenance types that might be chosen during a planning_information event. If the type is not MAINTENANCE_TYPE_COMPLETE_RECOMPUTE or MAINTENANCE_TYPE_NO_OP, the type is an incremental refresh.
Value | Description |
---|---|
MAINTENANCE_TYPE_COMPLETE_RECOMPUTE | Full recompute; always shown. |
MAINTENANCE_TYPE_NO_OP | Used when the base tables do not change. |
MAINTENANCE_TYPE_PARTITION_OVERWRITE | Incrementally refresh affected partitions when the materialized view is co-partitioned with one of the source tables. |
MAINTENANCE_TYPE_ROW_BASED | Incrementally refresh by creating modular changesets for various operations, such as JOIN, FILTER, and UNION ALL, and composing them to calculate complex queries. Used when row tracking is enabled for the source tables and there is a limited number of joins in the query. |
MAINTENANCE_TYPE_APPEND_ONLY | Incrementally refresh by only computing new rows because there were no upserts or deletes in the source tables. |
MAINTENANCE_TYPE_GROUP_AGGREGATE | Incrementally refresh by calculating changes for each aggregate value. Used when associative aggregates, such as count, sum, mean, and stddev, are at the topmost level of the query. |
MAINTENANCE_TYPE_GENERIC_AGGREGATE | Incrementally refresh by calculating only the affected aggregate groups. Used when aggregates like median (not just associative ones) are at the topmost level of the query. |
MAINTENANCE_TYPE_WINDOW_FUNCTION | Incrementally refresh queries with window functions like PARTITION BY by recomputing only the changed partitions. Used when all of the window functions have a PARTITION BY or JOIN clause and are at the topmost level of the query. |
Origin object
Where the event originated.
Field | Description |
---|---|
cloud | The cloud provider. |
region | The cloud region. |
org_id | The org ID or workspace ID of the user. Unique within a cloud. Useful to identify the workspace, or to join with other tables, such as system billing tables. |
pipeline_id | The ID of the pipeline. A unique identifier for the pipeline. Useful to identify the pipeline, or to join with other tables, such as system billing tables. |
pipeline_type | The type of the pipeline, showing where the pipeline was created. |
pipeline_name | The name of the pipeline. |
cluster_id | The ID of the cluster where an execution happens. Globally unique. |
update_id | The ID of a single execution of the pipeline. This is equivalent to the run ID. |
table_name | The name of the (Delta) table being written to. |
dataset_name | The fully qualified name of a dataset. |
sink_name | The name of a sink. |
flow_id | The ID of the flow. It tracks the state of the flow being used across multiple updates. As long as the flow_id is the same, the flow is incrementally refreshing. The flow_id changes when the materialized view fully refreshes, the checkpoint resets, or a full recomputation occurs within the materialized view. |
flow_name | The name of the flow. |
batch_id | The ID of a microbatch. Unique within a flow. |
request_id | The ID of the request that caused an update. |
PlanNotDeterministicSubType object
An enum of non-deterministic cases for a planning_information event.
Value | Description |
---|---|
STREAMING_SOURCE | Fully refresh because the materialized view definition includes a streaming source, which is not supported. |
USER_DEFINED_FUNCTION | Fully refresh because the materialized view includes an unsupported user-defined function. Only deterministic Python UDFs are supported. Other UDFs might prevent incremental updates. |
TIME_FUNCTION | Fully refresh because the materialized view includes a time-based function such as CURRENT_DATE or CURRENT_TIMESTAMP. The expression_name property provides the name of the unsupported function. |
NON_DETERMINISTIC_EXPRESSION | Fully refresh because the query includes a non-deterministic expression such as RANDOM(). The expression_name property indicates the non-deterministic function that prevents incremental maintenance. |
PlanNotIncrementalizableSubType object
An enum of reasons an update plan might not be incrementalizable.
Value | Description |
---|---|
OPERATOR_NOT_SUPPORTED | Fully refresh because the query plan includes an unsupported operator. The operator_name property provides the name of the unsupported operator. |
AGGREGATE_NOT_TOP_NODE | Fully refresh because an aggregate (GROUP BY) operator is not at the top level of the query plan. Incremental maintenance supports aggregates only at the top level. Consider defining two materialized views to separate the aggregation. |
AGGREGATE_WITH_DISTINCT | Fully refresh because the aggregation includes a DISTINCT clause, which is not supported for incremental updates. |
AGGREGATE_WITH_UNSUPPORTED_EXPRESSION | Fully refresh because the aggregation includes unsupported expressions. The expression_name property indicates the problematic expression. |
SUBQUERY_EXPRESSION | Fully refresh because the materialized view definition includes a subquery expression, which is not supported. |
WINDOW_FUNCTION_NOT_TOP_LEVEL | Fully refresh because a window function is not at the top level of the query plan. |
WINDOW_FUNCTION_WITHOUT_PARTITION_BY | Fully refresh because a window function is defined without a PARTITION BY clause. |
TableInformation object
Represents details of a table considered during a planning_information event.
Field | Description |
---|---|
table_name | The table name used in the query, from Unity Catalog or the Hive metastore. Might not be available in the case of path-based access. |
table_id | Required. The table ID from the Delta log. |
catalog_table_type | The type of the table as specified in the catalog. |
partition_columns | The partition columns of the table. |
table_change_type | The change type in the table. One of: TABLE_CHANGE_TYPE_UNKNOWN, TABLE_CHANGE_TYPE_APPEND_ONLY, TABLE_CHANGE_TYPE_GENERAL_CHANGE. |
full_size | The full size of the table in bytes. |
change_size | The size of the changed rows in changed files. It is calculated as change_file_read_size * num_changed_rows / num_rows_in_changed_files. |
num_changed_partitions | The number of changed partitions. |
is_size_after_pruning | Whether full_size and change_size represent data after static file pruning. |
is_row_id_enabled | Whether row ID is enabled on the table. |
is_cdf_enabled | Whether CDF is enabled on the table. |
is_deletion_vector_enabled | Whether deletion vectors are enabled on the table. |
is_change_from_legacy_cdf | Whether the table change is from legacy CDF or row-ID-based CDF. |
TaskSlotMetrics object
The task slot metrics for a cluster. Only applies to pipeline updates running on classic compute.
Field | Description |
---|---|
summary_duration_ms | The duration in milliseconds over which aggregate metrics (for example, avg_num_task_slots) are calculated. |
num_task_slots | The number of Spark task slots at the reporting instant. |
avg_num_task_slots | The average number of Spark task slots over the summary duration. |
avg_task_slot_utilization | The average task slot utilization (number of active tasks divided by number of task slots) over the summary duration. |
num_executors | The number of Spark executors at the reporting instant. |
avg_num_queued_tasks | The average task queue size (total number of tasks minus number of active tasks) over the summary duration. |
TechniqueInformation object
Refresh methodology information for a planning event.
Field | Description |
---|---|
maintenance_type | The maintenance type related to this piece of information. If the type is not MAINTENANCE_TYPE_COMPLETE_RECOMPUTE or MAINTENANCE_TYPE_NO_OP, the flow incrementally refreshed. For details, see MaintenanceType object. |
is_chosen | True for the technique that was chosen for the refresh. |
is_applicable | Whether the maintenance type is applicable. |
incrementalization_issues | Incrementalization issues that might cause an update to fully refresh. For details, see IncrementalizationIssue object. |
change_set_information | Information about the final produced change set. |