Important
This feature is in Public Preview.
This page describes how to create Unity Catalog external tables backed by Delta Lake from external clients and systems.
Note
Databricks recommends using Apache Spark to create external tables to ensure that column definitions are in a format compatible with Apache Spark. The API does not validate the correctness of the column specification. If the specification is not compatible with Apache Spark, then Databricks Runtime might be unable to read the tables.
Requirements
- Enable External data access for your metastore. See Enable external data access on the metastore.
- Grant the principal configuring the integration the following privileges (example grant statements follow this list):
  - EXTERNAL USE SCHEMA privilege on the schema containing the objects.
  - EXTERNAL USE LOCATION privilege on the external location containing the path. See Grant a principal Unity Catalog privileges.
  - CREATE TABLE permission on the table, CREATE EXTERNAL TABLE on the external location, USE CATALOG on its parent catalog, and USE SCHEMA on its parent schema.
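For example, a principal with sufficient privileges could issue the grants from a Databricks notebook along these lines. This is a minimal sketch, not the documented workflow: the placeholder names (<uc-catalog-name>, <schema-name>, <external-location-name>, <principal>) are illustrative, and granting CREATE TABLE at the schema level is an assumption you should adapt to your environment:
# Hedged sketch: run in the Azure Databricks workspace by a principal allowed to grant these privileges.
# <uc-catalog-name>, <schema-name>, <external-location-name>, and <principal> are placeholders.
spark.sql("GRANT EXTERNAL USE SCHEMA ON SCHEMA <uc-catalog-name>.<schema-name> TO `<principal>`")
spark.sql("GRANT EXTERNAL USE LOCATION ON EXTERNAL LOCATION `<external-location-name>` TO `<principal>`")
spark.sql("GRANT CREATE TABLE ON SCHEMA <uc-catalog-name>.<schema-name> TO `<principal>`")  # grant level assumed
spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `<external-location-name>` TO `<principal>`")
spark.sql("GRANT USE CATALOG ON CATALOG <uc-catalog-name> TO `<principal>`")
spark.sql("GRANT USE SCHEMA ON SCHEMA <uc-catalog-name>.<schema-name> TO `<principal>`")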
You can create external tables using Apache Spark, the Unity Catalog API, or other external clients.
Create Delta tables using Apache Spark
The following is an example of the settings to configure Apache Spark to create Unity Catalog external Delta tables:
"spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
"spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
"spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
"spark.sql.catalog.<uc-catalog-name>": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>.uri": "<workspace-url>/api/2.1/unity-catalog",
"spark.sql.catalog.<uc-catalog-name>.token": "<token>",
"spark.sql.defaultCatalog": "<uc-catalog-name>",
Substitute the following variables:
- <uc-catalog-name>: The name of the catalog in Unity Catalog that contains your tables.
- <workspace-url>: URL of the Azure Databricks workspace.
- <token>: OAuth token for the principal configuring the integration.
For Apache Spark and Delta Lake to work together with Unity Catalog, you need at least Apache Spark 3.5.3 and Delta Lake 3.2.1.
Include the following dependencies when launching Apache Spark:
--packages "org.apache.hadoop:hadoop-aws:3.3.4,\
io.delta:delta-spark_2.12:3.2.1,\
io.unitycatalog:unitycatalog-spark_2.12:0.2.0"
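If you build the session programmatically rather than passing configuration at launch, a PySpark sketch along these lines wires the same settings together. The placeholder values are the same as above, and setting spark.jars.packages is an assumed stand-in for the --packages flag:
from pyspark.sql import SparkSession

# Minimal sketch: the configuration shown above, expressed through SparkSession.builder.
spark = (
    SparkSession.builder.appName("uc-external-delta-tables")
    .config("spark.jars.packages",
            "org.apache.hadoop:hadoop-aws:3.3.4,"
            "io.delta:delta-spark_2.12:3.2.1,"
            "io.unitycatalog:unitycatalog-spark_2.12:0.2.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.sql.catalog.<uc-catalog-name>", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.<uc-catalog-name>.uri", "<workspace-url>/api/2.1/unity-catalog")
    .config("spark.sql.catalog.<uc-catalog-name>.token", "<token>")
    .config("spark.sql.defaultCatalog", "<uc-catalog-name>")
    .getOrCreate()
)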
Now you can create external tables using SQL:
CREATE TABLE <uc-catalog-name>.<schema-name>.<table-name> (id INT, desc STRING)
USING delta
LOCATION '<path>';
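Once the table exists, you can verify it from the same Spark session. The following PySpark sketch inserts a sample row and reads the table back through Unity Catalog; the values are hypothetical:
# Sketch: write a sample row and read the new table back.
spark.sql("INSERT INTO <uc-catalog-name>.<schema-name>.<table-name> VALUES (1, 'example')")
spark.table("<uc-catalog-name>.<schema-name>.<table-name>").show()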
Create Delta tables using the API
To create an external Delta table using the Unity Catalog REST API, follow these steps:
Step 1: Make a POST request to the Create Table API
Use the following API request to register the table metadata in Unity Catalog:
curl --location --request POST 'https://<workspace-url>/api/2.0/unity-catalog/tables/' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
"name": "<table-name>",
"catalog_name": "<uc-catalog-name>",
"schema_name": "<schema-name>",
"table_type": "EXTERNAL",
"data_source_format": "DELTA",
"storage_location": "<path>",
"columns": [
{
"name": "id",
"type_name": "LONG",
"type_text": "bigint",
"type_json": "\"long\"",
"type_precision": 0,
"type_scale": 0,
"position": 0,
"nullable": true
},
{
"name": "name",
"type_name": "STRING",
"type_text": "string",
"type_json": "\"string\"",
"type_precision": 0,
"type_scale": 0,
"position": 1,
"nullable": true
}
]
}'
Substitute the following variables:
- <workspace-url>: URL of the Azure Databricks workspace.
- <token>: Token for the principal making the API call.
- <uc-catalog-name>: Name of the catalog in Unity Catalog that will contain the external table.
- <schema-name>: Name of the schema within the catalog where the table will be created.
- <table-name>: Name of the external table.
- <path>: Fully qualified path to the table data.
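If you prefer to script this step, the same request can be issued from Python. The following sketch uses the third-party requests library (an assumption, not part of the documented workflow) and mirrors the payload of the curl example above:
import requests

# Placeholders (<workspace-url>, <token>, <uc-catalog-name>, <schema-name>, <table-name>, <path>) as above.
payload = {
    "name": "<table-name>",
    "catalog_name": "<uc-catalog-name>",
    "schema_name": "<schema-name>",
    "table_type": "EXTERNAL",
    "data_source_format": "DELTA",
    "storage_location": "<path>",
    "columns": [
        {"name": "id", "type_name": "LONG", "type_text": "bigint", "type_json": "\"long\"",
         "type_precision": 0, "type_scale": 0, "position": 0, "nullable": True},
        {"name": "name", "type_name": "STRING", "type_text": "string", "type_json": "\"string\"",
         "type_precision": 0, "type_scale": 0, "position": 1, "nullable": True},
    ],
}

response = requests.post(
    "https://<workspace-url>/api/2.0/unity-catalog/tables/",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)
response.raise_for_status()  # Fail fast if the table was not registered.
print(response.json())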
Step 2: Initialize the Delta table location
The API call above registers the table in Unity Catalog, but it does not create the Delta files at the storage location. To initialize the table location, write an empty Delta table using Spark:
The schema used in this step must exactly match the column definitions provided in the API request.
from pyspark.sql.types import StructType, StructField, StringType, LongType
# Define schema matching your API call
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True)
])

# Create an empty DataFrame and initialize the Delta table
empty_df = spark.createDataFrame([], schema)
empty_df.write \
    .format("delta") \
    .mode("overwrite") \
    .save("<path>")
Note
The Create Table API for external clients has the following limitations:
- Only external Delta tables are supported ("table_type": "EXTERNAL" and "data_source_format": "DELTA").
- Only the following fields are allowed: name, catalog_name, schema_name, table_type, data_source_format, columns, storage_location, properties.
- Column masks are not supported.