Create external Delta tables from external clients

Important

This feature is in Public Preview.

This page provides information on how to create Unity Catalog external tables backed by Delta Lake from external clients and systems.

Note

Databricks recommends using Apache Spark to create external tables to ensure that column definitions are in a format compatible with Apache Spark. The API does not validate the correctness of the column specification. If the specification is not compatible with Apache Spark, then Databricks Runtime might be unable to read the tables.

Requirements

You can create external tables using Apache Spark, the Unity Catalog API, or other external clients.

Create Delta tables using Apache Spark

The following example shows the settings that configure Apache Spark to create Unity Catalog external Delta tables:

"spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
"spark.sql.catalog.spark_catalog": "io.unitycatalog.spark.UCSingleCatalog",
"spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
"spark.sql.catalog.<uc-catalog-name>": "io.unitycatalog.spark.UCSingleCatalog",
"spark.sql.catalog.<uc-catalog-name>.uri": "<workspace-url>/api/2.1/unity-catalog",
"spark.sql.catalog.<uc-catalog-name>.token": "<token>",
"spark.sql.defaultCatalog": "<uc-catalog-name>",

Substitute the following variables:

  • <uc-catalog-name>: The name of the catalog in Unity Catalog that contains your tables.
  • <workspace-url>: URL of the Azure Databricks workspace.
  • <token>: OAuth token for the principal configuring the integration.

For Apache Spark and Delta Lake to work together with Unity Catalog, you will need at least Apache Spark 3.5.3 and Delta Lake 3.2.1.

Include the following dependencies when launching Apache Spark:

--packages "org.apache.hadoop:hadoop-aws:3.3.4,\
io.delta:delta-spark_2.12:3.2.1,\
io.unitycatalog:unitycatalog-spark_2.12:0.2.0"

Now you can create external tables using SQL:

CREATE TABLE <uc-catalog-name>.<schema-name>.<table-name> (id INT, desc STRING)
USING delta
LOCATION '<path>';
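
The same statement can also be run from PySpark with spark.sql, followed by an optional check that the table is registered. This sketch reuses the placeholder names from the example above:

# Create the external table and confirm that Unity Catalog registered it
spark.sql(
    "CREATE TABLE <uc-catalog-name>.<schema-name>.<table-name> (id INT, desc STRING) "
    "USING delta LOCATION '<path>'"
)
spark.sql("DESCRIBE TABLE EXTENDED <uc-catalog-name>.<schema-name>.<table-name>").show(truncate=False)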

Create Delta tables using the API

To create an external Delta table using the Unity Catalog REST API, follow these steps:

Step 1: Make a POST request to the Create Table API

Use the following API request to register the table metadata in Unity Catalog:

curl --location --request POST 'https://<workspace-url>/api/2.0/unity-catalog/tables/' \
--header 'Authorization: Bearer <token>' \
--header 'Content-Type: application/json' \
--data '{
  "name": "<table-name>",
  "catalog_name": "<uc-catalog-name>",
  "schema_name": "<schema-name>",
  "table_type": "EXTERNAL",
  "data_source_format": "DELTA",
  "storage_location": "<path>",
  "columns": [
    {
      "name": "id",
      "type_name": "LONG",
      "type_text": "bigint",
      "type_json": "\"long\"",
      "type_precision": 0,
      "type_scale": 0,
      "position": 0,
      "nullable": true
    },
    {
      "name": "name",
      "type_name": "STRING",
      "type_text": "string",
      "type_json": "\"string\"",
      "type_precision": 0,
      "type_scale": 0,
      "position": 1,
      "nullable": true
    }
  ]
}'

Substitute the following variables:

  • <workspace-url>: URL of the Azure Databricks workspace
  • <token>: Token for the principal making the API call
  • <uc-catalog-name>: Name of the catalog in Unity Catalog that will contain the external table
  • <schema-name>: Name of the schema within the catalog where the table will be created
  • <table-name>: Name of the external table
  • <path>: Fully qualified path to the table data
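
If you prefer to call the API from Python instead of curl, the following sketch sends the same request using the third-party requests library (not part of the documented steps). All values are the same placeholders used in the curl example:

import requests

# Placeholders -- replace with your workspace URL, token, and table details
workspace_url = "https://<workspace-url>"
token = "<token>"

payload = {
    "name": "<table-name>",
    "catalog_name": "<uc-catalog-name>",
    "schema_name": "<schema-name>",
    "table_type": "EXTERNAL",
    "data_source_format": "DELTA",
    "storage_location": "<path>",
    "columns": [
        {"name": "id", "type_name": "LONG", "type_text": "bigint",
         "type_json": "\"long\"", "type_precision": 0, "type_scale": 0,
         "position": 0, "nullable": True},
        {"name": "name", "type_name": "STRING", "type_text": "string",
         "type_json": "\"string\"", "type_precision": 0, "type_scale": 0,
         "position": 1, "nullable": True},
    ],
}

# Register the table metadata in Unity Catalog
response = requests.post(
    f"{workspace_url}/api/2.0/unity-catalog/tables/",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())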

Step 2: Initialize the Delta table location

The API call above registers the table in Unity Catalog, but it does not create the Delta files at the storage location. To initialize the table location, write an empty Delta table using Spark.

The schema used in this step must exactly match the column definitions provided in the API request:

from pyspark.sql.types import StructType, StructField, StringType, LongType

# Define schema matching your API call
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True)
])

# Create an empty DataFrame and initialize the Delta table
empty_df = spark.createDataFrame([], schema)
empty_df.write \
    .format("delta") \
    .mode("overwrite") \
    .save("<path>")

Note

The Create Table API for external clients has the following limitations:

  • Only external Delta tables are supported ("table_type": "EXTERNAL" and "data_source_format": "DELTA").
  • Only the following fields are allowed:
    • name
    • catalog_name
    • schema_name
    • table_type
    • data_source_format
    • columns
    • storage_location
    • properties
  • Column masks are not supported.