Quickstart: Deploy a managed Apache Spark cluster with Azure Databricks

Azure Managed Instance for Apache Cassandra provides automated deployment and scaling operations for managed open-source Apache Cassandra datacenters. This feature accelerates hybrid scenarios and helps to reduce ongoing maintenance.

This quickstart demonstrates how to use the Azure portal to create a fully managed Apache Spark cluster inside the Azure virtual network of your Azure Managed Instance for Apache Cassandra cluster. You create the Spark cluster in Azure Databricks. Later, you can create notebooks or attach them to the cluster, read data from different data sources, and analyze that data.

For detailed instructions, see Deploy Azure Databricks in your Azure virtual network (virtual network injection).

Prerequisites

If you don't have an Azure subscription, create a free account before you begin.

Create an Azure Databricks cluster

Follow these steps to create an Azure Databricks cluster in the virtual network that contains your Azure Managed Instance for Apache Cassandra cluster:

  1. Sign in to the Azure portal.

  2. On the left pane, locate Resource groups. Go to your resource group that contains the virtual network where your managed instance is deployed.

  3. Open the Virtual network resource, and make a note of the Address space.

    Screenshot that shows where to get the address space of your virtual network.

  4. From the resource group, select Add and search for Azure Databricks in the search field.

    Screenshot that shows a search for Azure Databricks.

  5. Select Create to create an Azure Databricks account.

    Screenshot that shows Azure Databricks offering with Create selected.

  6. Enter the following values:

    • Workspace name: Provide a name for your Azure Databricks workspace.
    • Region: Make sure to select the same region as your virtual network.
    • Pricing Tier: Select Standard, Premium, or Trial. For more information on these tiers, see the Azure Databricks pricing page.

    Screenshot that shows a dialog box where you can enter the workspace name, region, and pricing tier for the Azure Databricks account.

  7. Select the Networking tab, and enter the following details:

    • Deploy Azure Databricks workspace in your Virtual Network (VNet): Select Yes.
    • Virtual Network: From the dropdown list, choose the virtual network where your managed instance exists.
    • Public Subnet Name: Enter a name for the public subnet.
    • Public Subnet CIDR Range: Enter an IP range for the public subnet.
    • Private Subnet Name: Enter a name for the private subnet.
    • Private Subnet CIDR Range: Enter an IP range for the private subnet.

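    To avoid range collisions, select ranges that fall within the virtual network's address space (noted in step 3) but don't overlap any existing subnet, such as the one that hosts your managed instance. Ranges toward the high end of the address space are usually free. If necessary, use a visual subnet calculator to divide the ranges.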

    Screenshot that shows the Visual Subnet Calculator with two highlighted identical network addresses.

    The following screenshot shows example details on the networking pane.

    Screenshot that shows specified public and private subnet names.

  8. Select Review + create, and then select Create to deploy the workspace.

  9. Open the workspace after the workspace is created.

  10. You're redirected to the Azure Databricks portal. From the portal, select New Cluster.

  11. On the New cluster pane, accept default values for all fields other than the following fields:

    • Cluster Name: Enter a name for the cluster.
    • Databricks Runtime Version: We recommend that you select Azure Databricks runtime version 7.5 or later, for Spark 3.x support.

    Screenshot that shows the New Cluster dialog box with an Azure Databricks runtime version selected.

  12. Expand Advanced Options, and add the following configuration. Make sure to replace the node IPs and credentials.

    spark.cassandra.connection.host <node1 IP>,<node2 IP>,<node3 IP>
    spark.cassandra.auth.password cassandra
    spark.cassandra.connection.port 9042
    spark.cassandra.auth.username cassandra
    spark.cassandra.connection.ssl.enabled true
    
  13. Add the Apache Spark Cassandra Connector library to your cluster to connect to both native and Azure Cosmos DB Cassandra endpoints. In your cluster, select Libraries > Install New > Maven, and then add com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0 in the Maven Coordinates field.

    Screenshot that shows searching for Maven packages in Azure Databricks.

  14. Select Install.
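
The subnet selection in step 7 can also be checked programmatically instead of with a visual calculator. The following is a minimal sketch that uses Python's standard `ipaddress` module; the function name `find_free_subnets` and the example address ranges are illustrative assumptions, not values from your deployment. It searches from the top of the address space downward, which matches the advice to prefer higher ranges.

```python
import ipaddress


def find_free_subnets(address_space, used_subnets, new_prefix, count=2):
    """Return `count` non-overlapping subnets of size /new_prefix.

    address_space: CIDR of the virtual network (the value noted in step 3).
    used_subnets:  CIDRs already taken, for example the managed instance subnet.
    The search starts at the highest range and walks downward.
    """
    space = ipaddress.ip_network(address_space)
    used = [ipaddress.ip_network(s) for s in used_subnets]

    # Split the address space into equally sized candidate subnets.
    candidates = list(space.subnets(new_prefix=new_prefix))

    # Keep only candidates that do not collide with an existing subnet.
    free = [
        c for c in reversed(candidates)
        if not any(c.overlaps(u) for u in used)
    ]
    if len(free) < count:
        raise ValueError("Not enough free ranges in the address space")
    return free[:count]


# Hypothetical values: replace with your own address space and subnets.
public_subnet, private_subnet = find_free_subnets(
    "10.0.0.0/16", ["10.0.0.0/24"], new_prefix=24
)
print("Public subnet CIDR: ", public_subnet)
print("Private subnet CIDR:", private_subnet)
```

The prefix length of `/24` is only an example. Check the Azure Databricks virtual network injection documentation for the subnet sizes that your workspace requires.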

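After the cluster is configured (step 12) and the connector library is installed (steps 13 and 14), a notebook attached to the cluster can read Cassandra tables through the connector's DataFrame source. The following is a hedged sketch, not an official sample. The helper names `build_spark_conf` and `read_cassandra_table` are hypothetical. The first helper only renders the text for the Advanced Options box from a list of node IPs. The second assumes a Spark session object such as the `spark` variable that Databricks notebooks predefine.

```python
def build_spark_conf(node_ips, username, password, port=9042):
    """Render the Spark configuration lines from step 12 as text.

    node_ips is joined with commas and no spaces, which is the format
    that spark.cassandra.connection.host expects for multiple hosts.
    """
    lines = [
        f"spark.cassandra.connection.host {','.join(node_ips)}",
        f"spark.cassandra.connection.port {port}",
        f"spark.cassandra.auth.username {username}",
        f"spark.cassandra.auth.password {password}",
        "spark.cassandra.connection.ssl.enabled true",
    ]
    return "\n".join(lines)


def read_cassandra_table(spark, keyspace, table):
    """Load a Cassandra table as a DataFrame through the Spark Cassandra Connector.

    `spark` is an existing Spark session. Connection settings come from the
    cluster's Spark configuration, so none are repeated here.
    """
    return (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )
```

In a notebook, a call might look like `df = read_cassandra_table(spark, "my_keyspace", "my_table")`, where the keyspace and table names are placeholders for your own schema.
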
Clean up resources

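If you aren't going to continue to use this managed instance cluster, follow these steps to delete the resource group that contains it. Deleting a resource group also deletes every resource in it, including the Azure Databricks workspace.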

  1. On the left menu of the Azure portal, select Resource groups.
  2. From the list, select the resource group that you created for this quickstart.
  3. On the resource group Overview pane, select Delete resource group.
  4. On the next pane, enter the name of the resource group to delete, and then select Delete.

Next step

In this quickstart, you learned how to create a fully managed Apache Spark cluster inside the virtual network of your Azure Managed Instance for Apache Cassandra cluster. Next, learn how to manage the cluster and datacenter resources.