How to copy data using copy activity

In a Data Pipeline, you can use the Copy activity to copy data between data stores in the cloud. After you copy the data, you can use other activities in your pipeline to transform and analyze it.

The Copy activity connects to your data sources and destinations, then moves data efficiently between them. Here's how the service handles the copy process:

  1. Connects to your source: Creates a secure connection to read data from your source data store.
  2. Processes the data: Handles serialization/deserialization, compression/decompression, column mapping, and data type conversions based on your configuration.
  3. Writes to destination: Transfers the processed data to your destination data store.
  4. Provides monitoring: Tracks the copy operation and provides detailed logs and metrics for troubleshooting and optimization.
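
On the canvas you configure all of this through the activity's tabs, but each Copy activity is ultimately stored as part of the pipeline's JSON definition, with a source, a sink (the destination), and optional settings. The sketch below is illustrative only: the activity name is made up, and the source and sink types are placeholders that depend on the connectors you choose, so check the JSON your own pipeline generates for the exact schema.

    {
      "name": "Copy sample data",
      "type": "Copy",
      "typeProperties": {
        "source": { "type": "<set by your source connector>" },
        "sink": { "type": "<set by your destination connector>" },
        "enableStaging": false
      }
    }

The rest of this article shows how the Source, Destination, Mapping, and Settings tabs fill in this structure.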

Tip

If you only need to copy your data and don't need transformations, a Copy job might be a better option for you. Copy jobs provide a simplified experience for data movement scenarios that don't require building a full data pipeline. See the Copy jobs overview, or use the decision table to compare the Copy activity and Copy jobs.

Prerequisites

To get started, you need to complete these prerequisites:

  • A Microsoft Fabric tenant account with an active subscription. Create an account for free.
  • A Microsoft Fabric-enabled workspace.

Add a copy activity using copy assistant

Follow these steps to set up your copy activity using copy assistant.

Start with copy assistant

  1. Open an existing data pipeline or create a new data pipeline.

  2. Select Copy data on the canvas to open the Copy Assistant tool. Or select Use copy assistant from the Copy data drop-down list under the Activities tab on the ribbon.

    Screenshot showing options for opening the copy assistant.

Configure your source

  1. Select a data source type from the category. This example uses Azure Blob Storage, so select Azure Blob Storage.

    Screenshot of Choose data source screen.

  2. Create a connection to your data source by selecting Create new connection.

    Screenshot showing where to select New connection.

    After you select Create new connection, fill in the required connection information, and then select Next. For details about creating a connection for each type of data source, refer to the corresponding connector article.

    If you already have connections, you can select Existing connection and select your connection from the drop-down list.

    Screenshot showing the existing connection.

  3. Choose the file or folder to be copied in this source configuration step, and then select Next.

    Screenshot showing where to select the data to be copied.

Configure your destination

  1. Select a data source type from the category. This example uses Azure Blob Storage. You can either create a new connection that links to a new Azure Blob Storage account by following the steps in the previous section, or use an existing connection from the connection drop-down list. The Test connection and Edit capabilities are available for each selected connection.

    Screenshot showing how to select Azure Blob Storage.

  2. Configure and map your source data to your destination. Then select Next to finish your destination configurations.

    Screenshot of Map to destination screen.

    Screenshot of Connect to data destination.

    Note

    You can only use a single on-premises data gateway within the same Copy activity. If both the source and sink are on-premises data stores, they need to use the same gateway. To move data between on-premises data stores that use different gateways, copy the data through the first gateway to an intermediate cloud store in one Copy activity, and then use another Copy activity to copy it from the intermediate cloud store through the second gateway.

Review and create your copy activity

  1. Review your copy activity settings from the previous steps, and select OK to finish. Or go back to the previous steps in the tool to edit your settings if needed.

    Screenshot showing the Review and create screen.

Once finished, the copy activity is added to your data pipeline canvas. All settings for this copy activity, including advanced settings, are available under the tabs when the activity is selected.

Screenshot showing a copy activity on the data pipeline canvas.

Now you can either save your data pipeline with this single copy activity or continue to design your data pipeline.

Add a copy activity directly

Follow these steps to add a copy activity directly.

Add a copy activity

  1. Open an existing data pipeline or create a new data pipeline.

  2. Add a copy activity either by selecting Add pipeline activity > Copy activity or by selecting Copy data > Add to canvas under the Activities tab.

    Screenshot showing two ways to add a copy activity.

Configure your general settings under the General tab

To learn how to configure your general settings, see General.

Configure your source under the source tab

  1. In Connection, select an existing connection, or select More to create a new connection.

    Screenshot showing where to select New.

    1. Choose the data source type from the pop-up window. This example uses Azure SQL Database, so select Azure SQL Database, and then select Continue.

      Screenshot showing how to select the data source.

    2. The connection creation page opens. Fill in the required connection information on the panel, and then select Create. For details about creating a connection for each type of data source, refer to the corresponding connector article.

      Screenshot showing New connection page.

    3. Once your connection is created, you return to the data pipeline page. Select Refresh to load the newly created connection in the drop-down list, or choose an existing Azure SQL Database connection directly from the drop-down if you already created one. The Test connection and Edit capabilities are available for each selected connection. Then select Azure SQL Database in Connection type.

  2. Specify a table to be copied. Select Preview data to preview your source table. You can also use Query and Stored procedure to read data from your source.

  3. Expand Advanced for more advanced settings like query timeout, or partitioning. (Advanced settings vary by connector.)
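
For reference, the choices on this tab end up in the activity's source definition in the pipeline JSON. The sketch below assumes an Azure SQL Database source read with a query; AzureSqlSource, sqlReaderQuery, and queryTimeout are the property names this connector uses in Data Factory copy activities, but treat the snippet as a sketch and confirm the exact names against the JSON your pipeline generates and the connector article.

    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT CustomerID, FirstName, LastName FROM SalesLT.Customer",
      "queryTimeout": "02:00:00"
    }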

Configure your destination under the destination tab

  1. In Connection, select an existing connection, or select More to create a new connection. The destination can be either an internal first-class data store in your workspace, such as a Lakehouse, or an external data store. This example uses a Lakehouse.

  2. Once your connection is created, you return to the data pipeline page. Select Refresh to load the newly created connection in the drop-down list, or choose an existing Lakehouse connection directly from the drop-down if you already created one.

  3. Specify a table, or set up the file path to define a file or folder as the destination. Here, select Tables and specify a table to write data to.

  4. Expand Advanced for more advanced settings, like max rows per file, or table action. (Advanced settings vary by connector.)
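
For reference, the destination choices end up in the activity's sink definition. The sketch below assumes a Lakehouse table destination with a table action of Append; the LakehouseTableSink type and tableActionOption property names are assumptions based on the options on this tab, so verify them against the JSON your pipeline generates.

    "sink": {
      "type": "LakehouseTableSink",
      "tableActionOption": "Append"
    }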

Now you can either save your data pipeline with this copy activity or continue to design your data pipeline.

Configure your mappings under the Mapping tab

If the connector that you use supports mapping, you can go to the Mapping tab to configure your mapping.

  1. Select Import schemas to import your data schema.

    Screenshot of mapping settings 1.

  2. The automatic mapping appears. Specify your Source column and Destination column. If you create a new table in the destination, you can customize the Destination column names here. If you write data into an existing destination table, you can't modify the existing Destination column names. You can also view the Type of the source and destination columns.

    Screenshot of mapping settings 2.

You can also select + New mapping to add a new mapping, select Clear to clear all mapping settings, or select Reset to reset the Source column for all mappings.
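
When mappings are defined, they're stored in the activity JSON as a translator. The sketch below follows the TabularTranslator shape used by Data Factory copy activities, with hypothetical column names; your generated JSON may differ, so use it only as a reading aid.

    "translator": {
      "type": "TabularTranslator",
      "mappings": [
        { "source": { "name": "cust_id", "type": "Int32" }, "sink": { "name": "CustomerId" } },
        { "source": { "name": "cust_name", "type": "String" }, "sink": { "name": "CustomerName" } }
      ]
    }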

Configure your other settings under the Settings tab

The Settings tab contains settings for performance, staging, and more.

Screenshot of Settings tab.

The following settings are available. For each setting, the corresponding JSON script property is listed where one exists.

Intelligent throughput optimization
  Specify how to optimize throughput. You can choose Auto, Standard, Balanced, or Maximum. When you choose Auto, the optimal setting is applied dynamically based on your source-destination pair and data pattern. You can also specify a custom value from 2 to 256; a higher value implies more gains.
  JSON script property: dataIntegrationUnits

Degree of copy parallelism
  Specify the degree of parallelism that data loading uses.
  JSON script property: parallelCopies

Fault tolerance
  When you select this option, you can ignore some errors that occur in the middle of the copy process, for example, incompatible rows between the source and destination store, or a file that's deleted during data movement.
  JSON script properties: enableSkipIncompatibleRow; skipErrorFile (fileMissing, fileForbidden, invalidFileName)

Enable logging
  When you select this option, you can log copied files, skipped files, and skipped rows.
  JSON script property: none

Enable staging
  Specify whether to copy data via an interim staging store. Enable staging only when it benefits your scenario.
  JSON script property: enableStaging

Data store type
  When staging is enabled, you can choose Workspace or External as the staging data store type.
  JSON script property: none

For Workspace:

Workspace
  Specify to use built-in staging storage.
  JSON script property: none

For External:

Staging account connection
  Specify a connection to an Azure Blob Storage or Azure Data Lake Storage Gen2 account that serves as the interim staging store. Create a staging connection if you don't have one.
  JSON script property: connection (under externalReferences)

Storage path
  Specify the path to hold the staged data. If you don't provide a path, the service creates a container to store temporary data. Specify a path only if you use Storage with a shared access signature, or if you require the temporary data to be in a specific location.
  JSON script property: path

Enable compression
  Specify whether data should be compressed before it's copied to the destination. This setting reduces the volume of data being transferred.
  JSON script property: enableCompression

Preserve
  Specify whether to preserve metadata/ACLs during the data copy.
  JSON script property: preserve

Note

If you use staged copy with compression enabled, service principal authentication isn't supported for the staging blob connection.
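
Putting the settings above together, they land under the activity's typeProperties in the pipeline JSON. The property names below come from the JSON script property entries listed earlier; the surrounding structure, in particular stagingSettings and the placement of the external staging connection reference, is a sketch and may differ from the JSON your pipeline generates.

    "typeProperties": {
      "dataIntegrationUnits": 8,
      "parallelCopies": 4,
      "enableSkipIncompatibleRow": true,
      "enableStaging": true,
      "stagingSettings": {
        "externalReferences": { "connection": "<staging connection ID>" },
        "path": "stagingcontainer/stagingpath",
        "enableCompression": false
      }
    }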

Configure parameters in a copy activity

Parameters can be used to control the behavior of a pipeline and its activities. You can use Add dynamic content to specify parameters for your copy activity properties. As an example, the following steps show how to parameterize the Lakehouse, Data Warehouse, or KQL Database that a Copy activity uses.

  1. In your source or destination, after selecting Workspace as data store type and specifying Lakehouse/Data Warehouse/KQL Database as workspace data store type, select Add dynamic content in the drop-down list of Lakehouse or Data Warehouse or KQL Database.

  2. In the pop-up Add dynamic content pane, under Parameters tab, select +.

    Screenshot showing the Add dynamic content page.

  3. Specify the name for your parameter and give it a default value if you want, or you can specify the value for the parameter after selecting Run in the pipeline.

    Screenshot shows creating a new parameter.

    The parameter value should be Lakehouse/Data Warehouse/KQL Database object ID. To get your Lakehouse/Data Warehouse/KQL Database object ID, open your Lakehouse/Data Warehouse/KQL Database in your workspace, and the ID is after /lakehouses/ or /datawarehouses/ or /databases/ in your URL.

    • Lakehouse object ID:

      Screenshot showing the Lakehouse object ID.

    • Data Warehouse object ID:

      Screenshot showing the Data Warehouse object ID.

    • KQL Database object ID:

      Screenshot showing the KQL Database object ID.

  4. Select Save to go back to the Add dynamic content pane, select your parameter so that it appears in the expression box, and then select OK. You return to the pipeline page, where the parameter expression appears after Lakehouse object ID, Data Warehouse object ID, or KQL Database object ID.

    Screenshot showing selecting parameter.
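
After step 4, the object ID field holds a pipeline expression instead of a fixed value. Assuming a parameter named LakehouseObjectId (a hypothetical name, created in step 3), the expression box contains something like the following, and the actual object ID is supplied through the parameter's default value or when you select Run.

    @pipeline().parameters.LakehouseObjectId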