Dataflow Gen2 now supports Continuous Integration/Continuous Deployment (CI/CD) and Git integration. This feature allows you to create, edit, and manage dataflows in a Git repository that's connected to your Fabric workspace. Additionally, you can use the deployment pipelines feature to automate the deployment of dataflows from your workspace to other workspaces. This article goes deeper into how to use Dataflow Gen2 with CI/CD and Git integration in Fabric Data Factory.
New features
With Dataflow Gen2 (CI/CD), you can now:
- Use Git integration support for Dataflow Gen2.
- Use the deployment pipelines feature to automate the deployment of dataflows from your workspace to other workspaces.
- Use the Fabric settings and scheduler to refresh and edit settings for Dataflow Gen2.
- Create your Dataflow Gen2 directly in a workspace folder.
- Use Public APIs (preview) to create and manage Dataflow Gen2 with CI/CD and Git integration.
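The last item in this list refers to the Public APIs (preview). As a rough illustration, the following Python sketch creates a Dataflow Gen2 item through the Fabric REST Items API. It's a minimal sketch, assuming a `requests` environment, placeholder values for the workspace ID and Microsoft Entra access token, and that the item type string for Dataflow Gen2 with CI/CD support is `Dataflow`; verify the exact payload against the Public APIs (preview) documentation.

```python
import requests

# Placeholders you supply yourself (assumptions, not values from this article).
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<your-workspace-id>"
ACCESS_TOKEN = "<microsoft-entra-access-token>"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

# Create a Dataflow Gen2 (CI/CD) item in the workspace.
# The item type string "Dataflow" is an assumption; check the preview API reference.
response = requests.post(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/items",
    headers=headers,
    json={"displayName": "Contoso Dataflow Gen2", "type": "Dataflow"},
)
response.raise_for_status()

if response.status_code == 202:
    # Item creation can run as a long-running operation; poll the Location header.
    print("Creation accepted, poll:", response.headers.get("Location"))
else:
    print("Created item:", response.json().get("id"))
```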
Prerequisites
To get started, you must complete the following prerequisites:
- Have a Microsoft Fabric tenant account with an active subscription. Create an account for free.
- Make sure you have a Microsoft Fabric enabled workspace.
- To use Git integration, make sure it's enabled for your workspace. To learn more about enabling Git integration, go to Get started with Git integration.
Create a Dataflow Gen2 with CI/CD and Git support
To create a Dataflow Gen2 with CI/CD and Git support, follow these steps:
In the Fabric workspace, select Create new item and then select Dataflow Gen2.
Give your dataflow a name and enable the Git integration. Then select Create.
The dataflow is created and you're redirected to the dataflow authoring canvas. You can now start creating your dataflow.
When you're done developing your dataflow, select Save and run.
After you save and run, the dataflow shows an Uncommitted status.
To commit the dataflow to the Git repository, select the source control icon in the top right corner of the workspace view.
Select all the changes you want to commit and then select Commit.
You now have a Dataflow Gen2 with CI/CD and Git support. We suggest you follow the best practices for working with CI/CD and Git integration in Fabric described in the Scenario 2 - Develop using another workspace tutorial.
Refresh a Dataflow Gen2 or schedule a refresh
You can refresh a Dataflow Gen2 with CI/CD and Git support in two ways: manually or on a schedule. The following sections describe both options.
Refresh now
In the Fabric workspace, select the more options ellipsis icon next to the dataflow you want to refresh.
Select Refresh now.
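If you want to trigger the same on-demand refresh programmatically, a minimal sketch using the Fabric job scheduler API is shown below. The workspace ID, dataflow item ID, and access token are placeholders, and the job type name `Refresh` is an assumption; confirm it against the Public APIs (preview) reference.

```python
import requests

# Placeholder values (assumptions, not values from this article).
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<your-workspace-id>"
DATAFLOW_ID = "<your-dataflow-item-id>"
ACCESS_TOKEN = "<microsoft-entra-access-token>"

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# Start an on-demand refresh job for the dataflow.
# The jobType value "Refresh" is an assumption; check the preview API docs.
response = requests.post(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/items/{DATAFLOW_ID}/jobs/instances",
    headers=headers,
    params={"jobType": "Refresh"},
)
response.raise_for_status()

# A 202 response includes a Location header you can poll for the job status.
print("Refresh job accepted, poll:", response.headers.get("Location"))
```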
Schedule a refresh
If your dataflow needs to be refreshed on a regular interval, you can schedule the refresh using the Fabric scheduler.
In the Fabric workspace, select the more options ellipsis icon next to the dataflow you want to refresh.
Select Schedule.
On the schedule page, set the refresh frequency, the start time, and the end time, and then apply your changes.
To start the refresh now, select the Refresh button.
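Schedules can also be managed programmatically. The sketch below creates a daily refresh schedule through the Fabric item job scheduler API; the job type name and the exact configuration fields (`type`, `times`, `startDateTime`, `endDateTime`, `localTimeZoneId`) are assumptions based on the generic job scheduler contract, so verify them against the published schema.

```python
import requests

# Placeholder values (assumptions, not values from this article).
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<your-workspace-id>"
DATAFLOW_ID = "<your-dataflow-item-id>"
ACCESS_TOKEN = "<microsoft-entra-access-token>"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

# A daily schedule with a start and end time; the field names follow the
# generic Fabric job scheduler contract and are assumptions to verify.
schedule = {
    "enabled": True,
    "configuration": {
        "type": "Daily",
        "times": ["08:00"],
        "startDateTime": "2025-01-01T00:00:00",
        "endDateTime": "2025-12-31T23:59:00",
        "localTimeZoneId": "UTC",
    },
}

response = requests.post(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/items/{DATAFLOW_ID}/jobs/Refresh/schedules",
    headers=headers,
    json=schedule,
)
response.raise_for_status()
print("Schedule request succeeded:", response.status_code)
```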
Refresh history and settings
To view the refresh history of the dataflow, you can either select the Recent runs option in the dropdown menu or go to the monitoring hub and select the dataflow whose refresh history you want to view.
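You can also read the refresh history programmatically by listing the dataflow's job instances. This is a minimal sketch assuming the generic Fabric list-job-instances endpoint and the same placeholder IDs and token as above; the field names in the response are assumptions to verify against the API reference.

```python
import requests

# Placeholder values (assumptions, not values from this article).
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<your-workspace-id>"
DATAFLOW_ID = "<your-dataflow-item-id>"
ACCESS_TOKEN = "<microsoft-entra-access-token>"

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# List recent job instances (refresh and publish runs) for the dataflow.
response = requests.get(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/items/{DATAFLOW_ID}/jobs/instances",
    headers=headers,
)
response.raise_for_status()

# Print a compact history; field names are assumptions, so .get() is used defensively.
for job in response.json().get("value", []):
    print(job.get("jobType"), job.get("status"), job.get("startTimeUtc"))
```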
Settings for Dataflow Gen2 with CI/CD
Accessing the settings of the new Dataflow Gen2 with CI/CD and Git support is similar to accessing the settings of any other Fabric item. You can open them by selecting the more options ellipsis icon next to the dataflow and then selecting Settings.
Saving replaces the publish operation
With Dataflow Gen2 with CI/CD and Git support, the save operation replaces the publish operation. This change means that when you save your dataflow, it automatically "publishes" the changes to the dataflow.
The save operation directly overwrites the dataflow in the workspace. If you want to discard your changes, select Discard changes when you close the editor.
Validation
During the save operation, we also check if the dataflow is in a valid state. If the dataflow isn't in a valid state, an error message is displayed in the dropdown menu in the workspace view. We determine the validity of the dataflow by running a "zero row" evaluation for all the queries in the dataflow.
This evaluation means that we run all the queries in the dataflow in a manner that only requests the schema of the query result, without returning any rows. If a query evaluation fails or a query’s schema can't be determined within 10 minutes, we fail validation and use the previously saved version of the dataflow for refreshes.
Just in time publishing
Dataflow Gen2 with CI/CD introduces an automated "just in time" publishing model to streamline your workflow. When you save a dataflow in the editor, your changes are immediately saved and published, making them available for the next refresh or execution. Using Save and run both publishes and refreshes the dataflow in a single step.
When you sync changes from Git or use deployment pipelines, the updated dataflow is saved in your workspace but not immediately published. Instead, the next time you trigger a refresh (either manually or on a schedule), the system automatically attempts to publish the latest saved version before running the refresh. If publishing fails (for example, due to validation errors), the system reports the error in refresh history.
This approach ensures that the most recent changes from Git or deployment pipelines are always considered at refresh time, without requiring a manual publish step. However, changes made in the editor aren't saved automatically if you close your browser or navigate away—you must explicitly save your dataflow to include your changes in the next publish or refresh.
In some scenarios, the backend decides to publish the dataflow automatically during a refresh operation. This happens when the dataflow backend is updated and requires the dataflow to be re-published to ensure compatibility with the latest backend changes. This automatic publishing occurs without any user intervention and is sporadic, depending on backend updates.
Additionally, APIs are available to refresh a dataflow without publishing or to manually trigger the publish operation for saved changes, giving you flexibility in managing your deployment workflows.
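As a rough illustration of that flexibility, the sketch below triggers the on-demand publish job for a dataflow so that saved changes are published without running a refresh. It assumes the same job instances endpoint used for refreshes, with a job type of `Publish`; both the job type name and the response handling are assumptions to verify against the Public APIs (preview) reference.

```python
import requests

# Placeholder values (assumptions, not values from this article).
FABRIC_API = "https://api.fabric.microsoft.com/v1"
WORKSPACE_ID = "<your-workspace-id>"
DATAFLOW_ID = "<your-dataflow-item-id>"
ACCESS_TOKEN = "<microsoft-entra-access-token>"

headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

# Publish the latest saved version of the dataflow without refreshing it.
# The jobType value "Publish" is an assumption; confirm it in the preview API docs.
response = requests.post(
    f"{FABRIC_API}/workspaces/{WORKSPACE_ID}/items/{DATAFLOW_ID}/jobs/instances",
    headers=headers,
    params={"jobType": "Publish"},
)
response.raise_for_status()
print("Publish job accepted, poll:", response.headers.get("Location"))
```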
Limitations and known issues
While Dataflow Gen2 with CI/CD and Git support offers a powerful set of features for enterprise-ready collaboration, delivering it required rebuilding the backend on the Fabric architecture. As a result, some features aren't yet available or have limitations. We're actively working to improve the experience and will update this article as new features are added.
- When you delete the last Dataflow Gen2 with CI/CD and Git support, the staging artifacts become visible in the workspace. You can safely delete them.
- The workspace view doesn't show the ongoing refresh indication, the last refresh time, the next refresh time, or the refresh failure indication.
- When your dataflow fails to refresh, a failure notification isn't sent automatically. As a workaround, you can use the orchestration capabilities of data pipelines.
- When branching out to another workspace, a Dataflow Gen2 refresh might fail with the message that the staging lakehouse couldn't be found. When this happens, create a new Dataflow Gen2 with CI/CD and Git support in the workspace to trigger the creation of the staging lakehouse. After this, all other dataflows in the workspace should start to function again.
- When you sync changes from Git into the workspace or use deployment pipelines, you need to open the new or updated dataflow and save the changes manually in the editor. This triggers a publish action in the background so that the changes can be used during the refresh of your dataflow. You can also use the on-demand Dataflow publish job API call to automate the publish operation.
- The Power Automate connector for dataflows doesn't work with the new Dataflow Gen2 with CI/CD and Git support.