Databricks Repos or Alternatives?

Janice Chi 580 Reputation points
2025-08-05T10:45:37.41+00:00

We’re working on a large-scale data migration project from IBM DB2 to Azure SQL Hyperscale using ADF, Kafka, and Databricks. Our ingestion and transformation logic (including CDC handling, recon, and retry orchestration) is implemented in Databricks notebooks.

We’re currently evaluating the best approach to manage and host these notebooks. We know that Databricks Repos allows Git integration and collaborative development, but we’d like Microsoft’s recommendation on the following:

For a multi-developer team with shared control table–driven notebooks (Extract, Transform, Load), is Databricks Repos the recommended practice?

Are there Microsoft-supported alternatives to host code without Repos that still allow version control, collaboration, and CI/CD?

What are the potential risks or limitations of not using Databricks Repos in enterprise-grade ingestion pipelines?

We plan to use ADF to trigger Databricks jobs with job-scoped clusters — does this change your recommendation around Repos usage or CI/CD setup?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Venkat Reddy Navari 5,255 Reputation points Microsoft External Staff Moderator
    2025-08-05T11:41:54.67+00:00

    Hi Janice Chi

    Is Databricks Repos recommended for multi-developer teams?

    Yes, Databricks Repos is the recommended approach for teams working on shared notebooks, especially when collaboration, version control, and traceability are important. It enables Git integration (Azure Repos, GitHub, Bitbucket, etc.), so developers can use familiar workflows like branching, pull requests, and code reviews directly from the Databricks UI.

    More info in the Docs: CI/CD techniques with Git folders (Repos)

    Are there Microsoft-supported alternatives to Repos for version control and CI/CD?

    Yes, but with trade-offs:

    • You can manage notebooks in external Git repos (outside Databricks) and sync them manually using the Databricks CLI or REST API.
    • You can also build and deploy Python wheels or JARs to Databricks jobs using pipelines in Azure DevOps or GitHub Actions.

    These alternatives work well for code-based (non-notebook) projects but can be harder to manage for notebook-heavy pipelines and lack native collaboration support.
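
    If you do go the manual-sync route, one common pattern is to push notebook source files from an external Git checkout into the workspace with the Workspace Import API as a CI step. The sketch below assumes that pattern; the workspace URL, token, and paths are placeholders, and it is a minimal illustration rather than a complete sync solution.

    ```python
    # Minimal sketch: sync one notebook from an external Git checkout into a
    # Databricks workspace via the Workspace Import API.
    # Host, token, and paths below are placeholders.
    import base64
    import requests

    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
    TOKEN = "<personal-access-token-or-AAD-token>"                          # placeholder credential

    def import_notebook(local_path: str, workspace_path: str) -> None:
        """Upload one notebook source file, overwriting any existing copy."""
        with open(local_path, "rb") as f:
            content = base64.b64encode(f.read()).decode("utf-8")

        resp = requests.post(
            f"{DATABRICKS_HOST}/api/2.0/workspace/import",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "path": workspace_path,   # e.g. /Shared/ingestion/transform_cdc
                "content": content,
                "language": "PYTHON",
                "format": "SOURCE",
                "overwrite": True,
            },
            timeout=60,
        )
        resp.raise_for_status()

    # Example: called from a CI step after the pipeline has checked out the Git repo
    import_notebook("notebooks/transform_cdc.py", "/Shared/ingestion/transform_cdc")
    ```

    As you can see, this works, but every notebook needs to be pushed explicitly, which is exactly the manual-sync overhead that Repos removes.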

    Risks of not using Repos in an enterprise data pipeline

    Some potential limitations of skipping Repos:

    • No native version control inside Databricks, so you lose commit history and rollback options
    • Higher risk of overwrite conflicts between developers
    • Manual syncing between notebooks and Git introduces errors
    • Harder to implement automated CI/CD and enforce governance

    For enterprise-scale ingestion pipelines with audit, traceability, and reliability needs, these risks can become blockers over time.

    More info in the Docs: Software engineering best practices for notebooks

    Does using ADF with job-scoped clusters change this recommendation?

    Not at all; your setup of using ADF to trigger Databricks jobs on job-scoped clusters works very well with Repos. You can store your production-ready notebooks in a Git-connected Repo, reference them in Databricks jobs, and call those jobs from ADF (a minimal job-definition sketch follows below).

    This structure supports:

    • Clear separation of orchestration (ADF) and logic (Databricks)
    • Code promotion through environments (dev/test/prod)
    • CI/CD pipelines to automate deployment

    More info in the Docs: CI/CD on Azure Databricks (Overview)
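
    To make the ADF + job-scoped-cluster pattern concrete, here is a minimal sketch of defining a Databricks job (Jobs API 2.1) whose notebook task is sourced directly from Git; ADF then only has to trigger this job. The repo URL, branch, runtime version, and cluster sizing are illustrative placeholders, not a recommendation.

    ```python
    # Minimal sketch: create a Databricks job that pulls its notebook from Git
    # and runs on a job-scoped cluster. All names and sizes are placeholders.
    import requests

    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    TOKEN = "<personal-access-token-or-AAD-token>"

    job_spec = {
        "name": "cdc-transform-prod",
        "git_source": {
            "git_url": "https://dev.azure.com/contoso/migration/_git/databricks-notebooks",
            "git_provider": "azureDevOpsServices",
            "git_branch": "main",
        },
        "tasks": [
            {
                "task_key": "transform",
                "notebook_task": {
                    "notebook_path": "notebooks/transform_cdc",  # path inside the repo
                    "source": "GIT",
                },
                "new_cluster": {                                  # job-scoped cluster
                    "spark_version": "14.3.x-scala2.12",
                    "node_type_id": "Standard_DS3_v2",
                    "num_workers": 2,
                },
            }
        ],
    }

    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=job_spec,
        timeout=60,
    )
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])
    ```

    With this in place, ADF only needs the job ID; the notebook version that runs is whatever the referenced branch (or a pinned commit) contains, which keeps orchestration and code versioning cleanly separated.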

    Finally:

    • Use Databricks Repos for collaborative notebook development
    • Integrate Git (Azure Repos, GitHub, etc.) to track changes and manage PRs
    • Continue using ADF to orchestrate job runs using job-scoped clusters
    • Set up CI/CD pipelines with Azure DevOps or GitHub Actions to promote code across environments (a sample release-pipeline step is sketched below)
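
    As one example of that last point, a release pipeline can point a production workspace Git folder (Repo) at the head of a release branch using the Repos API, so production jobs always run reviewed code. This is only a sketch; the repo ID and branch name are placeholders you would pass in from the pipeline.

    ```python
    # Minimal sketch: release-pipeline step that updates a production Git folder
    # (Repo) to the latest commit of a release branch via the Repos API.
    import requests

    DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
    TOKEN = "<service-principal-AAD-token>"   # placeholder credential
    REPO_ID = "1234567890"                    # placeholder ID of the prod Git folder
    RELEASE_BRANCH = "release"                # placeholder branch name

    resp = requests.patch(
        f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"branch": RELEASE_BRANCH},      # checks out the branch head in the workspace
        timeout=60,
    )
    resp.raise_for_status()
    print("Repo now at commit:", resp.json().get("head_commit_id"))
    ```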

    Hope this helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further questions, do let us know.

