Hi Zhuoling Li,
Thank you for your questions. Let me address your points one by one regarding resubmitting pipeline jobs for retraining via the REST API and integrating with Azure Data Factory (ADF):
Is REST API the only way to interact with Azure Machine Learning pipelines from ADF?
Yes, if you are using Azure Machine Learning v2 pipelines (which do not require publishing), the only supported way to trigger them from ADF is through the Azure ML REST API, either directly or via a custom Azure Function or Logic App that calls it. The built-in ADF activity “Machine Learning Execute Pipeline” only supports v1 published pipelines and is not compatible with v2. While the Azure ML SDK and CLI are useful for local development and CI/CD workflows, they cannot be invoked directly from ADF. Therefore, calling the REST API (for example, from an ADF Web activity) or creating a callable endpoint via Azure Functions is the recommended approach when integrating ADF with v2 pipelines for dynamic orchestration and scheduling, as sketched below.
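For illustration, here is a minimal Python sketch of the REST call that creates a v2 pipeline job. The subscription, resource group, workspace, job name, api-version, and payload shape (a job built from a registered pipeline component) are placeholders or assumptions; adjust them to match what your workspace actually accepts:

```python
# Minimal sketch: create (trigger) an Azure ML v2 pipeline job via the REST API.
# All identifiers below are placeholders; the payload shape assumes a job built
# from a registered pipeline component.
import requests
from azure.identity import DefaultAzureCredential  # pip install azure-identity

SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
WORKSPACE = "<workspace-name>"
JOB_NAME = "retrain-run-001"   # hypothetical; job names must be unique
API_VERSION = "2023-04-01"     # check the currently supported api-version

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.MachineLearningServices/workspaces/{WORKSPACE}"
    f"/jobs/{JOB_NAME}?api-version={API_VERSION}"
)

# Acquire an ARM bearer token. From ADF you would instead use a Web activity
# authenticated with the factory's managed identity.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

payload = {
    "properties": {
        "jobType": "Pipeline",
        "componentId": "<arm-id-of-registered-pipeline-component>",  # assumption
        "experimentName": "retraining",
        "inputs": {
            "training_data": {  # hypothetical input name
                "jobInputType": "uri_folder",
                "uri": "azureml://datastores/workspaceblobstore/paths/train/",
            }
        },
    }
}

resp = requests.put(url, headers={"Authorization": f"Bearer {token}"}, json=payload)
resp.raise_for_status()
print(resp.json()["properties"]["status"])
```

From ADF itself, the equivalent is a Web activity issuing the same PUT request, authenticated with the Data Factory's managed identity against the https://management.azure.com resource.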
What is the best practice for retraining and should a pipeline component be created even if it’s not shared?
Yes, it is considered a best practice to encapsulate your retraining logic inside a pipeline component even if you don’t intend to share or reuse it. Pipeline components provide a structured, trackable, and auditable framework for managing machine learning workflows. This is especially important for retraining scenarios, as it improves maintainability, supports better monitoring and lineage tracking, and allows for seamless integration with orchestrators like ADF or Azure ML schedules. Even for single-use or internal components, defining them explicitly helps maintain a clean and scalable architecture.
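As an illustration, here is a minimal sketch using the Python SDK v2 (azure-ai-ml) that wraps a single-use retraining step in a component and a pipeline. The script path, environment name, compute target, and input names are placeholders:

```python
# Minimal sketch with the Azure ML Python SDK v2 (azure-ai-ml): a single-use
# retraining step still gets defined as a component inside a pipeline, which
# gives you lineage, monitoring, and a schedulable unit of work.
from azure.ai.ml import MLClient, Input, command
from azure.ai.ml.dsl import pipeline
from azure.identity import DefaultAzureCredential

# The retraining step as an explicit component: trackable and auditable even
# if it is never shared or reused.
retrain_step = command(
    name="retrain_model",
    display_name="Retrain model",
    code="./src",  # hypothetical folder containing retrain.py
    command="python retrain.py --data ${{inputs.training_data}}",
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # example
    inputs={"training_data": Input(type="uri_folder")},
)

@pipeline(default_compute="cpu-cluster")  # hypothetical compute target
def retraining_pipeline(training_data: Input):
    retrain_step(training_data=training_data)

ml_client = MLClient.from_config(credential=DefaultAzureCredential())
job = ml_client.jobs.create_or_update(
    retraining_pipeline(
        training_data=Input(
            type="uri_folder",
            path="azureml://datastores/workspaceblobstore/paths/train/",
        )
    ),
    experiment_name="retraining",
)
print(job.name, job.status)
```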
How do I resubmit a pipeline job via the REST API?
To rerun a previous pipeline job using the Azure ML REST API, `sourceJobId` alone is not sufficient. The error (e.g., "Invalid pipeline job since step jobs do not exist") typically means the referenced job no longer contains its full step definitions: the `sourceJobId` field does not clone the original job definition. Instead, you must retrieve the full job payload (including all step configurations) from the original run, modify it as needed (such as updating input data or parameters), and submit it as a new job via the REST API.
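Here is a hedged Python sketch of that clone-and-resubmit flow. The api-version, the input name, and the exact set of run-specific fields to strip are assumptions; inspect the payload your workspace returns before reusing it:

```python
# Minimal sketch of "clone and resubmit" via the REST API. The api-version and
# the read-only fields stripped below are illustrative; check the actual GET
# response from your workspace.
import copy
import uuid
import requests
from azure.identity import DefaultAzureCredential

BASE = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>"
    "/providers/Microsoft.MachineLearningServices/workspaces/<workspace-name>/jobs"
)
API_VERSION = "2023-04-01"  # check the currently supported api-version
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token
headers = {"Authorization": f"Bearer {token}"}

# 1. Retrieve the full definition of the original pipeline job.
original = requests.get(
    f"{BASE}/<original-job-name>?api-version={API_VERSION}", headers=headers
)
original.raise_for_status()
props = copy.deepcopy(original.json()["properties"])

# 2. Strip run-specific fields so the payload is valid for a new submission
#    (illustrative list; your workspace's response may include others).
for field in ("status", "services", "sourceJobId"):
    props.pop(field, None)

# 3. Modify what you need, e.g. point at fresh training data
#    ("training_data" is a hypothetical input name).
props["inputs"]["training_data"]["uri"] = (
    "azureml://datastores/workspaceblobstore/paths/train-new/"
)

# 4. Submit it as a brand-new job under a new, unique name.
new_name = f"retrain-{uuid.uuid4().hex[:8]}"
resp = requests.put(
    f"{BASE}/{new_name}?api-version={API_VERSION}",
    headers=headers,
    json={"properties": props},
)
resp.raise_for_status()
print(f"Submitted {new_name}: {resp.json()['properties']['status']}")
```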
For reference:
- Data ingestion with Azure Data Factory
- Tutorial: Create production machine learning pipelines