Azure Machine Learning retraining and scoring with Data Factory

My previously created Azure Machine Learning retraining and scoring solution, built with Azure Logic Apps and Power BI (more info here: https://www.kruth.fi/uncategorized/azure-machine-learning-retrain-running-r-scripts-with-power-bi-and-some-dax/), stopped working last January. Until now I didn’t have enough motivation to start digging in to find out what was wrong. The reason turned out to be a component removed from Azure Logic Apps – namely the Azure ML component. It simply doesn’t exist any more.

I started to investigate what I could do to replace that solution and found this article: https://azure.microsoft.com/en-us/blog/getting-started-with-azure-data-factory-and-azure-machine-learning-4/. The instructions were a little outdated and missing some links to Azure ML; I try to fill those gaps with this article.

This process can be separated into three parts (see the sketch after the list):

  1. Machine Learning model retraining
  2. Deploying the retrained model
  3. Using the updated model for scoring
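
To make the sequencing concrete, here is a rough Python sketch of what the Data Factory pipeline orchestrates; in ADF itself these steps would be Machine Learning Execute Pipeline or Web activities. The endpoint URLs, pipeline ids and experiment names below are placeholders for values from your own Azure ML workspace (a published pipeline shows its REST endpoint in the Studio UI), so treat this as an illustration of the flow rather than a working implementation.

```python
import requests

# AAD bearer token, e.g. obtained with the azure-identity package.
HEADERS = {
    "Authorization": "Bearer <aad-token>",
    "Content-Type": "application/json",
}

def submit_pipeline(endpoint: str, experiment: str) -> None:
    """POST to a published Azure ML pipeline's REST endpoint to start a run."""
    resp = requests.post(endpoint, headers=HEADERS,
                         json={"ExperimentName": experiment})
    resp.raise_for_status()

# 1. Retrain: run the published retraining pipeline.
submit_pipeline("https://<region>.api.azureml.ms/...<retrain-pipeline-id>",
                "retrain-model")
# ...wait for the run to finish (polling omitted for brevity)...

# 2. Deploy: run a pipeline that registers the retrained model and
#    updates the scoring deployment.
submit_pipeline("https://<region>.api.azureml.ms/...<deploy-pipeline-id>",
                "deploy-model")

# 3. Score: run the batch scoring pipeline against the updated model.
submit_pipeline("https://<region>.api.azureml.ms/...<score-pipeline-id>",
                "score-data")
```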
Continue reading ”Azure Machine Learning retraining and scoring with Data Factory”

Using Azure Data Factory to copy only new on-premises files, process 0-n files and delete them afterwards

Last time I promised to blog about Azure Data Factory Data Flows, but I decided to do this first. My business problem was to process files on an on-premises file share with SSIS without moving the original files anywhere. The challenge was that this is not easily done with SSIS if you can’t move the original file – maybe with a log table and custom code.

I decided to use Azure Data Factory (ADF) and Azure Blob Storage to tackle this challenge. ADF is missing a couple of critical features, but I managed to work around them. I wanted to write about this because I couldn’t find good instructions for the whole process in one place.

ADF has a nice activity called “Get Metadata”. Basically you could do the same with an SSIS Script Component. I decided to use the file creation date as a watermark and store it in a BatchControl table for the next run.
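
To illustrate the idea, here is a minimal, self-contained Python sketch of that watermark logic. It is only an analogy: sqlite3 stands in for the real BatchControl table, and local folders stand in for the file share and the Blob Storage container; in the actual ADF pipeline the same steps map to Get Metadata, Copy Data and Delete activities.

```python
import shutil
import sqlite3
from pathlib import Path

SOURCE = Path("incoming")   # stand-in for the on-premises file share
TARGET = Path("staging")    # stand-in for the Blob Storage container

def get_watermark(con: sqlite3.Connection) -> float:
    """Return the newest file-creation time handled by a previous run."""
    con.execute("CREATE TABLE IF NOT EXISTS BatchControl (last_created REAL)")
    row = con.execute("SELECT MAX(last_created) FROM BatchControl").fetchone()
    return row[0] or 0.0

def run_batch(con: sqlite3.Connection) -> None:
    """Copy only files newer than the watermark, then delete the originals."""
    TARGET.mkdir(exist_ok=True)
    watermark = get_watermark(con)
    # st_ctime is the creation time on Windows (metadata-change time on
    # Unix), matching the File Creation Date used in the ADF pipeline.
    new_files = [(f, f.stat().st_ctime) for f in SOURCE.iterdir()
                 if f.is_file() and f.stat().st_ctime > watermark]
    for f, created in sorted(new_files, key=lambda pair: pair[1]):
        shutil.copy2(f, TARGET / f.name)   # the "Copy Data" step
        f.unlink()                         # the "Delete" step on the source
        watermark = max(watermark, created)
    con.execute("INSERT INTO BatchControl VALUES (?)", (watermark,))
    con.commit()

if __name__ == "__main__":
    run_batch(sqlite3.connect("batch_control.db"))
```

Note that the run handles 0-n files naturally: if nothing is newer than the watermark, the loop simply does not execute and the watermark stays where it was.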

Continue reading ”Using Azure Data Factory to copy only new on-premises files, process 0-n files and delete them afterwards”