Wednesday, 7 June 2017

Message pick-up strategies for Stream Analytics jobs

What happens to messages when a Stream Analytics job is stopped and started again after some time?
What decisions impact this? Here are a few options that are necessary to consider.



Refer to the diagram below.






1. Now - when this option is chosen, the job picks up only messages ingested from now onwards. Messages that were ingested into the system while the job was stopped will not be picked up.

2. Custom - you can specify the time from which you want to pick up messages. For example, if the job was stopped for 4 hours and you want to process the messages ingested during that window, you can choose this option and set the start time accordingly.

3. When last stopped - this option enables the job to pick up messages that were ingested even while the job was stopped. The service keeps a checkpoint from the job's last run and picks up messages from that checkpoint onwards.

The usual recommendation is "When last stopped", but your choice may vary depending on your requirements.
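The same choice can also be made when starting a job from PowerShell instead of the portal. Below is a minimal sketch using the AzureRM Stream Analytics cmdlets; the resource group and job names are placeholders for your own deployment.

# Option 1, "Now": pick up only messages ingested from the moment the job starts
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" `
    -OutputStartMode "JobStartTime"

# Option 2, "Custom": pick up messages from a specific point in time, e.g. 4 hours back
$startTime = (Get-Date).ToUniversalTime().AddHours(-4).ToString("o")
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" `
    -OutputStartMode "CustomTime" -OutputStartTime $startTime

# Option 3, "When last stopped": resume from the checkpoint kept at the last stop
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" `
    -OutputStartMode "LastOutputEventTime"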


Different ways to automate Stream Analytics jobs on Azure

A Stream Analytics job on Azure can be created from the Azure portal using a step-by-step graphical wizard. However, consider the scenario where we need to create the same jobs in different environments: testing, UAT, pre-prod, prod, regression test, etc. Creating the jobs manually for each environment would be a highly tedious job, and that is when one may feel the need to automate the process.

Following are some of the ways in which Stream Analytics jobs on Azure can be automated and reused across environments. The pros and cons of each approach are compared below.

Using an ARM template
Pros
      Deployment of Stream Analytics jobs can be automated, and the solution can be deployed to different environments quickly without repeating steps.
      Deploying manually through the portal to 10 different environments would be a tiresome and error-prone job.
Cons
      Jobs need to be stopped during the deployment and users need to be informed. However, the same applies when deploying through the portal (see the sketch after this list).
      To specify a Power BI output in Stream Analytics, it is necessary to log on to the Power BI service, something that is not possible during ARM template deployment (https://github.com/vtex/VtexInsights/wiki/Stream-Analytics).
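As a rough sketch tied to the first con above, a deployment script typically stops the job, redeploys the template, and starts the job again. The template and parameter file names below are placeholders for your own files.

# Stop the running job before deploying (users should be informed of the downtime)
Stop-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job"

# Deploy the job definition from the ARM template and an environment-specific parameter file
New-AzureRmResourceGroupDeployment -ResourceGroupName "my-rg" `
    -TemplateFile ".\streamanalytics.json" `
    -TemplateParameterFile ".\streamanalytics.parameters.test.json"

# Start the job again, resuming from the last checkpoint
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" `
    -OutputStartMode "LastOutputEventTime"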

Using Stream Analytics PowerShell cmdlets

Pros
      Jobs can be created with a simple PowerShell command.


Cons
      Separate cmdlets are available for creating the job, its inputs, outputs, and transformations. You need to combine them in one custom script to deploy a complete job including all inputs, transformations, and outputs (a combined sketch is shown below).
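For illustration, such a combined script might look like the sketch below, assuming each piece (job, input, transformation, output) has its JSON definition saved in a local file; the file, resource, and job names are placeholders.

# Create the job shell from its JSON definition
New-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" -File ".\job.json"

# Add the input, the transformation (query) and the output from their own JSON definitions
New-AzureRmStreamAnalyticsInput -ResourceGroupName "my-rg" -JobName "my-sa-job" -File ".\input.json" -Name "eventhub-input"
New-AzureRmStreamAnalyticsTransformation -ResourceGroupName "my-rg" -JobName "my-sa-job" -File ".\transformation.json" -Name "main-query"
New-AzureRmStreamAnalyticsOutput -ResourceGroupName "my-rg" -JobName "my-sa-job" -File ".\output.json" -Name "blob-output"

# Finally start the job
Start-AzureRmStreamAnalyticsJob -ResourceGroupName "my-rg" -Name "my-sa-job" -OutputStartMode "JobStartTime"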

Using the Stream Analytics REST API

Pros
      Separate APIs are available for inputs, outputs, and transformations, so we need to put all the pieces together and customize the calls to deploy all parts of an SA job at a time.
Cons
      We will need to authenticate the API requests against Azure Active Directory (a minimal example of a raw call is shown below).
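As an illustration of what a raw call looks like, the sketch below creates a job by PUTting its JSON definition to the Azure Resource Manager endpoint. The subscription ID, resource names, and the way you obtain the bearer token are placeholders/assumptions; in practice the token comes from an Azure AD login.

# Placeholders: fill in your own subscription, resource group, job name and a valid ARM access token
$subscriptionId = "<subscription-id>"
$token = "<bearer-token-from-azure-ad>"
$uri = "https://management.azure.com/subscriptions/$subscriptionId/resourceGroups/my-rg" +
       "/providers/Microsoft.StreamAnalytics/streamingjobs/my-sa-job?api-version=2015-10-01"

# PUT the job definition (job.json contains the full job, including inputs, query and outputs)
Invoke-RestMethod -Method Put -Uri $uri `
    -Headers @{ Authorization = "Bearer $token" } `
    -ContentType "application/json" `
    -Body (Get-Content ".\job.json" -Raw)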


Using stream-analytics-dotnet-management-sdk

Pros
      Gives us more control over Stream Analytics jobs, for example the ability to programmatically monitor SA jobs. Stream Analytics jobs created via REST APIs, the Azure SDK, or PowerShell do not have monitoring enabled by default, so this capability comes in handy.

Cons
      Use it only if you need more control over jobs and none of the above options satisfies your needs.



Recommendations

Use ARM templates. They serve the following two purposes:

      We can configure the tumbling window of SA jobs dynamically (for example, by exposing the window size as a template parameter).
      They help automate the deployment of SA jobs to multiple environments. If there are multiple environments for the application, once development is complete we don't want to spend as much time again moving the solution to the other environments. So it's better to have parameterized variables per environment and deploy the SA jobs from the same template (see the sketch below).
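For example, a small driver script like the one below can push the same template to every environment. The environment names, the parameter-file naming convention, and the windowSizeInSeconds parameter are assumptions for illustration; they only work if your template actually exposes such a parameter.

# Deploy the same ARM template to each environment with its own parameter file.
# windowSizeInSeconds is a hypothetical template parameter used to vary the tumbling window per environment.
$environments = @("test", "uat", "preprod", "prod")
foreach ($environment in $environments) {
    New-AzureRmResourceGroupDeployment `
        -ResourceGroupName "my-rg-$environment" `
        -TemplateFile ".\streamanalytics.json" `
        -TemplateParameterFile ".\streamanalytics.parameters.$environment.json" `
        -windowSizeInSeconds 60
}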