You have seen many videos on Hadoop/Spark cluster, where a ubiquitous example for map reduce is used of counting the words "Banana" from a clean text files. But, in real-life your log files are not this clean, and they are not on cluster itself. Clusters are expensive affairs, so how do you programmatically create cluster and automate your processing?
Here is a presentation about developing a real-life application using Spark cluster. In this presentation, we will parse Akamai logs kept on an Azure storage. We will introduce some of the tools available. After having the script run in Jupyter notebook, we will automate the solution which can be started by a call to an endpoint.
Introduction to Apache Spark on Azure HDInsight channel9.msdn.com/Shows/Azure-Friday/Announcing-Apache-Spark-on-Azure-HDInsight
C# Code to create cluster docs.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-dotnet-sdk
Presenter: Lin Chan, Rafat Sarosh