Published on December 14, 2016 by Microsoft

It is hard to believe that it has been almost two years since we last had Maxim on our show, and we are extremely excited to have him back. Maxim is a Senior Program Manager on the Big Data team at Microsoft, and he joins us to talk about Interactive Spark on Azure.

Maxim begins our discussion by walking us through the process data scientists follow when processing data and the challenges they face. He explains that data science is iterative, but that data scientists are often less productive than they could be because they spend a lot of time waiting for jobs to complete. Two of the big factors behind the long wait times, Maxim explains, are the size and the cleanliness of the data.

At the [05:20] mark Maxim shows us how Spark on Azure addresses this problem by shortening each iteration, which helps you be more productive. He walks us through how that is accomplished: he first introduces us to Apache Spark, and then discusses how Spark on Azure makes data exploration even better.

At the [08:38] mark it's DEMO TIME. Maxim spends a few minutes showing us how to spin up a Spark HDInsight cluster, then spends the remaining 10 minutes demoing how to use Spark in HDInsight to execute jobs efficiently. I won’t give anything away here, so be sure to watch to see Maxim work his Spark magic! Awesome show!

We definitely look forward to having him back!

 

3 Comments on "Interactive Spark on Azure"


stanleyjohns
5 months 23 days ago

@Luis are you sure you are not counting the sample data set size? You may be doing: select count(*) from taxi_trips_full but that could be of the sample data set. Instead try select count(*) from taxi_raw. Hope this helps, or let me know if you found the solution.

Luis Simoes
1 year 4 months ago

Tried to use this dataset but the count is about 128k and not even close to the billion… Am I doing something wrong?

Amber
1 year 4 months ago

I am not very familiar with Microsoft Azure, since learning it means learning online, and I prefer one-on-one guidance.
