Macros in Data Pipelines

Neville Li works at Spotify on the Music Recommendation Team. They’ve been using Scala since early 2013, specifically with data science tools like Scalding and Spark.

He describes a particular “powerful data combo”:

Parquet
Avro
Scalding / Spark

In this talk for NE Scala, Neville presents how Scala macros can be used to improve data pipeline code that builds on the tools listed above. Quoting from his abstract, “We use macros to generate parquet schema projection and filter predicates in compile time. Compared to the standard approach, the macros are type-safe, more concise, and user friendly.”
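
To make the type-safety point concrete, here is a minimal sketch in plain Scala. It does not use Parquet, Avro, or the macros from the talk. The Track record, the column names, and the helpers project and longerThan are hypothetical and exist only for illustration. The first object selects columns by string name, so a typo surfaces only at runtime. The second ties each column to an accessor function, so a typo fails to compile. A macro could presumably derive the column names from such accessor expressions; here they are written by hand.

```scala
// Toy stand-in for a columnar record; not the Parquet or Avro API.
final case class Track(id: Long, title: String, artist: String, durationMs: Int)

object StringProjection {
  // Hypothetical schema: the column names a reader would know about.
  val columns: Set[String] = Set("id", "title", "artist", "durationMs")

  // String-based selection: columns are looked up by name at runtime,
  // so a misspelled name is only detected when this code actually runs.
  def project(names: String*): Either[String, Seq[String]] = {
    val unknown = names.filterNot(columns.contains)
    if (unknown.isEmpty) Right(names.toSeq)
    else Left(s"unknown columns: ${unknown.mkString(", ")}")
  }
}

object TypedProjection {
  // Type-checked alternative: each column carries an accessor on Track,
  // so a misspelled accessor such as `_.titel` does not compile.
  final case class Column[A](name: String, get: Track => A)

  val title: Column[String] = Column("title", _.title)
  val durationMs: Column[Int] = Column("durationMs", _.durationMs)

  // A predicate built from a typed column; comparing the String column
  // `title` with an Int threshold would be rejected by the compiler.
  def longerThan(ms: Int): Track => Boolean = t => durationMs.get(t) > ms
}
```

In this sketch, StringProjection.project("id", "titel") compiles and returns an error value at runtime, whereas the equivalent mistake against TypedProjection is caught by the compiler.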

The code that Neville and his team use in production at Spotify can be found here: https://github.com/nevillelyh/parquet-avro-extra

Further Resources

Scala Training from ProTech
Video & Tutorials on Scala