Macros in Data Pipelines



Neville Li works at Spotify on the Music Recommendation Team. They’ve been using Scala since early 2013, specifically with data science tools like Scalding and Spark.

He describes a particular “powerful data combo” trio:

In this talk for NE Scala, Neville presents how Scala macros can be used to improve data pipeline code by leveraging the tools listed above. Quoting from his abstract, “We use macros to generate parquet schema projection and filter predicates in compile time. Compared to the standard approach, the macros are type-safe, more concise, and user friendly.”
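To make the type-safety point concrete, here is a minimal, macro-free sketch in plain Scala. It is not the API of the library discussed in the talk: `Track`, `Column`, `TrackColumns`, `projectByName`, `project`, and `greaterThan` are all hypothetical names invented for illustration. The first half stands in for a string-based approach, where a misspelled column name only surfaces at run time. The second half ties each column to the record type, which is roughly the kind of guarantee the abstract attributes to the macros. In this sketch the column definitions are written by hand, whereas a macro would derive them at compile time.

```scala
object ProjectionSketch {
  // Hypothetical record type standing in for an Avro-generated class.
  final case class Track(id: Long, title: String, artist: String, durationMs: Int)

  // String-based selection: a misspelled column is only caught at run time.
  val trackColumns: Set[String] = Set("id", "title", "artist", "durationMs")

  def projectByName(names: String*): Either[String, List[String]] =
    names.find(n => !trackColumns.contains(n)) match {
      case Some(bad) => Left(s"unknown column: $bad")
      case None      => Right(names.toList)
    }

  // Typed selection: a column is a value bound to the record type, so
  // referring to a column that was never defined fails to compile.
  final case class Column[R, A](name: String, get: R => A)

  // Hand-written here (assumption of this sketch); a macro could generate these.
  object TrackColumns {
    val id         = Column[Track, Long]("id", _.id)
    val title      = Column[Track, String]("title", _.title)
    val durationMs = Column[Track, Int]("durationMs", _.durationMs)
  }

  // Projection: the list of column names to read.
  def project[R](columns: Column[R, _]*): List[String] =
    columns.map(_.name).toList

  // Filter predicate: the bound must have the column's type, so comparing
  // an Int column with a String is rejected by the compiler.
  def greaterThan[R, A](column: Column[R, A], bound: A)(implicit ord: Ordering[A]): R => Boolean =
    record => ord.gt(column.get(record), bound)
}
```

With these definitions, `ProjectionSketch.projectByName("id", "titel")` returns a `Left` at run time, while the typed variant has no way to express the misspelled column at all. Similarly, `greaterThan(TrackColumns.durationMs, "long")` does not compile, because the bound is not an `Int`.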

The code that Neville and his team use in production at Spotify can be found here: https://github.com/nevillelyh/parquet-avro-extra

Further Resources

  • Scala Training from ProTech
  • Video & Tutorials on Scala

Published March 14, 2015