Spark template using Scala
A Spark application template for running on Google Cloud
Quick Tips
- Add the Spark dependencies to `build.sbt` (a fuller `build.sbt` sketch follows this list):

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.3.0",
  "org.apache.spark" % "spark-sql_2.11" % "2.3.0"
)
```
- According to its documentation, Spark 2.3.0 only works with Scala versions below 2.12 (i.e. 2.11) and does not yet support Java 9, so use Java 8
- Compile and package the Scala program using sbt; with Scala 2.11, the resulting jar is written under `target/scala-2.11/`:

```sh
sbt compile
sbt package
```
- Upload datasets to Cloud Storage - example (a `gsutil` sketch appears after this list)
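
For reference, a minimal complete `build.sbt` might look like the sketch below; the project name and version are placeholders, not part of this template.

```scala
// Minimal build.sbt sketch; name and version are placeholders.
name := "spark-template"
version := "0.1.0"
scalaVersion := "2.11.12" // Spark 2.3.0 requires Scala 2.11

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.11" % "2.3.0",
  "org.apache.spark" % "spark-sql_2.11" % "2.3.0"
)
```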
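
One way to upload a dataset is with `gsutil`; the bucket and file names below are placeholders.

```sh
# Copy a local dataset to a Cloud Storage bucket (placeholder names).
gsutil cp data/sample.csv gs://your-bucket/datasets/sample.csv
```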
Example Use Case
- Run the Spark application on Google Cloud Dataproc. A tutorial can be found here (a `gcloud` submission sketch appears after this list)
- Save the output as Parquet files to Google Cloud Storage (see the Scala sketch after this list)
- Import the Parquet output into Google BigQuery and process it further (a `bq load` sketch appears after this list)
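
As a sketch, submitting the packaged jar to an existing Dataproc cluster could look like the following; the cluster name, region, main class, and jar name are all placeholders.

```sh
# Submit the packaged jar to a Dataproc cluster (placeholder names).
gcloud dataproc jobs submit spark \
  --cluster=my-cluster \
  --region=us-central1 \
  --class=com.example.Main \
  --jars=target/scala-2.11/spark-template_2.11-0.1.0.jar
```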
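
A minimal sketch of the job itself, assuming a CSV dataset already uploaded to Cloud Storage; the object name, bucket, and paths are hypothetical. Dataproc clusters include the Cloud Storage connector, so `gs://` paths work without extra configuration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; bucket and paths are placeholders.
object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-template")
      .getOrCreate()

    // Read the dataset uploaded to Cloud Storage.
    val df = spark.read
      .option("header", "true")
      .csv("gs://your-bucket/datasets/sample.csv")

    // ... transformations go here ...

    // Save the output as Parquet back to Cloud Storage.
    df.write.parquet("gs://your-bucket/output/result.parquet")

    spark.stop()
  }
}
```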
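
One way to import the Parquet output into BigQuery is the `bq` CLI; the dataset, table, and bucket names are placeholders. The wildcard picks up the part files Spark writes inside the output directory.

```sh
# Load the Parquet files written by Spark into a BigQuery table.
bq load --source_format=PARQUET \
  my_dataset.my_table \
  "gs://your-bucket/output/result.parquet/*.parquet"
```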