Submit Apache Spark Job with REST API

29 April 2018

When working with Apache Spark, there are times when you need to trigger a Spark job on demand from outside the cluster. There are two ways in which we can submit an Apache Spark job to a cluster.

  • Spark Submit from within the Spark cluster

To submit a Spark job from within the Spark cluster, we use spark-submit. Below is a sample shell script which submits a Spark job. Most of the arguments are self-explanatory.

#!/bin/bash

# Submit the application in cluster deploy mode. The last three
# arguments are the application jar followed by the input and output
# paths that are passed to the main class.
$SPARK_HOME/bin/spark-submit \
  --class com.nitendragautam.sparkbatchapp.main.Boot \
  --master spark://192.168.133.128:7077 \
  --deploy-mode cluster \
  --supervise \
  --executor-memory 4G \
  --driver-memory 4G \
  --total-executor-cores 2 \
  /home/hduser/sparkbatchapp.jar \
  /home/hduser/NDSBatchApp/input \
  /home/hduser/NDSBatchApp/output/
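Because --deploy-mode cluster runs the driver on one of the worker nodes, the application jar path is resolved on whichever worker the driver lands on. A minimal staging sketch (the worker hostnames here are placeholders for your own):

#!/bin/bash

# Copy the application jar to the same path on every worker so the
# driver can find it no matter where it is launched. Replace the
# hostnames with your actual workers.
for host in worker1 worker2; do
  scp /home/hduser/sparkbatchapp.jar hduser@"$host":/home/hduser/
done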

  • REST API from outside the Spark cluster

In this post I will explain how to trigger a Spark job with the help of the REST API. The Spark standalone master exposes a REST submission server, by default on port 6066. Please make sure that the Spark cluster is running before submitting the Spark job.
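As a quick sanity check, you can first verify that the REST submission port is reachable. A minimal sketch, assuming the default port 6066 (note that on newer Spark releases the REST server may have to be enabled explicitly with spark.master.rest.enabled=true in conf/spark-defaults.conf on the master):

#!/bin/bash

# Probe the REST submission port on the standalone master.
if nc -z 192.168.133.128 6066; then
  echo "REST submission server is reachable"
else
  echo "REST submission server is not reachable" >&2
fi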

Figure: Apache Spark Master

Trigger a Spark Batch Job Using a Shell Script

Create a shell script named submit_spark_job.sh with the contents below, and give the shell script execute permission (for example, with chmod +x submit_spark_job.sh).

#!/bin/bash

# POST a CreateSubmissionRequest to the standalone master's REST
# submission server.
curl -X POST http://192.168.133.128:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "appResource": "/home/hduser/sparkbatchapp.jar",
  "sparkProperties": {
    "spark.executor.memory": "4g",
    "spark.master": "spark://192.168.133.128:7077",
    "spark.driver.memory": "4g",
    "spark.driver.cores": "2",
    "spark.eventLog.enabled": "false",
    "spark.app.name": "Spark REST API201804291717022",
    "spark.submit.deployMode": "cluster",
    "spark.jars": "/home/hduser/sparkbatchapp.jar",
    "spark.driver.supervise": "true"
  },
  "clientSparkVersion": "2.0.1",
  "mainClass": "com.nitendragautam.sparkbatchapp.main.Boot",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "action": "CreateSubmissionRequest",
  "appArgs": [
    "/home/hduser/NDSBatchApp/input",
    "/home/hduser/NDSBatchApp/output/"
  ]
}'

Once the Spark job is successfully submitted, you will see a response with contents like the one below, confirming that the driver was submitted.


$ sh submit_spark_job.sh
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20180429125849-0001",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true
}
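If you are driving this from a larger automation script, you can capture the submissionId from this response for later status checks. A minimal sketch (the grep/cut parsing is just one simple way to pull the field out of the JSON):

#!/bin/bash

# Submit the job and keep the JSON response.
RESPONSE=$(sh submit_spark_job.sh)

# Extract the submissionId field, e.g. driver-20180429125849-0001.
SUBMISSION_ID=$(echo "$RESPONSE" | grep -o '"submissionId" : "[^"]*"' | cut -d'"' -f4)

echo "Submitted as $SUBMISSION_ID"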

Check the Status of a Spark Job Using the REST API

If you want to check the status of your Spark job, you can use the submission ID from the response above with the status endpoint, as shown below.

$ curl http://192.168.133.128:6066/v1/submissions/status/driver-20180429125849-0001
{
  "action" : "SubmissionStatusResponse",
  "driverState" : "FINISHED",
  "serverSparkVersion" : "2.0.1",
  "submissionId" : "driver-20180429125849-0001",
  "success" : true,
  "workerHostPort" : "192.168.133.128:38451",
  "workerId" : "worker-20180429124356-192.168.133.128-38451"
}
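The same REST interface also exposes a kill endpoint, which is handy if a submitted driver needs to be stopped:

# Ask the master to kill the driver with the given submission ID.
curl -X POST http://192.168.133.128:6066/v1/submissions/kill/driver-20180429125849-0001

The response follows the same JSON shape, with a success flag indicating whether the kill request was accepted.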
