Submits a job to a cluster.

Scopes

You will need authorization for the https://www.googleapis.com/auth/cloud-platform scope to make a valid call.

If unset, the scope for this method defaults to https://www.googleapis.com/auth/cloud-platform. You can set the scope for this method like this: dataproc1 --scope <scope> projects regions-jobs-submit-as-operation ...

Required Scalar Arguments

  • <project-id> (string)
    • Required. The ID of the Google Cloud Platform project that the job belongs to.
  • <region> (string)
    • Required. The Dataproc region in which to handle the request.

Required Request Value

The request value is a data-structure with various fields. Each field may be a simple scalar or another data-structure. In the latter case it is advised to set the field-cursor to the data-structure's field to specify values more concisely.

For example, a structure like this:

SubmitJobRequest:
  job:
    done: boolean
    driver-control-files-uri: string
    driver-output-resource-uri: string
    driver-scheduling-config:
      memory-mb: integer
      vcores: integer
    flink-job:
      args: [string]
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      main-class: string
      main-jar-file-uri: string
      properties: { string: string }
      savepoint-uri: string
    hadoop-job:
      archive-uris: [string]
      args: [string]
      file-uris: [string]
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      main-class: string
      main-jar-file-uri: string
      properties: { string: string }
    hive-job:
      continue-on-failure: boolean
      jar-file-uris: [string]
      properties: { string: string }
      query-file-uri: string
      query-list:
        queries: [string]
      script-variables: { string: string }
    job-uuid: string
    labels: { string: string }
    pig-job:
      continue-on-failure: boolean
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      properties: { string: string }
      query-file-uri: string
      query-list:
        queries: [string]
      script-variables: { string: string }
    placement:
      cluster-labels: { string: string }
      cluster-name: string
      cluster-uuid: string
    presto-job:
      client-tags: [string]
      continue-on-failure: boolean
      logging-config:
        driver-log-levels: { string: string }
      output-format: string
      properties: { string: string }
      query-file-uri: string
      query-list:
        queries: [string]
    pyspark-job:
      archive-uris: [string]
      args: [string]
      file-uris: [string]
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      main-python-file-uri: string
      properties: { string: string }
      python-file-uris: [string]
    reference:
      job-id: string
      project-id: string
    scheduling:
      max-failures-per-hour: integer
      max-failures-total: integer
    spark-job:
      archive-uris: [string]
      args: [string]
      file-uris: [string]
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      main-class: string
      main-jar-file-uri: string
      properties: { string: string }
    spark-r-job:
      archive-uris: [string]
      args: [string]
      file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      main-r-file-uri: string
      properties: { string: string }
    spark-sql-job:
      jar-file-uris: [string]
      logging-config:
        driver-log-levels: { string: string }
      properties: { string: string }
      query-file-uri: string
      query-list:
        queries: [string]
      script-variables: { string: string }
    status:
      details: string
      state: string
      state-start-time: string
      substate: string
    trino-job:
      client-tags: [string]
      continue-on-failure: boolean
      logging-config:
        driver-log-levels: { string: string }
      output-format: string
      properties: { string: string }
      query-file-uri: string
      query-list:
        queries: [string]
  request-id: string

can be set completely with the following arguments which are assumed to be executed in the given order. Note how the cursor position is adjusted to the respective structures, allowing simple field names to be used most of the time.

  • -r .job done=true
    • Output only. Indicates whether the job is completed. If the value is false, the job is still in progress. If true, the job is completed, and status.state field will indicate if it was successful, failed, or cancelled.
  • driver-control-files-uri=duo
    • Output only. If present, the location of miscellaneous control files which can be used as part of job setup and handling. If not present, control files might be placed in the same location as driver_output_uri.
  • driver-output-resource-uri=est
    • Output only. A URI pointing to the location of the stdout of the job's driver program.
  • driver-scheduling-config memory-mb=70
    • Required. The amount of memory in MB the driver is requesting.
  • vcores=81

    • Required. The number of vCPUs the driver is requesting.
  • ..flink-job args=sed

    • Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision might occur that causes an incorrect job submission.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=dolor
    • Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Flink driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=gubergren

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. main-class=takimata

    • The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in jarFileUris.
  • main-jar-file-uri=et
    • The HCFS URI of the jar file that contains the main class.
  • properties=key=erat
    • Optional. A mapping of property names to values, used to configure Flink. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/flink/conf/flink-defaults.conf and classes in user code.
    • the value will be associated with the given key
  • savepoint-uri=sea

    • Optional. HCFS URI of the savepoint, which contains the last saved progress for starting the current job.
  • ..hadoop-job archive-uris=vero

    • Optional. HCFS URIs of archives to be extracted in the working directory of Hadoop drivers and tasks. Supported file types: .jar, .tar, .tar.gz, .tgz, or .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=diam
    • Optional. The arguments to pass to the driver. Do not include arguments, such as -libjars or -Dfoo=bar, that can be set as job properties, since a collision might occur that causes an incorrect job submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=takimata
    • Optional. HCFS (Hadoop Compatible Filesystem) URIs of files to be copied to the working directory of Hadoop drivers and distributed tasks. Useful for naively parallel tasks.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=voluptua.
    • Optional. Jar file URIs to add to the CLASSPATHs of the Hadoop driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=et

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. main-class=sea

    • The name of the driver's main class. The jar file containing the class must be in the default CLASSPATH or specified in jar_file_uris.
  • main-jar-file-uri=aliquyam
    • The HCFS URI of the jar file containing the main class. Examples: 'gs://foo-bucket/analytics-binaries/extract-useful-metrics-mr.jar' 'hdfs:/tmp/test-samples/custom-wordcount.jar' 'file:///home/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar'
  • properties=key=ut

    • Optional. A mapping of property names to values, used to configure Hadoop. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/hadoop/conf/*-site and classes in user code.
    • the value will be associated with the given key
  • ..hive-job continue-on-failure=false

    • Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
  • jar-file-uris=dolor
    • Optional. HCFS URIs of jar files to add to the CLASSPATH of the Hive server and Hadoop MapReduce (MR) tasks. Can contain Hive SerDes and UDFs.
    • Each invocation of this argument appends the given value to the array.
  • properties=key=ea
    • Optional. A mapping of property names and values, used to configure Hive. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/hadoop/conf/*-site.xml, /etc/hive/conf/hive-site.xml, and classes in user code.
    • the value will be associated with the given key
  • query-file-uri=eirmod
    • The HCFS URI of the script that contains Hive queries.
  • query-list queries=et

    • Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
    • Each invocation of this argument appends the given value to the array.
  • .. script-variables=key=erat

    • Optional. Mapping of query variable names to values (equivalent to the Hive command: SET name="value";).
    • the value will be associated with the given key
  • .. job-uuid=no

    • Output only. A UUID that uniquely identifies a job within the project over time. This is in contrast to a user-settable reference.job_id that might be reused over time.
  • labels=key=amet
    • Optional. The labels to associate with this job. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values can be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a job.
    • the value will be associated with the given key
  • pig-job continue-on-failure=false
    • Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
  • jar-file-uris=duo
    • Optional. HCFS URIs of jar files to add to the CLASSPATH of the Pig Client and Hadoop MapReduce (MR) tasks. Can contain Pig UDFs.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=dolor

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. properties=key=est

    • Optional. A mapping of property names to values, used to configure Pig. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/hadoop/conf/*-site.xml, /etc/pig/conf/pig.properties, and classes in user code.
    • the value will be associated with the given key
  • query-file-uri=et
    • The HCFS URI of the script that contains the Pig queries.
  • query-list queries=ipsum

    • Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
    • Each invocation of this argument appends the given value to the array.
  • .. script-variables=key=stet

    • Optional. Mapping of query variable names to values (equivalent to the Pig command: name=[value]).
    • the value will be associated with the given key
  • ..placement cluster-labels=key=stet

    • Optional. Cluster labels to identify a cluster where the job will be submitted.
    • the value will be associated with the given key
  • cluster-name=amet.
    • Required. The name of the cluster where the job will be submitted.
  • cluster-uuid=ut

    • Output only. A cluster UUID generated by the Dataproc service when the job is submitted.
  • ..presto-job client-tags=duo

    • Optional. Presto client tags to attach to this query.
    • Each invocation of this argument appends the given value to the array.
  • continue-on-failure=false
    • Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
  • logging-config driver-log-levels=key=accusam

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. output-format=et

    • Optional. The format in which query output will be displayed. See the Presto documentation for supported output formats.
  • properties=key=no
    • Optional. A mapping of property names to values. Used to set Presto session properties (https://prestodb.io/docs/current/sql/set-session.html). Equivalent to using the --session flag in the Presto CLI.
    • the value will be associated with the given key
  • query-file-uri=lorem
    • The HCFS URI of the script that contains SQL queries.
  • query-list queries=ipsum

    • Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
    • Each invocation of this argument appends the given value to the array.
  • ...pyspark-job archive-uris=kasd

    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=justo
    • Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=duo
    • Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=nonumy
    • Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Python driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=sea

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. main-python-file-uri=eirmod

    • Required. The HCFS URI of the main Python file to use as the driver. Must be a .py file.
  • properties=key=amet
    • Optional. A mapping of property names to values, used to configure PySpark. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
    • the value will be associated with the given key
  • python-file-uris=sed

    • Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • ..reference job-id=dolor

    • Optional. The job ID, which must be unique within the project. The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), or hyphens (-). The maximum length is 100 characters. If not specified by the caller, the job ID will be provided by the server.
  • project-id=sea

    • Optional. The ID of the Google Cloud Platform project that the job belongs to. If specified, must match the request project ID.
  • ..scheduling max-failures-per-hour=56

    • Optional. Maximum number of times per hour a driver can be restarted as a result of the driver exiting with a non-zero code before the job is reported failed. A job might be reported as thrashing if the driver exits with a non-zero code four times within a 10-minute window. Maximum value is 10. Note: This restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
  • max-failures-total=10

    • Optional. Maximum total number of times a driver can be restarted as a result of the driver exiting with a non-zero code. After the maximum number is reached, the job will be reported as failed. Maximum value is 240. Note: Currently, this restartable job option is not supported in Dataproc workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/using-workflows#adding_jobs_to_a_template).
  • ..spark-job archive-uris=duo

    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=duo
    • Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=sit
    • Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=labore
    • Optional. HCFS URIs of jar files to add to the CLASSPATHs of the Spark driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=sit

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. main-class=ut

    • The name of the driver's main class. The jar file that contains the class must be in the default CLASSPATH or specified in SparkJob.jar_file_uris.
  • main-jar-file-uri=justo
    • The HCFS URI of the jar file that contains the main class.
  • properties=key=sed

    • Optional. A mapping of property names to values, used to configure Spark. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
    • the value will be associated with the given key
  • ..spark-r-job archive-uris=sit

    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=ipsum
    • Optional. The arguments to pass to the driver. Do not include arguments, such as --conf, that can be set as job properties, since a collision may occur that causes an incorrect job submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=sanctus
    • Optional. HCFS URIs of files to be placed in the working directory of each executor. Useful for naively parallel tasks.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=sed

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. main-r-file-uri=justo

    • Required. The HCFS URI of the main R file to use as the driver. Must be a .R file.
  • properties=key=elitr

    • Optional. A mapping of property names to values, used to configure SparkR. Properties that conflict with values set by the Dataproc API might be overwritten. Can include properties set in /etc/spark/conf/spark-defaults.conf and classes in user code.
    • the value will be associated with the given key
  • ..spark-sql-job jar-file-uris=sed

    • Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.
    • Each invocation of this argument appends the given value to the array.
  • logging-config driver-log-levels=key=sed

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. properties=key=dolor

    • Optional. A mapping of property names to values, used to configure Spark SQL's SparkConf. Properties that conflict with values set by the Dataproc API might be overwritten.
    • the value will be associated with the given key
  • query-file-uri=no
    • The HCFS URI of the script that contains SQL queries.
  • query-list queries=rebum.

    • Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
    • Each invocation of this argument appends the given value to the array.
  • .. script-variables=key=ipsum

    • Optional. Mapping of query variable names to values (equivalent to the Spark SQL command: SET name="value";).
    • the value will be associated with the given key
  • ..status details=rebum.

    • Optional. Output only. Job state details, such as an error description if the state is ERROR.
  • state=lorem
    • Output only. A state message specifying the overall job state.
  • state-start-time=et
    • Output only. The time when this state was entered.
  • substate=no

    • Output only. Additional state information, which includes status reported by the agent.
  • ..trino-job client-tags=et

    • Optional. Trino client tags to attach to this query.
    • Each invocation of this argument appends the given value to the array.
  • continue-on-failure=true
    • Optional. Whether to continue executing queries if a query fails. The default value is false. Setting to true can be useful when executing independent parallel queries.
  • logging-config driver-log-levels=key=no

    • The per-package log levels for the driver. This can include "root" package name to configure rootLogger. Examples: - 'com.google = FATAL' - 'root = INFO' - 'org.apache = DEBUG'
    • the value will be associated with the given key
  • .. output-format=et

    • Optional. The format in which query output will be displayed. See the Trino documentation for supported output formats.
  • properties=key=dolor
    • Optional. A mapping of property names to values. Used to set Trino session properties (https://trino.io/docs/current/sql/set-session.html). Equivalent to using the --session flag in the Trino CLI.
    • the value will be associated with the given key
  • query-file-uri=sit
    • The HCFS URI of the script that contains SQL queries.
  • query-list queries=eirmod

    • Required. The queries to execute. You do not need to end a query expression with a semicolon. Multiple queries can be specified in one string by separating each with a semicolon. Here is an example of a Dataproc API snippet that uses a QueryList to specify a HiveJob: "hiveJob": { "queryList": { "queries": [ "query1", "query2", "query3;query4" ] } }
    • Each invocation of this argument appends the given value to the array.
  • .... request-id=dolore

    • Optional. A unique id used to identify the request. If the server receives two SubmitJobRequest messages (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.SubmitJobRequest) with the same id, then the second request will be ignored and the first Job created and stored in the backend is returned. It is recommended to always set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The id must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
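
Most requests set only a handful of these fields. The Python sketch below is one way to turn a nested request description into key=value tokens for -r, using full dotted paths instead of cursor moves. It is an illustration built only from the rules stated in this document (repeated tokens append to arrays, field=key=value fills maps); the helper names to_kv_tokens and scalar, the MAP_FIELDS set, and all sample values are hypothetical and are not part of the dataproc1 tool.

```python
# Field names that the structure above declares as `{ string: string }` maps.
MAP_FIELDS = frozenset({
    "properties", "labels", "script-variables",
    "driver-log-levels", "cluster-labels",
})


def scalar(value):
    """Render a leaf value as text; booleans become true/false as in the list above."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_kv_tokens(struct, prefix=""):
    """Flatten a nested dict into `dotted.path=value` tokens, one per leaf."""
    tokens = []
    for name, value in struct.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict) and name in MAP_FIELDS:
            # Map field: one `field=key=value` token per entry.
            tokens.extend(f"{path}={k}={scalar(v)}" for k, v in value.items())
        elif isinstance(value, dict):
            # Nested structure: recurse with the extended path.
            tokens.extend(to_kv_tokens(value, path))
        elif isinstance(value, (list, tuple)):
            # Array field: each token appends one element.
            tokens.extend(f"{path}={scalar(v)}" for v in value)
        else:
            tokens.append(f"{path}={scalar(value)}")
    return tokens


# Hypothetical request: a PySpark job on a cluster named my-cluster.
request = {
    "job": {
        "placement": {"cluster-name": "my-cluster"},
        "pyspark-job": {
            "main-python-file-uri": "gs://my-bucket/main.py",
            "args": ["2024-01-01", "full"],
            "properties": {"spark.executor.memory": "4g"},
        },
        "labels": {"team": "data"},
    },
}

for token in to_kv_tokens(request):
    print("-r", token)
```

Full paths are longer than cursor-relative names, but every token is independent of the ones before it, which can make generated command lines easier to review.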

About Cursors

The cursor position is key to comfortably set complex nested structures. The following rules apply:

  • The cursor position is always set relative to the current one, unless the field name starts with the . character. Fields can be nested such as in -r f.s.o .
  • The cursor position is set relative to the top-level structure if it starts with ., e.g. -r .s.s
  • You can also set nested fields without setting the cursor explicitly. For example, to set a value relative to the current cursor position, you would specify -r struct.sub_struct=bar.
  • You can move the cursor one level up by using two dots (..). Each additional . moves it up one additional level. E.g. ... would go two levels up.
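
The rules above can be modelled in a few lines of Python. This is a simplified sketch for reasoning about cursor positions, not the tool's actual implementation: the function name move_cursor is hypothetical, and only leading dots are handled. The dot counting follows the argument sequence shown earlier in this document, where ...pyspark-job moves from .job.presto-job.query-list to .job.pyspark-job.

```python
def move_cursor(cursor, token):
    """Return the new cursor (a list of field names) after applying one token.

    cursor: current position, e.g. ["job", "flink-job"]
    token:  a cursor argument without a value, e.g. ".job", "..hive-job", "a.b"
    """
    dots = len(token) - len(token.lstrip("."))
    names = [n for n in token[dots:].split(".") if n]
    if dots == 0:
        # No leading dot: relative to the current position.
        base = list(cursor)
    elif dots == 1:
        # One leading dot: relative to the top-level structure.
        base = []
    else:
        # Two dots go up one level; each additional dot goes up one more.
        levels_up = dots - 1
        if levels_up > len(cursor):
            raise ValueError(f"cannot move {levels_up} level(s) up from {cursor}")
        base = list(cursor[:len(cursor) - levels_up])
    return base + names


# Replay the first few cursor moves from the argument list in this document.
cursor = []
for token in [".job", "driver-scheduling-config", "..flink-job", "logging-config", ".."]:
    cursor = move_cursor(cursor, token)
    print(f"{token!r:28} -> .{'.'.join(cursor)}")
```

Tracing a long -r sequence this way makes it easier to spot a missing .. before a sibling structure, which would otherwise place a field under the wrong parent.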

Optional Output Flags

The method's return value is a JSON-encoded structure, which will be written to standard output by default.

  • -o out
    • out specifies the destination to which the server's result is written. It will be a JSON-encoded structure. The destination may be - to indicate standard output, or a filepath that is to contain the received bytes. If unset, it defaults to standard output.

Optional General Properties

The following properties can configure any call, and are not specific to this method.

  • -p $-xgafv=string

    • V1 error format.
  • -p access-token=string

    • OAuth access token.
  • -p alt=string

    • Data format for response.
  • -p callback=string

    • JSONP
  • -p fields=string

    • Selector specifying which fields to include in a partial response.
  • -p key=string

    • API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
  • -p oauth-token=string

    • OAuth 2.0 token for the current user.
  • -p pretty-print=boolean

    • Returns response with indentations and line breaks.
  • -p quota-user=string

    • Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
  • -p upload-type=string

    • Legacy upload protocol for media (e.g. "media", "multipart").
  • -p upload-protocol=string

    • Upload protocol for media (e.g. "raw", "multipart").
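
To see how the pieces documented above fit together, the following Python sketch assembles a complete argument vector for this method. It uses only the flags described in this document (--scope, -r, -p, -o). Two things are assumptions rather than documented facts: that the positional arguments follow the method name in the order <project-id> <region>, and that each key=value token can be passed with its own -r flag. The helper build_argv and all sample values are hypothetical.

```python
import re
import shlex
import uuid

# request-id rule from this document: letters, digits, underscores, hyphens, max 40 chars.
REQUEST_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,40}")


def build_argv(project_id, region, kv_tokens, scope=None, properties=None, out=None):
    """Assemble the command line as a list of arguments (nothing is executed)."""
    argv = ["dataproc1"]
    if scope:
        argv += ["--scope", scope]
    # Assumed positional order: method name, then <project-id>, then <region>.
    argv += ["projects", "regions-jobs-submit-as-operation", project_id, region]
    for token in kv_tokens:
        argv += ["-r", token]            # request value fields
    for name, value in (properties or {}).items():
        argv += ["-p", f"{name}={value}"]  # general properties
    if out:
        argv += ["-o", out]              # output destination
    return argv


# A UUID is 36 characters of hex digits and hyphens, so it fits the request-id rule.
request_id = str(uuid.uuid4())
assert REQUEST_ID_RE.fullmatch(request_id)

argv = build_argv(
    "my-project",
    "us-central1",
    [
        "job.placement.cluster-name=my-cluster",
        "job.pyspark-job.main-python-file-uri=gs://my-bucket/main.py",
        f"request-id={request_id}",
    ],
    scope="https://www.googleapis.com/auth/cloud-platform",
    properties={"pretty-print": "true"},
    out="result.json",
)
print(shlex.join(argv))
```

Building the list first and joining it with shlex.join only for display keeps values with spaces or shell metacharacters intact if the list is later handed to a process launcher.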