Creates a batch workload that executes asynchronously.

Scopes

You will need authorization for the https://www.googleapis.com/auth/cloud-platform scope to make a valid call.

If unset, the scope for this method defaults to https://www.googleapis.com/auth/cloud-platform. You can set the scope for this method like this: dataproc1 --scope <scope> projects locations-batches-create ...

Required Scalar Argument

  • <parent> (string)
    • Required. The parent resource where this batch will be created.
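
A minimal invocation might look like the following sketch, assuming the parent is passed as the positional argument directly after the method name. The project, region, bucket, and file names are placeholders, not defaults, and whether the server accepts such a sparse batch depends on your project setup:

  # Hypothetical parent: project 'my-project', region 'us-central1'.
  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . pyspark-batch main-python-file-uri=gs://my-bucket/job.py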

Required Request Value

The request value is a data-structure with various fields. Each field may be a simple scalar or another data-structure. In the latter case it is advised to set the field-cursor to that data-structure so that its values can be specified more concisely.

For example, a structure like this:

Batch:
  create-time: string
  creator: string
  environment-config:
    execution-config:
      idle-ttl: string
      kms-key: string
      network-tags: [string]
      network-uri: string
      service-account: string
      staging-bucket: string
      subnetwork-uri: string
      ttl: string
    peripherals-config:
      metastore-service: string
      spark-history-server-config:
        dataproc-cluster: string
  labels: { string: string }
  name: string
  operation: string
  pyspark-batch:
    archive-uris: [string]
    args: [string]
    file-uris: [string]
    jar-file-uris: [string]
    main-python-file-uri: string
    python-file-uris: [string]
  runtime-config:
    container-image: string
    properties: { string: string }
    repository-config:
      pypi-repository-config:
        pypi-repository: string
    version: string
  runtime-info:
    approximate-usage:
      accelerator-type: string
      milli-accelerator-seconds: string
      milli-dcu-seconds: string
      shuffle-storage-gb-seconds: string
    current-usage:
      accelerator-type: string
      milli-accelerator: string
      milli-dcu: string
      milli-dcu-premium: string
      shuffle-storage-gb: string
      shuffle-storage-gb-premium: string
      snapshot-time: string
    diagnostic-output-uri: string
    endpoints: { string: string }
    output-uri: string
  spark-batch:
    archive-uris: [string]
    args: [string]
    file-uris: [string]
    jar-file-uris: [string]
    main-class: string
    main-jar-file-uri: string
  spark-r-batch:
    archive-uris: [string]
    args: [string]
    file-uris: [string]
    main-r-file-uri: string
  spark-sql-batch:
    jar-file-uris: [string]
    query-file-uri: string
    query-variables: { string: string }
  state: string
  state-message: string
  state-time: string
  uuid: string

can be set completely with the following arguments, which are assumed to be executed in the given order. Note how the cursor position is adjusted to the respective structures, allowing simple field names to be used most of the time. A condensed example with more realistic values follows the list.

  • -r . create-time=voluptua.
    • Output only. The time when the batch was created.
  • creator=amet.
    • Output only. The email address of the user who created the batch.
  • environment-config.execution-config idle-ttl=consetetur
    • Optional. Applies to sessions only. The duration to keep the session alive while it's idling. Exceeding this threshold causes the session to terminate. This field cannot be set on a batch workload. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration (https://developers.google.com/protocol-buffers/docs/proto3#json)). Defaults to 1 hour if not set. If both ttl and idle_ttl are specified for an interactive session, the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.
  • kms-key=diam
    • Optional. The Cloud KMS key to use for encryption.
  • network-tags=dolor
    • Optional. Tags used for network traffic control.
    • Each invocation of this argument appends the given value to the array.
  • network-uri=et
    • Optional. Network URI to connect workload to.
  • service-account=et
    • Optional. Service account used to execute the workload.
  • staging-bucket=sadipscing
    • Optional. A Cloud Storage bucket used to stage workload dependencies, config files, and store workload output and other ephemeral data, such as Spark history files. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location according to the region where your workload is running, and then create and manage project-level, per-location staging and temporary buckets. This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
  • subnetwork-uri=stet
    • Optional. Subnetwork URI to connect workload to.
  • ttl=dolor

    • Optional. The duration after which the workload will be terminated, specified as the JSON representation for Duration (https://protobuf.dev/programming-guides/proto3/#json). When the workload exceeds this duration, it will be unconditionally terminated without waiting for ongoing work to finish. If ttl is not specified for a batch workload, the workload will be allowed to run until it exits naturally (or run forever without exiting). If ttl is not specified for an interactive session, it defaults to 24 hours. If ttl is not specified for a batch that uses 2.1+ runtime version, it defaults to 4 hours. Minimum value is 10 minutes; maximum value is 14 days. If both ttl and idle_ttl are specified (for an interactive session), the conditions are treated as OR conditions: the workload will be terminated when it has been idle for idle_ttl or when ttl has been exceeded, whichever occurs first.
  • ..peripherals-config metastore-service=duo

    • Optional. Resource name of an existing Dataproc Metastore service. Example: projects/[project_id]/locations/[region]/services/[service_id]
  • spark-history-server-config dataproc-cluster=vero

    • Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
  • .... labels=key=vero

    • Optional. The labels to associate with this batch. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a batch.
    • the value will be associated with the given key
  • name=invidunt
    • Output only. The resource name of the batch.
  • operation=stet
    • Output only. The resource name of the operation associated with this batch.
  • pyspark-batch archive-uris=vero
    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=elitr
    • Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=lorem
    • Optional. HCFS URIs of files to be placed in the working directory of each executor.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=diam
    • Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • main-python-file-uri=no
    • Required. The HCFS URI of the main Python file to use as the Spark driver. Must be a .py file.
  • python-file-uris=ipsum

    • Optional. HCFS file URIs of Python files to pass to the PySpark framework. Supported file types: .py, .egg, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • ..runtime-config container-image=accusam

    • Optional. Custom container image for the job runtime environment. If not specified, a default container image will be used.
  • properties=key=takimata
    • Optional. A mapping of property names to values, which are used to configure workload execution.
    • the value will be associated with the given key
  • repository-config.pypi-repository-config pypi-repository=consetetur

    • Optional. PyPI repository address.
  • ... version=voluptua.

    • Optional. Version of the batch runtime.
  • ..runtime-info.approximate-usage accelerator-type=et

    • Optional. Accelerator type being used, if any
  • milli-accelerator-seconds=erat
    • Optional. Accelerator usage in (milliAccelerator x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
  • milli-dcu-seconds=consetetur
    • Optional. DCU (Dataproc Compute Units) usage in (milliDCU x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
  • shuffle-storage-gb-seconds=amet.

    • Optional. Shuffle storage usage in (GB x seconds) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
  • ..current-usage accelerator-type=sed

    • Optional. Accelerator type being used, if any
  • milli-accelerator=takimata
    • Optional. Milli (one-thousandth) accelerator. (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing))
  • milli-dcu=dolores
    • Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
  • milli-dcu-premium=gubergren
    • Optional. Milli (one-thousandth) Dataproc Compute Units (DCUs) charged at premium tier (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing)).
  • shuffle-storage-gb=et
    • Optional. Shuffle Storage in gigabytes (GB). (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing))
  • shuffle-storage-gb-premium=accusam
    • Optional. Shuffle Storage in gigabytes (GB) charged at premium tier. (see Dataproc Serverless pricing (https://cloud.google.com/dataproc-serverless/pricing))
  • snapshot-time=voluptua.

    • Optional. The timestamp of the usage snapshot.
  • .. diagnostic-output-uri=dolore

    • Output only. A URI pointing to the location of the diagnostics tarball.
  • endpoints=key=dolore
    • Output only. Map of remote access endpoints (such as web interfaces and APIs) to their URIs.
    • the value will be associated with the given key
  • output-uri=dolore

    • Output only. A URI pointing to the location of the stdout and stderr of the workload.
  • ..spark-batch archive-uris=voluptua.

    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=amet.
    • Optional. The arguments to pass to the driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=ea
    • Optional. HCFS URIs of files to be placed in the working directory of each executor.
    • Each invocation of this argument appends the given value to the array.
  • jar-file-uris=sadipscing
    • Optional. HCFS URIs of jar files to add to the classpath of the Spark driver and tasks.
    • Each invocation of this argument appends the given value to the array.
  • main-class=lorem
    • Optional. The name of the driver main class. The jar file that contains the class must be in the classpath or specified in jar_file_uris.
  • main-jar-file-uri=invidunt

    • Optional. The HCFS URI of the jar file that contains the main class.
  • ..spark-r-batch archive-uris=no

    • Optional. HCFS URIs of archives to be extracted into the working directory of each executor. Supported file types: .jar, .tar, .tar.gz, .tgz, and .zip.
    • Each invocation of this argument appends the given value to the array.
  • args=est
    • Optional. The arguments to pass to the Spark driver. Do not include arguments that can be set as batch properties, such as --conf, since a collision can occur that causes an incorrect batch submission.
    • Each invocation of this argument appends the given value to the array.
  • file-uris=at
    • Optional. HCFS URIs of files to be placed in the working directory of each executor.
    • Each invocation of this argument appends the given value to the array.
  • main-r-file-uri=sed

    • Required. The HCFS URI of the main R file to use as the driver. Must be a .R or .r file.
  • ..spark-sql-batch jar-file-uris=sit

    • Optional. HCFS URIs of jar files to be added to the Spark CLASSPATH.
    • Each invocation of this argument appends the given value to the array.
  • query-file-uri=et
    • Required. The HCFS URI of the script that contains Spark SQL queries to execute.
  • query-variables=key=tempor

    • Optional. Mapping of query variable names to values (equivalent to the Spark SQL command: SET name="value";).
    • the value will be associated with the given key
  • .. state=aliquyam

    • Output only. The state of the batch.
  • state-message=ipsum
    • Output only. Batch state details, such as a failure description if the state is FAILED.
  • state-time=et
    • Output only. The time when the batch entered its current state.
  • uuid=sanctus
    • Output only. A batch UUID (Unique Universal Identifier). The service generates this value when it creates the batch.
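
As noted above, the listing uses placeholder values throughout. A condensed sketch with more realistic (but still hypothetical) values, combining cursor moves and field assignments into a single call, might look like this:

  # All names below (project, region, bucket, service account) are placeholders.
  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . labels=env=dev \
           pyspark-batch main-python-file-uri=gs://my-bucket/job.py \
                args=--date=2024-06-01 \
                jar-file-uris=gs://my-bucket/deps/helper.jar \
           ..runtime-config version=2.2 \
                properties=spark.executor.instances=4 \
           ..environment-config.execution-config ttl=3600s \
                service-account=batch-runner@my-project.iam.gserviceaccount.com

Only the fields marked Required above need to be supplied (plus whatever your environment calls for); fields marked Output only, such as state or uuid, are populated by the server and can be omitted.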

About Cursors

The cursor position is key to setting complex nested structures comfortably. The following rules apply:

  • The cursor position is always set relative to the current one, unless the field name starts with the . character. Fields can be nested such as in -r f.s.o .
  • The cursor position is set relative to the top-level structure if it starts with ., e.g. -r .s.s
  • You can also set nested fields without setting the cursor explicitly. For example, to set a value relative to the current cursor position, you would specify -r struct.sub_struct=bar.
  • You can move the cursor one level up by using .. (two dots). Each additional . moves it up one additional level, e.g. ... (three dots) would go two levels up.
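
To make these rules concrete, the following sketch walks the cursor through the structure shown earlier (all values are placeholders):

  # Cursor walk-through:
  #   environment-config.execution-config   descend two levels from the top-level Batch
  #   ..peripherals-config                   up one level, then into peripherals-config
  #   ...                                    up two levels, back to the top-level Batch
  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . environment-config.execution-config ttl=3600s \
           ..peripherals-config metastore-service=projects/my-project/locations/us-central1/services/my-metastore \
           ... pyspark-batch main-python-file-uri=gs://my-bucket/job.py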

Optional Output Flags

The method's return value is a JSON-encoded structure, which will be written to standard output by default.

  • -o out
    • out specifies the destination to which the server's result will be written. It will be a JSON-encoded structure. The destination may be - to indicate standard output, or a filepath that is to contain the received bytes. If unset, it defaults to standard output.
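
For example, to write the server's response to a file instead of standard output (the filename, like the other values, is a placeholder):

  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . pyspark-batch main-python-file-uri=gs://my-bucket/job.py \
      -o batch.json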

Optional Method Properties

You may set the following properties to further configure the call. Please note that -p is followed by one or more key-value pairs, and is invoked like this: -p k1=v1 k2=v2, even though the listing below repeats the -p for completeness.

  • -p batch-id=string

    • Optional. The ID to use for the batch, which will become the final component of the batch's resource name. This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/.
  • -p request-id=string

    • Optional. A unique ID used to identify the request. If the service receives two CreateBatchRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateBatchRequest)s with the same request_id, the second request is ignored and the Operation that corresponds to the first Batch created and stored in the backend is returned. Recommendation: Set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The value must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
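
A call that sets both properties could look like the sketch below; the batch ID and the UUID-style request ID are placeholders chosen to satisfy the constraints above:

  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . pyspark-batch main-python-file-uri=gs://my-bucket/job.py \
      -p batch-id=nightly-report-2024-06-01 \
         request-id=123e4567-e89b-12d3-a456-426614174000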

Optional General Properties

The following properties can configure any call, and are not specific to this method.

  • -p $-xgafv=string

    • V1 error format.
  • -p access-token=string

    • OAuth access token.
  • -p alt=string

    • Data format for response.
  • -p callback=string

    • JSONP
  • -p fields=string

    • Selector specifying which fields to include in a partial response.
  • -p key=string

    • API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
  • -p oauth-token=string

    • OAuth 2.0 token for the current user.
  • -p pretty-print=boolean

    • Returns response with indentations and line breaks.
  • -p quota-user=string

    • Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
  • -p upload-type=string

    • Legacy upload protocol for media (e.g. "media", "multipart").
  • -p upload-protocol=string

    • Upload protocol for media (e.g. "raw", "multipart").
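
General properties can be combined with the method-specific ones in the same -p list. A sketch, with an illustrative field selector and placeholder values as before:

  # 'fields=name' limits the response to the resource name; 'pretty-print' adds indentation.
  dataproc1 projects locations-batches-create \
      projects/my-project/locations/us-central1 \
      -r . pyspark-batch main-python-file-uri=gs://my-bucket/job.py \
      -p fields=name pretty-print=true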