Creates a cluster in a project. The returned Operation.metadata will be ClusterOperationMetadata (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#clusteroperationmetadata).

Scopes

You will need authorization for the https://www.googleapis.com/auth/cloud-platform scope to make a valid call.

If unset, the scope for this method defaults to https://www.googleapis.com/auth/cloud-platform. You can set the scope for this method like this: dataproc1 --scope <scope> projects regions-clusters-create ...

Required Scalar Arguments

  • <project-id> (string)
    • Required. The ID of the Google Cloud Platform project that the cluster belongs to.
  • <region> (string)
    • Required. The Dataproc region in which to handle the request.
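
As a rough illustration of how the two required scalar arguments fit into an invocation, the Python sketch below assembles the argument vector and prints it instead of running it. The project ID my-project and the region us-central1 are hypothetical placeholders, the helper name build_create_command is invented for this example, and the positional order (project first, then region) is assumed from the order of the list above. The request value arguments described in the next section are left out, so the printed command is not yet a complete call.

```python
import shlex

# Default scope named in the Scopes section.
DEFAULT_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def build_create_command(project_id, region, scope=None):
    """Return the argv list for a regions-clusters-create call (hypothetical helper)."""
    argv = ["dataproc1"]
    if scope is not None:
        # --scope is optional; if unset, the method defaults to the cloud-platform scope.
        argv += ["--scope", scope]
    # Assumption: <project-id> and <region> are positional, in the documented order.
    argv += ["projects", "regions-clusters-create", project_id, region]
    return argv


# Placeholder values, not taken from the documentation.
command = build_create_command("my-project", "us-central1", scope=DEFAULT_SCOPE)

# Dry run: print a shell-safe rendering of the command instead of executing it.
print(shlex.join(command))
```

Keeping the command as a list and only joining it for display avoids quoting problems if a value ever contains spaces.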

Required Request Value

The request value is a data-structure with various fields. Each field may be a simple scalar or another data-structure. In the latter case it is advised to set the field-cursor to the data-structure's field to specify values more concisely.
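
To make the field-cursor idea concrete, here is a small Python model of how cursor tokens could resolve to full field paths. It is an illustration inferred from the argument sequences listed in this document (a single leading dot starts from the top-level structure, two dots move up one level, and each extra dot moves up one more), not the CLI's actual implementation. The function name expand_cursor_args is hypothetical.

```python
def expand_cursor_args(tokens):
    """Expand cursor-relative tokens into (full.field.path, value) pairs.

    Simplified model: a token containing '=' sets a field relative to the
    cursor; any other token only moves the cursor.
    """
    cursor = []   # current position as a list of field names
    result = []
    for token in tokens:
        if "=" in token:
            # Split on the first '=' only, so map entries such as
            # 'metadata=key=value' keep 'key=value' as their value.
            field, value = token.split("=", 1)
            result.append((".".join(cursor + field.split(".")), value))
            continue
        rest = token.lstrip(".")
        dots = len(token) - len(rest)
        if dots == 1:
            cursor = []  # a single '.' resets to the top-level structure
        elif dots > 1:
            # n dots move the cursor up n-1 levels (never above the top level)
            cursor = cursor[: max(0, len(cursor) - (dots - 1))]
        if rest:
            cursor = cursor + rest.split(".")
    return result


# Sample tokens in the style of the argument list in this document.
tokens = [
    ".", "cluster-name=dolor",
    "config.autoscaling-config", "policy-uri=kasd",
    "..", "config-bucket=eirmod",
    "encryption-config", "kms-key=takimata",
    "..endpoint-config", "enable-http-port-access=true",
]
for path, value in expand_cursor_args(tokens):
    print(f"{path}={value}")
```

Under this model the five assignments expand to cluster-name, config.autoscaling-config.policy-uri, config.config-bucket, config.encryption-config.kms-key, and config.endpoint-config.enable-http-port-access, which shows how much repetition the cursor saves compared with spelling out every path.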

For example, a structure like this:

Cluster:
  cluster-name: string
  cluster-uuid: string
  config:
    autoscaling-config:
      policy-uri: string
    config-bucket: string
    encryption-config:
      gce-pd-kms-key-name: string
      kms-key: string
    endpoint-config:
      enable-http-port-access: boolean
      http-ports: { string: string }
    gce-cluster-config:
      confidential-instance-config:
        enable-confidential-compute: boolean
      internal-ip-only: boolean
      metadata: { string: string }
      network-uri: string
      node-group-affinity:
        node-group-uri: string
      private-ipv6-google-access: string
      reservation-affinity:
        consume-reservation-type: string
        key: string
        values: [string]
      service-account: string
      service-account-scopes: [string]
      shielded-instance-config:
        enable-integrity-monitoring: boolean
        enable-secure-boot: boolean
        enable-vtpm: boolean
      subnetwork-uri: string
      tags: [string]
      zone-uri: string
    gke-cluster-config:
      gke-cluster-target: string
      namespaced-gke-deployment-target:
        cluster-namespace: string
        target-gke-cluster: string
    lifecycle-config:
      auto-delete-time: string
      auto-delete-ttl: string
      idle-delete-ttl: string
      idle-start-time: string
    master-config:
      disk-config:
        boot-disk-size-gb: integer
        boot-disk-type: string
        local-ssd-interface: string
        num-local-ssds: integer
      image-uri: string
      instance-names: [string]
      is-preemptible: boolean
      machine-type-uri: string
      managed-group-config:
        instance-group-manager-name: string
        instance-group-manager-uri: string
        instance-template-name: string
      min-cpu-platform: string
      min-num-instances: integer
      num-instances: integer
      preemptibility: string
      startup-config:
        required-registration-fraction: number
    metastore-config:
      dataproc-metastore-service: string
    secondary-worker-config:
      disk-config:
        boot-disk-size-gb: integer
        boot-disk-type: string
        local-ssd-interface: string
        num-local-ssds: integer
      image-uri: string
      instance-names: [string]
      is-preemptible: boolean
      machine-type-uri: string
      managed-group-config:
        instance-group-manager-name: string
        instance-group-manager-uri: string
        instance-template-name: string
      min-cpu-platform: string
      min-num-instances: integer
      num-instances: integer
      preemptibility: string
      startup-config:
        required-registration-fraction: number
    security-config:
      identity-config:
        user-service-account-mapping: { string: string }
      kerberos-config:
        cross-realm-trust-admin-server: string
        cross-realm-trust-kdc: string
        cross-realm-trust-realm: string
        cross-realm-trust-shared-password-uri: string
        enable-kerberos: boolean
        kdc-db-key-uri: string
        key-password-uri: string
        keystore-password-uri: string
        keystore-uri: string
        kms-key-uri: string
        realm: string
        root-principal-password-uri: string
        tgt-lifetime-hours: integer
        truststore-password-uri: string
        truststore-uri: string
    software-config:
      image-version: string
      optional-components: [string]
      properties: { string: string }
    temp-bucket: string
    worker-config:
      disk-config:
        boot-disk-size-gb: integer
        boot-disk-type: string
        local-ssd-interface: string
        num-local-ssds: integer
      image-uri: string
      instance-names: [string]
      is-preemptible: boolean
      machine-type-uri: string
      managed-group-config:
        instance-group-manager-name: string
        instance-group-manager-uri: string
        instance-template-name: string
      min-cpu-platform: string
      min-num-instances: integer
      num-instances: integer
      preemptibility: string
      startup-config:
        required-registration-fraction: number
  labels: { string: string }
  metrics:
    hdfs-metrics: { string: string }
    yarn-metrics: { string: string }
  project-id: string
  status:
    detail: string
    state: string
    state-start-time: string
    substate: string
  virtual-cluster-config:
    auxiliary-services-config:
      metastore-config:
        dataproc-metastore-service: string
      spark-history-server-config:
        dataproc-cluster: string
    kubernetes-cluster-config:
      gke-cluster-config:
        gke-cluster-target: string
        namespaced-gke-deployment-target:
          cluster-namespace: string
          target-gke-cluster: string
      kubernetes-namespace: string
      kubernetes-software-config:
        component-version: { string: string }
        properties: { string: string }
    staging-bucket: string

can be set completely with the following arguments which are assumed to be executed in the given order. Note how the cursor position is adjusted to the respective structures, allowing simple field names to be used most of the time.

  • -r . cluster-name=dolor
    • Required. The cluster name, which must be unique within a project. The name must start with a lowercase letter, and can contain up to 51 lowercase letters, numbers, and hyphens. It cannot end with a hyphen. The name of a deleted cluster can be reused.
  • cluster-uuid=amet.
    • Output only. A cluster UUID (Unique Universal Identifier). Dataproc generates this value when it creates the cluster.
  • config.autoscaling-config policy-uri=kasd

    • Optional. The autoscaling policy used by the cluster. Only resource names including project ID and location (region) are valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id] or projects/[project_id]/locations/[dataproc_region]/autoscalingPolicies/[policy_id]. Note that the policy must be in the same project and Dataproc region.
  • .. config-bucket=eirmod

    • Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket)). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
  • encryption-config gce-pd-kms-key-name=amet.
    • Optional. The Cloud KMS key resource name to use for persistent disk encryption for all instances in the cluster. See Use CMEK with cluster data (https://cloud.google.com//dataproc/docs/concepts/configuring-clusters/customer-managed-encryption#use_cmek_with_cluster_data) for more information.
  • kms-key=takimata

    • Optional. The Cloud KMS key resource name to use for cluster persistent disk and job argument encryption. See Use CMEK with cluster data (https://cloud.google.com//dataproc/docs/concepts/configuring-clusters/customer-managed-encryption#use_cmek_with_cluster_data) for more information. When this key resource name is provided, the following arguments of jobs submitted to the cluster are encrypted using CMEK: FlinkJob args (https://cloud.google.com/dataproc/docs/reference/rest/v1/FlinkJob); HadoopJob args (https://cloud.google.com/dataproc/docs/reference/rest/v1/HadoopJob); SparkJob args (https://cloud.google.com/dataproc/docs/reference/rest/v1/SparkJob); SparkRJob args (https://cloud.google.com/dataproc/docs/reference/rest/v1/SparkRJob); PySparkJob args (https://cloud.google.com/dataproc/docs/reference/rest/v1/PySparkJob); SparkSqlJob (https://cloud.google.com/dataproc/docs/reference/rest/v1/SparkSqlJob) scriptVariables and queryList.queries; HiveJob (https://cloud.google.com/dataproc/docs/reference/rest/v1/HiveJob) scriptVariables and queryList.queries; PigJob (https://cloud.google.com/dataproc/docs/reference/rest/v1/PigJob) scriptVariables and queryList.queries; PrestoJob (https://cloud.google.com/dataproc/docs/reference/rest/v1/PrestoJob) scriptVariables and queryList.queries.
  • ..endpoint-config enable-http-port-access=true

    • Optional. If true, enable http access to specific ports on the cluster from external sources. Defaults to false.
  • http-ports=key=et

    • Output only. The map of port descriptions to URLs. Will only be populated if enable_http_port_access is true.
    • the value will be associated with the given key
  • ..gce-cluster-config.confidential-instance-config enable-confidential-compute=true

    • Optional. Defines whether the instance should have confidential compute enabled.
  • .. internal-ip-only=false

    • Optional. If true, all instances in the cluster will only have internal IP addresses. By default, clusters are not restricted to internal IP addresses, and will have ephemeral external IP addresses assigned to each instance. This internal_ip_only restriction can only be enabled for subnetwork enabled networks, and all off-cluster dependencies must be configured to be accessible without external IP addresses.
  • metadata=key=invidunt
    • Optional. The Compute Engine metadata entries to add to all instances (see Project and instance metadata (https://cloud.google.com/compute/docs/storing-retrieving-metadata#project_and_instance_metadata)).
    • the value will be associated with the given key
  • network-uri=elitr
    • Optional. The Compute Engine network to be used for machine communications. Cannot be specified with subnetwork_uri. If neither network_uri nor subnetwork_uri is specified, the "default" network of the project is used, if it exists. Cannot be a "Custom Subnet Network" (see Using Subnetworks (https://cloud.google.com/compute/docs/subnetworks) for more information). A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/global/networks/default, projects/[project_id]/global/networks/default, or default.
  • node-group-affinity node-group-uri=voluptua.

    • Required. The URI of a sole-tenant node group resource (https://cloud.google.com/compute/docs/reference/rest/v1/nodeGroups) that the cluster will be created on. A full URL, partial URI, or node group name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/nodeGroups/node-group-1, projects/[project_id]/zones/[zone]/nodeGroups/node-group-1, or node-group-1.
  • .. private-ipv6-google-access=justo

    • Optional. The type of IPv6 access for a cluster.
  • reservation-affinity consume-reservation-type=amet.
    • Optional. Type of reservation to consume.
  • key=aliquyam
    • Optional. Corresponds to the label key of reservation resource.
  • values=et

    • Optional. Corresponds to the label values of reservation resource.
    • Each invocation of this argument appends the given value to the array.
  • .. service-account=gubergren

    • Optional. The Dataproc service account (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/service-accounts#service_accounts_in_dataproc) (also see VM Data Plane identity (https://cloud.google.com/dataproc/docs/concepts/iam/dataproc-principals#vm_service_account_data_plane_identity)) used by Dataproc cluster VM instances to access Google Cloud Platform services. If not specified, the Compute Engine default service account (https://cloud.google.com/compute/docs/access/service-accounts#default_service_account) is used.
  • service-account-scopes=sed
    • Optional. The URIs of service account scopes to be included in Compute Engine instances. The following base set of scopes is always included: https://www.googleapis.com/auth/cloud.useraccounts.readonly, https://www.googleapis.com/auth/devstorage.read_write, and https://www.googleapis.com/auth/logging.write. If no scopes are specified, the following defaults are also provided: https://www.googleapis.com/auth/bigquery, https://www.googleapis.com/auth/bigtable.admin.table, https://www.googleapis.com/auth/bigtable.data, and https://www.googleapis.com/auth/devstorage.full_control.
    • Each invocation of this argument appends the given value to the array.
  • shielded-instance-config enable-integrity-monitoring=false
    • Optional. Defines whether instances have integrity monitoring enabled.
  • enable-secure-boot=false
    • Optional. Defines whether instances have Secure Boot enabled.
  • enable-vtpm=false

    • Optional. Defines whether instances have the vTPM enabled.
  • .. subnetwork-uri=at

    • Optional. The Compute Engine subnetwork to be used for machine communications. Cannot be specified with network_uri. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/regions/[region]/subnetworks/sub0, projects/[project_id]/regions/[region]/subnetworks/sub0, or sub0.
  • tags=et
    • The Compute Engine tags to add to all instances (see Tagging instances (https://cloud.google.com/compute/docs/label-or-tag-resources#tags)).
    • Each invocation of this argument appends the given value to the array.
  • zone-uri=accusam

    • Optional. The Compute Engine zone where the Dataproc cluster will be located. If omitted, the service will pick a zone in the cluster's Compute Engine region. On a get request, zone will always be present. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone], projects/[project_id]/zones/[zone], or [zone].
  • ..gke-cluster-config gke-cluster-target=sit

    • Optional. A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster (the GKE cluster can be zonal or regional). Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
  • namespaced-gke-deployment-target cluster-namespace=voluptua.
    • Optional. A namespace within the GKE cluster to deploy into.
  • target-gke-cluster=kasd

    • Optional. The target GKE cluster to deploy to. Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
  • ...lifecycle-config auto-delete-time=no

    • Optional. The time when the cluster will be auto-deleted (see JSON representation of Timestamp (https://developers.google.com/protocol-buffers/docs/proto3#json)).
  • auto-delete-ttl=amet.
    • Optional. The lifetime of the cluster. The cluster will be auto-deleted at the end of this period. Minimum value is 10 minutes; maximum value is 14 days (see JSON representation of Duration (https://developers.google.com/protocol-buffers/docs/proto3#json)).
  • idle-delete-ttl=aliquyam
    • Optional. The duration to keep the cluster alive while idling (when no jobs are running). Passing this threshold will cause the cluster to be deleted. Minimum value is 5 minutes; maximum value is 14 days (see JSON representation of Duration (https://developers.google.com/protocol-buffers/docs/proto3#json)).
  • idle-start-time=accusam

    • Output only. The time when the cluster became idle (most recent job finished) and became eligible for deletion due to idleness (see JSON representation of Timestamp (https://developers.google.com/protocol-buffers/docs/proto3#json)).
  • ..master-config.disk-config boot-disk-size-gb=43

    • Optional. Size in GB of the boot disk (default is 500GB).
  • boot-disk-type=duo
    • Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types (https://cloud.google.com/compute/docs/disks#disk-types).
  • local-ssd-interface=kasd
    • Optional. Interface type of local SSDs (default is "scsi"). Valid values: "scsi" (Small Computer System Interface), "nvme" (Non-Volatile Memory Express). See local SSD performance (https://cloud.google.com/compute/docs/disks/local-ssd#performance).
  • num-local-ssds=76

    • Optional. Number of attached SSDs, from 0 to 8 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. Note: Local SSD options may vary by machine type and number of vCPUs selected.
  • .. image-uri=no

    • Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or image family. Image examples: https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id], projects/[project_id]/global/images/[image-id], or image-id. Image family examples (Dataproc will use the most recent image from the family): https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name] or projects/[project_id]/global/images/family/[custom-image-family-name]. If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.
  • instance-names=kasd
    • Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
    • Each invocation of this argument appends the given value to the array.
  • is-preemptible=true
    • Output only. Specifies that this instance group contains preemptible instances.
  • machine-type-uri=gubergren
    • Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, or n1-standard-2. Auto Zone Exception: If you are using the Dataproc Auto Zone Placement (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone#using_auto_zone_placement) feature, you must use the short name of the machine type resource, for example, n1-standard-2.
  • managed-group-config instance-group-manager-name=accusam
    • Output only. The name of the Instance Group Manager for this group.
  • instance-group-manager-uri=lorem
    • Output only. The partial URI to the instance group manager for this group. E.g. projects/my-project/regions/us-central1/instanceGroupManagers/my-igm.
  • instance-template-name=dolor

    • Output only. The name of the Instance Template used for the Managed Instance Group.
  • .. min-cpu-platform=sanctus

    • Optional. Specifies the minimum cpu platform for the Instance Group. See Dataproc -> Minimum CPU Platform (https://cloud.google.com/dataproc/docs/concepts/compute/dataproc-min-cpu).
  • min-num-instances=91
    • Optional. The minimum number of primary worker instances to create. If min_num_instances is set, cluster creation will succeed if the number of primary workers created is at least equal to the min_num_instances number. Example: Cluster creation request with num_instances = 5 and min_num_instances = 3: If 4 VMs are created and 1 instance fails, the failed VM is deleted. The cluster is resized to 4 instances and placed in a RUNNING state. If 2 instances are created and 3 instances fail, the cluster is placed in an ERROR state. The failed VMs are not deleted.
  • num-instances=22
    • Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1.
  • preemptibility=amet
    • Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed. The default value for secondary instances is PREEMPTIBLE.
  • startup-config required-registration-fraction=0.5324707971081809

    • Optional. The config setting that lets cluster creation or update succeed only after required_registration_fraction of instances are up and running. This configuration currently applies only to secondary workers. The cluster will fail if required_registration_fraction of instances are not available. This will include instance creation, agent registration, and service registration (if enabled).
  • ...metastore-config dataproc-metastore-service=lorem

    • Required. Resource name of an existing Dataproc Metastore service. Example: projects/[project_id]/locations/[dataproc_region]/services/[service-name]
  • ..secondary-worker-config.disk-config boot-disk-size-gb=12

    • Optional. Size in GB of the boot disk (default is 500GB).
  • boot-disk-type=consetetur
    • Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types (https://cloud.google.com/compute/docs/disks#disk-types).
  • local-ssd-interface=amet
    • Optional. Interface type of local SSDs (default is "scsi"). Valid values: "scsi" (Small Computer System Interface), "nvme" (Non-Volatile Memory Express). See local SSD performance (https://cloud.google.com/compute/docs/disks/local-ssd#performance).
  • num-local-ssds=51

    • Optional. Number of attached SSDs, from 0 to 8 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. Note: Local SSD options may vary by machine type and number of vCPUs selected.
  • .. image-uri=et

    • Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or image family. Image examples: https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id], projects/[project_id]/global/images/[image-id], or image-id. Image family examples (Dataproc will use the most recent image from the family): https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name] or projects/[project_id]/global/images/family/[custom-image-family-name]. If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.
  • instance-names=elitr
    • Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
    • Each invocation of this argument appends the given value to the array.
  • is-preemptible=false
    • Output only. Specifies that this instance group contains preemptible instances.
  • machine-type-uri=et
    • Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, or n1-standard-2. Auto Zone Exception: If you are using the Dataproc Auto Zone Placement (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone#using_auto_zone_placement) feature, you must use the short name of the machine type resource, for example, n1-standard-2.
  • managed-group-config instance-group-manager-name=lorem
    • Output only. The name of the Instance Group Manager for this group.
  • instance-group-manager-uri=no
    • Output only. The partial URI to the instance group manager for this group. E.g. projects/my-project/regions/us-central1/instanceGroupManagers/my-igm.
  • instance-template-name=sea

    • Output only. The name of the Instance Template used for the Managed Instance Group.
  • .. min-cpu-platform=et

    • Optional. Specifies the minimum cpu platform for the Instance Group. See Dataproc -> Minimum CPU Platform (https://cloud.google.com/dataproc/docs/concepts/compute/dataproc-min-cpu).
  • min-num-instances=23
    • Optional. The minimum number of primary worker instances to create. If min_num_instances is set, cluster creation will succeed if the number of primary workers created is at least equal to the min_num_instances number. Example: Cluster creation request with num_instances = 5 and min_num_instances = 3: If 4 VMs are created and 1 instance fails, the failed VM is deleted. The cluster is resized to 4 instances and placed in a RUNNING state. If 2 instances are created and 3 instances fail, the cluster is placed in an ERROR state. The failed VMs are not deleted.
  • num-instances=51
    • Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1.
  • preemptibility=ea
    • Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed. The default value for secondary instances is PREEMPTIBLE.
  • startup-config required-registration-fraction=0.7508283376265702

    • Optional. The config setting that lets cluster creation or update succeed only after required_registration_fraction of instances are up and running. This configuration currently applies only to secondary workers. The cluster will fail if required_registration_fraction of instances are not available. This will include instance creation, agent registration, and service registration (if enabled).
  • ...security-config.identity-config user-service-account-mapping=key=vero

    • Required. Map of user to service account.
    • the value will be associated with the given key
  • ..kerberos-config cross-realm-trust-admin-server=sanctus

    • Optional. The admin server (IP or hostname) for the remote trusted realm in a cross realm trust relationship.
  • cross-realm-trust-kdc=dolores
    • Optional. The KDC (IP or hostname) for the remote trusted realm in a cross realm trust relationship.
  • cross-realm-trust-realm=elitr
    • Optional. The remote realm the Dataproc on-cluster KDC will trust, should the user enable cross realm trust.
  • cross-realm-trust-shared-password-uri=diam
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the shared password between the on-cluster Kerberos realm and the remote trusted realm, in a cross realm trust relationship.
  • enable-kerberos=false
    • Optional. Flag to indicate whether to Kerberize the cluster (default: false). Set this field to true to enable Kerberos on a cluster.
  • kdc-db-key-uri=lorem
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the master key of the KDC database.
  • key-password-uri=magna
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided key. For the self-signed certificate, this password is generated by Dataproc.
  • keystore-password-uri=duo
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided keystore. For the self-signed certificate, this password is generated by Dataproc.
  • keystore-uri=et
    • Optional. The Cloud Storage URI of the keystore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate.
  • kms-key-uri=no
    • Optional. The URI of the KMS key used to encrypt sensitive files.
  • realm=sadipscing
    • Optional. The name of the on-cluster Kerberos realm. If not specified, the uppercased domain of hostnames will be the realm.
  • root-principal-password-uri=sit
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the root principal password.
  • tgt-lifetime-hours=81
    • Optional. The lifetime of the ticket granting ticket, in hours. If not specified, or user specifies 0, then default value 10 will be used.
  • truststore-password-uri=stet
    • Optional. The Cloud Storage URI of a KMS encrypted file containing the password to the user provided truststore. For the self-signed certificate, this password is generated by Dataproc.
  • truststore-uri=diam

    • Optional. The Cloud Storage URI of the truststore file used for SSL encryption. If not provided, Dataproc will provide a self-signed certificate.
  • ...software-config image-version=accusam

    • Optional. The version of software inside the cluster. It must be one of the supported Dataproc Versions (https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions#supported_dataproc_versions), such as "1.2" (including a subminor version, such as "1.2.29"), or the "preview" version (https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions#other_versions). If unspecified, it defaults to the latest Debian version.
  • optional-components=dolore
    • Optional. The set of components to activate on the cluster.
    • Each invocation of this argument appends the given value to the array.
  • properties=key=eirmod

    • Optional. The properties to set on daemon config files. Property keys are specified in prefix:property format, for example core:hadoop.tmp.dir. The following are supported prefixes and their mappings: capacity-scheduler: capacity-scheduler.xml; core: core-site.xml; distcp: distcp-default.xml; hdfs: hdfs-site.xml; hive: hive-site.xml; mapred: mapred-site.xml; pig: pig.properties; spark: spark-defaults.conf; yarn: yarn-site.xml. For more information, see Cluster properties (https://cloud.google.com/dataproc/docs/concepts/cluster-properties).
    • the value will be associated with the given key
  • .. temp-bucket=lorem

    • Optional. A Cloud Storage bucket used to store ephemeral cluster and jobs data, such as Spark and MapReduce history files. If you do not specify a temp bucket, Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's temp bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket. The default bucket has a TTL of 90 days, but you can use any TTL (or none) if you specify a bucket (see Dataproc staging and temp buckets (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket)). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
  • worker-config.disk-config boot-disk-size-gb=14
    • Optional. Size in GB of the boot disk (default is 500GB).
  • boot-disk-type=est
    • Optional. Type of the boot disk (default is "pd-standard"). Valid values: "pd-balanced" (Persistent Disk Balanced Solid State Drive), "pd-ssd" (Persistent Disk Solid State Drive), or "pd-standard" (Persistent Disk Hard Disk Drive). See Disk types (https://cloud.google.com/compute/docs/disks#disk-types).
  • local-ssd-interface=sadipscing
    • Optional. Interface type of local SSDs (default is "scsi"). Valid values: "scsi" (Small Computer System Interface), "nvme" (Non-Volatile Memory Express). See local SSD performance (https://cloud.google.com/compute/docs/disks/local-ssd#performance).
  • num-local-ssds=98

    • Optional. Number of attached SSDs, from 0 to 8 (default is 0). If SSDs are not attached, the boot disk is used to store runtime logs and HDFS (https://hadoop.apache.org/docs/r1.2.1/hdfs_user_guide.html) data. If one or more SSDs are attached, this runtime bulk data is spread across them, and the boot disk contains only basic config and installed binaries. Note: Local SSD options may vary by machine type and number of vCPUs selected.
  • .. image-uri=ipsum

    • Optional. The Compute Engine image resource used for cluster instances. The URI can represent an image or image family. Image examples: https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/[image-id], projects/[project_id]/global/images/[image-id], image-id. Image family examples (Dataproc will use the most recent image from the family): https://www.googleapis.com/compute/v1/projects/[project_id]/global/images/family/[custom-image-family-name], projects/[project_id]/global/images/family/[custom-image-family-name]. If the URI is unspecified, it will be inferred from SoftwareConfig.image_version or the system default.
  • instance-names=amet.
    • Output only. The list of instance names. Dataproc derives the names from cluster_name, num_instances, and the instance group.
    • Each invocation of this argument appends the given value to the array.
  • is-preemptible=false
    • Output only. Specifies that this instance group contains preemptible instances.
  • machine-type-uri=ut
    • Optional. The Compute Engine machine type used for cluster instances. A full URL, partial URI, or short name is valid. Examples: https://www.googleapis.com/compute/v1/projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, projects/[project_id]/zones/[zone]/machineTypes/n1-standard-2, n1-standard-2. Auto Zone Exception: If you are using the Dataproc Auto Zone Placement (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/auto-zone#using_auto_zone_placement) feature, you must use the short name of the machine type resource, for example, n1-standard-2.
  • managed-group-config instance-group-manager-name=et
    • Output only. The name of the Instance Group Manager for this group.
  • instance-group-manager-uri=et
    • Output only. The partial URI to the instance group manager for this group. E.g. projects/my-project/regions/us-central1/instanceGroupManagers/my-igm.
  • instance-template-name=at

    • Output only. The name of the Instance Template used for the Managed Instance Group.
  • .. min-cpu-platform=sed

    • Optional. Specifies the minimum cpu platform for the Instance Group. See Dataproc -> Minimum CPU Platform (https://cloud.google.com/dataproc/docs/concepts/compute/dataproc-min-cpu).
  • min-num-instances=36
    • Optional. The minimum number of primary worker instances to create. If min_num_instances is set, cluster creation will succeed if the number of primary workers created is at least equal to the min_num_instances number. Example: Cluster creation request with num_instances = 5 and min_num_instances = 3: If 4 VMs are created and 1 instance fails, the failed VM is deleted. The cluster is resized to 4 instances and placed in a RUNNING state. If 2 instances are created and 3 instances fail, the cluster is placed in an ERROR state. The failed VMs are not deleted.
  • num-instances=56
    • Optional. The number of VM instances in the instance group. For HA cluster master_config groups, must be set to 3. For standard cluster master_config groups, must be set to 1.
  • preemptibility=voluptua.
    • Optional. Specifies the preemptibility of the instance group. The default value for master and worker groups is NON_PREEMPTIBLE. This default cannot be changed. The default value for secondary instances is PREEMPTIBLE.
  • startup-config required-registration-fraction=0.5213406658219825

    • Optional. The config setting that makes cluster creation or update succeed only after required_registration_fraction of instances are up and running. This configuration currently applies only to secondary workers. The cluster will fail if required_registration_fraction of instances are not available. This includes instance creation, agent registration, and service registration (if enabled).
  • .... labels=key=sit

    • Optional. The labels to associate with this cluster. Label keys must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). Label values may be empty, but, if present, must contain 1 to 63 characters, and must conform to RFC 1035 (https://www.ietf.org/rfc/rfc1035.txt). No more than 32 labels can be associated with a cluster.
    • the value will be associated with the given key
  • metrics hdfs-metrics=key=rebum.
    • The HDFS metrics.
    • the value will be associated with the given key
  • yarn-metrics=key=sanctus

    • YARN metrics.
    • the value will be associated with the given key
  • .. project-id=no

    • Required. The Google Cloud Platform project ID that the cluster belongs to.
  • status detail=stet
    • Optional. Output only. Details of cluster's state.
  • state=diam
    • Output only. The cluster's state.
  • state-start-time=ipsum
    • Output only. Time when this state was entered (see JSON representation of Timestamp (https://developers.google.com/protocol-buffers/docs/proto3#json)).
  • substate=eos

    • Output only. Additional state information that includes status reported by the agent.
  • ..virtual-cluster-config.auxiliary-services-config.metastore-config dataproc-metastore-service=erat

    • Required. Resource name of an existing Dataproc Metastore service. Example: projects/[project_id]/locations/[dataproc_region]/services/[service-name]
  • ..spark-history-server-config dataproc-cluster=at

    • Optional. Resource name of an existing Dataproc Cluster to act as a Spark History Server for the workload. Example: projects/[project_id]/regions/[region]/clusters/[cluster_name]
  • ...kubernetes-cluster-config.gke-cluster-config gke-cluster-target=amet

    • Optional. A target GKE cluster to deploy to. It must be in the same project and region as the Dataproc cluster (the GKE cluster can be zonal or regional). Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
  • namespaced-gke-deployment-target cluster-namespace=justo
    • Optional. A namespace within the GKE cluster to deploy into.
  • target-gke-cluster=justo

    • Optional. The target GKE cluster to deploy to. Format: 'projects/{project}/locations/{location}/clusters/{cluster_id}'
  • ... kubernetes-namespace=eirmod

    • Optional. A namespace within the Kubernetes cluster to deploy into. If this namespace does not exist, it is created. If it exists, Dataproc verifies that another Dataproc VirtualCluster is not installed into it. If not specified, the name of the Dataproc Cluster is used.
  • kubernetes-software-config component-version=key=duo
    • The components that should be installed in this Dataproc cluster. The key must be a string from the KubernetesComponent enumeration. The value is the version of the software to be installed. At least one entry must be specified.
    • the value will be associated with the given key
  • properties=key=sanctus

    • The properties to set on daemon config files. Property keys are specified in prefix:property format, for example spark:spark.kubernetes.container.image. The following are supported prefixes and their mappings: spark: spark-defaults.conf. For more information, see Cluster properties (https://cloud.google.com/dataproc/docs/concepts/cluster-properties).
    • the value will be associated with the given key
  • ... staging-bucket=aliquyam

    • Optional. A Cloud Storage bucket used to stage job dependencies, config files, and job driver console output. If you do not specify a staging bucket, Cloud Dataproc will determine a Cloud Storage location (US, ASIA, or EU) for your cluster's staging bucket according to the Compute Engine zone where your cluster is deployed, and then create and manage this project-level, per-location bucket (see Dataproc staging and temp buckets (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket)). This field requires a Cloud Storage bucket name, not a gs://... URI to a Cloud Storage bucket.
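
The min-num-instances rule described in the list above can be illustrated with a short sketch. This is only a model of the documented behaviour, not Dataproc's implementation; the names Outcome and creation_outcome are hypothetical and exist only for this example.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    """Hypothetical result type used only for this illustration."""
    state: str                 # "RUNNING" or "ERROR"
    failed_vms_deleted: bool   # whether failed VMs are cleaned up


def creation_outcome(num_instances: int, min_num_instances: int, created: int) -> Outcome:
    """Model the documented outcome for a primary worker group.

    num_instances is the requested size, min_num_instances the minimum
    that must come up, and created the number of VMs that succeeded.
    """
    if not 0 <= created <= num_instances:
        raise ValueError("created must be between 0 and num_instances")
    if created >= min_num_instances:
        # Enough workers came up: failed VMs are deleted and the
        # cluster is resized to the number that succeeded.
        return Outcome("RUNNING", True)
    # Too few workers came up: the cluster enters ERROR and the
    # failed VMs are left in place.
    return Outcome("ERROR", False)
```

With the values from the description, creation_outcome(5, 3, 4) models the RUNNING case and creation_outcome(5, 3, 2) models the ERROR case.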

About Cursors

The cursor position is key to comfortably set complex nested structures. The following rules apply:

  • The cursor position is always set relative to the current one, unless the field name starts with the . character. Fields can be nested, such as in -r f.s.o
  • The cursor position is set relative to the top-level structure if it starts with ., e.g. -r .s.s
  • You can also set nested fields without setting the cursor explicitly. For example, to set a value relative to the current cursor position, you would specify -r struct.sub_struct=bar.
  • You can move the cursor one level up by using .. (two dots). Each additional . moves it up one additional level. E.g. ... would go two levels up.
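
The cursor rules above can be modelled in a few lines of Python. This is a simplified sketch rather than the tool's actual parser; the function name move_cursor is hypothetical. It assumes that .. moves up one level and that each further dot moves up one more, so that .... moves up three levels, as in the .... labels=key=sit entry of the field listing.

```python
from typing import List


def move_cursor(current: List[str], spec: str) -> List[str]:
    """Return the cursor path that results from applying spec to current.

    Simplified model of the documented rules:
      - no leading dot: descend relative to the current position
      - one leading dot: start again from the top-level structure
      - two or more leading dots: move up (dots - 1) levels, then descend
    """
    path = list(current)
    leading = len(spec) - len(spec.lstrip("."))
    rest = spec[leading:]

    if leading == 1:
        path = []                      # absolute: reset to the top level
    elif leading >= 2:
        up = leading - 1
        if up > len(path):
            raise ValueError("cannot move above the top-level structure")
        path = path[:len(path) - up]   # drop the last `up` components

    # Descend into any field names that follow the dots.
    path.extend(part for part in rest.split(".") if part)
    return path
```

For example, starting from an empty cursor, applying config.worker-config.disk-config, then .., then startup-config leaves the cursor at config.worker-config.startup-config, and a following .... returns it to the top level.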

Optional Output Flags

The method's return value is a JSON-encoded structure, which will be written to standard output by default.

  • -o out
    • out specifies the destination to which the server's result is written. It will be a JSON-encoded structure. The destination may be - to indicate standard output, or a filepath that is to contain the received bytes. If unset, it defaults to standard output.

Optional Method Properties

You may set the following properties to further configure the call. Please note that -p is followed by one or more key-value pairs, and is called like this: -p k1=v1 k2=v2, even though the listing below repeats the -p for completeness.

  • -p action-on-failed-primary-workers=string

    • Optional. Failure action when primary worker creation fails.
  • -p request-id=string

    • Optional. A unique ID used to identify the request. If the server receives two CreateClusterRequest (https://cloud.google.com/dataproc/docs/reference/rpc/google.cloud.dataproc.v1#google.cloud.dataproc.v1.CreateClusterRequest) messages with the same ID, then the second request will be ignored and the first google.longrunning.Operation created and stored in the backend is returned. It is recommended to always set this value to a UUID (https://en.wikipedia.org/wiki/Universally_unique_identifier). The ID must contain only letters (a-z, A-Z), numbers (0-9), underscores (_), and hyphens (-). The maximum length is 40 characters.
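
As a hedged illustration of the request-id constraints above, the following Python sketch generates a UUID and checks a candidate ID against the documented character set and length limit. The helper names are hypothetical, and rejecting the empty string is an assumption: the description states only a maximum length.

```python
import re
import uuid

# Letters, digits, underscores, and hyphens; at most 40 characters.
# Requiring at least one character is an assumption of this sketch.
_REQUEST_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,40}")


def is_valid_request_id(request_id: str) -> bool:
    """Check a candidate ID against the documented constraints."""
    return _REQUEST_ID_PATTERN.fullmatch(request_id) is not None


def new_request_id() -> str:
    """Return a random UUID string (36 characters: hex digits and hyphens)."""
    return str(uuid.uuid4())
```

A value produced by new_request_id() fits the constraints and could be supplied as -p request-id=<value>.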

Optional General Properties

The following properties can configure any call, and are not specific to this method.

  • -p $-xgafv=string

    • V1 error format.
  • -p access-token=string

    • OAuth access token.
  • -p alt=string

    • Data format for response.
  • -p callback=string

    • JSONP
  • -p fields=string

    • Selector specifying which fields to include in a partial response.
  • -p key=string

    • API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
  • -p oauth-token=string

    • OAuth 2.0 token for the current user.
  • -p pretty-print=boolean

    • Returns response with indentations and line breaks.
  • -p quota-user=string

    • Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
  • -p upload-type=string

    • Legacy upload protocol for media (e.g. "media", "multipart").
  • -p upload-protocol=string

    • Upload protocol for media (e.g. "raw", "multipart").