Creates a pipeline. For a batch pipeline, you can pass scheduler information. Data Pipelines uses the scheduler information to create an internal scheduler that runs jobs periodically. If the internal scheduler is not configured, you can use RunPipeline to run jobs.

Scopes

You will need authorization for the https://www.googleapis.com/auth/cloud-platform scope to make a valid call.

If unset, the scope for this method defaults to https://www.googleapis.com/auth/cloud-platform. You can set the scope for this method like this: datapipelines1 --scope <scope> projects locations-pipelines-create ...
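
For example, to make this call with the scope passed explicitly (the project and location IDs below are placeholders):

  datapipelines1 --scope https://www.googleapis.com/auth/cloud-platform projects locations-pipelines-create projects/example-project/locations/us-central1 ...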

Required Scalar Argument

  • <parent> (string)
    • Required. The location name. For example: projects/PROJECT_ID/locations/LOCATION_ID.
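
The parent value is passed as the first positional argument after the method name, for example (with placeholder project and location IDs; the request values follow, as described in the next section):

  datapipelines1 projects locations-pipelines-create projects/example-project/locations/us-central1 ...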

Required Request Value

The request value is a data-structure with various fields. Each field may be a simple scalar or another data-structure. In the latter case it is advised to set the field-cursor to the data-structure's field to specify values more concisely.

For example, a structure like this:

GoogleCloudDatapipelinesV1Pipeline:
  create-time: string
  display-name: string
  job-count: integer
  last-update-time: string
  name: string
  pipeline-sources: { string: string }
  schedule-info:
    next-job-time: string
    schedule: string
    time-zone: string
  scheduler-service-account-email: string
  state: string
  type: string
  workload:
    dataflow-flex-template-request:
      launch-parameter:
        container-spec-gcs-path: string
        environment:
          additional-experiments: [string]
          additional-user-labels: { string: string }
          enable-streaming-engine: boolean
          flexrs-goal: string
          ip-configuration: string
          kms-key-name: string
          machine-type: string
          max-workers: integer
          network: string
          num-workers: integer
          service-account-email: string
          subnetwork: string
          temp-location: string
          worker-region: string
          worker-zone: string
          zone: string
        job-name: string
        launch-options: { string: string }
        parameters: { string: string }
        transform-name-mappings: { string: string }
        update: boolean
      location: string
      project-id: string
      validate-only: boolean
    dataflow-launch-template-request:
      gcs-path: string
      launch-parameters:
        environment:
          additional-experiments: [string]
          additional-user-labels: { string: string }
          bypass-temp-dir-validation: boolean
          enable-streaming-engine: boolean
          ip-configuration: string
          kms-key-name: string
          machine-type: string
          max-workers: integer
          network: string
          num-workers: integer
          service-account-email: string
          subnetwork: string
          temp-location: string
          worker-region: string
          worker-zone: string
          zone: string
        job-name: string
        parameters: { string: string }
        transform-name-mapping: { string: string }
        update: boolean
      location: string
      project-id: string
      validate-only: boolean

can be set completely with the following arguments which are assumed to be executed in the given order. Note how the cursor position is adjusted to the respective structures, allowing simple field names to be used most of the time.

  • -r . create-time=et
    • Output only. Immutable. The timestamp when the pipeline was initially created. Set by the Data Pipelines service.
  • display-name=magna
    • Required. The display name of the pipeline. It can contain only letters ([A-Za-z]), numbers ([0-9]), hyphens (-), and underscores (_).
  • job-count=90
    • Output only. Number of jobs.
  • last-update-time=ipsum
    • Output only. Immutable. The timestamp when the pipeline was last modified. Set by the Data Pipelines service.
  • name=voluptua.
    • The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID. * PROJECT_ID can contain letters ([A-Za-z]), numbers ([0-9]), hyphens (-), colons (:), and periods (.). For more information, see Identifying projects. * LOCATION_ID is the canonical ID for the pipeline's location. The list of available locations can be obtained by calling google.cloud.location.Locations.ListLocations. Note that the Data Pipelines service is not available in all regions. It depends on Cloud Scheduler, an App Engine application, so it's only available in App Engine regions. * PIPELINE_ID is the ID of the pipeline. Must be unique for the selected project and location.
  • pipeline-sources=key=at
    • Immutable. The sources of the pipeline (for example, Dataplex). The keys and values are set by the corresponding sources during pipeline creation.
    • the value will be associated with the given key
  • schedule-info next-job-time=sanctus
    • Output only. When the next Scheduler job is going to run.
  • schedule=sed
    • Unix-cron format of the schedule. This information is retrieved from the linked Cloud Scheduler.
  • time-zone=amet.

    • Timezone ID. This matches the timezone IDs used by the Cloud Scheduler API. If empty, UTC time is assumed.
  • .. scheduler-service-account-email=takimata

    • Optional. A service account email to be used with the Cloud Scheduler job. If not specified, the default compute engine service account will be used.
  • state=amet.
    • Required. The state of the pipeline. When the pipeline is created, the state is set to 'PIPELINE_STATE_ACTIVE' by default. State changes can be requested by setting the state to stopping, paused, or resuming. State cannot be changed through UpdatePipeline requests.
  • type=duo
    • Required. The type of the pipeline. This field affects the scheduling of the pipeline and the type of metrics to show for the pipeline.
  • workload.dataflow-flex-template-request.launch-parameter container-spec-gcs-path=ipsum
    • Cloud Storage path to a file with a JSON-serialized ContainerSpec as content.
  • environment additional-experiments=gubergren
    • Additional experiment flags for the job.
    • Each invocation of this argument appends the given value to the array.
  • additional-user-labels=key=lorem
    • Additional user labels to be specified for the job. Keys and values must follow the restrictions specified in the labeling restrictions. An object containing a list of key/value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }.
    • the value will be associated with the given key
  • enable-streaming-engine=false
    • Whether to enable Streaming Engine for the job.
  • flexrs-goal=dolor
    • Set FlexRS goal for the job. https://cloud.google.com/dataflow/docs/guides/flexrs
  • ip-configuration=ea
    • Configuration for VM IPs.
  • kms-key-name=ipsum
    • Name for the Cloud KMS key for the job. Key format is: projects//locations//keyRings//cryptoKeys/
  • machine-type=invidunt
    • The machine type to use for the job. Defaults to the value from the template if not specified.
  • max-workers=54
    • The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.
  • network=duo
    • Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".
  • num-workers=51
    • The initial number of Compute Engine instances for the job.
  • service-account-email=sed
    • The email address of the service account to run the job as.
  • subnetwork=ut
    • Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.
  • temp-location=gubergren
    • The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with gs://.
  • worker-region=rebum.
    • The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane region.
  • worker-zone=est
    • The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane region is chosen based on available capacity. If both worker_zone and zone are set, worker_zone takes precedence.
  • zone=ipsum

    • The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence.
  • .. job-name=ipsum

    • Required. The job name to use for the created job. For an update job request, the job name should be the same as the existing running job.
  • launch-options=key=est
    • Launch options for this Flex Template job. This is a common set of options across languages and templates. This should not be used to pass job parameters.
    • the value will be associated with the given key
  • parameters=key=gubergren
    • The parameters for the Flex Template. Example: {"num_workers":"5"}
    • the value will be associated with the given key
  • transform-name-mappings=key=ea
    • Use this to pass transform name mappings for streaming update jobs. Example: {"oldTransformName":"newTransformName",...}
    • the value will be associated with the given key
  • update=false

    • Set this to true if you are sending a request to update a running streaming job. When set, the job name should be the same as the running job.
  • .. location=lorem

    • Required. The regional endpoint (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to which to direct the request. For example, us-central1, us-west1.
  • project-id=eos
    • Required. The ID of the Cloud Platform project that the job belongs to.
  • validate-only=false

    • If true, the request is validated but not actually executed. Defaults to false.
  • ..dataflow-launch-template-request gcs-path=sed

    • A Cloud Storage path to the template from which to create the job. Must be a valid Cloud Storage URL, beginning with 'gs://'.
  • launch-parameters.environment additional-experiments=duo
    • Additional experiment flags for the job.
    • Each invocation of this argument appends the given value to the array.
  • additional-user-labels=key=sed
    • Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. An object containing a list of key/value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }.
    • the value will be associated with the given key
  • bypass-temp-dir-validation=true
    • Whether to bypass the safety checks for the job's temporary directory. Use with caution.
  • enable-streaming-engine=true
    • Whether to enable Streaming Engine for the job.
  • ip-configuration=et
    • Configuration for VM IPs.
  • kms-key-name=et
    • Name for the Cloud KMS key for the job. The key format is: projects//locations//keyRings//cryptoKeys/
  • machine-type=vero
    • The machine type to use for the job. Defaults to the value from the template if not specified.
  • max-workers=70
    • The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.
  • network=sed
    • Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".
  • num-workers=81
    • The initial number of Compute Engine instances for the job.
  • service-account-email=dolore
    • The email address of the service account to run the job as.
  • subnetwork=et
    • Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.
  • temp-location=voluptua.
    • The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with gs://.
  • worker-region=amet.
    • The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane region.
  • worker-zone=consetetur
    • The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane's region is chosen based on available capacity. If both worker_zone and zone are set, worker_zone takes precedence.
  • zone=diam

    • The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence.
  • .. job-name=dolor

    • Required. The job name to use for the created job.
  • parameters=key=et
    • The runtime parameters to pass to the job.
    • the value will be associated with the given key
  • transform-name-mapping=key=et
    • Map of transform name prefixes of the job to be replaced to the corresponding name prefixes of the new job. Only applicable when updating a pipeline.
    • the value will be associated with the given key
  • update=false

    • If set, replace the existing pipeline with the name specified by jobName with this pipeline, preserving state.
  • .. location=stet

    • The regional endpoint (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to which to direct the request.
  • project-id=dolor
    • Required. The ID of the Cloud Platform project that the job belongs to.
  • validate-only=false
    • If true, the request is validated but not actually executed. Defaults to false.
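
Putting a few of these arguments together, a minimal create call for a Flex Template batch pipeline could be sketched as follows. All IDs, paths, and field values are placeholders, only a small subset of the fields listed above is set, and the type value assumes the API's PIPELINE_TYPE_BATCH enum spelling:

  datapipelines1 projects locations-pipelines-create projects/example-project/locations/us-central1 \
    -r . display-name=example-batch type=PIPELINE_TYPE_BATCH \
    workload.dataflow-flex-template-request project-id=example-project location=us-central1 \
    launch-parameter job-name=example-job container-spec-gcs-path=gs://example-bucket/templates/spec.json

Here the bare tokens workload.dataflow-flex-template-request and launch-parameter only move the field cursor; the field=value tokens that follow them are set relative to the new cursor position, as explained in the next section.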

About Cursors

The cursor position is key to comfortably set complex nested structures. The following rules apply:

  • The cursor position is always set relative to the current one, unless the field name starts with the . character. Fields can be nested such as in -r f.s.o .
  • The cursor position is set relative to the top-level structure if it starts with ., e.g. -r .s.s
  • You can also set nested fields without setting the cursor explicitly. For example, to set a value relative to the current cursor position, you would specify -r struct.sub_struct=bar.
  • You can move the cursor one level up with the .. argument. Each additional . moves it up one more level; for example, ... moves the cursor up two levels.
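
As a compact illustration of these rules (all values are placeholders):

  -r . display-name=example \
    workload.dataflow-flex-template-request.launch-parameter job-name=example-job \
    .. location=us-central1 \
    .schedule-info schedule="0 2 * * *"

The .. token moves the cursor from launch-parameter back up to dataflow-flex-template-request, where location is set, while .schedule-info repositions the cursor at the top-level schedule-info structure.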

Optional Output Flags

The method's return value is a JSON-encoded structure, which will be written to standard output by default.

  • -o out
    • out specifies the destination to which to write the server's result. It will be a JSON-encoded structure. The destination may be - to indicate standard output, or a filepath that is to contain the received bytes. If unset, it defaults to standard output.
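
For example, to write the created pipeline resource to a file instead of standard output (the file name is arbitrary):

  datapipelines1 projects locations-pipelines-create ... -o pipeline.json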

Optional General Properties

The following properties can configure any call, and are not specific to this method.

  • -p $-xgafv=string

    • V1 error format.
  • -p access-token=string

    • OAuth access token.
  • -p alt=string

    • Data format for response.
  • -p callback=string

    • JSONP
  • -p fields=string

    • Selector specifying which fields to include in a partial response.
  • -p key=string

    • API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
  • -p oauth-token=string

    • OAuth 2.0 token for the current user.
  • -p pretty-print=boolean

    • Returns response with indentations and line breaks.
  • -p quota-user=string

    • Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
  • -p upload-type=string

    • Legacy upload protocol for media (e.g. "media", "multipart").
  • -p upload-protocol=string

    • Upload protocol for media (e.g. "raw", "multipart").
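
These properties are appended to a call with repeated -p flags, for example (the field selector shown is only an illustration of the partial-response syntax):

  datapipelines1 projects locations-pipelines-create ... -p fields=name,state -p pretty-print=true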