Creates a new job to inspect storage or calculate risk metrics. See https://cloud.google.com/sensitive-data-protection/docs/inspecting-storage and https://cloud.google.com/sensitive-data-protection/docs/compute-risk-analysis to learn more. When no InfoTypes or CustomInfoTypes are specified in inspect jobs, the system will automatically choose what detectors to run. By default this may be all types, but may change over time as detectors are updated.
Scopes
You will need authorization for the https://www.googleapis.com/auth/cloud-platform scope to make a valid call.
If unset, the scope for this method defaults to https://www.googleapis.com/auth/cloud-platform.
You can set the scope for this method like this: dlp2 --scope <scope> projects locations-dlp-jobs-create ...
Required Scalar Argument
- <parent> (string)
- Required. Parent resource name. The format of this value varies depending on whether you have specified a processing location:
  + Projects scope, location specified: projects/PROJECT_ID/locations/LOCATION_ID
  + Projects scope, no location specified (defaults to global): projects/PROJECT_ID
  The following example parent string specifies a parent project with the identifier example-project, and specifies the europe-west3 location for processing data: parent=projects/example-project/locations/europe-west3
Required Request Value
The request value is a data-structure with various fields. Each field may be a simple scalar or another data-structure. In the latter case it is advised to set the field-cursor to the data-structure's field to specify values more concisely.
For example, a structure like this:
GooglePrivacyDlpV2CreateDlpJobRequest:
inspect-job:
inspect-config:
content-options: [string]
exclude-info-types: boolean
include-quote: boolean
limits:
max-findings-per-item: integer
max-findings-per-request: integer
min-likelihood: string
inspect-template-name: string
storage-config:
big-query-options:
rows-limit: string
rows-limit-percent: integer
sample-method: string
table-reference:
dataset-id: string
project-id: string
table-id: string
cloud-storage-options:
bytes-limit-per-file: string
bytes-limit-per-file-percent: integer
file-set:
regex-file-set:
bucket-name: string
exclude-regex: [string]
include-regex: [string]
url: string
file-types: [string]
files-limit-percent: integer
sample-method: string
datastore-options:
kind:
name: string
partition-id:
namespace-id: string
project-id: string
hybrid-options:
description: string
labels: { string: string }
required-finding-label-keys: [string]
timespan-config:
enable-auto-population-of-timespan-config: boolean
end-time: string
start-time: string
timestamp-field:
name: string
job-id: string
location-id: string
risk-job:
privacy-metric:
categorical-stats-config:
field:
name: string
delta-presence-estimation-config:
region-code: string
k-anonymity-config:
entity-id:
field:
name: string
k-map-estimation-config:
region-code: string
l-diversity-config:
sensitive-attribute:
name: string
numerical-stats-config:
field:
name: string
source-table:
dataset-id: string
project-id: string
table-id: string
can be set completely with the following arguments, which are assumed to be executed in the given order. Note how the cursor position is adjusted to the respective structures, allowing simple field names to be used most of the time.
-r .inspect-job.inspect-config content-options=sanctus
- Deprecated and unused.
- Each invocation of this argument appends the given value to the array.
exclude-info-types=false
- When true, excludes type information of the findings. This is not used for data profiling.
include-quote=true
- When true, a contextual quote from the data that triggered a finding is included in the response; see Finding.quote. This is not used for data profiling.
limits max-findings-per-item=68
- Max number of findings that are returned for each item scanned. When set within an InspectContentRequest, this field is ignored. This value isn't a hard limit. If the number of findings for an item reaches this limit, the inspection of that item ends gradually, not abruptly. Therefore, the actual number of findings that Cloud DLP returns for the item can be multiple times higher than this value.
max-findings-per-request=35
- Max number of findings that are returned per request or job. If you set this field in an InspectContentRequest, the resulting maximum value is the value that you set or 3,000, whichever is lower. This value isn't a hard limit. If an inspection reaches this limit, the inspection ends gradually, not abruptly. Therefore, the actual number of findings that Cloud DLP returns can be multiple times higher than this value.
.. min-likelihood=amet
- Only returns findings equal to or above this threshold. The default is POSSIBLE. In general, the highest likelihood setting yields the fewest findings in results and the lowest chance of a false positive. For more information, see Match likelihood.
.. inspect-template-name=sea
- If provided, will be used as the default for all values in InspectConfig. inspect_config will be merged into the values persisted as part of the template.
storage-config.big-query-options rows-limit=sadipscing
- Max number of rows to scan. If the table has more rows than this value, the rest of the rows are omitted. If not set, or if set to 0, all rows will be scanned. Only one of rows_limit and rows_limit_percent can be specified. Cannot be used in conjunction with TimespanConfig.
rows-limit-percent=17
- Max percentage of rows to scan. The rest are omitted. The number of rows scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 means no limit. Defaults to 0. Only one of rows_limit and rows_limit_percent can be specified. Cannot be used in conjunction with TimespanConfig. Caution: A known issue is causing the rowsLimitPercent field to behave unexpectedly. We recommend using rowsLimit instead.
sample-method=amet
- How to sample the data.
table-reference dataset-id=invidunt
- Dataset ID of the table.
project-id=invidunt
- The Google Cloud Platform project ID of the project containing the table. If omitted, project ID is inferred from the API call.
table-id=dolores
- Name of the table.
...cloud-storage-options bytes-limit-per-file=diam
- Max number of bytes to scan from a file. If a scanned file's size is bigger than this value then the rest of the bytes are omitted. Only one of bytes_limit_per_file and bytes_limit_per_file_percent can be specified. This field can't be set if de-identification is requested. For certain file types, setting this field has no effect. For more information, see Limits on bytes scanned per file.
bytes-limit-per-file-percent=43
- Max percentage of bytes to scan from a file. The rest are omitted. The number of bytes scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 means no limit. Defaults to 0. Only one of bytes_limit_per_file and bytes_limit_per_file_percent can be specified. This field can't be set if de-identification is requested. For certain file types, setting this field has no effect. For more information, see Limits on bytes scanned per file.
file-set.regex-file-set bucket-name=sed
- The name of a Cloud Storage bucket. Required.
exclude-regex=eos
- A list of regular expressions matching file paths to exclude. All files in the bucket that match at least one of these regular expressions will be excluded from the scan. Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.
- Each invocation of this argument appends the given value to the array.
include-regex=sit
- A list of regular expressions matching file paths to include. All files in the bucket that match at least one of these regular expressions will be included in the set of files, except for those that also match an item in exclude_regex. Leaving this field empty will match all files by default (this is equivalent to including .* in the list). Regular expressions use RE2 syntax; a guide can be found under the google/re2 repository on GitHub.
- Each invocation of this argument appends the given value to the array.
.. url=et
- The Cloud Storage url of the file(s) to scan, in the format gs://<bucket>/<path>. Trailing wildcard in the path is allowed. If the url ends in a trailing slash, the bucket or directory represented by the url will be scanned non-recursively (content in sub-directories will not be scanned). This means that gs://mybucket/ is equivalent to gs://mybucket/*, and gs://mybucket/directory/ is equivalent to gs://mybucket/directory/*. Exactly one of url or regex_file_set must be set.
.. file-types=ea
- List of file type groups to include in the scan. If empty, all files are scanned and available data format processors are applied. In addition, the binary content of the selected files is always scanned as well. Images are scanned only as binary if the specified region does not support image inspection and no file_types were specified. Image inspection is restricted to 'global', 'us', 'asia', and 'europe'.
- Each invocation of this argument appends the given value to the array.
files-limit-percent=2
- Limits the number of files to scan to this percentage of the input FileSet. Number of files scanned is rounded down. Must be between 0 and 100, inclusively. Both 0 and 100 means no limit. Defaults to 0.
sample-method=sadipscing
- How to sample the data.
..datastore-options.kind name=diam
- The name of the kind.
..partition-id namespace-id=at
- If not empty, the ID of the namespace to which the entities belong.
project-id=at
- The ID of the project to which the entities belong.
...hybrid-options description=kasd
- A short description of where the data is coming from. Will be stored once in the job. 256 max length.
labels=key=magna
- To organize findings, these labels will be added to each finding. Label keys must be between 1 and 63 characters long and must conform to the following regular expression: [a-z]([-a-z0-9]*[a-z0-9])?. Label values must be between 0 and 63 characters long and must conform to the regular expression ([a-z]([-a-z0-9]*[a-z0-9])?)?. No more than 10 labels can be associated with a given finding. Examples: * "environment" : "production" * "pipeline" : "etl"
- the value will be associated with the given key
required-finding-label-keys=amet.
- These are labels that each inspection request must include within their 'finding_labels' map. Request may contain others, but any missing one of these will be rejected. Label keys must be between 1 and 63 characters long and must conform to the following regular expression: [a-z]([-a-z0-9]*[a-z0-9])?. No more than 10 keys can be required.
- Each invocation of this argument appends the given value to the array.
..timespan-config enable-auto-population-of-timespan-config=true
- When the job is started by a JobTrigger we will automatically figure out a valid start_time to avoid scanning files that have not been modified since the last time the JobTrigger executed. This will be based on the time of the execution of the last run of the JobTrigger or the timespan end_time used in the last run of the JobTrigger.
end-time=eos
- Exclude files, tables, or rows newer than this value. If not set, no upper time limit is applied.
start-time=dolore
- Exclude files, tables, or rows older than this value. If not set, no lower time limit is applied.
timestamp-field name=tempor
- Name describing the field.
..... job-id=stet
- The job id can contain uppercase and lowercase letters, numbers, and hyphens; that is, it must match the regular expression: [a-zA-Z\d-_]+. The maximum length is 100 characters. Can be empty to allow the system to generate one.
location-id=accusam
- Deprecated. This field has no effect.
risk-job.privacy-metric.categorical-stats-config.field name=et
- Name describing the field.
...delta-presence-estimation-config region-code=dolor
- ISO 3166-1 alpha-2 region code to use in the statistical modeling. Set if no column is tagged with a region-specific InfoType (like US_ZIP_5) or a region code.
..k-anonymity-config.entity-id.field name=diam
- Name describing the field.
....k-map-estimation-config region-code=elitr
- ISO 3166-1 alpha-2 region code to use in the statistical modeling. Set if no column is tagged with a region-specific InfoType (like US_ZIP_5) or a region code.
..l-diversity-config.sensitive-attribute name=sea
- Name describing the field.
...numerical-stats-config.field name=vero
- Name describing the field.
....source-table dataset-id=et
- Dataset ID of the table.
project-id=lorem
- The Google Cloud Platform project ID of the project containing the table. If omitted, project ID is inferred from the API call.
table-id=sit
- Name of the table.
About Cursors
The cursor position is key to comfortably setting complex nested structures. The following rules apply:
- The cursor position is always set relative to the current one, unless the field name starts with the '.' character. Fields can be nested, as in -r f.s.o
- The cursor position is set relative to the top-level structure if it starts with '.', e.g. -r .s.s
- You can also set nested fields without setting the cursor explicitly. For example, to set a value relative to the current cursor position, you would specify -r struct.sub_struct=bar
- You can move the cursor one level up by using '..'; each additional '.' moves it up one additional level, e.g. '...' would go three levels up.
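Putting the pieces together, the following is a minimal sketch of a complete invocation (the parent, bucket, and field values are illustrative placeholders rather than required settings):

    dlp2 --scope https://www.googleapis.com/auth/cloud-platform \
        projects locations-dlp-jobs-create projects/example-project/locations/europe-west3 \
        -r .inspect-job.inspect-config include-quote=true \
           min-likelihood=POSSIBLE \
           limits max-findings-per-request=35 \
           ...storage-config.cloud-storage-options.file-set url=gs://mybucket/ \
        -o -

Here the cursor starts at .inspect-job.inspect-config, descends into limits to set the findings cap, and '...' then climbs two levels back to inspect-job before setting the Cloud Storage url; -o - writes the server's result to standard output.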
Optional Output Flags
The method's return value is a JSON-encoded structure, which will be written to standard output by default.
- -o out
  - out specifies the destination to which to write the server's result. It will be a JSON-encoded structure. The destination may be '-' to indicate standard output, or a filepath that is to contain the received bytes. If unset, it defaults to standard output.
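Building on the sketch above, writing the result to a file instead of standard output would replace -o - with a filepath, for example (created-job.json is an arbitrary illustrative name):

    dlp2 projects locations-dlp-jobs-create projects/example-project/locations/europe-west3 \
        -r .inspect-job.inspect-config include-quote=true \
        -o created-job.json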
Optional General Properties
The following properties can configure any call, and are not specific to this method.
- -p $-xgafv=string
  - V1 error format.
- -p access-token=string
  - OAuth access token.
- -p alt=string
  - Data format for response.
- -p callback=string
  - JSONP
- -p fields=string
  - Selector specifying which fields to include in a partial response.
- -p key=string
  - API key. Your API key identifies your project and provides you with API access, quota, and reports. Required unless you provide an OAuth 2.0 token.
- -p oauth-token=string
  - OAuth 2.0 token for the current user.
- -p pretty-print=boolean
  - Returns response with indentations and line breaks.
- -p quota-user=string
  - Available to use for quota purposes for server-side applications. Can be any arbitrary string assigned to a user, but should not exceed 40 characters.
- -p upload-type=string
  - Legacy upload protocol for media (e.g. "media", "multipart").
- -p upload-protocol=string
  - Upload protocol for media (e.g. "raw", "multipart").