crawlers
Creates, updates, deletes, or gets a crawler resource, or lists crawlers in a region.
Overview
Name | crawlers |
Type | Resource |
Description | Resource Type definition for AWS::Glue::Crawler |
Id | aws.glue.crawlers |
Fields
Name | Datatype | Description |
---|---|---|
classifiers | array | A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler. |
description | string | A description of the crawler. |
schema_change_policy | object | The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do in the event that it detects a change in a table that already exists in the customer's database at the time of the crawl. The SchemaChangePolicy does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the SchemaChangePolicy on a crawler. The SchemaChangePolicy consists of two components, UpdateBehavior and DeleteBehavior. |
configuration | string | Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. |
recrawl_policy | object | When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in AWS Glue in the developer guide. |
database_name | string | The name of the database in which the crawler's output is stored. |
targets | object | Specifies data stores to crawl. |
crawler_security_configuration | string | The name of the SecurityConfiguration structure to be used by this crawler. |
name | string | The name of the crawler. |
role | string | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
lake_formation_configuration | object | Specifies AWS Lake Formation configuration settings for the crawler. |
schedule | object | A scheduling object using a cron statement to schedule an event. |
table_prefix | string | The prefix added to the names of tables that are created. |
tags | object | The tags to use with this crawler. |
region | string | AWS region. |
For more information, see AWS::Glue::Crawler.
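The object-valued fields above (`schema_change_policy`, `recrawl_policy`) take small structured values. As a minimal sketch, not part of the StackQL provider, the following Python builds those two objects and serializes them to JSON. The allowed behavior values are taken from the AWS Glue API reference as understood here and should be verified against the current AWS documentation; the helper names `schema_change_policy` and `recrawl_policy` are hypothetical.

```python
import json

# Allowed values as understood from the AWS Glue API reference.
# These sets are an assumption of this sketch; verify before relying on them.
UPDATE_BEHAVIORS = {"LOG", "UPDATE_IN_DATABASE"}
DELETE_BEHAVIORS = {"LOG", "DELETE_FROM_DATABASE", "DEPRECATE_IN_DATABASE"}
RECRAWL_BEHAVIORS = {"CRAWL_EVERYTHING", "CRAWL_NEW_FOLDERS_ONLY", "CRAWL_EVENT_MODE"}


def schema_change_policy(update_behavior: str, delete_behavior: str) -> dict:
    """Build a SchemaChangePolicy object, rejecting unknown behaviors."""
    if update_behavior not in UPDATE_BEHAVIORS:
        raise ValueError(f"unknown UpdateBehavior: {update_behavior}")
    if delete_behavior not in DELETE_BEHAVIORS:
        raise ValueError(f"unknown DeleteBehavior: {delete_behavior}")
    return {"UpdateBehavior": update_behavior, "DeleteBehavior": delete_behavior}


def recrawl_policy(recrawl_behavior: str) -> dict:
    """Build a RecrawlPolicy object, rejecting unknown behaviors."""
    if recrawl_behavior not in RECRAWL_BEHAVIORS:
        raise ValueError(f"unknown RecrawlBehavior: {recrawl_behavior}")
    return {"RecrawlBehavior": recrawl_behavior}


# Serialize the objects so they can be supplied as property values.
policy_json = json.dumps(schema_change_policy("UPDATE_IN_DATABASE", "LOG"))
recrawl_json = json.dumps(recrawl_policy("CRAWL_NEW_FOLDERS_ONLY"))
print(policy_json)
print(recrawl_json)
```

Validating the values locally surfaces a typo before the request reaches AWS, which is the only reason for the lookup sets here.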
Methods
Name | Accessible by | Required Params |
---|---|---|
create_resource | INSERT | Role, Targets, region |
delete_resource | DELETE | data__Identifier, region |
update_resource | UPDATE | data__Identifier, data__PatchDocument, region |
list_resources | SELECT | region |
get_resource | SELECT | data__Identifier, region |
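The `update_resource` method requires a `data__PatchDocument` value, which this page does not otherwise illustrate. The sketch below assumes the patch document is an RFC 6902 JSON Patch array (the format accepted by the AWS Cloud Control API) and that paths use the CloudFormation property names, such as `/Description`. The helper name `build_patch_document` and the example values are hypothetical.

```python
import json


def build_patch_document(changes: dict, op: str = "replace") -> str:
    """Return a JSON Patch (RFC 6902) string with one operation per property.

    `changes` maps a top-level property name (assumed to be the
    CloudFormation name, e.g. "Description") to its new value.
    """
    operations = [
        {"op": op, "path": f"/{prop}", "value": value}
        for prop, value in changes.items()
    ]
    return json.dumps(operations)


# Example: change the description and table prefix of an existing crawler.
patch = build_patch_document(
    {"Description": "Nightly crawl of the sales bucket", "TablePrefix": "sales_"}
)
print(patch)
```

The resulting string would be supplied as `data__PatchDocument`, alongside `data__Identifier` and `region`.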
SELECT examples
Gets all crawlers in a region.
SELECT
region,
classifiers,
description,
schema_change_policy,
configuration,
recrawl_policy,
database_name,
targets,
crawler_security_configuration,
name,
role,
lake_formation_configuration,
schedule,
table_prefix,
tags
FROM aws.glue.crawlers
WHERE region = 'us-east-1';
Gets all properties from an individual crawler.
SELECT
region,
classifiers,
description,
schema_change_policy,
configuration,
recrawl_policy,
database_name,
targets,
crawler_security_configuration,
name,
role,
lake_formation_configuration,
schedule,
table_prefix,
tags
FROM aws.glue.crawlers
WHERE region = 'us-east-1' AND data__Identifier = '<Name>';
INSERT example
Use the following StackQL query and manifest file to create a new crawler resource, using stack-deploy.
- Required Properties
- All Properties
- Manifest
/*+ create */
INSERT INTO aws.glue.crawlers (
Targets,
Role,
region
)
SELECT
'{{ Targets }}',
'{{ Role }}',
'{{ region }}';
/*+ create */
INSERT INTO aws.glue.crawlers (
Classifiers,
Description,
SchemaChangePolicy,
Configuration,
RecrawlPolicy,
DatabaseName,
Targets,
CrawlerSecurityConfiguration,
Name,
Role,
LakeFormationConfiguration,
Schedule,
TablePrefix,
Tags,
region
)
SELECT
'{{ Classifiers }}',
'{{ Description }}',
'{{ SchemaChangePolicy }}',
'{{ Configuration }}',
'{{ RecrawlPolicy }}',
'{{ DatabaseName }}',
'{{ Targets }}',
'{{ CrawlerSecurityConfiguration }}',
'{{ Name }}',
'{{ Role }}',
'{{ LakeFormationConfiguration }}',
'{{ Schedule }}',
'{{ TablePrefix }}',
'{{ Tags }}',
'{{ region }}';
version: 1
name: stack name
description: stack description
providers:
  - aws
globals:
  - name: region
    value: '{{ vars.AWS_REGION }}'
resources:
  - name: crawler
    props:
      - name: Classifiers
        value:
          - '{{ Classifiers[0] }}'
      - name: Description
        value: '{{ Description }}'
      - name: SchemaChangePolicy
        value:
          UpdateBehavior: '{{ UpdateBehavior }}'
          DeleteBehavior: '{{ DeleteBehavior }}'
      - name: Configuration
        value: '{{ Configuration }}'
      - name: RecrawlPolicy
        value:
          RecrawlBehavior: '{{ RecrawlBehavior }}'
      - name: DatabaseName
        value: '{{ DatabaseName }}'
      - name: Targets
        value:
          S3Targets:
            - ConnectionName: '{{ ConnectionName }}'
              Path: '{{ Path }}'
              SampleSize: '{{ SampleSize }}'
              Exclusions:
                - '{{ Exclusions[0] }}'
              DlqEventQueueArn: '{{ DlqEventQueueArn }}'
              EventQueueArn: '{{ EventQueueArn }}'
          CatalogTargets:
            - ConnectionName: '{{ ConnectionName }}'
              DatabaseName: '{{ DatabaseName }}'
              DlqEventQueueArn: '{{ DlqEventQueueArn }}'
              Tables:
                - '{{ Tables[0] }}'
              EventQueueArn: '{{ EventQueueArn }}'
          DeltaTargets:
            - ConnectionName: '{{ ConnectionName }}'
              CreateNativeDeltaTable: '{{ CreateNativeDeltaTable }}'
              WriteManifest: '{{ WriteManifest }}'
              DeltaTables:
                - '{{ DeltaTables[0] }}'
          MongoDBTargets:
            - ConnectionName: '{{ ConnectionName }}'
              Path: '{{ Path }}'
          JdbcTargets:
            - ConnectionName: '{{ ConnectionName }}'
              Path: '{{ Path }}'
              Exclusions:
                - '{{ Exclusions[0] }}'
              EnableAdditionalMetadata:
                - '{{ EnableAdditionalMetadata[0] }}'
          DynamoDBTargets:
            - Path: '{{ Path }}'
          IcebergTargets:
            - ConnectionName: '{{ ConnectionName }}'
              Paths:
                - '{{ Paths[0] }}'
              Exclusions:
                - '{{ Exclusions[0] }}'
              MaximumTraversalDepth: '{{ MaximumTraversalDepth }}'
      - name: CrawlerSecurityConfiguration
        value: '{{ CrawlerSecurityConfiguration }}'
      - name: Name
        value: '{{ Name }}'
      - name: Role
        value: '{{ Role }}'
      - name: LakeFormationConfiguration
        value:
          UseLakeFormationCredentials: '{{ UseLakeFormationCredentials }}'
          AccountId: '{{ AccountId }}'
      - name: Schedule
        value:
          ScheduleExpression: '{{ ScheduleExpression }}'
      - name: TablePrefix
        value: '{{ TablePrefix }}'
      - name: Tags
        value: {}
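`Targets` and `Role` are the only resource properties required by `create_resource`, and `Targets` is a nested object rather than a scalar. As a rough sketch of preparing the values substituted for `{{ Targets }}`, `{{ Role }}`, and `{{ region }}`, the Python below builds a `Targets` object containing only S3 targets. The bucket path, exclusion pattern, account ID, and role name are placeholders, and the helper name `s3_targets` is hypothetical.

```python
import json


def s3_targets(paths, exclusions=None):
    """Build a Targets object with one S3Targets entry per path."""
    targets = []
    for path in paths:
        target = {"Path": path}
        if exclusions:
            # Copy the list so each target owns its own exclusions.
            target["Exclusions"] = list(exclusions)
        targets.append(target)
    return {"S3Targets": targets}


# Placeholder values for illustration only.
values = {
    "Targets": json.dumps(s3_targets(["s3://example-bucket/raw/"], exclusions=["**.tmp"])),
    "Role": "arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    "region": "us-east-1",
}
print(values["Targets"])
```

Other target types in the manifest (for example `JdbcTargets` or `DynamoDBTargets`) would follow the same pattern with their own keys.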
DELETE example
/*+ delete */
DELETE FROM aws.glue.crawlers
WHERE data__Identifier = '<Name>'
AND region = 'us-east-1';
Permissions
To operate on the crawlers resource, the following permissions are required:
Create
glue:CreateCrawler,
glue:GetCrawler,
glue:TagResource,
iam:PassRole
Read
glue:GetCrawler,
glue:GetTags,
iam:PassRole
Update
glue:UpdateCrawler,
glue:UntagResource,
glue:TagResource,
iam:PassRole
Delete
glue:DeleteCrawler,
glue:GetCrawler,
glue:StopCrawler,
iam:PassRole
List
glue:ListCrawlers,
iam:PassRole
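The permission lists above can be combined into an IAM policy document. The following is a minimal sketch, assuming a single `Allow` statement scoped to `"Resource": "*"` (a simplification; a real policy would normally restrict resources). The action names are copied from the lists above, and the helper name `policy_for` is hypothetical.

```python
import json

# Actions per operation, copied from the permission lists above.
PERMISSIONS = {
    "Create": ["glue:CreateCrawler", "glue:GetCrawler", "glue:TagResource", "iam:PassRole"],
    "Read": ["glue:GetCrawler", "glue:GetTags", "iam:PassRole"],
    "Update": ["glue:UpdateCrawler", "glue:UntagResource", "glue:TagResource", "iam:PassRole"],
    "Delete": ["glue:DeleteCrawler", "glue:GetCrawler", "glue:StopCrawler", "iam:PassRole"],
    "List": ["glue:ListCrawlers", "iam:PassRole"],
}


def policy_for(operations):
    """Return an IAM policy dict allowing the union of actions for the given operations."""
    actions = sorted({action for op in operations for action in PERMISSIONS[op]})
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Resource "*" is an assumption of this sketch, not a recommendation.
            {"Effect": "Allow", "Action": actions, "Resource": "*"}
        ],
    }


# Example: a policy sufficient for the SELECT queries (Read and List).
read_only = policy_for(["Read", "List"])
print(json.dumps(read_only, indent=2))
```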