crawlers

Creates, updates, deletes, or gets a crawler resource, or lists crawlers in a region.

Overview

Name: crawlers
Type: Resource
Description: Resource Type definition for AWS::Glue::Crawler
Id: aws.glue.crawlers

Fields

| Name | Datatype | Description |
|---|---|---|
| classifiers | array | A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler. |
| description | string | A description of the crawler. |
| schema_change_policy | object | The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do in the event that it detects a change in a table that already exists in the customer's database at the time of the crawl. The SchemaChangePolicy does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the SchemaChangePolicy on a crawler. The SchemaChangePolicy consists of two components, UpdateBehavior and DeleteBehavior. |
| configuration | string | Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. |
| recrawl_policy | object | When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see Incremental Crawls in AWS Glue in the developer guide. |
| database_name | string | The name of the database in which the crawler's output is stored. |
| targets | object | Specifies data stores to crawl. |
| crawler_security_configuration | string | The name of the SecurityConfiguration structure to be used by this crawler. |
| name | string | The name of the crawler. |
| role | string | The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data. |
| lake_formation_configuration | object | Specifies AWS Lake Formation configuration settings for the crawler. |
| schedule | object | A scheduling object using a cron statement to schedule an event. |
| table_prefix | string | The prefix added to the names of tables that are created. |
| tags | object | The tags to use with this crawler. |
| region | string | AWS region. |

For more information, see AWS::Glue::Crawler.

Methods

| Name | Accessible by | Required Params |
|---|---|---|
| create_resource | INSERT | Role, Targets, region |
| delete_resource | DELETE | data__Identifier, region |
| update_resource | UPDATE | data__Identifier, data__PatchDocument, region |
| list_resources | SELECT | region |
| get_resource | SELECT | data__Identifier, region |

SELECT examples

Gets all crawlers in a region.

SELECT
  region,
  classifiers,
  description,
  schema_change_policy,
  configuration,
  recrawl_policy,
  database_name,
  targets,
  crawler_security_configuration,
  name,
  role,
  lake_formation_configuration,
  schedule,
  table_prefix,
  tags
FROM aws.glue.crawlers
WHERE region = 'us-east-1';

Gets all properties from an individual crawler.

SELECT
  region,
  classifiers,
  description,
  schema_change_policy,
  configuration,
  recrawl_policy,
  database_name,
  targets,
  crawler_security_configuration,
  name,
  role,
  lake_formation_configuration,
  schedule,
  table_prefix,
  tags
FROM aws.glue.crawlers
WHERE region = 'us-east-1'
AND data__Identifier = '<Name>';
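Object-typed fields such as schema_change_policy and targets come back as JSON, so individual values usually have to be extracted before they can be filtered or compared. The query below is a minimal sketch of that, assuming StackQL is running on its default embedded SQLite backend (which provides json_extract). UpdateBehavior and DeleteBehavior are the two SchemaChangePolicy components named in the Fields table; S3Targets is assumed to be present in targets for an S3-backed crawler.

```sql
-- Sketch: flatten selected nested properties of a single crawler.
-- Assumes json_extract is available (default SQLite-backed StackQL).
SELECT
  name,
  database_name,
  json_extract(schema_change_policy, '$.UpdateBehavior') AS update_behavior,
  json_extract(schema_change_policy, '$.DeleteBehavior') AS delete_behavior,
  -- Assumption: the crawler has S3 targets; otherwise this column is NULL.
  json_extract(targets, '$.S3Targets') AS s3_targets
FROM aws.glue.crawlers
WHERE region = 'us-east-1'
AND data__Identifier = '<Name>';
```

If a different SQL backend is configured, the JSON function name and path syntax may differ.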

INSERT example

Use the following StackQL query and manifest file to create a new crawler resource, using stack-deploy. The {{ ... }} placeholders are template variables that stack-deploy fills in from the manifest.

/*+ create */
INSERT INTO aws.glue.crawlers (
  Targets,
  Role,
  region
)
SELECT
  '{{ Targets }}',
  '{{ Role }}',
  '{{ region }}';
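For running the statement directly rather than through a template, the same INSERT can be written with literal values. The following is a sketch only: the crawler name, role ARN, database name, and S3 path are hypothetical, and the optional Name and DatabaseName columns are assumed to follow the same CloudFormation property naming as the required Targets and Role columns.

```sql
/*+ create */
-- Sketch with hypothetical literal values; replace every value before use.
INSERT INTO aws.glue.crawlers (
  Name,
  Role,
  DatabaseName,
  Targets,
  region
)
SELECT
  'example-crawler',                                       -- hypothetical crawler name
  'arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole', -- hypothetical IAM role ARN
  'example_db',                                            -- hypothetical Glue database
  '{"S3Targets": [{"Path": "s3://example-bucket/raw/"}]}', -- Targets passed as a JSON string
  'us-east-1';
```

Only Role, Targets, and region are required; the other columns can be dropped, in which case a name is expected to be generated for the crawler.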

DELETE example

/*+ delete */
DELETE FROM aws.glue.crawlers
WHERE data__Identifier = '<Name>'
AND region = 'us-east-1';
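UPDATE example

The Methods table lists update_resource as accessible through UPDATE, with data__Identifier, data__PatchDocument, and region required. The statement below is a hedged sketch of what that could look like, assuming the patch document is a JSON Patch (RFC 6902) array as used by the AWS Cloud Control API, and that the crawler's description is addressed by the path /Description. The description text is a made-up value.

```sql
/*+ update */
-- Sketch: set the crawler description via a JSON Patch document.
-- "add" is used because it sets the property whether or not it already exists.
UPDATE aws.glue.crawlers
SET data__PatchDocument = '[{"op": "add", "path": "/Description", "value": "Crawls the raw zone nightly"}]'
WHERE data__Identifier = '<Name>'
AND region = 'us-east-1';
```

Several operations can be combined in one patch document by adding more objects to the array.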

Permissions

To operate on the crawlers resource, the following permissions are required:

Create

glue:CreateCrawler,
glue:GetCrawler,
glue:TagResource,
iam:PassRole

Read

glue:GetCrawler,
glue:GetTags,
iam:PassRole

Update

glue:UpdateCrawler,
glue:UntagResource,
glue:TagResource,
iam:PassRole

Delete

glue:DeleteCrawler,
glue:GetCrawler,
glue:StopCrawler,
iam:PassRole

List

glue:ListCrawlers,
iam:PassRole