Control your AWS Glue Studio development interface with the AWS Glue job mode API property


In recent years, as the importance of big data has grown, efficient data processing and analysis have become crucial factors in determining a company's competitiveness. AWS Glue, a serverless data integration service for integrating data across multiple data sources at scale, addresses these data processing needs. Among its features, the AWS Glue Jobs API stands out as a particularly noteworthy tool.

The AWS Glue Jobs API is a robust interface that lets data engineers and developers programmatically manage and run ETL jobs. With this API, you can automate, schedule, and monitor data pipelines, enabling efficient operation of large-scale data processing tasks.

To improve the customer experience with the AWS Glue Jobs API, we added a new property that describes the job mode: script, visual, or notebook. In this post, we explore in depth how the updated AWS Glue Jobs API works and demonstrate the new experience with the updated API.

JobMode property

A new property, JobMode, describes the mode of an AWS Glue job (script, visual, or notebook) to improve your UI experience. AWS Glue users can use the mode that best fits their preference. Some extract, transform, and load (ETL) developers prefer visual mode and create visual jobs using the AWS Glue Studio visual editor. Some data scientists prefer notebook jobs and use AWS Glue Studio notebooks. Some data engineers and developers prefer to implement scripts through the AWS Glue Studio script editor or their preferred integrated development environment (IDE). After a job is created with the preferred mode, you can find it easily by filtering on the job mode on your saved AWS Glue jobs page. Additionally, if you are migrating existing iPython notebook files to AWS Glue Studio notebook jobs, you can now choose and set the job mode, and do so for multiple jobs, using this new API property, as demonstrated in this post.

How the CreateJob API works with the new JobMode property

You can use the CreateJob API to create an AWS Glue script, visual, or notebook job. The following example shows how it works for a visual job using the AWS SDK for Python (Boto3). Replace the S3 bucket in the target path with your own.

CODE_GEN_JSON_STR = '''
{
  "node-1": {
    "S3ParquetSource": {
      "Name": "Amazon S3",
      "Paths": [
        "s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Books/"
      ],
      "Exclusions": [],
      "Recurse": true,
      "AdditionalOptions": {
        "EnableSamplePath": false,
        "SamplePath": "s3://aws-bigdata-blog/generated_synthetic_reviews/data/product_category=Books/73612da260b94159b705cf4df12364cb_0.snappy.parquet"
      },
      "OutputSchemas": [
        {
          "Columns": [
            {
              "Name": "marketplace",
              "Type": "string"
            },
            {
              "Name": "customer_id",
              "Type": "string"
            },
            {
              "Name": "review_id",
              "Type": "string"
            },
            {
              "Name": "product_id",
              "Type": "string"
            },
            {
              "Name": "product_title",
              "Type": "string"
            },
            {
              "Name": "star_rating",
              "Type": "bigint"
            },
            {
              "Name": "helpful_votes",
              "Type": "bigint"
            },
            {
              "Name": "total_votes",
              "Type": "bigint"
            },
            {
              "Name": "insight",
              "Type": "string"
            },
            {
              "Name": "review_headline",
              "Type": "string"
            },
            {
              "Name": "review_body",
              "Type": "string"
            },
            {
              "Name": "review_date",
              "Type": "timestamp"
            },
            {
              "Name": "review_year",
              "Type": "bigint"
            }
          ]
        }
      ]
    }
  },
  "node-2": {
    "DropFields": {
      "Name": "Drop Fields",
      "Inputs": [
        "node-1"
      ],
      "Paths": [
        [
          "review_headline"
        ],
        [
          "review_body"
        ],
        [
          "review_date"
        ]
      ]
    }
  },
  "node-3": {
    "S3DirectTarget": {
      "Name": "Amazon S3",
      "Inputs": [
        "node-2"
      ],
      "PartitionKeys": [],
      "Path": "s3://<your-s3-bucket>/data/jobmode-blog/output/parquet/",
      "Compression": "snappy",
      "Format": "parquet",
      "SchemaChangePolicy": {
        "EnableUpdateCatalog": false
      }
    }
  }
}
'''

import json

import boto3

glue_client = boto3.client('glue')
# strict=False lets json.loads accept control characters inside strings
codeGenJson = json.loads(CODE_GEN_JSON_STR, strict=False)

# Call the create_job method
try:
    glue_client.create_job(
        Name="glue-visual-job",
        Description="Glue Visual ETL job",
        # Replace <account-id> and <region> with your own values
        Command={'Name': 'glueetl', 'ScriptLocation': "s3://aws-glue-assets-<account-id>-<region>/scripts/glue-visual-job", 'PythonVersion': "3"},
        WorkerType="G.1X",
        NumberOfWorkers=2,  # example value; size this for your workload
        Role="<your-glue-job-role-arn>",  # replace with the ARN of your AWS Glue job role
        GlueVersion="4.0",
        CodeGenConfigurationNodes=codeGenJson,
        JobMode="VISUAL"
    )
    print("Successfully created Glue job")
except Exception as e:
    print(f"Error creating Glue job: {str(e)}")

CODE_GEN_JSON_STR represents the visual nodes for the AWS Glue job. There are three nodes: node-1 is an S3 source, node-2 performs a transformation, and node-3 is an S3 target. The script instantiates the AWS Glue Boto3 client, loads the JSON, and calls create_job. JobMode is set to VISUAL.
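Because each node names its upstream nodes in Inputs, the node map can be sanity-checked locally before it is sent to create_job. The following is a minimal sketch of such a pre-flight check, not part of the AWS Glue API: node_run_order is a hypothetical helper, and the trimmed-down sample nodes keep only the Name and Inputs keys from the example above. It uses graphlib from the Python standard library (Python 3.9 or later).

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def node_run_order(code_gen_nodes: dict) -> list:
    """Return node IDs in dependency order; raise if the graph is invalid.

    Hypothetical helper for illustration only.
    """
    graph = {}
    for node_id, node in code_gen_nodes.items():
        # Each node is a one-key dict such as {"DropFields": {...}}.
        (definition,) = node.values()
        inputs = definition.get("Inputs", [])
        missing = [i for i in inputs if i not in code_gen_nodes]
        if missing:
            raise ValueError(f"{node_id} references unknown inputs: {missing}")
        # Map each node to the set of nodes it depends on.
        graph[node_id] = set(inputs)
    # static_order() raises graphlib.CycleError if the nodes form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Trimmed-down version of the three nodes in CODE_GEN_JSON_STR.
sample_nodes = json.loads('''
{
  "node-1": {"S3ParquetSource": {"Name": "Amazon S3"}},
  "node-2": {"DropFields": {"Name": "Drop Fields", "Inputs": ["node-1"]}},
  "node-3": {"S3DirectTarget": {"Name": "Amazon S3", "Inputs": ["node-2"]}}
}
''')
print(node_run_order(sample_nodes))
```

A check like this catches a mistyped node ID before the API call rather than after a job has been created with a broken graph.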

After you run the Python script, a new job is created. The following screenshot shows how the created job appears in the AWS Glue visual editor.

There are three nodes in the visual directed acyclic graph (DAG): node-1 reads product review data for the Books product_category from the public S3 bucket, node-2 drops some of the fields that aren't needed by downstream systems, and node-3 persists the transformed data in your own S3 bucket.
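Once the job exists, its mode can also be read back through the GetJob API. The sketch below is a minimal check rather than part of the original walkthrough: get_job_mode is a hypothetical helper name, the client is passed in so the function can be exercised without AWS credentials, and the fallback to SCRIPT when the field is absent is an assumption about jobs that predate the property.

```python
def get_job_mode(glue_client, job_name: str) -> str:
    """Return the JobMode recorded for an existing AWS Glue job.

    Hypothetical helper; glue_client is expected to be boto3.client("glue").
    """
    job = glue_client.get_job(JobName=job_name)["Job"]
    # Assumption: a job without the field is treated as a script job.
    return job.get("JobMode", "SCRIPT")


# Example usage (requires AWS credentials and an existing job):
#   import boto3
#   print(get_job_mode(boto3.client("glue"), "glue-visual-job"))
```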

How CloudFormation works with the new JobMode property

You can use AWS CloudFormation to create different types of AWS Glue jobs by specifying the JobMode parameter with the AWS::Glue::Job resource. The supported job modes are SCRIPT, VISUAL, and NOTEBOOK.

In this example, you create an AWS Glue notebook job using AWS CloudFormation, which requires setting the JobMode parameter to NOTEBOOK.

  1. Create a Jupyter Notebook file containing your logic and code, and save it with a descriptive name, such as my-glue-notebook.ipynb. Alternatively, you can download the notebook file and rename it to my-glue-notebook.ipynb.
  2. Upload the notebook file to the notebooks/ folder within the aws-glue-assets-<account-id>-<region> S3 bucket.
  3. Create a new CloudFormation template that creates a new AWS Glue job, specifying the NotebookJobName parameter as the same name as the notebook file. Here's a sample snippet of the CloudFormation template:
    AWSTemplateFormatVersion: '2010-09-09'
    Description: CloudFormation template for creating an AWS Glue ETL job using a Jupyter Notebook
    
    Parameters:
      NotebookJobName:
        Type: String
        Description: Name of the AWS Glue ETL Notebook job
    
    Resources:
      GlueJobRole:
        Type: AWS::IAM::Role
        Properties:
          RoleName: !Sub ${AWS::StackName}-GlueJobRole
          AssumeRolePolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Principal:
                  Service:
                    - glue.amazonaws.com
                Action:
                  - sts:AssumeRole
          ManagedPolicyArns:
            - arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
          Policies:
            - PolicyName: GlueJobS3Access
              PolicyDocument:
                Version: '2012-10-17'
                Statement:
                  - Effect: Allow
                    Action:
                      - iam:PassRole
                    Resource:
                      - !Sub arn:aws:iam::${AWS::AccountId}:role/${AWS::StackName}-GlueJobRole
    
      ETLNotebookJob:
        Type: AWS::Glue::Job
        Properties:
          Name: !Ref NotebookJobName
          Description: ETL job using a Jupyter Notebook
          Role: !GetAtt GlueJobRole.Arn
          Command:
            Name: glueetl
            PythonVersion: '3'
            ScriptLocation: !Sub s3://aws-glue-assets-${AWS::AccountId}-${AWS::Region}/scripts/${NotebookJobName}.py
          DefaultArguments:
            '--job-bookmark-option': job-bookmark-enable
          JobMode: NOTEBOOK
    
    Outputs:
      ETLNotebookJobName:
        Value: !Ref ETLNotebookJob
        Description: Name of the ETL Notebook job

  4. Deploy the CloudFormation template. For NotebookJobName, enter the same name as the notebook file.
  5. Verify that the AWS Glue job you created is listed and that it has the name you specified in the CloudFormation template.
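The upload and deployment steps above could also be scripted with Boto3. The following is a rough sketch under stated assumptions, not a definitive implementation: build_stack_request and deploy_notebook_job are hypothetical helper names, the clients are passed in as arguments (boto3.client("s3") and boto3.client("cloudformation") in practice), and the template body is assumed to be the template from step 3 read into a string. Because that template sets an explicit RoleName, the request acknowledges the CAPABILITY_NAMED_IAM capability.

```python
from pathlib import Path


def build_stack_request(stack_name: str, template_body: str, notebook_job_name: str) -> dict:
    """Build keyword arguments for CloudFormation create_stack (hypothetical helper)."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": "NotebookJobName", "ParameterValue": notebook_job_name}
        ],
        # The template names its IAM role explicitly, which requires this capability.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy_notebook_job(s3_client, cfn_client, notebook_path: str,
                        assets_bucket: str, stack_name: str, template_body: str) -> str:
    """Upload the notebook, create the stack, and return the job name."""
    notebook = Path(notebook_path)
    # The job name matches the notebook file name without its extension.
    job_name = notebook.stem
    # Step 2: place the notebook under the notebooks/ prefix of the assets bucket.
    s3_client.upload_file(str(notebook), assets_bucket, f"notebooks/{notebook.name}")
    # Step 4: create the stack and wait until creation completes.
    cfn_client.create_stack(**build_stack_request(stack_name, template_body, job_name))
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return job_name
```

Passing the clients in keeps the helper easy to test with stand-ins and leaves Region and credential handling to the caller.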

The AWS Glue notebook shows the notebook job, which contains the cells that you had in the ipynb file. You can review the job details to make sure it's configured correctly.

Console experience

On the AWS Glue console, in the navigation pane, choose ETL Jobs to view all your ETL jobs. The list has the columns Job name, Type, Created by, Last modified, and AWS Glue version. You can sort and filter by these columns. The following screenshot shows how it looks.

We also enhanced the console experience with the introduction of JobMode. The Created by column on the console gives you information about the JobMode of the job. You can filter jobs created by VISUAL, NOTEBOOK, or SCRIPT, as shown in the following screenshot.

This new console experience helps you search for and discover your jobs based on JobMode.
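A similar grouping can be approximated outside the console by listing jobs through the API and bucketing them by mode. The sketch below is illustrative only: list_all_jobs and group_jobs_by_mode are hypothetical helper names, the client is passed in (boto3.client("glue") in practice), and treating a missing JobMode as SCRIPT is an assumption.

```python
from collections import defaultdict


def list_all_jobs(glue_client):
    """Yield every job definition, following GetJobs pagination."""
    paginator = glue_client.get_paginator("get_jobs")
    for page in paginator.paginate():
        yield from page["Jobs"]


def group_jobs_by_mode(jobs) -> dict:
    """Map each job mode to the names of the jobs created in that mode."""
    grouped = defaultdict(list)
    for job in jobs:
        # Assumption: jobs without the field are treated as script jobs.
        grouped[job.get("JobMode", "SCRIPT")].append(job["Name"])
    return dict(grouped)


# Example usage (requires AWS credentials):
#   import boto3
#   print(group_jobs_by_mode(list_all_jobs(boto3.client("glue"))))
```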

Conclusion

This post demonstrated how the AWS Glue Jobs API works with the newly introduced job mode property. With the new property, you can explicitly choose the mode of each job. The steps showed detailed usage with the API, the AWS SDK, and CloudFormation. Additionally, the property makes it simple to search for and discover your jobs quickly on the AWS Glue console.


About the Authors

Shovan Kanjilal is a Senior Analytics and Machine Learning Architect with Amazon Web Services. He is passionate about helping customers build scalable, secure, and high-performance data solutions in the cloud.

Manoj Shunmugam is a DevOps Consultant in Professional Services at Amazon Web Services. He works with customers to establish infrastructures using cloud-centered and/or container-based platforms in the AWS Cloud.

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling on his road bike.

Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI. She is passionate about developing a deep understanding of customers' business needs and collaborating with engineers to design easy-to-use data products.
