Code files: https://github.com/curiouscurrent/data-engineering-mlops/tree/main/Trigger-GlueJob-Using-AWSLambda
- Data arrives in our S3 input bucket as CSV files.
- We need to create an AWS Glue (ETL) job that transfers the data from the S3 input bucket to a separate S3 target bucket, converting it to JSON.
- The Glue job is triggered by an AWS Lambda function.
- When the Glue job is triggered, it fetches its script from the Glue job script bucket (the script location can be set under "Job details - Advanced properties" in Glue).
- Create three buckets: one for input, one for the target output, and one to store the Glue job script.
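If you prefer to script the bucket setup rather than use the console, a minimal sketch with boto3 might look like the following. The bucket names in the usage note are hypothetical placeholders, and the S3 client is passed in as a parameter so the helper can be tried out without real AWS credentials.

```python
def create_buckets(s3_client, bucket_names, region="us-east-1"):
    """Create each bucket in bucket_names and return the list of names requested."""
    created = []
    for name in bucket_names:
        params = {"Bucket": name}
        # us-east-1 must not be given a LocationConstraint; other regions need one.
        if region != "us-east-1":
            params["CreateBucketConfiguration"] = {"LocationConstraint": region}
        s3_client.create_bucket(**params)
        created.append(name)
    return created
```

Assuming boto3 is installed and credentials are configured, this could be called as `create_buckets(boto3.client("s3"), ["my-input-bucket", "my-target-bucket", "my-glue-script-bucket"])`. Bucket names must be globally unique, so the placeholders will need changing.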
- Now create an AWS Glue ETL job.
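As a rough idea of what the job script stored in the script bucket could contain, here is a sketch that reads CSV from the input bucket and writes JSON to the target bucket. It is not necessarily the script in the repository linked above. The job parameters `input_bucket` and `target_bucket` are assumptions made for illustration, and the Glue imports sit inside the function because those modules are only available in the Glue runtime.

```python
def s3_uri(bucket, prefix=""):
    """Build an s3:// URI from a bucket name and an optional key prefix."""
    return f"s3://{bucket}/{prefix}"


def run_job(argv):
    # These modules exist only inside the Glue job runtime, so import them lazily.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # input_bucket and target_bucket are hypothetical job parameters
    # (passed to the job as --input_bucket and --target_bucket).
    args = getResolvedOptions(argv, ["JOB_NAME", "input_bucket", "target_bucket"])

    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the CSV files in the input bucket, treating the first row as a header.
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [s3_uri(args["input_bucket"])]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Write the same records to the target bucket as JSON.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": s3_uri(args["target_bucket"])},
        format="json",
    )
    job.commit()


# In the script uploaded to the script bucket, finish with:
#   import sys
#   run_job(sys.argv)
```

Note that this sketch reads everything under the input bucket on each run, not only the file that was just uploaded.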
- Create an IAM role for the Glue job and assign it the following permission: "Allows Glue to call AWS services on your behalf".
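The same role could also be created from code. The sketch below builds a trust policy that lets the Glue service assume the role and attaches the AWS managed policy `AWSGlueServiceRole`. Choosing that managed policy is an assumption, since the walkthrough does not list exact policies, and the role would likely also need read/write access to the three buckets. The IAM client is passed in so the helper can be exercised without AWS access.

```python
import json

# AWS managed policy for Glue jobs (assumed here; adjust to your own setup).
GLUE_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"


def glue_trust_policy():
    """Trust policy allowing the Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_glue_role(iam_client, role_name):
    """Create the role and attach the managed Glue policy to it."""
    iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(glue_trust_policy()),
    )
    iam_client.attach_role_policy(
        RoleName=role_name, PolicyArn=GLUE_MANAGED_POLICY_ARN
    )
    return role_name
```

With boto3 available, `create_glue_role(boto3.client("iam"), "my-glue-role")` would create a role with the hypothetical name `my-glue-role`.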
- Now create a Lambda function and add an S3 trigger so that it fires only when a user uploads a .csv file to the input bucket (allow only the "PUT" event).
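For reference, the trigger described above corresponds roughly to the following S3 notification configuration: only `s3:ObjectCreated:Put` events for keys ending in `.csv` invoke the function. The bucket name and Lambda ARN are supplied by the caller and are hypothetical. When the trigger is set up outside the console, S3 must also be granted permission to invoke the function, which this sketch does not cover.

```python
def csv_put_notification(lambda_arn):
    """Notification config: invoke the Lambda only for PUT uploads of .csv keys."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


def attach_trigger(s3_client, bucket, lambda_arn):
    """Apply the notification configuration to the input bucket."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=csv_put_notification(lambda_arn),
    )
```

Be aware that `put_bucket_notification_configuration` replaces the bucket's existing notification configuration rather than adding to it.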
- Add an IAM role for the Lambda function, assign it the same permissions as the role attached to the AWS Glue job, and then deploy the code.
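The Lambda code to deploy could be as small as the sketch below, which may differ from the code in the repository. It starts the Glue job once for each `.csv` object in the incoming S3 event. The job name `csv-to-json-job` is a hypothetical placeholder, and the Glue client is injected into the helper so the logic can be tested without AWS. The Lambda role needs permission to call `glue:StartJobRun`.

```python
from urllib.parse import unquote_plus

GLUE_JOB_NAME = "csv-to-json-job"  # hypothetical; use your own Glue job name


def start_glue_job(event, glue_client, job_name=GLUE_JOB_NAME):
    """Start one Glue job run per .csv object in the S3 event; return the run IDs."""
    run_ids = []
    for record in event.get("Records", []):
        # Object keys in S3 events are URL-encoded (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # ignore anything that is not a CSV upload
        response = glue_client.start_job_run(JobName=job_name)
        run_ids.append(response["JobRunId"])
    return run_ids


def lambda_handler(event, context):
    import boto3  # included in the AWS Lambda Python runtime

    return {"job_run_ids": start_glue_job(event, boto3.client("glue"))}
```

Keeping `start_glue_job` separate from `lambda_handler` is a design choice that lets you pass a stub client when testing locally.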
- Now upload a .csv file to the input S3 bucket.
- Now check whether the Glue job has executed.
- Now check the target bucket; you should find a .json file.
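The checks above can also be done from a script instead of the console. The sketch below reports the state of the most recently started job run and lists the objects in the target bucket. The clients are passed in, and the job and bucket names are whatever you chose earlier. It only looks at the first page of results from each call, which should be enough for a quick check.

```python
def latest_job_state(glue_client, job_name):
    """Return the JobRunState of the most recently started run, or None if no runs."""
    runs = glue_client.get_job_runs(JobName=job_name)["JobRuns"]
    if not runs:
        return None
    newest = max(runs, key=lambda run: run["StartedOn"])
    return newest["JobRunState"]


def list_keys(s3_client, bucket):
    """Return the object keys currently in the bucket (first page only)."""
    response = s3_client.list_objects_v2(Bucket=bucket)
    # "Contents" is absent from the response when the bucket is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

A state of `SUCCEEDED` together with at least one key in the target bucket suggests the pipeline worked end to end.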
