AWS Developers | Building Data Quality in ETL pipelines using AWS Glue Data Quality @awsdevelopers | Uploaded 10 months ago | Updated 1 hour ago
Data volumes are growing exponentially, and businesses need reliable data faster in order to stay agile. Quality data is also critical for efficient business operations; otherwise a lot of time is wasted fixing data, and data consumers lose trust in it. Applications use machine learning to make data-driven decisions, but their success depends heavily on the quality of the data. To keep up with the pace at which data arrives, data quality checks need to run inside the ETL pipeline so that trusted data is available immediately. This demo shows how to build data quality checks into an ETL pipeline using AWS Glue Data Quality.
Resources:
👉 aws.amazon.com/glue/features/data-quality
🌐 aws.amazon.com/blogs/big-data/getting-started-with-aws-glue-data-quality-for-etl-pipelines
⚡️ aws.amazon.com/blogs/big-data/aws-glue-data-quality-is-generally-available
Follow AWS Developers!
🐦 Twitter: twitter.com/awsdevelopers
💼 LinkedIn: linkedin.com/showcase/aws-developers
👾 Twitch: twitch.tv/aws
📺 Instagram: instagram.com/awsdevelopers/?hl=en
00:00 - Introduction
00:24 - Data Preview
00:37 - Evaluate Data Quality
00:52 - Add Data Quality Rules
01:35 - Use Referential Integrity Rule
02:01 - Redirect Output of Data Quality results
02:23 - Actions on Data Quality output
02:38 - Integrate Data Quality rules in Glue Scripts for CI/CD
03:00 - Segregate Data Quality outputs
03:21 - Evaluate Data Quality Score
#data #ETL #MachineLearning