This solution demonstrates a design pattern for implementing data preparation with a serverless AWS Glue ETL pipeline and Amazon SageMaker Data Wrangler in an end-to-end machine learning (ML) workflow.
It is used internally at Amazon for verifying the quality of large production datasets ... This repository showcases a serverless data quality framework based on Deequ and running on AWS Glue.
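To make the idea of declarative data quality verification concrete, here is a minimal pure-Python sketch. It does not use Deequ's or AWS Glue's actual API: the names `Check`, `completeness`, `distinctness`, and `run_checks`, the metric definitions, and the sample rows are all hypothetical, chosen only to illustrate computing metrics over a dataset and comparing them against thresholds.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical row type for this sketch: a dict of column name -> value (None = missing).
Row = Dict[str, Optional[object]]
Metric = Callable[[List[Row]], float]


@dataclass
class Check:
    """A named metric plus the minimum value it must reach to pass."""
    name: str
    metric: Metric
    threshold: float


def completeness(column: str) -> Metric:
    """Fraction of rows in which `column` is present and not None."""
    def compute(rows: List[Row]) -> float:
        if not rows:
            return 1.0
        return sum(r.get(column) is not None for r in rows) / len(rows)
    return compute


def distinctness(column: str) -> Metric:
    """Number of distinct non-null values divided by number of non-null values."""
    def compute(rows: List[Row]) -> float:
        values = [r[column] for r in rows if r.get(column) is not None]
        if not values:
            return 1.0
        return len(set(values)) / len(values)
    return compute


def run_checks(rows: List[Row], checks: List[Check]) -> Dict[str, Tuple[float, bool]]:
    """Evaluate every check and return {check name: (metric value, passed)}."""
    results = {}
    for check in checks:
        value = check.metric(rows)
        results[check.name] = (value, value >= check.threshold)
    return results


# Illustrative data only; not taken from any real dataset.
sample_rows: List[Row] = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

checks = [
    Check("email_completeness", completeness("email"), threshold=0.7),
    Check("id_distinctness", distinctness("id"), threshold=1.0),
]

results = run_checks(sample_rows, checks)
for name, (value, passed) in results.items():
    print(f"{name}: {value:.2f} {'PASS' if passed else 'FAIL'}")
```

In a Spark-based framework the same kind of checks would be evaluated over distributed DataFrames rather than in-memory lists; the sketch only shows the shape of the declare-then-verify pattern.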
Glue CTO and co-founder Will Blaschko previously worked at Amazon leading a team working on Alexa voice technologies. Pioneer Square Labs’ venture arm participated in the pre-seed round ...