Sunday, March 31, 2024

Building a Batch Data Pipeline with Athena and MySQL | by 💡Mike Shakhomirov | Oct, 2023


An End-to-End Tutorial for Beginners

Towards Data Science
Photograph by Redd F on Unsplash

In this story I will talk about one of the most popular ways to run data transformation tasks: batch data processing. This data pipeline design pattern is particularly useful when we need to process data in chunks, which makes it very efficient for ETL jobs that require scheduling. I will demonstrate how this can be achieved by building a data transformation pipeline using MySQL and Athena. We will use infrastructure as code to deploy it in the cloud.

Imagine that you have just joined a company as a Data Engineer. Their data stack is modern, event-driven, cost-effective, flexible, and can scale easily to accommodate the growing number of data sources you have. External data sources and data pipelines in your data platform are managed by the data engineering team using a flexible environment setup with CI/CD GitHub integration.

As a data engineer, you need to create a business intelligence dashboard that displays the geography of the company's revenue streams, as shown below. Raw payment data is stored in the server database (MySQL). You want to build a batch pipeline that extracts data from that database daily, then uses AWS S3 to store the data files and Athena to process them.

Revenue dashboard. Image by author.
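As a rough illustration of the daily extraction step described above, the sketch below plans one batch run: it works out which day to extract, builds the matching SQL query, and picks a date-partitioned S3 object key. The table name `payments`, the column `created_at`, the key layout and all function names are assumptions made for this example; they are not taken from the tutorial.

```python
from datetime import date, timedelta


def daily_window(run_date):
    """Return the [start, end) dates covering the day before run_date."""
    end = run_date
    start = end - timedelta(days=1)
    return start, end


def extraction_query(run_date, table="payments", ts_column="created_at"):
    """Build a query selecting only the rows for the previous day.

    The table and column names are hypothetical placeholders.
    """
    start, end = daily_window(run_date)
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= '{start.isoformat()}' "
        f"AND {ts_column} < '{end.isoformat()}'"
    )


def s3_key(run_date, prefix="payments"):
    """Build a date-partitioned object key for the extracted file."""
    start, _ = daily_window(run_date)
    return f"{prefix}/dt={start.isoformat()}/data.csv"
```

Keying each file by the day it covers keeps a rerun for a given date confined to that date's file, and a `dt=` style prefix is a common layout for partitioned tables in Athena.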

Batch data pipeline

A data pipeline can be thought of as a sequence of data processing steps. Because these stages are connected by a logical data flow, each stage generates an output that serves as the input for the following stage.

There is a data pipeline whenever there is data processing between points A and B.
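The idea that each stage's output feeds the next stage can be sketched in a few lines of Python. This is only a minimal illustration: the stage names, the row format and the sample values are hypothetical, and the real pipeline would use MySQL, S3 and Athena rather than in-memory lists.

```python
from functools import reduce


def run_pipeline(stages, data):
    """Pass data through each stage in order; each output is the next input."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


def extract(rows):
    """Keep only rows that carry a payment amount."""
    return [row for row in rows if row.get("amount") is not None]


def transform(rows):
    """Sum revenue per country."""
    totals = {}
    for row in rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals


def load(totals):
    """Return the totals as a list of (country, revenue) pairs sorted by country."""
    return sorted(totals.items())


# Hypothetical sample data, point A of the pipeline.
raw_rows = [
    {"country": "US", "amount": 10.0},
    {"country": "DE", "amount": 5.0},
    {"country": "US", "amount": 20.0},
    {"country": "FR", "amount": None},
]

# Point B: the processed result.
result = run_pipeline([extract, transform, load], raw_rows)
```

Here `raw_rows` plays the role of point A and `result` the role of point B; swapping or adding a stage only changes the list passed to `run_pipeline`.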

Data pipelines may differ in their conceptual and logical nature. I previously wrote about it here [1]:
