Wednesday, March 20, 2024

Collecting Data with Apache Airflow on a Raspberry Pi | by Dmitrii Eliuseev | Oct, 2023

A Raspberry Pi is All You Need

Towards Data Science
Raspberry Pi Zero (model 2021), image source: Wikipedia

Sometimes, we need to collect data over a certain period of time. It can be data from an IoT sensor, statistics from social networks, or something else. For example, the YouTube Data API lets us get the number of views and subscribers for any channel at the current moment, but the analytics and historical data are available only to the channel owner. Thus, if we want weekly or monthly summaries for these channels, we need to collect the data ourselves. In the case of an IoT sensor, there may be no API at all, and we again need to collect and save the data on our own. In this article, I'll show how to configure Apache Airflow on a Raspberry Pi, which allows running tasks for a long period of time without involving any cloud provider.

Obviously, if you're working for a large company, you probably won't need a Raspberry Pi. In that case, if you need an extra cloud instance, just create a Jira ticket for your MLOps department 😉 But for a pet project or a low-budget startup, it can be an interesting solution.

Let's see how it works.

Raspberry Pi

What actually is a Raspberry Pi? For readers who haven't been interested in hardware over the last 10 years (the first Raspberry Pi model was launched in 2012), I can briefly explain that it is a single-board computer running full-fledged Linux. Usually, a Raspberry Pi has a 1 GHz, 2–4-core ARM CPU and 1–8 GB of RAM. It's small, cheap, and silent; it has no fans and no disk drive (the OS runs from a Micro SD card). A Raspberry Pi needs only a standard USB power supply; it can be connected to a network via Wi-Fi or Ethernet and can run different tasks for months or even years.

For my data science pet project, I wanted to collect YouTube channel statistics over 2 weeks. For a task that requires only 30–60 seconds twice per day, a serverless architecture can be a good solution, and we can use something like a Google Cloud Function for that. But every tutorial from Google started with the phrase "enable billing for your project". Google provides a free initial credit and free quotas, but I didn't want another headache of monitoring how much money I…
