Sunday, May 26, 2024

Fetching Data from an HTTP API with Python



In this quick tip, excerpted from Useful Python, Stuart shows you how easy it is to use an HTTP API from Python using a couple of third-party modules.

Most of the time when working with third-party data we’ll be accessing an HTTP API. That is, we’ll be making an HTTP call to a web page designed to be read by machines rather than by people. API data is normally in a machine-readable format, usually either JSON or XML. (If we come across data in another format, we can use the techniques described elsewhere in this book to convert it to JSON, of course!) Let’s look at how to use an HTTP API from Python.

The general principles of using an HTTP API are simple:

  1. Make an HTTP call to the URLs for the API, possibly including some authentication information (such as an API key) to show that we’re authorized.
  2. Get back the data.
  3. Do something useful with it.

Python provides enough functionality in its standard library to do all this without any additional modules, but it will make our life a lot easier if we pick up a couple of third-party modules to smooth over the process. The first is the requests module. This is an HTTP library for Python that makes fetching HTTP data more pleasant than Python’s built-in urllib.request, and it can be installed with python -m pip install requests.
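For comparison, here is a minimal sketch of the same three steps using only the standard library. The function names build_url and fetch_json are hypothetical, made up for this illustration, and the example.com address is a placeholder rather than a real API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return f"{base_url}?{urlencode(params)}"


def fetch_json(base_url, params, timeout=10):
    """Make the HTTP call and decode the JSON body (performs a real request)."""
    with urlopen(build_url(base_url, params), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the URL needs no network access, so it is safe to try out:
print(build_url("https://example.com/api/", {"q": "fruit", "page": 2}))
```

The requests module wraps the URL building and the JSON decoding into a single get() call and a .json() method, which is most of what makes it more pleasant to use.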

To show how easy it is to use, we’ll use Pixabay’s API (see the Pixabay API documentation). Pixabay is a stock photo site where the images are all available for reuse, which makes it a very handy destination. What we’ll focus on here is fruit. We’ll use the fruit pictures we gather later on, when manipulating files, but for now we just want to find pictures of fruit, because it’s tasty and good for us.

To start, we’ll take a quick look at what pictures are available from Pixabay. We’ll grab 100 images, quickly look through them, and choose the ones we want. For this, we’ll need a Pixabay API key, so we need to create an account and then grab the key shown in the API documentation under “Search Images”.

The requests Module

The basic version of making an HTTP request to an API with the requests module involves constructing an HTTP URL, requesting it, and then reading the response. Here, that response is in JSON format. The requests module makes each of these steps easy. The API parameters are a Python dictionary, a get() function makes the call, and if the API returns JSON, requests makes that available as .json() on the response. So a simple call will look like this:

import requests

PIXABAY_API_KEY = "11111111-7777777777777777777777777"

base_url = "https://pixabay.com/api/"
base_params = {
    "key": PIXABAY_API_KEY,
    "q": "fruit",
    "image_type": "photo",
    "category": "food",
    "safesearch": "true"
}

response = requests.get(base_url, params=base_params)
results = response.json()

This will return a Python object, as the API documentation suggests, and we can look at its parts:

>>> print(len(results["hits"]))
20
>>> print(results["hits"][0])
{'id': 2277, 'pageURL': 'https://pixabay.com/photos/berries-fruits-food-blackberries-2277/', 'type': 'photo', 'tags': 'berries, fruits, food', 'previewURL': 'https://cdn.pixabay.com/photo/2010/12/13/10/05/berries-2277_150.jpg', 'previewWidth': 150, 'previewHeight': 99, 'webformatURL': 'https://pixabay.com/get/gc9525ea83e582978168fc0a7d4f83cebb500c652bd3bbe1607f98ffa6b2a15c70b6b116b234182ba7d81d95a39897605_640.jpg', 'webformatWidth': 640, 'webformatHeight': 426, 'largeImageURL': 'https://pixabay.com/get/g26eb27097e94a701c0569f1f77ef3975cf49af8f47e862d3e048ff2ba0e5e1c2e30fadd7a01cf2de605ab8e82f5e68ad_1280.jpg', 'imageWidth': 4752, 'imageHeight': 3168, 'imageSize': 2113812, 'views': 866775, 'downloads': 445664, 'collections': 1688, 'likes': 1795, 'comments': 366, 'user_id': 14, 'user': 'PublicDomainPictures', 'userImageURL': 'https://cdn.pixabay.com/user/2012/03/08/00-13-48-597_250x250.jpg'}
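Each hit is an ordinary dictionary, so picking out the parts we care about is plain Python. As a small sketch, the hypothetical summarize() helper below turns one hit into a single readable line. The sample dictionary is a trimmed copy of the values shown above, not a fresh API response.

```python
# A trimmed copy of the first hit shown above (only the keys we use here).
sample_hit = {
    "id": 2277,
    "tags": "berries, fruits, food",
    "imageWidth": 4752,
    "imageHeight": 3168,
}


def summarize(hit):
    """Return a one-line description of a single hit dictionary."""
    tags = [tag.strip() for tag in hit["tags"].split(",")]
    return f'{hit["id"]}: {", ".join(tags)} ({hit["imageWidth"]}x{hit["imageHeight"]})'


print(summarize(sample_hit))  # 2277: berries, fruits, food (4752x3168)
```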

The API returns 20 hits per page, and we’d like 100 results. To do this, we add a page parameter to our list of params. However, we don’t want to alter our base_params every time, so the way to approach this is to create a loop and then make a copy of the base_params for each request. The built-in copy module does exactly this, so we can call the API five times in a loop:

import copy

for page in range(1, 6):
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)

This will make five separate requests to the API, one with page=1, the next with page=2, and so on, getting different sets of image results with each call. This is a convenient way to walk through a large set of API results. Most APIs implement pagination, where a single call to the API only returns a limited set of results. We then ask for more pages of results, much like looking through query results from a search engine.
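To see why the copy matters, here is a small sketch that builds the per-page parameter dictionaries without making any requests. The base_params values are a shortened stand-in for the dictionary used earlier.

```python
import copy

# Shortened stand-in for the real base_params (no API key needed here).
base_params = {"q": "fruit", "image_type": "photo"}

per_page_params = []
for page in range(1, 4):
    this_params = copy.copy(base_params)  # shallow copy of the dictionary
    this_params["page"] = page            # only the copy gains a "page" key
    per_page_params.append(this_params)

print(base_params)                            # the original is left unchanged
print([p["page"] for p in per_page_params])   # [1, 2, 3]
```

A shallow copy is enough here because the dictionary values are all plain strings and numbers.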

Since we want 100 results, we could simply decide that this is five calls of 20 results each, but it would be more robust to keep requesting pages until we have the hundred results we need and then stop. This protects the calls in case Pixabay changes the default number of results to 15 or similar. It also lets us handle the situation where there aren’t 100 images for our search terms. So we have a while loop and increment the page number each time, and then, if we’ve reached 100 images, or if there are no images to retrieve, we break out of the loop:

images = []
page = 1
while len(images) < 100:
    this_params = copy.copy(base_params)
    this_params["page"] = page
    response = requests.get(base_url, params=this_params)
    hits = response.json()["hits"]  # parse the JSON body once per page
    if not hits:
        break  # no more results to retrieve
    for result in hits:
        images.append({
            "pageURL": result["pageURL"],
            "thumbnail": result["previewURL"],
            "tags": result["tags"],
        })
    page += 1

This way, when we finish, we’ll have at least 100 images, or we’ll have all the images if there are fewer than 100, stored in the images list. (Because whole pages are appended, the total can slightly exceed 100 if the page size doesn’t divide it evenly.) We can then go on to do something useful with them. But before we do that, let’s talk about caching.
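The same stopping logic can be tried without touching the network. The sketch below is an illustration under assumptions: collect_results and fake_fetch_page are hypothetical names, and the fake API serves 47 made-up results, 15 per page. Unlike the loop above, it also trims the final list to exactly the requested limit.

```python
def collect_results(fetch_page, limit=100):
    """Call fetch_page(page_number) until limit items are collected or a
    page comes back empty, then trim the list to at most limit items."""
    collected = []
    page = 1
    while len(collected) < limit:
        hits = fetch_page(page)
        if not hits:
            break  # no more results available
        collected.extend(hits)
        page += 1
    return collected[:limit]


# Stand-in for the API: 47 fake results served 15 per page.
fake_results = list(range(47))


def fake_fetch_page(page, per_page=15):
    start = (page - 1) * per_page
    return fake_results[start:start + per_page]


print(len(collect_results(fake_fetch_page, limit=40)))   # 40
print(len(collect_results(fake_fetch_page, limit=100)))  # 47
```

Passing the fetching function in as an argument keeps the paging logic separate from the HTTP call, which makes it easy to test with fake data like this.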

Caching HTTP Requests

It’s a good idea to avoid making the same request to an HTTP API more than once. Many APIs have usage limits in order to avoid them being overtaxed by requesters, and a request takes time and effort on their part and on ours. We should try not to make wasteful requests that we’ve done before. Fortunately, there’s a useful way to do this when using Python’s requests module: install requests-cache with python -m pip install requests-cache. This will seamlessly record any HTTP calls we make and save the results. Then, later, if we make the same call again, we’ll get back the locally saved result without going to the API for it at all. This saves both time and bandwidth. To use requests_cache, import it and create a CachedSession, and then instead of requests.get use session.get to fetch URLs, and we’ll get the benefit of caching with no extra effort:

import requests_cache
session = requests_cache.CachedSession('fruit_cache')
...
response = session.get(base_url, params=this_params)
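To make the idea concrete, here is a toy, in-memory sketch of what a request cache does. It is not how requests-cache is implemented (that library saves responses to a persistent backend so they survive between runs). TinyCache and fake_api are hypothetical names invented for this illustration.

```python
import json


class TinyCache:
    """Remember the result of each (url, params) request in a dictionary."""

    def __init__(self, fetch):
        self.fetch = fetch  # function that performs the real request
        self.store = {}     # cache key -> saved result
        self.misses = 0     # how many times the real fetch was needed

    def get(self, url, params):
        # Sort the parameters so that key order does not change the cache key.
        key = (url, json.dumps(params, sort_keys=True))
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.fetch(url, params)
        return self.store[key]


def fake_api(url, params):
    """Stand-in for a real HTTP call."""
    return {"hits": [f"result for page {params['page']}"]}


cache = TinyCache(fake_api)
cache.get("https://example.com/api/", {"q": "fruit", "page": 1})
cache.get("https://example.com/api/", {"page": 1, "q": "fruit"})  # same request
cache.get("https://example.com/api/", {"q": "fruit", "page": 2})
print(cache.misses)  # 2
```

Three calls were made, but only two reached the fake API: the repeated request was answered from the dictionary.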

Making Some Output

To see the results of our query, we need to display the images somewhere. A convenient way to do this is to create a simple HTML page that shows each of the images. Pixabay provides a small thumbnail of each image, which it calls previewURL in the API response, so we could put together an HTML page that shows all of these thumbnails and links them to the main Pixabay page, from which we could choose to download the images we want and credit the photographer. So each image in the page might look like this:

<li>
    <a href="https://pixabay.com/photos/berries-fruits-food-blackberries-2277/">
        <img src="https://cdn.pixabay.com/photo/2010/12/13/10/05/berries-2277_150.jpg" alt="berries, fruits, food">
    </a>
</li>

We can construct that from our images list using a list comprehension, and then join together all the results into one big string with "\n".join():

html_image_list = [
    f"""<li>
            <a href="{image["pageURL"]}">
                <img src="{image["thumbnail"]}" alt="{image["tags"]}">
            </a>
        </li>
    """
    for image in images
]
html_image_list = "\n".join(html_image_list)
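One caveat worth sketching: values that come from an API can contain characters such as quotes, ampersands, or angle brackets, which would break the markup if pasted in directly. The standard library’s html.escape() handles this. The image_list_item helper and the sample values below are hypothetical and only illustrate the idea.

```python
from html import escape


def image_list_item(image):
    """Build one <li> entry, escaping values before placing them in HTML."""
    page_url = escape(image["pageURL"], quote=True)
    thumbnail = escape(image["thumbnail"], quote=True)
    tags = escape(image["tags"], quote=True)
    return (
        "<li>\n"
        f'    <a href="{page_url}">\n'
        f'        <img src="{thumbnail}" alt="{tags}">\n'
        "    </a>\n"
        "</li>"
    )


# Made-up values containing characters that need escaping.
sample = {
    "pageURL": "https://example.com/photo?id=1&size=small",
    "thumbnail": "https://example.com/thumb.jpg",
    "tags": 'fruit, "fresh" <organic>',
}
print(image_list_item(sample))
```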

At that point, if we write out a very plain HTML page containing that list, it’s easy to open that in a web browser for a quick overview of all the search results we got from the API, and click any one of them to jump to the full Pixabay page for downloads:

html = f"""<!doctype html>
<html><head><meta charset="utf-8">
<title>Pixabay search for {base_params['q']}</title>
<style>
ul {{
    list-style: none;
    line-height: 0;
    column-count: 5;
    column-gap: 5px;
}}
li {{
    margin-bottom: 5px;
}}
</style>
</head>
<body>
<ul>
{html_image_list}
</ul>
</body></html>
"""
output_file = f"searchresults-{base_params['q']}.html"
with open(output_file, mode="w", encoding="utf-8") as fp:
    fp.write(html)
print(f"Search results summary written as {output_file}")
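If we’d rather have Python open the page for us, the standard library’s webbrowser module can do it. This is an optional sketch: file_uri and open_in_browser are hypothetical helper names, and the file name below simply follows the pattern used above for a search on "fruit".

```python
import webbrowser
from pathlib import Path


def file_uri(path):
    """Convert a local file path into a file:// URI a browser can open."""
    return Path(path).resolve().as_uri()


def open_in_browser(path):
    """Ask the default web browser to display the given local file."""
    return webbrowser.open(file_uri(path))


# Only builds the URI; call open_in_browser(...) to actually launch a browser.
print(file_uri("searchresults-fruit.html"))
```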

This article is excerpted from Useful Python, available on SitePoint Premium and from ebook retailers.




