A Picture's Worth a Gazillion Bits
This weekend, I tripped over a neat news REST-ful api called newsapi. Grab an API key and you're off to the races.
There are tons of live headlines - News API can provide headlines from 70 worldwide sources.
There are basically two api endpoints:
Register for an account and generate an api-key and let's get started.
Accessing the API with Python
newsapi can easily be accessed using a browser since the REST-ful method used is a GET method. But, accessing the api from the browser is limiting.
Accessing the api programmatically with python isn't difficult to do. There are libraries that we can use to make the task a breeze!
Two libraries are essential for accessing a REST-ful api:
- urllib3 - a library to formulate requests and process the responses they return.
- json - a library to marshal and unmarshal JSON data structures.
Let's walk through the basics of using our library to interact with the api.
    import urllib3
    import json

    h = urllib3.PoolManager()
    r = h.request('GET', 'https://newsapi.org/v1/articles?source=abc-news-au&sortBy=top&apiKey=e8612ef18bcb4b9c932680026f6b6d42')
(Note: you may need to run pip3 install urllib3 certifi if your imports fail to load.)
That's it - you just made a request to the ABC News (AU) source, sorted by top, using the apiKey that you received when you registered.
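The query string in that URL can also be assembled from its parts, which helps once you start switching sources or sort orders. Here is a minimal sketch, assuming the same v1 articles endpoint used above. The helper name build_articles_url is hypothetical, and YOUR_API_KEY is a placeholder for the key you generated.

```python
from urllib.parse import urlencode

# Endpoint taken from the request shown above.
ARTICLES_ENDPOINT = "https://newsapi.org/v1/articles"

def build_articles_url(source, sort_by, api_key):
    """Return the GET URL for one source (hypothetical helper)."""
    # urlencode escapes the values and joins them with '&'.
    query = urlencode({"source": source, "sortBy": sort_by, "apiKey": api_key})
    return ARTICLES_ENDPOINT + "?" + query

url = build_articles_url("abc-news-au", "top", "YOUR_API_KEY")
print(url)
```

The resulting string can be passed to h.request('GET', url) in place of the hand-written URL.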
How do we know if this request actually worked? We assigned the result of the request to a variable, r, and that variable has several members. The member status holds the status code of the HTTP request. Status codes are well defined, and the value can be inspected to determine whether the data returned from the request is valid.
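To make the status check concrete, here is a small sketch that uses the standard library's table of HTTP status codes. Both helper names are hypothetical. In the walkthrough, the value you would pass in is r.status.

```python
from http import HTTPStatus

def describe_status(status):
    """Return the standard phrase for an HTTP status code (hypothetical helper)."""
    try:
        return HTTPStatus(status).phrase
    except ValueError:
        # The code is not a registered HTTP status.
        return "Unknown"

def request_succeeded(status):
    """True only when the status is the 200 'OK' code."""
    return status == HTTPStatus.OK

for code in (200, 401, 404):
    print(code, describe_status(code), request_succeeded(code))
```

A wrong or missing API key would typically show up here as a non-200 status rather than as a Python exception, which is why the check is worth making before touching the data.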
If the request succeeded (i.e. status equals 200), then we can examine the data, located in r.data. Examining the data shows a string of JSON-encoded information. In order to access the information, we want to decode the JSON string into a Python data structure. Using the json library, we can call loads with the JSON string as input and get back a python dictionary of json key-value pairs.
    json_ds = json.loads(r.data)
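If you want to try the decoding step without making a request, the sketch below runs json.loads on a hand-written stand-in for r.data. The sample body is hypothetical and heavily trimmed. It only illustrates that the bytes returned by urllib3 decode to an ordinary Python dictionary.

```python
import json

# Hypothetical, trimmed stand-in for r.data; urllib3 returns the body as bytes.
sample_body = b'{"source": "abc-news-au", "sortBy": "top", "articles": []}'

# json.loads accepts bytes as well as str on Python 3.6 and later.
decoded = json.loads(sample_body)

print(type(decoded).__name__)
print(sorted(decoded.keys()))
```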
Now that we have the data in a decoded data structure, we can inspect it and see that there are the following keys:
- status -> ‘ok’, indicating a successful result
- source -> ‘abc-news-au’, the name of the source requested in the GET request
- sortBy -> ‘top’, the value of the sortBy parameter passed to the GET request
- articles -> a list of articles, each itself a dictionary of key-value pairs.
Accessing any of these values is as simple as using the key in quotes as the index to the json_ds dictionary. For example, json_ds['articles'] retrieves the list of articles. Use len() on that list to determine how many articles were returned from the request.
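Here is a short sketch of those lookups. The dictionary is a hypothetical stand-in for the decoded response, with made-up authors and titles. Real articles carry more fields than shown.

```python
# Hypothetical stand-in for the decoded response.
json_ds = {
    "source": "abc-news-au",
    "sortBy": "top",
    "articles": [
        {"author": "A. Writer", "title": "First headline"},
        {"author": None, "title": "Second headline"},
    ],
}

articles = json_ds["articles"]      # index the dictionary by key
count = len(articles)               # number of articles returned
first_title = articles[0]["title"]  # each article is itself a dictionary

# .get() returns a default instead of raising KeyError for an absent key.
missing = json_ds.get("noSuchKey", "n/a")

print(count, first_title, missing)
```

The second sample article has no author on purpose: a field can be present but empty, so code that formats the author should be ready for None.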
We can iterate through all the articles, and print out the author and title as follows:
    for _, a in enumerate(json_ds['articles']):
        print(a['author'], a['title'])
A brief note on the enumerate function. Rather than use range, where we would have to wrap our list with len() to produce a valid integer-based range, we use enumerate and pass the list, json_ds['articles'], directly to the function. The function returns a tuple, (index, value). Since we don't need the index, an underscore, _, takes its place, and the variable, a, receives each article as we enumerate over the list.
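To see the difference side by side, here is a small comparison over a hypothetical two-article list. All three loops collect the same titles. The last form is worth knowing as well, since the index is never used.

```python
# Hypothetical sample list standing in for json_ds['articles'].
articles = [
    {"author": "A. Writer", "title": "First headline"},
    {"author": "B. Reporter", "title": "Second headline"},
]

# range() needs an integer, so the list is wrapped in len() and then indexed.
via_range = [articles[i]["title"] for i in range(len(articles))]

# enumerate() takes the list directly and yields (index, value) tuples.
via_enumerate = [a["title"] for _, a in enumerate(articles)]

# When the index is never needed, iterating the list itself is enough.
via_plain = [a["title"] for a in articles]

print(via_range == via_enumerate == via_plain)
```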
We can now programmatically access, manipulate, and do whatever we want with the data returned from the request. Far more useful than just viewing the requested data in your browser.
So what about that
The curious reader likely noticed that there are two urls in our list of articles. One of them (url) is a link to the full article. The other is a link to an image associated with the article. Let's continue our programmatic quest and grab this image and create a thumbnail image for each image we retrieve.
In order to manipulate images, we need to do a couple of things:
- issue a request to retrieve the image link data.
- use the PIL library to save our image and create and save our thumbnail image.
Just like every other library, we must import PIL library components before we can access them. Specifically, to import the Image module from PIL, add the following:
    from PIL import Image
Here's some code to retrieve the image and then create thumbnails.
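    from PIL import Image

    OK = 200                 # HTTP status code for a successful request
    THUMB_SIZE = (128, 128)  # assumed maximum thumbnail size; the original value is not shown

    def saveImage(h, url, filename):
        # Fetch the image and write the raw bytes to disk.
        r = h.request('GET', url)
        if r.status == OK:
            with open(filename, "wb") as f:
                f.write(r.data)

    def thumb(filename, thumbFilename):
        # Shrink the saved image (aspect ratio is preserved) and save it as a JPEG.
        try:
            im = Image.open(filename)
            im.thumbnail(THUMB_SIZE)
            im.save(thumbFilename, "JPEG")
        except IOError:
            print("cannot create thumbnail for", filename, thumbFilename)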
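Both functions expect filenames, so each article's image needs a local name before it can be saved. Here is one possible sketch. The naming scheme and the helper image_filenames are hypothetical and are not taken from the complete program. The thumbnail always gets a .jpg extension because thumb saves in JPEG format.

```python
import os
from urllib.parse import urlparse

def image_filenames(image_url, index, out_dir="images"):
    """Return (image path, thumbnail path) for one article image.

    Hypothetical naming scheme: the article index plus the extension
    found in the URL path, falling back to .jpg when there is none.
    """
    # urlparse().path drops the query string, so "?w=700" never reaches the name.
    ext = os.path.splitext(urlparse(image_url).path)[1] or ".jpg"
    image_path = os.path.join(out_dir, "article_{}{}".format(index, ext))
    thumb_path = os.path.join(out_dir, "article_{}_thumb.jpg".format(index))
    return image_path, thumb_path

print(image_filenames("https://example.com/pics/storm.png?w=700", 3))
```

The output directory has to exist before saving. os.makedirs(out_dir, exist_ok=True) is one way to ensure that.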
We can wrap both of these functions together to produce a program that retrieves the articles from multiple sources, retrieves the image from each article, and creates an associated thumbnail from each image. Here's a gist of the complete program:
From the above code, you can see how easy it is to use urllib3 to grab the articles of interest from the newsapi, and then, for each article, grab the image url and save it to a file. Now that you have these tools in your possession, fun and interesting applications outside of the browser await!
For fun, how would you create a collage image from the thumbnails? Can you use the Image library to construct a new image that is a composition of blocks of thumbnail images? Give it a try, and email me if you get stuck!
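As a starting hint, here is the layout arithmetic only, in plain Python. It assumes every thumbnail is given a fixed-size tile in a grid. Both function names are hypothetical, and the image handling is left for you to fill in with Image.new for the canvas and paste for each thumbnail.

```python
def tile_positions(count, columns, tile_size):
    """Return the (left, upper) pixel offset of each tile, row by row."""
    width, height = tile_size
    return [((i % columns) * width, (i // columns) * height) for i in range(count)]

def canvas_size(count, columns, tile_size):
    """Return the (width, height) of a canvas that holds every tile."""
    width, height = tile_size
    rows = -(-count // columns)  # ceiling division
    return (columns * width, rows * height)

# Five thumbnails laid out three per row, each in a 128x128 tile.
print(canvas_size(5, 3, (128, 128)))
print(tile_positions(5, 3, (128, 128)))
```

Because thumbnail preserves the aspect ratio, a thumbnail can be smaller than its tile in one dimension, so some background will show through unless you center or crop it.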