Basics of API

The idea of an API (Application Programming Interface) is to provide a structured way to request data from a website or server. While responses can come in a variety of forms, the most common is JSON (JavaScript Object Notation), which maps roughly onto a nested, not-necessarily-rectangular list in R. This is in stark contrast to SQL, where queries are more highly structured, engage with a specific database, and return results formatted as tables.
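To make the JSON-to-list correspondence concrete, here is a minimal sketch using jsonlite. The JSON string is a hand-written toy example (not actual API output), and simplifyVector = FALSE is chosen so the result stays a plain nested list.

```r
library(jsonlite)

# A hand-written toy JSON string (an assumption for illustration only)
json_text <- '{"name": "bulbasaur", "height": 7, "types": ["grass", "poison"]}'

# simplifyVector = FALSE keeps JSON arrays as R lists rather than vectors
parsed <- fromJSON(json_text, simplifyVector = FALSE)

# JSON objects become named lists; JSON arrays become unnamed lists
names(parsed)
parsed$types[[1]]
```

Each JSON key becomes a list name, which is why we can later navigate results with $ and [[ ]].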

One thing to be aware of immediately is that every API is a little bit different, so to some degree you will need to review the documentation for any particular website. That being said, there are a few things that are generally true of API calls.

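From the perspective of API calls in R, every request will include a URL identifying what is being requested (the endpoint) and, optionally, query parameters that refine the request.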

In addition to this, most public-facing APIs require user authentication. In some cases the API is free but requires authentication because of the sensitivity of the data (for example, the YouTube API requires an account, as does the Canvas API for your grades). Other times authentication is required because the API is not public and requires a paid subscription. Some fall in between, offering publicly visible data behind a paywalled API (Twitter and Reddit, for example). This greatly limits the number of resources available to us that are both free and do not require authentication.

APIs can be used for virtually anything, from aggregation and data analysis to posting Twitter updates; at its core, an API is simply a structured method for programmers to interface with an application.

We’ll need the following packages to get started (copy them into your console, not your Rmd)

install.packages(c("jsonlite", "httr2"))

Pokemon

A very thorough and apparently comprehensive public-facing API is PokeAPI, which indexes as much information as possible about Pokemon and the associated video games. The documentation for this API is available at https://pokeapi.co/docs/v2.

Our First API Call

We see under the Resource Lists/Pagination section in the documentation just how the API works. Specifically, the first line there says, “Calling any API endpoint without a resource ID or name will return a paginated list of available resources for that API. By default, a list "page" will contain up to 20 resources.”

This means that submitting a query to an endpoint (a node or structured collection of information) without specifying a particular resource will return a list of the resources available at that endpoint. PokeAPI goes one step further: querying the base URL itself returns a list of all available endpoints. This is lovely, useful, and not true of most APIs. The sequence of calls we need is:

  • request(), containing the URL or endpoint
  • req_url_query(), specifying any query parameters. Here, we request only 10 resources per page
  • req_perform(), which performs the query
  • resp_body_json(), which parses the JSON body of the response into an R list
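As a small sketch of the first two steps, the request can be built and inspected before anything is sent to the server. This assumes only that httr2 is installed; intermediate assignments are used instead of a pipe so the block needs no other package.

```r
library(httr2)

# Build the request step by step; nothing is sent to the server here
req <- request("https://pokeapi.co/api/v2/pokemon")
req <- req_url_query(req, limit = 10)

# The request object stores the full URL, including the query string
req$url
```

Checking req$url like this is a cheap way to confirm that query parameters were attached as intended before calling req_perform().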

Here we make a call to the base URL (included in request()) to see a list of all of the endpoints available. Note that this includes both the name of each endpoint and its URL for future calls (also note that the limit = 10 is ignored in this case).

library(httr2)
library(jsonlite)
library(tidyr)

query <- request("https://pokeapi.co/api/v2/") %>%
  req_url_query(limit = 10) %>%
  req_perform() %>% 
  resp_body_json()

query
## $ability
## [1] "https://pokeapi.co/api/v2/ability/"
## 
## $berry
## [1] "https://pokeapi.co/api/v2/berry/"
## 
## $`berry-firmness`
## [1] "https://pokeapi.co/api/v2/berry-firmness/"
## 
## $`berry-flavor`
## [1] "https://pokeapi.co/api/v2/berry-flavor/"
## 
## $characteristic
## [1] "https://pokeapi.co/api/v2/characteristic/"
## 
## $`contest-effect`
## [1] "https://pokeapi.co/api/v2/contest-effect/"
## 
## $`contest-type`
## [1] "https://pokeapi.co/api/v2/contest-type/"
## 
## $`egg-group`
## [1] "https://pokeapi.co/api/v2/egg-group/"
## 
## $`encounter-condition`
## [1] "https://pokeapi.co/api/v2/encounter-condition/"
## 
## $`encounter-condition-value`
## [1] "https://pokeapi.co/api/v2/encounter-condition-value/"
## 
## $`encounter-method`
## [1] "https://pokeapi.co/api/v2/encounter-method/"
## 
## $`evolution-chain`
## [1] "https://pokeapi.co/api/v2/evolution-chain/"
## 
## $`evolution-trigger`
## [1] "https://pokeapi.co/api/v2/evolution-trigger/"
## 
## $gender
## [1] "https://pokeapi.co/api/v2/gender/"
## 
## $generation
## [1] "https://pokeapi.co/api/v2/generation/"
## 
## $`growth-rate`
## [1] "https://pokeapi.co/api/v2/growth-rate/"
## 
## $item
## [1] "https://pokeapi.co/api/v2/item/"
## 
## $`item-attribute`
## [1] "https://pokeapi.co/api/v2/item-attribute/"
## 
## $`item-category`
## [1] "https://pokeapi.co/api/v2/item-category/"
## 
## $`item-fling-effect`
## [1] "https://pokeapi.co/api/v2/item-fling-effect/"
## 
## $`item-pocket`
## [1] "https://pokeapi.co/api/v2/item-pocket/"
## 
## $language
## [1] "https://pokeapi.co/api/v2/language/"
## 
## $location
## [1] "https://pokeapi.co/api/v2/location/"
## 
## $`location-area`
## [1] "https://pokeapi.co/api/v2/location-area/"
## 
## $machine
## [1] "https://pokeapi.co/api/v2/machine/"
## 
## $meta
## [1] "https://pokeapi.co/api/v2/meta/"
## 
## $move
## [1] "https://pokeapi.co/api/v2/move/"
## 
## $`move-ailment`
## [1] "https://pokeapi.co/api/v2/move-ailment/"
## 
## $`move-battle-style`
## [1] "https://pokeapi.co/api/v2/move-battle-style/"
## 
## $`move-category`
## [1] "https://pokeapi.co/api/v2/move-category/"
## 
## $`move-damage-class`
## [1] "https://pokeapi.co/api/v2/move-damage-class/"
## 
## $`move-learn-method`
## [1] "https://pokeapi.co/api/v2/move-learn-method/"
## 
## $`move-target`
## [1] "https://pokeapi.co/api/v2/move-target/"
## 
## $nature
## [1] "https://pokeapi.co/api/v2/nature/"
## 
## $`pal-park-area`
## [1] "https://pokeapi.co/api/v2/pal-park-area/"
## 
## $`pokeathlon-stat`
## [1] "https://pokeapi.co/api/v2/pokeathlon-stat/"
## 
## $pokedex
## [1] "https://pokeapi.co/api/v2/pokedex/"
## 
## $pokemon
## [1] "https://pokeapi.co/api/v2/pokemon/"
## 
## $`pokemon-color`
## [1] "https://pokeapi.co/api/v2/pokemon-color/"
## 
## $`pokemon-form`
## [1] "https://pokeapi.co/api/v2/pokemon-form/"
## 
## $`pokemon-habitat`
## [1] "https://pokeapi.co/api/v2/pokemon-habitat/"
## 
## $`pokemon-shape`
## [1] "https://pokeapi.co/api/v2/pokemon-shape/"
## 
## $`pokemon-species`
## [1] "https://pokeapi.co/api/v2/pokemon-species/"
## 
## $region
## [1] "https://pokeapi.co/api/v2/region/"
## 
## $stat
## [1] "https://pokeapi.co/api/v2/stat/"
## 
## $`super-contest-effect`
## [1] "https://pokeapi.co/api/v2/super-contest-effect/"
## 
## $type
## [1] "https://pokeapi.co/api/v2/type/"
## 
## $version
## [1] "https://pokeapi.co/api/v2/version/"
## 
## $`version-group`
## [1] "https://pokeapi.co/api/v2/version-group/"

Suppose that of these, we are interested in using the pokemon endpoint. Extraction of the relevant URL is easy enough:

query[["pokemon"]]
## [1] "https://pokeapi.co/api/v2/pokemon/"

Let’s use this to grab the first 5 pokemon in the index

## Get general list of pokemon (grab 5)
query <- request("https://pokeapi.co/api/v2/pokemon") %>%
  req_url_query(limit = 5) %>%
  req_perform()

pokemon <- resp_body_json(query)

Here, the “endpoint” is pokemon, so we can review the API documentation to see what is returned. We see that we get a list of 4 objects:

  1. count tells us the total number of resources in this endpoint (in this case, the number of pokemon)
  2. next and previous give the URLs for the next and previous pages of resources. More on this shortly
  3. results has the actual results from the query

Let’s start by investigating the results. We see that it is a collection of the first 5 pokemon

pokemon[["results"]]
## [[1]]
## [[1]]$name
## [1] "bulbasaur"
## 
## [[1]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/1/"
## 
## 
## [[2]]
## [[2]]$name
## [1] "ivysaur"
## 
## [[2]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/2/"
## 
## 
## [[3]]
## [[3]]$name
## [1] "venusaur"
## 
## [[3]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/3/"
## 
## 
## [[4]]
## [[4]]$name
## [1] "charmander"
## 
## [[4]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/4/"
## 
## 
## [[5]]
## [[5]]$name
## [1] "charmeleon"
## 
## [[5]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/5/"

What we see is a list in which each element is itself a named list containing (1) the name of the pokemon and (2) the URL to retrieve data specific to that pokemon. This hints at how we will be able to navigate the API through calls in R.

Before moving on, it’s worthwhile to consider the next and previous entries in the query return as well. Making large calls to a server can be expensive, and it is polite (and sometimes necessary) to make requests in chunks. We do this by limiting the number of resources we request in a single query. If we then wanted to get the next 5 pokemon in the list, our original query gives us a convenient method for doing so. Here, we pass pokemon[["next"]] in for the URL:
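As a sketch of how such a list of name/url pairs can be flattened, the block below rebuilds the first two entries of the output above by hand (so it runs without a network call) and pulls each field into a vector. The object name results_mock is hypothetical.

```r
# Hand-built copy of the first two entries shown above (no API call needed)
results_mock <- list(
  list(name = "bulbasaur", url = "https://pokeapi.co/api/v2/pokemon/1/"),
  list(name = "ivysaur",   url = "https://pokeapi.co/api/v2/pokemon/2/")
)

# vapply() extracts one field from every element and checks its type
poke_names <- vapply(results_mock, function(p) p$name, character(1))
poke_urls  <- vapply(results_mock, function(p) p$url,  character(1))

data.frame(Name = poke_names, URL = poke_urls)
```

The same two vapply() calls should work on pokemon[["results"]] directly, since it has the same shape.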

request(pokemon[["next"]]) %>% 
  req_perform() %>% 
  resp_body_json()
## $count
## [1] 1350
## 
## $`next`
## [1] "https://pokeapi.co/api/v2/pokemon?offset=10&limit=5"
## 
## $previous
## [1] "https://pokeapi.co/api/v2/pokemon?offset=0&limit=5"
## 
## $results
## $results[[1]]
## $results[[1]]$name
## [1] "charizard"
## 
## $results[[1]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/6/"
## 
## 
## $results[[2]]
## $results[[2]]$name
## [1] "squirtle"
## 
## $results[[2]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/7/"
## 
## 
## $results[[3]]
## $results[[3]]$name
## [1] "wartortle"
## 
## $results[[3]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/8/"
## 
## 
## $results[[4]]
## $results[[4]]$name
## [1] "blastoise"
## 
## $results[[4]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/9/"
## 
## 
## $results[[5]]
## $results[[5]]$name
## [1] "caterpie"
## 
## $results[[5]]$url
## [1] "https://pokeapi.co/api/v2/pokemon/10/"

Done this way, the general idea is to create a loop: take a query, save the “next” url, perform actions on the query, loop. We will save this as an exercise.
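The shape of such a loop can be sketched without touching the API at all. Everything below is a stand-in: pages_mock is a hypothetical in-memory set of three "pages", and fetch_page() is a hypothetical helper that plays the role of request() %>% req_perform() %>% resp_body_json().

```r
# Hypothetical in-memory "API": each page has results and a pointer to the next
pages_mock <- list(
  page1 = list(`next` = "page2", results = list(list(name = "a"), list(name = "b"))),
  page2 = list(`next` = "page3", results = list(list(name = "c"), list(name = "d"))),
  page3 = list(`next` = NULL,    results = list(list(name = "e")))
)

# Hypothetical stand-in for performing a request and parsing the JSON
fetch_page <- function(url) pages_mock[[url]]

url <- "page1"
all_names <- character(0)

# Keep going until the current page reports no next page
while (!is.null(url)) {
  page <- fetch_page(url)
  all_names <- c(all_names, vapply(page$results, function(p) p$name, character(1)))
  url <- page[["next"]]
}

all_names
```

With a real API, a counter or a cap on the number of iterations is a sensible addition so the loop cannot run through thousands of pages by accident.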

Bulbasaur

To explore how to use the API for data generation, let’s start with the first pokemon in the list, which happens to be Bulbasaur.

## Get general list of pokemon (grab 10)
resp <- request("https://pokeapi.co/api/v2/pokemon") %>%
  req_url_query(limit = 10) %>%
  req_perform()

pokemon <- resp_body_json(resp)

## We see that bulbasaur is the first in the results list
pokemon$results[[1]]
## $name
## [1] "bulbasaur"
## 
## $url
## [1] "https://pokeapi.co/api/v2/pokemon/1/"

Again, we see that we have the name of the pokemon, along with another endpoint. Let’s use this for our next request

## Request bulbasaur info
bulb <- request(pokemon$results[[1]]$url) %>% 
  req_perform() %>% 
  resp_body_json()

Again, without reading the documentation or investigating ourselves, we never know exactly how a requested JSON will be formatted. In this case, we find that it is a named list:

names(bulb)
##  [1] "abilities"                "base_experience"         
##  [3] "cries"                    "forms"                   
##  [5] "game_indices"             "height"                  
##  [7] "held_items"               "id"                      
##  [9] "is_default"               "location_area_encounters"
## [11] "moves"                    "name"                    
## [13] "order"                    "past_abilities"          
## [15] "past_stats"               "past_types"              
## [17] "species"                  "sprites"                 
## [19] "stats"                    "types"                   
## [21] "weight"
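One quick way to survey a nested list like this is str() with max.level = 1, which shows the type of each top-level element without printing everything beneath it. The sketch below uses a small hand-built stand-in (bulb_mock, a hypothetical object containing only a few of the fields above) so it runs offline.

```r
# Hand-built stand-in with a few of the fields listed above
bulb_mock <- list(
  name = "bulbasaur",
  height = 7L,
  weight = 69L,
  abilities = list(
    list(
      ability = list(name = "overgrow",
                     url = "https://pokeapi.co/api/v2/ability/65/"),
      is_hidden = FALSE,
      slot = 1L
    )
  )
)

# Show only the top level of the structure
str(bulb_mock, max.level = 1)

# The same survey as a named character vector of classes
field_types <- vapply(bulb_mock, function(x) class(x)[1], character(1))
field_types
```

Elements reported as "list" are the ones that need further digging.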

Because lists are the most general data structure in R, we can’t really anticipate anything about this structure until we investigate it further. For example, by checking weight or height, we see we have a single integer.

bulb$height
## [1] 7
bulb$weight
## [1] 69
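These integers are easier to interpret with units attached. According to the PokeAPI documentation, height is reported in decimetres and weight in hectograms, so a brief sketch of the conversion (using the values printed above, with hypothetical variable names) looks like this:

```r
# Values printed above for bulbasaur
height_dm <- 7    # decimetres, per the PokeAPI documentation
weight_hg <- 69   # hectograms, per the PokeAPI documentation

# Ten decimetres per metre, ten hectograms per kilogram
height_m  <- height_dm / 10
weight_kg <- weight_hg / 10

c(height_m = height_m, weight_kg = weight_kg)
```

This is a good example of why the documentation matters: nothing in the JSON itself says what the units are.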

Contrast this with abilities, which not only includes multiple abilities but also gives each ability its own set of attributes, including yet another endpoint for further investigation.

length(bulb$abilities)
## [1] 2
bulb$abilities[[1]]
## $ability
## $ability$name
## [1] "overgrow"
## 
## $ability$url
## [1] "https://pokeapi.co/api/v2/ability/65/"
## 
## 
## $is_hidden
## [1] FALSE
## 
## $slot
## [1] 1

Just as before, we can use this to learn more about the ability “Overgrow”

## Pass in the ability endpoint to see what data available
ability <- request(bulb$abilities[[1]]$ability$url) %>% 
  req_perform() %>% 
  resp_body_json()

We can then repeat the process of exploration as before. We see that effect_entries, for example, tells us what effects the ability has in three different languages.

names(ability)
## [1] "effect_changes"      "effect_entries"      "flavor_text_entries"
## [4] "generation"          "id"                  "is_main_series"     
## [7] "name"                "names"               "pokemon"
ability$effect_entries
## [[1]]
## [[1]]$effect
## [1] "Quand ce Pokémon a 1/3 de ses PV restants ou moins, ses capacités de type Plante infligent 1.5× plus de dégâts réguliers."
## 
## [[1]]$language
## [[1]]$language$name
## [1] "fr"
## 
## [[1]]$language$url
## [1] "https://pokeapi.co/api/v2/language/5/"
## 
## 
## [[1]]$short_effect
## [1] "Renforce les capacités Plante pour infliger 1.5× de dégâts à 1/3 des PV max ou moins."
## 
## 
## [[2]]
## [[2]]$effect
## [1] "Wenn ein Pokémon mit dieser Fähigkeit nur noch 1/3 seiner maximalen KP oder weniger hat, werden all seine Pflanze Attacken verstärkt, so dass sie 1,5× so viel Regulärer Schaden anrichten wie sonst."
## 
## [[2]]$language
## [[2]]$language$name
## [1] "de"
## 
## [[2]]$language$url
## [1] "https://pokeapi.co/api/v2/language/6/"
## 
## 
## [[2]]$short_effect
## [1] "Erhöht den Schaden von Pflanze Attacken um 50% wenn nur noch 1/3 der maximalen KP oder weniger übrig sind."
## 
## 
## [[3]]
## [[3]]$effect
## [1] "When this Pokémon has 1/3 or less of its HP remaining, its Grass-type moves inflict 1.5× as much regular damage."
## 
## [[3]]$language
## [[3]]$language$name
## [1] "en"
## 
## [[3]]$language$url
## [1] "https://pokeapi.co/api/v2/language/9/"
## 
## 
## [[3]]$short_effect
## [1] "Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less."

Once we know how to find all the information we want, we can begin the process of importing it all into R. We’ll start by creating a function that, given a result from the original pokemon query, will return a data.frame with the pokemon’s name, weight, height, and first ability, along with a description of that ability.

(Note: this is a useful depiction of how functions are typically generated. Find a process that you used to perform an action, then distill it to something useful)

## Practice writing function with bulbasaur in mind
x <- pokemon$results[[1]]

buildPokemonEntry <- function(x) {
  poke <- request(x$url) %>% 
    req_perform() %>% 
    resp_body_json()
  
  ## Get name
  name <- poke$name
  
  ## Get height and weight
  wt <- poke$weight
  ht <- poke$height
  
  # Get first ability and description
  ability_url <- poke$abilities[[1]]$ability$url
  
  ability_info <- request(ability_url) %>% 
    req_perform() %>% 
    resp_body_json()
  
  ab_name <- ability_info$name
  ab_desc <- ability_info$effect_entries[[3]]$short_effect # 3 is english

  data.frame(Name = name, Height = ht, Weight = wt, 
             Ability = ab_name, 
             Description = ab_desc)
}

## Ok, let's pass in our bulbasaur and check
buildPokemonEntry(pokemon$results[[1]])
##        Name Height Weight  Ability
## 1 bulbasaur      7     69 overgrow
##                                                             Description
## 1 Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less.
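The function above relies on the English description being the third entry, which held for the abilities shown here but is not something the position in a list guarantees. A more defensive sketch looks the entry up by its language name instead. The helper pick_english() is hypothetical, and entries_mock is a hand-built copy of two of the entries printed earlier.

```r
# Hand-built copy of two effect entries shown earlier (German and English)
entries_mock <- list(
  list(language = list(name = "de"),
       short_effect = "Erhöht den Schaden von Pflanze Attacken um 50% wenn nur noch 1/3 der maximalen KP oder weniger übrig sind."),
  list(language = list(name = "en"),
       short_effect = "Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less.")
)

# Hypothetical helper: return the English short_effect, or NA if none exists
pick_english <- function(entries) {
  is_en <- vapply(entries, function(e) identical(e$language$name, "en"), logical(1))
  if (!any(is_en)) return(NA_character_)
  entries[[which(is_en)[1]]]$short_effect
}

pick_english(entries_mock)
```

Inside buildPokemonEntry(), this would replace the hard-coded [[3]] with pick_english(ability_info$effect_entries).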

With this in mind, we can combine what we have learned in other labs to generate this table for the first ten pokemon

## Let's build this for first ten pokemon using lapply on original API call
first10 <- request("https://pokeapi.co/api/v2/pokemon") %>%
  req_url_query(limit = 10) %>%
  req_perform() %>% 
  resp_body_json()

## Recall that first10 is a list with count, next, previous, and then results
poke <- lapply(first10$results, buildPokemonEntry)

## Use Reduce to combine data.frames with rbind (rowbind)
Reduce(rbind, poke)
##          Name Height Weight     Ability
## 1   bulbasaur      7     69    overgrow
## 2     ivysaur     10    130    overgrow
## 3    venusaur     20   1000    overgrow
## 4  charmander      6     85       blaze
## 5  charmeleon     11    190       blaze
## 6   charizard     17    905       blaze
## 7    squirtle      5     90     torrent
## 8   wartortle     10    225     torrent
## 9   blastoise     16    855     torrent
## 10   caterpie      3     29 shield-dust
##                                                              Description
## 1  Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less.
## 2  Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less.
## 3  Strengthens Grass moves to inflict 1.5× damage at 1/3 max HP or less.
## 4   Strengthens Fire moves to inflict 1.5× damage at 1/3 max HP or less.
## 5   Strengthens Fire moves to inflict 1.5× damage at 1/3 max HP or less.
## 6   Strengthens Fire moves to inflict 1.5× damage at 1/3 max HP or less.
## 7  Strengthens Water moves to inflict 1.5× damage at 1/3 max HP or less.
## 8  Strengthens Water moves to inflict 1.5× damage at 1/3 max HP or less.
## 9  Strengthens Water moves to inflict 1.5× damage at 1/3 max HP or less.
## 10                       Protects against incoming moves' extra effects.
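As an aside on the design choice, Reduce(rbind, ...) binds the pieces one pair at a time, while do.call(rbind, ...) hands them all to rbind() in a single call, which tends to scale better as the list grows. A minimal sketch with two hand-built rows taken from the table above (poke_mock is a hypothetical name):

```r
# Two hand-built rows copied from the table above
poke_mock <- list(
  data.frame(Name = "bulbasaur", Height = 7,  Weight = 69),
  data.frame(Name = "ivysaur",   Height = 10, Weight = 130)
)

# Pass every data.frame in the list to rbind() at once
combined <- do.call(rbind, poke_mock)
combined
```

For ten rows the difference is negligible, so either approach is fine here.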

The moral of the API story is: read the documentation, play around, and explore.

Exercise Explore what kind of data is given in a specific pokemon endpoint (for example, bulbasaur had height and weight). Then do the following:

  • Come up with 3-4 attributes to extract, then create a data.frame as we did above
  • Limit your query to only include 10 pokemon at a time
  • Use the next info to loop through the query 5 times until you have a total of 50 pokemon
  • Finally, return a data.frame that has 50 rows (one for each pokemon), with attributes as columns

Here are some things you can do on your own

Cats

Another fun (free, authentication-free) API is Cataas (Cats-As-A-Service), the documentation of which is another good example of how APIs can differ. Looking at the documentation, we see a few things.

First, we see that all of the given API calls are GET, indicating that all calls to the API are those that allow us to retrieve data. We also see that the API is broken into two sections: Cats (Cataas API) and API (Public API). The Cats section describes literal URL endpoints that can be used by users to generate cat images, whereas the API section tells us how we can make structured requests for information.

Let’s start with the API section, mirroring what we have already done. We specify a request to the API, perform the request, and then investigate the resulting JSON

## Here we request 10 api/cats
cat_api <- request("https://cataas.com/api/cats") %>%
  req_url_query(limit = 10) %>% 
  req_perform() %>% 
  resp_body_json()

## Not a named list this time
names(cat_api)
## NULL
## This shows us that each element is a specific cat, with associated cat info
cat_api[[1]]
## $id
## [1] "04eEQhDfAL8l5nt3"
## 
## $tags
## $tags[[1]]
## [1] "two"
## 
## $tags[[2]]
## [1] "double"
## 
## $tags[[3]]
## [1] "black"
## 
## 
## $mimetype
## [1] "image/jpeg"
## 
## $createdAt
## [1] "2022-07-18T11:28:29.596Z"

We can then use this cat picture metadata to make calls to the Cataas API to request images. (You’ll have to copy this code and run it yourself)

base_url <- "https://cataas.com/cat/"
query_url <- paste0(base_url, cat_api[[1]]$id)

browseURL(query_url)

Joyous. Playing around with this, we can find collections of tags.

tags <- request("https://cataas.com/api/tags") %>%
  req_url_query(limit = 10) %>% 
  req_perform() %>% 
  resp_body_json()

It turns out there are a ton of these, and a length 1163 vector can be tricky to navigate. A nice trick here is to turn it into a matrix with columns for easier navigation

head(matrix(tags, ncol = 5))
## Warning in matrix(tags, ncol = 5): data length [1163] is not a sub-multiple or
## multiple of the number of rows [233]
##      [,1]            [,2]            [,3]      [,4]          [,5]       
## [1,] ""              "bird"          "fight"   "mcdonalds"   "sluggish" 
## [2,] "#christmascat" "birthday"      "fighter" "mean"        "smack"    
## [3,] "#scottishfold" "biscuit"       "file"    "meditating"  "small"    
## [4,] "."             "bitting"       "fire"    "medium hair" "small cat"
## [5,] "2cats"         "black"         "fisheye" "meet"        "smart"    
## [6,] "4"             "black & white" "fist"    "melon"       "smile"
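The warning above arises because 1163 is not a multiple of 5, so matrix() recycles values to fill the last cells. One way around this, sketched here on a handful of tags copied from the output above (with hypothetical variable names), is to pad the vector with NA up to a full rectangle first.

```r
# A few tags copied from the output above
tags_mock <- c("bird", "birthday", "biscuit", "black", "fight", "fighter", "file")

n_col <- 3
n_row <- ceiling(length(tags_mock) / n_col)

# Pad with NA so the length is an exact multiple of the number of columns
padded <- c(tags_mock, rep(NA_character_, n_row * n_col - length(tags_mock)))

tag_matrix <- matrix(padded, ncol = n_col)
tag_matrix
```

Because resp_body_json() returns a list, applying unlist() to the real tags first would give a plain character vector to pad.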

The says portion of the URL will apparently generate text on a given image using the syntax /cat/{tag}/says/{text}. Again, you’ll need to paste this into your own R code to run it.

## Let's use the "fat" tag to add text to a random cat
query_url <- paste0(base_url, "fat/says/yum kitty")
browseURL(query_url)
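Note that the text above contains a space, which is not strictly valid inside a URL; browsers usually repair this, but it is safer to encode it ourselves. A small sketch using URLencode() from the utils package that ships with R (the variable says_text is hypothetical):

```r
base_url <- "https://cataas.com/cat/"
says_text <- "yum kitty"

# reserved = TRUE also encodes characters such as "/" and "?" in the text
encoded_text <- utils::URLencode(says_text, reserved = TRUE)

query_url <- paste0(base_url, "fat/says/", encoded_text)
query_url
```

The resulting URL can be passed to browseURL() exactly as before.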

Use this information responsibly.

Exercise Play around with the API, finding a list of tags and returning your own images. Give me three good queries to show me what you have. Bonus if you can include your images in an R Markdown document (not difficult, but also beyond the point of this lab).

Dogs

Lucky for us, there is also a dog API, with documentation available on its website (dogapi.dog). This API is again a little different from the others we have seen, as it contains no general index of information and all calls must go directly to an endpoint.

Here is a call to get you started:

dogs <- request("https://dogapi.dog/api/v2/breeds") %>%
  req_url_query() %>% 
  req_perform() %>% 
  resp_body_json()

Exercise Create a small data.frame (20 or fewer rows) that contains information about dogs. The rows can be dog facts, breed information, or anything else. Try to include 4-5 total columns in your final result.