Sunday, June 4, 2023

Data Wrangling of GIS API Data Using Python

Data wrangling in the context of GIS (Geographic Information System) typically involves processing and manipulating spatial data to extract valuable insights or prepare it for further analysis.

In this post, we will extract data from APIs and prepare it for further analysis in QGIS or any other GIS software. We will use the two API datasets listed below:

1. Digital Atlas of the Roman Empire

2. REST countries

Let's get started. We want to convert the API data into a format that GIS software can read for further analysis. In this case, that format is a spreadsheet saved as a .csv file.
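
As a minimal sketch of that goal, the snippet below flattens nested JSON records into a table and serialises it as CSV text. The two records and their field names are made up for illustration; `pd.json_normalize` is the pandas helper that expands nested dictionaries into dotted column names.

```python
import io

import pandas as pd

# Hypothetical records, shaped like a typical nested API response
records = [
    {"id": 1, "name": {"common": "Alpha"}, "location": {"lat": 10.5, "lon": 20.25}},
    {"id": 2, "name": {"common": "Beta"}, "location": {"lat": -3.0, "lon": 45.0}},
]

# Expand nested dictionaries into flat columns such as "location.lat"
flat_df = pd.json_normalize(records)

# Write to an in-memory buffer here; pass a file path instead to save a .csv
buffer = io.StringIO()
flat_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path such as `'output.csv'` to `to_csv` instead of the buffer would write the file that a GIS package can then open.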


Digital Atlas of the Roman Empire

As the title suggests, this API provides information on cities of the Roman Empire.


import json
import requests
import pandas as pd

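# Download the GeoJSON document and parse it into a Python dictionary
resp = requests.get('http://imperium.ahlfeldt.se/api/geojson.php').text
json_obj = json.loads(resp)
# --------------------------


# Each row of the 'features' column holds one GeoJSON feature (a dict)
json_obj_df = pd.DataFrame(json_obj)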

data_list = []
for item in json_obj_df['features']:
    # Read from the current feature (item); indexing with [0] would repeat the first feature on every pass
    coordinates = item['geometry']['coordinates']

    props = item['properties']
    name = props['name']
    ids = props['id']
    ancient = props['ancient']
    country = props['country']
    types = props['type']
    numType = props['numType']
    precision = props['precision']

    data = coordinates, name, ids, ancient, country, types, numType, precision
    data_list.append(data)
# --------------------------

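# Name the columns so the table is readable once exported
columns = ['coordinates', 'name', 'id', 'ancient', 'country', 'type', 'numType', 'precision']
data_list_df = pd.DataFrame(data_list, columns=columns)
data_list_df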
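
GIS software can load a CSV as a point layer when longitude and latitude sit in separate columns (in QGIS, through the delimited text layer dialog). The sketch below shows one way to split the GeoJSON `coordinates` pair into two such columns. It assumes every feature is a GeoJSON Point, whose coordinates are ordered longitude then latitude. The sample feature and the helper name `feature_to_row` are illustrative, and only three of the property keys from the loop above are carried over.

```python
import csv
import io


def feature_to_row(feature):
    """Flatten one GeoJSON Point feature into a flat dict (hypothetical helper)."""
    lon, lat = feature["geometry"]["coordinates"][:2]  # GeoJSON order: lon, lat
    props = feature["properties"]
    return {
        "id": props.get("id"),
        "name": props.get("name"),
        "country": props.get("country"),
        "longitude": lon,
        "latitude": lat,
    }


# Made-up feature using the same keys as the loop above (not real API output)
sample_features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [12.5, 41.9]},
        "properties": {"id": 1, "name": "Sample place", "country": "IT"},
    }
]

rows = [feature_to_row(f) for f in sample_features]

# Write the rows as CSV text; open a real file instead of StringIO to save it
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

If you prefer to stay in pandas, `pd.DataFrame(rows)` builds the same table from the list of dictionaries.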


REST countries

REST Countries is a RESTful API that provides information about countries.


The code below is a real-world application in which the data is wrangled and then visualized with the Bokeh library.

# importing the modules
import json
import requests
from datetime import datetime

import pandas as pd

import pandas_bokeh # pip install pandas-bokeh
pandas_bokeh.output_notebook()

# Bokeh is a data visualization library that provides interactive charts and plots.
# Use this command to install Bokeh: pip install bokeh
from bokeh.plotting import figure, output_file, show


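# Get API content using the requests library...
# Request only the fields used below through the API's fields filter
response = requests.get('https://restcountries.com/v3.1/all?fields=name,population,region,continents,area,latlng')
response.raise_for_status()  # stop early on an HTTP error
data = json.loads(response.text)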


# Write API data to a text file as JSON so it can be parsed again later...
fname = datetime.today().strftime('%Y%b%d%H%M%S')
with open(f'{fname}.txt', 'w', encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)

# Write to CSV file...
df = pd.DataFrame(data)
df.to_csv(f'{fname}.csv', encoding="utf-8-sig", index=False)

# Read data back from the text file...
with open(f'{fname}.txt', 'r', encoding="utf-8") as f:
    txt_data = json.load(f)


# Prepare data for visualization using Bokeh....
common_name = [ x['name']['common'] for x in data ]
official_name = [ x['name']['official'] for x in data ]
population = [ x['population'] for x in data ]
region = [ x['region'] for x in data ]
continent = [ x['continents'][0] for x in data ]
area = [ x['area'] for x in data ]
latlng = [ x['latlng'] for x in data ]
lat_Y = [x[0] for x in latlng]
lng_X = [x[1] for x in latlng]


# Create a scatter plot of the countries' lat/long coordinates...
# create a new plot with a title and axis labels
p = figure(title="Countries Location", x_axis_label="Longitude", y_axis_label="Latitude")

# add a circle scatter renderer with additional arguments
# (scatter is used because newer Bokeh releases deprecate circle for sized markers)
p.scatter(
    lng_X,
    lat_Y,
    legend_label="Countries",
    fill_color="blue",
    fill_alpha=0.2,
    line_color="blue",
    size=8,
)


# show the results
show(p)
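
To take the same country data into QGIS rather than Bokeh, one option is to build a flat table with separate latitude and longitude columns and save it as CSV. The sketch below uses two placeholder records shaped like the fields read above (the values are not real API output) and drops rows without coordinates. The column names are my own choice.

```python
import io

import pandas as pd

# Placeholder records mimicking the structure used above (not real API output)
sample = [
    {"name": {"common": "Exampleland"}, "population": 1000, "region": "Test",
     "area": 50.0, "latlng": [10.0, 20.0]},
    {"name": {"common": "Nowhere"}, "population": 0, "region": "Test",
     "area": 1.0, "latlng": []},  # missing coordinates
]

rows = []
for country in sample:
    # Fall back to empty values when the coordinate pair is absent
    latlng = country.get("latlng") or [None, None]
    rows.append({
        "common_name": country["name"]["common"],
        "population": country["population"],
        "region": country["region"],
        "area": country["area"],
        "latitude": latlng[0],   # latlng is ordered latitude, longitude
        "longitude": latlng[1],
    })

# Keep only rows that can be plotted as points
countries_df = pd.DataFrame(rows).dropna(subset=["latitude", "longitude"])

# Write to an in-memory buffer; pass a file path instead to save a .csv
buffer = io.StringIO()
countries_df.to_csv(buffer, index=False)
print(buffer.getvalue())
```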

Happy coding...!
