Disclaimer: I originally submitted this article to DataCamp on Jan 27, 2018. Since they didn't publish it on the platform, I have decided to publish it here in the hope that someone out there will find it useful.
Download the original files in HTML and Jupyter Notebook formats
DataCamp Tutorial - Geocoding and Reverse Geocoding with Python
The increasing use of location-aware data, and of technologies that can give directions relative to a location and access geographically aware data, has given rise to a category of data scientists with strong knowledge of geospatial data: geo-data scientists.
In this tutorial, you will discover how to use Python to carry out geocoding tasks. Specifically, you will learn to use the GeoPy, Pandas and Folium Python libraries to complete them. Because this is a geocoding tutorial, the article covers more of GeoPy than Pandas. If you are not familiar with Pandas, you should definitely consider studying the Pandas Tutorial by Karlijn Willems; this Pandas cheat sheet will also be handy for your learning.
Tutorial Overview
- What is Geocoding?
- Geocoding with Python
- Putting it all together – Bulk Geocoding
- Accuracy of the Result
- Mapping Geocoding Result
- Conclusion
In [1]:
# Importing the necessary modules for this tutorial
# Folium library for visualizing data on an interactive map
# Pandas library for fast, flexible, and expressive data structures
# GeoPy library for access to several geocoding services
import folium
import pandas as pd
from geopy.geocoders import Nominatim, ArcGIS, GoogleV3 # Geocoder APIs
In [2]:
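# GeoPy 2.x requires a user_agent for Nominatim; the name used here is an arbitrary placeholder
g = Nominatim(user_agent="geocoding_tutorial") # You can try out the ArcGIS or GoogleV3 APIs to compare the results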
In [3]:
# Geocoding - Address to lat/long
n = g.geocode('Empire State Building New York', timeout=10) # Address to geocode
print(n.latitude, n.longitude)
In [4]:
# Reverse Geocoding - lat/long to Address
n = g.reverse((40.7484284, -73.9856546198733), timeout=10) # Lat, Long to reverse geocode
print(n.address)
In [5]:
# Create a dataframe from the copied table columns on the clipboard and display its first 10 records
df = pd.read_clipboard()
df.head(10)
Out[5]:
In [6]:
# Remove all characters except English letters, digits and whitespace
df['Name'] = df['Name'].str.replace(r'[^A-Za-z\s0-9]+', '', regex=True)
df.head(10)
Out[6]:
In [7]:
# Create a new column "Address_1" to hold the updated building names
df['Address_1'] = (df['Name'] + ', New York City')
df.head(10)
Out[7]:
In [8]:
add_list = [] # an empty list to hold the geocoded results
for add in df['Address_1']:
    print('Processing .... ', add)
    try:
        n = g.geocode(add, timeout=10)
        data = (add, n.latitude, n.longitude, n.address)
        add_list.append(data)
    except Exception:
        # geocode() returns None when nothing is found, so n.latitude raises; store placeholders
        data = (add, "None", "None", "None")
        add_list.append(data)
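Since the same loop is needed again later with a different geocoder, it can be wrapped in a small function. The following is only a sketch: `geocode_many` and `fake_geocode` are hypothetical names that are not part of GeoPy, and the stand-in geocoder returns made-up data so the example runs without a network connection.

```python
from types import SimpleNamespace


def geocode_many(addresses, geocode_fn):
    """Return (address, latitude, longitude, full address) tuples.

    geocode_fn is any callable shaped like a GeoPy geocoder's geocode method:
    it returns an object with latitude, longitude and address attributes,
    or None when nothing is found.
    """
    rows = []
    for address in addresses:
        try:
            n = geocode_fn(address)
        except Exception:
            n = None  # treat service errors like "not found"
        if n is None:
            # same "None" string placeholders as the loop in the tutorial
            rows.append((address, "None", "None", "None"))
        else:
            rows.append((address, n.latitude, n.longitude, n.address))
    return rows


# Stand-in geocoder with hypothetical data (not real API output)
def fake_geocode(address):
    known = {
        "Empire State Building, New York City": SimpleNamespace(
            latitude=40.7484,
            longitude=-73.9857,
            address="Empire State Building, NY, USA",
        )
    }
    return known.get(address)


rows = geocode_many(
    ["Empire State Building, New York City", "Nowhere Tower, New York City"],
    fake_geocode,
)
print(rows)
```

With a real geocoder object, the call would look something like `geocode_many(df['Address_1'], lambda a: g.geocode(a, timeout=10))`.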
In [9]:
# Make a new dataframe to hold the geocoded result
add_list_df = pd.DataFrame(add_list, columns=['Address_1', 'Latitude', 'Longitude', 'Full Address'])
add_list_df.head(10)
Out[9]:
In [10]:
# Split the records: rows where Latitude and Longitude are equal (both "None") were not geocoded
geocode_found = add_list_df.loc[add_list_df['Latitude'] != add_list_df['Longitude']]
geocode_not_found = add_list_df.loc[add_list_df['Latitude'] == add_list_df['Longitude']]
geocode_not_found
Out[10]:
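One rough way to judge the accuracy of a result is to measure how far apart two services place the same address. The sketch below uses the haversine formula for great-circle distance; `haversine_km` is a hypothetical helper, the Earth radius is the usual mean-value approximation, and the two coordinate pairs are illustrative values rather than real API output.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Two hypothetical results for the same building from two different services
result_a = (40.7484284, -73.9856546)
result_b = (40.7484000, -73.9857000)

gap_km = haversine_km(*result_a, *result_b)
print(f"The two results are {gap_km * 1000:.1f} m apart")
```

A small gap suggests the services agree on the location; a gap of several kilometres is a hint that at least one of them matched a different place.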
In [11]:
g = ArcGIS() # redefine the API object
In [12]:
add_list = [] # results for the addresses the first geocoder could not find
for add in geocode_not_found['Address_1']:
    print('Processing .... ', add)
    try:
        n = g.geocode(add, timeout=10)
        data = (add, n.latitude, n.longitude, n.address)
        add_list.append(data)
    except Exception:
        # still not found (or the service failed): store placeholders
        data = (add, "None", "None", "None")
        add_list.append(data)
In [13]:
add_list_df = pd.DataFrame(add_list, columns=['Address_1', 'Latitude', 'Longitude', 'Full Address'])
add_list_df.head(10)
Out[13]:
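The retry with a second service can also be expressed as a fallback chain that tries each geocoder in turn and stops at the first hit. This is a minimal sketch under stated assumptions: `geocode_with_fallback`, `primary` and `secondary` are hypothetical names, and the two stand-in functions replace real services so the example runs offline.

```python
from types import SimpleNamespace


def geocode_with_fallback(address, geocoders):
    """Try each (name, geocode_fn) pair in order; return (name, result) for the first hit.

    Returns (None, None) when every service fails or finds nothing.
    """
    for name, geocode_fn in geocoders:
        try:
            result = geocode_fn(address)
        except Exception:
            continue  # service error: move on to the next provider
        if result is not None:
            return name, result
    return None, None


# Offline stand-ins for two geocoding services (hypothetical behaviour)
def primary(address):
    return None  # pretends the first service found nothing


def secondary(address):
    return SimpleNamespace(latitude=40.7061, longitude=-74.0087, address=address)


provider, hit = geocode_with_fallback(
    "Some Building, New York City",
    [("primary", primary), ("secondary", secondary)],
)
print(provider, hit.latitude, hit.longitude)
```

With real geocoder objects, each entry could be something like `("Nominatim", lambda a: nominatim.geocode(a, timeout=10))`, where `nominatim` is whatever name the geocoder instance was given.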
In [14]:
# convert Full Address, Latitude and Longitude dataframe columns to list
full_address_list = list(geocode_found['Full Address'])
long_list = list(geocode_found["Longitude"])
lat_list = list(geocode_found["Latitude"])
# create folium map object
geocoded_map = folium.Map(location=[40.7484284, -73.9856546], zoom_start=13) # location=[Lat, Long]
# loop through the lists and create markers on the map object
for long, lat, address in zip(long_list, lat_list, full_address_list):
    geocoded_map.add_child(folium.Marker(location=[lat, long], popup=address))
    geocoded_map.add_child(folium.CircleMarker(location=[lat, long], popup=address, radius=5, color='green', fill_color='green', fill_opacity=.2))
# Display the map inline
geocoded_map
Out[14]: