Thursday, October 24, 2019

Generating Zip code map download links

Overview
For each US state zip code map listed on this page, we want to construct the map download link.

There are several ways to complete this task, including using the selenium or requests/BeautifulSoup modules. However, in this exercise we are going to keep it simple and assume that we know the pattern by which the map download links are constructed (indeed, the pattern is the same for all states and readily known), so we just need to generate the links from the file names.

The download link pattern is: URL + state_name + -zip-code-map.png
Examples:
https://www.unitedstateszipcodes.org/maps/alabama-zip-code-map.png
https://www.unitedstateszipcodes.org/maps/alaska-zip-code-map.png
https://www.unitedstateszipcodes.org/maps/arizona-zip-code-map.png
etc.

So, we can easily construct each state's download link from its name.
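
The pattern above can be sketched as a small helper. This is only an illustration and not part of the original script: the names BASE_URL, SUFFIX and build_map_link are hypothetical, and the sketch assumes the input is a plain state name such as 'Alabama' and that spaces in multi-word names become '-' (as the script further down does).

```python
# Base URL and suffix taken from the example links above.
BASE_URL = "https://www.unitedstateszipcodes.org/maps/"
SUFFIX = "-zip-code-map.png"


def build_map_link(state_name):
    """Return the map download link for a plain state name (hypothetical helper)."""
    # Lowercase the name and replace spaces with '-' (assumed for multi-word names).
    slug = state_name.strip().lower().replace(" ", "-")
    return BASE_URL + slug + SUFFIX


for state in ["Alabama", "Alaska", "Arizona"]:
    print(build_map_link(state))
```

The three printed links match the three examples listed above.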



Objectives
At the end of this tutorial, you should be able to:
a) Use string concatenation
b) Use the string split(), replace() and lower() methods
c) Convert a pandas Series to a list using the tolist() method
d) Use a for loop to append strings to an empty list



Pseudocode
a) Read the spreadsheet containing the file names into a list
b) Clean the names by removing the unwanted part that starts at ' ('
c) Replace spaces with '-', convert to lowercase, and concatenate the strings to form the URLs



Code Snippet

import pandas as pd

# Read the spreadsheet file...
zip_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\GIS Data Processing Scripts\US_ZipMap_Size.csv")

# Convert the column to a list...
zip_list = zip_df['Maps'].tolist()


# ----------------------
# For each item in the list, split at ' (' and keep the first part...
name_list = []
for item in zip_list:
    name_list.append(item.split(' (')[0])


# ----------------------
# Download link is: 'https://www.unitedstateszipcodes.org/maps/' + cleaned name (spaces replaced by '-', lowercased) + '.png'
download_link = []
for name in name_list:
    download_link.append('https://www.unitedstateszipcodes.org/maps/' + name.replace(' ', '-').lower() + '.png')

download_link




Explanation

Step 1: First, we import the pandas module, read the spreadsheet file into a DataFrame, and convert the 'Maps' column to a list.

import pandas as pd

# Read the spreadsheet file...
zip_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\GIS Data Processing Scripts\US_ZipMap_Size.csv")

# Convert the column to a list...
zip_list = zip_df['Maps'].tolist()
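
If you do not have the spreadsheet at hand, Step 1 can be tried on an in-memory CSV. This is a minimal sketch: the two rows in csv_text are made-up stand-ins for the real file contents, and only the 'Maps' column name comes from the script above.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for US_ZipMap_Size.csv.
csv_text = "Maps\nAlabama Zip Code Map (2.1 MB)\nAlaska Zip Code Map (1.8 MB)\n"

# read_csv accepts a file-like object as well as a file path.
zip_df = pd.read_csv(io.StringIO(csv_text))

# tolist() turns the pandas Series into a plain Python list.
zip_list = zip_df["Maps"].tolist()
print(type(zip_list).__name__, zip_list)
```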

Step 2: Next, we need to split each string and keep the useful part. The part needed is the text before ' ('. Note that this separator is a space followed by an opening parenthesis.

# ----------------------
# For each item in the list, split at ' (' and keep the first part...
name_list = []
for item in zip_list:
    name_list.append(item.split(' (')[0])
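
To see what the split does to a single entry, here is a small sketch. The sample strings are invented for illustration, since the real values of the 'Maps' column are not shown in this post.

```python
# Hypothetical sample entries; the real values come from the 'Maps' column.
samples = [
    "Alabama Zip Code Map (2.1 MB)",
    "New York Zip Code Map (3.4 MB)",
    "No Size Here",
]

for item in samples:
    parts = item.split(" (")  # split at the space + opening parenthesis
    print(parts)              # a list of the pieces around ' ('

# Keeping index 0 gives the text before ' ('.
# If ' (' is absent, split() returns the whole string as the only piece.
cleaned = [item.split(" (")[0] for item in samples]
print(cleaned)
```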

Step 3: The last step is to replace the spaces within each string with '-', convert the string to lowercase, and concatenate it with the fixed parts of the URL. The first part of the URL is: 'https://www.unitedstateszipcodes.org/maps/' while the last part is: '.png'

download_link = []
for name in name_list:
    download_link.append('https://www.unitedstateszipcodes.org/maps/' + name.replace(' ', '-').lower() + '.png')
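
The same loop can also be written as a list comprehension. This is an optional alternative, not the form used in the script above; the two entries in name_list are hypothetical stand-ins for the cleaned names from Step 2, and BASE_URL is just a convenience name.

```python
BASE_URL = "https://www.unitedstateszipcodes.org/maps/"

# Hypothetical cleaned names standing in for name_list from Step 2.
name_list = ["Alabama Zip Code Map", "New York Zip Code Map"]

# One expression per item: replace spaces, lowercase, then concatenate.
download_link = [
    BASE_URL + name.replace(" ", "-").lower() + ".png"
    for name in name_list
]

for link in download_link:
    print(link)
```

Both forms build the same list; the for loop with append() is easier to step through, while the comprehension is more compact.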



Assignment Takeaway
An exercise to help you learn further: write a script that extends the one above by adding the resulting list as a new column next to the corresponding file names, then saves the result to a spreadsheet file.
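
If you get stuck, one possible starting point is sketched below. It is not the only solution: the column name 'Download_Link' and the sample rows are hypothetical, and the result is written to an in-memory buffer so the sketch runs without touching any files.

```python
import io

import pandas as pd

# Hypothetical data standing in for the 'Maps' column of the spreadsheet.
zip_df = pd.DataFrame(
    {"Maps": ["Alabama Zip Code Map (2.1 MB)", "Alaska Zip Code Map (1.8 MB)"]}
)

base_url = "https://www.unitedstateszipcodes.org/maps/"

# Build the links in the same order as the rows of the DataFrame.
download_link = [
    base_url + item.split(" (")[0].replace(" ", "-").lower() + ".png"
    for item in zip_df["Maps"]
]

# Assigning a list of the same length creates the new column.
zip_df["Download_Link"] = download_link

# to_csv accepts a file path or a file-like object.
buffer = io.StringIO()
zip_df.to_csv(buffer, index=False)
csv_output = buffer.getvalue()
print(csv_output)
```

To save to disk instead, pass a file path of your choice to to_csv() in place of the buffer.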




Reference Material

[1] https://www.programiz.com/python-programming/methods/string/split
[2] https://www.w3schools.com/python/ref_string_split.asp
[3] https://www.programiz.com/python-programming/methods/string/replace
