Thursday, October 24, 2019

Generating Zip code map download links

For each of the US states zip code map listed on this page, we want to construct the map download link.

There several ways to get this task completed including using selenium or request/beatifulsoup modules. However, in this exercise, we are going to keep it simple and assume that we know the pattern at which the maps download links are made-up (indeed the pattern is same for all states and readily known) and we just need to generate them based on the file names.

The download link pattern is: URL + state_name + -zip-code-map.png

So, we can easily construct each state's download link from their respective names...

At the end of this tutorial, you should become familiar with:-
a) To become familiar with using string concatenation
b) To become familiar with using string split(), replace() and lower() methods
c) To become familiar with convert pandas series to list using tolist() method
d) To be able to use for loop to append string to empty list

a) Read the spreadsheet containing the file names into a list
b) Clean the names to remove unwanted characters such as ' ('
c) Concatenate the strings to form the URLs

Code Snippet

import pandas as pd

# Read the spreadsheet file...
zip_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\GIS Data Processing Scripts\US_ZipMap_Size.csv")

# Convert the column to a list...
zip_list = zip_df['Maps'].tolist()

# ----------------------
# For each item in the list, split at ' (' and keep the first part...
name_list = []
for item in zip_list:
    name_list.append(item.split(' (')[0])

# ----------------------
# Download link URL is: '' + stateName + '-zip-code-map.png'
download_link = []
for name in name_list:
    download_link.append('' + name.replace(' ', '-').lower() + '.png')



Step 1: First we import pandas module and read the spreadsheet file into a dataframe.

import pandas as pd

# Read the spreadsheet file...
zip_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\GIS Data Processing Scripts\US_ZipMap_Size.csv")

# Convert the column to a list...
zip_list = zip_df['Maps'].tolist()

Step 2: Next, we need to split the sting and keep the useful part. The part needed is that before the ' (' character. Note that the character has a space followed by the open brace/parentheses.

# ----------------------
# For each item in the list, split at ' (' and keep the first part...
name_list = []
for item in zip_list:
    name_list.append(item.split(' (')[0])

Step 3: The last step is to replace spaces within the string by '-' and concatenate the url string to the variable string. The first part of the string is: '' while the end part of the string is: '.png'

download_link = []
for name in name_list:
    download_link.append('' + name.replace(' ', '-').lower() + '.png')

Assignment Takeaway
An exercise to help you learn further is: Write a script that will extend the above script by adding the resulting list to a new column that corresponds to the file names as seen below, then save the result to spreadsheet file. The result will look like this:-

Reference Material


Monday, October 21, 2019

Building SQL Expressions in ArcGIS and QGIS

In the modern GIS industry, programming skill is an essential asset and one of the programming languages the is very popular within the industry is Structure Query Language (SQL) as you will later find out how it is been used in few moments.

Most of the query expressions used in ArcGIS or QGIS desktop software are derived from SQL. SQL is a standard language for storing, manipulating and retrieving data in databases.

Both ArcGIS and QGIS support the following common database engines: SQLite, MySQL, SQL Server, MS Access, Oracle, Sybase, Informix, PostgreSQL, and other database systems. When connected to any of them, you can take full advantage of SQL in GIS.

For small GIS projects where database isn't required, we make use of traditional GIS files type such as "Shapefile" which allow us to use query expressions that adhere to standard SQL expressions.

What is an expressions?
An expressions is a combination of "Constants, Variables, Operators and Functions" organized in an ordered statement that returns an output value. Expressions are unique to the computer language they are created in. An example of SQL expressions is: "SELECT * FROM <Layer_name> WHERE <Field_name> <Operator> <Value or String>".

If you data resides in a shapefiles or any of these (coverages, INFO tables, and dBASE tables), this part of the SQL expression (SELECT * FROM <Layer_name> WHERE) is automatically supplied for you, so you only provide this part (<Field_name> <Operator> <Value or String>) to query your data.

Since GIS data is made up of Spatial and Attribute, it is worth noting that 'Attribute Query' is similar to the standard SQL queries found in traditional database systems and this will be our focus in this article. On the other hand, 'Spatial Query' which allow operations such as "Contains, Equals, Intersects, Is Disjoint, Overlaps, Touches, Within and Crosses" requires some extension installed on traditional database systems to make them work.

Building SQL Expressions for Attribute Query

There are many places you can build expressions within both ArcGIS and QGIS software, some of the common places are listed below:-

ArcGIS Tools:
~ Select By Attribute
~ Definition Query
~ Field Calculator
~ Label Expression Dialog Box
~ Add Query Layer

QGIS Tools:
~ Select By Expression
~ Filter Query Builder
~ Field Calculator
~ Label Expression Dialog Box
~ Layer Property Display
~ Database Manager

Wednesday, October 9, 2019

Calculating the total size of zip code maps

On this US printable zip codes maps page, there is a list of all the US states zip code maps with their respective sizes in braces like this "Alabama ZIP Code Map (3.59MB)" as seen below...

Lets calculate the total size of all the maps using python scripting!

Off course, there are several or even better ways to get this done. But here we want to test our python skills on this, let us stick to using python 😏.

Some other reasons it is good idea we use python is that we can easily use our python skill to:-
1) make HTTP request to scrape/download the map data
2) generate the download links on the fly
3) create a bot to monitor change in map size (which could indicate the map has been updated).
4) visualization of the string including map/geographic visualization.

The list can go on and on, but I will keep it simple here to just calculate the total sum the map sizes.

Step 1:
First thing is to get the string/text off the web page into our python environment. There are several ways to do this as I have mentioned above, but I will just select, copy and paste it in a CSV file as seen below.

Step 2:
Read the CSV file in python. Here I will use the pandas module to read the CSV file, could have also used the CSV module to do this.

Monday, October 7, 2019

Filtering Missing Zip codes out of master Zip codes list

Here I have a list of zip codes, I want to know the missing zip code from the given list (these are the postal code in Texas, USA).

List 'available_zipcodes' contains the master zip codes and list 'given_zipcodes' contain the provided or working zip codes. Now I want check and filter out those zipcode that are NOT in the master zip codes.

These three lines of python code below will do it. It uses the 'for' loop with and 'if' statement. Basically, we loop through the list of 'given_zipcodes' and if it is not in the 'available_zipcodes', then we print it out.

If you care to run the script and don't want to type all that out, here below is the Code is...

available_zipcodes = [77389, 77086, 77346, 77018, 77040, 77388, 77065, 77080, 77041, 77396, 77385, 77354, 77382, 77067, 77066, 77090, 77345, 77355, 77373, 77339, 77043, 77302, 77304, 77070, 77375, 77095, 77433, 77069, 77038, 77091, 77380, 77092, 77316, 77429, 77377, 77379, 77064, 77088, 77338, 77449, 77386, 77381, 77493, 77356, 77068, 77014, 77084, 77055, 77301, 77303, 77384]

given_zipcodes = [77325, 77339, 77345, 77346, 77380, 77381, 77382, 77383, 77384, 77385, 77386, 77301, 77302, 77303, 77304, 77316, 77354, 77356, 77389, 77014, 77018, 77038, 77040, 77041, 77043, 77055, 77064, 77065, 77066, 77067, 77068, 77069, 77070, 77080, 77084, 77086, 77088, 77090, 77091, 77092, 77095, 77375, 77377, 77379, 77388, 77429, 77433, 77449, 77493, 77373, 77338, 77347, 77391, 77396, 77355]

for zipcode in given_zipcodes:
    if zipcode not in available_zipcodes:

In the case above, the missing zip codes are: 77325, 77383, 77347, 77391

Note: In a production job. these zip codes will probably come in a text file, just read the file into python lists and loop through as seen above.

That is it!