This post was inspired by John Nelson's YouTube video "How to Make This Drought Map Pt 1: DATA WRANGLING", where he manually wrangled the dataset for the year 2018.
What he did is great if you are only doing it for a single year. If you intend to repeat the workflow for several years, however, the process becomes time-consuming and prone to mistakes. For this reason, I will recreate the workflow using Python scripting so that the whole process can be automated with a few button clicks.
More specifically, I will cover the following processes:
- Download and extract the zip folders
- Combine the shapefiles into a single folder
- Merge the shapefiles into a single shapefile
Let's get started.
1) Construct and download the zip files
First, we need to download the dataset for all previous years. Let's use Python to generate the zip folder download links for all the years.
# Construct a list of drought monitor shp download links for several years...
dm_url_list = []
for x in range(0, 23):
    year = f'20{x:02d}'  # zero-pad single-digit years, e.g. 0 -> '2000'
    base_url = f'https://droughtmonitor.unl.edu/data/shapefiles_m//{year}_USDM_M.zip'
    dm_url_list.append(base_url)

print(dm_url_list)
['https://droughtmonitor.unl.edu/data/shapefiles_m//2000_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2001_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2002_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2003_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2004_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2005_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2006_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2007_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2008_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2009_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2010_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2011_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2012_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2013_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2014_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2015_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2016_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2017_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2018_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2019_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2020_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2021_USDM_M.zip', 'https://droughtmonitor.unl.edu/data/shapefiles_m//2022_USDM_M.zip']
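If you'd rather not hard-code the upper bound of the range, the end year can be derived at run time so new years are picked up automatically. Here is a small sketch using the standard-library datetime module:
# Build the year range dynamically instead of hard-coding range(0, 23)
import datetime

current_year = datetime.date.today().year
dm_url_list = [
    f'https://droughtmonitor.unl.edu/data/shapefiles_m//{year}_USDM_M.zip'
    for year in range(2000, current_year + 1)
]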
There are many ways to download files from a URL in Python; however, simply using the built-in Python module called webbrowser will do what we intend here, as seen below.
import webbrowser

# dm_url_list is the list of download links constructed above
for url in dm_url_list:
    webbrowser.open(url, new=2)  # new=2 opens each link in a new browser tab
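If you prefer to download the files without opening a browser at all, the popular third-party requests library can fetch them directly. This is a minimal sketch, assuming requests is installed (pip install requests) and that the script is run from the folder where you want the zip files saved:
# Download each zip file directly to the current working directory
import requests

for url in dm_url_list:
    response = requests.get(url, timeout=60)
    response.raise_for_status()  # stop early if a link is broken
    filename = url.split('/')[-1]  # e.g. '2000_USDM_M.zip'
    with open(filename, 'wb') as f:
        f.write(response.content)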
With the files downloaded, the next step is to extract the zip files.
2) Combine the shapefiles into a single folder
Let's extract the zip files into a single folder. We will first extract the content of the parent/main zip file before extracting the sub zip files.
# Extracting a year's main zip file
import os
from zipfile import ZipFile

# specifying the zip file name
zipfile_name = r"C:\Users\Yusuf_08039508010\Desktop\Working_Files\...\DataWrangling - U.S. Drought Monitor\2019_USDM_M.zip"

# opening the zip file in READ mode
with ZipFile(zipfile_name, 'r') as zip:
    # zip.printdir()  # printing all the contents of the zip file

    # derive a folder named after the zip file, next to it on disk
    folder_name = os.path.basename(zipfile_name).split('.')[0]
    folder_path = os.path.dirname(zipfile_name)
    complete_path = f'{folder_path}\\{folder_name}'

    # Make directory
    os.makedirs(complete_path, exist_ok=True)

    # Extract the content of the zip file into complete_path
    zip.extractall(path=complete_path)

print('Done!')
# Extracting a year's sub zip files
year_folder = r"C:\Users\Yusuf_08039508010\Desktop\Working_Files\...\DataWrangling - U.S. Drought Monitor\2017_USDM_M"
shp_folder = year_folder + '\\' + year_folder.split('\\')[-1] + 'SHP'

# Make directory
os.makedirs(shp_folder, exist_ok=True)

for dirpath, subdirs, files in os.walk(year_folder):
    for f in files:
        if f.endswith(".zip"):
            # use dirpath so zips inside subfolders are opened from their real location
            with ZipFile(os.path.join(dirpath, f), 'r') as zip:
                zip.extractall(path=shp_folder)

print('Done!')
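Since the point is to repeat this for every year, both extraction steps can be wrapped in a single loop over all the downloaded yearly zips. The sketch below assumes the yearly zip files all sit in one download folder (the download_folder path is a placeholder):
# Batch version: extract every yearly zip, then its weekly sub zips
import glob
import os
from zipfile import ZipFile

download_folder = r"C:\path\to\downloads"  # placeholder: wherever the yearly zips were saved

for zipfile_name in glob.glob(os.path.join(download_folder, '*_USDM_M.zip')):
    # extract the main zip into a folder named after it, e.g. ...\2017_USDM_M
    year_folder = zipfile_name.rsplit('.', 1)[0]
    os.makedirs(year_folder, exist_ok=True)
    with ZipFile(zipfile_name, 'r') as z:
        z.extractall(path=year_folder)

    # extract the weekly sub zips into e.g. ...\2017_USDM_M\2017_USDM_MSHP
    shp_folder = os.path.join(year_folder, os.path.basename(year_folder) + 'SHP')
    os.makedirs(shp_folder, exist_ok=True)
    for dirpath, subdirs, files in os.walk(year_folder):
        for f in files:
            if f.endswith('.zip'):
                with ZipFile(os.path.join(dirpath, f), 'r') as z:
                    z.extractall(path=shp_folder)

print('Done!')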
3) Merge the shapefiles into a single shapefile
There are many ways to accomplish this using Python. One easy way is by running a processing algorithm in the QGIS Python console.
Note that the merge we will perform here is on polygon shapefiles that share the same attribute fields and shape type (a quick way to verify this is sketched below).
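Before merging, it can be worth confirming that every shapefile really does carry the same fields. This is an optional sketch using geopandas (introduced properly further down); the folder path is a placeholder:
# Optional sanity check: confirm all shapefiles share the same columns
import glob
import geopandas as gpd

shp_files = glob.glob(r'C:\path\to\2017_USDM_MSHP\*.shp')  # placeholder path
reference_columns = list(gpd.read_file(shp_files[0]).columns)
for shp in shp_files[1:]:
    if list(gpd.read_file(shp).columns) != reference_columns:
        print(f'Field mismatch in: {shp}')
With that confirmed, here is the QGIS merge: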
import glob
import processing

# Alternative using the os module:
# pa = 'C:/Users/Yusuf_08039508010/Desktop/.../DataWrangling - U.S. Drought Monitor/2017_USDM_M/2017_USDM_MSHP'
# shp_files = [pa + '/' + x for x in os.listdir(pa) if x.endswith('.shp')]

shp_folder = r'C:\Users\Yusuf_08039508010\Desktop\...\DataWrangling - U.S. Drought Monitor\2017_USDM_M\2017_USDM_MSHP'
shp_files = glob.glob(f'{shp_folder}\\*.shp')

parameters = {
    'LAYERS': shp_files,
    'CRS': None,
    'OUTPUT': 'C:/Users/Yusuf_08039508010/Desktop/Working_Files/.../DataWrangling - U.S. Drought Monitor/2017_USDM_M/2017_USDM_MSHP/Merge_USDM.shp'
}
processing.runAndLoadResults("native:mergevectorlayers", parameters)
print('Done....')
We can also use geopandas to merge the shapefiles, as follows:
import glob
import pandas as pd  # needed for pd.concat below
import geopandas as gpd

shp_folder = r'C:\Users\Yusuf_08039508010\Desktop\Working_Files\Fiverr\2021\012-December\DataWrangling - U.S. Drought Monitor\2017_USDM_M\2017_USDM_MSHP'
shp_files = glob.glob(f'{shp_folder}\\*.shp')

shp_gdf_list = []
for shp in shp_files:
    shp_gdf = gpd.read_file(shp)
    shp_gdf_list.append(shp_gdf)

# Merge the shapefiles by concatenating them together...
merge_shp = gpd.GeoDataFrame(pd.concat(shp_gdf_list, ignore_index=True))

# Save to shp...
merge_shp.to_file('merge_shp_from_Geopandas.shp')
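Since every year folder follows the same naming pattern, the geopandas merge can also be looped over all the year folders in one go. A sketch, assuming the parent folder contains the *_USDM_MSHP folders created in step 2 (the parent_folder path is a placeholder):
# Batch merge: produce one merged shapefile per year folder
import glob
import os
import pandas as pd
import geopandas as gpd

parent_folder = r'C:\path\to\DataWrangling - U.S. Drought Monitor'  # placeholder

for shp_folder in glob.glob(os.path.join(parent_folder, '*_USDM_M', '*_USDM_MSHP')):
    shp_files = glob.glob(os.path.join(shp_folder, '*.shp'))
    gdf_list = [gpd.read_file(shp) for shp in shp_files]
    merged = gpd.GeoDataFrame(pd.concat(gdf_list, ignore_index=True))
    year = os.path.basename(shp_folder)[:4]  # e.g. '2017' from '2017_USDM_MSHP'
    merged.to_file(os.path.join(shp_folder, f'Merge_USDM_{year}.shp'))
    print(f'Merged {year}')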
Conclusion
In this article, we have seen how to use Python to wrangle the U.S. Drought Monitor data. We now have an automated workflow that we can replicate to make the process faster.
That is it!