Monday, April 3, 2023

Wrangling and Mapping ‘list of colleges teaching MBBS in India’ with Python

 In this post, I will demonstrate how to wrangle and map the ‘list of colleges teaching MBBS in India’. The list is available on this webpage.

First, I will collect the dataset, clean it into a GIS-friendly format, geocode the colleges, and then plot them on a map of India sourced from GADM. This is a project you could complete entirely without coding, using tools like QGIS or ArcGIS, but here I will use Python code to do it from scratch.


Data collection

Let's collect the list from the web page into a CSV file. There are several ways to do this in Python, such as the requests, selenium, beautifulsoup, or scrapy modules.

In this case, I will copy the HTML element that represents the table into a local file, then extract the table from that local HTML file using Python.
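As a rough illustration of what this extraction step involves, here is a minimal sketch that uses only Python's standard library. It is not the method used in this post (pandas does the work further down, and far more robustly). The class `TableRows`, the function `html_table_to_csv`, and the sample table contents are all hypothetical names and data made up for the example.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row being built, or None when outside a <tr>
        self._cell = None  # text pieces of the current cell, or None

    def _flush_cell(self):
        # Close the current cell, collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing closing tag
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html_text):
    """Return the table rows found in html_text as CSV text."""
    parser = TableRows()
    parser.feed(html_text)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical sample standing in for the saved HTML file.
sample = """
<table>
  <tr><th>Sl. No.</th><th>College</th><th>State</th></tr>
  <tr><td>1</td><td>Example Medical College, City</td><td>Example State</td></tr>
</table>
"""
print(html_table_to_csv(sample))
```

This sketch ignores nested tables, merged cells, and multiple tables per page, which is one reason a dedicated reader such as pandas is the more practical choice for real pages.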

Using the pandas library, a few lines of code get the list into a CSV file, as seen below:

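import pandas as pd

# read_html returns a list of DataFrames, one per <table> found in the file
mbbs_df = pd.read_html(r"mbbs.html")[0]
# Save the first table as CSV, without the index column
mbbs_df.to_csv(r"mbbs.csv", index=False)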

mbbs_df