Tuesday, December 29, 2020

Count Numbers of Rows from bulk excel files

Here I got many excel files in a folder and I wanted to know the number of rows in each file.

import glob
import pandas as pd
files = glob.glob("C:\\Users\\Yusuf_08039508010\\Desktop\\LinkedIn Profile Scrapping\\Old Project\\Second Project\\New folder\\*.xlsx")

# Get file names
file_names = [os.path.basename(x) for x in files]
# ---------------------

count_list = []
i = 0

for f in files:
    df = pd.read_excel(f)
    # count rows based on column index...
    rows_count = df.count()[0] # this will not count NaN

    data = file_names[i], rows_count # Combine filename and row count into a dataframe
    
    count_list.append(data)
    
    i = i+1

df_count = pd.DataFrame(count_list, columns=['File', 'Count'])

df_count

 This will return a table with each files name and corresponding number of rows in the file as seen below.


Enjoy!

Thursday, December 10, 2020

Python - Separating photos into subfolders

Read Images in a Folder and Move to Sub folder Based on given Condition

Few weeks ago on this blog post, we downloaded pictures of all the US presidents. On this post, we want to separate the photos into subfolders depending on whether the president is from 'Democratic' or 'Republican'. 

If this sounds interesting to you, then keep reading!

Tuesday, December 1, 2020

Google Apps Script (GAS) - Make GoogleForms from GoogleSheets

On this post, I will demonstrate how tom make a GoogleForms from GoogleSheets using Google Apps Script (GAS). There are several advantages of using GAS to automate your form creation over manual procedure.

GAS is a Googles owned scripting language that is flavor of a modern JavaScript, so if you already know javascript, then you should be confortable with GAS. With GAS, you can do more with Google apps — like GoogleCalendar, GoogleDocs, GoogleDrive, GoogleGmail, GoogleSheets, and GoogleSlides.

The form we will make is a multi choice question, and the obvious reason why we will use GAS to make it is because it has a lot questions to be added. Hence, it makes sense to prepare the questions and there option in a GoogleSheets, then generate the form from there. That is the approach we are going to take.

Create a GoogleSheet where the first column A is the question and the columns B, C and D are the answer options (only one option is the correct answer) as seen below.


Friday, November 20, 2020

Pandas Dataframe to JSON

 Pandas has a function (.to_json()) that takes a series/dataframe and converts it into JSON object.

It has an interesting parameter (orient) that allow different orientations of the JSON string format. Allowed values are: {'split','records','index','columns','values','table'}. Let see how it works.

Assuming we have the table below in out dataframe and now we want it as a JSON data for use on another web platform (JSON work better on web than tabular data).



As usual, we read the data table into a dataframe variable (here it is called: json_df).


Then the .to_json() function is called on the dataframe to convert it to JSON like this:-

json_df.to_json()

There are many parameters we can pass into the function such as path to save the JSON and orient to orient the JSON in different formats. Below, we will explore the various orient options available.


As you can see, each orient value has its own different JSON format or structure.

print(json_df.to_json(orient='columns'))
output = {"Question":{"0":"In what country is the city 'Tokyo'?","1":"In what country is the city 'Delhi'?","2":"In what country is the city 'Cairo'?","3":"In what country is the city 'Chongqing'?","4":"In what country is the city 'Orlando'?","5":"In what country is the city 'Abuja'?"},"Option 1":{"0":"Japan","1":"India","2":"Argentina","3":"Philippines","4":"Mexico","5":"Spain"},"Option 2":{"0":"India","1":"United States","2":"Bangladesh","3":"Pakistan","4":"Angola","5":"Nigeria"},"Option 3":{"0":"Brazil","1":"China","2":"Egypt","3":"China","4":"United States","5":"Canada"}}


print(json_df.to_json(orient='split'))
output = {"columns":["Question","Option 1","Option 2","Option 3"],"index":[0,1,2,3,4,5],"data":[["In what country is the city 'Tokyo'?","Japan","India","Brazil"],["In what country is the city 'Delhi'?","India","United States","China"],["In what country is the city 'Cairo'?","Argentina","Bangladesh","Egypt"],["In what country is the city 'Chongqing'?","Philippines","Pakistan","China"],["In what country is the city 'Orlando'?","Mexico","Angola","United States"],["In what country is the city 'Abuja'?","Spain","Nigeria","Canada"]]}


print(json_df.to_json(orient='records'))
output = [{"Question":"In what country is the city 'Tokyo'?","Option 1":"Japan","Option 2":"India","Option 3":"Brazil"},{"Question":"In what country is the city 'Delhi'?","Option 1":"India","Option 2":"United States","Option 3":"China"},{"Question":"In what country is the city 'Cairo'?","Option 1":"Argentina","Option 2":"Bangladesh","Option 3":"Egypt"},{"Question":"In what country is the city 'Chongqing'?","Option 1":"Philippines","Option 2":"Pakistan","Option 3":"China"},{"Question":"In what country is the city 'Orlando'?","Option 1":"Mexico","Option 2":"Angola","Option 3":"United States"},{"Question":"In what country is the city 'Abuja'?","Option 1":"Spain","Option 2":"Nigeria","Option 3":"Canada"}]


print(json_df.to_json(orient='values'))
output = [["In what country is the city 'Tokyo'?","Japan","India","Brazil"],["In what country is the city 'Delhi'?","India","United States","China"],["In what country is the city 'Cairo'?","Argentina","Bangladesh","Egypt"],["In what country is the city 'Chongqing'?","Philippines","Pakistan","China"],["In what country is the city 'Orlando'?","Mexico","Angola","United States"],["In what country is the city 'Abuja'?","Spain","Nigeria","Canada"]]


print(json_df.to_json(orient='table'))
output = {"schema": {"fields":[{"name":"index","type":"integer"},{"name":"Question","type":"string"},{"name":"Option 1","type":"string"},{"name":"Option 2","type":"string"},{"name":"Option 3","type":"string"}],"primaryKey":["index"],"pandas_version":"0.20.0"}, "data": [{"index":0,"Question":"In what country is the city 'Tokyo'?","Option 1":"Japan","Option 2":"India","Option 3":"Brazil"},{"index":1,"Question":"In what country is the city 'Delhi'?","Option 1":"India","Option 2":"United States","Option 3":"China"},{"index":2,"Question":"In what country is the city 'Cairo'?","Option 1":"Argentina","Option 2":"Bangladesh","Option 3":"Egypt"},{"index":3,"Question":"In what country is the city 'Chongqing'?","Option 1":"Philippines","Option 2":"Pakistan","Option 3":"China"},{"index":4,"Question":"In what country is the city 'Orlando'?","Option 1":"Mexico","Option 2":"Angola","Option 3":"United States"},{"index":5,"Question":"In what country is the city 'Abuja'?","Option 1":"Spain","Option 2":"Nigeria","Option 3":"Canada"}]}


print(json_df.to_json(orient='index'))
output = {"0":{"Question":"In what country is the city 'Tokyo'?","Option 1":"Japan","Option 2":"India","Option 3":"Brazil"},"1":{"Question":"In what country is the city 'Delhi'?","Option 1":"India","Option 2":"United States","Option 3":"China"},"2":{"Question":"In what country is the city 'Cairo'?","Option 1":"Argentina","Option 2":"Bangladesh","Option 3":"Egypt"},"3":{"Question":"In what country is the city 'Chongqing'?","Option 1":"Philippines","Option 2":"Pakistan","Option 3":"China"},"4":{"Question":"In what country is the city 'Orlando'?","Option 1":"Mexico","Option 2":"Angola","Option 3":"United States"},"5":{"Question":"In what country is the city 'Abuja'?","Option 1":"Spain","Option 2":"Nigeria","Option 3":"Canada"}}


The structure you will adopt will depend on where you want to use it. Usually, you would find sample JSON structure from the platform you intend to use. Compare the option and adopt the one that suit your project.

That is it!

Thursday, November 12, 2020

Pandas dataframe.append() function - Append 'Jeo Biden' to US presidents dataframe

Here we got a dataframe of 45 USA presidents with four columns as seen below.


The dataframe is this wikipedia table that lists the presidents of the United States. As you already saw, the wikipedia table  has been updated. So, we need to update our dataframe with 46th president.



Now we got a new president-elect (Joseph Robinette Biden Jr.), we will create a new dataframe for him and use the append() function to add it to the end of our dataframe.

To create the Biden's dataframe, we store his data in a dictionary and use it to make a one row dataframe to be appended to the main dataframe above.
biden_data = {
            'Name':['Joseph Robinette Biden Jr.'], 
              'Image':['https://upload.wikimedia.org/wikipedia/commons/9/99/Joe_Biden_official_portrait_2013_cropped_%28cropped%29.jpg'], 
              'Party':['Democratic'], 
              'Presidency Period':['20-Jan-21 – Incumbent']
             }

biden_df = pd.DataFrame(biden_data)
biden_df

Call the append function on the USA presidents dataframe like this:- You'll notice that the index is '0' for the total line. We want to change that from '0' to '46' using rename.

us_presidents_df.append(biden_df).rename(index={0:"45"})


That is it!

Mass make folders from list of states

 I have written on this before at How to make Multiple Folders/Directories in Python. Refer to that page for more detailed instructions.

Anyway, here I have a list of states in Nigeria that I want to generate folder for each.

import os
states = ['Ebonyi', 'Edo', 'Ekiti', 'Enugu', 'Fct', 'Gombe', 'Imo', 'Jigawa', 'Kaduna', 'Katsina', 'Kebbi', 'Kogi', 'Kwara', 'Lagos', 'Nasarawa', 'Niger', 'Ogun', 'Ondo', 'Osun', 'Akwa Ibom', 'Oyo', 'Plateau', 'Rivers', 'Sokoto', 'Taraba', 'Yobe', 'Zamfara', 'Anambra', 'Bauchi', 'Bayelsa', 'Benue', 'Borno', 'Cross River', 'Delta']

for n in states: # Looping over the individual state to make folder name
    if type(n) == str:
        print ('Making directory for.... ', n)

        directory = n + '__state'
        if not os.path.exists(directory):
            os.makedirs(directory)

The resulting output is seen below:-

Enjoy!

Tuesday, November 10, 2020

Format number to thousands string ₦ currency using Python f-string

 Here we have list of digits representing prices of items in Nigeria Naira (₦) currency like this:-

money_naira = [2495, 93988, 39118, 19973, 39579, 35723, 80216, 56725, 16132, 82275, 18439, 34919, 17117, 85879, 51153, 7737, 35367, 9753, 86648, 87650, 58011, 2219, 1768, 8612, 2901, 5041, 3405, 8486, 7742, 5008, 7150, 5553, 9320, 2736, 9151, 9894, 2812, 6466, 1194, 4322, 6696, 6144, 6227, 2479, 3027, 4052, 7580, 1736, 9979, 1638, 2369, 8702, 1353, 9695, 4072, 4065, 7742, 7887, 7620]
But there is a problem! These number doesn't reads well by humans, it is just a numeric digit that can mean anything. So, we have to represent it in such a way that readers can tell they are actual money in Nigeria Naira (₦).

That is to say; 2495 will be presented as ₦2,495.00


This can be achieved using python string formatting (f-string or format()). f-string is the newest way to format string in python, it is available since Python 3.6. You read more from this f-string tutorial.

Lets see how it done.

for money in money_naira:
    # Using f-string
    print(f"₦{money:,.2f}")
    
    # Using format() function
    print("₦{:,.2f}".format(money))

The output is:

₦2,495.00, ₦93,988.00, ₦39,118.00, ₦19,973.00, ₦39,579.00, ₦35,723.00, ₦80,216.00, ₦56,725.00, ₦16,132.00, ₦82,275.00, ₦18,439.00, ₦34,919.00, ₦17,117.00, ₦85,879.00, ₦51,153.00, ₦7,737.00, ₦35,367.00, ₦9,753.00, ₦86,648.00, ₦87,650.00, ₦58,011.00, ₦2,219.00, ₦1,768.00, ₦8,612.00, ₦2,901.00, ₦5,041.00, ₦3,405.00, ₦8,486.00, ₦7,742.00, ₦5,008.00, ₦7,150.00, ₦5,553.00, ₦9,320.00, ₦2,736.00, ₦9,151.00, ₦9,894.00, ₦2,812.00, ₦6,466.00, ₦1,194.00, ₦4,322.00, ₦6,696.00, ₦6,144.00, ₦6,227.00, ₦2,479.00, ₦3,027.00, ₦4,052.00, ₦7,580.00, ₦1,736.00, ₦9,979.00, ₦1,638.00, ₦2,369.00, ₦8,702.00, ₦1,353.00, ₦9,695.00, ₦4,072.00, ₦4,065.00, ₦7,742.00, ₦7,887.00, ₦7,620.00

₦2,495.00, ₦93,988.00, ₦39,118.00, ₦19,973.00, ₦39,579.00, ₦35,723.00, ₦80,216.00, ₦56,725.00, ₦16,132.00, ₦82,275.00, ₦18,439.00, ₦34,919.00, ₦17,117.00, ₦85,879.00, ₦51,153.00, ₦7,737.00, ₦35,367.00, ₦9,753.00, ₦86,648.00, ₦87,650.00, ₦58,011.00, ₦2,219.00, ₦1,768.00, ₦8,612.00, ₦2,901.00, ₦5,041.00, ₦3,405.00, ₦8,486.00, ₦7,742.00, ₦5,008.00, ₦7,150.00, ₦5,553.00, ₦9,320.00, ₦2,736.00, ₦9,151.00, ₦9,894.00, ₦2,812.00, ₦6,466.00, ₦1,194.00, ₦4,322.00, ₦6,696.00, ₦6,144.00, ₦6,227.00, ₦2,479.00, ₦3,027.00, ₦4,052.00, ₦7,580.00, ₦1,736.00, ₦9,979.00, ₦1,638.00, ₦2,369.00, ₦8,702.00, ₦1,353.00, ₦9,695.00, ₦4,072.00, ₦4,065.00, ₦7,742.00, ₦7,887.00, ₦7,620.00


That is it!

Monday, November 9, 2020

Javascript array Vs Python list

Introduction

In this article, I will talk about one of the most important and similar data structure in both Javascript and Python.

Javascript array (similar to Python list) is an object that lets you store multiple values in a single variable. For example, an array/list of countries would look like this: ['Nigeria', 'Canada', 'England', 'Mexico', 'China', 'India', 'Kuwait']. While this array/list is made up of only string data type, it can contain multiple data types.

Lets take a look at some commonly used methods when working with array/list in both Javascript and Python.


Creating array/list


There are several ways of creating an array/list in both JS and Py. We will see some of the common ways here.

JS:
let countries0 = ['Nigeria', 'Canada', 'England', 'Mexico', 'China', 'India', 'Kuwait'];

let countries1 = new Array('Nigeria', 'Canada', 'England', 'Mexico', 'China', 'India', 'Kuwait'); 

let countries2 = new Array()
countries1[0] = 'Nigeria';
countries1[1] = 'Canada';
countries1[3] = 'England';



Py:

countries = ['Nigeria', 'Canada', 'England', 'Mexico', 'China', 'India', 'Kuwait']

The list() function in python is used to convert from other types to list.

Saturday, November 7, 2020

Screenshots of Google maps at different zoom levels

 The scenario here is that we have latitude and longitude coordinates of cities that we want their Google maps screenshots at different zoom levels.

As at the time of writing, the construct for google maps url is like this: https://www.google.ng/maps/@latitude,longitude,zoomz >>>> https://www.google.ng/maps/@9.057809,7.4903376,15z

So, at that city coordinate we want to take the screenshots at varying zooms from 6 to 20. The url construct will be:-

  • https://www.google.ng/maps/@9.057809,7.4903376,6z
  • https://www.google.ng/maps/@9.057809,7.4903376,7z
  • https://www.google.ng/maps/@9.057809,7.4903376,8z
  • https://www.google.ng/maps/@9.057809,7.4903376,9z
  • https://www.google.ng/maps/@9.057809,7.4903376,10z
  • https://www.google.ng/maps/@9.057809,7.4903376,11z
  • https://www.google.ng/maps/@9.057809,7.4903376,12z
  • https://www.google.ng/maps/@9.057809,7.4903376,13z
  • https://www.google.ng/maps/@9.057809,7.4903376,14z
  • https://www.google.ng/maps/@9.057809,7.4903376,15z
  • https://www.google.ng/maps/@9.057809,7.4903376,16z
  • https://www.google.ng/maps/@9.057809,7.4903376,17z
  • https://www.google.ng/maps/@9.057809,7.4903376,18z
  • https://www.google.ng/maps/@9.057809,7.4903376,19z
  • https://www.google.ng/maps/@9.057809,7.4903376,20z


As you would have noticed from the url above, only the zoom level changes. The city coordinate remains the same for the fifteen zoom levels (6z - 20z).

Lets write a python script that will handle this for us.


1) Generate the URLs

base_url = 'https://www.google.ng/maps/@9.057809,7.4903376,{}z'

for x in range(6, 21):
    print(base_url.format(str(x)))


2) Take screenshots

Here I used selenium module to open and take screenshot of each map url. I also used the time module to make two second delays to allow the map load completely before taking the screenshots.

Also note that I saved the urls in a list called: url_list

import time
from selenium import webdriver


# Load chrome browser driver
chrome_driver = 'C:\\Users\\Yusuf_08039508010\\Documents\\chromedriver.exe'
driver = webdriver.Chrome(chrome_driver)


for url in url_list:
    driver.get(url)
    time.sleep(2)
    
    img_name = url.split(',')[-1]
    driver.save_screenshot(img_name + ".png")
    
    print('Saving image...', img_name)

That is it!

Wednesday, November 4, 2020

Split dataframe into chunks

The dataset we will use for this demo is the sample 'World Cities Database' from Simplemaps.com. Download a copy from the given url above.



Upon inspecting the file, you will see that there are 15493 rows and 11 columns. 

There several ways to split a dataframe into parts or chunks. How it is split depends on how to dataframe want to be used. In this post we will take a look at three ways a dataframe can be split into parts. Specifically, we will look at:-

Friday, October 30, 2020

Mp3 of Nigeria state names using python Text to Speech (TTS) library

I was working on this project, where I needed an audio file to say something like "Kaduna state has a population of 6,113,503".

Doing this manually by recording the mp3 and saving it for the whole 30+ states is not going to be an easy task, so I decided to look for a script solution. Luckily, I found a python library that can do it. 

There are several TTS libraries in python. The one am going to use is called "pyttsx3" which works without internet connection and supports multiple TTS engines including Sapi5, nsss, and espeak.

Lets get our hands dirty.

I made use of this table (List of Nigerian states by population) on wikipedia. To get the table data, lets use pandas read_html() function which return list of table from HTML as dataframes. The table we are interested in is the first table, so it is the first dataframe.
import pandas as pd

# Read table from HTML....
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_Nigerian_states_by_population')
df[0]


import pandas as pd
import pyttsx3


# Note that df is a list of dataframe....

# Construct the TTS string from the df[0] that say: "<XXX State> has a population of <Population>"
# df[0]['State'][0] + ' has a population of ' + format(df[0]['Population (2006)'][0], ',')

# Using for loop...
tts_list = []
i, j = 0, 0
for item in df[0]['State']:
    tts = df[0]['State'][i] + ' has a population of ' + format(df[0]['Population (2006)'][j], ',')
    tts_list.append(tts)
    
    i += 1
    j += 1
    

# Using list comprehension with zip()...
tts_list = [ i + ' has a population of ' + format(j, ',') for i, j in zip(df[0]['State'], df[0]['Population (2006)']) ]


# We now have a list of all the string we wanted, so lets use the list to create mp3...
engine = pyttsx3.init()

i = 0
for s in tts_list:
    # engine.say(s) # Speak the mp3
    engine.save_to_file(s, df[0]['State'][i]+'.mp3') # say mp3 to file/folder
    engine.runAndWait()
    
    i += 1

The pyttsx3 module can do a lot more including setting volume and setting voices for male/female etc.

You can download the MP3 files here.

Enjoy!

Tuesday, October 27, 2020

QGIS Attribute table Conditional formatting

Not in all instances that we want to run query to select items on attribute table. Sometimes we just want to highlight of format the attribute table so that certain item that met a specified characteristics are noticeable on the table.

For example, we may want to see rows that have duplicate records or rows whose name ends with a certain alphabet. Whatever the case maybe, QGIS has a handy tool called "Conditional formatting" that allow you format your attribute table as you wanted. Let see some few examples:-


Conditional formatting base on duplicate records

We can use the count() function which returns the count of matching features to check for duplicte records. To do this, we check if the count is greater than one, if so than the record appeared more than once and hence it is duplicated.

The expression is: count("Name" , "Name") > 1 where "Name" is the attribute field name. We can select the duplicated attributes as seen below. However, what we wanted here is to conditionally format the table base on duplicate records.


To achieve conditional formatting, click on the "conditional formatting" icon as seen below. Then add a new rule. Note, you can format just the field or the full row.



Add a name, enter the conditional expression (in this case it is: count("Name" , "Name") > 1). Then set the background and text colors. After which you will click on 'Done' button to apply the formatting.



Conditional formatting base on name ending with 'a'

Here we will use the regexp_match() function to find names that end with alphabet "a". The correct expression is: regexp_match("Name", 'a$'), this has to do with regular expressions.

We will add this formatting to the one above. So, we just need to add another "New Rule", to format items under 'Name' field/column that ends with 'a'.


We now have colors separating all the criteria we wanted in a nice color table.

That is it!

Friday, October 23, 2020

Download image files using python requests module

 Here I have list of data with links to their corresponding images. Specifically, this is a list of presidents in the USA and Nigeria.

I have to download the images/photos. So, I wrote a python script that uses the requests library to download the images as seen below:-

There are two approaches when saving an image file (or any file) on the hard disk. One is using the open with close methods or using the with statement.

import requests
import pandas as pd

image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/John_Adams%2C_Gilbert_Stuart%2C_c1800_1815.jpg/842px-John_Adams%2C_Gilbert_Stuart%2C_c1800_1815.jpg"

# -------- 1st approach ------------

response = requests.get(image_url)

img_file = open("downloaded_img/sample_image.png", "wb")
img_file.write(response.content)
img_file.close()

# -------- 2nd approach ------------

img_file = requests.get(image_url).content
with open('downloaded_img/image_name.jpg', 'wb') as f:
    f.write(img_file)

Downloading the US presidents photos:-




us_presidents_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\US Presidents\USA List.csv")

# Bulk processing....
for name, img in zip(us_presidents_df['Name'], us_presidents_df['Image']):
    print('Processing....', name, img)
    
    img_file = requests.get(img).content
    with open('downloaded_img/US Presidents/'+name+'.jpg', 'wb') as f:
        f.write(img_file)


Downloading Nigerian presidents photos:-

nigeria_presidents_df = pd.read_csv(r"C:\Users\Yusuf_08039508010\Desktop\NG List.csv")

for name, img in zip(nigeria_presidents_df['Name'], nigeria_presidents_df['Image']):
    print('Processing....', name, img)
    
    response = requests.get(img)

    img_file = open('downloaded_img/NG Presidents/'+name+'.png', "wb")
    img_file.write(response.content)
    img_file.close()
    
print('Completed....')





That is it!

Tuesday, October 13, 2020

Extracting Hyperlink 'href' attribute from excel

On this page I will show you how to extract the link/url from the text in each cell of the excel sheet. The excel document (.xlsx file) is as seen below, contains hyperlink text. We are interested in the actual url embedded in the href link anchor.


We can access each link be right clicking on each cell and edit hyperlink as seen below;-


Doing this manually is tedious especially if we have to do it for many cells. So, here I will use a python module for manipulating excel files. This module is called: openpyxl.

The pseudocode is as follow:-

  • Read or Load the excel file
  • Read the active sheet or get the sheet by name
  • Access the sheet's column or cells

import openpyxl

# Read or Load the excel file...
work_book = openpyxl.load_workbook(r"C:\Users\Yusuf_08039508010\Desktop\Book1_gethref.xlsx")

# Read the sheet to use by name or active
work_sheet = work_book['Sheet1']
# work_sheet = work_book.active

# Access sheet cells...
print(work_sheet.cell(row=2, column=1).hyperlink.target)
print(work_sheet['A1'].value)
print(work_sheet['A2'].value)
print(work_sheet['A3'].value)


# Access sheet column...
print(work_sheet['A'])

# Loop over column and get the text and hyperlink...
for cell in work_sheet['A']:
    print(cell.value)
    try:
        print(cell.hyperlink.target)
    except AttributeError:
        pass

That is it!

Saturday, October 3, 2020

Render pandas Dataframe as HTML

 Here is a situation where we go bunch of spreadsheet files and we want all of them converted into beautiful HTML table.

Fortunately, pandas has the 'to_html()' method which converts a dataframe into html table. However, the table will need some CSS applied to make is look beautiful. So, instead of writing the CSS manually, we can use Bootstrap classes to achieve same decoration.



First lets read the CSV files and create a list dataframes using the code below...

import glob
import pandas as pd


csv_files = glob.glob(r'C:\Users\Yusuf_08039508010\Desktop\*.csv')

# create dataframe list...
df_list = [pd.read_csv(f) for f in csv_files]


Now, lets use the 'to_html()' method to convert the first dataframe into HTML.


html_df = df_list[0]

# render dataframe as html
html = html_df.to_html()

# write html to file
text_file = open("index.html", "w")
text_file.write(html)
text_file.close()

Thursday, October 1, 2020

Python URL Encode

Introduction
URL encoding, is a way to encode special characters in URL so that it is in safe and secure to transmit over the internet. It is also known as 'percent encoding'.

URLs can only be sent over the Internet using the ASCII (American Standard Code for Information Interchange) character-set, so characters outside ASCII need to be converted into a valid ASCII format.

Let assume we have some set of phrases that we want to search via google and retrieve the search result on the first page. This mean we have to parse those phrases into a google search url that will look like this: 'https://www.google.com/search?q=X' where X stands for the search phrase.

Now we can replace the X with the specific phrases we wanted to search. But the problem is if one o the phrases contains special characters the search url won't work.

For example, this (https://www.google.com/search?q=Advanced-mobile-success) will work fine but this (https://www.google.com/search?q=Advanced mobile success) won't work fine as expected because of the spaces between the word. Those space need to be converted or encoded into internet friendly character that will look like this "https://www.google.com/search?q=Advanced%20mobile%20success".

Some regular modern browsers have the ability to do this encoding on the fly without you knowing. But when you want to construct such query url in a program or to communicate with an API, it is save if you encode them before parsing them. In python, this encoding can be handled using the urllib.parse module.

When you import the module like this import urllib.parse you will access to this functions quote(),
quote_plus() and urlencode(). They slight do different encoding, however urlencode() is more frequently used so I will talk about it in few seconds.

To encode the query string query_string = 'Hellö Wörld@Python' to our base url, the code will look like this:-
import urllib.parse

base_url = 'https://www.google.com/search?q='
query_string = 'Hellö Wörld@Python'

urllib.parse.quote(query_string)

url = base_url + urllib.parse.quote(query_string)
url



Assuming of base url has more than just the query string parameter, then we will use urlencode() function like this;-

import urllib.parse

base_url = 'https://www.google.com/search?q='
query_string = 'local search on '

params = {'name': 'Umar Yusuf', 'school': ['school 1', 'school 2', 'school 3']}

urllib.parse.urlencode(params, doseq=True)

url = base_url + urllib.parse.quote(query_string) + urllib.parse.urlencode(params)
url


As you can see, we are able to encode multiple parameters into the url suing the urlencode() function. This is how powerful the module is and you can construct very complex friendly urls.


It should be noted that for a simple url encoding like this (https://www.google.ng/maps/@9.057809,7.4903376,15z), the route above will be overkilling. A simple string .format() function will be enough.

This simple url link to a google maps location defined by three variables (latitude, longitude and zoom level). So, we can use the string .format() function to handle its string construct as seen below.


latitude = 9.057809
logitude = 7.4903376
zoom = 15

base_url = 'https://www.google.ng/maps/@{},{},{}z'
url = base_url.format(latitude, logitude, zoom)

Happy coding!

Friday, September 25, 2020

Mapping dictionary keys/values to pandas dataframe

 This is a senario where we have a dataframe representing states and their geo-political zones. The geo-political zone column is however abbreviated as seen below, so we want to have a new column that will hold the full meaning of the corresponding abbreviation.


The abbreviated letters are defined in dictionary, which will be used to map the new full meaning column.
look_up_dict = {'NC': 'North Central', 'NE': 'North East', 'NW': 'North West', 'SE': 'South East', 'SW': 'South West', 'SS': 'South South'}

The lookup dictionary could come from a different dataframe, that is to say the dataframe can be converted to dictionary for this lookup purpose.

A solution here was to use the map() function to create a new column by mapping the dictionar keys to the values on geopolitical zone column 'GeoPZ'

df['Geo Political Zones'] = df['GeoPZ'].map(look_up_dict)

How about if we just want to replace the existing 'GeoPZ' column without creating a new one? There are couple of ways to get this done by using replace() or update() methods or simply by overriding the existing column.

df['GeoPZ'] = df['GeoPZ'].map(look_up_dict)

df['GeoPZ'] = df['GeoPZ'].replace(look_up_dict)

To use update() method, your dictionary keys must be numeric indices. Use .keys() to check your dictionary keys.
df.GeoPZ.update(pd.Series(look_up_dict))


Lambda function

You can use a Lamda function to perform operations on the fly while creating a new column. Lets say, we want a new column that will hold character count/lenght of each state's name. Then a Lamda function will come in handy as seen below;-

df['State_LCount'] = df['State'].map(lambda x: len(str(x)))

Thank you for following!

Tuesday, September 22, 2020

ColorBrewer for map designers

This article is about choosing a friendly colors for your next map design.

Random color ramp

Greens color ramp

Most modern mapping software will give you great color ideas for your map. However, the problem with the software suggested color is that there are only a handful of those colors and in some cases you will like to manually setup your colors since the default color ranges may not fit into your design.



If you find yourself in this situation and you are struggling to combine colors in a color friendly manner, then you are in the right place.

Here I will talk about a tool that help solve this problem and at the end you should be comfortable selecting proper colors for your map.

To tool is called ColorBrewer. It is a web based tool that provides "Color Advice for Maps".


The tool provides different properties to select based on your map. For example number of data class for your map, nature of the data (Sequential, Diverging and Qualitative). And many other properties as seen on the website.

Some other tools to help you combine friendly colors are:-











3) Flat UI Color Picker





Study the tools and choose the perfect friendly color for your next map.

Happy mapping!