Wednesday, August 28, 2019

QGIS Remove Black Background Boarder from Raster Image

Often times, you are left with black boarder around an image you manipulated in QGIS as seen below. This is often cause because there is no data to display around data part of the image.

Here is how to get ride of the black background in QGIS 3.

Open the raster image layer property window and select the 'Transparency' tab. Then enter '0' under No data value >> Additional no data value.

Click 'Ok' to apply the changes. Your raster image should now have no black background color surrounding it as seen below.

That is it.

Monday, August 26, 2019

Get the row count of multiple excel spreadsheet files

Here I have many excel spreadsheet files within a folder as seen below...

The task is to return the number of rows in each of the excel files. I can go manually, open each file, scroll to the bottom and note down the row number. That will be cumbersome and time consuming given that number of files I have to cover.

So, I have to write a simple script in python that will handle this boring task accordingly as follow:-

Step 1: First things first, lets find a way to read all the .xlsx files. Here I used the glob module to handle this.

import glob

folder_xlsx = r"C:\Users\Yusuf_08039508010\Desktop\my-xlsx-folder"

# read all the individual order xlsx files
xlsx_files = glob.glob(folder_xlsx + '/*.xlsx')
what I have above is a list that contains path to all the excel files in the folder. Lets move on...

Step 2: Next step is to read each excel file into a pandas dataframe and use a function to count the number of rows in the dataframes. There are many functions to count the number of rows as seen below, but I will use this function 'len(df.index)'.

Here is the solution for the fisrt dataframe.

df = pd.read_excel(xlsx_files[0])

row_count = len(df.index)

To do for the whole excel files, we just write a for loop and save the into a list as seen below. Noticed that I used rsplit() function to get the file names to print it along its corresponding row count.

import pandas as pd
row_count_list = []
for xls_file in xlsx_files:
    df = pd.read_excel(xls_file)
    row_count = len(df.index)
    file_name = xls_file.rsplit('\\', 1)[1]
    file_details = file_name, row_count
print (row_count_list)

That is it!

P.S: You could easily extend the script above to do many other thing with the files. An example will be to merge all the files into one file using the pandas concat() method. So, instead of appending the file names and the row counts, we will simply append the dataframe as seen below.

df_list = []
for xls_file in xlsx_files:
    df = pd.read_excel(xls_file)
merge_df = pd.concat(df_list)

Thursday, August 8, 2019

Split string at the last occurrence of a string

I have a list of strings with varying length. However, the each string always end with certain same information (country in this case) as seen below.

data_list = ['Adams Smith, white, UK', 
             'Samuel Tom, Black, 29 leen st. NY, USA', 
             'Yaks Ramson, New Student, Yet to register, Romania']

As you can see, there are three items in the list and each item ends with a country name after a comma (,) sign.

When you loop through the items, you can split each item by comma like this: item.split(','). However, this isn't what I wanted, I want to split just at the last comma. In other words, I want to plit each of the string at the last occurrence of the comma (,) sign.

So, here the solution is to use a list method call rsplit(',', 1), which accept a second argument that tells how many times you want to split a string. Here I want to split the string just once, so my script will look like this...

data_list = ['Adams Smith, white, UK', 
             'Samuel Tom, Black, 29 leen st. NY, USA', 
             'Yaks Ramson, New Student, Yet to register, Romania']

item_list = []
for item in data_list:
    item_1 = item.rsplit(',', 1), # Not item.split(',')

Now, each item is split into two and you can access the individual countries as seen below:-