Friday, April 9, 2021

Extracting data from .HAR file

An HTTP Archive file (shortened to 'HAR file') is a JSON-formatted log of the traffic between a web browser and a website. The common extension for these files is '.har'.

In Python, there is a third-party module called "Haralyzer" developed for getting useful data out of HAR files.

Since HAR files are in JSON format, I will not use the "Haralyzer" module; instead, I will read the .har file and extract the data from the text directly. Another reason I don't want to use the library is that I don't want to install a new third-party library on my machine, especially since the haralyzer module depends on another third-party library, "six".

Other than that, there is nothing wrong with using a library that reads the .har file directly.

Let's get our hands dirty...


How to get a HAR file

Practically any website that uses JSON as its data communication format will generate a .har file in the client's browser, which can be accessed from the browser's developer tools.

Let's use this website on Earthquake data by USGS. Open the website and go to your browser's developer tools, then select the 'Network' tab >> XHR >> Export HAR...


This will download a HAR file that contains a JSON representation of the earthquake data as seen below...


You can save the file with any name in a location you can remember; we will use it in the next section. Note that the file contains GeoJSON with Padding.
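Because a HAR file is plain JSON, Python's built-in json module is all we need to explore it, with no third-party library required. Below is a minimal sketch; the tiny HAR payload embedded here is illustrative (a real export from your browser will contain many more fields):

```python
import json

# A tiny, illustrative HAR payload -- a real export has many more fields.
har_text = '''{
  "log": {
    "entries": [
      {"request": {"url": "https://example.com/data"},
       "response": {"status": 200}}
    ]
  }
}'''

# For a real file you would instead use:
#   with open('file.har', encoding='utf-8') as f:
#       har = json.load(f)
har = json.loads(har_text)

# Every HAR file has a top-level 'log' object with an 'entries' list,
# one entry per request/response pair.
rows = [(entry['response']['status'], entry['request']['url'])
        for entry in har['log']['entries']]
print(rows)
```

For the earthquake HAR, you would drill further into each entry's response text, remembering to strip the JSONP padding before parsing the GeoJSON itself.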

Thursday, April 1, 2021

How to make black and white Road network map in Mapbox Studio

Mapbox Studio is a platform used by developers to prepare maps for mobile, desktop and web applications.

In this post, we are going to prepare a black and white road network map similar to what you see below.


There are many startup web applications that allow you to create this kind of map and download it for a fee! In a few moments, you will learn how to make your own maps so you don't have to spend money buying them.

See an example below that costs $59.


Step-by-step instructions



  • Create a new map by clicking on "New Style" button.


  • Select from the map style templates (here we will use the Blank Template)


  • Rename the map to a suitable name and add map components and layers.


To achieve this type of map, we need to add the following map components and layers:-

  1. Road Network
  2. Land, Water and Sky
  3. Administrative Boundaries

On each component/layer, adjust the settings to fit what you want. For example, I set the administrative boundaries base to white, etc.


After that, you can publish and share the map as WMTS for use in desktop software like QGIS for further map processing, as you will see below in a moment.


Wednesday, March 24, 2021

Looping over an iterable (array/list) in JavaScript Vs Python

Let's see what it takes to loop over an iterable using a for-loop in both JavaScript and Python. By the way, an iterable is an object capable of returning its members one at a time, permitting it to be iterated over in a for-loop.

Assuming we have this iterable: m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566] and we want to perform some math operation on each element (in this case: 2 to the power of the element, divided by 2).

The math formula is as follows:-

For JavaScript

Math.pow(2, element) / 2


For Python

(2**element)/2


The solutions

JavaScript Solution

m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566]

for (let i = 0; i < m.length; ++i) {
    console.log(Math.pow(2, m[i]) / 2);
}


m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566]

// Note: for...in iterates over the indices (as strings), not the values
for (let i in m) {
    console.log(Math.pow(2, m[i]) / 2);
}

Python Solution

m = [3.23, 4.56, 5.3, 2.44, 6.7, 12.4, 566]

for i in m:
    print((2**i)/2)




That is it!

Thursday, March 18, 2021

Python script - Merge PDF files into a single file

This script makes use of the PyPDF2 library to merge a list of PDF files into one big file.


from os import listdir
from os.path import join

from PyPDF2 import PdfFileMerger

input_dir = r"C:\Users\Yusuf_08039508010\ND 2 WAEC Result Check\Result"  # your input directory path

# Collect the full paths of all PDF files in the input directory...
merge_list = []
for x in listdir(input_dir):
    if not x.endswith('.pdf'):
        continue
    merge_list.append(join(input_dir, x))

merger = PdfFileMerger()

for pdf in merge_list:
    merger.append(pdf)

# Write the merged output (your output directory and PDF file name)
merger.write(join(input_dir, "pdf_file_name.pdf"))
merger.close()

print('Finished...')


Enjoy!

Tuesday, March 9, 2021

Spread column-wise data to row-wise

On the web, it is very common to find a dataset displayed in a column-wise manner, as seen below.


As you may have already noticed, each new record starts with the company name in bold capital letters. So, let's add a sign to separate one record from the other. I used this "------------------------" sign, but you can use anything, as long as it is unique and not part of the records themselves.

So, the working data copied from the website looks like this:-

ACCESS CREDIT MANAGEMENT, INC.
Tim Cullen, Attorney
11225 Huron Ln Ste 222
Little Rock, AR 72211-1861
United States
Phone: (501) 664-2922
Fax: (501) 664-3207
MAP Attorney
------------------------
CREDIT CONTROL CO., INC.
Bill Caldwell, President
Bill Caldwell, Ethics Contact
10201 W Markham St Ste 104
Little Rock, AR 72205-2180
United States
Phone: (501) 225-2050
Fax: (501) 225-2135
ACA Member since 1982
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10810 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72211
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
http://www.mchugheslaw.com
ACA Member since 2013
Line of Business: Law Firm
Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC
Becky A. McHughes Esq., Attorney at Law
10809 Executive Center Dr
Danville Bldg Ste 312
Little Rock, AR 72204
United States
Phone: (501) 376-9131
Fax: (501) 374-9332
MAP Attorney
Lowell
------------------------
CENTRAL RESEARCH, INC.
Karena Holt, Vice President of Operations
Karena Holt, Ethics Contact
122 N. Bloominton Ste 1
Lowell, AR 72745
United States
Phone: (479) 419-5456
Fax: (479) 419-5460
http://www.central-research.com
ACA Member since 2016
Line of Business: Third Party Collections
------------------------
CENTRAL RESEARCH, INC.
Shane Taylor
106 N Bloomington
Ste S
Lowell, AR 72745-8988
United States
Phone: (479) 419-5456
MAP Attorney
Mabelvale
------------------------
FIRST COLLECTION SERVICES
Chris Dunkum, President
Chris Dunkum, Ethics Contact
10925 Otter Creek East Blvd
Mabelvale, AR 72103-1661
United States
Phone: (501) 455-1658
http://www.FCScollects.com
ACA Member since 1983
Line of Business: Outsourced First Party or Billing Company
Line of Business: Third Party Collections


What we really want is something like this:-

ACCESS CREDIT MANAGEMENT, INC. :: Tim Cullen, Attorney :: 11225 Huron Ln Ste 222 :: Little Rock, AR 72211-1861 :: United States :: Phone: (501) 664-2922 :: Fax: (501) 664-3207 :: MAP Attorney
------------------------
CREDIT CONTROL CO., INC. :: Bill Caldwell, President :: Bill Caldwell, Ethics Contact :: 10201 W Markham St Ste 104 :: Little Rock, AR 72205-2180 :: United States :: Phone: (501) 225-2050 :: Fax: (501) 225-2135 :: ACA Member since 1982 :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC :: Becky A. McHughes Esq., Attorney at Law :: 10810 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72211 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: http://www.mchugheslaw.com :: ACA Member since 2013 :: Line of Business: Law Firm :: Line of Business: Third Party Collections
------------------------
THE MCHUGHES LAW FIRM, PLLC :: Becky A. McHughes Esq., Attorney at Law :: 10809 Executive Center Dr :: Danville Bldg Ste 312 :: Little Rock, AR 72204 :: United States :: Phone: (501) 376-9131 :: Fax: (501) 374-9332 :: MAP Attorney :: Lowell
------------------------
CENTRAL RESEARCH, INC. :: Karena Holt, Vice President of Operations :: Karena Holt, Ethics Contact :: 122 N. Bloominton Ste 1 :: Lowell, AR 72745 :: United States :: Phone: (479) 419-5456 :: Fax: (479) 419-5460 :: http://www.central-research.com :: ACA Member since 2016 :: Line of Business: Third Party Collections
From a vertical arrangement to a horizontal arrangement. The horizontal (row-wise) arrangement works best in a spreadsheet, where we will have a common column for the same fields across records.

What we have (vertical/column-wise arrangement)


What we want (horizontal/row-wise arrangement)
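The transformation above can be sketched in a few lines of Python. The separator and the ' :: ' delimiter are the ones chosen earlier; the sample text below is a trimmed-down version of the records, just for illustration:

```python
raw = """ACCESS CREDIT MANAGEMENT, INC.
Tim Cullen, Attorney
Phone: (501) 664-2922
------------------------
CREDIT CONTROL CO., INC.
Bill Caldwell, President
Phone: (501) 225-2050"""

# Split the text into records on the unique separator sign,
# then join each record's lines with ' :: ' to form one row.
rows = []
for record in raw.split('------------------------'):
    lines = [line.strip() for line in record.splitlines() if line.strip()]
    if lines:
        rows.append(' :: '.join(lines))

for row in rows:
    print(row)
```

Each row can then be pasted into a spreadsheet and split on ' :: ' into columns.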

Monday, March 8, 2021

RegexOne.com alternative solution

RegexOne.com has interactive lessons on Regular Expressions, and in this post I want to solve all the lessons with a solution different from the one they provided.

For example: \w matches any word character (equivalent to [a-zA-Z0-9_]), so if the solution on RegexOne.com is \w, then I have to look for another way, like [a-zA-Z0-9_], to solve the lesson.
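As a quick sanity check, we can confirm in Python that the two patterns match exactly the same characters (the test string here is made up, purely for illustration):

```python
import re

text = 'abc_123 !@#'

# The shorthand \w and its spelled-out character class
# should match the same set of characters.
shorthand = re.findall(r'\w', text)
spelled_out = re.findall(r'[a-zA-Z0-9_]', text)

print(shorthand == spelled_out)  # True
```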

Lets get started...

Exercise 1: Matching Characters



Exercise 1½: Matching Digits



Exercise 2: Matching With Wildcards




Exercise 3: Matching Characters



Exercise 4: Excluding Characters



Exercise 5: Matching Character Ranges


Tuesday, March 2, 2021

PyQGIS - Add multiple shapefile vector layers to the QGIS project instance

Sometimes, I need to load many shapefiles, located in various folders, into a QGIS project. A handy way to overcome this repetitive, boring task is to use the PyQGIS script below.


import glob

# Use glob to recursively search all folders for .shp files...
shp_files = glob.glob(r'C:\Users\Yusuf_08039508010\Desktop\GIS Data\NGR\**\*.shp', recursive=True)
# print(shp_files)

layer_count = 0
for shp in shp_files:
    print("Loading...", shp)
    layer_name = shp.split('\\')[-1].split('.')[0]
    vlayer = QgsVectorLayer(shp, layer_name, "ogr")
    
    if not vlayer.isValid():
        print("Error: Layer Failed to Load!")
    else:
        QgsProject.instance().addMapLayer(vlayer)
        layer_count += 1

print(f'Finished Loading total of: {layer_count} shapefiles.')


As seen below, I would have to open 11 folders and sub-folders to load all the shapefiles into the QGIS project. But with the script above, I just run it once and all the shapefiles in both parent and child folders are loaded in a few seconds.


Here the script loaded 66 shapefiles from all the directories, as seen below.




Enjoy!

Sunday, February 28, 2021

Rename multiple files with new names in excel spreadsheet

In the past, I have written similar script titled "Python script to rename multiple files/folders".

The only difference here is that the new file names will come from a column in an Excel spreadsheet instead of being generated within the script.

Here below is the spreadsheet file that contains the current file names and their corresponding new names.




For example, image '4.jpg' would be renamed to 'Barack Obama.jpg', '9.jpg' to 'Donald Trump.jpg', '30.jpg' to 'Joseph Robinette Biden Jr.jpg'... and so on.

Note that all the images are of the same extension (.jpg), so we will maintain the extension.


The script

First, we will read the Excel file using pandas (into a dataframe) and create a dictionary from the two columns, where the keys are the 'old names' and the values are the 'new names'.

import os

import pandas as pd


names_df = pd.read_excel(r"C:\Users\Yusuf_08039508010\Desktop\rename.xlsx")
names_df


names_df_dict = dict(zip(names_df['Old Name'], names_df['New Name']))
names_df_dict

Now, we can access the values of the dictionary by their keys like so: names_df_dict['1.jpg']. With this, we will loop over the keys dynamically and rename the images accordingly.

images_folder = r'C:\Users\Yusuf_08039508010\Documents\US Presidents'

for file in os.listdir(images_folder):
    if file not in names_df_dict:
        continue  # skip any file that is not listed in the spreadsheet

    print('Renaming...', names_df_dict[file])

    # Use os.path.join() to construct absolute paths to the images...
    # Alternatively, we could change directory (os.chdir()) to the images folder
    old_img_name = os.path.join(images_folder, file)
    new_img_name = os.path.join(images_folder, names_df_dict[file] + '.jpg')

    os.rename(old_img_name, new_img_name)
    
print('Finished....')

To be sure our renaming script did a perfect job, let's verify the last three presidents, that is:-

  • '4.jpg' would be renamed to 'Barack Obama.jpg', 
  • '9.jpg' to 'Donald Trump.jpg', 
  • '30.jpg' to 'Joseph Robinette Biden Jr.jpg'


That is it!

Tuesday, February 16, 2021

Get Emails from Google search given company Name/Domain

Given a list of company names, search google to retrieve their email addresses:-

import re

import requests
from bs4 import BeautifulSoup


list_of_url = ['http://umaryusuf.com', 'another website']

# REGEX to search for emails...
EMAIL_REGEX = r"""(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])"""

unique_emails_list = []

for name in list_of_url:    
    search_query = name + " email"
    print('Processing...', name)

    # -------------- FOR BULK GOOGLE SEARCH USE A PROXY -----------------
    params = (
            ('api_key', 'XXXXXXXXXXXXXXXXXXXXXXXXXXX'),
            ('url', 'https://www.google.com/search?q='+search_query),
        )
    response = requests.get('http://api.scraperapi.com/', params=params)
    # -------------------------------------------------------------------


    print(response.status_code)

    soup = BeautifulSoup(response.content, 'html.parser')
    text = soup.get_text()

    emails_1 = [re_match.group() for re_match in re.finditer(EMAIL_REGEX, text)]

    emails_2 = re.findall(r"[A-Za-z0-9._%+-]+"
                         r"@[A-Za-z0-9.-]+"
                         r"\.[A-Za-z]{2,4}", text)

    unique_emails = list(set(emails_1 + emails_2))
    data = name, unique_emails

    unique_emails_list.append(data)
    print(data)


Given a list of company domain names, access each domain web page and get all emails from the web page:-

import re
import urllib.request


list_of_url = ['http://umaryusuf.com']



site_list = []

for domain in list_of_url:

    print('Processing...', domain)
    
    try:
        f = urllib.request.urlopen(domain)
        s = f.read().decode('ISO-8859-1')
        emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}", s)
        newemails = list(set(emails))
        d = domain, newemails

        site_list.append(d)
        print (d)
    except Exception:
        d = domain, 'Error Occured'
        site_list.append(d)
        print (d)

print("Finished...")


Enjoy!

Monday, February 15, 2021

PyQGIS - Write vector layers field name and type to text file

The PyQGIS script below will write the field name and field type of a given vector layer to a text file.

This is the same information you will find when you access the 'Fields' tab from the layer's property window as seen below.


Currently, there is no way to copy/save this table to a text or similar file for use outside the QGIS interface. So, it would be great if we write a little script to do the hard work for us.


The code:-

# Read active layer from the QGIS layer panel or read the shapefile from its path
layer = qgis.utils.iface.activeLayer()

# vector_file = r"C:\path_to_shapefile.shp"
# layer = QgsVectorLayer(vector_file, 'DISPLAYNAME', 'ogr')

# Count the number of features (rows) and number of fields (columns)
featureCount = layer.featureCount()
fieldCount = layer.fields().count()

# Loop through the layer fields to get each field name and type
data_list = []
for field in layer.fields():
    field_name = field.name()
    field_type = field.typeName()

    data = field_name, field_type
    data_list.append(data)


# Write the data_list to a text file, one field per line...
txtFileName = layer.name()  # from layer name
with open(txtFileName + '.txt', 'w', encoding="utf-8") as f:
    for field_name, field_type in data_list:
        print(field_name, field_type, sep='\t', file=f)

# Print location of the text file...    
import os
print('The text file is saved at:', os.getcwd(), 'and its file name is:', txtFileName)

The comments in the code are self-explanatory; also remember to import the necessary modules.



You can extend the script by writing the data to a spreadsheet file using the csv or pandas module.
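For instance, here is a minimal sketch using the built-in csv module. The 'fields.csv' file name and the sample data_list are illustrative; inside QGIS, data_list would come from the field loop above:

```python
import csv

# Illustrative data -- inside QGIS this would come from the field loop above.
data_list = [('NAME', 'String'), ('POP2021', 'Integer'), ('AREA_KM2', 'Real')]

# Write a header row followed by one row per field...
with open('fields.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Field Name', 'Field Type'])
    writer.writerows(data_list)
```

The resulting CSV opens directly in any spreadsheet program.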

That is it!