Thursday, October 1, 2020

Python URL Encode

Introduction
URL encoding, is a way to encode special characters in URL so that it is in safe and secure to transmit over the internet. It is also known as 'percent encoding'.

URLs can only be sent over the Internet using the ASCII (American Standard Code for Information Interchange) character-set, so characters outside ASCII need to be converted into a valid ASCII format.

Let assume we have some set of phrases that we want to search via google and retrieve the search result on the first page. This mean we have to parse those phrases into a google search url that will look like this: 'https://www.google.com/search?q=X' where X stands for the search phrase.

Now we can replace the X with the specific phrases we wanted to search. But the problem is if one o the phrases contains special characters the search url won't work.

For example, this (https://www.google.com/search?q=Advanced-mobile-success) will work fine but this (https://www.google.com/search?q=Advanced mobile success) won't work fine as expected because of the spaces between the word. Those space need to be converted or encoded into internet friendly character that will look like this "https://www.google.com/search?q=Advanced%20mobile%20success".

Some regular modern browsers have the ability to do this encoding on the fly without you knowing. But when you want to construct such query url in a program or to communicate with an API, it is save if you encode them before parsing them. In python, this encoding can be handled using the urllib.parse module.

When you import the module like this import urllib.parse you will access to this functions quote(),
quote_plus() and urlencode(). They slight do different encoding, however urlencode() is more frequently used so I will talk about it in few seconds.

To encode the query string query_string = 'Hellö Wörld@Python' to our base url, the code will look like this:-
import urllib.parse

base_url = 'https://www.google.com/search?q='
query_string = 'Hellö Wörld@Python'

urllib.parse.quote(query_string)

url = base_url + urllib.parse.quote(query_string)
url



Assuming of base url has more than just the query string parameter, then we will use urlencode() function like this;-

import urllib.parse

base_url = 'https://www.google.com/search?q='
query_string = 'local search on '

params = {'name': 'Umar Yusuf', 'school': ['school 1', 'school 2', 'school 3']}

urllib.parse.urlencode(params, doseq=True)

url = base_url + urllib.parse.quote(query_string) + urllib.parse.urlencode(params)
url


As you can see, we are able to encode multiple parameters into the url suing the urlencode() function. This is how powerful the module is and you can construct very complex friendly urls.


It should be noted that for a simple url encoding like this (https://www.google.ng/maps/@9.057809,7.4903376,15z), the route above will be overkilling. A simple string .format() function will be enough.

This simple url link to a google maps location defined by three variables (latitude, longitude and zoom level). So, we can use the string .format() function to handle its string construct as seen below.


latitude = 9.057809
logitude = 7.4903376
zoom = 15

base_url = 'https://www.google.ng/maps/@{},{},{}z'
url = base_url.format(latitude, logitude, zoom)

Happy coding!

No comments:

Post a Comment