Sunday, December 31, 2017

UnicodeDecodeError while reading CSV into Pandas

Hello,

A common error that frustrates developers when reading a CSV file with pandas is this:
UnicodeDecodeError Traceback (most recent call last)
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 9: invalid start byte
The Cause

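The usual cause is that the CSV contains special characters saved in an encoding other than the one pandas uses to read the file. pandas reads files as UTF-8 by default and does not detect the encoding for you, so any byte that is not valid UTF-8 (such as the 0x92 in the message above) raises this error.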
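To make the cause concrete, here is a small sketch that reproduces the same kind of failure without pandas. The sample bytes are made up for illustration, and Windows-1252 (cp1252) is only an assumption about where a 0x92 byte typically comes from. Your file may use a different encoding.

```python
# Hypothetical sample: 0x92 is a curly apostrophe in Windows-1252 (cp1252),
# but it is not a valid start byte in UTF-8.
raw = b"It\x92s a test"

try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # Same error family that pandas surfaces when it reads the file as UTF-8.
    utf8_error = exc

print(utf8_error)

# Decoding with the encoding the bytes were actually written in succeeds.
decoded = raw.decode("cp1252")
print(decoded)
```

The bytes themselves are fine. They are simply being read with the wrong encoding.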


The Solution

I was able to overcome this error most of the time by opening the CSV file in a text editor such as Sublime Text and re-saving it with an explicit encoding.

In Sublime Text, go to: File >> Save with Encoding >> UTF-8 (choose the right encoding for the file in question).

Doing this re-saves the file in a known encoding. In other words, you have set the file's encoding explicitly, which allows pandas to read it correctly.
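If you would rather not open an editor, the same re-save step can be sketched in Python. This is only a rough equivalent of the Sublime Text step, not what the editor does internally. The helper name resave_as_utf8, the file names, and the cp1252 default for the source encoding are assumptions made for illustration, so replace them with whatever matches your file.

```python
import os
import tempfile


def resave_as_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Read a text file in its original encoding and write it back as UTF-8.

    source_encoding is an assumption; use the encoding your file really has.
    """
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demo with a made-up CSV that contains the problematic 0x92 byte.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "legacy.csv")
dst_path = os.path.join(tmp_dir, "legacy_utf8.csv")

with open(src_path, "wb") as f:
    f.write(b"id,note\n1,It\x92s fine\n")

resave_as_utf8(src_path, dst_path)

with open(dst_path, "rb") as f:
    converted = f.read()

# The re-saved file now decodes cleanly as UTF-8.
print(converted.decode("utf-8"))
```

Another option, if you already know the encoding, is to skip the re-save and pass it to pandas directly, for example pd.read_csv(path, encoding="cp1252").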



That is it.
Good Luck
