In this post, I will be separating a 'String' or 'List' column into individual columns/cells.
The problem
I got a table where each cell is a bulky string separated by newline character '\n' as seen below.
It is required that each separate be on its own cell/column.
The Solution
- Step 1: Read the table into dataframe
- Step 2: Convert the string column to 'list' where '\n' serves as the separator
- Step 3: Convert the list element into independent columns/cells
# Step 1: Read the table into dataframe
df = pd.read_excel('Sep_Str_List.xlsx')
# Step 2: Convert the string column to 'list' where '\n' serves as the separator
# Since the string is separated by '\n', let split on it to convert to list
df['List_Col'] = df['Str_Col'].apply(lambda x: x.split('\n'))
# Step 3: Convert the list element into independent columns/cells
# Separating a 'List' column into individual columns/cells
df_individual_col = df['List_Col'].apply(pd.Series)
# Concatenate df to df_individual_col
df_final = pd.concat([df, df_individual_col], axis=1)
df_final
That is it!
No comments:
Post a Comment