Monday, July 5, 2021

Separating a String column into individual columns/cells

 In this post, I will be separating a 'String' or 'List' column into individual columns/cells.


The problem

I got a table where each cell is a bulky string separated by newline character '\n' as seen below. 


It is required that each separate be on its own cell/column.


The Solution

  • Step 1: Read the table into dataframe
  • Step 2: Convert the string column to 'list' where '\n' serves as the separator
  • Step 3: Convert the list element into independent columns/cells
# Step 1: Read the table into dataframe
df = pd.read_excel('Sep_Str_List.xlsx')

# Step 2: Convert the string column to 'list' where '\n' serves as the separator
# Since the string is separated by '\n', let split on it to convert to list
df['List_Col'] = df['Str_Col'].apply(lambda x: x.split('\n'))

# Step 3: Convert the list element into independent columns/cells
# Separating a 'List' column into individual columns/cells
df_individual_col = df['List_Col'].apply(pd.Series)

# Concatenate df to df_individual_col
df_final = pd.concat([df, df_individual_col], axis=1)
df_final



That is it!

No comments:

Post a Comment