Tuesday, January 12, 2021

Map two data frames base on common column

 The aim is to enrich a dataframe from a second dataframe. In other words, copy corresponding cell in a dataframe onto a second dataframe based on their common columns.

This is very common task that is difficult to explain! This works like a "vLookUp" function found in MS excel. Where we have two tables and we want to update the second table using correspond values based on column that relates them.

If the two tables have same name for the relating/common column, then the merge() function can be used like so: -

df2.merge(df1, on="common_column_name", how='inner')

On the other hand, if the relating/common column have varying column names and you don't want rename them. The below example would be more flexible:-

We create dictionary from columns in df2 and map it to df1

import pandas as pd
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz'],
                    'value0': [1, 2, 3]})

df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz'],
                    'value1': [5, 6, 7],
                   'value2': [8, 9, 10]})

# This is a list of columns in: df2
s = ['rkey', 'value1', 'value2'] # Don't want to create it manually? then use: s = df2.columns.tolist()

# loop through all columns of df1 and map to df2
for column in s:
    # make a mapping dictionary from df...
    mydict = dict(zip(df2['rkey'], df2[column]))
    # create new column based on the map column and dictionary
    df1['new_'+column] = df1['lkey'].map(mydict)


Here below is a real world example:-

Below are two tables with different attribute columns from countries in the world. The 'Country Name' and 'Country Code' columns are the same for both tables, so we need to merge them together.

Since, the two table have the same column names we can use this approach...

df1.merge(df2, on="Country Name", how='outer')

In case, the result isn't what you expected. Then play around with the parameters to get what you wanted.


No comments:

Post a Comment