Create MultiIndex in Pandas for Data Manipulation

A MultiIndex (hierarchical index) uses two or more columns as index levels, which makes it easier to slice and group data by those levels. One way to create one is to pass a list of column names to set_index().

In this lesson, we will see two examples:

  • Create a MultiIndex
  • Create a MultiIndex and perform slicing

Create a MultiIndex in Pandas

The following example creates a DataFrame with City, Year, and Population columns. It then sets City and Year as a MultiIndex, so each row is identified by both values rather than by a single column.

# Create multiindex

import pandas as pd

data = {'City': ['Delhi','Delhi', 'Mumbai'],
        'Year': [2020, 2021, 2020],
        'Population': [30, 32, 20]
        }
df = pd.DataFrame(data)
print(df)

# Set City and Year as the MultiIndex (inplace=True modifies df directly)
df.set_index(['City','Year'], inplace=True)
print(df)

Output

     City  Year  Population
0   Delhi  2020          30
1   Delhi  2021          32
2  Mumbai  2020          20
             Population
City   Year            
Delhi  2020          30
       2021          32
Mumbai 2020          20
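
To check that the two columns really became index levels, a short sketch like the following may help. It rebuilds the same DataFrame, uses the non-in-place form of set_index() (a style choice for this sketch, not something the example above requires), inspects the index, and looks up one value by both keys. The variable names indexed, delhi_2021, and restored are illustrative only.

```python
import pandas as pd

data = {'City': ['Delhi', 'Delhi', 'Mumbai'],
        'Year': [2020, 2021, 2020],
        'Population': [30, 32, 20]}

# Without inplace=True, set_index() returns a new DataFrame
indexed = pd.DataFrame(data).set_index(['City', 'Year'])

# The index is now a MultiIndex with two named levels
print(type(indexed.index).__name__)   # MultiIndex
print(list(indexed.index.names))      # ['City', 'Year']
print(indexed.index.nlevels)          # 2

# Each (City, Year) pair identifies exactly one row
print(indexed.index.is_unique)        # True

# Look up a single value by giving both keys as a tuple
delhi_2021 = indexed.loc[('Delhi', 2021), 'Population']
print(delhi_2021)                     # 32

# reset_index() turns the index levels back into ordinary columns
restored = indexed.reset_index()
print(list(restored.columns))         # ['City', 'Year', 'Population']
```

Passing both keys as a tuple keeps the row lookup separate from the column name, which avoids ambiguity when reading the code.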

Create a MultiIndex and perform slicing

The following example builds a MultiIndex DataFrame from Region and City, then demonstrates slicing and grouping. It selects subsets (all cities in the North region, then Hyderabad within the South region) and finally sums Sales by region.

# Create multiindex and perform slicing

import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West'],
    'City': ['Delhi', 'Chandigarh', 'Chennai', 'Hyderabad', 'Kolkatta', 'Guwahati', 'Gujarat'],
    'Sales': [1500, 1200, 1800, 1600, 1400, 1100, 1200]
}
df = pd.DataFrame(data)
print("DataFrame:\n",df)

# Creating the Multiindex
df.set_index(['Region','City'], inplace=True)
print("\nDataFrame with multindex:\n",df)

# Select cities in the "North" region
north_data = df.loc['North']
print("\nNorth data:\n",north_data)

# Access Hyderabad specifically within the South region
# (a tuple gives one key per index level)
south_data = df.loc[('South', 'Hyderabad')]
print("\nSouth data (Hyderabad):\n",south_data)

# Sum the sales by Region level
region_totals = df.groupby(level='Region').sum()
# Alternative: groupby() also accepts the index level name directly
# region_totals = df.groupby('Region').sum()

print("\nSum of Sales by Region:\n",region_totals)

Output

DataFrame:
   Region        City  Sales
0  North       Delhi   1500
1  North  Chandigarh   1200
2  South     Chennai   1800
3  South   Hyderabad   1600
4   East    Kolkatta   1400
5   East    Guwahati   1100
6   West     Gujarat   1200

DataFrame with multindex:
                    Sales
Region City             
North  Delhi        1500
       Chandigarh   1200
South  Chennai      1800
       Hyderabad    1600
East   Kolkatta     1400
       Guwahati     1100
West   Gujarat      1200

North data:
             Sales
City             
Delhi        1500
Chandigarh   1200

South data (Hyderabad):
 Sales    1600
Name: (South, Hyderabad), dtype: int64

Sum of Sales by Region:
         Sales
Region       
East     2500
North    2700
South    3400
West     1200
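
The df.loc['North'] call above selects on the outer Region level only. For selecting on the inner City level, or for slicing a range of regions, one possible approach is sketched below using xs(), sort_index(), and get_level_values(). It reuses the same data; the variable names hyderabad, sorted_df, east_to_north, and selected are illustrative only.

```python
import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West'],
    'City': ['Delhi', 'Chandigarh', 'Chennai', 'Hyderabad', 'Kolkatta', 'Guwahati', 'Gujarat'],
    'Sales': [1500, 1200, 1800, 1600, 1400, 1100, 1200]
}
df = pd.DataFrame(data).set_index(['Region', 'City'])

# xs() takes a cross-section on any level, here the inner City level
hyderabad = df.xs('Hyderabad', level='City')
print("Hyderabad via xs():\n", hyderabad)

# Label-range slicing on a MultiIndex needs a sorted index, so sort first
sorted_df = df.sort_index()

# All rows from East through North (both end labels are included)
east_to_north = sorted_df.loc['East':'North']
print("\nEast through North:\n", east_to_north)

# Filter on the inner level with a boolean mask built from its values
city_values = df.index.get_level_values('City')
selected = df[city_values.isin(['Delhi', 'Chennai'])]
print("\nDelhi and Chennai:\n", selected)
```

Sorting before slicing matters here because the original index order (North, South, East, West) is not sorted, and pandas requires a sorted MultiIndex for label-range slices.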

Studyopedia Editorial Staff
contact@studyopedia.com
