18 Apr Create MultiIndex in Pandas for Data Manipulation
A MultiIndex is a hierarchical index: two or more columns together label each row, which makes it easier to slice and group data by more than one key. To create one from existing columns, pass a list of column names to set_index().
In this lesson, we will see two examples:
- Create a MultiIndex
- Create a MultiIndex and perform slicing
Create a MultiIndex in Pandas
Let us see an example to create a DataFrame with City, Year, and Population columns. Then, we will set a MultiIndex using City and Year, so rows are uniquely identified by both values instead of a single column.
# Create multiindex
import pandas as pd
data = {'City': ['Delhi','Delhi', 'Mumbai'],
'Year': [2020, 2021, 2020],
'Population': [30, 32, 20]
}
df = pd.DataFrame(data)
print(df)
# Multiindex
df.set_index(['City','Year'], inplace=True)
print(df)
Output
     City  Year  Population
0   Delhi  2020          30
1   Delhi  2021          32
2  Mumbai  2020          20
             Population
City   Year
Delhi  2020          30
       2021          32
Mumbai 2020          20
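To confirm that rows are now identified by both City and Year, the index itself can be inspected. The following is a minimal sketch that rebuilds the same DataFrame; the variable names indexed, delhi_2021, and restored are only illustrative, and set_index() is called here without inplace=True so that the original df stays unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Delhi', 'Mumbai'],
    'Year': [2020, 2021, 2020],
    'Population': [30, 32, 20],
})

# Without inplace=True, set_index() returns a new DataFrame
indexed = df.set_index(['City', 'Year'])

print(indexed.index.names)      # names of the index levels
print(indexed.index.nlevels)    # number of levels in the index
print(indexed.index.is_unique)  # True when each (City, Year) pair occurs once

# Look up a single value by giving both level values as a tuple
delhi_2021 = indexed.loc[('Delhi', 2021), 'Population']
print(delhi_2021)

# reset_index() turns the index levels back into ordinary columns
restored = indexed.reset_index()
print(restored.equals(df))
```

Returning a new DataFrame instead of modifying df in place is a style choice for this sketch; it keeps the original available for comparison after reset_index().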
Create a MultiIndex and perform slicing
Let us see an example that creates a MultiIndex DataFrame using Region and City, then demonstrates slicing and grouping. The example selects subsets (all cities in North, and Hyderabad within South) and finally sums Sales by region.
# Create a MultiIndex and perform slicing
import pandas as pd
data = {
    'Region': ['North', 'North', 'South', 'South', 'East', 'East', 'West'],
    'City': ['Delhi', 'Chandigarh', 'Chennai', 'Hyderabad', 'Kolkata', 'Guwahati', 'Gujarat'],
    'Sales': [1500, 1200, 1800, 1600, 1400, 1100, 1200]
}
df = pd.DataFrame(data)
# sep="\n" prints the DataFrame on its own line without a stray leading space
print("DataFrame:", df, sep="\n")
# Create the MultiIndex: Region is the outer level, City the inner level
df.set_index(['Region','City'], inplace=True)
print("\nDataFrame with MultiIndex:", df, sep="\n")
# Select all cities in the "North" region (outer level only)
north_data = df.loc['North']
print("\nNorth data:", north_data, sep="\n")
# Access Hyderabad specifically within the South region (both levels)
south_data = df.loc['South', 'Hyderabad']
print("\nSouth data (Hyderabad):", south_data, sep="\n")
# Sum the sales by the Region level
region_totals = df.groupby(level='Region').sum()
# Equivalent: region_totals = df.groupby('Region').sum()
print("\nSum of Sales by Region:", region_totals, sep="\n")
Output
DataFrame:
   Region        City  Sales
0   North       Delhi   1500
1   North  Chandigarh   1200
2   South     Chennai   1800
3   South   Hyderabad   1600
4    East     Kolkata   1400
5    East    Guwahati   1100
6    West     Gujarat   1200

DataFrame with MultiIndex:
                   Sales
Region City
North  Delhi        1500
       Chandigarh   1200
South  Chennai      1800
       Hyderabad    1600
East   Kolkata      1400
       Guwahati     1100
West   Gujarat      1200

North data:
            Sales
City
Delhi        1500
Chandigarh   1200

South data (Hyderabad):
Sales    1600
Name: (South, Hyderabad), dtype: int64

Sum of Sales by Region:
        Sales
Region
East     2500
North    2700
South    3400
West     1200
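The lookups above start from the outer level (Region). As a hedged sketch of a few related selections, the code below uses a shortened version of the sales data and shows xs() for picking a value on the inner level, a list of outer-level labels, and a label range. The variable names are illustrative, and the call to sort_index() is an addition of this sketch: label-range slices on a MultiIndex generally need a sorted index.

```python
import pandas as pd

# Shortened version of the sales data, indexed by Region and City
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'West'],
    'City': ['Delhi', 'Chandigarh', 'Chennai', 'Hyderabad', 'Gujarat'],
    'Sales': [1500, 1200, 1800, 1600, 1200],
}).set_index(['Region', 'City'])

# Sort the index so that label-range slicing works
df = df.sort_index()

# xs() selects by a value on an inner level, here City
hyderabad = df.xs('Hyderabad', level='City')
print(hyderabad)

# A list of outer-level labels selects several regions at once
north_and_west = df.loc[['North', 'West']]
print(north_and_west)

# A label range on the outer level; both end points are included
north_to_south = df.loc['North':'South']
print(north_to_south)

# Mean sales per region, using the index level as the group key
region_means = df.groupby(level='Region')['Sales'].mean()
print(region_means)
```

By default xs() drops the level it selects on, so the result for Hyderabad is indexed by Region only.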