Pandas MultiIndex

.set_index() Method

The data sets used in this post: DOWNLOAD

As we saw earlier, the set_index() method can turn any column into the index. When we need a nested index, we can pass several column names to .set_index(). The keys parameter of .set_index() takes the column name, or a list of column names, to use as the index.

For example, to find what a Big Mac cost in each country on a given date, or what it cost in a given country on each date, we need multiple index levels.

With a MultiIndex, dataFrame.sort_index() sorts the index levels one after another: the outer level first, then the inner level.

dataFrame.index.names shows the names of the index levels, and type(dataFrame.index) shows that the index is a MultiIndex.

Example:

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"])
bigmac.head(3)

bigmac.set_index(keys = ["Date","Country"], inplace = True)  # each key becomes one index level: keys = [outer, inner]
bigmac.head(3)

bigmac.sort_index() # returns a copy sorted by Date (1st level), then Country (2nd level)


# see the names of the index levels
bigmac.index.names
bigmac.index.names[0]


# confirm that the index is a MultiIndex
type(bigmac.index)

OUTPUT:

pandas.core.indexes.multi.MultiIndex
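
Since bigmac.csv may not be at hand, the same steps can be tried on a tiny hand-made table. This is only a sketch: the rows below are sample values typed in for illustration (only some prices match the outputs in this post), and the variable name df is our own.

```python
import pandas as pd

# Hand-made sample rows for illustration; not the real bigmac.csv.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2010-01-01", "2016-01-01", "2010-01-01"]),
    "Country": ["China", "China", "Brazil", "Brazil"],
    "Price in US Dollars": [2.68, 1.83, 3.35, 4.76],
})

# Two keys give a two-level (nested) index: Date is the outer level, Country the inner one.
df = df.set_index(keys=["Date", "Country"])

# sort_index() sorts by the outer level first, then by the inner level.
df = df.sort_index()

print(list(df.index.names))      # ['Date', 'Country']
print(type(df.index).__name__)   # MultiIndex
```

Here the result of set_index() and sort_index() is assigned back instead of using inplace = True; both styles give the same table.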

.get_level_values() Method

We can also define the MultiIndex while loading the data by passing index_col=["column1", "column2"] to read_csv(). index_col can be used in place of the set_index() method, but as a matter of good practice I prefer set_index().

With a MultiIndex, when we need to work with one specific index level, we can reach it through dataFrame.index. A level can be selected either by its name or by its position.

To get all the values of a specific level, use .get_level_values().

dataFrame.index.get_level_values() takes the name or the position of a level and returns all of that level's values as an Index object (not a DataFrame).

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)

# see the whole index
bigmac.index
bigmac.index[0]   # first entry, a (Date, Country) tuple

# see the values of a single index level
bigmac.index.get_level_values(1)        # by position: the Country level
bigmac.index.get_level_values("Date")   # by name

OUTPUT:

DatetimeIndex(['2010-01-01', '2010-01-01', '2010-01-01', '2010-01-01',
               '2010-01-01', '2010-01-01', '2010-01-01', '2010-01-01',
               '2010-01-01', '2010-01-01',
               ...
               '2016-01-01', '2016-01-01', '2016-01-01', '2016-01-01',
               '2016-01-01', '2016-01-01', '2016-01-01', '2016-01-01',
               '2016-01-01', '2016-01-01'],
              dtype='datetime64[ns]', name='Date', length=652, freq=None)
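
A small self-contained sketch of the same idea follows. The rows are sample values for illustration, and the dates are kept as plain strings here (an assumption made to keep the example short; the real file is parsed into dates).

```python
import pandas as pd

# Sample rows for illustration only.
index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Brazil"), ("2010-01-01", "China"), ("2016-01-01", "China")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [4.76, 1.83, 2.68]}, index=index)

by_position = df.index.get_level_values(1)       # second level, selected by position
by_name = df.index.get_level_values("Country")   # the same level, selected by name

print(by_name)
```

Both calls return the same Index, with one entry per row of the DataFrame, so repeated labels such as "China" appear more than once.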

.set_names() Method

Sometimes we need to rename the index levels of a DataFrame. In that case we can use dataframe.index.set_names(["1st Label New Name", "2nd Label New Name"]).

If we do not want to change a level's name, we pass its existing name in that position of the list.

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)


# rename the index levels by passing the new names
bigmac.index.set_names(["Day", "Location"], inplace = True)
bigmac.head(3)

OUTPUT:

		Price in US Dollars
Day	Location
2010-01-01	Argentina	1.84
	Australia	3.98
	Brazil	4.76
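
As a rough sketch on a two-row sample table, the levels can be renamed all at once or one at a time. The name "Place" and the use of the level argument are our own additions for illustration; the post itself only uses the list form.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Argentina"), ("2010-01-01", "Australia")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [1.84, 3.98]}, index=index)

# Rename both levels at once; assigning the result back avoids relying on inplace.
df.index = df.index.set_names(["Day", "Location"])

# Rename a single level and leave the other one untouched.
df.index = df.index.set_names("Place", level="Location")

print(list(df.index.names))   # ['Day', 'Place']
```

Passing level means the unchanged names do not have to be repeated, which is handy when the index has many levels.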

.sort_index() Method

We know that .sort_index() sorts a DataFrame by the values of its index.

To sort the levels of a MultiIndex in a custom order, set ascending = True or ascending = False (descending). A list of booleans sets the sort direction of each level separately: dataFrame.sort_index(ascending = [bool, bool]).

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)


# sort every level the same way, or give each level of the MultiIndex its own direction
bigmac.sort_index(ascending = [True, False], inplace = True) # Date ascending, Country descending
bigmac.head()

OUTPUT:

		Price in US Dollars
Date	Country
2010-01-01	Uruguay	3.32
	United States	3.58
	Ukraine	1.83
	UAE	2.99
	Turkey	3.83
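
The per-level sort direction can be checked on a four-row sample table. This is a minimal sketch with sample prices and with the dates stored as plain strings (an assumption for brevity).

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2016-01-01", "Brazil"), ("2010-01-01", "China"),
     ("2010-01-01", "Brazil"), ("2016-01-01", "China")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [3.35, 1.83, 4.76, 2.68]}, index=index)

# One boolean per level: Date ascending, Country descending.
result = df.sort_index(ascending=[True, False])
print(result)
```

Within each date the countries now run from Z to A, while the dates themselves still run from oldest to newest.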

.loc[] Extract Rows from MultiIndex

To get every value that belongs to specific index labels, we can use the syntax dataframe.loc[("outer_index_value_name", "inner_index_value_name"), "label_name"].
For example, we can get all values under a specific outer index label, or all values under a specific inner index label within a specific outer index label, or only selected columns of those values.

NOTE: .ix[] was removed in pandas 1.0.0; it only works in earlier versions. To use .ix[], install an earlier version: conda install pandas=0.25.1

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)

'''
# the same with .ix[]               # .ix[] needs pandas 0.25 or earlier
bigmac.ix[("2010-01-01","China")] 
bigmac.ix[("2010-01-01","China"), 0] # a column can be given after the index tuple
'''

# .loc accepts index level values
bigmac.loc[("2010-01-01")]  # all rows under the outer label "2010-01-01"
bigmac.loc[("2010-01-01","China")]    # rows under outer label "2010-01-01" and inner label "China"

# a column can be given after the index tuple
bigmac.loc[("2010-01-01","China"), "Price in US Dollars"] # outer label "2010-01-01",
                                                            # inner label "China",
                                                            # only the "Price in US Dollars" column

OUTPUT:

Date        Country
2010-01-01  China      1.83
Name: Price in US Dollars, dtype: float64
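
The three forms of .loc[] can be compared side by side on a small sample table. In this sketch the dates are stored as plain strings (an assumption), so each lookup is an exact label match; with a real datetime level, as in the output above, pandas may return a one-row Series instead of a single number.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Brazil"), ("2010-01-01", "China"),
     ("2016-01-01", "Brazil"), ("2016-01-01", "China")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [4.76, 1.83, 3.35, 2.68]}, index=index)

# Outer label only: a DataFrame indexed by the remaining level (Country).
outer = df.loc["2010-01-01"]

# Outer and inner label as a tuple: the matching row as a Series.
row = df.loc[("2010-01-01", "China")]

# Tuple plus column name: the single value in that cell.
price = df.loc[("2010-01-01", "China"), "Price in US Dollars"]

print(outer)
print(price)
```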

.transpose() Method

To swap the row and column axes of a DataFrame, that is, to turn a vertical layout into a horizontal one, we can use dataframe.transpose().

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)

# turn rows into columns and columns into rows
bigmac = bigmac.transpose()
bigmac.head(3)

#bigmac.loc["Price in US Dollars", ("2016-01-01", "Denmark")]  # select one value

OUTPUT:

Date	2010-01-01										...	2016-01-01
Country	Argentina	Australia	Brazil	Britain	Canada	Chile	China	Colombia	Costa Rica	Czech Republic	...	Switzerland	Taiwan	Thailand	Turkey	UAE	Ukraine	United States	Uruguay	Venezuela	Vietnam
Price in US Dollars	1.84	3.98	4.76	3.67	3.97	3.18	1.83	3.91	3.52	3.71	...	6.44	2.08	3.09	3.41	3.54	1.54	4.93	3.74	0.66	2.67

1 rows × 652 columns
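
A short sketch on three sample rows shows what happens to the MultiIndex when the table is transposed. The data and the name df_t are our own, and the dates are plain strings here.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Brazil"), ("2010-01-01", "China"), ("2016-01-01", "China")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [4.76, 1.83, 2.68]}, index=index)

# The two-level row index becomes a two-level column index.
df_t = df.transpose()

print(df_t.shape)   # (1, 3)

# After transposing, the column name comes first and the (Date, Country) tuple second.
value = df_t.loc["Price in US Dollars", ("2010-01-01", "China")]
```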

.swaplevel() Method

When needed, we can exchange the inner and outer levels of a MultiIndex. dataFrame.swaplevel() turns the inner level into the outer level and the outer level into the inner level.

Example

import pandas as pd

bigmac = pd.read_csv("bigmac.csv", parse_dates = ["Date"], index_col = ["Date", "Country"])
bigmac.sort_index(inplace = True)
bigmac.head(3)

# swaplevel() swaps the two innermost levels by default; with more than 2 levels, pass the levels to swap
bigmac = bigmac.swaplevel() # no inplace parameter, so assign the result back
bigmac.head(3)

OUTPUT:

		Price in US Dollars
Country	Date
Argentina	2010-01-01	1.84
Australia	2010-01-01	3.98
Brazil	2010-01-01	4.76
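
The sketch below repeats the swap on three sample rows and also shows that the rows keep their old order until we sort again. The 2016 price is a sample value and the variable names are our own.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2010-01-01", "Argentina"), ("2010-01-01", "Australia"), ("2016-01-01", "Argentina")],
    names=["Date", "Country"],
)
df = pd.DataFrame({"Price in US Dollars": [1.84, 3.98, 2.39]}, index=index)

# Country becomes the outer level, Date the inner one; the row order is unchanged.
swapped = df.swaplevel()

# The levels can also be named explicitly, which matters once there are more than two.
same = df.swaplevel("Date", "Country")

# Sorting afterwards groups the rows by the new outer level.
swapped_sorted = swapped.sort_index()
print(swapped_sorted)
```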

.stack() Method

dataFrame.stack() is used to see, for each index entry, the value of every column. It moves the column labels into a new innermost index level, so the values of all columns are listed for one index entry, then for the next index entry, and so on.

Example

import pandas as pd

world = pd.read_csv("worldstats.csv",index_col = ["country","year"])
world.head(3)

# list every column value under each MultiIndex entry
a = world.stack()
type(world.stack()) # the result is a Series

# convert the Series to a DataFrame with .to_frame()
a.to_frame()

OUTPUT:

			0
country	year
Arab World	2015	Population	3.920223e+08
	2015	GDP	2.530102e+12
	2014	Population	3.842226e+08
	2014	GDP	2.873600e+12
	2013	Population	3.765043e+08
...	...	...	...
Zimbabwe	1962	GDP	1.117602e+09
	1961	Population	3.876638e+06
	1961	GDP	1.096647e+09
	1960	Population	3.752390e+06
	1960	GDP	1.052990e+09

22422 rows × 1 columns
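
To see the reshaping on a table small enough to read at a glance, here is a sketch with invented country names and numbers (nothing below comes from worldstats.csv).

```python
import pandas as pd

# Invented sample data for illustration.
index = pd.MultiIndex.from_tuples(
    [("Aland", 2014), ("Aland", 2015), ("Borduria", 2015)],
    names=["country", "year"],
)
world = pd.DataFrame(
    {"GDP": [1000.0, 1200.0, 400.0], "Population": [100.0, 110.0, 50.0]},
    index=index,
)

# The column labels move into a new innermost index level.
stacked = world.stack()

print(type(stacked).__name__)   # Series
print(stacked.index.nlevels)    # 3

# .to_frame() wraps the Series in a one-column DataFrame.
frame = stacked.to_frame()
```

Three rows with two columns each become six entries, one per (country, year, column) combination.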

.unstack() Method

dataFrame.unstack() does the reverse of .stack(): it moves an index level of stacked data back into the columns.

Example

import pandas as pd

world = pd.read_csv("worldstats.csv",index_col = ["country","year"])
world.head(3)

# list every column value under each MultiIndex entry
a = world.stack()

# unstack is basically the reverse of the stack method
a.unstack() # undoes the stack we just created

b = a.unstack().unstack() # also moves the innermost index level (year) into the columns

c = a.unstack().unstack().unstack()  # unstacking the last index level turns the DataFrame into a Series

import pandas as pd

world = pd.read_csv("worldstats.csv",index_col = ["country","year"])
world.head(3)

a = world.stack()
a
a.unstack(2)  # move level 2 (the Population/GDP labels) into the columns
a.unstack(0)  # move level 0 (country) into the columns
a.unstack(-1)  # move the last level into the columns

a.unstack("country")  # a level can also be given by name

# several levels can be unstacked at once
a.unstack(level = ["country", "year"])

s1 = a.unstack("year")

# missing values (NaN) can be filled with 0 or another value
s = a.unstack("year", fill_value = 0)

print(s)

OUTPUT:

	year	1960	1961	1962	1963	1964	1965	1966	1967	1968	1969	...	2006	2007	2008	2009	2010	2011	2012	2013	2014	2015
country
Afghanistan	Population	8.994793e+06	9.164945e+06	9.343772e+06	9.531555e+06	9.728645e+06	9.935358e+06	1.014884e+07	1.036860e+07	1.059979e+07	1.084951e+07	...	2.518362e+07	2.587754e+07	2.652874e+07	2.720729e+07	2.796221e+07	2.880917e+07	2.972680e+07	3.068250e+07	3.162751e+07	3.252656e+07
Afghanistan	GDP	5.377778e+08	5.488889e+08	5.466667e+08	7.511112e+08	8.000000e+08	1.006667e+09	1.400000e+09	1.673333e+09	1.373333e+09	1.408889e+09	...	7.057598e+09	9.843842e+09	1.019053e+10	1.248694e+10	1.593680e+10	1.793024e+10	2.053654e+10	2.004633e+10	2.005019e+10	1.919944e+10
Albania	Population	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	...	2.992547e+06	2.970017e+06	2.947314e+06	2.927519e+06	2.913021e+06	2.904780e+06	2.900247e+06	2.896652e+06	2.893654e+06	2.889167e+06
Albania	GDP	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	...	8.992642e+09	1.070101e+10	1.288135e+10	1.204421e+10	1.192695e+10	1.289087e+10	1.231978e+10	1.278103e+10	1.327796e+10	1.145560e+10
Algeria	Population	1.112489e+07	1.140486e+07	1.169015e+07	1.198513e+07	1.229597e+07	1.262695e+07	1.298027e+07	1.335420e+07	1.374438e+07	1.414444e+07	...	3.374933e+07	3.426197e+07	3.481106e+07	3.540179e+07	3.603616e+07	3.671713e+07	3.743943e+07	3.818614e+07	3.893433e+07	3.966652e+07
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Yemen, Rep.	GDP	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	0.000000e+00	...	1.908173e+10	2.563367e+10	3.039720e+10	2.845950e+10	3.090675e+10	3.107886e+10	3.207477e+10	3.595450e+10	0.000000e+00	0.000000e+00
Zambia	Population	3.049586e+06	3.142848e+06	3.240664e+06	3.342894e+06	3.449266e+06	3.559687e+06	3.674088e+06	3.792864e+06	3.916928e+06	4.047479e+06	...	1.238151e+07	1.273868e+07	1.311458e+07	1.350785e+07	1.391744e+07	1.434353e+07	1.478658e+07	1.524609e+07	1.572134e+07	1.621177e+07
Zambia	GDP	6.987397e+08	6.823597e+08	6.792797e+08	7.043397e+08	8.226397e+08	1.061200e+09	1.239000e+09	1.340639e+09	1.573739e+09	1.926399e+09	...	1.275686e+10	1.405696e+10	1.791086e+10	1.532834e+10	2.026555e+10	2.345952e+10	2.550306e+10	2.804552e+10	2.713464e+10	2.120156e+10
Zimbabwe	Population	3.752390e+06	3.876638e+06	4.006262e+06	4.140804e+06	4.279561e+06	4.422132e+06	4.568320e+06	4.718612e+06	4.874113e+06	5.036321e+06	...	1.312794e+07	1.329780e+07	1.349546e+07	1.372100e+07	1.397390e+07	1.425559e+07	1.456548e+07	1.489809e+07	1.524586e+07	1.560275e+07
Zimbabwe	GDP	1.052990e+09	1.096647e+09	1.117602e+09	1.159512e+09	1.217138e+09	1.311436e+09	1.281750e+09	1.397002e+09	1.479600e+09	1.747999e+09	...	5.443896e+09	5.291950e+09	4.415703e+09	8.157077e+09	9.422161e+09	1.095623e+10	1.239272e+10	1.349023e+10	1.419691e+10	1.389294e+10

504 rows × 56 columns
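
The effect of fill_value is easier to see on a stacked Series with one gap in it. The sketch below uses invented countries and numbers, and the level name "variable" is our own choice (the stacked Series in the post leaves that level unnamed).

```python
import pandas as pd

# Invented sample data; "Borduria" has no rows for 2014.
index = pd.MultiIndex.from_tuples(
    [("Aland", 2014, "GDP"), ("Aland", 2014, "Population"),
     ("Aland", 2015, "GDP"), ("Aland", 2015, "Population"),
     ("Borduria", 2015, "GDP"), ("Borduria", 2015, "Population")],
    names=["country", "year", "variable"],
)
a = pd.Series([1000.0, 100.0, 1200.0, 110.0, 400.0, 50.0], index=index)

wide = a.unstack()                          # innermost level becomes the columns
by_year = a.unstack("year")                 # the missing 2014 cells become NaN
filled = a.unstack("year", fill_value=0)    # the same table with 0 in the gaps

print(filled)
```

A zero in such a table means "no data", not a measured value of zero, so fill_value should be chosen with the later analysis in mind.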

.pivot() Method

When a column contains the same values repeatedly, dataFrame.pivot() can turn those repeated values into column labels and fill the new columns with the matching values from another column.

dataFrame.pivot(index, columns, values)

index: the column to use as the index.
columns: the column that holds the repeated values, whose values become the new column labels.
values: the column that supplies the values of the new columns.

Example

import pandas as pd

sales = pd.read_csv("salesmen.csv",parse_dates = ["Date"])
sales["Salesman"] = sales["Salesman"].astype("category")
sales.head(3)

# spread the Revenue values into one column per Salesman with .pivot()
sales.pivot(index = "Date", columns = "Salesman", values = "Revenue")

OUTPUT:

Salesman	Bob	Dave	Jeb	Oscar	Ronald
Date
2016-01-01	7172	1864	4430	5250	2639
2016-01-02	6362	8278	8026	8661	4951
2016-01-03	5982	4226	5188	7075	2703
2016-01-04	7917	3868	3144	2524	4258
2016-01-05	7837	2287	938	2793	7771
...	...	...	...	...	...
2016-12-27	2045	2843	6666	835	2981
2016-12-28	100	8888	1243	3073	6129
2016-12-29	4115	9490	3498	6424	7662
2016-12-30	2577	3594	8858	7088	2570
2016-12-31	3845	6830	9717	8408	2619

366 rows × 5 columns
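
The three parameters can be tried on a four-row sample that mimics the layout of salesmen.csv. This is only a sketch: the table is typed in by hand and the name wide is our own.

```python
import pandas as pd

# Long format: one row per (Date, Salesman) pair.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-01", "2016-01-02", "2016-01-02"]),
    "Salesman": ["Bob", "Dave", "Bob", "Dave"],
    "Revenue": [7172, 1864, 6362, 8278],
})

# Each distinct Salesman becomes a column; Revenue fills the cells.
wide = sales.pivot(index="Date", columns="Salesman", values="Revenue")
print(wide)
```

.pivot() needs every (index, columns) pair to appear only once; if a pair is repeated it raises a ValueError, and an aggregating method is needed instead.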

.pivot_table() Method

The .pivot_table() method is much like .pivot(). It takes an additional parameter, aggfunc, which aggregates all the data that falls into the same cell (for example with sum, min or max). The aggregation is applied to the values column.

Example

import pandas as pd

foods = pd.read_csv("foods.csv")
foods.head(3)

# works like a Microsoft Excel pivot table
foods.pivot_table(values = "Spend", index = "Gender", aggfunc = "min")

# works with a MultiIndex
foods.pivot_table(values = "Spend", index = ["Gender","Item"], aggfunc = "max")

# split the result into one column per City
foods.pivot_table(values = "Spend", index = ["Gender","Item"],columns = "City", aggfunc = "sum")

OUTPUT:

	City	New York	Philadelphia	Stamford
Gender	Item
Female	Burger	1239.04	1639.24	1216.02
	Burrito	978.95	1458.76	1820.11
	Chalupa	876.58	1673.33	1602.35
	Donut	1446.78	1639.26	1656.96
	Ice Cream	1521.62	1479.22	1032.03
	Sushi	1480.29	1742.88	1459.91
Male	Burger	1294.09	938.18	1439.16
	Burrito	1399.40	1312.93	1300.29
	Chalupa	1227.77	1114.23	1150.26
	Donut	1345.27	1249.36	1421.13
	Ice Cream	1603.63	2191.27	1059.22
	Sushi	1396.15	1395.88	1267.82
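
To check by hand what aggfunc does, it helps to use a table with only a few rows. The sketch below uses invented purchases (not taken from foods.csv) in which one Gender/Item/City combination occurs twice.

```python
import pandas as pd

# Invented sample purchases; the first two rows fall into the same cell.
foods = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male"],
    "Item": ["Burger", "Burger", "Sushi", "Burger", "Sushi"],
    "City": ["New York", "New York", "Stamford", "Stamford", "New York"],
    "Spend": [10.0, 5.5, 8.0, 7.25, 12.0],
})

# Rows that share Gender, Item and City are summed into one cell.
totals = foods.pivot_table(values="Spend", index=["Gender", "Item"],
                           columns="City", aggfunc="sum")

# Smallest single purchase per Gender.
cheapest = foods.pivot_table(values="Spend", index="Gender", aggfunc="min")

print(totals)
```

Combinations that never occur in the data, such as a female Burger purchase in Stamford here, show up as NaN in the result.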

pd.melt() Method

The .melt() method is related to .pivot(), but works roughly in the opposite direction. We use .melt() to collapse many columns into a single column.

pandas.melt(dataframe, id_vars, var_name, value_name)

dataframe: the DataFrame to reshape.
id_vars: the column to keep as it is; it acts like an index.
var_name: the name of a new column whose values are the names of all the other (melted) columns.
value_name: the name of a new column that holds the values of those melted columns.

Example

import pandas as pd

sales = pd.read_csv("quarters.csv")
sales  

pd.melt(sales, id_vars = "Salesman")

# set the names of the variable and value columns
pd.melt(sales, id_vars = "Salesman", var_name = "Quarter", value_name = "Revenue")

OUTPUT:

	Salesman	Quarter	Revenue
0	Boris	Q1	602908
1	Bob	Q1	43790
2	Tommy	Q1	392668
3	Travis	Q1	834663
4	Donald	Q1	580935
5	Ted	Q1	656644
6	Jeb	Q1	486141
7	Stacy	Q1	479662
8	Morgan	Q1	992673
9	Boris	Q2	233879
10	Bob	Q2	514863
11	Tommy	Q2	113579
12	Travis	Q2	266785
13	Donald	Q2	411379
14	Ted	Q2	70803
15	Jeb	Q2	600753
16	Stacy	Q2	742806
17	Morgan	Q2	879183
18	Boris	Q3	354479
19	Bob	Q3	297151
20	Tommy	Q3	430882
21	Travis	Q3	749238
22	Donald	Q3	110390
23	Ted	Q3	375948
24	Jeb	Q3	742716
25	Stacy	Q3	770712
26	Morgan	Q3	37945
27	Boris	Q4	32704
28	Bob	Q4	544493
29	Tommy	Q4	247231
30	Travis	Q4	570524
31	Donald	Q4	651572
32	Ted	Q4	321388
33	Jeb	Q4	404995
34	Stacy	Q4	2501
35	Morgan	Q4	293710
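
The same reshaping on a two-salesman, two-quarter sample makes it easy to follow where each number ends up. This is a sketch; the table is typed in by hand in the layout of quarters.csv, and the name long is our own.

```python
import pandas as pd

# Wide format: one column per quarter.
sales = pd.DataFrame({
    "Salesman": ["Boris", "Bob"],
    "Q1": [602908, 43790],
    "Q2": [233879, 514863],
})

# Q1 and Q2 collapse into two columns: the quarter name and its revenue.
long = pd.melt(sales, id_vars="Salesman", var_name="Quarter", value_name="Revenue")
print(long)
```

Every salesman now appears once per quarter, so two rows with two quarter columns become four rows.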