loc vs iloc: All the differences you want to know
You will learn everything about row and column selection using loc and iloc. At the end, we also compare their speed to find out which one is faster, loc or iloc.
There are many ways to select data, but loc and iloc are the two most frequently used indexers for selecting the rows and columns of a pandas DataFrame.
df.loc selects the rows and columns of a DataFrame by label, i.e. row names and column names.
df.iloc selects the rows and columns of a DataFrame by integer position, i.e. the row index number and column index number.
Outline:
1. Single Value Selection
2. Single Row Selection
3. Single Column Selection
4. Multiple rows and columns selection
5. Selection of a DataFrame with int-row indices
6. Selecting with conditions
7. Speed Difference: Which one is more efficient?
8. Conclusion
9. Notes and Resources
Let’s create a DataFrame first.
import pandas as pd
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
age = [32, 34, 27, 26, 28]
height_cm = [170, 187, 175, 188, 175]
nationality = ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium']
club = ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain', 'Atlético Madrid', 'Real Madrid']
#dataframe with index names
df = pd.DataFrame(index=names, data={'age':age, 'height_cm':height_cm, 'nationality':nationality, 'club':club})
#dataframe without index names
df1 = pd.DataFrame(data={'names':names,'age':age, 'height_cm':height_cm, 'nationality':nationality, 'club':club})
#Data-Source = Frank Andrade published in towards data science
This gives us a DataFrame with 5 rows and 4 columns. As a reminder, the player names do not belong to a column; they are the row index. By the way, who is your favorite player? Let’s start by selecting a single value.
1. Single Value Selection
Let’s select a single value: Neymar’s height, i.e. 175.
df.loc
df.iloc
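A minimal sketch of this selection, rebuilding the DataFrame from above so the snippet runs on its own. The variable names height_by_label and height_by_position are only illustrative.

```python
import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# loc: row label, then column label
height_by_label = df.loc['Neymar Jr', 'height_cm']

# iloc: row position 2 (third row), then column position 1 (second column)
height_by_position = df.iloc[2, 1]

print(height_by_label, height_by_position)  # 175 175
```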
2. Single Row Selection
Select the row of your favorite player. I am going to select Cristiano Ronaldo.
df.loc
df.iloc
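A short sketch of the row selection, again rebuilding the DataFrame so it runs on its own. Passing a single label or a single position returns the row as a pandas Series; the variable names are illustrative.

```python
import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# loc: select the row by its label
row_by_label = df.loc['Cristiano Ronaldo']

# iloc: select the same row by its position (second row, position 1)
row_by_position = df.iloc[1]

print(row_by_label)
```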
3. Single Column Selection
You select the height; I am going to select the age of each player.
df.loc
df.iloc
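A sketch of the column selection. The colon in the row slot means "all rows"; the variable names are illustrative.

```python
import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# loc: all rows, column selected by its label
age_by_label = df.loc[:, 'age']

# iloc: all rows, column selected by its position (first column, position 0)
age_by_position = df.iloc[:, 0]

print(age_by_label)
```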
4. Multiple rows and columns selection
It’s a little bit tricky, so try it yourself first.
df.loc
df.iloc
Did you notice that when a range is defined, as in df.iloc[0:5, 0:2], the slices are not wrapped in square brackets? Writing df.iloc[[0:5], [0:2]] is a syntax error. That makes sense: a slice is not a list of items, it is a range of indices. If you want to select individual items, as in df.iloc[[2, 4], [2, 3]] or df.loc[['Neymar Jr', 'E. Hazard'], ['nationality', 'club']], you have to enclose them in square brackets as lists.
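The slice and list forms can be sketched side by side as follows. The label slice 'L. Messi':'E. Hazard' is an assumed loc counterpart of df.iloc[0:5, 0:2]; the original screenshot may have used a different range.

```python
import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# Ranges are written as bare slices, without extra square brackets.
# With loc, both ends of a label slice are included.
by_label_slice = df.loc['L. Messi':'E. Hazard', 'age':'height_cm']
# With iloc, the end position is excluded, as in normal Python slicing.
by_position_slice = df.iloc[0:5, 0:2]

# Individual items are passed as lists, so they need square brackets.
by_label_list = df.loc[['Neymar Jr', 'E. Hazard'], ['nationality', 'club']]
by_position_list = df.iloc[[2, 4], [2, 3]]

print(by_label_slice)
print(by_label_list)
```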
5. Selection of a DataFrame with int-row indices
Now we use df1, a DataFrame with 5 rows and 5 columns. This time there are no row label names; instead we have the default integer row index.
Selection with iloc works the same as before, but loc can be a little confusing. Let’s select the first three rows and first three columns of the DataFrame using both. loc treats the integer row index as labels rather than integer positions, and a label slice includes its end point. That is why, to select rows 0, 1 and 2, loc uses the range 0:2 whereas iloc uses the range 0:3.
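A sketch of that comparison on df1, rebuilt here so the snippet runs on its own. The column slice 'names':'height_cm' is an assumed way to express "first three columns" with labels.

```python
import pandas as pd

# Same data as df1 from the start of the article (default integer row index).
df1 = pd.DataFrame(data={
    'names': ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard'],
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# loc: 0, 1 and 2 are row labels here, and the end label 2 is included
first_three_loc = df1.loc[0:2, 'names':'height_cm']

# iloc: 0:3 are positions, and the end position 3 is excluded
first_three_iloc = df1.iloc[0:3, 0:3]

print(first_three_loc)
```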
6. Selecting with conditions
6.1 Single Condition
Let’s select all the players whose height is greater than 180cm. Just remember that iloc does not accept a boolean Series as a mask, so we use the list() function to convert our Series to a plain boolean list.
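A sketch of the single-condition filter. The mask variable and the to_numpy() variant are additions for illustration; to_numpy() is another way to hand iloc a plain boolean array instead of a Series.

```python
import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

# Boolean Series: True for players taller than 180 cm
mask = df['height_cm'] > 180

# loc accepts the boolean Series directly
tall_loc = df.loc[mask]

# iloc needs a plain boolean list (or array) instead of a Series
tall_iloc = df.iloc[list(mask)]
tall_iloc_array = df.iloc[mask.to_numpy()]

print(tall_loc)
```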
Or select only specific columns:
# loc
columns = ['age', 'height_cm', 'club']
df.loc[df['height_cm']>180, columns]
# iloc
columns = [0,1,3]
df.iloc[list(df['height_cm']>180), columns]
6.2 Multiple conditions
Select players with a height above 180cm who played for Manchester United.
# loc
df.loc[(df['height_cm']>180) & (df['club']=='Manchester United'), :]
# iloc
df.iloc[list((df['height_cm']>180) & (df['club']=='Manchester United')), :]
7. Speed Difference: Which one is more efficient?
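In this comparison, iloc performed almost 200% faster than loc. iloc addresses rows and columns directly by position, whereas loc must first look up each label in the index, which adds overhead. The exact gap depends on the data size, the index type, and the pandas version.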
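One way to check the speed difference yourself is a small timeit comparison like the sketch below. The repetition count n_runs and the single-value lookup are assumptions chosen for illustration, not the original benchmark, and the numbers will vary between machines.

```python
import timeit

import pandas as pd

# Same data as the DataFrame created at the start of the article.
names = ['L. Messi', 'Cristiano Ronaldo', 'Neymar Jr', 'J. Oblak', 'E. Hazard']
df = pd.DataFrame(index=names, data={
    'age': [32, 34, 27, 26, 28],
    'height_cm': [170, 187, 175, 188, 175],
    'nationality': ['Argentina', 'Portugal', 'Brazil', 'Slovenia', 'Belgium'],
    'club': ['Paris Saint-Germain', 'Manchester United', 'Paris Saint-Germain',
             'Atlético Madrid', 'Real Madrid'],
})

n_runs = 2000  # assumed repetition count; increase for steadier numbers

# Total time for n_runs label-based lookups
loc_seconds = timeit.timeit(lambda: df.loc['Neymar Jr', 'height_cm'], number=n_runs)

# Total time for n_runs position-based lookups of the same cell
iloc_seconds = timeit.timeit(lambda: df.iloc[2, 1], number=n_runs)

print(f"loc:  {loc_seconds / n_runs * 1e6:.2f} microseconds per lookup")
print(f"iloc: {iloc_seconds / n_runs * 1e6:.2f} microseconds per lookup")
```

Run it a few times before drawing conclusions, since a single timing on a 5-row DataFrame is noisy.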
8. Conclusion
Which one do you prefer, loc or iloc? Please let me know in the comments. I hope you enjoyed reading this article. Before you go, would you mind giving this article some claps? Thanks for reading! Have a nice day!
9. Notes and Resources
Code: https://www.kaggle.com/ravin235/pandas-loc-vs-iloc
Frank Andrade: https://towardsdatascience.com/?source=search_post---------0----------------------------
DataCamp: https://youtu.be/A_V6daPQZIU