Pandas Cheat Sheet

This Pandas cheat sheet covers basic and advanced topics to guide you in working with Pandas. It is intended for students, engineers, and professionals.

Introduction

Pandas is a powerful, easy-to-use, open-source library for the Python programming language, used for data analysis and manipulation. Python with Pandas is widely used in statistics, finance, neuroscience, economics, web analytics, advertising, and other fields.

Features

The following are the features of the Pandas Library:

  • Analyze data
  • Manipulate data
  • Group the rows/columns of a DataFrame or Series
  • Plot data
  • Fix inaccurate data
  • Clean data

Installation

To install Pandas, use the pip package manager. Install Python and pip first, and then use pip to install the Pandas library with the command pip install pandas.
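After the install finishes, one quick sanity check is to import the library and print its version. This is a minimal sketch and not an official verification step:

```python
import pandas as pd

# If this import succeeds, Pandas is installed; __version__ holds the version string
print(pd.__version__)
```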

DataFrames in Pandas

A Pandas DataFrame is two-dimensional tabular data, i.e., a table with rows and columns. The DataFrame() constructor is used to create one and has the following parameters:

  • data: The data to be stored in the Pandas DataFrame.
  • index: The index (row labels) for the resulting frame.
  • columns: The column labels for the resulting frame, if the data does not already provide them.
  • dtype: The data type to force; only a single data type is allowed.
  • copy: Whether to copy the input data.

Let us see how to create a Pandas DataFrame:
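The sketch below builds a small DataFrame from a dictionary of lists. The player names and ranks are hypothetical sample data. The index labels "a" to "d" are passed only to show the index parameter in use:

```python
import pandas as pd

# Hypothetical sample data: each dict key becomes a column label
data = {"Player": ["Steve", "David", "Virat", "Kane"], "Rank": [1, 2, 3, 4]}

# index sets the row labels; without it Pandas would use 0, 1, 2, 3
df = pd.DataFrame(data, index=["a", "b", "c", "d"])
print(df)
```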


DataFrame – Attributes and Methods

The following are commonly used DataFrame attributes and methods in Pandas:

  • dtypes: Return the dtype of each column in the DataFrame.
  • ndim: Return the number of dimensions of the DataFrame.
  • size: Return the number of elements in the DataFrame.
  • shape: Return the dimensionality of the DataFrame as a (rows, columns) tuple.
  • index: Return the index (row labels) of the DataFrame.
  • T: Transpose the rows and columns.
  • head(): Return the first n rows (5 by default).
  • tail(): Return the last n rows (5 by default).
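Here is how these attributes and methods behave on a small hypothetical DataFrame with 3 rows and 2 columns:

```python
import pandas as pd

# Hypothetical sample data: 3 rows x 2 columns
df = pd.DataFrame({"Player": ["Steve", "David", "Virat"], "Rank": [1, 2, 3]})

print(df.dtypes)    # dtype of each column
print(df.ndim)      # 2
print(df.size)      # 6 (3 rows x 2 columns)
print(df.shape)     # (3, 2)
print(df.index)     # the row labels (here 0, 1, 2)
print(df.T)         # columns become rows and vice versa
print(df.head(2))   # first 2 rows
print(df.tail(1))   # last row
```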

Series in Pandas

A Series in Pandas is a one-dimensional array, like a column in a table. It is a labeled array that can hold data of any type. The Series() constructor is used to create one and has the following parameters:

  • data: The data to be stored in the Pandas Series.
  • index: The index values; they should have the same length as the data.
  • dtype: The data type of the output Series.
  • name: The name of the Series.
  • copy: Whether to copy the input data.

Let us now see an example of creating a Pandas Series:
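This minimal sketch uses hypothetical values. No index argument is passed, so Pandas assigns the default labels 0, 1, 2, 3:

```python
import pandas as pd

# Hypothetical values; the default index 0, 1, 2, 3 is generated automatically
s = pd.Series([10, 20, 30, 40])
print(s)
```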

In the output, 0, 1, 2, 3, etc. are the index numbers, i.e., the labels.

Series – Attributes and Methods

The following are commonly used Series attributes and methods in Pandas:

  • dtype: Return the dtype of the Series.
  • ndim: Return the number of dimensions.
  • size: Return the number of elements.
  • name: Return the name of the Series.
  • hasnans: Return True if the Series contains NaN values.
  • index: Return the index of the Series.
  • head(): Return the first n rows (5 by default).
  • tail(): Return the last n rows (5 by default).
  • info(): Display a summary of the Series.
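The sketch below applies these to a small hypothetical Series. A NaN is included on purpose so that hasnans returns True. Series.info() was added in a relatively recent Pandas release (1.4, to the best of our knowledge), so older installs may not have it:

```python
import numpy as np
import pandas as pd

# Hypothetical scores; the NaN forces a float dtype and makes hasnans True
s = pd.Series([1.5, np.nan, 3.0, 4.5], name="scores")

print(s.dtype)     # float64
print(s.ndim)      # 1
print(s.size)      # 4
print(s.name)      # scores
print(s.hasnans)   # True
print(s.index)     # the row labels (here 0 to 3)
print(s.head(2))   # first 2 rows
print(s.tail(2))   # last 2 rows
s.info()           # prints a summary (assumes pandas 1.4 or later)
```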

Categorical Data

Categorical is a Pandas data type corresponding to categorical variables in statistics. A categorical variable takes on a limited number of possible values, for example, gender, blood type, country affiliation, or rating.

  • Create a categorical Series: Pass dtype="category" when creating a Series.
  • Create a categorical DataFrame: Pass dtype="category" when creating a DataFrame; every column then gets the category dtype.

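This minimal sketch covers both cases using hypothetical blood-type and grade/size values. The Series ends up with 3 categories (A, B, and O):

```python
import pandas as pd

# Categorical Series: repeated values share a small set of categories
s = pd.Series(["A", "B", "A", "O", "B"], dtype="category")
print(s.cat.categories)

# Categorical DataFrame: dtype="category" applies to every column
df = pd.DataFrame(
    {"Grade": ["x", "y", "z", "x"], "Size": ["S", "M", "S", "L"]},
    dtype="category",
)
print(df.dtypes)
```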

Working with Categories

The following methods are accessed through the .cat accessor of a categorical Series and work with its categories:

  • Append new categories: Use the cat.add_categories() method.
  • Remove a category: Use the cat.remove_categories() method. Values that belonged to a removed category become NaN.


Read CSV

The read_csv() function is used to read CSV files in Pandas. For example, if we have a CSV file named Students.csv, pd.read_csv("Students.csv") reads it into a DataFrame.
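So that this sketch runs without the actual file, read_csv() is given an in-memory string through io.StringIO. The column names and rows are hypothetical stand-ins for the contents of Students.csv:

```python
import io
import pandas as pd

# Hypothetical contents standing in for Students.csv
csv_text = """Name,Age,Marks
Amit,21,88
Priya,22,92
John,20,75
"""

# With a real file, call pd.read_csv("Students.csv") instead
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```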


String Operations on Text Data

The following string operations can be performed on text data in Pandas through the .str accessor of a Series:

  • lower(): Convert text to lowercase.
  • upper(): Convert text to uppercase.
  • title(): Convert text to title case (the first letter of each word capitalized).
  • len(): Get the length of each element in the Series.
  • count(): Count the occurrences of a pattern in each element.
  • contains(): Check whether each element contains a pattern or value.
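The sketch below applies each operation to a hypothetical Series of strings through the .str accessor:

```python
import pandas as pd

s = pd.Series(["apple pie", "Banana split", "cherry tart"])

print(s.str.lower())
print(s.str.upper())
print(s.str.title())          # Apple Pie, Banana Split, Cherry Tart
print(s.str.len())            # 9, 12, 11
print(s.str.count("a"))       # occurrences of "a": 1, 3, 1
print(s.str.contains("an"))   # False, True, False
```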

Remove Whitespace

To remove whitespace from text data in a Series (or a DataFrame column), use the following .str methods in Pandas:

  • strip(): Strip whitespace from the left and right
  • lstrip(): Strip whitespace from only the left side
  • rstrip(): Strip whitespace from only the right side
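This minimal example uses hypothetical strings padded with spaces on one or both sides:

```python
import pandas as pd

s = pd.Series(["  left", "right  ", "  both  "])

print(s.str.strip().tolist())    # ['left', 'right', 'both']
print(s.str.lstrip().tolist())   # ['left', 'right  ', 'both  ']
print(s.str.rstrip().tolist())   # ['  left', 'right', '  both']
```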

Sorting

Sort the DataFrame in Pandas using the sort_values() method:

  • Sort the Pandas DataFrame: Call sort_values() and pass the column to sort by in the by parameter. The default order is ascending.
  • Sort the Pandas DataFrame in descending order: Call sort_values() with the ascending parameter set to False.
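This sketch sorts a hypothetical DataFrame by its Runs column, first in the default ascending order and then in descending order:

```python
import pandas as pd

df = pd.DataFrame({"Player": ["Steve", "David", "Virat"], "Runs": [85, 120, 97]})

print(df.sort_values(by="Runs"))                    # ascending (default)
print(df.sort_values(by="Runs", ascending=False))   # descending
```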

Indexing

Indexing means selecting specific rows and columns of data from a DataFrame. A DataFrame includes columns, an index, and data. Let us see some examples:

  • Indexing in Pandas using the indexing operator: Use [], i.e., the indexing operator, to retrieve one or more columns by name.
  • Indexing in Pandas using loc[]: To retrieve rows (and columns) by label, use loc[] in Pandas.
  • Indexing in Pandas using iloc[]: To retrieve rows and columns by integer position, use iloc[] in Pandas.
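The sketch below contrasts the three approaches on a hypothetical DataFrame with string row labels. loc[] works with labels and iloc[] works with positions:

```python
import pandas as pd

df = pd.DataFrame(
    {"Player": ["Steve", "David", "Virat"], "Runs": [85, 120, 97]},
    index=["a", "b", "c"],
)

print(df["Player"])          # [] selects a column by name
print(df.loc["b"])           # loc[] selects the row labeled "b"
print(df.loc["a", "Runs"])   # loc[] with row and column labels -> 85
print(df.iloc[2])            # iloc[] selects the third row by position
print(df.iloc[0:2, 1])       # first two rows, second column
```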

Group the Data

In Pandas, you can group the data in a DataFrame and perform operations on each group:

  • Split the object and combine the results: The groupby() method splits the object into groups, i.e., it groups the rows (or columns) by one or more keys.
  • Iterate over the groups: Loop through the groups produced by groupby() using a for loop.
  • View the groups: Use the groups attribute of the grouped object to view the groups.
  • Perform aggregation operations on groups: After grouping, use the agg() method to perform operations on the grouped data, such as getting the mean or the size of each group. Let's see some examples:
    • Get the mean of grouped data: First group the data, then call agg() with a mean function such as numpy.mean (or the string "mean").
    • Get the size of each group: Call agg() with numpy.size, or call size() on the grouped object.
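The sketch below groups hypothetical match data by the Player column and walks through iteration, viewing the groups, the mean, and the group sizes. It passes the string "mean" to agg() because, to our knowledge, recent Pandas versions warn when numpy.mean is passed directly and suggest the string instead:

```python
import pandas as pd

# Hypothetical match data: runs scored by each player in several matches
df = pd.DataFrame({
    "Player": ["Steve", "Virat", "Steve", "Virat", "Kane"],
    "Runs": [50, 80, 70, 100, 60],
})

# Split the rows into groups by the Player column
grouped = df.groupby("Player")

# Iterate over the groups: each item is (group key, sub-DataFrame)
for name, group in grouped:
    print(name)
    print(group)

# View the groups: a mapping from each key to the row labels in that group
print(grouped.groups)

# Mean of the Runs column in each group
print(grouped["Runs"].agg("mean"))

# Number of rows in each group
print(grouped.size())
```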

Statistical Functions

Pandas provides statistical functions that can be applied to a Series or a DataFrame:

  • sum(): Return the sum of the values.
  • count(): Return the count of non-missing (non-NA) values.
  • max(): Return the maximum of the values.
  • min(): Return the minimum of the values.
  • mean(): Return the mean of the values.
  • median(): Return the median of the values.
  • std(): Return the standard deviation of the values.
  • describe(): Return summary statistics (count, mean, std, min, quartiles, max) for each numeric column.
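These functions are applied below to one column of a hypothetical DataFrame, followed by describe() on the whole frame. Note that std() computes the sample standard deviation by default:

```python
import pandas as pd

df = pd.DataFrame({"Runs": [50, 80, 70, 100], "Wickets": [1, 0, 3, 2]})

print(df["Runs"].sum())      # 300
print(df["Runs"].count())    # 4
print(df["Runs"].max())      # 100
print(df["Runs"].min())      # 50
print(df["Runs"].mean())     # 75.0
print(df["Runs"].median())   # 75.0
print(df["Runs"].std())      # sample standard deviation (ddof=1)
print(df.describe())         # summary statistics for each numeric column
```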

Plotting

To plot in Pandas, use the plot() method, which draws with the Matplotlib library (Matplotlib must be installed). Call matplotlib.pyplot.show() to display the figure. Some of the available plots:

  • Histogram: Set the kind argument of the plot() method to "hist". This needs only a single column.
  • Pie Chart: Use the plot.pie() method to draw a pie chart.
  • Scatter Plot: Set the kind argument of the plot() method to "scatter" and pass the columns for the x-axis and y-axis as x and y.
  • Area Plot: Use the plot.area() method to draw an area plot.
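The sketch below draws all four plot types on one Matplotlib figure with a 2x2 grid of axes and passes each axes object through the ax parameter. The runs/balls data is hypothetical. Putting the plots in one figure is a presentation choice, not a requirement:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical batting data
df = pd.DataFrame({
    "Runs": [50, 80, 70, 100, 65],
    "Balls": [40, 60, 55, 70, 50],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Histogram of a single column
df["Runs"].plot(kind="hist", ax=axes[0, 0], title="Histogram")

# Scatter plot: x and y name the columns for each axis
df.plot(kind="scatter", x="Balls", y="Runs", ax=axes[0, 1], title="Scatter")

# Pie chart of one column (values must be non-negative)
df["Runs"].plot.pie(ax=axes[1, 0], title="Pie")

# Area plot of all numeric columns
df.plot.area(ax=axes[1, 1], title="Area")

plt.tight_layout()
plt.show()
```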

Find and Remove Duplicates from rows in Pandas

  • Find Duplicates: To find duplicate rows in a Pandas DataFrame (or duplicate values in a Series), use the duplicated() method. It returns True for each row that repeats an earlier one.
  • Remove Duplicates: To remove duplicate rows from a Pandas DataFrame (or duplicate values from a Series), use the drop_duplicates() method.
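In this minimal example, a hypothetical DataFrame contains one repeated row. By default both methods treat the first occurrence as the original:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Amit", "Priya", "Amit", "John"], "Age": [21, 22, 21, 20]})

# True marks rows that repeat an earlier row; the first occurrence stays False
print(df.duplicated())

# Keep only the first occurrence of each duplicated row
print(df.drop_duplicates())
```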

Clean the Data

Cleaning data in Pandas means fixing incorrect data, such as empty cells, null values, and duplicate data. The following functions help clean the data:

  • isnull(): Detect NULL (missing) values; returns an object of the same shape with True where a value is missing.
  • notnull(): Detect non-NULL values; returns True where a value is present.
  • df.dropna(): Drop the rows that contain NULL values.
  • df.fillna(x): Replace NULL values with a specific value x.
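The sketch below runs these functions on a hypothetical DataFrame with one missing name and one missing mark:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Amit", "Priya", None], "Marks": [88, np.nan, 75]})

print(df.isnull())    # True where a value is missing
print(df.notnull())   # True where a value is present
print(df.dropna())    # drops every row that has at least one missing value
print(df.fillna(0))   # replaces missing values with 0
```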

Studyopedia Editorial Staff
contact@studyopedia.com
