Let’s learn about Python DataFrames

A DataFrame is a 2D, tabular data structure from the pandas library, a cornerstone of data manipulation and analysis in Python. Resembling spreadsheets or SQL tables, DataFrames provide a structured and intuitive approach to organizing and analyzing data. Each column in a Python DataFrame represents a variable, while each row corresponds to a specific observation.

Creating a Python DataFrame

import pandas as pd

# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
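
Once a DataFrame exists, it helps to confirm that its columns and rows match the "variables and observations" picture described earlier. The following is a minimal sketch using the same sample data; the variable names n_rows, n_cols, and column_names are illustrative only.

```python
import pandas as pd

# Same sample data as in the example above
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 22],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)

# shape is (number of rows, number of columns)
n_rows, n_cols = df.shape
# columns holds the variable names
column_names = list(df.columns)

print(n_rows, n_cols)   # 3 3
print(column_names)     # ['Name', 'Age', 'City']
print(df.dtypes)        # one data type per column
print(df.head(2))       # the first two observations
```

Checking shape and dtypes right after loading data is a cheap way to catch columns that were read with an unexpected type.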

Basic Operations with DataFrames:

DataFrames keep common data manipulation and analysis tasks short. The fundamental operations are selecting and filtering data, handling missing values, and grouping and aggregating data.

Basic DataFrame Operations

Example 1:

# Selecting a specific column (the result is a pandas Series)
ages = df['Age']
print(ages)

Output:

0    25
1    30
2    22
Name: Age, dtype: int64

Example 2:

# Filtering data based on a condition
young_people = df[df['Age'] < 30]
print(young_people)

Output:

      Name  Age         City
0    Alice   25     New York
2  Charlie   22  Los Angeles
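
Filters often need more than one condition. As a hedged sketch on the same sample data, conditions can be combined with & (and) or | (or), and isin() tests membership in a list. The names young_not_la and west_coast are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Each condition needs its own parentheses; & means "and", | means "or"
young_not_la = df[(df['Age'] < 30) & (df['City'] != 'Los Angeles')]
print(young_not_la)

# isin() keeps rows whose value appears in the given list
west_coast = df[df['City'].isin(['San Francisco', 'Los Angeles'])]
print(west_coast)
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.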

Example 3:

# Handling missing values

df.fillna(0, inplace=True)

Explanation:

This code replaces any missing values (NaN) in the DataFrame with 0. Because the sample DataFrame has no missing values, the output is unchanged. The inplace=True parameter modifies the original DataFrame instead of returning a new one; without it, fillna returns a filled copy that you assign back, as in df = df.fillna(0).
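
Because the sample DataFrame has nothing to fill, the effect is easier to see on data that does contain gaps. The sketch below uses a hypothetical df_missing with one missing Age and one missing City; the fill values (the column mean and the string 'Unknown') are assumptions chosen for illustration, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing City
df_missing = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                           'Age': [25, np.nan, 22],
                           'City': ['New York', None, 'Los Angeles']})

# Count missing values per column
missing_counts = df_missing.isna().sum()
print(missing_counts)

# Fill each column with a value that suits it; this returns a new DataFrame
filled = df_missing.fillna({'Age': df_missing['Age'].mean(),
                            'City': 'Unknown'})
print(filled)

# Alternatively, drop every row that contains a missing value
dropped = df_missing.dropna()
print(dropped)
```

Passing a dictionary to fillna avoids writing 0 into a text column, which a single fill value would do.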

Example 4:

# Grouping and aggregating data
average_age_by_city = df.groupby('City')['Age'].mean()
print(average_age_by_city)

Output:

City
Los Angeles      22.0
New York         25.0
San Francisco    30.0
Name: Age, dtype: float64
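
With one person per city, each group mean is simply that person's age. Grouping becomes more informative when groups hold several rows. The sketch below assumes a hypothetical people DataFrame with repeated cities and computes several statistics at once with agg.

```python
import pandas as pd

# Hypothetical data in which cities repeat
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana', 'Eli'],
    'Age': [25, 30, 22, 35, 40],
    'City': ['New York', 'San Francisco', 'New York',
             'San Francisco', 'New York'],
})

# One row per city, one column per aggregation
summary = people.groupby('City')['Age'].agg(['count', 'mean', 'max'])
print(summary)
```

Here New York has three people with a mean age of 29.0, and San Francisco has two with a mean age of 32.5.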

Indexing and Slicing:

Efficiently extracting subsets of data is crucial. DataFrames support both label-based and position-based indexing and slicing.

# Selecting a row by label
alice_info = df.loc[0]
print("Row by Label - Alice's Information:")
print(alice_info)
print("----------------")

# Slicing rows and columns (label slices with .loc include both endpoints)
subset = df.loc[1:2, ['Name', 'City']]
print("Subset of DataFrame - Rows 1 to 2, Columns 'Name' and 'City':")
print(subset)

Output:

Row by Label - Alice's Information:
Name       Alice
Age           25
City    New York
Name: 0, dtype: object
----------------
Subset of DataFrame - Rows 1 to 2, Columns 'Name' and 'City':
      Name           City
1      Bob  San Francisco
2  Charlie    Los Angeles
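
The example above uses label-based indexing with .loc. Position-based indexing, also mentioned in this section, uses .iloc. The following sketch on the same sample data shows the difference; the by_name DataFrame is an illustrative variant that uses Name as the index.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

first_row = df.iloc[0]       # row at position 0
first_two = df.iloc[0:2]     # positions 0 and 1; the end position is excluded
last_name = df.iloc[-1, 0]   # last row, first column

# With a non-default index, labels and positions no longer coincide
by_name = df.set_index('Name')
bob_city = by_name.loc['Bob', 'City']   # lookup by label
first_age = by_name.iloc[0, 0]          # lookup by position (row 0, column 0)

print(first_row)
print(first_two)
print(last_name, bob_city, first_age)
```

Note the asymmetry: a .loc slice such as 1:2 includes label 2, while an .iloc slice such as 0:2 stops before position 2.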

Merging and Concatenating DataFrames:

In real-world scenarios, data is often scattered across multiple sources. pandas can combine DataFrames by merging them on shared key columns or by concatenating them along rows or columns.

Merging DataFrames

# Creating another DataFrame
data2 = {'Name': ['David', 'Eve'],
         'Age': [28, 35],
         'City': ['Chicago', 'Seattle']}
df2 = pd.DataFrame(data2)

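# Merging DataFrames based on a common column.
# how='outer' keeps the rows of both DataFrames; the default inner join
# would return an empty result here because no City appears in both.
merged_df = pd.merge(df, df2, on='City', how='outer')

print("Original DataFrame:")
print(df)
print("----------------")

print("DataFrame to be merged:")
print(df2)
print("----------------")

print("Merged DataFrame:")
print(merged_df)

Output:

Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
----------------
DataFrame to be merged:
    Name  Age     City
0  David   28  Chicago
1    Eve   35  Seattle
----------------
Merged DataFrame:
    Name_x  Age_x           City Name_y  Age_y
0      NaN    NaN        Chicago  David   28.0
1  Charlie   22.0    Los Angeles    NaN    NaN
2    Alice   25.0       New York    NaN    NaN
3      Bob   30.0  San Francisco    NaN    NaN
4      NaN    NaN        Seattle    Eve   35.0

Age_x and Age_y are shown as floats because an integer column is converted to float once it contains NaN. Recent pandas versions sort the rows of an outer merge by the key column; older versions may order them differently.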
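
Since df and df2 share the same columns and describe different people, stacking them is often the more natural operation. The sketch below shows concatenation with pd.concat on the same two DataFrames, followed by a merge where the key column does overlap. The states lookup table and its State column are hypothetical, added only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})
df2 = pd.DataFrame({'Name': ['David', 'Eve'],
                    'Age': [28, 35],
                    'City': ['Chicago', 'Seattle']})

# Stack the rows of both DataFrames; ignore_index renumbers them 0..4
combined = pd.concat([df, df2], ignore_index=True)
print(combined)

# Hypothetical lookup table that shares City values with df
states = pd.DataFrame({'City': ['New York', 'Los Angeles'],
                       'State': ['NY', 'CA']})

# A left merge keeps every row of df and adds State where a match exists
with_state = pd.merge(df, states, on='City', how='left')
print(with_state)
```

As a rule of thumb, concat suits datasets with the same columns and different rows, while merge suits datasets that describe the same entities with different columns.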

Advanced Topics:

Beyond the basics, DataFrames support reshaping and pivoting data, handling time series data, and applying custom functions.
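
Two of these topics, time series handling and custom functions, can be sketched briefly. The sales data, the age_group function, and the 25-year threshold below are all hypothetical and serve only as an illustration.

```python
import pandas as pd

# --- Time series: hypothetical daily sales ---
sales = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-08', '2024-01-09'],
    'amount': [100, 150, 200, 75],
})
sales['date'] = pd.to_datetime(sales['date'])   # parse strings into timestamps
sales = sales.set_index('date')                 # resample needs a datetime index

# Total per calendar week (weeks end on Sunday by default)
weekly = sales['amount'].resample('W').sum()
print(weekly)

# --- Custom functions: label each person by age ---
people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 22]})

def age_group(age):
    """Return a coarse label for an age (threshold chosen arbitrarily)."""
    return 'under 25' if age < 25 else '25 and over'

# apply calls the function once per value in the column
people['AgeGroup'] = people['Age'].apply(age_group)
print(people)
```

For simple rules like this one, vectorized alternatives are usually faster on large data, but apply is the most direct way to reuse an existing Python function.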

Reshaping Data

# Reshaping data using the melt function
melted_df = pd.melt(df, id_vars=['Name'], value_vars=['Age', 'City'])

print("Original DataFrame:")
print(df)
print("----------------")

print("Melted DataFrame:")
print(melted_df)

Output:

Original DataFrame:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   22    Los Angeles
----------------
Melted DataFrame:
      Name variable          value
0    Alice      Age             25
1      Bob      Age             30
2  Charlie      Age             22
3    Alice     City       New York
4      Bob     City  San Francisco
5  Charlie     City    Los Angeles
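
Pivoting is the reverse of melting: it turns the long format back into one column per variable. The sketch below assumes the same sample data and uses DataFrame.pivot; the name wide is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'San Francisco', 'Los Angeles']})

# Long format: one row per (Name, variable) pair
melted_df = pd.melt(df, id_vars=['Name'], value_vars=['Age', 'City'])

# Wide format again: one column for each distinct entry in 'variable'
wide = melted_df.pivot(index='Name', columns='variable', values='value')
wide = wide.reset_index()       # turn Name back into a regular column
wide.columns.name = None        # drop the leftover 'variable' label

# The melted 'value' column mixed numbers and text, so restore the type
wide['Age'] = wide['Age'].astype(int)
print(wide)
```

The type conversion at the end is needed because melting Age and City into a single column forces both into a generic object type.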

Conclusion:

As we conclude our exploration of Python DataFrames, their indispensable role in data manipulation and analysis becomes evident. The flexibility, efficiency, and extensive functionality of DataFrames make them a cornerstone of data workflows. Mastering the art of working with DataFrames unlocks Python’s full potential for deriving insights and making informed decisions in the world of data science.
