Pandas Basics

Data manipulation

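Pandas is a data manipulation and analysis library for Python. It provides two core data structures, the one-dimensional Series and the two-dimensional DataFrame, along with functions for loading, selecting, transforming, and cleaning structured (tabular) data.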

Install Pandas

pip install pandas

Import Pandas

import pandas as pd

Create DataFrame

# From dictionary
data = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo']
}
df = pd.DataFrame(data)
print(df)

Read CSV File

df = pd.read_csv('data.csv')
print(df.head())  # First 5 rows
df.info()  # Column types and non-null counts (prints directly and returns None)
print(df.describe())  # Statistical summary
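
The snippet above assumes that a file named data.csv exists in the working directory. As a minimal, self-contained sketch, the same kind of call can be tried on CSV text held in memory. The sample rows and the names csv_text and df_csv are made up for illustration; they simply mirror the columns of the dictionary example.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv
csv_text = """name,age,city
Alice,25,New York
Bob,30,London
Charlie,35,Tokyo
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.head(2))  # First 2 rows
print(df_csv.shape)    # (number of rows, number of columns)
print(df_csv.dtypes)   # Column types inferred by the parser
```

Passing a number to head() limits the preview to that many rows, which is handy for large files.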

Data Selection

# Select column
ages = df['age']

# Select multiple columns
subset = df[['name', 'age']]

# Select rows by integer position
first_row = df.iloc[0]       # Single row, returned as a Series
first_three = df.iloc[0:3]   # Rows 0, 1 and 2 (end is exclusive)

# Filter data
young = df[df['age'] < 30]
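
Filters can also combine several conditions, and .loc can filter rows and pick columns in one step. The following is a small sketch of both, using a hypothetical DataFrame named people that repeats the sample data from the dictionary example.

```python
import pandas as pd

# Illustrative data, same values as the dictionary example
people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo'],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
over_25_not_tokyo = people[(people['age'] > 25) & (people['city'] != 'Tokyo')]

# .loc takes a row mask and a column label together
names_under_35 = people.loc[people['age'] < 35, 'name']

print(over_25_not_tokyo)
print(names_under_35.tolist())
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.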

Data Manipulation

# Add new column (the list must have one value per row)
df['salary'] = [50000, 60000, 70000]

# Drop column (returns a new DataFrame; df keeps 'city' for the group-by step)
df_no_city = df.drop(columns='city')

# Sort data
df_sorted = df.sort_values('age')

# Group by
grouped = df.groupby('city')['age'].mean()
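
Grouping is easier to follow when some groups contain more than one row. The sketch below uses a hypothetical DataFrame named staff, with made-up rows, to show the mean per group and several aggregations computed at once.

```python
import pandas as pd

# Illustrative data with two people per city
staff = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [25, 30, 35, 40],
    'city': ['London', 'London', 'Tokyo', 'Tokyo'],
})

# Mean age per city
mean_age = staff.groupby('city')['age'].mean()

# Several aggregations at once using named aggregation
summary = staff.groupby('city').agg(
    mean_age=('age', 'mean'),
    headcount=('name', 'count'),
)

print(mean_age)
print(summary)
```

The result is indexed by the grouping column, so individual groups can be looked up with mean_age['London'] or summary.loc['Tokyo'].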

Handle Missing Data

# Check for missing values
print(df.isnull().sum())

# Drop rows with missing values
df_clean = df.dropna()

# Fill missing values
df_filled = df.fillna(0)
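
Filling every gap with 0 is not always appropriate, for example in a text column or an age column. As a sketch of one alternative, the example below fills each column with a value suited to its type and drops rows only when a chosen column is missing. The DataFrame scores and its values are invented for illustration.

```python
import pandas as pd

# Illustrative data with one missing age and one missing city
scores = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, float('nan'), 35],
    'city': ['New York', None, 'Tokyo'],
})

# Count missing values per column
missing_counts = scores.isnull().sum()

# Fill each column with its own replacement value
filled = scores.fillna({'age': scores['age'].mean(), 'city': 'Unknown'})

# Drop a row only if 'age' is missing
has_age = scores.dropna(subset=['age'])

print(missing_counts)
print(filled)
print(has_age)
```

Whether to fill with the mean, a placeholder, or to drop the row depends on the data and the analysis; the choices here are only examples.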