Pandas Basics
Data manipulation
Pandas is a data manipulation and analysis library for Python. Its two core data structures, the one-dimensional Series and the two-dimensional DataFrame, come with functions for loading, selecting, transforming, and summarizing structured (tabular) data.
Install Pandas
pip install pandas
Import Pandas
import pandas as pd
Create DataFrame
# From dictionary
data = {
'name': ['Alice', 'Bob', 'Charlie'],
'age': [25, 30, 35],
'city': ['New York', 'London', 'Tokyo']
}
df = pd.DataFrame(data)
print(df)
Read CSV File
df = pd.read_csv('data.csv')
print(df.head()) # First 5 rows
df.info() # Data info (prints directly and returns None, so no print() is needed)
print(df.describe()) # Statistical summary
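Since `data.csv` itself is not shown here, a minimal sketch of the same calls on an in-memory CSV may help. The CSV text below is made-up sample data, and `io.StringIO` stands in for a file on disk; `csv_text` and `csv_df` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv
csv_text = """name,age,city
Alice,25,New York
Bob,30,London
Charlie,35,Tokyo
"""

# read_csv accepts a file path or any file-like object
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.head(2))     # First 2 rows
print(csv_df.shape)       # (3, 3): rows, columns
csv_df.info()             # Prints dtypes and non-null counts itself
print(csv_df.describe())  # Summary statistics for numeric columns
```

Passing a number to `head()` changes how many rows are shown; without an argument it defaults to 5.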
Data Selection
# Select column
ages = df['age']
# Select multiple columns
subset = df[['name', 'age']]
# Select rows
first_row = df.iloc[0]
first_three = df.iloc[0:3]
# Filter data
young = df[df['age'] < 30]
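As a rough illustration of how these selections behave, the sketch below rebuilds the example DataFrame from the Create DataFrame step and adds label-based selection with `loc` and a combined filter. The variable names (`people`, `names_by_label`, `under_35_not_london`) are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'city': ['New York', 'London', 'Tokyo'],
})

# A single column is a Series; a list of columns gives a DataFrame
ages = people['age']
subset = people[['name', 'age']]

# iloc selects by integer position, loc selects by label
first_row = people.iloc[0]
names_by_label = people.loc[0:1, 'name']  # loc slices include the end label

# Combine conditions with & or |, wrapping each condition in parentheses
under_35_not_london = people[(people['age'] < 35) & (people['city'] != 'London')]
print(under_35_not_london)
```

Note the difference in slicing: `iloc[0:3]` excludes position 3, while `loc[0:1]` includes label 1.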
Data Manipulation
# Add new column (the list needs one value per row in df)
df['salary'] = [50000, 60000, 70000]
# Drop column (stored in a new variable so df keeps 'city' for the group-by step)
df_no_city = df.drop(columns='city')
# Sort data
df_sorted = df.sort_values('age')
# Group by
grouped = df.groupby('city')['age'].mean()
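The order of these operations matters: once a column has been dropped from a DataFrame, later steps can no longer reference it. A minimal sketch on made-up data follows, with two people per city so the group means are visible; the names `staff`, `by_age`, `mean_age_by_city`, and `without_city` are illustrative.

```python
import pandas as pd

# Made-up sample data for illustration
staff = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'age': [25, 30, 35, 40],
    'city': ['London', 'London', 'Tokyo', 'Tokyo'],
})

# New column: the list must have one value per row
staff['salary'] = [50000, 60000, 70000, 80000]

# Sort by age, oldest first (returns a new DataFrame)
by_age = staff.sort_values('age', ascending=False)

# Group while 'city' is still present, then drop it into a separate copy
mean_age_by_city = staff.groupby('city')['age'].mean()
without_city = staff.drop(columns='city')

print(mean_age_by_city)
print(without_city.columns.tolist())
```

`sort_values` and `drop` both return new objects by default, so `staff` itself is unchanged apart from the added `salary` column.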
Handle Missing Data
# Check for missing values
print(df.isnull().sum())
# Drop rows with missing values
df_clean = df.dropna()
# Fill missing values
df_filled = df.fillna(0)
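Filling every gap with 0 may not suit every column (a missing city is not meaningfully "0"). The sketch below, on made-up data with deliberately missing entries, shows one possible alternative: passing a dict to `fillna` so each column gets its own fill value. The choice of the column mean and `'Unknown'` is an assumption for illustration, not a recommendation for all data.

```python
import pandas as pd

# Made-up sample data with one missing age and one missing city
scores = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, float('nan'), 35],
    'city': ['New York', 'London', None],
})

print(scores.isnull().sum())  # Missing values per column

# dropna removes every row that has at least one missing value
complete_rows = scores.dropna()

# fillna accepts a dict mapping column names to fill values
filled = scores.fillna({'age': scores['age'].mean(), 'city': 'Unknown'})
print(filled)
```

Here `dropna()` keeps only the one fully populated row, while the dict-based `fillna` keeps all three rows.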