Python Machine Learning

Categorical Data

Working with categorical data

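Categorical data are values that name a group or label rather than measure a quantity. Examples include gender, nationality, or product type. Most machine learning models expect numeric input, so categorical columns are usually converted to numbers before training.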
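
As a small illustration of the definition above, the sketch below stores a few country labels in a pandas Series with the `category` dtype and lists the distinct labels. The `countries` variable and its values are made up for this example.

```python
import pandas as pd

# Hypothetical sample: a column of labels rather than measurements
countries = pd.Series(['France', 'Spain', 'Germany', 'Spain'], dtype='category')

# The distinct labels the column can take
categories = list(countries.cat.categories)
print(categories)

# How often each label occurs
counts = countries.value_counts().to_dict()
print(counts)
```

Arithmetic such as a mean has no meaning for these values; counting and grouping are the operations that apply.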

Handling Categorical Data

One common approach is one-hot encoding: the pandas get_dummies() function creates a separate indicator column for each category.

import pandas as pd

df = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France', 'Spain', 'France', 'Germany', 'France'],
    'Age': [44, 27, 30, 38, 40, 35, None, 48, 50, 37],
    'Salary': [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, 67000]
})

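# One-hot encode: create one indicator column per distinct country
dummies = pd.get_dummies(df['Country'])

# Append the indicator columns, then drop the original text column
df = pd.concat([df, dummies], axis=1)
df = df.drop('Country', axis=1)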

print(df)
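
The same encoding can also be sketched in a single call. This variant is not part of the original example: it passes the whole DataFrame to `pd.get_dummies` with the `columns` argument, which removes the source column automatically and prefixes the new columns with `Country_`. The `drop_first=True` and `dtype=int` options are assumptions added here. The first omits one redundant indicator column, which can help with linear models, and the second produces 0/1 integers instead of booleans. The data is a shortened, made-up version of the table above.

```python
import pandas as pd

# Shortened sample data, for illustration only
df = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain'],
    'Age': [44, 27, 30, 38],
})

# Encode 'Country' in place: the original column is replaced by
# indicator columns, and the first category (France) is dropped
encoded = pd.get_dummies(df, columns=['Country'], drop_first=True, dtype=int)

print(encoded)
```

With `drop_first=True`, a row where every remaining indicator is 0 stands for the dropped category, so no information is lost.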