Python Machine Learning
Categorical Data
Working with categorical data
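Categorical data are values that name a category rather than measure a quantity. Examples include gender, nationality, and product type. Most machine learning models expect numeric input, so categorical columns usually have to be encoded as numbers before training.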
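To make the distinction concrete, here is a minimal sketch using the pandas category dtype. The product labels are made up for illustration and are not part of the tutorial's dataset.

```python
import pandas as pd

# Hypothetical product-type column (illustrative values only)
product = pd.Series(['book', 'toy', 'book', 'food', 'toy'], dtype='category')

# The distinct labels pandas found, in sorted order
categories = product.cat.categories.tolist()

# Integer codes pandas assigns to each label; they are identifiers,
# not measurements, so their numeric order carries no meaning
codes = product.cat.codes.tolist()

print(categories)
print(codes)
```

The codes are only a storage convenience. Feeding them to a model directly would suggest an ordering between the labels, which is one reason a different encoding is often preferred.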
Handling Categorical Data
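import pandas as pd

# Sample data: 'Country' is categorical; 'Age' and 'Salary' are numeric with some missing values
df = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain', 'Germany', 'France', 'Spain', 'France', 'Germany', 'France'],
    'Age': [44, 27, 30, 38, 40, 35, None, 48, 50, 37],
    'Salary': [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, 67000]
})

# One-hot encode 'Country': one indicator column per distinct country
dummies = pd.get_dummies(df['Country'])

# Append the indicator columns, then drop the original text column
df = pd.concat([df, dummies], axis=1)
df = df.drop('Country', axis=1)

print(df)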
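As a possible variation, pd.get_dummies can also encode a column of a DataFrame in a single call. The sketch below uses a shortened, four-row version of the data. The drop_first and dtype arguments are optional choices made here for illustration, not something the example above requires.

```python
import pandas as pd

# Shortened version of the sample data (assumption: four rows are enough to illustrate)
df = pd.DataFrame({
    'Country': ['France', 'Spain', 'Germany', 'Spain'],
    'Age': [44, 27, 30, 38],
})

# Encode 'Country' in place of the concat/drop steps:
# - columns=['Country'] replaces that column with prefixed indicator columns
# - drop_first=True omits the first category (France), which is implied
#   when every other indicator is 0
# - dtype=int yields 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=['Country'], drop_first=True, dtype=int)

print(encoded)
```

Dropping one indicator avoids a redundant column, which can matter for linear models. Keeping all of them, as in the example above, is also common and is easier to read.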