AI / ML

Scikit-learn Intro

Machine learning library

Scikit-learn Intro

Scikit-learn is one of the most popular machine learning libraries in Python. It provides simple and efficient tools for data mining and data analysis, built on NumPy, SciPy, and matplotlib.

Install Scikit-learn

pip install scikit-learn

Import Scikit-learn

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

Simple Linear Regression

# Generate sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Create and train model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, predictions)
print(f"Mean Squared Error: {mse}")

Classification Example

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy}")

Common Algorithms

  • Linear Regression: For regression problems
  • Logistic Regression: For classification
  • Decision Trees: For both classification and regression
  • Random Forest: Ensemble method
  • K-Means: Clustering
  • Support Vector Machines: Classification

Tip: Scikit-learn follows a consistent API - most models use fit(), predict(), and score() methods.