Data Preprocessing & Feature Engineering

AI is technology that enables computers and machines to simulate human learning.

What is Data Preprocessing?

Data preprocessing is the process of cleaning, transforming, and preparing raw data so that machine learning algorithms can understand it.

💡 Think of it like cleaning and organizing ingredients before cooking — if the data is messy, the model won’t learn properly.

Steps:

1. Data Cleaning

Handle missing values (NaN, blanks).
Remove duplicates.
Correct inconsistencies (e.g., “Male” vs. “M”).
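
The three cleaning tasks above can be sketched in plain Python. This is only an illustration: the sample records, the GENDER_MAP table, and the choice to fill missing ages with the median are assumptions, not something the steps prescribe.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
rows = [
    {"name": "Ana", "gender": "Male", "age": 34},
    {"name": "Ben", "gender": "M", "age": None},
    {"name": "Ana", "gender": "Male", "age": 34},  # exact duplicate
    {"name": "Cy", "gender": "female", "age": 28},
]

# Assumed mapping from spelling variants to one canonical label.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def clean(records):
    # 1. Correct inconsistencies: map every variant to its canonical label.
    fixed = [
        {**r, "gender": GENDER_MAP.get(r["gender"].strip().lower(), r["gender"])}
        for r in records
    ]
    # 2. Handle missing values: fill missing ages with the median known age.
    known_ages = [r["age"] for r in fixed if r["age"] is not None]
    fill_value = median(known_ages)
    fixed = [
        {**r, "age": fill_value if r["age"] is None else r["age"]} for r in fixed
    ]
    # 3. Remove duplicates, keeping the first occurrence of each record.
    seen, unique = set(), []
    for r in fixed:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


cleaned = clean(rows)
```

Inconsistencies are fixed before duplicates are removed, so records that differ only in spelling ("Male" vs. "M") can also be caught as duplicates.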

2. Data Transformation

Convert categorical values into numbers (One-Hot Encoding, Label Encoding).
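Normalize numeric values (rescale to a fixed range such as 0 to 1) or standardize them (rescale to zero mean and unit variance) so that features are on comparable scales.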
Convert text to tokens (for NLP).
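
As a rough sketch of these transformations, the helpers below use only the Python standard library. The function names and sample values are hypothetical, and the whitespace tokenizer is a deliberately simple stand-in for a real NLP tokenizer.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Normalize: rescale values linearly into the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardize: shift to zero mean and scale to unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(values):
    """Label Encoding: map each category to an integer (sorted order)."""
    mapping = {label: i for i, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def tokenize(text):
    """Very simple tokenization: lowercase, then split on whitespace."""
    return text.lower().split()


sizes = [50, 100, 150]
scaled = min_max_scale(sizes)  # [0.0, 0.5, 1.0]
z_scores = standardize(sizes)
codes, mapping = label_encode(["Red", "Blue", "Green", "Blue"])  # codes: [2, 0, 1, 0]
tokens = tokenize("Data is messy")  # ['data', 'is', 'messy']
```

Label encoding gives the categories an artificial order (here Blue < Green < Red), which some models may treat as meaningful. One-hot encoding avoids that at the cost of more columns.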

3. Data Reduction

Remove irrelevant columns.
Apply dimensionality reduction (e.g., PCA; t-SNE is used mainly for visualization).
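
A minimal sketch of removing irrelevant columns follows. It assumes that "irrelevant" means either a column you name explicitly (such as a row ID) or a column whose value never changes. The drop_columns helper and the sample data are hypothetical. PCA and t-SNE are not shown here because they are normally done with a numerical library.

```python
def drop_columns(rows, irrelevant=()):
    """Drop the named columns plus any column with a single constant value."""
    if not rows:
        return []
    # A column that holds the same value in every row carries no information.
    constant = {c for c in rows[0] if len({r[c] for r in rows}) == 1}
    to_drop = set(irrelevant) | constant
    return [{c: v for c, v in r.items() if c not in to_drop} for r in rows]


houses = [
    {"row_id": 1, "country": "US", "size": 120, "rooms": 3},
    {"row_id": 2, "country": "US", "size": 80, "rooms": 2},
]
reduced = drop_columns(houses, irrelevant=["row_id"])
# [{'size': 120, 'rooms': 3}, {'size': 80, 'rooms': 2}]
```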

What are Features?

Features are the measurable properties or characteristics of the data that the model uses to make predictions.

In a tabular dataset, most columns are features (inputs), and one column is usually the target (output).

Example: Predicting house prices

Features → size, location, number_of_rooms
Target → price
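
In code, the split between features and target might look like the sketch below. The two house records and their values are made up for illustration, and the names X and y simply follow a common convention.

```python
# Hypothetical dataset: each dict is one row, each key is one column.
houses = [
    {"size": 120, "location": "suburb", "number_of_rooms": 3, "price": 250000},
    {"size": 80, "location": "city", "number_of_rooms": 2, "price": 310000},
]

TARGET = "price"

# X holds the feature columns (inputs); y holds the target column (output).
X = [{k: v for k, v in row.items() if k != TARGET} for row in houses]
y = [row[TARGET] for row in houses]
```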

Feature Engineering

Feature engineering means creating new features or modifying existing ones to improve model performance.

Examples:

Creating new features: From date_of_birth, derive a new feature age. From transaction_amount, derive log(transaction_amount) to reduce skewness.
Feature Selection: Keep only the most relevant features and remove noise (irrelevant columns).
Encoding categorical features: One-hot encode ["Red", "Blue", "Green"] → [1,0,0], [0,1,0], [0,0,1].
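
The examples above could be sketched as follows. The function names are hypothetical. One deliberate change: the log feature uses log(1 + x) via math.log1p instead of a plain log, on the assumption that some transaction amounts may be zero.

```python
import math
from datetime import date


def age_in_years(date_of_birth, today):
    """Derive age from date_of_birth, counting only completed years."""
    had_birthday = (today.month, today.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def log_amount(transaction_amount):
    """Compress large amounts; log1p(x) = log(1 + x) is defined at x = 0."""
    return math.log1p(transaction_amount)


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of value."""
    return [1 if value == c else 0 for c in categories]


colors = ["Red", "Blue", "Green"]
age = age_in_years(date(1990, 6, 15), today=date(2024, 6, 14))  # 33
encoded = [one_hot(c, colors) for c in colors]
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Passing today explicitly, rather than calling date.today() inside the function, keeps the derived feature reproducible.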

✅ In summary:

Data preprocessing = cleaning + transforming + reducing data.
Features = inputs that describe the problem.
Feature engineering = creating/selecting the best features to boost model performance.

Popular AI Libraries


Model Training & Testing
