Training Data: The dataset used to train a machine learning model. It contains input-output pairs (features and labels) that the model learns from.
Test Data: A separate dataset used to evaluate the performance of the trained model. It helps assess how well the model generalizes to new, unseen data.
Validation Data: A subset of the data held out during training to tune hyperparameters and detect overfitting before the final evaluation on the test set.
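The three splits above can be produced from one dataset. A minimal sketch in plain Python (the 70/15/15 fractions and the `split_dataset` helper are illustrative choices, not a standard API):

```python
import random

def split_dataset(data, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle a dataset and split it into train/validation/test subsets."""
    rng = random.Random(seed)       # fixed seed makes the split reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
# 70 training, 15 validation, 15 test examples; together they cover the dataset
```

Shuffling before splitting matters: if the data is ordered (say, by class), a plain slice would give the test set a different distribution than the training set.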
Algorithms:
Supervised Learning: The model is trained on labeled data, where the input comes with corresponding output labels. The goal is to learn a mapping from inputs to outputs. Examples include classification and regression.
Unsupervised Learning: The model is trained on unlabeled data and tries to find patterns or structures in the data. Examples include clustering and dimensionality reduction.
Semi-Supervised Learning: Combines a small amount of labeled data with a large amount of unlabeled data during training, which is useful when labeling data is expensive.
Reinforcement Learning: The model learns by interacting with an environment and receiving feedback in the form of rewards or penalties. It focuses on learning a strategy to maximize cumulative rewards.
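To make the supervised setting concrete, here is a minimal sketch of one of the simplest supervised classifiers, 1-nearest-neighbor: given labeled (input, label) pairs, it predicts the label of the closest training input. The function name and toy data are illustrative:

```python
def nearest_neighbor_predict(train_pairs, x):
    """1-nearest-neighbor: return the label of the training input closest to x."""
    best_input, best_label = min(train_pairs, key=lambda pair: abs(pair[0] - x))
    return best_label

# Labeled training data: each input comes with its output label.
train_pairs = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]
nearest_neighbor_predict(train_pairs, 1.5)  # closest points are "small" inputs
```

No explicit "training" happens here beyond storing the data; the mapping from inputs to outputs is learned implicitly through the distance comparison at prediction time.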
Model:
Training: The process of using data to adjust the parameters of a model so that it can make accurate predictions or decisions.
Evaluation: The process of assessing the model's performance on test data to determine how well it generalizes to new, unseen examples.
Prediction: Using the trained model to make predictions or decisions based on new input data.
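The training, evaluation, and prediction steps above can be sketched end-to-end with a one-feature linear model fit by least squares. The function names (`train_linear`, `evaluate`, `predict`) are illustrative, not a standard API:

```python
def train_linear(xs, ys):
    """Training: fit y = w*x + b by least squares (closed form, one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - w * mean_x
    return w, b

def predict(model, x):
    """Prediction: apply the learned parameters to a new input."""
    w, b = model
    return w * x + b

def evaluate(model, xs, ys):
    """Evaluation: mean squared error on held-out data."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

model = train_linear([1, 2, 3, 4], [2, 4, 6, 8])  # data follows y = 2x
predict(model, 5)  # the fitted model extrapolates to unseen input
```

Note that `evaluate` is meant to be called on test data the model never saw during training; a low error on the training set alone says little about generalization.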
Features and Labels:
Features: The input variables or attributes used by the model to make predictions or decisions. They represent the data's characteristics.
Labels: The output variable or target that the model is trying to predict or classify. In supervised learning, labels are provided during training.
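A small sketch of how features and labels pair up in a supervised dataset; the house-price attributes and values here are made up for illustration:

```python
# Each example pairs features (input attributes) with a label (the target).
dataset = [
    ({"sqft": 1400, "bedrooms": 3}, 250000),  # features dict, price label
    ({"sqft": 2000, "bedrooms": 4}, 340000),
    ({"sqft": 900,  "bedrooms": 2}, 180000),
]

features, label = dataset[0]
# The model sees the features; the label is what it learns to predict.
```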
Loss Function:
A function that measures how far the model's predictions are from the true targets. Training seeks parameter values that minimize this loss, which in turn improves the model's accuracy.
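A common loss for regression is mean squared error, the average of the squared differences between predictions and targets. A minimal implementation:

```python
def mse(predictions, targets):
    """Mean squared error: average squared difference between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

mse([2.5, 0.0, 2.0], [3.0, -0.5, 2.0])  # errors 0.5, 0.5, 0.0 -> mean of squares
```

Squaring penalizes large errors disproportionately and makes the loss differentiable, which is what gradient-based optimization (below) relies on.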
Optimization:
The process of adjusting the model's parameters to minimize the loss function and improve performance. Common optimization techniques include gradient descent and its variants.
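Gradient descent can be sketched for the simplest case: fitting y = w*x by repeatedly stepping the parameter w against the gradient of the mean squared error. The learning rate and step count below are illustrative choices:

```python
def gradient_descent_step(w, xs, ys, lr=0.01):
    """One update of w for the model y_hat = w*x under mean squared error."""
    n = len(xs)
    # Derivative of MSE with respect to w: (2/n) * sum(x * (w*x - y))
    grad = (2 / n) * sum(x * (w * x - y) for x, y in zip(xs, ys))
    return w - lr * grad  # step against the gradient to reduce the loss

w = 0.0
for _ in range(500):
    w = gradient_descent_step(w, [1, 2, 3], [2, 4, 6])  # data follows y = 2x
# w converges toward 2, the slope that minimizes the loss
```

In practice the learning rate matters: too small and convergence is slow, too large and the updates overshoot and diverge. Variants such as stochastic gradient descent compute the gradient on small batches rather than the full dataset.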