What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?
- Gradient Descent (GD) computes the gradient over the entire dataset in each iteration, so every update step points in an accurate descent direction but is expensive for large datasets.
- Stochastic Gradient Descent (SGD) updates the model's parameters using the gradient from a single randomly selected data point (or a small mini-batch) per iteration. Each step is cheap, making SGD computationally efficient, but the gradient estimate is noisy, so convergence is more erratic.
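The difference can be sketched on a toy 1-D linear regression (a hypothetical example, not from any particular library): GD averages the gradient over all points before each update, while SGD updates after seeing each individual point.

```python
import numpy as np

# Toy data: y = 3 * x + noise. We fit w by minimizing mean squared error.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)

def full_batch_gd(x, y, lr=0.1, epochs=100):
    """GD: each update uses the gradient averaged over the ENTIRE dataset."""
    w = 0.0
    for _ in range(epochs):
        grad = np.mean(2 * (w * x - y) * x)  # exact MSE gradient, all points
        w -= lr * grad
    return w

def sgd(x, y, lr=0.1, epochs=100):
    """SGD: each update uses the gradient of ONE randomly chosen point."""
    w = 0.0
    order = np.random.default_rng(1)
    for _ in range(epochs):
        for i in order.permutation(len(x)):
            grad = 2 * (w * x[i] - y[i]) * x[i]  # noisy one-sample gradient
            w -= lr * grad
    return w

# Both should land near the true slope w ≈ 3; SGD's path is noisier,
# but it performs many more (cheaper) updates per pass over the data.
w_gd = full_batch_gd(x, y)
w_sgd = sgd(x, y)
```

In practice, pure single-sample SGD is rarely used; mini-batch SGD (batches of, say, 32 to 512 samples) trades off the two extremes and vectorizes well on modern hardware.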