What is a Neural Network?
A neural network is a computing system loosely inspired by biological neurons. It's composed of layers of interconnected nodes (artificial neurons) that process information. Each connection has a weight that gets adjusted during training — this is how the network "learns."
Think of it as a function approximator. Given enough neurons and data, a neural network can learn to approximate virtually any continuous function — from recognizing handwritten digits to generating human-like text.
The Anatomy of a Neuron
Each artificial neuron performs three operations:
- Weighted Sum — multiply each input by its weight, then sum them: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
- Activation Function — apply a non-linear function to the sum: a = f(z)
- Output — pass the result to the next layer
The bias term (b) allows the neuron to shift its activation. Without bias, the function would always pass through the origin.
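The three operations above fit in a few lines of NumPy. This is a minimal sketch; the input, weight, and bias values are arbitrary examples, and tanh stands in for any activation function:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """One artificial neuron: weighted sum plus bias, then a non-linearity."""
    z = np.dot(w, x) + b          # z = w1*x1 + w2*x2 + ... + wn*xn + b
    return activation(z)          # a = f(z)

x = np.array([0.5, -1.0, 2.0])   # example inputs
w = np.array([0.8, 0.2, -0.5])   # example weights
b = 0.1                          # bias shifts the activation
print(neuron(x, w, b))
```

Without the bias term, `z` would be zero whenever all inputs are zero, forcing the activation through the origin — exactly the limitation described above.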
Network Architecture
Neural networks are organized in layers:
- Input Layer — receives raw data (pixel values, word embeddings, feature vectors)
- Hidden Layers — where computation happens. Each layer learns increasingly abstract features
- Output Layer — produces the final prediction (class probabilities, regression value)
"Deep" learning simply means using networks with many hidden layers; a network with two or more hidden layers is generally considered deep. Modern large models such as GPT-3 stack close to a hundred transformer layers.
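The layered structure can be sketched as a chain of matrix multiplications. This is an illustrative toy, assuming a two-hidden-layer network with random weights and a ReLU non-linearity (covered in the next section); a real output layer would typically use softmax or no activation instead:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]       # input, two hidden layers, output

# One (weights, bias) pair per connection between consecutive layers
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    """Pass an input vector through every layer in turn."""
    a = x
    for W, b in params:
        a = np.maximum(0.0, W @ a + b)   # weighted sum, then ReLU
    return a

out = forward(rng.normal(size=4), params)
print(out.shape)   # (3,)
```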
Activation Functions
Without activation functions, a neural network would just be a linear transformation — no matter how many layers you stack. Activation functions introduce non-linearity, enabling the network to learn complex patterns.
Sigmoid: σ(x) = 1/(1 + e⁻ˣ)
Squashes output to (0, 1). Historically popular but suffers from vanishing gradients — during backpropagation, gradients become extremely small in deep networks, preventing early layers from learning.
ReLU: f(x) = max(0, x)
Simple and effective. Returns 0 for negative inputs, x for positive. Its non-saturating positive side largely avoids vanishing gradients, though individual units can "die" by getting stuck at zero. Used in most modern architectures; variants such as Leaky ReLU and GELU address this weakness.
Softmax
Converts a vector of raw scores into probabilities that sum to 1. Used in the output layer for multi-class classification.
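All three activations are one-liners in NumPy. The only subtlety worth showing is the standard numerical-stability trick in softmax (the scores below are arbitrary example values):

```python
import numpy as np

def sigmoid(x):
    """Squash to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Zero for negative inputs, identity for positive."""
    return np.maximum(0.0, x)

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(x - np.max(x))    # subtracting the max avoids overflow; result is unchanged
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))           # probabilities summing to 1
```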
How Neural Networks Learn: Backpropagation
Training a neural network involves two phases repeated thousands of times:
Forward Pass
Data flows through the network from input to output. Each layer computes its weighted sum and activation, passing results forward. The output layer produces a prediction.
Backward Pass (Backpropagation)
The network compares its prediction to the actual answer using a loss function (e.g., cross-entropy for classification, MSE for regression). It then computes the gradient of the loss with respect to every weight using the chain rule of calculus. These gradients tell each weight how to adjust to reduce the error.
Gradient Descent
Weights are updated in the direction that reduces loss: w_new = w_old - learning_rate × gradient. The learning rate controls step size — too large and you overshoot; too small and training takes forever.
This process repeats for many epochs (complete passes through the training data) until the loss converges.
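The forward pass, backward pass, and gradient descent update can be seen in miniature by training a single linear neuron with MSE loss and hand-derived gradients. The data and hyperparameters here are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
true_w, true_b = np.array([2.0, -3.0]), 0.5
y = X @ true_w + true_b                      # toy targets to recover

w, b = np.zeros(2), 0.0
lr = 0.1                                     # learning rate
for epoch in range(200):
    # Forward pass: prediction and mean-squared-error loss
    pred = X @ w + b
    err = pred - y
    loss = np.mean(err ** 2)
    # Backward pass: gradients of the loss w.r.t. w and b (chain rule)
    grad_w = 2 * X.T @ err / len(X)
    grad_b = 2 * err.mean()
    # Gradient descent: w_new = w_old - learning_rate * gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # approaches [2.0, -3.0] and 0.5
```

A deep network differs only in scale: the chain rule propagates these gradients backward through every layer, and an autograd library computes them automatically.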
Common Architectures
- Feedforward Networks — basic architecture where data flows in one direction
- Convolutional Neural Networks (CNNs) — specialized for grid-like data (images). Use convolution filters to detect features like edges, textures, and objects
- Recurrent Neural Networks (RNNs) — designed for sequential data. LSTMs and GRUs mitigate the vanishing gradient problem for long sequences
- Transformers — use self-attention to process all positions simultaneously. Foundation of GPT, BERT, and modern LLMs
Practical Tips for Training
- Data normalization — scale inputs to similar ranges (0-1 or zero mean, unit variance)
- Batch normalization — normalize activations between layers for stable training
- Dropout — randomly deactivate neurons during training to prevent overfitting
- Learning rate scheduling — decrease learning rate as training progresses
- Early stopping — stop training when validation loss stops improving
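Early stopping is simple enough to sketch in full. In this illustrative snippet, `step` and `val_loss` are placeholder callables standing in for one real training epoch and a validation pass; the loss sequence in the usage example is fabricated:

```python
def train_with_early_stopping(step, val_loss, max_epochs=1000, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        step()                   # one training epoch (caller-supplied)
        loss = val_loss()        # evaluate on held-out data
        if loss < best - 1e-6:   # improvement: reset the patience counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                break            # validation loss has plateaued
    return best, epoch

# Usage with a fake validation-loss sequence that plateaus at 0.7:
losses = iter([1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7])
best, stopped_at = train_with_early_stopping(lambda: None, lambda: next(losses),
                                             patience=3)
print(best, stopped_at)
```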
Dive deeper with our Neural Networks & Deep Learning lesson, which includes interactive quizzes and visual explanations. Get full access to all 31 lessons covering everything from neurons to production AI systems.