What is a Neural Network?
A neural network is a computing system loosely inspired by biological neurons. It's composed of layers of interconnected nodes (artificial neurons) that process information. Each connection has a weight that gets adjusted during training — this is how the network "learns."
Think of it as a function approximator. Given enough neurons and data, a neural network can learn to approximate virtually any continuous function — from recognizing handwritten digits to generating human-like text.
The Anatomy of a Neuron
Each artificial neuron performs three operations:
- Weighted Sum — multiply each input by its weight, then sum them: z = w₁x₁ + w₂x₂ + ... + wₙxₙ + b
- Activation Function — apply a non-linear function to the sum: a = f(z)
- Output — pass the result to the next layer
The bias term (b) allows the neuron to shift its activation. Without bias, the function would always pass through the origin.
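The three operations above fit in a few lines of NumPy. This is a minimal sketch; the input, weight, and bias values are arbitrary examples, and tanh stands in for any activation function:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """One artificial neuron: weighted sum plus bias, then a non-linearity."""
    z = np.dot(w, x) + b          # z = w1*x1 + w2*x2 + ... + wn*xn + b
    return activation(z)          # a = f(z)

x = np.array([0.5, -1.0, 2.0])   # example inputs
w = np.array([0.8, 0.2, -0.5])   # example weights
b = 0.1                          # bias shifts the activation
print(neuron(x, w, b))
```

Without the bias term, `z` would be zero whenever all inputs are zero, forcing the activation through the origin — exactly the limitation described above.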
Network Architecture
Neural networks are organized in layers:
- Input Layer — receives raw data (pixel values, word embeddings, feature vectors)
- Hidden Layers — where computation happens. Each layer learns increasingly abstract features
- Output Layer — produces the final prediction (class probabilities, regression value)
"Deep" learning simply means using networks with many hidden layers; a network with two or more hidden layers is generally considered deep. Modern large models such as GPT-3 stack close to a hundred transformer layers.
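The layered structure can be sketched as a chain of matrix multiplications. This is an illustrative toy, assuming a two-hidden-layer network with random weights and a ReLU non-linearity (covered in the next section); a real output layer would typically use softmax or no activation instead:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]       # input, two hidden layers, output

# One (weights, bias) pair per connection between consecutive layers
params = [(rng.normal(size=(n_out, n_in)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, params):
    """Pass an input vector through every layer in turn."""
    a = x
    for W, b in params:
        a = np.maximum(0.0, W @ a + b)   # weighted sum, then ReLU
    return a

out = forward(rng.normal(size=4), params)
print(out.shape)   # (3,)
```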
Activation Functions
Without activation functions, a neural network would just be a linear transformation — no matter how many layers you stack. Activation functions introduce non-linearity, enabling the network to learn complex patterns.
Sigmoid: σ(x) = 1/(1 + e⁻ˣ)
Squashes output to (0, 1). Historically popular but suffers from vanishing gradients — during backpropagation, gradients become extremely small in deep networks, preventing early layers from learning.
ReLU: f(x) = max(0, x)
Simple and effective. Returns 0 for negative inputs, x for positive. Its non-saturating positive side largely avoids vanishing gradients, though individual units can "die" by getting stuck at zero. Used in most modern architectures; variants such as Leaky ReLU and GELU address this weakness.
Softmax
Converts a vector of raw scores into probabilities that sum to 1. Used in the output layer for multi-class classification.
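All three activations are one-liners in NumPy. The only subtlety worth showing is the standard numerical-stability trick in softmax (the scores below are arbitrary example values):

```python
import numpy as np

def sigmoid(x):
    """Squash to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    """Zero for negative inputs, identity for positive."""
    return np.maximum(0.0, x)

def softmax(x):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(x - np.max(x))    # subtracting the max avoids overflow; result is unchanged
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))           # probabilities summing to 1
```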
How Neural Networks Learn: Backpropagation
Training a neural network involves two phases repeated thousands of times:
Forward Pass
Data flows through the network from input to output. Each layer computes its weighted sum and activation, passing results forward. The output layer produces a prediction.
Backward Pass (Backpropagation)
The network compares its prediction to the actual answer using a loss function (e.g., cross-entropy for classification, MSE for regression). It then computes the gradient of the loss with respect to every weight using the chain rule of calculus. These gradients tell each weight how to adjust to reduce the error.
Gradient Descent
Weights are updated in the direction that reduces loss: w_new = w_old - learning_rate × gradient. The learning rate controls step size — too large and you overshoot; too small and training takes forever.
This process repeats for many epochs (complete passes through the training data) until the loss converges.
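The forward pass, backward pass, and gradient descent update can be seen in miniature by training a single linear neuron with MSE loss and hand-derived gradients. The data and hyperparameters here are made up for the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
true_w, true_b = np.array([2.0, -3.0]), 0.5
y = X @ true_w + true_b                      # toy targets to recover

w, b = np.zeros(2), 0.0
lr = 0.1                                     # learning rate
for epoch in range(200):
    # Forward pass: prediction and mean-squared-error loss
    pred = X @ w + b
    err = pred - y
    loss = np.mean(err ** 2)
    # Backward pass: gradients of the loss w.r.t. w and b (chain rule)
    grad_w = 2 * X.T @ err / len(X)
    grad_b = 2 * err.mean()
    # Gradient descent: w_new = w_old - learning_rate * gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # approaches [2.0, -3.0] and 0.5
```

A deep network differs only in scale: the chain rule propagates these gradients backward through every layer, and an autograd library computes them automatically.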
Common Architectures
- Feedforward Networks — basic architecture where data flows in one direction
- Convolutional Neural Networks (CNNs) — specialized for grid-like data (images). Use convolution filters to detect features like edges, textures, and objects
- Recurrent Neural Networks (RNNs) — designed for sequential data. LSTMs and GRUs mitigate the vanishing gradient problem for long sequences
- Transformers — use self-attention to process all positions simultaneously. Foundation of GPT, BERT, and modern LLMs
Practical Tips for Training
- Data normalization — scale inputs to similar ranges (0-1 or zero mean, unit variance)
- Batch normalization — normalize activations between layers for stable training
- Dropout — randomly deactivate neurons during training to prevent overfitting
- Learning rate scheduling — decrease learning rate as training progresses
- Early stopping — stop training when validation loss stops improving
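Early stopping is simple enough to sketch in full. In this illustrative snippet, `step` and `val_loss` are placeholder callables standing in for one real training epoch and a validation pass; the loss sequence in the usage example is fabricated:

```python
def train_with_early_stopping(step, val_loss, max_epochs=1000, patience=5):
    """Stop when validation loss hasn't improved for `patience` epochs."""
    best, wait = float("inf"), 0
    for epoch in range(max_epochs):
        step()                   # one training epoch (caller-supplied)
        loss = val_loss()        # evaluate on held-out data
        if loss < best - 1e-6:   # improvement: reset the patience counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                break            # validation loss has plateaued
    return best, epoch

# Usage with a fake validation-loss sequence that plateaus at 0.7:
losses = iter([1.0, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7])
best, stopped_at = train_with_early_stopping(lambda: None, lambda: next(losses),
                                             patience=3)
print(best, stopped_at)
```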
Dive deeper with our Neural Networks & Deep Learning lesson, which includes interactive quizzes and visual explanations. Get full access to all 31 lessons covering everything from neurons to production AI systems.