Every time your phone recognizes your face in the dark, every time Spotify queues a song you didn’t know you needed, every time ChatGPT answers a question — a neural network is doing the work.
Neural networks are the engine underneath most of the headline AI applications of the last decade. And yet most people, including most adults, cannot explain what one actually is.
This guide explains it. No unexplained jargon. No assumptions. Just a clear explanation that a curious 12-year-old and their parent can read together and actually understand.
Start With the Brain — But Don’t Take the Analogy Too Far
Neural networks were inspired by the structure of the human brain. Not copied — inspired. The distinction matters.
Your brain contains approximately 86 billion neurons — specialized cells that receive electrical signals, process them, and pass signals on to other neurons. When you learn something new, the connections between certain neurons strengthen. When you forget something, they weaken. Your brain’s ability to recognize your mother’s face, understand spoken language, and ride a bicycle all emerges from these shifting patterns of connection.
A neural network borrows this basic idea: it is a mathematical system made up of nodes (loosely analogous to neurons) connected to each other. Each connection has a weight — a number that determines how strongly one node influences another. When a neural network learns, it adjusts these weights.
That is the core of it. Everything else — the remarkable things neural networks can do — follows from that simple structure.
What a Neural Network Actually Looks Like
A neural network is organized into layers. A typical network has three kinds of layer:
The Input Layer
This is where data enters the network. If you are building a neural network to recognize handwritten digits, the input layer receives the pixel values of an image — say, a 28×28 grid of pixels, each represented as a number between 0 (black) and 255 (white). That gives you 784 input nodes, one per pixel.
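As a small illustration of the arithmetic above, here is one way a 28×28 image could be turned into 784 input values. The image is made up (all black with a single white pixel), and scaling the values to the 0 to 1 range is a common convention assumed here, not something this guide requires.

```python
# A made-up 28x28 grayscale image: every pixel black (0) except one white pixel (255).
image = [[0 for _ in range(28)] for _ in range(28)]
image[3][5] = 255

# Flatten the grid row by row into one long list: one value per input node.
inputs = [pixel for row in image for pixel in row]

# Scale each value to the 0-1 range (an assumed, but common, preprocessing step).
scaled = [pixel / 255 for pixel in inputs]

print(len(inputs))  # 784
```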
The Hidden Layers
Between the input and output, there are one or more hidden layers. This is where the network does its computational work — transforming the raw input into increasingly abstract representations. In a network trained to recognize faces, early hidden layers might detect edges and curves. Deeper layers combine those into eyes, noses, and cheekbones. The final hidden layers represent entire face structures.
The word “deep” in “deep learning” simply refers to networks with many hidden layers. Large language models stack dozens of layers or more (GPT-3, for example, has 96; OpenAI has not published the figure for GPT-4), which is part of why they can capture extraordinarily subtle patterns in language.
The Output Layer
The output layer produces the network’s answer. For a digit recognition network, there might be 10 output nodes — one for each digit from 0 to 9. The node with the highest value is the network’s prediction.
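Picking the prediction can be sketched in a couple of lines. The ten output values below are invented for illustration; a real network would compute them.

```python
# Hypothetical output values for the digits 0 through 9 (made up for illustration).
outputs = [0.01, 0.02, 0.05, 0.70, 0.02, 0.10, 0.03, 0.04, 0.02, 0.01]

# The prediction is the position of the largest output value.
prediction = max(range(len(outputs)), key=lambda digit: outputs[digit])

print(prediction)  # 3
```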
How a Neural Network Learns: The Training Process
A neural network does not start out knowing anything. Its weights are initialized randomly — which means its initial predictions are essentially random too. Learning is the process of adjusting those weights until the predictions become accurate.
Here is how that works, step by step:
Step 1: Forward Pass
Data enters the input layer and flows forward through the network. Each node takes the weighted sum of its inputs, applies a mathematical function (called an activation function), and passes the result to the next layer. At the end, the output layer produces a prediction.
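Here is a minimal sketch of a forward pass through a tiny network. Everything specific in it is an assumption made for illustration: the sigmoid activation function, the layer sizes, the weight values, and the bias (an extra number added to each node's weighted sum, which many networks use but which is not covered above).

```python
import math

def sigmoid(x):
    """A common activation function that squashes any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer.

    weights[j][i] is the weight from input i to node j; biases[j] is node j's bias.
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(weighted_sum))
    return outputs

# Made-up numbers: 3 inputs -> 2 hidden nodes -> 1 output node.
x = [0.5, 0.1, 0.9]
hidden = layer_forward(x, [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]], [0.0, 0.1])
prediction = layer_forward(hidden, [[1.0, -1.0]], [0.0])
print(prediction)
```

Each call to `layer_forward` is one layer doing its work; chaining the calls is the data flowing forward.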
Step 2: Calculate the Loss
The network’s prediction is compared to the correct answer. The difference between them is called the loss (or error). A large loss means the prediction was wrong. A small loss means it was close. The goal of training is to minimize the loss across all training examples.
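One simple way to measure the loss is mean squared error, sketched below. It is only one of several loss functions in use, and the three-digit example and the guesses are made up to keep the numbers readable.

```python
def mean_squared_error(prediction, target):
    """Average of the squared differences between predicted and correct values."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

# The correct answer is "digit 1" out of the digits 0, 1, 2 (a shortened, made-up example).
target = [0.0, 1.0, 0.0]

good_guess = [0.1, 0.8, 0.1]   # mostly confident in the right digit
bad_guess = [0.7, 0.2, 0.1]    # mostly confident in the wrong digit

print(mean_squared_error(good_guess, target))  # small loss
print(mean_squared_error(bad_guess, target))   # larger loss
```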
Step 3: Backward Pass (Backpropagation)
This is the clever part. Using a technique called backpropagation, the network calculates how much each weight contributed to the error. Weights that contributed most to the error are adjusted the most; weights that had little effect on it are barely changed.
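To make "how much each weight contributed" concrete, here is a sketch for the smallest possible case: a network with a single weight. The sigmoid activation, the squared-error loss, and all the numbers are assumptions chosen for illustration. The result is checked against a slope estimated by nudging the weight a tiny amount.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_for(w, x, target):
    """Squared error of a one-weight 'network': prediction = sigmoid(w * x)."""
    return (sigmoid(w * x) - target) ** 2

def gradient_by_chain_rule(w, x, target):
    """Slope of the loss with respect to w, built by multiplying the slope of each step."""
    prediction = sigmoid(w * x)
    dloss_dpred = 2 * (prediction - target)    # loss = (prediction - target)^2
    dpred_dz = prediction * (1 - prediction)   # slope of the sigmoid function
    dz_dw = x                                  # z = w * x
    return dloss_dpred * dpred_dz * dz_dw

w, x, target = 0.5, 2.0, 1.0
analytic = gradient_by_chain_rule(w, x, target)

# Sanity check: estimate the same slope by nudging w slightly in both directions.
eps = 1e-6
numeric = (loss_for(w + eps, x, target) - loss_for(w - eps, x, target)) / (2 * eps)
print(analytic, numeric)
```

The two printed numbers should agree closely. A real network repeats the same multiplication of slopes, layer by layer, for every weight.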
The mathematical tool that makes this work is called gradient descent — an algorithm that repeatedly nudges each weight in the direction that reduces the loss. Imagine you are blindfolded on a hilly landscape and trying to reach the lowest point. You feel the slope under your feet and take a small step downhill. Then you check again and take another step. Gradient descent does exactly this — in a mathematical space with potentially millions of dimensions.
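The blindfolded walk can be sketched with a single weight and a made-up, bowl-shaped loss. The starting point, the step size (usually called the learning rate), and the number of steps are all hypothetical choices.

```python
def loss(w):
    """A made-up, bowl-shaped loss whose lowest point is at w = 3."""
    return (w - 3) ** 2

def slope(w):
    """The derivative of the loss: which way is uphill, and how steep."""
    return 2 * (w - 3)

w = 0.0               # starting guess
learning_rate = 0.1   # size of each step (a hypothetical choice)

for step in range(50):
    w = w - learning_rate * slope(w)   # take a small step downhill

print(w)  # very close to 3
```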
Step 4: Repeat — Thousands of Times
This process (forward pass, calculate loss, backpropagation, adjust weights) repeats across the entire training dataset, many times over. Each complete pass through the data is called an epoch. After enough epochs, the weights usually settle into values that let the network make accurate predictions, ideally even on examples it has never seen.
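The whole loop can be sketched with the simplest possible "network": one weight and one bias, with no activation function. The training data (following the rule y = 2x + 1), the learning rate, and the number of epochs are all made up for illustration.

```python
import random

random.seed(0)  # make the random starting values repeatable

# Made-up training data following the rule y = 2x + 1.
data = [(x, 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

# Random starting values, so early predictions are essentially guesses.
weight = random.uniform(-1, 1)
bias = random.uniform(-1, 1)
learning_rate = 0.02

def average_loss(weight, bias, data):
    """Mean squared error over the whole dataset."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)

loss_before = average_loss(weight, bias, data)

for epoch in range(2000):                        # one epoch = one full pass over the data
    for x, y in data:
        prediction = weight * x + bias           # forward pass
        error = prediction - y                   # how far off the prediction was
        weight -= learning_rate * 2 * error * x  # adjust the weight downhill
        bias -= learning_rate * 2 * error        # adjust the bias downhill

loss_after = average_loss(weight, bias, data)
unseen = weight * 10 + bias                      # x = 10 was never in the training data
print(loss_before, loss_after, unseen)
```

The last printed number is the prediction for an input the loop never trained on; under the rule above, the correct answer would be 21.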
Three Real Examples You Already Use
Face ID — Convolutional Neural Networks
When your iPhone unlocks as you look at it, a Convolutional Neural Network (CNN) is doing the work. CNNs are specially designed for visual data. Instead of connecting every node in one layer to every node in the next, they use small filters that slide across the image, detecting local patterns (edges, textures, shapes), which are then combined into increasingly complex features.
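A sliding filter can be sketched in plain Python. The tiny image and the hand-written edge filter below are made up for illustration; in a real CNN the filter values are learned during training, and this is not how Face ID itself is implemented.

```python
def slide_filter(image, kernel):
    """Slide a small square filter over a 2D image and record its response at each position."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# A tiny made-up image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-written filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

for row in slide_filter(image, vertical_edge):
    print(row)  # each row is [0, 2, 0]
```

The filter responds only in the middle column, which is exactly where the dark region meets the bright one.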
Your iPhone projects more than 30,000 invisible infrared dots onto your face, creates a 3D depth map, and passes that to a CNN that has been trained on millions of faces. The network outputs a single answer: this is or is not the registered user. It does this in under a second, even in complete darkness.
ChatGPT — Transformer Networks
Large language models like GPT-4 use a different architecture called a Transformer. The key innovation of Transformers is the attention mechanism — a way for the network to determine which parts of the input are most relevant when generating each word of the output.
When you ask ChatGPT “What is the capital of France?”, the attention mechanism figures out that “capital” and “France” are the most relevant words, and generates “Paris” accordingly. When you ask a more complex question, the same mechanism tracks relationships across entire paragraphs — which is why Transformers can handle long, nuanced text in a way that earlier networks could not.
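The core calculation behind attention can be sketched as follows, using the standard "scaled dot-product" form. The two-number word vectors are invented so that the example is readable; a real model learns much longer vectors during training, and this sketch is not how ChatGPT itself is implemented.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors, giving more weight to the most relevant words.
    blended = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return weights, blended

words = ["what", "is", "the", "capital", "of", "France"]
# Made-up 2-number vectors, one per word; a real model learns these.
keys = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [2.0, 0.5], [0.0, 0.2], [1.5, 1.5]]
values = keys
query = [1.0, 1.0]

weights, blended = attend(query, keys, values)
for word, weight in zip(words, weights):
    print(f"{word:8s} {weight:.2f}")
```

With these invented vectors, "capital" and "France" receive the largest weights, so they dominate the blended result.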
Spotify Recommendations — Collaborative Filtering with Neural Networks
Spotify’s recommendation system combines several approaches, including neural networks trained on listening behavior. The network learns to represent both songs and users as points in a high-dimensional mathematical space — where songs that are often listened to by the same people end up close together. When you listen to a song, Spotify finds songs nearby in that space and recommends them.
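The idea of "nearby in that space" can be sketched with a toy catalog. Everything here is hypothetical: the song names, the three-number positions, and the use of cosine similarity as the measure of closeness. It illustrates the general idea only and is not Spotify's actual system.

```python
import math

def cosine_similarity(a, b):
    """Close to 1.0 when two vectors point the same way, near 0 when unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical song names and 3-number positions, invented for illustration.
songs = {
    "Acoustic Morning": [0.9, 0.1, 0.0],
    "Quiet Guitar": [0.8, 0.2, 0.1],
    "Club Anthem": [0.0, 0.9, 0.4],
    "Bass Drop": [0.1, 0.8, 0.5],
}

def recommend(just_played, catalog):
    """Return the other song whose vector is most similar to the one just played."""
    others = [name for name in catalog if name != just_played]
    return max(others, key=lambda name: cosine_similarity(catalog[just_played], catalog[name]))

print(recommend("Acoustic Morning", songs))  # Quiet Guitar
print(recommend("Club Anthem", songs))       # Bass Drop
```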
What Neural Networks Cannot Do
Neural networks are remarkably powerful — but they have real limitations that are important to understand.
They do not understand. A neural network that can describe the content of any photograph does not “see” in the way you do. It has learned statistical patterns in pixel values. When those patterns are unusual, such as in a photograph taken from an angle the network has never seen, the network can fail, sometimes spectacularly. Images deliberately crafted to fool a network are called adversarial examples, and they suggest that what looks like understanding is actually very sophisticated pattern matching.
They require enormous amounts of data. A child can learn to recognize a cat from a handful of examples. A neural network typically requires thousands or millions. This data hunger is one of the central challenges of modern AI — and one reason why the companies that control large datasets have such a structural advantage in AI development.
They are difficult to interpret. In a network with millions of weights, it is not possible to read off a simple explanation of why a particular prediction was made. This “black box” problem is one of the most active areas of AI research, with significant implications for safety, fairness, and accountability.
They reflect whatever is in their training data. If the data contains biases — and it almost always does — the network learns and reproduces those biases. A face recognition system trained predominantly on certain demographic groups performs worse on others. A language model trained on internet text absorbs the full range of human expression, including its worst aspects.
The Mathematics Underneath
Neural networks are built on mathematics that students can learn — and that we teach at CyberMath Academy. The core tools are:
- Linear algebra: Weights and activations are organized as vectors and matrices. The forward pass through a neural network layer is a matrix multiplication followed by a nonlinear function. Understanding matrix operations is essential to understanding how neural networks actually compute.
- Calculus: Backpropagation is an application of the chain rule, the rule for differentiating composite functions. Every time a neural network adjusts its weights, it uses the partial derivative of the loss with respect to each weight.
- Probability and statistics: Loss functions, activation functions, and the interpretation of neural network outputs all draw on probability theory. A softmax output layer, for instance, converts raw scores into a probability distribution.
- Optimization: Gradient descent and its modern variants (Adam, RMSprop, AdaGrad) are the algorithms that actually perform the learning. Understanding why they work — and when they fail — requires understanding the geometry of high-dimensional spaces.
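For readers who want to see the notation, the four tools above can each be written in one line. The symbols are a notational choice made here for illustration: W for a layer's weight matrix, x for its inputs, b for an optional bias, σ for the activation function, L for the loss, ŷ for the prediction, and η for the learning rate.

```latex
% Linear algebra: one layer's forward pass is a matrix multiplication, then a nonlinear function
a = \sigma(W x + b)

% Calculus: the chain rule for a single weight w, where z = w x and \hat{y} = \sigma(z)
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial z} \cdot \frac{\partial z}{\partial w}

% Probability: softmax turns raw scores s_1, \dots, s_n into probabilities that sum to 1
p_i = \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}

% Optimization: the gradient descent update with learning rate \eta
w \leftarrow w - \eta \, \frac{\partial L}{\partial w}
```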
None of this requires a university degree. Students aged 12 and above with solid foundational mathematics can begin to understand all of it — and at CyberMath Academy, they do.
What Students Build at CyberMath Academy
In our AI and Machine Learning track at Harvard Faculty Club, Boston, MA (July 20–31, 2026), students do not just learn about neural networks. They build one.
Starting from the mathematical foundations — linear algebra, probability, gradient descent — students work up to implementing a neural network from scratch. By the end of the two-week program, students have trained real models on real datasets. Some have built image classifiers. Others have worked on natural language processing tasks. A few have applied neural networks to problems in medicine, environmental science, or music.
No prior coding experience is required. What is required is mathematical curiosity and the willingness to think carefully about hard problems. Our instructors — including members of the Google Brain team and Harvard Medical School researchers — guide students through every step.
“I had no idea that the math I was learning could be used to fight cancer. Now I want to study computational biology.”
— CyberMath Academy student, Summer 2024
Want to Test Yourself?
Every week on our Instagram (@cybermathacademy), we post an AI or math quiz — questions exactly like the ones explored in this article. This week’s quiz: What inspired neural networks? Drop your answer in the comments.
Apply for Harvard · Boston — July 20–31, 2026
Follow our weekly AI quizzes: @cybermathacademy · cybermath.org