This document will explain what neural networks are and how they work, which will help you understand how AI and machine learning work. In the scenario below you'll play the part of the neural network.
First day of your new job as a "classifier" your boss walks in and drops a big spreadsheet of numbers on your desk.
"Is this a cat?" she asks.
Confused you ask "What?"
"Is this a cat?"
Even more confused, you respond "I don't know".
"Wrong!" she says, and slaps you across the face.
Before you've had a chance to be shocked she drops another large spreadsheet on your desk. "Is this a cat?"
"I don't understand" you respond.
"Wrong!" she says, and slaps you again.
Another spreadsheet. "Is this a cat?"
Not wanting another slap, you meekly respond "Yes?"
"Correct! Good job!" she says, and gives you a wonderful reward. Almost makes up for the slaps.
Another spreadsheet. Cat?
Slightly more confident and wanting another reward you respond "Yes"
"Wrong!". Another slap.
You are very confused. "I don't understand what's going on. You haven't told me the rules, you haven't told me how to figure out something is a cat, you haven't given me the logic to figure this out. You haven't trained me."
"Correct" says the boss. "This is training. Trust me, you're going to get very good at recognizing cats"
Another spreadsheet. This time you focus on the sheet. It's 256 columns and 256 rows, filled with numbers. Is it a cat? You don't know, so you guess.
Another and another. Right, wrong, wrong, right again, it keeps going. Slaps and rewards.
You're starting to notice some patterns - if the sheet is almost all zeros then it's not a cat. You look at the center of the sheet - that part should have larger numbers.
It's getting slightly better - you're getting more rewards than slaps, guessing correctly more often than not. It's all about the patterns of the numbers.
This is the strangest job you've ever had. The slaps are terrible, but the rewards are so great you don't want to quit. How do you get better at this? It's not really possible for one person. If you had more people they could focus on different aspects of the sheet, look for different patterns, and you could use their findings to make better guesses.
You hire 10 people and bring them to work the next day. When you get a spreadsheet you show it to those 10 people and ask them "Is this a cat?"
They are as confused as you were. You force them to guess. The first guy looks like an idiot, so you decide to go with the opposite of what he says. The third lady looks really thoughtful so you put a lot of weight on what she says. In your mind you assign a weight to each of their guesses to come up with your final answer.
Each time you get a reward or punishment you share it with the 10 people you've hired: you reward or slap them based on how much weight you put on their input and how much they contributed to you getting the answer right or wrong. You learn the patterns: if the first guy says a strong no and the third lady a strong yes then it's very likely a cat. You learn many more patterns like this.
The 10 people are learning to hone their opinions based on the rewards and punishments they get. They can focus on different aspects of the sheet, and their opinions together can tell you a lot about the sheet.
After what feels like an epoch and a lot of spreadsheets, eventually you get better. You're getting more rewards. This is working. Your boss says you're very perceptive, and starts calling you Perceptron.
What if you get more people involved? You could make it 50 people reporting to you, but that'd be hard to manage. How about we add another layer of people before your 10?
You hire 50 more people, have them give their guesses to the 10 people that report to you. You're now very removed from looking at the spreadsheets - instead you rely on the patterns found by the first layer of the people, who give their opinions to the second layer of the people, who then inform you. Everybody passes the slaps and rewards down the line based on how much each earlier person's guess contributed to their guess.
It takes even longer, but eventually the system starts working. You're much more accurate in identifying when it's a cat.
Interestingly you were never taught the rules or logic, and you didn't teach your people the logic or rules. You just propagated the rewards and punishments back through each layer: the more each person's opinion contributes to the answer, the more rewards or punishment you shared with them, and they in turn with the people in the layer behind them. Each person uses the same method to propogate their rewards and punishment back to the people in the layer below them.
This is how neural networks work: they see many examples and get rewarded or punished based on whether their guesses are correct. They use multiple layers of workers and eventually learn patterns. Importantly no one is teaching them what patterns they should be looking for or telling them the logic or the rules - the networks eventually figure out the patterns and logic based on very many rounds of example, reward, and punishment. This is called machine learning, because the machine is learning the rules by itself.
Neural networks work well when you have many examples of something (eg. pictures of cats), but it's hard to write the logic and rules to describe how to recognize that thing. Try it - write down some rules for how to recognize a cat (eg. "has 4 legs"), then look at pictures of cats and see where the rules fail (eg. a picture of a cat's head). In these cases you use machine learning so the machine learns the rules by itself.
Also note that computers see things as multi-dimensional tables of data. They don't look at a "picture" - they see 3 spreadsheets of numbers representing the RGB values of the picture.
Identifying cats is a lucrative business and you're pretty good at it. But not good enough. How can we get even better? More people, more layers!
Unfortunately it takes a long time for people to do their calculations, so you've been stuck with just a few layers of people for a while. Having a lot more people would make the process take too long.
One day you run into a group of people who call themselves Gaming People United (GPU). These people play a game that has taught them to be very good at looking at spreadsheets and calculating numbers - in fact they can do a lot of calculations in parallel, very quickly.
Excited, you hire a bunch of them and put them to work recognizing cats. Now instead of two layers of people you can have 10! Each layer can focus on higher and higher concepts - the first layer can look for small details (eg. do I see a pattern that looks like a small circle? Do I see a pattern that looks like a sharp edge?), the second layer can look for patterns from the results of the first layer (eg. are there two circles close to each other), the third layer can build on that (eg. are there two circle close to each other, and a triangle below them), and so forth. By the time the results of the layers get to you you have some fairly sophisticated concepts - we see a pattern that looks like a face, we also see patterns that could be legs, and a sharp thing that might be a beak. Now your guess is more informed than ever.
You train the layers sending slaps and rewards down through each layer, and after many epochs you get really good at recognizing cats.
One of the major break-throughs in machine learning was the advent of Deep Learning, which is basically what we describe above - GPUs (Graphics Processing Units) got popular because they enable fast 3D graphics for games. They also happen to be very good at quickly doing the types of calculations neural networks need. Machine learning people started using GPUs for training neural networks, and with this extra speed they could have many more layers - from 3 or 4 layers to 10 or 11, and then hundreds.
This addition of layers led to significant advances in neural network performance - very quickly neural networks became the best solution for voice recognition, for image recognition, image creation, and sophisticated language models. This is called Deep Learning because there are so many layers, not because it's profound.
You can stop reading here if you want, the following portion is only here because someone asked me what "convolutional neural networks" are.
How can we make the process even more efficient? You start to realize that looking at the entire sheet is too hard - the first layer of people literally have to stare at the whole thing and try to guess based on that. What if we give them a small portion of the sheet to look at, give a guess for that portion, then move on to the next portion of the sheet, and so on? The first layer of people are focused on lower level concepts anyway - looking for edges, things that look like circles, and so forth - and they can find those in the small portions of the sheet, without the need to look at the whole thing at once.
You tell the first layer of people this is what they should start doing. They respond that this sounds very convoluted and they'll only agree to do it if you call them "colonel".
This makes the process even faster and more accurate: the first layer become specialists in small features of the sheet, and they learn to be very efficient and fast. You try the same idea for the next several layers: you tell people to focus on small portions of the feedback they get from the layer below them, moving the portion they focus on so eventually the scan across all of the feedback.
This specialization and focus makes you even more accurate. Congratulations, you are officially recognized as the best cat classifier in the world.
Convolutional Neural Networks
Convolutions are this idea of applying a specific set of calculations, sometimes called a kernel, to each portion of the input, and scanning your window of attention across the entire image. The set of calculations, or kernel, is learned by each worker - you don't tell it what to calculate or how, it learns what's useful based on rewards and punishments. Convolutional Neural Networks (CNNs) were a significant step forward in the capability of neural networks.
If you're interested in this topic you might also enjoy the other posts in the Non-Technical Explainer series as well.