A couple of months ago I wrote this post that predicted:
Current techniques for Machine Learning are not going to produce human intelligence
That was made after week three of the Intro to Machine Learning course at Coursera I was taking. Well, I finished the course last week, and in the final video, Professor Ng told me that I should now “consider myself an expert in Machine Learning”. Hmm – I sure don’t feel like one. But anyway, now that I am (supposedly) an expert, I thought I might check my expert opinion and see if any of the extra things I have learnt make any difference to that prediction. Especially since one of the things I learnt was “there is a LOT more to learn about machine learning.”
The two reasons I gave for this prediction were that current ML techniques require too much maths and too many training examples.
Too Much Maths
My initial prediction said that current ML techniques were unlikely to replicate human level intelligence since our brains don’t indulge in the sort of mathematical number crunching found in ML. This also goes for neural networks (covered in weeks four and five), a technique that supposedly mimics the human brain in terms of the number of input/output nodes and the connections between them. From what I understand and have seen of neural nets so far, that is the only thing they have in common. Neural nets have even more calculations in them than the ‘basic’ regression models I learnt about first. The underlying mechanisms (chemical vs numerical) and the forward/backward nature of the algorithm etc. appear to be totally different from what is likely going on in my brain right now. But as I have said elsewhere, I have not studied neuroscience.
Regardless, I stand by my observation that ML does way more maths than the living brain of any living creature.
But is that actually relevant? If we put aside the spiritual side of consciousness (and that is actually quite a big ‘if’), then if ML can simulate the complex chemical processes in an organic brain using mathematical models, then why could that not actually be good enough to at least simulate organic intelligence?
Too Many Training Examples
I also said that good ML models require way more training data than human beings do to make accurate predictions. Therefore, current ML techniques can’t replicate human intelligence.
But is it actually true that humans don’t get lots of training data? Sure, I don’t need to formally look at 5,000 photos labelled with whether they are of a pair of shoes or not in order to be able to tell whether the 5,001st photo is something I might put on my feet. But ever since the day I first opened my eyes in my mother’s womb (not that there was likely much of great interest there to look at) I started receiving “training data.” The same goes for all five senses – but let’s just look at the visual stuff for now.
Can we quantify how much visual training data we receive? It doesn’t appear as if you can accurately compare the images that human eyes receive in the same simple terms of mega-pixels/resolution or frame rate that you can use to describe digital video. But even what is likely a serious underestimation of how many images we see by the age of five (assuming a frame rate of 1 FPS and 10 megapixel images for the 12 wakeful hours each day) comes to about 80 million high resolution, full colour, stereoscopic training examples. And that is just the visual input for the first five years. As long as we have the use of our senses and brains, we never stop receiving or processing this “training data”.
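The back-of-the-envelope figure above is easy to check. This is only a sketch of that arithmetic: the frame rate, waking hours and age are the deliberately conservative assumptions stated in the paragraph, and the variable names are mine.

```python
# Rough count of "visual training examples" seen by the age of five.
FRAMES_PER_SECOND = 1        # assumed, far below what the eye really delivers
WAKING_HOURS_PER_DAY = 12    # assumed
DAYS_PER_YEAR = 365
YEARS = 5

frames_per_day = FRAMES_PER_SECOND * 60 * 60 * WAKING_HOURS_PER_DAY
total_frames = frames_per_day * DAYS_PER_YEAR * YEARS

print(f"{total_frames:,} frames")  # 78,840,000 frames, i.e. roughly 80 million
```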
(Note that an interesting counter-example to this would be that of creatures with life-spans measured in days rather than years. Insects can “know” stuff pretty much from birth/hatching. If “instinct” is the biologist’s word for this, could a computer scientist use “pre-loaded neural net?”)
The simple reason why I don’t need to see 5,000 examples of a “new” thing in order to recognise it again may be that somewhere within my gazillion other mental training examples, I have already seen something else similar enough to connect it to. But when a neural net starts learning, it really is starting from zero. Maybe only 5,000 examples is actually pretty good going! It would be interesting to see empirical data comparing the accuracy, but more interestingly, the rate of learning new things between two neural networks that differ only in the amount of training data they have received.
So, was my Prediction Wrong?
So where does that leave us? Since there are arguably good rebuttals to my two criteria for tossing out the ML techniques I know about as ultimately leading to human level AI, was my prediction wrong? I don’t think so. And that’s because I think I forgot a third reason (and probably others) for making that prediction. It was hinted at in my discussion of babies learning to recognise things.
Computers need to be programmed with specific learning algorithms, and currently can only really learn one thing well (e.g. playing Go, recognising shoes, etc.). Choosing, setting up, debugging, tuning and actually using these algorithms is, at this stage of the game at least, more of a black art than a science, and it absolutely requires human experience and intuition. But my mother didn’t need to “tune” me to learn.
In an entertaining book rebutting Steven Pinker’s “The Language Instinct”, Geoffrey Sampson says that human beings do not have a language instinct that allows us to learn language so quickly. Instead, we are far luckier: we have a learning instinct that we can (and do) apply to anything and everything, language included. Here is a short summary of the arguments – bad language and all, but there are others.
It is no surprise then to discover that a lot of people are working on discovering some kind of general purpose learning algorithm. Here’s an article about Professor Ng himself and the hunt for this elusive algorithm.
If we are able to program this learning instinct, we might be in more of a position to really simulate aspects of human intelligence.
Baby Image: Betty Wills, Wikimedia Commons, License CC-BY-SA 4.0
I remember a number of years ago when I was contemplating the sad reality facing supervised learning: The need for so much training data… and then I realized that reality provides an “unlimited” amount of very relevant, free training data, much as you describe here. Namely, if you create a learning system that tries to predict what will happen next, then you can take your sensations about what happens next and compare that to your prediction. If your prediction came true, then you got the “training example” right. If your prediction did not come true, you got the “training example” wrong.
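A toy illustration of that idea, under loud assumptions: the "sensations" are a plain sine wave, the learner is a two-weight linear predictor, and the function name `learn_to_predict_next` is made up for this sketch. The point is only that nobody labels anything; whatever the stream does next is the answer, and the prediction error drives the learning.

```python
import math

def learn_to_predict_next(stream, window=2, learning_rate=0.05, passes=50):
    """Learn weights that predict the next value from the previous `window` values.

    No labels are supplied: the "answer" for each example is simply
    whatever the stream does next.
    """
    weights = [0.0] * window
    for _ in range(passes):
        for i in range(len(stream) - window):
            past = stream[i:i + window]
            actual_next = stream[i + window]
            predicted_next = sum(w * x for w, x in zip(weights, past))
            error = actual_next - predicted_next  # how wrong the prediction was
            # Nudge each weight in proportion to the error (a least-mean-squares update).
            weights = [w + learning_rate * error * x for w, x in zip(weights, past)]
    return weights

# Toy "sensation": a sine wave sampled once per time step (an assumed stand-in).
stream = [math.sin(t) for t in range(200)]
weights = learn_to_predict_next(stream)

# Compare one prediction against what actually happened next.
predicted = sum(w * x for w, x in zip(weights, stream[100:102]))
print(weights, predicted, stream[102])
```

A sine wave happens to be exactly predictable from its two previous samples, so the error shrinks towards zero here. Real sensory streams are nowhere near that tidy.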
Thus, I suspect that babies have a pretty amazing supply of training data.
Also, as you were positing, the speed at which something can learn is a function of how rich a foundation it has of smaller but relevant features to draw from. To give an example, toolkits like the Wolfram Language allow you to create an image classifier where you might feed it 20 examples of a dog and 20 examples of a cat, and it will create a classifier for you that will distinguish between them, even if it had never previously seen a picture of a cat or a dog before. The way it works is that you first train a neural network on many hundreds of thousands of images. You then chop off the last layer(s) of the neural network. What you get is a new neural network that has perhaps hundreds of “outputs”, and those outputs are high level features that it found useful in distinguishing common objects. To no big surprise, you can then train a new network on your cat/dog examples as follows: For each image, you run it through the top-chopped-off network and collect its outputs. Those outputs are then the inputs to some new network that you will train. That new network might actually be very simple, maybe only a single layer, and it will have a huge headstart in classifying dogs and cats as compared to training a neural network from scratch.
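The chop-off-the-top recipe described above can be sketched in plain NumPy. Everything here is an assumption made for illustration: noisy 2-D points stand in for images, the "big" network has a single hidden layer, and the helper names (`make_blobs`, `pretrain_base`, `extract_features`, `train_head`) are invented. It is not how the Wolfram Language or any other toolkit implements this; it only shows the mechanics of freezing a pretrained network, discarding its last layer, and training a small new layer on 20 + 20 examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_blobs(centres, n_per_class, noise=0.5):
    """Toy stand-in for images: noisy 2-D points around one centre per class."""
    X = np.vstack([np.asarray(c) + noise * rng.standard_normal((n_per_class, 2))
                   for c in centres])
    y = np.repeat(np.arange(len(centres)), n_per_class)
    return X, y

def pretrain_base(X, y, n_hidden=16, lr=0.2, steps=300):
    """Train a one-hidden-layer softmax classifier; return only the hidden layer."""
    n_classes = int(y.max()) + 1
    W1 = 0.5 * rng.standard_normal((X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = 0.5 * rng.standard_normal((n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                     # one-hot targets
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)                 # hidden "features"
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)        # softmax probabilities
        d_logits = (P - Y) / len(X)
        d_hidden = (d_logits @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ d_logits)
        b2 -= lr * d_logits.sum(axis=0)
        W1 -= lr * (X.T @ d_hidden)
        b1 -= lr * d_hidden.sum(axis=0)
    return W1, b1                                # W2, b2 (the last layer) are discarded

def extract_features(X, W1, b1):
    """Run inputs through the frozen, top-chopped-off network."""
    return np.tanh(X @ W1 + b1)

def train_head(F, y, lr=0.5, steps=500):
    """Single-layer logistic regression on top of the frozen features."""
    w, b = np.zeros(F.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(F @ w + b)))
        w -= lr * (F.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# 1. Pretrain on a large, unrelated four-class task.
X_big, y_big = make_blobs([(2, 2), (2, -2), (-2, 2), (-2, -2)], n_per_class=1000)
W1, b1 = pretrain_base(X_big, y_big)

# 2. New task: only 20 "cats" and 20 "dogs", at centres the base net never saw.
X_small, y_small = make_blobs([(0, 2.5), (0, -2.5)], n_per_class=20)
F_small = extract_features(X_small, W1, b1)
w, b = train_head(F_small, y_small)

# 3. Check the new head on fresh examples of the new task.
X_test, y_test = make_blobs([(0, 2.5), (0, -2.5)], n_per_class=200)
scores = extract_features(X_test, W1, b1) @ w + b
accuracy = np.mean((scores > 0) == (y_test == 1))
print(F_small.shape, f"held-out accuracy: {accuracy:.2f}")
```

The toy problem is far too easy to demonstrate the head start itself. What it shows is that the only thing trained on the small dataset is the final single layer.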
Presumably a person’s mind is quite like this as well. As I suggested on one of your first blog posts, one of the keys to learning complex patterns is to first learn simple patterns and then compose them into more complex ones. In the same way, a human can probably use their existing database of patterns to quickly create a new “pattern” for some novel object they come across. The “representation” of that new concept is thus quite efficient to store in memory, because it’s “just” a composition of simpler patterns.
I suspect the way forward for machine learning, as many others do as well (I think), is for us to figure out how to make really good models that can predict the next “frame”. In the image domain, that’s the next image frame. In the audio domain, that’s the next portion of audio, etc. That form of unsupervised learning will have free, unlimited, high-quality training data at its disposal, which will breathe new life into ML. (No more paying low-wage workers cents per hour to laboriously label training data, and still not having nearly enough of it.)
I should amend what I just said: The predict-the-next-frame approach to learning isn’t saying it quite right. That assumes our end goal is to predict the input layer of the next frame that will be fed to the network. That’s fine, but what I’m saying in general is that for every node in the network, not just the bottom layer, we want to make a prediction. So if there’s a very high level node in the network that lights up if there’s a cat in the frame, then we want to make a prediction whether the next frame will also contain a cat in the frame. If there’s a neuron that lights up when there’s a loud “bang” sound detected, then we’ll want to predict that lighting up when a dish is about to smash on the ground, etc, etc, etc.
And again, we’ll start by learning how to predict low-level features, and once we get good enough at those, we’ll be able to use those to form higher level features, and as we form higher level features, we’ll try to predict those, and up and up we’ll go learning richer and richer patterns.
To some degree, I think that’s what humans call “wisdom”. When we look at someone who has some decades under their belt and we say they’re “wise”, I think we’re saying that their mind has been successful at building quite a tall tower of patterns, and the patterns at the top of that hierarchy are quite special and useful, and distinguish them from the less wise people around them. Some patterns may take many decades to develop in people: we just haven’t seen enough training examples for the sub-patterns they are composed of by the time we’re 20 to really form that higher level pattern. But eventually the lights go on: we have that “aha” moment when we’re 30, or 40, or 70, and “get it” in a way we couldn’t before.
As for whether there are too many “maths” in the current ML approaches, I have mixed feelings on that. Each node is just connected to its inputs and its outputs, and the maths that are computed when the network is in use are quite simple: just take the inputs, multiply them by weights, add up the results, and we have our core output signal. Maybe nature really does something analogous to that. But the experts seem to agree with you: there is evidence that backpropagation isn’t how the human brain learns.
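For concreteness, the per-node arithmetic at inference time is a sum of products. A minimal sketch, with made-up input and weight values and a hypothetical helper name `neuron_output` (real networks usually also pass the sum through a non-linear activation, which is left out here):

```python
def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs: the core signal of one node."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

inputs = [0.5, -1.0, 2.0]     # made-up example values
weights = [0.4, 0.1, 0.25]    # made-up example values
signal = neuron_output(inputs, weights)
print(signal)  # about 0.6  (0.2 - 0.1 + 0.5)
```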
For me, the biggest thing that current deep learning is missing (besides the whole consciousness side of things) is something analogous to discrete human thought. Ironically, discrete human thought feels a bit more like a semi-serial computer execution scheme where there’s a “train of thought” that is mutated over time. There may be lots of parallel processing going on, but the thought process seems quite linear, or perhaps tree like. It has a certain cohesiveness that feels very different than the way a neural network goes about its work.
I’ve long held a suspicion that an intelligent system will be quite a bit more capable if it’s able to synergize the massively parallel processing of neural nets, which seems to me very much like the human subconscious (super powerful at feeling patterns / intuiting things), with a more discrete form of computation somewhat like how computers execute a program, something that is:
– More discrete
– More introspectable
– More “linear”
This thing looks up memories, considers possibilities / branches, backtracks, tries to listen to the stream of nudges from the subconscious, etc, etc.
If we can master unsupervised machine learning that can build up larger and larger concepts from watching a stream of video + audio + touch, and we can synergize that with a more discrete module that can be built on top of that network that allows for something akin to a “continuous train of thought”, then I feel like, over the decades, we might arrive at something much more analogous to a human mind than today’s deep neural networks alone.
And after all that, there’s still the deep mystery of consciousness, and how much that aspect of the human mind affects the behavior of the human as compared to a machine processing the same inputs.
Thanks for verifying my hunch about being able to build upon existing training examples with only a little bit more information. It wouldn’t have crossed my mind to chop off the last layer of a neural net to do that, but I guess it makes sense. You have an interesting take on wisdom… It could be accurate. Something I realised after hitting “post” that I didn’t touch on – but you did – was the nature of trial and error learning in humans. I think we learn a lot more from our mistakes (‘mis-classifications’ in ML parlance) than the ML algorithms I have seen do. Dropping new data into the ML systems I have seen will gradually change them – whereas a human can know pretty quickly when a situation is completely different and “everything I know is wrong”. Or equally when this is just a special case to be treated differently – or not. We may even judge special cases wrongly, but we are still aware of the phenomenon and can (try to) respond accordingly. This is quite a big deal for learning. In terms of the maths thing, yes, each calculation may be mathematically simple (summing a whole bunch of products), but you try doing it all in your head! Maybe our brain can reach the same sort of results chemically, but it’s certainly not maths. Thanks for your insightful comments as always.
Your point about reacting to mis-classifications — what comes to mind for me there is that it’s the degree to which our prediction was incorrect that grabs our attention and provides “energy” for reorganizing the system.
One interesting illustration I read about a couple of years ago: Some math genius posited that the brain is somewhat like a mountain covered in snow. When additional snow falls, it occasionally creates a little cascade of sliding snow, until the system finds a new equilibrium. Sometimes though, that cascade triggers a bigger cascade, and sometimes an all-out avalanche. Turning to the brain, imagine if the brain consisted of many teeter-totters of a sort, each encoding its degree of belief about something. Now imagine enough evidence is accumulated to cause one teeter-totter to flip to its opposite side. That change of state might be enough to disturb the teeter-totters nearby, and if they change state, you could have an all-out avalanche of state change. (Another illustration: a sea of mousetraps, where one mousetrap going off could set off a cascade.) I’ve wondered if the feeling of “epiphany” might be somehow analogous to one of these cascades of teeter-totters flipping state once some tipping point in the mind has been reached.
Within that thought experiment, the notion of being able to react effectively to some radically failed prediction might be somewhat like the cascade of teeter-totters. The degree of failed prediction may act like a kind of “energy” causing a larger and more cascading series of state changes in the mind.
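Here is a toy version of that thought experiment. It is entirely my own assumption about how such teeter-totters might be modelled, not the model from whatever I originally read. Each teeter-totter holds some accumulated evidence, flips once the evidence reaches a threshold, and nudges its immediate neighbours when it does. The function name `cascade_size` and all the numbers are made up.

```python
from collections import deque

def cascade_size(evidence, threshold, nudge, start, push):
    """Add `push` evidence to one teeter-totter and count how many end up flipping."""
    evidence = list(evidence)              # work on a copy
    flipped = [False] * len(evidence)
    evidence[start] += push                # the "failed prediction" arrives here
    queue = deque([start])
    flips = 0
    while queue:
        i = queue.popleft()
        if flipped[i] or evidence[i] < threshold:
            continue                       # already flipped, or not enough evidence yet
        flipped[i] = True
        flips += 1
        for j in (i - 1, i + 1):           # neighbours on a line
            if 0 <= j < len(evidence) and not flipped[j]:
                evidence[j] += nudge       # a flip disturbs the teeter-totters nearby
                queue.append(j)
    return flips

# Most beliefs are close to tipping; the one at index 3 is held firmly.
beliefs = [0.8, 0.8, 0.8, 0.2, 0.8, 0.8, 0.8, 0.8]

small_surprise = cascade_size(beliefs, threshold=1.0, nudge=0.3, start=5, push=0.1)
big_surprise = cascade_size(beliefs, threshold=1.0, nudge=0.3, start=5, push=0.3)
print(small_surprise, big_surprise)  # 0 4
```

A small surprise changes nothing, while a slightly larger one flips four teeter-totters before the cascade is stopped by the firmly held belief.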
As for summing a bunch of products and “try doing it all in your head”, I actually wonder if that is one of the advantages of the brain being a physical system. My analogy here is the foundation of your house. For the foundation to “figure out” how much to push down on the ground, it has to calculate the sum of billions of things. I’m sitting here on the third floor on my chair, and my chair is pressing into the floor. So it has to take me into account. And the desk in front of me. And the fridge on the first floor. etc, etc, etc. (down to the level of atoms) It blows me away how many “calculations” reality has to perform every millisecond to know how much the foundation of my house needs to press down on the soil below.
Could the brain leverage something like that? Does electricity act somewhat like that? If you have five wires converging and wire A has 0.4 units of “electric pressure” and wire B has 0.1 and wire C has …, does nature automatically sum the contributions of all of them? I suspect so. Maybe the brain is able to do a crazy number of “additions” per millisecond simply because it is a physical system and physics does all of those additions for free.
Whether something is “maths” or “physics” feels somewhat arbitrary to me.
The teeter totter analogy is interesting. Thanks. I agree with you about the physical nature of things. What I was trying to say is that clearly we don’t consciously do all that maths in our heads, but maybe something similar happens at the physical level involving the various biochemical and electrical properties of our brains, obviating the need for a calculator and any ‘conscious’ calculation. If you put a certain concoction of chemicals in a certain part of a certain brain, the laws of chemistry and electricity will do their ‘physical’ thing ‘for free’ – in just the same way that putting certain electrical charges through a certain piece of silicon results in another kind of physical system doing its thing. One physical system (e.g. wheels, knives) can replicate (even if only partially) the result of a totally different physical system (e.g. legs, teeth). We have been using technology for that since the stone age. The question is whether or not current ML algorithms are in fact closely replicating what happens in the brain at a generalised level. I think the answer is ‘no’ – but it’s not because they use maths and the brain uses chemicals (as per my post 2 months ago). I now say it’s because the algorithms are totally different.
Have you played with ChatGPT? At https://chat.openai.com/chat I asked it to “write a gospel song” which it did with 2 verses, chorus, bridge and outro. I then asked it to “write the music for the gospel song you just wrote” and got the response “As an AI language model, I don’t have the capability to compose music. However, I can provide you with some general guidance on how you could approach composing the music for the gospel song I wrote.
First, you could decide on the tempo and time signature of the song. A moderate tempo such as 80-100 bpm could work well for a gospel song. Common time (4/4) or cut time (2/2) could be suitable time signatures.
Next, you could think about the instrumentation you would like to use. A gospel choir, piano, guitar, drums, and organ are common instruments used in gospel music.
Once you have decided on the instrumentation, you could begin to experiment with chord progressions that suit the song’s message and mood. Gospel music typically uses major and minor chords, as well as chord progressions such as the 1-4-5 progression or the 1-6-2-5 progression.
From there, you could start to compose the melody and lyrics to fit the chord progression and tempo. You may want to experiment with different harmonies, rhythms, and melodies until you find the right combination that brings the gospel song to life.
Remember, gospel music is all about expressing faith, hope, and love through music. So, let the spirit move you and guide you as you compose the music for your gospel song.”