ChatGPT: Machine Learning Defined, No Hype

The truly massive impact of ChatGPT has resulted in a tidal wave of very confident confusion, commercially motivated hyperbole, and the evocation of every cautionary science fiction story from the past 80 years.

We have prominent journalists trying to have two-hour conversations about life with it and acting shocked when it says something silly. We have unsurprising claims about how it will help everyone get rich quick. And we have people claiming it will ruin the economy or destroy the world.

What is it really?

ChatGPT is a large language model (LLM). You might read the definition and still not understand, so here is what it does in very basic terms.

  1. Someone ‘trains’ the model by feeding it a large body of text. (e.g., every Wikipedia article)
  2. You ask the model to generate some kind of response. (What is Jello?)
  3. The model processes your query and generates a prediction for each next word in its response. (Jello…is…a…dessert)
  4. This prediction is based on the patterns and relationships between words that the model has learned from the training data. (The word ‘dessert’ frequently appears in text about Jello.)

In many ways, LLMs are a kind of smart auto-complete. They are trained on how we use language and learn by inference which words and topics relate to one another.
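To make that concrete, here is a toy sketch in Python. This is not how GPT models actually work, since they use neural networks trained on billions of words rather than simple counts, but it captures the pattern-completion idea: learn which words follow which, then predict the most likely next word. The training text and function names are invented for illustration.

```python
from collections import Counter, defaultdict

def train(text):
    """Count, for each word, which words tend to follow it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(model, word):
    """Return the next word seen most often after `word` in training."""
    options = model.get(word.lower())
    return options.most_common(1)[0][0] if options else "?"

model = train("jello is a dessert . jello is a dessert made with gelatin .")
print(predict_next(model, "Jello"))  # -> is
print(predict_next(model, "a"))      # -> dessert
```

Everything the toy model "knows" comes from those counted patterns; the same is true, at enormous scale, of an LLM.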

Potential Confusion

So imagine you train one language model only on Wikipedia articles and another only on tabloid articles. Neither model understands the concept of truth. Neither model is able to provide insight into the quality or honesty of the writing in either source. All they are doing is forming sentences and paragraphs that mimic the structure of their source material.

But now let’s consider the result and the impact of its use. Imagine for the moment that we don’t know which model we are using. If we asked a question of the model trained on Wikipedia, we might come to believe that language models are very smart. In contrast, a question asked of the tabloid model might lead us to dismiss the technology as useless or even dangerous.

In other words, the model is only a reflection of what it has been fed. So feed it a lot of the internet, including plenty of human nonsense, and it’s going to give you nonsense at times. It is no more or less than a reflection of what it has learned.
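The toy model from earlier makes the point. Train it on two invented one-line ‘corpora,’ one encyclopedic and one tabloid, and each happily completes the same prompt with whatever it was fed. The source texts here are made up for illustration.

```python
from collections import Counter, defaultdict

def train(text):
    """Count which word follows which in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for cur, nxt in zip(words, words[1:]):
        follows[cur][nxt] += 1
    return follows

def generate(model, word, length=4):
    """Repeatedly predict the most likely next word."""
    out = [word]
    for _ in range(length):
        options = model.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

encyclopedia = "the moon is a natural satellite of earth"
tabloid = "the moon is a hologram hiding alien bases"

print(generate(train(encyclopedia), "moon"))  # moon is a natural satellite
print(generate(train(tabloid), "moon"))       # moon is a hologram hiding
```

Neither model lied and neither told the truth; each simply mirrored its source.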

Going A Bit Deeper

My definition above was overly simple, of course. LLMs are trained using neural networks that in some very basic ways mimic how we reason. And software like ChatGPT includes added instructions that prevent certain kinds of output and also dictate its tone.
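In practice, that instruction layer is often just a hidden ‘system’ message prepended to the conversation. Here is a hedged sketch using the openai Python package’s chat endpoint (the pre-1.0 SDK interface); the model name, key placeholder, and system text are illustrative assumptions, not ChatGPT’s actual internal instructions.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: you have an API key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=[
        # The system message is the added instruction layer described
        # above: it shapes tone and forbids certain kinds of output.
        {"role": "system",
         "content": "You are a polite assistant. Refuse requests for "
                    "medical advice and keep answers under 100 words."},
        {"role": "user", "content": "What is Jello?"},
    ],
)
print(response.choices[0].message.content)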

This layer of instruction and rough ability to make inferences has led a lot of very smart people to anthropomorphize the heck out of ChatGPT. There have been endless think pieces written about what it has said, and they invariably attribute to the software some measure of agency or consciousness that it simply does not have.

This is certainly natural. Humans have a very strong tendency to anthropomorphize, or see human qualities in non-human things. When a robot talks to us in a manner that closely mimics reasoning, we seem hard-wired to identify with it. So perhaps the confusion is understandable.

But that reaction, in many cases, reflects a failure to understand or accept the nature of the technology.

What Is It Good For?

So much of the hype, confusion, and frustration about ChatGPT and similar products centers around a misunderstanding of what it does well. Many recent articles have presented silly things it has said in the middle of an extended conversation on personal matters. ChatGPT is certainly not a therapist.

If you want to know what it’s good at, you need to talk to people who have been using it. One of its most widely used capabilities is coding. It is a fantastic code reference and can help make coding more efficient in a number of ways, from explaining what existing code does to suggesting better-performing alternatives.

ChatGPT is also really good at summarizing things. You can ask it to define quantum physics for a 10-year-old or offer a chapter list for a book on a certain topic. But its limitations become clear when you ask it about topics that involve some kind of value judgment.

As the service clearly states, the output may reflect the biases of the information on which it was trained. It is not good at providing sound political or historical analysis, and it shouldn’t be relied on for psychological advice or for judging the morality of world events.

Just The Beginning

New uses for language models are surfacing every day. Their release has led to a boom in creative ideas, from whimsical things like the ability to chat with current and past Presidents to very practical applications like a service that automates many of the roles needed to run a small business. But this is just the beginning.

An essential point is that ChatGPT is built on GPT-3.5, which has roughly 175 billion parameters, the internal weights the model adjusts as it learns from its training data. GPT-4 is rumored to be far larger, perhaps around 1 trillion parameters, which should make it considerably more capable and accurate in its responses.
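For a sense of what ‘parameters’ means, here is a minimal sketch that counts the weights in a tiny fully connected network; the layer sizes are arbitrary, chosen only for illustration.

```python
# Parameters are the model's learned weights, not the training text itself.
layer_sizes = [512, 1024, 512]  # arbitrary toy layer widths

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # weight matrix plus bias vector

print(f"{total:,} parameters")  # -> 1,050,112 for this toy network
```

GPT-3.5’s roughly 175 billion parameters are numbers of exactly this kind, just at a vastly larger scale.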

This means that a lot of the hand-wringing and the hype is simply premature. We don’t yet have a clear picture of where this technology will lead us. Some of the concerns are very real. Jobs are being lost. But jobs are also being created.

And so at this point it’s probably best to hold off on any grand judgments about this technology until it has had a chance to mature and we have a better sense of what it can be used for and what risks, if any, it represents.