Intro to ChatGPT
AI | Nathan Chappell


Thursday, May 23, 2024 • 6 min read
An introduction to a powerful tool.

Introduction to ChatGPT

In recent years, a new tool has emerged that's changing the way NLP (Natural Language Processing) practitioners work. It's called ChatGPT, and it's making NLP more accessible, affordable, and versatile than ever before.

Natural language processing (NLP) has always been a challenging field, and historically it led to a proliferation of special-purpose tools designed to accomplish specific tasks. While effective and pragmatic, this approach bears little resemblance to how humans learn language. People don’t become very good at finding answers to questions in a passage of text while remaining unable to understand, or respond appropriately to, basic questions. Most people can’t identify parts of speech with 95% accuracy while not knowing what any of the words mean.

The move to generative models and text generation

Two discoveries in NLP lie at the source of the major breakthroughs we’ve seen recently. One was that a big model (i.e. a Large Language Model, or LLM) that simply knew the language well could start to outperform the specialized models at their specific tasks. The other was the realization that text generation is a generalization of all other NLP tasks!

Technical Aside - Generative vs Discriminative Models

What do we mean by “knows the language”? Essentially, language models learn to predict the next word given the preceding words. That alone doesn’t mean they learn the entire language! LLMs, however, are given a large corpus of text, and they learn to reproduce the dataset. This is the hallmark of a generative model. The idea is that by the time we’ve reached hundreds of gigabytes of text, we have a representative sample of natural language, and learning to reproduce this sample is essentially learning the entire language.
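To make “predict the next word given the preceding words” concrete, here is a toy sketch (nothing like a real LLM): a bigram model that predicts the next word from the single preceding word, estimated from a tiny made-up corpus.

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; a real LLM trains on hundreds of gigabytes.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows each preceding word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

A real language model conditions on a long context rather than one word, and uses a neural network rather than raw counts, but the training signal is the same: predict what comes next.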

Technically speaking, a generative model learns a joint probability distribution, while a discriminative model learns a conditional distribution. It’s important to note that a “generative model” and a “text-generating model” are not the same thing: one refers to the type of learning that is occurring, while the other refers to a task the model performs. In particular, discriminative models can be used to generate text, and generative models can be used for tasks other than text generation.

A classic example of a discriminative model in NLP is a spam classifier. Given an email, such a model will tell me the probability that it is spam. However, this model does not know the probability of receiving spam. A generative model would be able to tell me both pieces of information.
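The spam example can be sketched numerically. With an (entirely made-up) toy dataset where each email is reduced to one feature, a generative model estimates the joint distribution P(feature, spam), from which both P(spam | feature) (what a discriminative classifier outputs) and the marginal P(spam) can be recovered.

```python
from collections import Counter

# Toy labeled data: (email_contains_the_word_free, is_spam). Purely illustrative.
data = [(True, True), (True, True), (True, False),
        (False, False), (False, False), (False, True)]

# A generative model estimates the JOINT distribution P(free, spam).
joint = Counter(data)
n = len(data)

def p_joint(free, spam):
    return joint[(free, spam)] / n

# From the joint we can recover BOTH quantities:
p_spam = p_joint(True, True) + p_joint(False, True)  # marginal P(spam)
p_spam_given_free = p_joint(True, True) / (
    p_joint(True, True) + p_joint(True, False)
)  # conditional P(spam | "free")

print(p_spam)             # the prior probability of receiving spam
print(p_spam_given_free)  # what a discriminative classifier would output
```

A purely discriminative spam filter learns only the second number; it has no estimate of how likely spam is in the first place.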

Text Generation as the most general NLP task

A rather simple but crucial idea that has accelerated the adoption of LLMs is the realization that any NLP task can be reformulated as the “continuation” of some textual prompt.

To demonstrate this, we will convert the Extractive Question Answering (QA) task into a text generation task.

Extractive Question Answering (QA):
input: a paragraph, and a question
output: the answer based on the paragraph

We can convert this into a text-generation task by crafting a prompt as follows:

Given the following information:
{information}

The best answer to the question {question} is:

And then the model will generate the most appropriate text that should follow. If we assume that the model really has learned to generate text just like humans do, then we should expect it to output an answer using the information in the input paragraph.
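The conversion is just string templating. A minimal sketch (the helper name and example texts are my own invention) that fills the template above:

```python
def build_qa_prompt(information: str, question: str) -> str:
    """Turn extractive QA into a text-generation ("continuation") task
    by filling the prompt template from the article."""
    return (
        "Given the following information:\n"
        f"{information}\n"
        "\n"
        f"The best answer to the question {question} is:"
    )

prompt = build_qa_prompt(
    information="Zagreb is the capital of Croatia.",
    question="What is the capital of Croatia?",
)
print(prompt)
```

Whatever text the model generates after the trailing "is:" is taken as the answer.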

Now suppose such a model exists with a public API that is reasonably priced! Let’s consider some implications of such a widely available and very powerful new tool.

Making NLP Accessible to Programmers

Assuming that working with the model really isn’t much more difficult than what I’ve indicated, such a tool is accessible to anyone with a working knowledge of the English language. Honestly, that assumption is not too far off.

In our experience, working with publicly available models for simple tasks really is about this easy. In fact, we were able to use gpt-3.5-turbo to create, in a single day, a classifier that outperformed one that had taken ML engineers a few months to build.
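To give a feel for how little code such a classifier involves, here is a sketch of the request one would send to a chat model. The label set and prompt wording are invented for illustration, and the network call itself (e.g. via the `openai` client) is omitted so the snippet stays self-contained; only the message payload is built.

```python
# Hypothetical label set for a support-ticket classifier.
LABELS = ["billing", "technical support", "sales", "other"]

def build_classifier_messages(text: str) -> list[dict]:
    """Build a chat-completion message list asking the model to
    classify `text` into one of LABELS, replying with the label only."""
    system = (
        "You are a text classifier. Reply with exactly one of these labels: "
        + ", ".join(LABELS)
        + "."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": text},
    ]

messages = build_classifier_messages("My invoice shows the wrong amount.")
for m in messages:
    print(m["role"], "->", m["content"])
```

The entire “model” is a prompt; sending `messages` to a chat-completion endpoint and reading back the single-word reply replaces months of feature engineering and training.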

But while initial success in demonstration with these new tools can be impressive, it's important to remember that building a fully functional program involves more than just generating accurate outputs. It requires a significant amount of technical work including data preprocessing, model fine-tuning, performance optimization, deployment, and ongoing maintenance. Additionally, understanding the strengths and limitations of the underlying model is crucial for building robust applications. So, while quick wins are possible, creating value-providing programs still requires some understanding of both the tool and the problem domain.

Lowering the Barriers for Businesses

In the past, engaging in AI tasks (formerly known as machine-learning tasks) was challenging due to the need for expertise, data collection, and the risk of uncertain outcomes. However, leveraging large language model (LLM) services provided by companies like OpenAI mitigates these challenges. These services offer robust capabilities without requiring extensive data gathering or specialized expertise, thereby reducing the barriers for businesses to explore and deploy AI-driven solutions. This democratization of AI fosters new opportunities for businesses to drive growth, efficiency, and competitiveness.

The Power of the Language Models

One important aspect of these language models is that they learn to model their output (i.e. natural language) by doing complex pattern matching on whatever they are exposed to during training. This means that various statistical correlations from the sample data will persist in the language model. In particular, if the model happens to be exposed primarily to factual language during training, then it stands to reason (and has been observed) that it will tend to generate factual output. It’s worth noting that the depth of this “understanding” is one thing that tends to distinguish the more powerful models from the less powerful ones.

This "real-world know-how" becomes apparent when the model is applied to natural language processing (NLP) tasks. Since it has absorbed a vast amount of textual data during training, it's equipped with a broad understanding of different subjects, contexts, and domains. As a result, when practitioners use these models for NLP tasks like text generation, summarization, or question answering, they benefit from the accumulated knowledge embedded within the model.

This aspect of language models is particularly beneficial for new practitioners in ways they might not initially realize. Many tasks that seem "obvious" in natural language become rather difficult to specify programmatically. However, by leveraging language models trained on vast amounts of text, practitioners can tap into this implicit understanding to tackle complex NLP tasks more effectively. These models can discern contextual cues, identify key information, and generate coherent summaries or responses, mimicking human-like comprehension. For new practitioners, this means they can focus less on intricate programming details and more on understanding their problem domain and crafting suitable prompts or queries.

Conclusion

In this article, I have tried to explain, on a conceptual level, what the “big deal” is. In the next post, we will look at how we can put ChatGPT to work as a virtual assistant.

In conclusion, ChatGPT and other large language models (LLMs) have ushered in a new era of natural language processing (NLP), making it more accessible, affordable, and versatile than ever before. By leveraging the power of generative models, these tools simplify complex NLP tasks and democratize AI technology for programmers and businesses alike. This transformative capability empowers users to achieve impressive results quickly and efficiently, unlocking new opportunities for innovation and growth. As we continue to explore and refine these technologies, the potential for groundbreaking advancements in NLP and beyond is vast, offering a thrilling glimpse into the future of AI-driven possibilities.