Written By Cole Ingraham
Edited By Jess Feldman
Course Report strives to create the most trustworthy content about coding bootcamps. Read more about Course Report’s Editorial Policy and How We Make Money.
Large Language Models (LLMs) are the cool new kid on the block. Ever since ChatGPT took the world by storm in November 2022, you cannot even walk down the street without overhearing mentions of it – some positive and some negative. There is no question that we are living in a changed world. However, in order to better understand the LLMs of today, it is important to understand where they came from.
In this guide, we’ll trace the evolution of LLMs to before their modern ancestors, looking at what they are, what they are not, and some of the ways they can be used. LLMs, especially the instruction-tuned variety, are impressive tools that many people thought were still decades away, but they are also not the end of the story. As long as you understand what they are, how to use them properly, and keep an eye on the output, they can be great tools for making some tasks that were once nearly impossible become almost trivial. (If you're ready to go further with LLMs, check out the Data Science Bootcamp with Machine Learning at NYC Data Science Academy!)
There is a long history of trying to understand language mechanically. Noam Chomsky's work on universal grammars (which led to tools such as regular expressions, pushdown automata, and others) allows us to detect the presence of structure in text, beyond simply searching for exact matches to some query. This work forms the backbone of parsers, which, among other things, allow programming languages like C and Python, which are better suited for people to work in, to be translated into machine code for execution. In general, this research allows for understanding the structure of text.
Beyond purely structural methods, natural language processing (NLP) generally seeks to allow computers to detect and understand the semantics of language, with tasks such as sentiment analysis, named entity recognition, and machine translation. These methods are focused on the autonomous understanding of language.
Language models are a subset of NLP that aims to learn a probabilistic model of how language works. These were originally relatively simple statistical models based on n-grams, which assume that the probability of the next word depends only on the n−1 words immediately preceding it. This is effectively an (n−1)th-order Markov chain: as long as you have at least n−1 words, you can make an educated guess at the next word based on statistics from the text corpus used to train the model.
Even with such a crude approach, simple language models can be useful. For example, the autocomplete on your phone has traditionally been powered by n-gram models, and spell checkers can use them to flag low-probability words.
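To make this concrete, here is a minimal sketch of a bigram (n = 2) model of the kind described above. The tiny corpus and word-level tokenization are simplifications for illustration only:

```python
import random
from collections import defaultdict, Counter

# "Training": count how often each word follows each previous word
corpus = "the cat sat on the mat and the cat slept on the mat".split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Autocomplete-style guess: the most frequent follower of `word`."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

def generate(word, length=6):
    """Sample a short continuation, weighting words by observed frequency."""
    out = [word]
    for _ in range(length):
        followers = counts.get(out[-1])
        if not followers:
            break
        words, freqs = zip(*followers.items())
        out.append(random.choices(words, weights=freqs)[0])
    return " ".join(out)

print(predict_next("the"))  # e.g. "cat"
print(generate("the"))      # e.g. "the cat slept on the mat and"
```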
Language models come in basically two types: autoregressive models, which predict the next token from the tokens that precede it, and masked models, which predict hidden tokens from the surrounding context.
The appeal of autoregressive language models is that they can generate more text given a short input. However, this means that imperfections in the choice of each next token (prediction errors) accumulate over time. The more tokens the model has produced, the more those choices influence future predictions, potentially causing the output to become incoherent or, worse, incorrect (referred to as hallucination).
Masked language models, on the other hand, are typically used for tasks such as making text searchable by meaning, called semantic search, because they can encode long passages into representations that capture the similarity between their contents.
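As a rough sketch of how this is used for semantic search, here is an example built on the open source sentence-transformers library; the model name and documents are arbitrary choices:

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

# A small encoder derived from a masked language model
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "How to train a neural network",
    "Best pasta recipes from Italy",
    "Transformers changed natural language processing",
]
query = "deep learning tutorials"

doc_emb = model.encode(docs)        # one vector per document
q_emb = model.encode([query])[0]    # one vector for the query

# Cosine similarity: higher means closer in meaning, not just shared words
scores = doc_emb @ q_emb / (np.linalg.norm(doc_emb, axis=1) * np.linalg.norm(q_emb))
print(docs[int(np.argmax(scores))])  # most semantically similar document
```

Note that the query shares no words with the best-matching document; the overlap is in meaning, which is exactly what the encoder captures.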
The modern language models we have all become familiar with are built with artificial neural networks, thanks to breakthroughs in deep learning that allow large models to be trained on large amounts of data, which translates to improved quality and accuracy. However, the larger the model gets, the more data is required to train it. Most traditional approaches relied purely on supervised learning directly on the task in question. For example, if you want to perform machine translation from English to French, you need a dataset consisting of pairs of the same sentence in both languages, in sufficient quantity for the model to generalize well. The number of examples that counts as sufficient grows with the size of the model.
The scarcity of labeled data is not limited to natural language problems. Many domains have been working on ways to train larger models on smaller supervised datasets. In computer vision, the primary method is image augmentation, where multiple variations of a single image are created using transformations that are simple but preserve the overall integrity of the image, such as slight rotations, random crops from various locations, and subtle color adjustments.
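As a sketch of what this looks like in practice, assuming the torchvision library and a hypothetical image file:

```python
# Requires: pip install torch torchvision pillow
from torchvision import transforms
from PIL import Image

# Each transform slightly alters the image while preserving its content,
# so one labeled image yields many distinct training examples.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),                  # slight rotations
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),    # random crops
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # subtle color shifts
    transforms.RandomHorizontalFlip(),
])

image = Image.open("cat.jpg")                 # hypothetical input file
variants = [augment(image) for _ in range(5)] # five augmented views, same label
```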
With language, these same augmentations make no sense. However, there is a plentiful source of unlabeled data: any text you can find, for example the internet. What can you do with unlabeled text? Language modeling. Although language modeling may not be the task we are ultimately trying to solve, the expectation is that a model with sufficient capacity, trained on enough general language, should learn enough about the underlying characteristics of language to adapt more quickly to specific tasks later, with less labeled data.
Until then, the go-to neural network for NLP had been the recurrent neural network (RNN), which worked well but had one big problem: it processes text one token at a time, making it slow to train. While there is nothing theoretically wrong with this, it takes a significant amount of time and makes it difficult to fully take advantage of accelerators such as GPUs, which excel at parallel computation.
In 2017, the Transformer architecture was introduced in the "Attention Is All You Need" paper, enabling models to transform inputs into outputs by learning to "pay attention" to specific parts of the input. In 2018, Google released BERT and OpenAI introduced GPT, both leveraging Transformers for language understanding. GPT-1 was trained on fiction books. Scaling up data, compute, and model size was found to improve performance, leading to the release of GPT-2 in 2019. GPT-3, released in 2020, had 175 billion parameters and demonstrated few-shot learning, allowing it to perform tasks given only a handful of examples. This era also saw the emergence of "prompting" as the way to interact with models effectively.
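The core of that attention mechanism can be written in a few lines. Below is a simplified sketch of scaled dot-product attention, the building block described in the paper, omitting the learned projections and multi-head machinery; the toy matrices are random placeholders:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d)) V: each output is a weighted mix of the values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # how much each query "attends" to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V

# Toy example: 3 tokens with 4-dimensional representations
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # (3, 4)
```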
The name that everyone at this point has likely heard of, ChatGPT, came out in 2022. Suddenly there was a model that you could ask to perform various tasks using natural language. What is notable is that it is really still GPT-3, with some modifications to make the now ubiquitous chat-style interaction work. Shortly after, in 2023, OpenAI released GPT-4, which is generally a more capable version of ChatGPT. Not much is known about the exact scale of GPT-4, as OpenAI has decided to protect its competitive position rather than be open about its research.
If most people needed to name one thing that makes language models like ChatGPT useful, they would probably point to the ability to converse with them in natural language. At first this seems like something a language model should be able to do fairly well, but it turns out that, even at sufficient scale, pure language modeling is not enough for that behavior to emerge on its own. The language modeling task is great at uncovering the structure of language, but not all examples of language are conversational. Producing that behavior requires an additional step, known as instruction tuning, in which the model is further trained on examples of instructions paired with appropriate responses.
Now that we have AI that can respond to natural language questions and requests, what can we do with it? Most people have heard of, if not used, ChatGPT, Bing Chat, Google Bard, Claude, or any of the other chat assistant-style uses of LLMs out there. Whether you are interacting with them through a user interface or an API, there are many tasks you can use these models for, and various ways to augment them.
Any instruction-tuned LLM should be good at anything the original model was good at, which includes things like in-context learning. This means that tasks like summarizing a document or answering questions based on information you provide work well. Asking about anything that existed in the training data should also give reasonable results, although there are limitations and caveats.
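For instance, in-context learning can be exercised by placing a few worked examples directly in the prompt: the model picks up the task from the examples, with no retraining. Here is a sketch using OpenAI's Python SDK; the model name and the sentiment task are placeholder choices:

```python
# Requires: pip install openai (and an OPENAI_API_KEY in the environment)
from openai import OpenAI

client = OpenAI()

# Few-shot prompting: the examples in the messages teach the task;
# no model weights are updated.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: use whichever model you have access to
    messages=[
        {"role": "system", "content": "Classify each review as positive or negative."},
        {"role": "user", "content": "Review: 'Loved every minute of it.'"},
        {"role": "assistant", "content": "positive"},
        {"role": "user", "content": "Review: 'Total waste of money.'"},
        {"role": "assistant", "content": "negative"},
        {"role": "user", "content": "Review: 'The plot dragged but the acting was superb.'"},
    ],
)
print(response.choices[0].message.content)
```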
What happens when you have questions about a very large document that the model was not trained on? Currently, all LLMs have a maximum context length, which defines the total number of tokens they can process, counting both input and output. There are a couple of methods for dealing with this: split the document into chunks and summarize them (possibly recursively), or embed the chunks and retrieve only the ones relevant to the question, an approach known as retrieval-augmented generation.
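Here is a rough sketch of the retrieval approach, reusing the same kind of encoder as in the semantic search example above; the chunk size and model name are arbitrary choices:

```python
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text, size=200):
    """Naive fixed-size chunking by words; real systems often split on sections."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(chunks, question, k=3):
    """Return the k chunks most semantically similar to the question."""
    emb = model.encode(chunks)
    q = model.encode([question])[0]
    scores = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved chunks, plus the question, are then small enough to fit
# in the LLM's context window.
```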
👩‍💻 NYC Data Science Academy students learn about and use LLMs during the bootcamp! NYC Data Science Academy students have used their knowledge of LLMs to found a company called intrinsica.ai. Find more exciting use cases from these students!
Whether open source or closed, these models require hardware capable of handling their size, and clustering may be necessary for extremely large models. The table below compares open source and closed models in terms of cost, performance, transparency, and control over updates.
When developing with an LLM, there are a number of practical things to consider:
| | Open Source Models | Closed Models |
| --- | --- | --- |
| Cost and Accessibility | Some open source models and research are accessible to the broader community, promoting transparency and collaboration. | Closed models and research may be kept private to gain a competitive edge, limiting accessibility. |
| Performance | Open source models are catching up in performance but may not surpass some closed models like GPT-4. | Closed models are currently the best-performing options available. |
| Transparency and Data Privacy | Open source models provide full access to model details, training data, and deployment options, enhancing transparency and addressing data privacy concerns. | Closed models offer limited information about their training data and data usage, raising data privacy concerns. |
| Control and Updates | Users can deploy and maintain open source models themselves, ensuring control and the ability to address potential issues. | Parent companies of closed models can change offerings and terms without user input, and routinely update the underlying model, potentially affecting existing applications without prior notice. |
LLMs have gotten a ton of attention and hype, but it is important to understand what they are not good at. One of the main reasons LLM adoption has been so staggering is their generality. The fact that you can interact with them in your own language, ask them questions across almost any domain, and request that the response be provided in a particular format makes them very easy and practical to use: they can answer questions directly, translate natural language into structured data, and extract and synthesize information, all through the same intuitive interface. But if the only tool you have is an LLM, then every problem looks like prompt engineering.
But LLMs are not perfect. Even Yann LeCun, the creator of convolutional neural networks, has criticized autoregressive language models! Here are 3 notable drawbacks of LLMs:
1. Hallucination: because prediction errors accumulate token by token, LLMs can produce fluent output that is simply wrong.
2. Limited context: every model has a maximum context length, capping how much text it can consider at once.
3. Frozen knowledge: an LLM only knows what was in its training data, so anything newer or private is invisible to it without augmentation.
All advancements come with their share of fears and criticisms, and LLMs are no exception. These generally come in three flavors: "Will it kill us?", "What did you train on?", and "What do you do with my data?"
Earlier this year, the "Godfather of AI" Geoffrey Hinton left Google over his concerns about the dangers of models such as LLMs as they become increasingly powerful. As with LeCun, when the Godfather of AI says something, it is worth listening to, but I would caution against jumping straight to Terminator, The Matrix, or other sci-fi dramatizations. AI safety has been a discipline for much longer than LLMs, or even deep learning. You also have companies like OpenAI, Anthropic, and others putting up "guardrails" to make sure their LLMs behave "in an appropriate manner." Why a company that will not disclose any specific details about how its model works or was trained should be in charge of deciding the "right" way to use it is a different discussion.
Where and how training data is gathered is another hot topic. Universal Music sued Anthropic over Claude being able to reproduce the lyrics to songs without rights (although you can find them using Google just as easily). Reddit dramatically increased their API prices to prevent LLM makers from profiting off of their data for free. Twitter (or X now, I guess) did the same. Pretty much everywhere you turn, there is someone trying to recoup a piece of the pie that products like ChatGPT have created off the back of their data (which was likely acquired illegally).
For users working with sensitive information, an important question is: "What do you do with the prompts I send?" OpenAI says they only retain chats for use as your history, and any you delete will be removed within 30 days. However, many people in fields such as medicine, law, and finance are hesitant to send anything too private to ChatGPT, because we do not know for sure that it is not being trained on. They say they do not train on it, but you never really know.
Dr. Cole Ingraham is a musician and composer turned software engineer and data scientist who currently teaches at NYC Data Science Academy as Lead AI Instructor.