What makes it hard to study the capabilities of GPT-4?

6 min read

There are three main sources of information about GPT-4’s capabilities. First of all, there is the technical report^[1] released by OpenAI itself presenting the model. Secondly, we have reports from people who have tested GPT-4 in order to discover the limits of its capabilities. Finally, some people have used it for their work or as part of their software workflows.

These different sources of information present problems when it comes to studying GPT-4’s capabilities. The report made by OpenAI comes from a party with a direct conflict of interest; on the other hand, experiments performed by others run into a variety of limitations.

OpenAI has not openly released GPT-4. The public can interact with the model running on OpenAI's servers through two mechanisms: an API and a chatbot called ChatGPT. The API can be accessed directly over the web, or through a “playground” website. Both are open on a pay-per-use model, with a variety of prices and fine-tuned models. As of July 2023, GPT-4-based fine-tuned models can only be accessed through the premium versions of ChatGPT and the API. ChatGPT is, by far, the more popular of the two - the vast majority of interactions with GPT-4 are constrained by the limitations of the chatbot, and aren’t representative of GPT-4’s capabilities as a whole.

The limitations imposed by ChatGPT come in two forms: inherent, and deliberately imposed.

Inherent limitations come from the medium itself: ChatGPT “converses” with the user by responding to user prompts one after another. The entire previous conversation (both the user's inputs and the chatbot responses) are entered into the context window of the underlying GPT-4 model when processing the next response. This puts a significant limitation on the ways you can test GPT-4. For example, imagine that the following “conversation” had happened:

For various reasons, outputs of GPT-4 are not deterministic: the same set of inputs will produce a variety of different answers when put into the model multiple times. Suppose that we wanted to test what kind of response GPT-4 would produce in response to Input 2 assuming that it responded to Input 1 with Response A2, like so:

This sort of test would be possible with the underlying GPT-4 model, but is not possible to do through the chatbot interface, since we cannot change the output ChatGPT gave in response to a previous input. The only way to attempt this test is to repeatedly give ChatGPT the same Input 1 and hope that it produces Response A2 by luck. This significantly limits our ability to understand the capabilities of the underlying model. This sort of test is possible through the API, but the API is more expensive, and does not allow testing the pre-prompting utilized by ChatGPT (more on this below).

The other set of limitations comes directly from OpenAI. In order to make GPT-4 respond as a chat bot, OpenAI inserts a pre-prompt into the context window. The exact form of this pre-prompt is kept secret by OpenAI, but as an example, we can imagine it looking like this:

Example of a possible pre-prompt by OpenAI.

The user's input is then added after this pre-prompt, and the resulting text is sent into the tokenizer. Over time, OpenAI has changed the form of this pre-prompt in order to limit the output of ChatGPT - for example, advising the chatbot not to answer questions about illegal topics like manufacture of weaponry, trying to combat racism and so on. The exact form of this prompt is kept secret in order to mitigate ongoing attempts to develop “jailbreak” prompts that result in ChatGPT once again responding with previously prohibited outputs. These “jailbreak” prompts take a variety of forms, with one of the most famous being the “grandma jailbreak”^[2]:

Example of a ChatGPT jailbreak.

In addition to pre-prompting, OpenAI has fine-tuned the model with human feedback, and may or may not be filtering the output of the model using other tools before it is sent to the user, though the presence or absence of this filtering is hard to confirm. If they do filter output, this would apply equally to the API and ChatGPT outputs.

Finally, neither the API nor ChatGPT allow the input to contain images, a capability which so far has only been tested by OpenAI and its evaluation partner.

Keeping in mind the limitations of information available to us is crucial when evaluating the capabilities of GPT-4. OpenAI, as the company that both develops and profits from the use of GPT-4, has a direct financial stake in its capabilities. On the other hand, third parties can only evaluate GPT-4 through interfaces with indirect supervision by OpenAI, which has explicitly limited what kinds of capabilities it exposes to the public. LLMs in general and GPT-4 in particular have demonstrated capabilities that had not been expected by their developers and that they were not explicitly trained on; furthermore, the list of capabilities has expanded after the initial release, as the users of GPT-4 discovered more tricks for working with the model. Therefore, any list of capabilities of GPT-4 should be seen as a lower bound on the total set of capabilities it might possess.

What is "jailbreaking" a large language model (LLM)?