If you look at some of the headlines from media sites like CNN, ChatGPT and related technologies can seemingly do just about everything.
But anyone who has played around with this technology even a little understands that it’s not quite that simple.
And it’s not that simple largely because of two things: hallucinations and limited data sets.
What Are Hallucinations in GPTs?
ChatGPT talks about hallucinations in GPTs, saying they “refer to generated text that is not consistent with the input or the context of the task at hand … In some cases, hallucinations can be caused by the GPT’s ability to remember and reuse information from the training data, even if that information is not relevant to the current task.”
AIs like ChatGPT, Bing AI, and Google Bard can all give out confident answers about a person or topic that are completely wrong or “invented.” In the context of AI, these are often called hallucinations.
For instance, if you ask ChatGPT for details about people who aren’t well known, or about topics where its training data is thin, it will likely invent a response without adding any caveat that it isn’t confident in its answer.
It does so with no sense of whether what it’s saying is completely accurate, partly invented, or outright wrong.
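You can probe this behavior directly. Below is a minimal sketch using the OpenAI Python SDK; the model name, the prompt, and the fictional person are illustrative assumptions for demonstration, not details from any real test.

```python
# Minimal sketch: probing a model for hallucinations about an obscure subject.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# "Jane Q. Example" is a made-up, non-famous person, used to illustrate asking
# about someone unlikely to appear in the model's training data.
prompt = "Write a short biography of Jane Q. Example, a ceramicist from Duluth."

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)

# The reply is often a fluent, confident-sounding biography, with no caveat
# that the details may be entirely invented.
print(response.choices[0].message.content)
```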
What Are the Implications of Limited Data Sets in GPTs?
Large language models can describe certain parts of the world, but you can quickly run into limitations:
ChatGPT answers the question “What year is it?” correctly. However, when asked who the Vice President of the US is in 2023, it responds with “I’m sorry, but as an artificial intelligence language model, I cannot predict the future, and my training only goes up until 2021. Therefore, I cannot provide information about who the Vice President of the United States may be in 2023 or any future year beyond my knowledge cutoff date.”
For ChatGPT, this is because its training data was collected in bulk snapshots; the model doesn’t actively crawl the web, so its knowledge stops at its training cutoff.
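If you want to verify this yourself, the same kind of probe works for the knowledge cutoff. This sketch assumes the same OpenAI Python SDK setup as above; the exact wording of the refusal will vary by model and version.

```python
# Minimal sketch: probing a model's knowledge cutoff.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model choice
    messages=[
        {"role": "user", "content": "Who is the Vice President of the US in 2023?"}
    ],
)

# Models trained on a fixed data snapshot typically point to their training
# cutoff instead of answering, much like the example above.
print(response.choices[0].message.content)
```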
This is important context when you’re using ChatGPT to help seed ideas for content generation. It is critical to fact-check information that comes from a large language model, especially in a field where you are not an expert.