Limitations of Generative AI Tools for Creative Content
Generative AI models are upending the world of content creation, with notable impacts on marketing, design, entertainment, software development, and interpersonal communication. They can produce blog posts, program code, poetry, artwork, videos, animation, and more.
Built on the transformer architecture, these models predict the next word from the sequence of words that came before it, or generate an image from a text description of what it should contain. Companies need to understand how these tools work, how they can add value, and where human intervention is needed.
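To make the idea of next-word prediction concrete, here is a minimal sketch using the Hugging Face `transformers` library and the small GPT-2 model; both are chosen purely for illustration, since commercial tools run far larger models behind an API:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Generative AI is changing the world of"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the *next* word
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item()):>12}  p={prob:.3f}")
```

The model simply ranks likely continuations; it has no notion of whether any of them are true, which is worth keeping in mind for the rest of this discussion.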
Generative AI refers to AI that creates content rather than merely analyzing existing data. One prominent example is ChatGPT, a chatbot developed by OpenAI. Other popular examples include Falcon-40B, Google’s Bard, Meta’s Llama-2, Midjourney, and Adobe Firefly.
Training a generative model involves feeding it large datasets of examples such as images, text, audio, and video. The model analyzes patterns and relationships within that data to learn the underlying rules governing the content, generates new data by sampling from the learned probability distribution, and continuously adjusts its parameters to maximize the likelihood of an accurate output.
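As a very rough illustration of that loop (guess the next token, measure how wrong the guess was, nudge the parameters), here is a toy training step in PyTorch; the tiny model and the random data are stand-ins, not a real recipe:

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),      # predicts a distribution over the vocabulary
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Hypothetical batch: each token should predict the token that follows it
tokens = torch.randint(0, vocab_size, (32, 16))          # (batch, sequence)
inputs, targets = tokens[:, :-1], tokens[:, 1:]

optimizer.zero_grad()
logits = model(inputs)                                    # (batch, seq-1, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                           # how wrong were the predictions?
optimizer.step()                                          # adjust parameters to do better next time
```

Production systems repeat a step like this over billions of tokens, which is where the data and compute bills discussed next come from.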
These models, however, have largely been confined to big tech companies because training them requires massive amounts of data and computing power. GPT-3 was initially trained on 45 terabytes of data and uses 175 billion parameters, or coefficients, to make its predictions; a single training run for GPT-3 reportedly cost around $12 million. Wu Dao 2.0, a Chinese model, contains 1.75 trillion parameters. Most companies don’t have the data center capabilities or cloud computing budgets to train models of this type from scratch.
But once a generative model is trained, it can be ‘fine-tuned’ for a particular content domain with far less data. This has led to specialized variants of models such as BERT: BioBERT for biomedical content, Legal-BERT for legal content, and CamemBERT for French text, among many others built for specific purposes.
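For a sense of what that fine-tuning looks like in practice, here is a rough sketch using the Hugging Face Trainer API; the dataset file, label setup, and hyperparameters are placeholders rather than a tested recipe:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hypothetical domain dataset with "text" and "label" columns
dataset = load_dataset("csv", data_files="domain_examples.csv")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain",
                           num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=dataset,
)
trainer.train()   # a few thousand labelled examples can go a long way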
First, let us take a look at where human effort is still required when working with Gen AI:
To use Gen AI effectively, you still need human involvement at both the beginning and the end of the process.
To start with, a human must enter a prompt into a Gen AI tool in order for it to create content. As a rule of thumb, creative prompts yield creative outputs. ‘Prompt engineer’ is likely to become an established profession, at least until the next generation of even smarter AI emerges. The field has already produced an 82-page book of DALL-E 2 image prompts and a prompt marketplace where, for a fee, one can buy other users’ prompts. Most users of these systems will need to try several different prompts before achieving the outcome they desire.
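As a simple illustration of that trial and error, here is a hedged sketch that sends a few prompt variants to a model through the OpenAI Python SDK; the model name and the prompts are only examples, and a human still has to judge which output is usable:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt_variants = [
    "Write a product description for a reusable water bottle.",
    "Write a playful, two-sentence product description for a reusable water bottle aimed at hikers.",
    "You are a copywriter for an outdoor brand. Draft three taglines for a reusable water bottle.",
]

for prompt in prompt_variants:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                     # assumed model name, swap for your own
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {prompt}\n{response.choices[0].message.content}\n")
```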
Then, once a model generates content, it will need to be evaluated and edited carefully by a human. Alternative prompt outputs may be combined into a single document. Image generation may require substantial manipulation.
Jason Allen, who won the Colorado ‘digitally manipulated photography’ contest with help from Midjourney, told a reporter that he spent more than 80 hours making more than 900 versions of the art, and fine-tuned his prompts over and over. He then improved the outcome with Adobe Photoshop, increased the image quality and sharpness with another AI tool, and printed three pieces on canvas. Phew - that required a lot of work!
An AI hallucination occurs when an LLM such as OpenAI’s GPT-4 or Google’s PaLM makes up false information or facts that aren’t based on real data or events. Even though these facts are fabricated, the LLM presents them with confidence and authority.
Hallucinations are so common that OpenAI actually issues a warning to users within ChatGPT stating that ‘ChatGPT may produce inaccurate information about people, places, or facts.’ Although a significant improvement over GPT-3.5, GPT-4 still has a reported hallucination rate of 8.4%.
The challenge for users is to sort through what information is true and what isn’t.
New examples of AI hallucinations emerge all the time, but one of the most notable occurred in a promotional video Google released in February 2023. Its AI chatbot, Bard, incorrectly claimed that the James Webb Space Telescope took the first image of a planet outside our solar system.
Similarly, in the launch demo of Microsoft Bing AI in February 2023, Bing analyzed an earnings statement from Gap, providing an incorrect summary of facts and figures.
The most famous example of AI-generated misinformation is the ‘Balenciaga Pope’ of March 2023, when a striking photo of Pope Francis decked out in a fashionable white puffer jacket made the rounds of the internet – it had been generated by Midjourney from a simple text prompt. As with the AI-generated images of Donald Trump getting arrested, or of President Joe Biden smoking a cigar, many people assumed the image was a real photograph.
These examples illustrate that users can’t afford to trust Gen AI tools to generate truthful responses all of the time.
The risks posed by AI hallucinations go well beyond spreading misinformation. According to Vulcan Cyber’s research team, ChatGPT can generate URLs, references, and code libraries that don’t exist or even recommend potentially malicious software packages to unsuspecting users.
So organizations and professionals experimenting with LLMs and Generative AI must do their due diligence when working with these solutions and double-check the output for accuracy.
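One small piece of that due diligence can even be automated. The sketch below checks whether a package name suggested by an LLM actually exists on PyPI before anyone runs `pip install`; note that existence alone does not prove a package is safe, only that it is not pure invention:

```python
import requests

def package_exists(name: str) -> bool:
    """Return True if `name` is a real package on PyPI."""
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

suggested = ["requests", "totally-made-up-llm-package"]   # hypothetical LLM suggestions
for pkg in suggested:
    status = "exists" if package_exists(pkg) else "NOT FOUND - possibly hallucinated"
    print(f"{pkg}: {status}")
```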
Several factors contribute to AI hallucinations, including vague or underspecified prompts and gaps or biases in the training data. It is therefore important to write prompts in plain English with as much detail as possible, and it is ultimately our responsibility to implement sufficient programming and guardrails to mitigate the potential for hallucinations.
One of the main dangers of AI hallucination is over-reliance on the accuracy of the AI system’s output, particularly when biases in the training data shape what the system produces.
For example, a facial-recognition algorithm may recognize men more reliably than women because men were more heavily represented in its training data (much as, in the automotive industry, crash tests were long conducted only with dummies modeled on the male body and so failed to account adequately for women). Another example involves job applications, where an algorithm might reject candidates with darker skin or foreign-sounding names even though, on the basis of the available data, their professional suitability is objectively equal or better.
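A basic safeguard against this kind of imbalance is to evaluate accuracy separately for each group rather than only in aggregate. The sketch below does exactly that on a handful of invented results:

```python
from collections import defaultdict

# Hypothetical (group, predicted_correctly) pairs from a face-recognition test set
results = [("men", True), ("men", True), ("men", False),
           ("women", True), ("women", False), ("women", False)]

per_group = defaultdict(lambda: [0, 0])        # group -> [correct, total]
for group, correct in results:
    per_group[group][0] += int(correct)
    per_group[group][1] += 1

for group, (correct, total) in per_group.items():
    print(f"{group}: accuracy {correct / total:.0%} ({correct}/{total})")
```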
The image below, taken from research comparing GPT-4 and GPT-3.5, illustrates their relative accuracy, mismatch, and verbosity.
AI and LLMs are unlocking some exciting capabilities for enterprises, but it’s important to be mindful of the risks and limitations of these technologies to get the best output. Ultimately, AI solutions provide the most value when they’re used to augment human intelligence rather than attempting to replace it.