Generative AI, or artificial intelligence that generates new content like text, images, and music, has caused quite a stir lately. These models are trained on huge datasets and learn the patterns and characteristics of that data, which allows them to create new content that's similar in style or substance to the training data.
So what's all the recent fuss about? Well, generative AI has the potential to transform a bunch of different industries and applications. It could be used to automatically generate content for social media, create personalized advertisements, or assist with the design process for products or buildings. Plus, generative AI might even be able to enhance creative projects by providing inspiration or taking care of some of the tedious work. And, let's be real, the idea of machines creating stuff that's just as good as (or better than) human-generated content is pretty mind-blowing and raises some interesting ethical and societal questions.
We experiment with new generative models like OpenAI's ChatGPT as they become available, and recommend you do the same. A popular product for generating images is MidJourney, which uses Discord as its user interface: users submit carefully crafted text prompts, then receive and refine the resulting images.
This post explores some of what we've observed about the user experience of recent generative AI products, and how generative AI might be used effectively within software products.
Having fun is a lot easier than getting a useful result
The process itself can also sometimes be frustrating and tiring for users as they try to get the desired results. Generative AI algorithms are not perfect and may produce low-quality or irrelevant content, which can be frustrating for users who are trying to accomplish a specific task. In addition, the process of writing prompts and refining them to get the desired results can be time-consuming and may require multiple iterations. Users may also experience fatigue when working with generative AI, as it can be mentally and emotionally demanding to constantly analyze and evaluate the generated content. The process of using generative AI can also be complex and may require a certain level of technical expertise, which can be intimidating or overwhelming for some users. Overall, the process of using generative AI can be challenging and may require a significant investment of time and effort to achieve the desired results.
If you look closely, the quality of results varies a lot
Sometimes generative AI algorithms can produce images with strange details like a person typing on a keyboard with no corresponding screen, or the wrong number of fingers on a hand (look closely at the image at the top of this post, generated using MidJourney). This is because they don't always have a super deep understanding of the context or meaning of the generated content.
Generative models are trained on lots of data and can learn patterns and characteristics, but they don't have the same understanding of the world that humans do. So if you give them a prompt to create an image of a person, they might not know that most people have five fingers on each hand, and they might generate an image with six or four fingers instead. It's not that they're trying to be weird or anything, it's just that they don't have the same understanding of the world that we do.
The repetition problem affects more than just fingers. I generated most of the preceding paragraph (and the others before it) by asking ChatGPT about strangeness in generated images, and you'll notice many similarities between that text and its answer to a similar question below.
This sort of redundancy is a common feature of the content ChatGPT generates, and becomes more frequent and especially obvious in longer content. Not all redundancy is bad, but too much can seem awkward, and can be a dead giveaway that content isn't human-generated.
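If a product shows long generated text to users, it may be worth flagging heavy repetition before it reaches them. The sketch below is a crude heuristic, not a detector of machine-written text: it counts word trigrams that occur more than once and reports what fraction of the text they cover. The function names, the trigram window, and the simple tokenizer are our own illustrative assumptions, not part of any particular library or product.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int = 3) -> list[str]:
    """Lowercase the text, split it into words (runs of letters and
    apostrophes), and return every overlapping run of n words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_ngrams(text: str, n: int = 3) -> dict[str, int]:
    """Return the word n-grams that occur more than once, with their counts."""
    counts = Counter(word_ngrams(text, n))
    return {gram: count for gram, count in counts.items() if count > 1}


def repetition_ratio(text: str, n: int = 3) -> float:
    """Fraction of n-gram positions whose n-gram also appears elsewhere.

    0.0 means no phrase of n words is ever repeated; values near 1.0 mean
    almost all of the text is made of recycled phrasing."""
    grams = word_ngrams(text, n)
    if not grams:  # text shorter than n words
        return 0.0
    repeated = sum(repeated_ngrams(text, n).values())
    return repeated / len(grams)
```

Run on the earlier paragraph about fingers, trigrams such as "they don't have" and "the same understanding" show up as repeated. What counts as "too much" repetition depends on the content, so any threshold on the ratio would need to be tuned against examples your users actually find awkward.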
Disillusionment happens before productivity
Market analysis firm Gartner says its hype cycle for emerging technologies "gives you a view of how a technology or application will evolve over time, providing a sound source of insight to manage its deployment within the context of your specific business goals."
The recent frenzy of attention on ChatGPT likely indicates that expectations for generative AI are currently peaking. As excited early users discover the limits of the current user interfaces, the hype will dwindle and the technology will descend into the "trough of disillusionment". Technologists will spend the period that follows iterating on and refining the models and their accompanying user experiences. In its 2022 Hype Cycle for Artificial Intelligence, Gartner estimates that generative AI is two to five years from reaching mainstream productivity. The speed with which companies incorporate generative AI over the next year will determine who wins the race to market.
Focus on workflow and UX, and unlock AI's potential
Early adopters are enamored with a technology's newness and potential, but that magic wears off quickly and does little to satisfy large enterprises and busy professionals. Dave Rogenmoser, founder and CEO of Jasper.AI, recently summed it up well:
What AI Twitter forgets is that 99.9999% of the world doesn't know what a GPT-4 is, never will, and doesn't care.
They want to get a promotion.
Or get home to their family early.
Or find someone to marry.
Use AI to help become the hero in their life and they will love you.
(Dave Rogenmoser, @DaveRogenmoser, December 1, 2022)
Reliable, consistent results are the hallmark of quality software, and that expectation seems at odds with the constant surprises produced by popular text-driven generative AI products. The process doesn't have to be exhausting, though. Many of the tiresome tricks for crafting text prompts can instead be codified into application logic and exposed through more traditional user interface controls, such as menus and checkboxes. The models themselves will also continue to improve. If you haven't already started, however, now is the time to begin R&D in earnest.
When evaluating existing AI products, note the difference between an app, a standalone pre-trained model, and a PaaS offering. Many apps don't expose their proprietary functionality in ways that can be incorporated into third-party products, so replicating their results may require you to start from scratch. Many pre-trained models are available under open source licenses, but they require you to develop substantial supporting infrastructure. In some cases, the developers of an app or pre-trained model offer an API, which can accelerate integration. This is especially true of PaaS products like Microsoft's Azure Cognitive Services, which offers a range of models, with infrastructure included, tuned specifically for integration into third-party products.
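To make "codifying prompt tricks into logic" a little more concrete, here is a minimal sketch, assuming an image-generation feature whose UI offers a style dropdown plus fields for extra details and things to avoid. The preset wording, the `ImageRequest` and `build_prompt` names, and the "Avoid:" phrasing are all hypothetical; no particular model's prompt syntax is assumed, and a real product would tune its presets against whichever model it uses.

```python
from dataclasses import dataclass, field

# Hypothetical presets: the phrasing users would otherwise have to discover
# by trial and error is stored once and chosen from a dropdown.
STYLE_PRESETS = {
    "photo": "photorealistic, 35mm photograph, natural lighting",
    "illustration": "flat vector illustration, clean lines",
    "sketch": "pencil sketch, monochrome",
}


@dataclass
class ImageRequest:
    subject: str                                       # free-text field: what to depict
    style: str = "photo"                               # key into STYLE_PRESETS
    details: list[str] = field(default_factory=list)   # optional tags or checkboxes
    avoid: list[str] = field(default_factory=list)     # things the user wants excluded


def build_prompt(req: ImageRequest) -> str:
    """Assemble a text prompt from structured UI fields."""
    if req.style not in STYLE_PRESETS:
        raise ValueError(f"unknown style: {req.style!r}")
    parts = [req.subject.strip(), STYLE_PRESETS[req.style], *req.details]
    prompt = ", ".join(p for p in parts if p)  # skip empty fields
    if req.avoid:
        prompt += ". Avoid: " + ", ".join(req.avoid)
    return prompt
```

For example, `build_prompt(ImageRequest("a person typing on a laptop", details=["five fingers on each hand"]))` combines the user's subject with the stored "photo" phrasing and the extra detail. The user never types the photography jargon, every request gets the same tested wording, and improving a preset improves results for everyone at once, which is the kind of consistency users expect from traditional software.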
You can always talk to us about any questions you may have. Let us help accelerate your plans to launch an AI-enabled product or to leverage AI within your existing products.