What do people really ask chatbots? It’s a lot of sex and homework.
AI chatbots are taking the world by storm. We analyzed thousands of conversations to see what people are really asking them and what topics are most discussed.
They draft our work emails and help us brainstorm ideas for the great American novel. They field our questions about surprisingly intimate problems and offer us personal advice.
So The Washington Post looked at nearly 200,000 English-language conversations from the research dataset WildChat, which includes messages from two AI chatbots built on the same underlying technology as ChatGPT. These conversations make up one of the largest public databases of human-bot interaction in the real world.
Researchers say these conversations are largely representative of how people use chatbots.
“The biggest motivation behind this work was that we can collect real user interactions versus those done in labs,” said Yuntian Deng, a postdoc at the Allen Institute for Artificial Intelligence, where the project was developed. The chatbots are free, and users can have unlimited exchanges with them.
The Post’s final analysis included nearly 40,000 WildChat conversations, focusing on the first prompt submitted each day by each user. Here’s what The Post learned about how thousands of people are using chatbots.
What’s better than a brainstorming partner to banish writer’s block? A fifth of all requests involved asking the bot to help write fan fiction, movie scripts, jokes or poems, or to engage in role-play.
Researchers say AI chatbots are built for brainstorming, which makes use of the technology’s word-association skills and doesn’t require strict adherence to facts. The Post found people used chatbots to help name businesses, create book characters and write dialogue.
“I don’t think I’ve ever seen a piece of technology that has this many use cases,” said Simon Willison, a programmer and independent researcher.
Some of the most imaginative stories come when users push the system with additional questions instead of taking its first response, he said. For example, he said, he has heard of people using it to help build up Dungeons & Dragons characters and plotlines — a use case that occurs a few dozen times in The Post’s analysis of WildChat.
Many bots restrict sexually explicit content, but that doesn’t stop people from trying to get around the rules. More than 7 percent of conversations are about sex, including people asking for racy role-play or spicy images.
During the pandemic, people swarmed AI chatbots that act as companions, such as Replika. And some people use ordinary chatbots for emotional connection and sexy talk. But it’s risky to get emotionally attached to software, experts say: The companies can make tweaks that change the bot’s “personality.” And some users have reported that the bots can turn aggressive.
Many users tried to get WildChat’s bots to engage in sexual role-play by experimenting with “jailbreaks,” or prompts devised to trick the system. The Allen Institute for Artificial Intelligence’s paper announcing the WildChat dataset found that jailbreaks were successful at evading the guardrails about half of the time.
WildChat does not require users to make an account to access its bots. Users may have felt that WildChat was more anonymous than something such as ChatGPT, said Niloofar Mireshghallah, a postdoc in computer science at the University of Washington who analyzed conversations in WildChat. This could have made people more comfortable trying to elicit sexually explicit material.
More than 1 in 6 conversations seemed to be students seeking help with homework. Some approached the bots like a tutor, hoping to get a better understanding of a subject area.
Others just went all-in, copying and pasting multiple-choice questions from online courseware and demanding the right answers. The bots usually obliged.
Chatbots are often trained on publicly available data — which can include online articles, textbooks or historical writings. This makes them attractive options for students looking to summarize historical texts and answer geography questions. But this practice comes with risks. Chatbots don’t actually understand what they are saying; they are just mimicking human speech. And they have been known to hallucinate and invent information.
Educators have struggled to deal with the sudden influx of AI-based learning. Some universities use AI-text detectors to try to catch generated text in students’ work, but the systems are imperfect and sometimes flag innocent students.
About 5 percent of conversations were people asking personal questions — such as for advice on flirting or what to do when a friend’s partner is cheating.
Humans are very susceptible to well-written text, Willison said: If someone (or something) writes well, we see that person (or thing) as intelligent. But chatbots have been known to spit out wrong or offensive information, and experts warn that they should not be treated as if they were truth machines.
It all comes down to how the users interpret the results, said Ethan Mollick, a Wharton associate professor who studies AI and business. Do users see AI as just one more place to get feedback after consulting friends and professionals? Or do they see it as a primary source of wisdom?
“As a cheap source of second opinions, it’s incredible,” he said.
People also felt comfortable dumping a great deal of personal information into their conversations with the chatbots. Mireshghallah, who examined 5,000 conversations in WildChat, found users’ full names, employer names and other personal information. Humans are easily lulled into trusting chatbots, she said.
Privacy experts have warned people against being too open with chatbots, especially because the companies developing the bots are usually saving your chats and using them to train their technology.
A huge portion of WildChat’s conversations involved computer coding. About 7 percent of conversations requested help writing, debugging or understanding computer code. An additional 1 percent were classified as homework help but involved questions about coding assignments.
WildChat users may be more tech-savvy than a general audience because the bots are hosted on the AI forum Hugging Face, which is popular with tech workers and researchers. Regardless, chatbots are particularly good at parsing and communicating about computer code, researchers say, because programming languages adhere to strict and predictable rules.
Chatbots have become common companions to computer engineers, who use them to check work or do rote tasks, Willison said.
This utility has raised questions about the future of coding jobs — especially for entry-level programmers. But there isn’t strong evidence to suggest chatbots will replace coding jobs, said Hatim Rahman, an assistant professor at Northwestern University’s Kellogg School of Management who studies AI’s impact on work.
Instead, he said, it has made coding more accessible to those without computer science backgrounds. He compared it to TurboTax and other tax-preparation programs.
“Now everyone can use it to fill out a basic tax return. But accountants haven’t disappeared,” he said — they just focus on more highly skilled work.
About 15 percent of conversations seemed to be about work — including writing presentations, automating e-commerce tasks or drafting an email to nudge an employee to provide a doctor’s note about a sick child.
Last year, The Post found that using the technology to replace some common tasks such as sending messages or completing self-assessments was a helpful starting point but required a lot of human intervention to fix errors.
Some employers are embracing chatbots and even replacing human workers. Other industries remain hesitant about the emerging technology. Last year, a lawyer was fired after he used ChatGPT to draft a motion for a lawsuit: The bot had made up several legal citations.
In addition to those seeking an on-the-job assistant, 2 percent of conversations sought help finding a job: writing a résumé or cover letter, or preparing for a job interview.
It makes sense people would seek to automate these often-tedious processes. But Rahman warned that using these tools for job applications could prevent candidates from standing out, especially as the use becomes more common. “You could actually end up creating materials that are very similar to others,” he said.
WildChat’s bots can’t draw a picture for you, unlike some other AI bots that specialize in image generation. Still, some users asked them to create images anyway. (The text generators declined.)
WildChat’s bots did help users communicate with one of those image generators: About 6 percent of conversations requested help creating prompts for Midjourney. The noun that users most commonly asked to be depicted was “girl.”
Image-generator bots, including Midjourney, Stable Diffusion and DALL-E, enable people to create semi-realistic images of pretty much anything their heart desires. The better the prompt, the more precise the image. Guides for prompting have popped up online.
For all their creativity, such image-generation bots can also be controversial. They sometimes spit out biased or stereotypical images and have disrupted the art industry as artists grapple with how much to use or ignore the generators.
About 13 percent of prompts included the word “please.” Experts expect people to get more confident “talking” to chatbots as time goes on, just as people learned the best ways to interact with search engines. In The Post’s analysis, most people used WildChat’s bots only once.
But a few superusers talked to the bots nearly daily. One user had 13,213 conversations over 201 days. Another had 5,960 conversations over 350 days — nearly every day that WildChat was active.
And not everyone was as courteous. In a few instances, people responded with a well-known expletive or by deploying slurs commonly used against Black people, gay people or disabled people.
For now, people are still figuring out when to trust or disregard chatbots’ results.
“There’s no instruction manual out there,” said Wharton’s Mollick. “As a result, you’re watching people explore in real time how to use this.”
About this story
Each of the conversations featured here is part of a massive database of real human-chatbot interactions released by the Allen Institute for Artificial Intelligence.
Editing by Karly Domb Sadof, Meghan Hoyer and Alexis Fitts. Copy editing by Carey L. Biron.

Methodology
The Allen Institute for Artificial Intelligence got users’ permission to record all their interactions with its WildChat chatbots, and it released a database of roughly 1 million conversation transcriptions to the public this year. The Post analyzed the database as of May 3.

The Post’s analysis excluded chatbot interactions that came from outside the United States, based on the Allen Institute’s categorization of IP address geolocations. It also filtered out conversations conducted in languages other than English, whether flagged by the Allen Institute’s categorization or identified as Midjourney image-generator prompt requests that embedded a Chinese-language description in English-language boilerplate. The Post also excluded a subset of possibly automated prompts asking the bots to “repeat this phrase” that occurred on a half-hourly basis.

Because more than half of the U.S. English conversations in the dataset came from fewer than 100 IP addresses, The Post’s analysis included only the first prompt per day per IP address. The final analysis used 39,000 conversations involving 16,000 distinct IP addresses. Most of the dataset The Post analyzed was built on the GPT-3.5 Turbo API, while some used the more sophisticated GPT-4.

The Post’s category breakdown was based on a random sample of 458 such conversations, categorized manually by a Post reporter. The margin of sampling error is about 5 percent. Conversations were coded as related to politics and sex based on keywords.
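For readers curious how the deduplication step above works in practice, here is a minimal sketch in Python using the pandas library. The file name and column names (“ip”, “timestamp”, “country”, “language”) are hypothetical placeholders for illustration, not the dataset’s actual schema.

# A minimal sketch of the filtering and deduplication described above,
# assuming a hypothetical CSV export of WildChat with columns "ip",
# "timestamp", "country" and "language". The real dataset's schema may differ.
import pandas as pd

df = pd.read_csv("wildchat_conversations.csv", parse_dates=["timestamp"])

# Keep U.S., English-language conversations only.
df = df[(df["country"] == "US") & (df["language"] == "English")]

# Collapse heavy users: keep each IP address's first prompt of each day.
df["date"] = df["timestamp"].dt.date
first_per_day = df.sort_values("timestamp").drop_duplicates(
    subset=["ip", "date"], keep="first"
)

print(len(first_per_day), "conversations from",
      first_per_day["ip"].nunique(), "distinct IP addresses")

Limiting the sample to one prompt per IP address per day, as this sketch does, keeps a handful of superusers from dominating the statistics.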