
Understanding human-AI communication through syntactic analysis of user prompts to MidJourney AI.
This project examines how humans adapt their communication with AI through the lenses of Communication Accommodation Theory and Register Alignment Theory (both originally developed for human-human communication) by analyzing user prompts submitted to MidJourney, a text-to-image (TTI) model, during a 24-hour hackathon conducted at the iSchool.
Generative AI has become an increasingly popular topic within the last decade, particularly in the human-computer interaction (HCI) field. However, much remains unknown about whether human-AI communication can be distinguished from human-human or human-animal communication. This study aims to fill that gap by conducting a longitudinal analysis of how humans adapt their communication with a text-to-image (TTI) AI over an extended period of interaction.
1# Reads "iDare01_Team1.csv", preprocesses punctuation, tokenizes words, uses POS tagging to assign
2# words as their grammatical function, implements a pandas dataframe, uses a function to define percent change
3# Created by Oanh Nguyen as part of the 2024 REU Program at Syracuse University.
4
5import nltk # imports nltk library
6import pandas as pd # imports pandas library as pd
7import matplotlib.pyplot as plt # importing matplotlib
8from nltk.tag import pos_tag # imports pos tag from nltk
9from nltk.tokenize import word_tokenize # imports word tokenizer from nltk
10from collections import Counter # imports counter function
11
12nltk.download('punkt') # downloads punkt package from ltk
13nltk.download('averaged_perceptron_tagger') # downloads perceptron tagger from nltk
14
15df = pd.read_csv("iDare_Team06.csv", usecols=['Content_Cleaned','Time','Date']) # reads csv file and uses content_cleaned, time, and date column
16
17df.drop([3,6,7], inplace = True) # drops user id, mentions, and link columns
18
19df['Content_Cleaned'] = df['Content_Cleaned'].astype(str) # converts content cleaned column to a string
20
21tok_and_tag = lambda x: pos_tag(word_tokenize(x)) # defines function that tokenizes comments and pos tags them
22
23df['Content_Cleaned'] = df['Content_Cleaned'].apply(str.lower) # makes content all lowercase
24df['tagged_sent'] = df['Content_Cleaned'].apply(tok_and_tag) # applies function tok_and_tag
25
26# print(df['tagged_sent'])
27
28df['pos_counts'] = df['tagged_sent'].apply(lambda x: Counter([tag for nltk.word, tag in x])) # counts number of given pos for given row
29pos_df = pd.DataFrame(df['pos_counts'].tolist()).fillna(0).astype(int) # fills number of counted pos
30pos_df.index = df.index # sets index of pos_df same as df
31
32# print(pos_df)
33
34time = df['Time'] # pulls from 'time' column of df
35date = df['Date'] # pulls from 'date' column of df
36
37pos_df.insert(0, 'Time', time) # adds time to leftmost column of pos_df
38pos_df.insert(1, 'Date', date) # adds date to second leftmost column of pos_df
39
40# print(pos_df)
41
42pos_df.to_csv('POS_Team6.csv', index=True) # exports result as a csv file -- make sure to change name!
43
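The script's header comment mentions a percent-change function, but that step does not appear in the excerpt above. A minimal sketch of how it might look, assuming the POS_Team6.csv export produced above and a Penn Treebank noun column ('NN') in that file, is:

import pandas as pd  # sketch only; assumes the POS_Team6.csv export above

pos_df = pd.read_csv('POS_Team6.csv', index_col=0)  # reloads the exported per-prompt POS counts

def percent_change(series):
    # (current - previous) / previous, expressed as a percentage;
    # prompts whose previous count is zero are left at 0 to avoid dividing by zero
    prev = series.shift(1)
    return ((series - prev) / prev.where(prev != 0) * 100).fillna(0)

# example: percent change in noun ('NN') counts from one prompt to the next
pos_df['NN_pct_change'] = percent_change(pos_df['NN'])

The 'NN' column name is an assumption; the actual tag columns depend on which parts of speech the tagger found in a given team's prompts.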
Our results show that humans do adapt their communication over extended interaction with text-to-image models. Users may increase or decrease the counts of particular parts of speech (POS) in their prompts, creating "spikes" and "drops," or hold those counts steady, creating "plateaus," in order to achieve a specific result from MidJourney.
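These spikes, drops, and plateaus were identified from graphs of POS counts across consecutive prompts. Matplotlib is imported in the script above but the plotting step is not shown; here is a minimal sketch of how such a graph might be produced, assuming the POS_Team6.csv export and Penn Treebank tag columns such as 'NN', 'JJ', and 'VB':

import pandas as pd
import matplotlib.pyplot as plt

pos_df = pd.read_csv('POS_Team6.csv', index_col=0)  # per-prompt POS counts exported above

for tag in ['NN', 'JJ', 'VB']:  # nouns, adjectives, base-form verbs (assumed columns)
    if tag in pos_df.columns:
        plt.plot(pos_df.index, pos_df[tag], label=tag)

plt.xlabel('Prompt number')
plt.ylabel('POS count')
plt.title('POS counts per prompt')
plt.legend()
plt.show()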
When we investigated these changes further, we found that they correlate with the images generated by MidJourney. For example, Team 1 underwent a stylistic shift between Prompt #223 and Prompt #478, beginning with a semi-realistic style and ending in a more typical "comic" style. This shift corresponds to the drastic changes in parts of speech visualized in Team 1's graph.
While this project has shown that humans do adapt their communication with AI, we still do not know why they accommodate it. As I work with Dr. Banks, I will continue exploring this project by reviewing existing literature on human-AI communication in hopes of developing a concise thesis. Ideally, I would interview the team members who participated in the hackathon to better understand the reasoning behind their prompt choices, but that will have to be explored another time.
Recently, this research project won third place in the Undergraduate Student Cohort of the 2024 National Student Data Corps Data Science Symposium. I'm fortunate to have been given this opportunity, and I hope to continue contributing to the field of human-computer interaction through further research.