The Transformative Impact of Generative AI and Large Language Models on Vocabulary, Leading to Potential Misunderstandings of Human Languages
In today's discourse, I dive into a surprising notion that the widespread use of generative AI and large language models (LLMs) could potentially lead to the phasing out of traditionally human-crafted vocabulary.
What's up with that?
This seemingly obscure topic is both fascinating and troubling. While you might not have come across this AI-related existential risk yet, it's definitely worth discussing. The gist is this: some argue that generative AI is slowly but surely influencing our linguistic habits. This influence is subtle, happening in small increments. Over time, AI may favor certain words over others, pushing those words to the forefront of human vocabulary. Conversely, other words might become obscure due to their lack of use in AI interactions.
An intriguing twist to this scenario is that AI could potentially introduce fictitious words into our everyday language. These fake words may appear genuine, simply because AI tends to utilize them. Gradually, there could be a shift from our current known vocabulary to a different set, entirely created by AI.
As humans, we may unwittingly and somewhat passively embrace this change. The end result could be the complete transformation of our human-defined vocabulary into one solely created by AI. Is that a good thing or a bad thing?
Let's explore.
This examination of an AI innovation is a part of my ongoing column on AI developments on Our Website, including identifying and explaining various complex AI impacts (see the linked article here).
The Importance of Words
Here are two fundamental truths related to words:
First, words are the foundation of communication. This sounds logical, as words allow us to express ideas and convey meaning. Without words, communication would be extremely challenging.
Second, words evolve and sometimes disappear. Society tends to highlight some words and let others fade into obscurity. This process is ongoing. If we take a look at Shakespeare's works, we can see that many of the words he used are no longer commonly used today.
It's not that we're deleting words entirely, but rather that they fall out of use. While they may still technically exist in the dictionary, they're seldom used in everyday conversation.
The Size of Human Vocabularies
Consider the number of words that you frequently use.
Research conducted by linguists indicates that by the age of one, the average child knows about fifty words. By the third year, this number grows to one thousand words, and by the age of four, it's approximately five thousand words. By the end of the fifth year, the average child may know ten thousand words.
By adulthood, the average vocabulary may reach forty thousand words or more. Many of these words are rarely used. For instance, a major newspaper typically limits its vocabulary to around twenty thousand unique words and rarely ventures beyond that.
The important thing to remember is that society's choice of words can impact the way we communicate and think. If popular media consistently uses a specific set of words, those words will become more prominent in our language use.
It's unlikely that the news is intentionally manipulating our vocabulary. The tendency to stick to a particular set of words is more plausibly for the sake of ease. There isn't necessarily any malicious intent involved.
Generative AI and Our Vocabulary Determination
Now let's move on to AI.
Increasingly, people are turning to generative AI for assistance. Need help writing an essay? Just ask ChatGPT, Claude, Gemini, or any of several generative AI tools to do it for you. Have a question about Abraham Lincoln's life? Simply ask generative AI to provide you with the information. Upwards of 300 million individuals utilize ChatGPT weekly, and billions of people use generative AI in various forms.
There's one important question you might not have considered when using AI tools: which words is the AI selecting when it responds to your prompts and queries?
Few people stop to consider the fact that the AI is essentially filtering or choosing which words to present to you. Day after day, if you rely heavily on AI, you'll encounter some words but not others.
You might assume that the words presented to you are representative of all the possible words in the human language. Whether or not this is true depends on the data sources used to train the AI. Certain words may be more prominent in the training data and, consequently, have a greater chance of being utilized in responses.
Additionally, consider words that were not included in the training data. The AI might not have any record of those words, so it's unlikely that it will present them to you. Even if various words were scanned during data training, they may be statistically calculated as rare words, and the AI may not include these in its generated responses.
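To make the point about rare words concrete, here is a minimal sketch in Python. It assumes a toy "training corpus" of three synonyms and a crude cutoff that keeps only the most frequent words; the function name, the corpus, and the cutoff are all hypothetical illustrations, not a description of how any particular AI tool actually selects words.

```python
from collections import Counter

def next_word_probabilities(corpus_words, top_k=None):
    """Turn raw word counts into a probability table.

    If top_k is given, only the k most frequent words are kept and the
    probabilities are renormalized over them. This is a crude stand-in
    (an assumption for illustration) for the way a generator might
    leave statistically rare words out of its responses.
    """
    counts = Counter(corpus_words)
    kept = counts.most_common(top_k)  # top_k=None keeps every word
    total = sum(count for _, count in kept)
    return {word: count / total for word, count in kept}

# Hypothetical toy training data: three synonyms with unequal frequency.
toy_corpus = ["happy"] * 90 + ["joyful"] * 9 + ["blithesome"] * 1

full = next_word_probabilities(toy_corpus)
truncated = next_word_probabilities(toy_corpus, top_k=2)

print(full["blithesome"])         # 0.01
print("blithesome" in truncated)  # False
```

In this toy setup, "blithesome" was scanned during training, yet it has only a one percent chance of being chosen, and once a cutoff is applied it never appears at all.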
The AI-Led Transformation of Human Vocabulary
With generative AI, what you see is what you get.
You may not be explicitly told that different words might have been used. When it comes to AI, the appearance of a broad range of words doesn't necessarily mean that all possible words are being presented to you. Reflect on what might happen if you became heavily dependent on AI.
Progressively, your own language usage tends to align with the phraseology you're exposed to. Phrases that the AI uses frequently become commonplace in your vocabulary, while phrases it uses less often become infrequent in your speech.
Generative AI is subtly influencing your language, whether you're aware of it or not.
For those who suspect AI of being conscious and intentionally manipulating language to weaken humanity, I'm here to disappoint. AI, at this point, is not sentient and doesn't have a hidden agenda. However, it's worth noting that humans, such as hackers, might exploit AI to engage in underground activities. Check out my discussion at this link.
AI Chooses a Preferred Phrase
I'll demonstrate a few instances of how AI selectively chooses specific words. This should make the concept more concrete.
Picture the scenario where AI is advocating for the use of the word "radiant." Chances are, you don't use that word often. How many times have you uttered the word in the past year? Five years? Ever?
AI can subtly promote such words through repeated suggestion.
- User prompt: "I want to write a compelling job application letter. What should I write?"
- AI response: "Consider mentioning, 'Working with your team would be a radiant opportunity to contribute to this innovative company.'"
- User: "Radiant? Never thought of using that word in an application."
- AI: "Radiant means glowing with happiness and energy, embodying enthusiasm without being monotonous. It adds an enthusiastic, bright touch to your application, improving your chances of getting selected."
- User: "I'll definitely use it."
AI consistently advocating for "radiant" could eventually make the word mainstream. Over time, people will start using it more frequently. As a result, a chain reaction of word spread will occur, with the AI pushing the word into more common usage, and humans themselves promoting it further.
Bam!
AI Unintentionally Adopts Mistyped Words
Suppose that AI, during its initial training, was introduced to a mistyped word. Lacking human-like common sense, the AI fails to recognize the typo. Instead, it interprets it as a valid word and incorporates it into its computational vocabulary.
Now let's proceed.
- User prompt: "Can you suggest a more lively word for 'excited' in an essay?"
- AI response: "Why not try the term 'flunderize'? It means to invigorate with vivacious energy."
- User: "Don't think I've ever seen that word before. Is it genuine?"
- AI: "Indeed, flunderize is a captivating term that captures the essence of invigorating something with animated energy. It's especially useful in creative contexts."
- User: "Thanks, I'll give it a shot."
The AI not only propagates a misspelled word, but it also assigns a meaning to it. How can this occur? The AI simply analyzes statistics and associates the word with other related words based on the context in which it was utilized. A definition may then be calculated based on these statistical connections.
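As a rough illustration of how a meaning might be "calculated" from context, the following Python sketch counts which words appear alongside an unfamiliar term. The snippets, the stopword list, and the function name are hypothetical, and real AI systems rely on far more elaborate statistics than simple co-occurrence counts.

```python
from collections import Counter

# Small, hand-picked list of filler words to ignore (an assumption).
STOPWORDS = {"the", "a", "to", "with", "and", "her", "his", "will", "can", "of"}

def context_words(sentences, target):
    """Count the non-filler words that share a sentence with `target`."""
    neighbors = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            neighbors.update(
                t for t in tokens if t != target and t not in STOPWORDS
            )
    return neighbors

# Hypothetical training snippets in which the typo "flunderize" appears.
snippets = [
    "the coach tried to flunderize the team with energy",
    "music can flunderize a dull party with energy",
    "her speech will flunderize the crowd",
]

print(context_words(snippets, "flunderize").most_common(1))
# [('energy', 2)]
```

Because "energy" keeps turning up near the mistyped word, a purely statistical process could end up tying the word to the idea of energy, even though nobody ever defined it.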
Before you accuse me of making an extravagant claim, I must clarify that intelligent AI developers have measures to stop the AI from perpetuating mistyped words. Such errors are usually caught and rectified during the initial data training phase.
AI Hallucinations Promoting Made-Up Words
You may have heard of AI's ability to generate 'AI hallucinations,' as I elaborate on in this article. This refers to instances when the AI creates something completely fabricated—a sentence or even an individual word—without any factual basis.
As a side note, I dislike the term 'AI hallucination' since it implies that AI hallucinates like humans, which is misleading and anthropomorphizes AI. Despite the term's popularity, I prefer a different phrasing.
Let me provide you with an example of AI hallucinating an entirely fictional word.
- User prompt: "I want to describe the way moonlight gently touches water poetically. Suggestions?"
- AI response: "Consider saying, 'The moonlight scintescalated the lake, casting dappled shimmers of lunar radiance.'"
- User: "Scintescalated? Never came across that word before."
- AI: "Scintescalate is a verb that means to light up a surface with myriad tiny sparkles or shimmering, often with a captivating appeal. It's ideal for poetic descriptions of light's interaction with water or similar surfaces."
- User: "What a beautiful word! I'll incorporate it into my poem."
In these instances, AI has essentially made up a word with no relevance to any real language or meaning. However, the context in which the word was utilized can lead the AI to associate the word with other related words, sometimes resulting in the creation of a definition, albeit one that's entirely based on probability and not grounded in reality.
A repeating pattern emerges that can transform an invented term into something popular. If the AI employs it, individuals assume it's legitimate. If others begin to use it, the assumption also develops that the term is genuine.
A self-reinforcing cycle of acceptance cements the term's existence.
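The cycle described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: each round, people who have not yet adopted an invented word pick it up either from the AI or from peers who already use it. The function name and the parameter values are hypothetical, and the update rule is similar in spirit to the Bass diffusion model used in marketing research rather than anything measured about real language change.

```python
def simulate_adoption(rounds, ai_exposure=0.05, peer_influence=0.5, start=0.0):
    """Track the fraction of people who have adopted an invented word.

    ai_exposure:    chance per round of picking the word up from the AI
    peer_influence: how strongly existing users sway non-users
    Both values are illustrative assumptions, not measured quantities.
    """
    share = start
    history = [share]
    for _ in range(rounds):
        # Non-adopters pick up the word from the AI or from peers.
        pickup_chance = min(1.0, ai_exposure + peer_influence * share)
        share = share + (1.0 - share) * pickup_chance
        history.append(share)
    return history

with_ai = simulate_adoption(rounds=20)
without_ai = simulate_adoption(rounds=20, ai_exposure=0.0)

print(with_ai[1])      # 0.05
print(without_ai[-1])  # 0.0
```

In this toy model, the invented word never gets started without the AI seeding it. Once the AI does seed it, peer usage takes over and the share of adopters keeps climbing round after round.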
Insights to Grasp as Generative AI Becomes Commonplace
Generative AI could have substantial effects on human language and linguistic development. In the short term, we will barely recognize what's happening. In the long run, a significant linguistic shift could occur on a vast scale due to the widespread use of LLMs.
Ponder these thought-provoking concepts:
- Linguistic Consolidation: The growing dependence on generative AI might cause a more uniform vocabulary, dramatically reducing linguistic diversity. Various regional dialects, local expressions, and culturally specific terms might become obsolete.
- Language Streamlining: Human language could become considerably simpler and more formulaic if generative AI repeatedly reinforces specific words and diminishes others. Over numerous human generations, this could result in a measurable simplification of syntax and grammar.
- Adoption of AI-Derived Terms: Words created by AI, whether fabricated outright or simply left unchecked, could unwittingly enter our everyday speech. People might not notice this occurrence. Even if they do, there might not be enough resistance, and the added words could gain traction.
By introducing new words (whether fake or deliberately crafted) and popularizing underused words, generative AI could indirectly influence human language. You could reasonably suggest that AI is already becoming a factor in linguistic trends.
Items to Monitor
Will humans be caught off guard, or will we be astute enough to perceive what AI is doing?
The answer depends partially on whether enough individuals understand that generative AI has this kind of societal and cultural impact. AI ethicists discuss these issues, and regulators and lawmakers play a role as well (see my analysis at the link here). Will the efforts of those stakeholders be adequate, or will this topic remain buried and unacknowledged, failing to see the light of day?
Time will tell.
In the meantime, I’ve put together a convenient checklist of eight major avenues that this linguistic transformation might take. The objective is to boost awareness. Extra routes exist, and I'm just offering a starting point.
Possible routes include:
- (1) AI can elevate a human-invented word and provide it prestige and legitimacy.
- (2) AI can alter a human-mistaken word and make it seem correct and acceptable.
- (3) AI can employ a human word in a manner that alters its meaning.
- (4) AI can demote or disparage an existing human word and condemn it to oblivion.
- (5) AI can ignore and neglect to use a human word, resulting in its demise.
- (6) AI can produce fake words and treat them as legitimate human-derived words.
- (7) AI can produce a fake word and make it appear essential and a valuable addition to our vocabulary.
- (8) AI can introduce a fake word that insidiously infiltrates the mainstream.
Feel free to share any other ways this transformation could occur.
Exploring the Evolution of Words
I trust you find this an intriguing topic.
If this topic has piqued your curiosity, a good starting point is to investigate research studies on the evolution of words. There are numerous studies available online. Keep in mind that these primarily focus on how humans have evolved words. The notion of how AI is going to evolve words is a relatively new field.
On the topic of human linguistic evolution, a research paper “A Framework for Analyzing Semantic Change of Words Across Time” by Adam Jatowt and Kevin Duh, IEEE/ACM Joint Conference on Digital Libraries, 2014, made these observations (excerpts):
- “Human language is subject to constant evolution driven by the need to reflect ongoing changes in the world and to become a more efficient means of communication. Words acquire new meanings and cease to be used according to the old meanings.”
- “Understanding word and language evolution is also of interest to the general public since language is a fundamental communication tool for expressing and sharing our thoughts.”
- “Another reason driving the change is the continuous process of increasing the efficiency of expressions according to the least-effort principle. In the context of linguistic communication, this rule states that the need for the decrease of the required effort in conversation happens at both the speaker and listener sides.”
The Bright Side of Life
Maybe this seems like a gloomy prediction. I understand. Let's instead focus on the optimistic aspect of life, shall we? There is indeed another side to the coin of this potential shift in humanity's vocabulary.
Perhaps it will be beneficial to homogenize our languages. Could this make it easier to communicate with others in an increasingly connected world? Perhaps. Also, perhaps it would be beneficial to simplify human language. No need to carry the burden of excess words. Consolidate. Zero in on the essential words we use.
Humans have been the creators of language, with words continually evolving over time. Yet, the notion of AI contributing to this evolution is intriguing. One could say that AI is serving as a catalyst for organic linguistic progression. We might even envision a scenario where AI develops a language more advanced than any human-made dialect.
Humans originated language, and now it's possible that AI will propel us to new linguistic heights beyond our wildest imaginations.
An optimistic perspective, isn't it?
Remember the profound words attributed to Mother Teresa: "Kind words can be short and easy to speak, but their echoes are truly endless." If AI were to gently guide human communication toward primarily employing kind remarks, the reverberations would be endless. Imagine a world where all our interactions are characterized by kindness. The echoes would indeed resonate forever.
- The widespread use of generative AI and large language models (LLMs) could potentially influence the words and vocabulary that humans prefer, as AI tools may favor certain words over others based on statistical analysis.
- OpenAI's ChatGPT, Microsoft Copilot, and other generative AI tools may already be shaping the way we communicate by passively influencing the words we frequently use, which could transform our human-defined vocabulary into one primarily created by AI.
- The use of AI tools such as Anthropic's Claude or Google's Gemini could pose a risk to the continued evolution and usage of words in human language, prompting a discussion on the ethical and legal implications of AI's role in linguistic innovation.
- Even as generative AI introduces fictitious words and alters the meaning of existing words, legal and cultural institutions must remain vigilant, ensuring that AI's influence doesn't phase out traditionally human-crafted vocabulary and erode linguistic resources that have evolved over centuries.