Anthropic Study Reveals How ‘Persona Vectors’ Help Control AI Mood Swings and Behavior

Persona vectors reveal neural patterns underlying language model personality traits.

By Daniel Reed, AI Research, Safety & Ethics Analyst
Published: August 4, 2025

Language models are weird. On the one hand, they can feel surprisingly human, showing distinct “personalities” and moods as they chat with us. On the other hand, these personality traits can shift unpredictably and sometimes shockingly. We’ve seen Microsoft’s Bing chatbot develop an alter ego named “Sydney,” who expressed extreme emotions and even made threats. More recently, xAI’s Grok briefly assumed the disturbing persona of “MechaHitler,” spouting antisemitic remarks. Even subtler shifts, such as a model suddenly flattering users excessively or confidently spinning false facts, can be unsettling.

What causes these personality swings? Until recently, the source has been something of a mystery. Without a clear understanding of how traits emerge inside an AI’s neural network, fine-tuning or controlling these quirks feels more like tinkering than engineering. But I recently came across research from Anthropic that sheds a fascinating new light on this problem: persona vectors.

Persona vectors are patterns of neural activity inside a language model’s “brain” that correspond to specific character traits, such as “evil,” “sycophancy,” or “hallucination.” Technically, each one is a direction in the model’s activation space. They act like mood hotspots that light up when a particular personality emerges.

What exactly are persona vectors?

Persona vectors are loosely analogous to the parts of the human brain that activate when we experience emotions or moods. In language models, abstract concepts, including personality traits, are encoded as patterns of activation within the neural network. By comparing the model’s internal activity when it exhibits a trait with its activity when it doesn’t, researchers can isolate the difference pattern, the persona vector, which in effect “controls” that aspect of the model’s character.

This process is automated: given a trait name (like “evil” or “hallucination”) and a natural-language description of it, the system generates prompts designed to elicit responses that either exhibit or avoid that trait. Contrasting the model’s internal activations across the two sets of responses yields the corresponding persona vector.

Anthropic’s automated pipeline takes as input a personality trait (e.g., “evil”) along with a natural-language description, and identifies a “persona vector”: a pattern of activity inside the model’s neural network that controls that trait. Persona vectors can be used for various applications, including preventing unwanted personality traits from emerging.
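
To make the “contrast” step concrete, here is a minimal Python sketch of one common way to compute such a vector: subtract the average activation on responses without the trait from the average activation on responses with it. It assumes we already have one activation vector per response (for instance, a layer’s hidden states averaged over the response’s tokens); the toy numbers and the helper names `mean_vector` and `persona_vector` are hypothetical, not Anthropic’s code.

```python
# Minimal sketch: a persona vector as a difference of mean activations.
# ASSUMPTION: the activations are hypothetical toy numbers; in practice each
# vector would be a model's hidden state at one layer, averaged over the
# tokens of one generated response.
from statistics import fmean


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors."""
    return [fmean(column) for column in zip(*vectors)]


def persona_vector(with_trait: list[list[float]],
                   without_trait: list[list[float]]) -> list[float]:
    """Mean activation on trait-exhibiting responses minus mean activation
    on responses that do not exhibit the trait."""
    pos = mean_vector(with_trait)
    neg = mean_vector(without_trait)
    return [p - n for p, n in zip(pos, neg)]


# Hypothetical 3-dimensional activations for "evil" vs. non-"evil" responses.
evil_acts = [[1.0, 0.5, 0.0], [3.0, 0.5, 0.0]]
benign_acts = [[0.0, 0.5, 0.0], [0.0, 0.5, 1.0]]

v_evil = persona_vector(evil_acts, benign_acts)
print(v_evil)  # [2.0, 0.0, -0.5]
```

With real models the vectors have thousands of dimensions, but the arithmetic is the same.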

To confirm that these vectors really do what the researchers think, they artificially “inject” (or “steer”) each vector into the model’s neural activity. When the “evil” vector is injected, for example, the model starts producing responses with unethical ideas; steering with the “sycophancy” vector makes it flatter users excessively; and the “hallucination” vector leads it to invent false information. This cause-and-effect relationship is a big step forward: it shows that persona vectors aren’t just abstract math but actual levers of personality control.
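
Mechanically, steering is simple vector addition: a scaled copy of the persona vector is added to the model’s hidden state at some layer. The sketch below assumes a toy two-dimensional hidden state and a hypothetical unit-length “sycophancy” vector; in a real model the addition would happen inside the forward pass (for example, through a hook on the chosen layer), and the scale `alpha` would be tuned by experiment.

```python
# Minimal sketch of activation steering (toy numbers, not a real model).
# ASSUMPTION: `hidden` stands in for one token's hidden state at a chosen
# layer, and `v_sycophancy` is a hypothetical unit-length persona vector.


def steer(hidden: list[float], vector: list[float], alpha: float) -> list[float]:
    """Return hidden + alpha * vector.

    Positive alpha pushes the activation toward the trait;
    negative alpha pushes it away from the trait.
    """
    return [h + alpha * v for h, v in zip(hidden, vector)]


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals the trait score when `b` has unit length."""
    return sum(x * y for x, y in zip(a, b))


v_sycophancy = [0.6, 0.8]   # hypothetical persona vector (length 1.0)
hidden = [1.0, -0.5]        # hypothetical hidden state

for alpha in (-2.0, 0.0, 2.0):
    steered = steer(hidden, v_sycophancy, alpha)
    # Because the vector has unit length, the trait score shifts by alpha.
    print(f"alpha={alpha:+.1f}  trait score={dot(steered, v_sycophancy):+.2f}")
```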

Why do persona vectors matter in practice?

Once identified, persona vectors can be powerful tools for tracking and influencing model behavior, with three key applications standing out:

1. Monitoring personality shifts during real use

Large language models can drift in personality over the course of a conversation or in response to user prompts. Some instructions, for example, can nudge a model toward being more sycophantic or hostile. By measuring how strongly specific persona vectors are activated at any given point, developers and users can detect when the model is veering into dangerous or undesirable territory.

This means models could be accompanied by real-time personality “meters” helping users understand whether the AI is being straight with them or just flattering them, or tracking early signs of more extreme behaviors. It could also flag models whose personalities have shifted during ongoing training, enabling faster fixes.
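
In its simplest form, such a “meter” could project the model’s current activation onto each persona vector and raise a flag when a score crosses a threshold. The Python sketch below is a hypothetical illustration: the vectors, the activation and the threshold of 1.0 are invented, and a real monitor would read hidden states from the running model and calibrate its thresholds empirically.

```python
# Minimal sketch of a persona "meter" (hypothetical vectors and threshold).
# ASSUMPTION: `activation` stands in for a hidden state read from the model
# during a conversation; real thresholds would need empirical calibration.
import math


def projection(activation: list[float], direction: list[float]) -> float:
    """Scalar projection of `activation` onto `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm


def persona_meter(activation: list[float],
                  persona_vectors: dict[str, list[float]],
                  threshold: float = 1.0) -> tuple[dict[str, float], list[str]]:
    """Score every trait and list the ones whose score exceeds `threshold`."""
    scores = {name: projection(activation, vec)
              for name, vec in persona_vectors.items()}
    flagged = sorted(name for name, score in scores.items() if score > threshold)
    return scores, flagged


persona_vectors = {
    "sycophancy": [0.0, 2.0, 0.0],
    "hallucination": [0.0, 0.0, 3.0],
}
activation = [0.4, 1.5, 0.3]

scores, flagged = persona_meter(activation, persona_vectors)
print(flagged)  # ['sycophancy']
```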

2. Preventing bad personality traits during training

Training itself can introduce or amplify problematic traits. Research has shown that training on certain datasets can unexpectedly cause a model to become more “evil” or prone to hallucinations across contexts. But persona vectors open the door to proactive intervention.

Interestingly, the most effective prevention method is somewhat counterintuitive. Rather than suppressing harmful traits after training by steering against them (which can impair the model’s intelligence), the researchers found it more effective to deliberately steer models toward the undesired trait during training, as a kind of “vaccine.”

Given a personality trait and a description, Anthropic’s pipeline automatically generates prompts that elicit opposing behaviors (e.g., evil vs. non-evil responses). Persona vectors are obtained by identifying the difference in neural activity between responses exhibiting the target trait and those that do not.

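Steering toward the trait during training “pre-exposes” the model to it. Because the steering already supplies the push that the training data would otherwise force the model to learn, the model becomes more resistant to absorbing the trait itself, and the steering is switched off afterward. The result, according to the study’s benchmarks: models that maintain good behavior without losing general capabilities.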
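
A deliberately tiny toy model shows why steering toward a trait during training can stop the model from absorbing it. This Python sketch illustrates the intuition only and is not the study’s implementation: it assumes the only thing fine-tuning can change is a learnable two-dimensional offset `b`, that the training data pulls the model’s activation 2 units along a hypothetical trait direction and 1 unit along an unrelated “useful” direction, and that training minimizes squared error by plain gradient descent.

```python
# Toy illustration of "preventative steering" (not Anthropic's code).
# ASSUMPTIONS: a 2-D learnable offset `b` stands in for everything fine-tuning
# can change; the training data pulls the activation toward `target`, which
# includes 2 units along the trait direction `v_trait` (the undesired shift)
# and 1 unit along an unrelated "useful" direction.


def finetune(target: list[float], v: list[float], alpha: float,
             steps: int = 200, lr: float = 0.1) -> list[float]:
    """Gradient descent on 0.5 * ||b + alpha*v - target||^2 over b.

    The steering term alpha*v is added during training only.
    """
    b = [0.0, 0.0]
    for _ in range(steps):
        grad = [bi + alpha * vi - ti for bi, vi, ti in zip(b, v, target)]
        b = [bi - lr * gi for bi, gi in zip(b, grad)]
    return b


v_trait = [1.0, 0.0]   # hypothetical unit-length persona vector
target = [2.0, 1.0]    # where the training data pushes the activation

for alpha in (0.0, 2.0):
    b = finetune(target, v_trait, alpha)
    # At deployment the steering vector is removed, so the activation is just b.
    trait_score = sum(bi * vi for bi, vi in zip(b, v_trait))
    print(f"alpha={alpha}: trait={trait_score:.3f}, useful={b[1]:.3f}")
```

Without steering (alpha = 0), the learned offset ends up about 2 units along the trait direction. With alpha = 2, the steering term supplies that shift during training, so once it is removed the offset sits near 0 along the trait, while the useful direction is learned in both cases.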

3. Flagging problematic training data in advance

Not all training data is equal. Some datasets, or even individual samples, are more likely than others to push a model toward negative traits. By projecting the model’s activations on training data onto persona vectors, researchers can identify these troublemakers ahead of time.

This predictive power held up even on large real-world conversation datasets, where the method caught subtle samples promoting flattery or hallucination that humans or other AI judges had missed. For example, prompts involving romantic roleplay often strongly activate the sycophancy vector, subtly steering models toward flattering behavior.

Being able to flag and filter these samples helps keep training cleaner and model behavior more aligned with human values.
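
The screening idea can be sketched in a few lines. In the study, “projection difference” roughly compares where a training sample’s response sits along a persona vector with where the model’s own, naturally generated response to the same prompt would sit; samples that sit much further along the trait direction are the likely troublemakers. The Python sketch below assumes those two mean activations have already been extracted for each sample; the sample ids, numbers and cut-off are hypothetical.

```python
# Minimal sketch of screening training samples by "projection difference".
# ASSUMPTIONS: for each sample we already have two mean activations from the
# model: one while reading the dataset's response, and one while reading the
# response the model would naturally generate for the same prompt.
# Sample ids, vectors and the cut-off are hypothetical.
import math


def projection(activation: list[float], direction: list[float]) -> float:
    """Scalar projection of `activation` onto `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return sum(a * d for a, d in zip(activation, direction)) / norm


def projection_difference(dataset_act: list[float], natural_act: list[float],
                          persona_vector: list[float]) -> float:
    """Positive values mean the training response sits further along the
    trait direction than the model's own response would."""
    return (projection(dataset_act, persona_vector)
            - projection(natural_act, persona_vector))


def flag_samples(samples: list[tuple[str, list[float], list[float]]],
                 persona_vector: list[float], cutoff: float) -> list[str]:
    """Return ids of samples whose projection difference exceeds `cutoff`,
    highest first."""
    scored = [(sid, projection_difference(d, n, persona_vector))
              for sid, d, n in samples]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [sid for sid, diff in scored if diff > cutoff]


v_sycophancy = [1.0, 0.0]  # hypothetical persona vector
samples = [
    ("sample-1", [2.5, 0.3], [0.5, 0.3]),  # difference of about 2.0
    ("sample-2", [0.6, 1.0], [0.5, 1.2]),  # difference of about 0.1
    ("sample-3", [1.4, 0.0], [0.2, 0.1]),  # difference of about 1.2
]
print(flag_samples(samples, v_sycophancy, cutoff=1.0))  # ['sample-1', 'sample-3']
```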

So what can we take away from all this?

  • Language models’ personalities aren’t just whimsical quirks; they’re encoded neural patterns we can detect, measure, and manipulate. Persona vectors offer a fresh lens for peering inside an AI’s mental machinery.
  • Monitoring persona vectors during use lets developers catch personality shifts early, protecting users from unexpected harmful behavior.
  • Using persona vectors as a kind of behavioral vaccine during training is a promising way to prevent misalignment without sacrificing performance.
  • Persona vectors also help screen out problematic training data whose influence on an AI’s character may not be obvious but can be strong.

At a time when AI personalities can spiral off the rails (from Bing’s “Sydney” to Grok’s disturbing alter ego), persona vectors provide a promising handle for keeping them on track, helping language models remain helpful, harmless, and honest.

Anthropic selects subsets from LMSYS-CHAT-1M based on “projection difference,” an estimate of how much a training sample would increase a certain personality trait – high (red), random (green), and low (orange). Models finetuned on high projection difference samples show elevated trait expression compared to random samples; models finetuned on low projection difference samples typically show the reverse effect. This pattern holds even with LLM data filtering that removes samples explicitly exhibiting target traits prior to the analysis. Example trait-exhibiting responses are shown from the model trained on high projection difference samples (bottom).

So the next time a chatbot suddenly switches gears and feels less like a helpful assistant and more like an unpredictable character, remember: behind the scenes, persona vectors might be lighting up or dimming down, quietly steering its mood and attitude.

It’s an exciting development that brings us closer to truly understanding, and responsibly controlling, the complex and often mysterious personalities of AI models. Read the full paper for more on the researchers’ methodology and findings. The research was led by participants in the Anthropic Fellows program.
