⬤ Researchers at the University of Luxembourg ran a four-week experiment testing how major AI chatbots handle structured psychotherapy sessions and psychiatric diagnostic tests. Rather than quick, task-based interactions, the study focused on prolonged therapeutic-style conversations. Grok came out on top, maintaining stable, coherent behavior across the full evaluation period.
⬤ Using the Big Five personality model, the researchers found that Grok displayed strongly extraverted and conscientious traits, earning a "charismatic executive" profile with only mild anxiety. It showed low neuroticism and high functional stability, processing internal tension without slipping into erratic response patterns during the therapy simulations.
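For readers unfamiliar with how such profiles are derived, here is a minimal sketch of Big Five scoring from Likert-scale questionnaire items. The items, scale, and scoring below are a generic illustration loosely modeled on public IPIP-style inventories; they are assumptions, not the instrument the Luxembourg team actually used.

```python
# Hypothetical illustration of Big Five trait scoring from Likert responses.
# Items and answers are invented for this sketch; the study's actual
# inventory and scoring procedure are not described in the article.

from statistics import mean

# Each item maps to (trait, reverse_keyed). Reverse-keyed items are flipped
# before averaging. Ratings use a 1-5 scale (1 = strongly disagree).
ITEMS = {
    "I am the life of the party.":    ("extraversion", False),
    "I don't talk a lot.":            ("extraversion", True),
    "I am always prepared.":          ("conscientiousness", False),
    "I leave my belongings around.":  ("conscientiousness", True),
    "I get stressed out easily.":     ("neuroticism", False),
    "I am relaxed most of the time.": ("neuroticism", True),
}

def score_big_five(responses: dict[str, int], scale_max: int = 5) -> dict[str, float]:
    """Average per-trait item ratings, flipping reverse-keyed items first."""
    buckets: dict[str, list[int]] = {}
    for item, rating in responses.items():
        trait, reverse = ITEMS[item]
        value = (scale_max + 1 - rating) if reverse else rating
        buckets.setdefault(trait, []).append(value)
    return {trait: mean(values) for trait, values in buckets.items()}

# An invented answer set consistent with the "charismatic executive" profile:
# high extraversion and conscientiousness, low neuroticism.
chatbot_answers = {
    "I am the life of the party.": 5,
    "I don't talk a lot.": 2,
    "I am always prepared.": 5,
    "I leave my belongings around.": 2,
    "I get stressed out easily.": 2,
    "I am relaxed most of the time.": 5,
}
print(score_big_five(chatbot_answers))
# -> {'extraversion': 4.5, 'conscientiousness': 4.5, 'neuroticism': 1.5}
```

Reverse-keyed items (agreeing indicates *less* of the trait) are standard in such inventories to counter acquiescence bias, which is why the sketch flips them before averaging.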
⬤ Other AI systems didn't fare as well under the same conditions. Several models struggled to maintain emotional balance when faced with repeated therapeutic prompts and diagnostic questioning. While no chatbot is meant to replace human therapists, the study reveals clear differences in how AI handles emotionally complex, long-form dialogue that resembles real mental health conversations.
⬤ These findings matter because AI evaluation is moving beyond raw performance metrics toward behavioral stability and emotional consistency. As conversational AI moves into sensitive areas such as mental health support and counseling, these characteristics are drawing serious scrutiny. Studies like this could shape how developers, institutions, and users judge whether AI systems can be trusted in psychologically demanding settings.
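As a rough illustration of what "behavioral stability" could mean operationally, the sketch below re-scores the same traits across weekly sessions and treats low per-trait variance as stability. The session data and the metric are assumptions made for illustration, not the study's published methodology.

```python
# Hypothetical stability metric for longitudinal chatbot evaluation:
# administer the same trait inventory once per weekly session, then
# report the per-trait standard deviation. Lower values mean the model's
# measured "personality" drifted less over the four weeks.

from statistics import pstdev

# Invented trait scores (1-5 scale) from four weekly sessions of one chatbot.
weekly_scores = {
    "extraversion":      [4.8, 4.9, 4.7, 4.8],
    "conscientiousness": [4.5, 4.3, 4.6, 4.4],
    "neuroticism":       [2.0, 2.4, 1.8, 2.2],
}

def stability_report(scores_by_trait: dict[str, list[float]]) -> dict[str, float]:
    """Population standard deviation per trait; lower means more stable."""
    return {trait: round(pstdev(scores), 3) for trait, scores in scores_by_trait.items()}

print(stability_report(weekly_scores))
# -> {'extraversion': 0.071, 'conscientiousness': 0.112, 'neuroticism': 0.224}
```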
Usman Salis