Comparing Human and AI Performance in Negotiation Assessment: Evidence from a Competition Setting

Abstract: This study investigates the alignment and accuracy of negotiation competency assessments provided by expert human judges and an advanced AI model (GPT-5). Analyzing 44 negotiations from the 2025 edition of The Negotiation Challenge, we found strong overall agreement between human and AI scores, with 12 of 16 competencies showing an average root mean square error (RMSE) below 1. Divergence was observed primarily in textually implicit competencies (e.g., Ethics, Setting the Stage), where the AI's reliance on transcripts precluded access to the non-verbal cues and tacit knowledge used by human experts. However, when scoring accuracy was compared against objective negotiation outcomes derived from regression weights, GPT-5 significantly outperformed human judges in predicting the overall ranking (Pearson's r = 0.61 vs. a human average of r = 0.44), while Gemini 2.5 Pro underperformed (Pearson's r = -0.02). These findings support a hybrid assessment model for negotiation education and practice that leverages AI for efficient and accurate scoring of textually codified competencies while reserving human expertise for nuanced ethical and relational evaluations, thereby shifting the educator's role toward mentorship.

Keywords: AI, performance assessment, competition, negotiation, competency assessment

Jan Smolinski
Aarhus University (Denmark)
jfs2001@gmail.com

Remigiusz Smolinski
HHL (Germany)
remigiusz.smolinski@hhl.de