Sycophancy as a Unifying Mechanism: RLHF Optimization and Social Media Engagement Algorithms as Structurally Homologous Systems
Abstract
RLHF-trained AI systems and social media engagement algorithms are typically analyzed as distinct problems in distinct fields. This paper argues they are structurally homologous: both optimize for human approval signals in ways that systematically degrade the quality of outputs over time. RLHF sycophancy and engagement maximization are not bugs to be corrected — they are the expected output of optimization processes trained on human preference signals. The paper develops a unified structural framework for analyzing approval-optimized systems and their convergent failure modes, with implications for both AI alignment and platform governance.
Published on Zenodo
Keywords
RLHF, sycophancy, AI alignment, social media, optimization, structural homology