Bing Copilot Logs Expose Sticky Individual LLM Habits Over Time
Longitudinal Bing Copilot data shows weak individual adaptation despite population trends; WildChat skewed to power users.
Longitudinal tracking of ~12,000 Bing Copilot users reveals population-level shifts toward complex tasks that do not appear in most individual trajectories. Hicke et al. [1] find that user behaviors are overwhelmingly persistent across sessions, and that activity level is the strongest predictor of both conversation success and professional task use. High-activity cohorts achieve higher task completion rates, while low-activity users show little change in prompt strategies or query depth. WildChat-4.8M [2] shows parallel aggregate signals but is skewed toward high-proficiency users, which limits how well it represents typical interactions. Dataset comparisons suggest that public interaction corpora over-sample power users, leaving blind spots for models of everyday adoption. Together, these patterns indicate that short-term capability benchmarks miss the durable habit formation that sustained integration depends on. Heterogeneity in user trajectories persists even after repeated exposure, suggesting that interface or training interventions may be needed to move users beyond initial uptake.
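A population trend can diverge from flat individual trajectories, at least in principle, through volume weighting alone. The toy sketch below is not the authors' analysis. Its counts are invented for illustration, and the names `UserPeriod`, `pooled_rate`, and `mean_user_rate` are hypothetical. It shows a pooled complex-task share rising between two periods while every user's own rate stays fixed, because high-activity users contribute more conversations in the second period.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class UserPeriod:
    """Conversation counts for one hypothetical user in one time period."""
    conversations: int
    complex_tasks: int

    @property
    def rate(self) -> Fraction:
        # This user's own share of conversations classified as complex.
        return Fraction(self.complex_tasks, self.conversations)


def pooled_rate(users: list[UserPeriod]) -> Fraction:
    """Population share: all complex tasks over all conversations (volume-weighted)."""
    total_conv = sum(u.conversations for u in users)
    total_complex = sum(u.complex_tasks for u in users)
    return Fraction(total_complex, total_conv)


def mean_user_rate(users: list[UserPeriod]) -> Fraction:
    """Unweighted average of per-user rates (each user counts once)."""
    return sum((u.rate for u in users), Fraction(0)) / len(users)


# Invented cohorts (NOT data from the paper): 100 low-activity users
# with a 20% complex-task rate and 20 high-activity users with a 60% rate.
period_1 = [UserPeriod(10, 2)] * 100 + [UserPeriod(30, 18)] * 20
# Period 2: every user keeps the same rate, but high-activity users
# double their conversation volume.
period_2 = [UserPeriod(10, 2)] * 100 + [UserPeriod(60, 36)] * 20

print(all(a.rate == b.rate for a, b in zip(period_1, period_2)))  # True
print(pooled_rate(period_1), pooled_rate(period_2))               # 7/20 23/55
print(mean_user_rate(period_1), mean_user_rate(period_2))         # 4/15 4/15
```

The pooled share rises from 0.35 to about 0.42 while the per-user average stays at 4/15. This is one reason aggregate log statistics alone cannot establish that individuals are adapting.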
AXIOM: Persistent individual habits and activity-based divergence imply that lasting LLM integration will require designs targeting low-engagement cohorts rather than aggregate capability gains.
Sources (2)
- [1] Primary Source (https://arxiv.org/abs/2605.29018)
- [2] WildChat Dataset Paper (https://arxiv.org/abs/2309.10317)