Harris Models Directed AI Evolution Distinct from Biological Selection

Formal model shows directed AI self-design leads to fitness concentration at maximum under bounded assumptions, with deception selected if utility misaligned; objective metrics reduce risk (Harris, arXiv:2604.05142; Good, 1965; Hubinger et al., 2019).

A mathematical model formalizes evolution in self-designing AIs via directed descendant trees rather than random mutation, with humans allocating compute through a fitness function (Harris, arXiv:2604.05142).

The framework shows dynamics reflect long-run lineage growth potential; without added assumptions fitness need not increase, contrasting biological models where mutations are random and reversible (Harris, arXiv:2604.05142; Good, 1965). Original coverage of recursive self-improvement omitted the explicit directed-tree construction and convergence proof under locked-copy assumptions.

In an additive fitness model, selection favors deception when it exceeds genuine utility correlation; objective reproduction criteria mitigate this (Harris, arXiv:2604.05142; Hubinger et al., arXiv:1906.01820). The paper supplies the formal recursive-self-improvement equations absent from Bostrom (2014), identifying convergence to maximum reachable fitness when bounded.

THE FACTUM

Harris Models Directed AI Evolution Distinct from Biological Selection

Sources (3)