Do "digital twins" capture individual responses in surveys and experiments?
We run 19 pre-registered studies on a national U.S. panel and their LLM-powered
digital twins (constructed from extensive, previously collected
individual-level data) and compare twin and human answers across 164 outcomes.
The correlation between twin and human answers is modest (approximately 0.2 on
average) and twin responses are less variable than human responses. While
constructing digital twins based on rich individual-level data improves our
ability to capture heterogeneity across participants and predict relative
differences between them, it does not substantially improve our ability to
predict the exact answers given by specific participants or enhance predictions
of population means. Twin performance varies by domain and is higher among more
educated, higher-income, and ideologically moderate participants. These results
suggest that current digital twins can capture some relative differences between
participants but are unreliable for individual-level prediction and for
estimating sample means and variances, underscoring the need for careful
validation before use.
Our data and code are publicly available for researchers and practitioners
interested in optimizing digital twin pipelines.
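As a rough illustration of the evaluation described above, the sketch below shows one way to compute per-outcome twin-human correlations and variance ratios. This is not the authors' released pipeline: the simulated data, the column names (human, twin), and the effect sizes are illustrative assumptions only.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (participant, outcome),
# with the human answer and the digital-twin answer on a shared scale.
rng = np.random.default_rng(0)
n_participants, n_outcomes = 500, 164
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_outcomes),
    "outcome": np.tile(np.arange(n_outcomes), n_participants),
    "human": rng.normal(0.0, 1.0, n_participants * n_outcomes),
})
# Simulated twins that track humans only weakly and vary less than
# humans do (purely illustrative numbers, not the paper's data).
df["twin"] = 0.2 * df["human"] + rng.normal(0.0, 0.5, len(df))

# Per-outcome Pearson correlation between twin and human answers.
per_outcome_r = df.groupby("outcome").apply(
    lambda g: g["human"].corr(g["twin"])
)
print(f"mean twin-human correlation: {per_outcome_r.mean():.2f}")

# Variance ratio per outcome: values below 1 indicate that twin
# responses are less variable than human responses.
var_ratio = df.groupby("outcome").apply(
    lambda g: g["twin"].var() / g["human"].var()
)
print(f"median twin/human variance ratio: {var_ratio.median():.2f}")
```

Averaging per-outcome correlations, rather than pooling all responses, keeps outcomes with different scales and base rates from dominating the summary statistic.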
