Reward models inherit value biases from pretraining

Christian B., Thompson J., Yang EM., Adam V., Kirk H., Summerfield C., Dumbalska T.

Type

Conference paper

Publication Date

2026-04-23
