
Paper notes: Tomov et al. (2024)


Summary

[screenshot]

Aren’t there ambiguity benchmarks?

Dataset details

[screenshot]

Theoretical results

[screenshots]

Experiments

Obtaining the True Distribution p*

[screenshot]

Output:

Experimental Setup

UQ estimators

Datasets

Models

Metrics

Results

[screenshots]

Related Work

UQ for LLMs
Interesting remark: "Prior work has shown that hidden representations can encode factual correctness, but has not examined settings with intrinsic ambiguity" (Li et al. 2023; Chen et al. 2024; Orgad et al. 2025).
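That line of work typically trains a linear probe on hidden states to predict correctness. A minimal sketch of that idea, with fully synthetic data standing in for hidden states (not the cited papers' actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for LLM hidden states: a correctness signal is
# weakly encoded along one random direction (hypothetical data).
n, d = 2000, 64
direction = rng.normal(size=d)
H = rng.normal(size=(n, d))                                   # "hidden states"
y = (H @ direction + rng.normal(scale=2.0, size=n) > 0).astype(float)

H_train, y_train = H[:1500], y[:1500]
H_test, y_test = H[1500:], y[1500:]

# Linear probe via least squares on centered {0,1} labels; sign-threshold.
# If accuracy is well above chance, the representation linearly encodes
# the correctness signal.
w, *_ = np.linalg.lstsq(H_train, y_train - 0.5, rcond=None)
pred = (H_test @ w > 0).astype(float)
acc = (pred == y_test).mean()
print(f"probe accuracy: {acc:.2f}")
```

The paper's point is that such probes are evaluated on factual questions with a single correct answer; it is unclear what "correctness" the probe should track when the question is intrinsically ambiguous.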

Ambiguity in QA
Here they mention MAQA (Yang et al. 2025) as the sole dataset considering QA where ambiguity cannot be resolved through clarification. They argue that their experimental approach is very different from Yang's: Yang elicits all valid answers to a question at once, whereas this paper asks the model for only one. Notably, Yang focuses on quantifying LLM uncertainty due to data uncertainty.
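The contrast can be made concrete with a toy sketch (my own illustration, not code from either paper): instead of asking for all valid answers at once, sample one answer repeatedly and read data uncertainty off the empirical answer distribution, e.g. as entropy. `sample_answer` is a hypothetical stand-in for an LLM call on an ambiguous question.

```python
import math
import random
from collections import Counter

random.seed(0)

def sample_answer(question: str) -> str:
    """Hypothetical stand-in for one LLM call returning a single answer.
    On an intrinsically ambiguous question the model spreads probability
    mass over several valid answers."""
    return random.choices(["Paris, France", "Paris, Texas", "Paris, Tennessee"],
                          weights=[0.5, 0.3, 0.2])[0]

def answer_entropy(question: str, n_samples: int = 500) -> float:
    """Estimate data uncertainty as the entropy (in nats) of the
    empirical distribution over repeatedly sampled single answers."""
    counts = Counter(sample_answer(question) for _ in range(n_samples))
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

h = answer_entropy("Where is Paris?")
print(f"estimated answer entropy: {h:.2f} nats")
```

With the weights above, the true entropy is about 1.03 nats (out of a maximum of ln 3 ≈ 1.10 for three answers), so a nonzero estimate here reflects genuine data uncertainty rather than model error.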

Discussion

Limitations

Contributions

Future work

Parsing the proofs

(wrote this in Notion to use equations; pasting pictures here)

[screenshots]

Where to go from here?