Paper notes: Hou et al. (2024)


These are notes for the paper by Hou et al. (2024) titled "Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling".


Deep Ensembles

$$\mathcal{H}(q(Y|X)) = \mathcal{I}(Y;\theta|X) + \mathbb{E}_{q(\theta|\mathcal{D})}\left[\mathcal{H}(q(Y|X,\theta))\right] \quad (2)$$

The first term (mutual information between $Y$ and the model parameters $\theta$) captures epistemic (model) uncertainty; the second term is the expected aleatoric (data) uncertainty.
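Equation (2) can be checked numerically: the entropy of the averaged ensemble prediction, minus the average per-member entropy, gives the mutual information term. A minimal sketch with toy distributions (illustrative values, not from the paper):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Toy predictive distributions from three hypothetical ensemble members
# (i.e. three samples of theta); values are made up for illustration.
member_preds = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.5, 0.5],
])

ensemble_pred = member_preds.mean(axis=0)                # q(Y|X) = E_theta[q(Y|X,theta)]
total = entropy(ensemble_pred)                           # H(q(Y|X)): total uncertainty
aleatoric = np.mean([entropy(p) for p in member_preds])  # E_theta[H(q(Y|X,theta))]
epistemic = total - aleatoric                            # I(Y; theta | X), by Eq. (2)

print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
# → total=0.693 aleatoric=0.448 epistemic=0.245
```

The members disagree, so the epistemic term is positive; if all members predicted the same distribution, it would be zero.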

Attempt with in-context learning

Following the idea of Deep Ensembles, they first try to "produce different models" by prompting with different in-context examples. However, they do not observe a difference in uncertainty between ambiguous and unambiguous questions, so this method did not work.

Input clarification ensembling

  1. Generate $K$ clarifications $C^1,\dots,C^K$ for each input $X$ and concatenate ($X \oplus C^k$).

    1. Denote the distribution of clarifications as $q(C|X)$.
  2. Ensemble: define the ensemble prediction $q(Y|X)$:

$$q(Y|X) = \mathbb{E}_{q(C|X)}\left[q(Y|X\oplus C,\theta)\right]$$
$$\mathcal{H}(q(Y|X)) = \mathcal{I}(Y;C|X) + \mathbb{E}_{q(C|X)}\left[\mathcal{H}(q(Y|X\oplus C))\right] \quad (3)$$
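The two steps above can be sketched as follows. `generate_clarifications` and `predict_dist` are hypothetical stand-ins for the LLM calls (sampling clarifications and returning an answer distribution); they are not the authors' code:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def clarification_ensemble(x, generate_clarifications, predict_dist, k=5):
    """Decompose uncertainty as in Eq. (3) by ensembling over input clarifications.

    generate_clarifications(x, k): samples k clarifications from q(C|X).
    predict_dist(prompt): returns the model's answer distribution for a prompt.
    """
    clarifications = generate_clarifications(x, k)
    member = np.array([predict_dist(x + " " + c) for c in clarifications])
    ensemble = member.mean(axis=0)                            # q(Y|X): Monte Carlo over q(C|X)
    total = entropy(ensemble)                                 # H(q(Y|X))
    aleatoric = float(np.mean([entropy(p) for p in member]))  # E_q(C|X)[H(q(Y|X⊕C))]
    epistemic = total - aleatoric                             # I(Y; C | X): input ambiguity
    return total, aleatoric, epistemic
```

For an ambiguous question, different clarifications lead to different answers, so the $\mathcal{I}(Y;C|X)$ term is large; for an unambiguous question the clarified predictions agree and it is near zero.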

Experiments
