app.backend.claim_confidence

Per-claim confidence via NLI self-entailment.

For every decomposed claim we compute P(response entails claim) using the ModernBERT-large-nli scorer already loaded for Kernel Language Entropy. The response is used as the premise and each claim as the hypothesis, so a high score means the model’s own output supports the claim.

The score is P(entailment) + 0.5 * P(neutral), mirroring the KLE similarity weighting. Because the three NLI probabilities sum to 1, this one-direction score is bounded in [0, 1] (unlike the bidirectional KLE W matrix in [0, 2]). We only use the response -> claim direction: the reverse direction is uninformative here because a short claim generally cannot entail the full response, so every score would collapse toward neutral/contradiction.
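The weighting above can be sketched as a small pure function. This is an illustrative sketch, not the module's code; it assumes the scorer emits three logits in the order [contradiction, neutral, entailment] (verify against the model's label config before reusing).

```python
import math

def nli_confidence(logits: list[float]) -> float:
    """One-direction confidence: P(entailment) + 0.5 * P(neutral).

    Assumes logit order [contradiction, neutral, entailment]; check the
    NLI model's id2label mapping, as label order varies between models.
    """
    # Numerically stable softmax so the three probabilities sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    _p_contra, p_neutral, p_entail = (e / total for e in exps)
    # Because the probabilities sum to 1, this score is bounded in [0, 1].
    return p_entail + 0.5 * p_neutral
```

With uniform logits each class gets probability 1/3, giving a score of 1/3 + 1/6 = 0.5, i.e. maximal uncertainty sits at the midpoint of the range.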

This is the degenerate single-sample case of SelfCheckGPT-NLI (Manakul et al., EMNLP 2023): rather than sampling K responses and averaging entailment, we treat the produced response as the sole reference context.

Functions

compute_claim_confidences(response, claims, ...)

Score every claim against response with NLI.

score_to_metrics(score)

Map a [0, 1] confidence score into the API's {confidence, level, guidance} dict.

app.backend.claim_confidence.compute_claim_confidences(response: str, claims: list[str], bert_model: Any, bert_tokenizer: Any) → list[dict]

Score every claim against response with NLI.

Returns one {confidence, level, guidance} dict per claim, in the same order as claims. Raises if the NLI forward pass fails; the caller is expected to treat the claim ledger as unavailable on exception.
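The scoring loop has roughly this shape. A minimal sketch with the NLI forward pass abstracted behind a hypothetical `nli_score(premise, hypothesis)` callable (not part of the module's API) that returns `(p_contradiction, p_neutral, p_entailment)`:

```python
from typing import Callable

def compute_claim_scores(
    response: str,
    claims: list[str],
    nli_score: Callable[[str, str], tuple[float, float, float]],
) -> list[float]:
    """Sketch of the per-claim loop; `nli_score` is a hypothetical
    stand-in for the ModernBERT NLI forward pass."""
    scores = []
    for claim in claims:
        # Response -> claim direction only: the response is the premise,
        # the claim is the hypothesis.
        _p_c, p_n, p_e = nli_score(response, claim)
        scores.append(p_e + 0.5 * p_n)
    # One score per claim, in the same order as `claims`.
    return scores
```

In the real function each score is then mapped through score_to_metrics into the API's {confidence, level, guidance} dict, and any exception from the forward pass propagates to the caller.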

app.backend.claim_confidence.score_to_metrics(score: float) → dict

Map a [0, 1] confidence score into the API’s {confidence, level, guidance} dict.