app.backend.claim_confidence¶
Per-claim confidence via NLI self-entailment.
For every decomposed claim we compute P(response entails claim) using the ModernBERT-large-nli scorer already loaded for Kernel Language Entropy. The response is used as the premise and each claim as the hypothesis, so a high score means the model’s own output supports the claim.
The score is P(entailment) + 0.5 * P(neutral), mirroring the KLE similarity weighting. Because the three NLI probabilities sum to 1, this one-direction score is bounded in [0, 1] (unlike the bidirectional KLE W matrix in [0, 2]). We only use the response -> claim direction: the reverse direction is uninformative here because a short claim generally cannot entail the full response, so every score would collapse toward neutral/contradiction.
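The weighting can be sketched in pure Python. This is a minimal illustration of the formula only; the label names below are assumptions (the deployed scorer would read them from the model's id2label config), and the real module runs the full ModernBERT forward pass rather than taking logits directly:

```python
import math

def entailment_confidence(logits: dict[str, float]) -> float:
    """P(entailment) + 0.5 * P(neutral) from raw NLI logits.

    Label names here are illustrative; the real label order comes
    from the NLI model's id2label mapping.
    """
    exp = {label: math.exp(z) for label, z in logits.items()}
    total = sum(exp.values())
    probs = {label: v / total for label, v in exp.items()}
    # Bounded in [0, 1] because the three probabilities sum to 1.
    return probs["entailment"] + 0.5 * probs["neutral"]
```

For example, equal logits give each class probability 1/3, so the score is 1/3 + 0.5 * 1/3 = 0.5, the midpoint of the range.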
This is the degenerate single-sample case of SelfCheckGPT-NLI (Manakul et al., EMNLP 2023): rather than sampling K responses and averaging entailment, we treat the produced response as the sole reference context.
Functions
- Score every claim against response with NLI.
- Map a [0, 1] confidence score into the API's {confidence, level, guidance} dict.
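A hedged sketch of the score-to-dict mapping described above. The thresholds (0.75 / 0.4) and guidance strings are assumptions for illustration; the actual cutoffs and wording live in the module:

```python
def to_confidence_dict(score: float) -> dict:
    """Map a [0, 1] NLI score into {confidence, level, guidance}.

    Thresholds and guidance text are illustrative only, not the
    module's real values.
    """
    if score >= 0.75:
        level, guidance = "high", "Well supported by the response."
    elif score >= 0.4:
        level, guidance = "medium", "Partially supported; verify before relying on it."
    else:
        level, guidance = "low", "Weakly supported; treat as unverified."
    return {"confidence": round(score, 3), "level": level, "guidance": guidance}
```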
- app.backend.claim_confidence.compute_claim_confidences(response: str, claims: list[str], bert_model: Any, bert_tokenizer: Any) → list[dict][source]¶
Score every claim against response with NLI.
Returns one {confidence, level, guidance} dict per claim, in the same order as claims. Raises if the NLI forward pass fails; the caller is expected to treat the claim ledger as unavailable on exception.
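The overall flow can be sketched by injecting the NLI forward pass as a callable standing in for the (bert_model, bert_tokenizer) pair. The function name, thresholds, and guidance strings here are illustrative assumptions, not the module's implementation:

```python
from typing import Callable

def score_claims(
    response: str,
    claims: list[str],
    nli_score: Callable[[str, str], float],  # (premise, hypothesis) -> score in [0, 1]
) -> list[dict]:
    """One {confidence, level, guidance} dict per claim, in input order.

    `nli_score` stands in for the ModernBERT forward pass; any exception
    it raises propagates, matching the documented contract.
    """
    results = []
    for claim in claims:
        # Response -> claim direction only: response as premise, claim as hypothesis.
        score = nli_score(response, claim)
        level = "high" if score >= 0.75 else "medium" if score >= 0.4 else "low"
        results.append({"confidence": score, "level": level, "guidance": f"{level} support"})
    return results
```

A stub scorer makes the contract easy to exercise: a callable returning a fixed score per claim yields one dict per claim in the same order, and an exception in the callable surfaces to the caller unhandled.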