app.backend.bert_inference

ModernBERT-large-NLI loader shared by every NLI-based scorer in the backend.

The same model is reused for three distinct scoring jobs at request time.

Loading is wrapped in try/except so that a missing HF_CACHE_DIR or a transient Hugging Face network failure degrades to (None, None) rather than crashing the FastAPI lifespan.
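The degradation pattern can be sketched as follows. This is an illustrative wrapper, not the module's actual implementation; load_model_or_degrade, failing_loader, and working_loader are hypothetical names:

```python
import os


def load_model_or_degrade(loader, cache_dir_env="HF_CACHE_DIR"):
    # Hypothetical wrapper: any failure while loading (missing cache
    # directory, network error) degrades to (None, None) instead of
    # propagating out of the caller's startup path.
    try:
        cache_dir = os.environ.get(cache_dir_env)
        return loader(cache_dir)
    except Exception:
        return (None, None)


def failing_loader(cache_dir):
    # Simulates a missing cache or a network blip during download.
    raise OSError("HF cache missing")


def working_loader(cache_dir):
    # Stand-in for the real (model, tokenizer) construction.
    return ("model", "tokenizer")


print(load_model_or_degrade(failing_loader))  # degrades to (None, None)
print(load_model_or_degrade(working_loader))
```

Callers only need to check for None; they never have to catch loading exceptions themselves.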

Functions

load_bert([device])

Load ModernBERT-large-NLI from the Modal-mounted HF cache.

app.backend.bert_inference.load_bert(device: str = 'cuda')

Load ModernBERT-large-NLI from the Modal-mounted HF cache.

The Modal volume is populated once by app.backend.modal_app.download_weights(); when running locally, HF_CACHE_DIR may instead point at the user's Hugging Face cache.

Parameters:

device – Torch device for the loaded model.

Returns:

(model, tokenizer) on success, (None, None) if the cache is missing or the download fails. The FastAPI lifespan logs and continues in degraded mode when this returns None.
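The caller-side contract implied by this return value can be sketched as below. This is a hedged example of how a lifespan hook might consume load_bert; init_nli and the stub loaders are hypothetical, and the real lifespan code may differ:

```python
import logging

logger = logging.getLogger("app")


def init_nli(load_fn, device="cuda"):
    # Hypothetical startup helper: tolerate (None, None) from the
    # loader and continue in degraded mode rather than failing startup.
    model, tokenizer = load_fn(device)
    if model is None or tokenizer is None:
        logger.warning("NLI model unavailable; continuing in degraded mode")
        return {"nli_ready": False, "model": None, "tokenizer": None}
    return {"nli_ready": True, "model": model, "tokenizer": tokenizer}


# Stub loaders standing in for load_bert:
state_ok = init_nli(lambda device: ("model", "tokenizer"))
state_degraded = init_nli(lambda device: (None, None))
```

Scorers can then gate on the ready flag and skip NLI-based checks when the model never loaded.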