app.backend.bert_inference
ModernBERT-large-NLI loader shared by every NLI-based scorer in the backend.
The same model is reused for three distinct jobs at request time:

- MCQ classification (app.backend.question_classifier.detect_mcq_bert())
- Per-claim self-entailment (app.backend.claim_confidence.compute_claim_confidences())
- Kernel Language Entropy + robustness similarity matrices (kernel_entropy.nli.ModernBERTScorer)
Loading is wrapped in try/except so that a missing HF_CACHE_DIR or an HF
network blip downgrades the result to (None, None) rather than crashing the
FastAPI lifespan.
Functions

| load_bert(device) | Load ModernBERT-large-NLI from the Modal-mounted HF cache. |
- app.backend.bert_inference.load_bert(device: str = 'cuda')
Load ModernBERT-large-NLI from the Modal-mounted HF cache.
The Modal volume is populated once by
app.backend.modal_app.download_weights(); locally, HF_CACHE_DIR may point at the user's HF cache.

- Parameters:
  device – Torch device for the loaded model.
- Returns:
  (model, tokenizer) on success, (None, None) if the cache is missing or the download fails. The FastAPI lifespan logs and continues in degraded mode when this returns None.
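A caller honouring this contract might look like the following sketch. The lifespan wiring shown here is hypothetical (the stub `load_bert` simply returns the degraded result, as when the HF cache is absent); only the (None, None) check reflects the documented behaviour.

```python
import logging

def load_bert(device: str = "cuda"):
    # Stand-in for app.backend.bert_inference.load_bert taking the
    # degraded path, e.g. because HF_CACHE_DIR is missing.
    return (None, None)

# What a lifespan-style startup hook might do with the result:
model, tokenizer = load_bert("cpu")
if model is None:
    # Log and continue: NLI-based scorers are disabled, the app stays up.
    logging.warning("ModernBERT unavailable; NLI scorers run in degraded mode")
```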