app.backend.bert_inference

ModernBERT-large-NLI loader shared by every NLI-based scorer in the backend.

The same model is reused for three distinct scoring jobs at request time.

Loading is wrapped in try/except so that a missing HF_CACHE_DIR or a transient Hugging Face network failure degrades to (None, None) rather than crashing the FastAPI lifespan.
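The degradation pattern can be sketched as follows. This is an illustrative wrapper, not the module's actual implementation; load_model_or_degrade, failing_loader, and working_loader are hypothetical names:

```python
import os


def load_model_or_degrade(loader, cache_dir_env="HF_CACHE_DIR"):
    # Hypothetical wrapper: any failure while loading (missing cache
    # directory, network error) degrades to (None, None) instead of
    # propagating out of the caller's startup path.
    try:
        cache_dir = os.environ.get(cache_dir_env)
        return loader(cache_dir)
    except Exception:
        return (None, None)


def failing_loader(cache_dir):
    # Simulates a missing cache or a network blip during download.
    raise OSError("HF cache missing")


def working_loader(cache_dir):
    # Stand-in for the real (model, tokenizer) construction.
    return ("model", "tokenizer")


print(load_model_or_degrade(failing_loader))  # degrades to (None, None)
print(load_model_or_degrade(working_loader))
```

Callers only need to check for None; they never have to catch loading exceptions themselves.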

Functions

load_bert([device])

Load ModernBERT-large-NLI from the Modal-mounted HF cache.

app.backend.bert_inference.load_bert(device: str = 'cuda')

Load ModernBERT-large-NLI from the Modal-mounted HF cache.

The Modal volume is populated once by app.backend.modal_app.download_weights(); when running locally, HF_CACHE_DIR may instead point at the user's Hugging Face cache.

Parameters:

device – Torch device for the loaded model.

Returns:

(model, tokenizer) on success, (None, None) if the cache is missing or the download fails. The FastAPI lifespan logs and continues in degraded mode when this returns None.
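The caller-side contract implied by this return value can be sketched as below. This is a hedged example of how a lifespan hook might consume load_bert; init_nli and the stub loaders are hypothetical, and the real lifespan code may differ:

```python
import logging

logger = logging.getLogger("app")


def init_nli(load_fn, device="cuda"):
    # Hypothetical startup helper: tolerate (None, None) from the
    # loader and continue in degraded mode rather than failing startup.
    model, tokenizer = load_fn(device)
    if model is None or tokenizer is None:
        logger.warning("NLI model unavailable; continuing in degraded mode")
        return {"nli_ready": False, "model": None, "tokenizer": None}
    return {"nli_ready": True, "model": model, "tokenizer": tokenizer}


# Stub loaders standing in for load_bert:
state_ok = init_nli(lambda device: ("model", "tokenizer"))
state_degraded = init_nli(lambda device: (None, None))
```

Scorers can then gate on the ready flag and skip NLI-based checks when the model never loaded.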