app.backend.server¶
FastAPI entrypoint for the Trustworthy Answer Protocol (TAP) backend.
Wires together the four scoring stages exposed by /api/analyse:
Generation –
app.backend.hydra_inference.generate()runs the Hydra PoE ensemble; if the ensemble is unavailable the request falls through to the HF Inference API viacall_hf_model().Security – per-token PoE acceptance / verifier-ensemble entropy / stability radii are returned by
app.backend.hydra_inference.generate()and packaged viaapp.backend.response_payloads.poe_security().Uncertainty –
p_correctfrom the uncertainty head for MCQ; for NLP we runolmo_tap.constants.KLE_N_SAMPLESextra samples and convert their NLI similarity matrix into a Kernel Language Entropy certainty score.Robustness –
app.backend.hydra_inference.get_robustness()retries the prompt with each adversarial suffix inADV_SUFFIXESand reports how many flipped the answer.
The two heavyweight models (Hydra + ModernBERT-NLI) are loaded once during
the FastAPI lifespan and stashed in module-level dicts so request handlers
can grab them without re-loading. On Modal the @modal.enter() hook in
app.backend.modal_app preloads Hydra into the same dicts before the
ASGI app boots; the lifespan detects this and skips the duplicate load.
Functions
|
Score a chat prompt for security, uncertainty, robustness and per-claim confidence. |
|
Call the HF Inference API as a fallback when Hydra is unavailable or bypassed. |
|
Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal's health monitor. |
|
FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models. |
Classes
|
Request body for |
|
Single chat-completions message. |
- class app.backend.server.ChatRequest(*, messages: list[Message])[source]¶
Bases:
BaseModelRequest body for
analyse().- Parameters:
messages – Multi-turn chat history, oldest message first. The last element must have
role == "user"and is the prompt that gets scored for uncertainty / security / robustness.
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class app.backend.server.Message(*, role: str, content: str)[source]¶
Bases:
BaseModelSingle chat-completions message.
- Parameters:
role – One of
"system","user","assistant"(matches the OpenAI / HF chat-completions schema).content – Raw text content for that role.
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- async app.backend.server.analyse(request: ChatRequest, hf: bool = False)[source]¶
Score a chat prompt for security, uncertainty, robustness and per-claim confidence.
Generation runs through the Hydra PoE ensemble unless
hf=Trueis passed (or the ensemble failed to load), in which case the HF Inference API is used as a fallback and security / uncertainty / robustness are returned asunavailable-style payloads.The claim ledger is independent of the generation backend: it always decomposes
raw_responseand scores each claim with NLI self-entailment when BERT is available. KLE-based uncertainty for NLP queries is computed here (not insidegenerate) because it requiresKLE_N_SAMPLESextra forward passes.- Parameters:
request – Chat history; the last user turn is the prompt.
hf – Force the HF Inference API path even when Hydra is healthy. Useful for A/B comparisons against the unverified baseline.
- Returns:
Dict with keys
claims,overall_confidence,uncertainty,security,robustness,raw_response,model,is_mcq. Seeapp.backend.response_payloadsfor the security/uncertainty/ robustness sub-schemas.
- app.backend.server.call_hf_model(messages: list[dict]) str[source]¶
Call the HF Inference API as a fallback when Hydra is unavailable or bypassed.
Used when
hf=trueis passed to/api/analyseor whenload_hydrafailed at lifespan startup. No PoE verification is available in this path; the security payload from the caller reflects that withcertified=None.
- async app.backend.server.health()[source]¶
Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal’s health monitor.
- Returns:
{"status": "ok"}. Does not touch model state, so a 200 here only means the ASGI process is up; readiness for Hydra requests is implicit in successful/api/analysecalls.
- app.backend.server.lifespan(app: FastAPI)[source]¶
FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models.
On Modal the
@modal.enter()hook inapp.backend.modal_apphas already populated_models["hydra"]before this runs; the duplicate load would otherwise add ~30s of cold-start. BERT is always loaded here because Modal’s preload only warms Hydra.- Parameters:
app – FastAPI application instance (unused but required by the lifespan protocol).