app.backend.server¶

FastAPI entrypoint for the Trustworthy Answer Protocol (TAP) backend.

Wires together the four scoring stages exposed by /api/analyse:

Generation – app.backend.hydra_inference.generate() runs the Hydra PoE ensemble; if the ensemble is unavailable the request falls through to the HF Inference API via call_hf_model().
Security – per-token PoE acceptance / verifier-ensemble entropy / stability radii are returned by app.backend.hydra_inference.generate() and packaged via app.backend.response_payloads.poe_security().
Uncertainty – p_correct from the uncertainty head for MCQ; for NLP we run olmo_tap.constants.KLE_N_SAMPLES extra samples and convert their NLI similarity matrix into a Kernel Language Entropy certainty score.
Robustness – app.backend.hydra_inference.get_robustness() retries the prompt with each adversarial suffix in ADV_SUFFIXES and reports how many flipped the answer.

The two heavyweight models (Hydra + ModernBERT-NLI) are loaded once during the FastAPI lifespan and stashed in module-level dicts so request handlers can grab them without re-loading. On Modal the @modal.enter() hook in app.backend.modal_app preloads Hydra into the same dicts before the ASGI app boots; the lifespan detects this and skips the duplicate load.

Functions

`analyse`(request[, hf])	Score a chat prompt for security, uncertainty, robustness and per-claim confidence.
`call_hf_model`(messages)	Call the HF Inference API as a fallback when Hydra is unavailable or bypassed.
`health`()	Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal's health monitor.
`lifespan`(app)	FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models.

Classes

`ChatRequest`(*, messages)	Request body for `analyse()`.
`Message`(*, role, content)	Single chat-completions message.

class app.backend.server.ChatRequest(*, messages: list[Message])[source]¶

Bases: BaseModel

Request body for analyse().

Parameters:: messages – Multi-turn chat history, oldest message first. The last element must have role == "user" and is the prompt that gets scored for uncertainty / security / robustness.

messages: list[Message]¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class app.backend.server.Message(*, role: str, content: str)[source]¶

Bases: BaseModel

Single chat-completions message.

Parameters:

role – One of "system", "user", "assistant" (matches the OpenAI / HF chat-completions schema).
content – Raw text content for that role.

content: str¶

model_config = {}¶: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

role: str¶

async app.backend.server.analyse(request: ChatRequest, hf: bool = False)[source]¶

Score a chat prompt for security, uncertainty, robustness and per-claim confidence.

Generation runs through the Hydra PoE ensemble unless hf=True is passed (or the ensemble failed to load), in which case the HF Inference API is used as a fallback and security / uncertainty / robustness are returned as unavailable-style payloads.

The claim ledger is independent of the generation backend: it always decomposes raw_response and scores each claim with NLI self-entailment when BERT is available. KLE-based uncertainty for NLP queries is computed here (not inside generate) because it requires KLE_N_SAMPLES extra forward passes.

Parameters:

request – Chat history; the last user turn is the prompt.
hf – Force the HF Inference API path even when Hydra is healthy. Useful for A/B comparisons against the unverified baseline.

Returns:

Dict with keys claims, overall_confidence, uncertainty, security, robustness, raw_response, model, is_mcq. See app.backend.response_payloads for the security/uncertainty/ robustness sub-schemas.

app.backend.server.call_hf_model(messages: list[dict]) → str[source]¶

Call the HF Inference API as a fallback when Hydra is unavailable or bypassed.

Used when hf=true is passed to /api/analyse or when load_hydra failed at lifespan startup. No PoE verification is available in this path; the security payload from the caller reflects that with certified=None.

async app.backend.server.health()[source]¶

Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal’s health monitor.

Returns:: {"status": "ok"}. Does not touch model state, so a 200 here only means the ASGI process is up; readiness for Hydra requests is implicit in successful /api/analyse calls.

app.backend.server.lifespan(app: FastAPI)[source]¶

FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models.

On Modal the @modal.enter() hook in app.backend.modal_app has already populated _models["hydra"] before this runs; the duplicate load would otherwise add ~30s of cold-start. BERT is always loaded here because Modal’s preload only warms Hydra.

Parameters:: app – FastAPI application instance (unused but required by the lifespan protocol).