app.backend.server

FastAPI entrypoint for the Trustworthy Answer Protocol (TAP) backend.

Wires together the four scoring stages exposed by /api/analyse:

  1. Generation – app.backend.hydra_inference.generate() runs the Hydra PoE ensemble; if the ensemble is unavailable the request falls through to the HF Inference API via call_hf_model().

  2. Security – per-token PoE acceptance / verifier-ensemble entropy / stability radii are returned by app.backend.hydra_inference.generate() and packaged via app.backend.response_payloads.poe_security().

  3. Uncertainty – p_correct from the uncertainty head for MCQ; for NLP we run olmo_tap.constants.KLE_N_SAMPLES extra samples and convert their NLI similarity matrix into a Kernel Language Entropy certainty score (see the sketch after this list).

  4. Robustness – app.backend.hydra_inference.get_robustness() retries the prompt with each adversarial suffix in ADV_SUFFIXES and reports how many flipped the answer.
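
A minimal sketch of the KLE conversion in stage 3, assuming the NLI similarities arrive as a symmetric n x n matrix over the extra samples; the unit-trace normalisation and the mapping onto a [0, 1] certainty score are assumptions, not the exact formula used by the backend:

    import numpy as np

    def kle_certainty(sim: np.ndarray) -> float:
        """Von Neumann entropy of an NLI similarity kernel, mapped to certainty.

        `sim` is an (n, n) matrix of pairwise NLI similarities between the
        KLE_N_SAMPLES extra generations.
        """
        k = (sim + sim.T) / 2.0           # enforce symmetry
        k = k / np.trace(k)               # unit-trace kernel: eigenvalues sum to 1
        eig = np.linalg.eigvalsh(k)
        eig = eig[eig > 1e-12]            # drop numerical zeros/negatives before the log
        vne = -np.sum(eig * np.log(eig))  # von Neumann entropy of the kernel
        n = sim.shape[0]
        return float(1.0 - vne / np.log(n))  # 1 = all samples agree, 0 = maximal spread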

The two heavyweight models (Hydra + ModernBERT-NLI) are loaded once during the FastAPI lifespan and stashed in module-level dicts so request handlers can grab them without re-loading. On Modal the @modal.enter() hook in app.backend.modal_app preloads Hydra into the same dicts before the ASGI app boots; the lifespan detects this and skips the duplicate load.
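The load-once pattern reads roughly like the sketch below; the _models["hydra"] key is the one named under lifespan() further down, while the handler-side accessor is an illustrative assumption:

    # Populated once by lifespan() (or by Modal's @modal.enter() hook) and
    # read by every request handler; never reloaded per request.
    _models: dict = {}

    def hydra_or_none():
        """Hypothetical accessor: handlers fall back to call_hf_model() on None."""
        return _models.get("hydra")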

Functions

analyse(request[, hf])

Score a chat prompt for security, uncertainty, robustness and per-claim confidence.

call_hf_model(messages)

Call the HF Inference API as a fallback when Hydra is unavailable or bypassed.

health()

Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal's health monitor.

lifespan(app)

FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models.

Classes

ChatRequest(*, messages)

Request body for analyse().

Message(*, role, content)

Single chat-completions message.

class app.backend.server.ChatRequest(*, messages: list[Message])[source]

Bases: BaseModel

Request body for analyse().

Parameters:

messages – Multi-turn chat history, oldest message first. The last element must have role == "user" and is the prompt that gets scored for uncertainty / security / robustness.

messages: list[Message]
model_config = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

class app.backend.server.Message(*, role: str, content: str)[source]

Bases: BaseModel

Single chat-completions message.

Parameters:
  • role – One of "system", "user", "assistant" (matches the OpenAI / HF chat-completions schema).

  • content – Raw text content for that role.

content: str
model_config = {}

Configuration for the model; should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).

role: str
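
A request body that validates against both models; the conversation content is illustrative:

    from app.backend.server import ChatRequest, Message

    request = ChatRequest(
        messages=[
            Message(role="system", content="You are a careful assistant."),
            Message(role="user", content="Which planet has the most moons?"),  # scored turn
        ]
    )
    # Equivalent JSON body for POST /api/analyse:
    # {"messages": [{"role": "system", "content": "..."},
    #               {"role": "user", "content": "..."}]}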
async app.backend.server.analyse(request: ChatRequest, hf: bool = False)[source]

Score a chat prompt for security, uncertainty, robustness and per-claim confidence.

Generation runs through the Hydra PoE ensemble unless hf=True is passed (or the ensemble failed to load), in which case the HF Inference API is used as a fallback and security / uncertainty / robustness are returned as unavailable-style payloads.

The claim ledger is independent of the generation backend: it always decomposes raw_response and scores each claim with NLI self-entailment when BERT is available. KLE-based uncertainty for NLP queries is computed here (not inside generate) because it requires KLE_N_SAMPLES extra forward passes.

Parameters:
  • request – Chat history; the last user turn is the prompt.

  • hf – Force the HF Inference API path even when Hydra is healthy. Useful for A/B comparisons against the unverified baseline.

Returns:

Dict with keys claims, overall_confidence, uncertainty, security, robustness, raw_response, model, is_mcq. See app.backend.response_payloads for the security / uncertainty / robustness sub-schemas.
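
A client-side sketch of calling the endpoint; the host/port and the use of a query parameter for hf are assumptions based on the hf=true note under call_hf_model():

    import httpx

    payload = {"messages": [{"role": "user", "content": "Which planet has the most moons?"}]}

    # Verified path (Hydra PoE ensemble):
    resp = httpx.post("http://localhost:8000/api/analyse", json=payload, timeout=120.0)
    report = resp.json()
    print(report["overall_confidence"], report["is_mcq"])

    # Unverified HF baseline, e.g. for A/B comparison:
    baseline = httpx.post(
        "http://localhost:8000/api/analyse", params={"hf": "true"}, json=payload
    )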

app.backend.server.call_hf_model(messages: list[dict]) → str[source]

Call the HF Inference API as a fallback when Hydra is unavailable or bypassed.

Used when hf=true is passed to /api/analyse or when load_hydra failed at lifespan startup. No PoE verification is available in this path; the security payload from the caller reflects that with certified=None.
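
A sketch of what this fallback might look like using huggingface_hub's InferenceClient; the model id and token environment variable are assumptions, not the backend's actual configuration:

    import os
    from huggingface_hub import InferenceClient

    def call_hf_model(messages: list[dict]) -> str:
        """Unverified fallback: plain chat completion, no PoE acceptance data."""
        client = InferenceClient(token=os.environ["HF_TOKEN"])  # token source assumed
        out = client.chat_completion(
            messages=messages,
            model="allenai/OLMo-2-1124-7B-Instruct",  # illustrative model id
            max_tokens=512,
        )
        return out.choices[0].message.content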

async app.backend.server.health()[source]

Lightweight liveness probe used by Cloudflare Pages, uptime checks and Modal’s health monitor.

Returns:

{"status": "ok"}. Does not touch model state, so a 200 here only means the ASGI process is up; readiness for Hydra requests is implicit in successful /api/analyse calls.

app.backend.server.lifespan(app: FastAPI)[source]

FastAPI lifespan that loads (or re-uses preloaded) Hydra and BERT models.

On Modal the @modal.enter() hook in app.backend.modal_app has already populated _models["hydra"] before this runs; the duplicate load would otherwise add ~30s of cold-start. BERT is always loaded here because Modal’s preload only warms Hydra.

Parameters:

app – FastAPI application instance (unused but required by the lifespan protocol).
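
A condensed sketch of the startup logic described above; only load_hydra is named elsewhere in these docs, so load_bert and the error handling are illustrative assumptions:

    from contextlib import asynccontextmanager
    from fastapi import FastAPI

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # On Modal, @modal.enter() has already filled this slot; skip the ~30s reload.
        if "hydra" not in _models:
            try:
                _models["hydra"] = load_hydra()
            except Exception:
                _models["hydra"] = None  # analyse() then falls back to call_hf_model()
        _models["bert"] = load_bert()  # Modal preload only warms Hydra, so always load here
        yield
        _models.clear()  # release model references on shutdown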