Scoring
Every signal carries three numeric fields and two categorical fields. The dashboard surfaces each of them; here's what they mean — and exactly how each number is computed.
How a single agent scores
Five specialist agents — SignalScope, SentimentPulse, MacroMind, SectorScan, PolicyRadar — look at every news item independently. Each agent runs the same three-step pipeline:
1. Gather domain-specific evidence. The agent's prompt requires it to call the tools relevant to its specialty before scoring:
| Agent | Tools it must call before scoring |
|---|---|
| SignalScope | geopolitical / sanctions / macro-event lookups, RSS, web search |
| SentimentPulse | Reddit sentiment, Stocktwits long/short ratio, YouTube sentiment |
| MacroMind | central-bank rates, CPI, FX, oil / gold |
| SectorScan | same-sector and supply-chain peer lookups |
| PolicyRadar | regulatory filings, central-bank speeches, SEC / ASIC enforcement |
2. Score against an explicit calibration table. The LLM is anchored by this table, written verbatim into every agent's prompt:
| Range | Meaning |
|---|---|
| 0.80 – 1.00 | Direct major impact (ban names your stock, bankruptcy, huge earnings surprise) |
| 0.65 – 0.79 | Significant indirect impact (sector-wide move, key customer/supplier affected) |
| 0.40 – 0.64 | Moderate relevance (industry trend, macro data in line) |
| 0.00 – 0.39 | Weak signal (general commentary, distant connection) |
The prompt also says, verbatim: "DO NOT suppress scores out of caution — if a stock is directly named with clear impact, score it 0.75+." This is a deliberate counter-bias against LLMs' tendency to under-rate.
3. Self-critique (Reflexion pattern). Each agent runs a REVISE node that re-reads its own draft and may adjust the scores in a second pass. The final scores are the post-reflexion ones; the original draft is kept as draft_impact_score for audit.
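Steps 2 and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: `calibration_band`, `AgentScore` and `run_revise` are hypothetical names, and the `revise` callable stands in for the LLM-driven REVISE node, whose real interface is not shown on this page. Only the band boundaries and the `draft_impact_score` field come from the text above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Lower bound of each calibration band, copied from the table above.
CALIBRATION_BANDS = [
    (0.80, "Direct major impact"),
    (0.65, "Significant indirect impact"),
    (0.40, "Moderate relevance"),
    (0.00, "Weak signal"),
]

def calibration_band(score: float) -> str:
    """Return the calibration-table label for a score in [0.0, 1.0]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0.0, 1.0], got {score}")
    for lower_bound, label in CALIBRATION_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0.0")

@dataclass
class AgentScore:
    impact_score: float
    confidence_score: float
    draft_impact_score: Optional[float] = None  # pre-revision value, kept for audit

def run_revise(draft: AgentScore,
               revise: Callable[[AgentScore], AgentScore]) -> AgentScore:
    """Apply the second pass and keep the draft impact for audit."""
    revised = revise(draft)
    return AgentScore(
        impact_score=revised.impact_score,
        confidence_score=revised.confidence_score,
        draft_impact_score=draft.impact_score,
    )

# Hypothetical run: the draft scored 0.55 and the second pass raised it to 0.68.
draft = AgentScore(impact_score=0.55, confidence_score=0.70)
final = run_revise(draft, lambda s: AgentScore(0.68, s.confidence_score))
print(final.impact_score, final.draft_impact_score)  # 0.68 0.55
print(calibration_band(final.impact_score))          # Significant indirect impact
```

Matching on lower bounds means a value such as 0.795, which falls between two printed ranges, is treated as part of the 0.65 band. That is a choice made for this sketch, not something the prompt specifies.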
Important caveats about agent scores
- These are LLM judgments, not formulas. The same news may produce slightly different scores on different runs (typical drift ≈ ±0.03).
- `confidence_score` measures how sure the agent is of its own assessment, not how big the news is. They are independent dimensions.
- When an agent's tool calls fail (e.g. external API down) it still scores, but its `confidence_score` should drop accordingly.
How the signal-level scores are aggregated
After all 5 agents have run, three aggregation rules combine their outputs, one for each number you see in the UI. Which rule applies to which number matters, and it is the most-asked question:
| Field in the UI | Aggregation rule | Why this rule |
|---|---|---|
| Signal `impact_score` | `max(agent.impact_score)` across agents that succeeded | The most-impacted view wins. If even one specialist sees a major event, the signal is treated as major: better one false alert than missing a real one. |
| Signal `confidence_score` | `mean(agent.confidence_score)` across agents that succeeded | Consensus on certainty: if four agents are uncertain, the signal carries that uncertainty even if the fifth is sure. |
| Per-ticker `impact_score` in the `tickers_linked` list | Weighted average across the agents that mentioned this ticker, weighted by each agent's `confidence_score` | For a single stock you want the consensus impact, not the most extreme view; the weighting lets confident agents have more say. |
Source of truth in the code: agents/synthesis_engine.py:308-311 for the signal-level rules and agents/synthesis_engine.py:218-251 for the per-ticker rule.
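The three rules can be worked through on a small example. The sketch below is not the code in `agents/synthesis_engine.py`. `AgentResult`, `aggregate_signal`, `aggregate_ticker` and the ticker `XYZ` are hypothetical names chosen for illustration, and returning `None` when no agent mentions a ticker is an assumption.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Tuple

@dataclass
class AgentResult:
    name: str
    succeeded: bool
    impact_score: float = 0.0
    confidence_score: float = 0.0
    # Per-ticker impact as scored by this agent (hypothetical layout).
    ticker_impacts: Dict[str, float] = field(default_factory=dict)

def aggregate_signal(results: List[AgentResult]) -> Tuple[float, float]:
    """Signal level: max of impact, mean of confidence, successful agents only."""
    ok = [r for r in results if r.succeeded]
    if not ok:
        raise ValueError("no agent succeeded")
    return max(r.impact_score for r in ok), mean(r.confidence_score for r in ok)

def aggregate_ticker(results: List[AgentResult], ticker: str) -> Optional[float]:
    """Per ticker: confidence-weighted average over agents that mention it."""
    mentions = [r for r in results if r.succeeded and ticker in r.ticker_impacts]
    total_weight = sum(r.confidence_score for r in mentions)
    if total_weight == 0:
        return None  # assumption: nothing to average
    weighted = sum(r.ticker_impacts[ticker] * r.confidence_score for r in mentions)
    return weighted / total_weight

results = [
    AgentResult("PolicyRadar", True, 0.85, 0.9, {"XYZ": 0.8}),
    AgentResult("SectorScan", True, 0.50, 0.6, {"XYZ": 0.5}),
    AgentResult("MacroMind", False, 0.95, 0.9, {"XYZ": 0.9}),  # failed, so ignored
]

signal_impact, signal_confidence = aggregate_signal(results)
print(signal_impact, signal_confidence)            # 0.85 0.75
print(round(aggregate_ticker(results, "XYZ"), 2))  # 0.68
```

The per-ticker value is (0.8 × 0.9 + 0.5 × 0.6) / (0.9 + 0.6) = 0.68, which sits between the two agents' views and leans toward the more confident one, while the signal-level impact simply takes the higher of the two.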
Degraded mode. If fewer than 3 of the 5 agents return successfully, the urgency is automatically downgraded one tier (FLASH → ALERT, ALERT → NOTE, NOTE → FYI). This protects against a single confidently-scoring agent overwhelming a degraded run.
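A minimal sketch of the downgrade, assuming the tiers are ordered FLASH, ALERT, NOTE, FYI. The function name is hypothetical, and leaving FYI unchanged is an assumption, since the text above only lists the three downgrades that have a lower tier to move to.

```python
URGENCY_TIERS = ["FLASH", "ALERT", "NOTE", "FYI"]  # loudest first
MIN_SUCCESSFUL_AGENTS = 3  # fewer than this triggers degraded mode

def apply_degraded_mode(urgency: str, successful_agents: int) -> str:
    """Drop the urgency by one tier when too few agents succeeded."""
    if successful_agents >= MIN_SUCCESSFUL_AGENTS:
        return urgency
    index = URGENCY_TIERS.index(urgency)
    # Assumption: FYI is already the lowest tier, so it stays FYI.
    return URGENCY_TIERS[min(index + 1, len(URGENCY_TIERS) - 1)]

print(apply_degraded_mode("FLASH", 2))  # ALERT
print(apply_degraded_mode("FLASH", 3))  # FLASH
```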
Urgency
A bucket label, not a number. It controls where and how loudly the signal is delivered. The four tiers map to shared/utils.apply_urgency_thresholds in the code and render as distinctly coloured badges on every signal card (red / orange / gold / grey).
| Urgency | Threshold | Badge | Delivery |
|---|---|---|---|
| FLASH | impact ≥ 0.80 and event is "flash-tier" (sanctions, bankruptcy, rate surprise, export ban, geopolitical conflict, ...) | red | Telegram + browser push, requireInteraction sticky banner |
| ALERT | impact ≥ 0.65 | orange | Telegram + browser push |
| NOTE | impact ≥ 0.40 | gold | Dashboard only (no push) |
| FYI | impact < 0.40 | grey | Dashboard only, at the bottom of the feed |
The flash event-type list is explicit (export_ban, sanctions, tariffs, rate_surprise, bankruptcy, trading_halt, geopolitical_conflict, fraud, earnings_shock, regulatory_action). A moderate-impact item can still land as FLASH if PolicyRadar or MacroMind tag it with one of these event types.
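The threshold table can be read top-down as a simple classifier. The sketch below follows the table only. `classify_urgency` is a hypothetical name, not the signature of `shared/utils.apply_urgency_thresholds`, and it does not model how a moderate-impact item tagged with a flash event type is promoted, because that rule is not spelled out here.

```python
from typing import Optional

# Event types copied from the explicit flash list above.
FLASH_EVENT_TYPES = frozenset({
    "export_ban", "sanctions", "tariffs", "rate_surprise", "bankruptcy",
    "trading_halt", "geopolitical_conflict", "fraud", "earnings_shock",
    "regulatory_action",
})

def classify_urgency(impact_score: float, event_type: Optional[str] = None) -> str:
    """Map an impact score (and optional event type) to an urgency tier."""
    if impact_score >= 0.80 and event_type in FLASH_EVENT_TYPES:
        return "FLASH"
    if impact_score >= 0.65:
        return "ALERT"
    if impact_score >= 0.40:
        return "NOTE"
    return "FYI"

print(classify_urgency(0.85, "sanctions"))  # FLASH
print(classify_urgency(0.85, "earnings"))   # ALERT
print(classify_urgency(0.50))               # NOTE
```

In this reading a high-impact item without a flash-tier event type falls through to ALERT, which matches the note below that most earnings and analyst news tops out at that tier.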
Why FLASH is rare
Most earnings / analyst / supply-chain news tops out at ALERT tier even when clearly noteworthy. FLASH is reserved for the "wake up now" events; if your feed shows mostly FLASH, the threshold logic is miscalibrated — open an issue.
Impact score
A number in [0.0, 1.0]. Aggregation is max (see the table above). Interpretation for end users:
- 0.80 + — likely to move the stock notably in the next session. Rare, usually policy or M&A.
- 0.65 – 0.80 — clearly relevant; worth reading. Most ALERT-tier signals live here.
- 0.40 – 0.65 — incremental news, interesting context, unlikely to move price alone.
- Below 0.40 — filler. Shown because it mentions your watchlist, but low priority.
The UI shades the impact bar: red for 0.70+, amber for 0.45–0.70, blue for below 0.45.
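Note that the shading cut-offs (0.70 and 0.45) differ from the urgency thresholds (0.80, 0.65, 0.40), so a bar's colour and a card's badge can disagree. A small sketch of the shading rule as stated, with a hypothetical function name:

```python
def impact_bar_colour(impact_score: float) -> str:
    """Return the impact-bar shade for a score in [0.0, 1.0]."""
    if impact_score >= 0.70:
        return "red"
    if impact_score >= 0.45:
        return "amber"
    return "blue"

# An ALERT-tier score of 0.66 still renders with an amber bar.
print(impact_bar_colour(0.66))  # amber
```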
Confidence score
Also [0.0, 1.0]. Aggregation is mean of successful agents' self-reported confidence. Confidence is not about the news being important — it's about how sure each agent is of its own assessment. High-impact low-confidence is legitimate (e.g. rumour-sourced M&A); low-impact high-confidence is also legitimate (routine earnings beat already priced in).
Treat the two scores independently:
- High impact + high confidence → genuine signal, act on it responsibly.
- High impact + low confidence → worth noticing, treat as hypothesis.
- Low impact + high confidence → noise, well-identified noise.
- Low impact + low confidence → ignore.
Per-ticker relevance
Every signal carries a tickers_linked list — one entry per ticker touched by the news, with its own impact score and a link_type:
- `direct`: the news explicitly mentions this ticker or its company by name.
- `sector`: the news is about an adjacent company or market event; this ticker is propagated by the sector / supply-chain agent.
- `chain`: supply-chain link (key customer, supplier, partner) inferred by SectorScan.
- `macro`: broad asset-class moves (all REITs, all banks); reserved for genuinely sector-wide events.
Per-ticker impact uses the weighted average rule, not max, because for a single stock you want the consensus impact, not the most extreme view. The dashboard sorts these descending so the most relevant ticker for each signal is the first bar.
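The descending sort can be reproduced outside the dashboard, for instance when post-processing exported signals. The entry layout below is an assumption: `impact_score` and `link_type` are named in this section, while the `ticker` key and the sample tickers are hypothetical.

```python
# Hypothetical tickers_linked payload for one signal.
tickers_linked = [
    {"ticker": "BBB", "impact_score": 0.42, "link_type": "sector"},
    {"ticker": "AAA", "impact_score": 0.81, "link_type": "direct"},
    {"ticker": "CCC", "impact_score": 0.57, "link_type": "chain"},
]

# Highest per-ticker impact first, matching the order of the dashboard bars.
ranked = sorted(tickers_linked, key=lambda entry: entry["impact_score"], reverse=True)

print([entry["ticker"] for entry in ranked])  # ['AAA', 'CCC', 'BBB']
```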
Signal type
The agents tag each signal with a short string describing what kind of news it is — earnings, guidance_change, m_and_a, policy_change, supply_chain, macro_event, routine_news, sentiment_surge. This is displayed next to the urgency badge as a small muted pill. Useful for filtering and future rule matching.
See also
- Agents — who produces these scores
- Claw chat (RAG) — how the same scores are used for retrieval ranking