[Weave] signals content-DOCS-1994 #2367

Open
anastasiaguspan wants to merge 8 commits into main from signals-docs-1994

Conversation

@anastasiaguspan
Contributor

Description

Adding new Signals content (naming still TBD).
Signals replaces the go-to starter user journey, making the existing 'monitors' content the more advanced, custom path; accordingly, that content moves to a subsequent page, 'Custom monitors'.

Testing

  • [x] Local build succeeds without errors (mint dev)
  • [x] Local link check succeeds without errors (mint broken-links)
  • PR tests succeed

Related issues

@anastasiaguspan anastasiaguspan requested a review from a team as a code owner March 26, 2026 13:53
@github-actions
Contributor

Images automagically compressed by Calibre's image-actions

Compression reduced images by 76.5%, saving 357.9 KB.

| Filename | Before | After | Improvement |
| --- | --- | --- | --- |
| weave/guides/evaluation/img/weave_signals_trace_reasoning.png | 467.9 KB | 109.9 KB | 76.5% |

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

📚 Mintlify Preview Links

🔗 View Full Preview

✨ Added (1 total)

📄 Pages (1)

| File | Preview |
| --- | --- |
| weave/guides/evaluation/custom-monitors.mdx | Custom Monitors |

📝 Changed (2 total)

📄 Pages (1)

| File | Preview |
| --- | --- |
| weave/guides/evaluation/monitors.mdx | Monitors |
⚙️ Other (1)
File
docs.json

🤖 Generated automatically when Mintlify deployment succeeds
📍 Deployment: eb51648 at 2026-04-03 15:45:47 UTC

@github-actions
Contributor

github-actions bot commented Mar 26, 2026

🔗 Link Checker Results

All links are valid!

No broken links were detected.

Checked against: https://wb-21fd5541-signals-docs-1994.mintlify.app

@mintlify
Contributor

mintlify bot commented Mar 26, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| wandb | 🟢 Ready | View Preview | Mar 26, 2026, 1:59 PM |

Contributor

@NiWaRe NiWaRe left a comment


I like the phrasing as "Monitor using built-in signals" - we're still debating on the exact name but this might be a good middle ground.

Three points that we could add:

  • The most natural way of "using" signals is through the dashboard under the "Project" tab (still behind a feature flag, but we can add screenshots) and the tags added in the "Signals" column of the trace table (hovering over a tag shows the reason and confidence score, for example here)
  • We might want to include the specific output pattern that makes Weave render the scorer output as a tag, in case people want to create their own custom signals?
  • Could we add a disclaimer, with potentially a form or email address, to tell people that this is currently in private preview and can be tested by reaching out to us?

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Images automagically compressed by Calibre's image-actions

Compression reduced images by 75.3%, saving 149.5 KB.

| Filename | Before | After | Improvement |
| --- | --- | --- | --- |
| weave/guides/evaluation/img/weave_signals_trace_hover.png | 119.7 KB | 34.6 KB | 71.1% |
| weave/guides/evaluation/img/weave_signals_project_dash.png | 78.7 KB | 14.4 KB | 81.8% |

1 image did not require optimisation.

@github-actions
Contributor

github-actions bot commented Apr 3, 2026

Images automagically compressed by Calibre's image-actions

Compression reduced images by 75.2%, saving 135.4 KB.

| Filename | Before | After | Improvement |
| --- | --- | --- | --- |
| weave/guides/evaluation/img/weave_signals_trace_reasoning.png | 180.0 KB | 44.6 KB | 75.2% |

2 images did not require optimisation.

Contributor

@dbrian57 dbrian57 left a comment


Hey, so I think this doc needs some rework. The opening section is very markety and contains three consecutive and sizable blocks of info before anything actionable is mentioned.

I would opt to:

  1. Rework the intro section to more plainly and concisely state what signals are and their benefits. I would also make sure to explicitly mention that they're a type of monitor.
  2. Give the same treatment to the intro of the "Available signals" section
  3. Move some wording into the intro section (I noted these in my annotations)
  4. Clarify the "How signals work" section. This might just need an intro adjustment, but I'm not sure. I just found it difficult to follow.

I'm happy to discuss any of this, obviously.


### When to use signals and custom monitors

Use [signals](/weave/guides/evaluation/monitors) to get started with production monitoring quickly, then add custom monitors for evaluation criteria specific to your application.
Contributor


I think we need to just briefly explain what signals and custom monitors here are first before we move into when you should use each.

Signals provide a high-level monitoring solution designed to bridge this gap by offering automated, behavioral scoring for agents in production.

Signals utilize a robust backend infrastructure to provide real-time performance insights:
- Automated scoring: Every incoming production trace is automatically processed and scored based on predefined metrics.
Contributor


Should these opening terms be bolded?


Signals provide a high-level monitoring solution designed to bridge this gap by offering automated, behavioral scoring for agents in production.

Signals utilize a robust backend infrastructure to provide real-time performance insights:
Contributor


I'm skeptical about mentioning a backend infrastructure if customers don't have to manage/configure a backend infrastructure.

Also, I'm so sorry, but my old manager is yelling at me in my head about never using the word "utilize" when you could write "use" 😅

Signals utilize a robust backend infrastructure to provide real-time performance insights:
- Automated scoring: Every incoming production trace is automatically processed and scored based on predefined metrics.
- Infrastructure: Processing is powered by Coreweave compute and Coreweave GPUs to ensure scalability across millions of traces.
- Custom metrics: Developers can create specific metrics, such as response length or faithfulness to source material, to help understand exactly how an agent is behaving.
Contributor


I think we might need to rewrite the opening line a little bit, because it makes it sound like this is going to be a list of performance metrics that it can return, but these are really features of the feature.

By using signals within production, you can:
- Gain behavioral insight: Move beyond simple system metrics to understand if your agent is hallucinating, failing to follow conversation patterns, or losing grounding in its evidence.
- Automate alerts: Set up automated triggers that notify your team through tools like Slack when an agent's performance drops below a certain threshold.
- Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which can then be used to kick off the research loop for offline model improvement, data annotation, or reinforcement learning.
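For the "Automate alerts" bullet in the excerpt above, a short sketch of the thresholding decision might make the doc more actionable. Everything here is hypothetical: the score format, the `should_alert` helper, and the threshold are illustrative assumptions, not Weave's actual alerting API.

```python
# Hypothetical sketch: decide when signal scores should trigger a Slack alert.
# The score shape and threshold are illustrative, not Weave's actual schema.

def should_alert(scores: list[float], threshold: float = 0.7, window: int = 10) -> bool:
    """Alert when the rolling mean of recent signal scores drops below a threshold."""
    recent = scores[-window:]
    if not recent:
        return False
    return sum(recent) / len(recent) < threshold

# The last 10 scores average 0.5, below the 0.7 threshold, so this alerts.
print(should_alert([0.9, 0.8] + [0.5] * 10))  # True
```

A real integration would then post to a Slack webhook when `should_alert` returns `True`; the doc could show that step with whatever trigger mechanism Weave actually exposes.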
Contributor


Suggested change
- Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which can then be used to kick off the research loop for offline model improvement, data annotation, or reinforcement learning.
- Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which you can use to research model improvement, data annotation, or reinforcement learning.

"kick off" is kind of an idiom, so I try to avoid it.

7. After running the script using several different statements, open the W&B UI and navigate to the **Traces** tab. Select any **LLMAsAJudgeScorer.score** trace to see the results.

![Monitor trace](/weave/guides/evaluation/img/monitors-4.png)
In modern agent development, standard system metrics like latency, token count, and cost are insufficient for understanding complex agent behavior. While inspecting individual traces provides deep insight, it is impossible to scale manual reviews across the millions of traces generated in a live environment.
Contributor


I think we should plainly state what signals are at the start. Otherwise, this kinda reads like a company blog post. We could amend this quickly by moving the second paragraph to the top and fixing it up a little at the end. We should also make the connection that signals are a type of monitor.

Also, I feel like this paragraph is very opinionated. I think we should consider nixing it entirely in favor of something that plainly states what Signals do and how they benefit the customer.


## Available signals

W&B Weave offers monitors with built-in signals: preset scorers that evaluate production traces for common quality issues and errors out of the box, with no custom setup. Each built-in signal uses a benchmarked LLM prompt to classify traces and saves the results as comma-delimited tags representing the detected issues.
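Since the excerpt says each built-in signal saves its results as comma-delimited tags, a tiny illustrative sketch of that tagging step could make the doc concrete. The dict shape and field names below are assumptions for illustration only, not Weave's actual scorer output schema.

```python
# Hypothetical sketch: turn an LLM judge's classification into the
# comma-delimited tag string the excerpt describes. The "detected_issues"
# field name is an illustrative assumption, not Weave's real schema.

def to_tag_string(classification: dict) -> str:
    """Join the detected issues into a single comma-delimited tag value."""
    issues = classification.get("detected_issues", [])
    return ",".join(sorted(issues))

result = {"detected_issues": ["hallucination", "off_topic"], "confidence": 0.82}
print(to_tag_string(result))  # hallucination,off_topic
```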
Contributor


I feel like this is more introductory stuff that should be at the top of the page.


W&B Weave offers monitors with built-in signals: preset scorers that evaluate production traces for common quality issues and errors out of the box, with no custom setup. Each built-in signal uses a benchmarked LLM prompt to classify traces and saves the results as comma-delimited tags representing the detected issues.

To start classifying traces immediately, enable signals from the Monitors page. Signals don't require prompt engineering or scorer configuration.
Contributor


The workflows below guide people through this, so I don't think we need this line.


To start classifying traces immediately, enable signals from the Monitors page. Signals don't require prompt engineering or scorer configuration.

Signals use a [W&B Inference](/inference/) model to score traces, so no external API keys are required.
Contributor


Probably should either be in the intro area or the How signals work section. I'd opt for the introductory section and also add whether or not this costs customers additional money to use.


Each signal uses an LLM-as-a-judge approach to classify traces:

1. **Trace selection**: Quality signals evaluate successful root-level traces. Error signals evaluate failed traces. Child spans and intermediate calls are not scored.
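For the "Trace selection" step quoted above, the doc could clarify the rule with a small code-form sketch of which traces each signal type sees. The trace fields (`parent_id`, `status`) are assumptions for illustration, not the actual Weave trace schema.

```python
# Hypothetical sketch of the trace-selection rule in step 1:
# quality signals see successful root traces, error signals see failed ones,
# and child spans are never scored. Field names are illustrative assumptions.

def select_traces(traces: list[dict], signal_kind: str) -> list[dict]:
    roots = [t for t in traces if t.get("parent_id") is None]  # skip child spans
    if signal_kind == "quality":
        return [t for t in roots if t.get("status") == "success"]
    if signal_kind == "error":
        return [t for t in roots if t.get("status") == "error"]
    return []

traces = [
    {"id": 1, "parent_id": None, "status": "success"},
    {"id": 2, "parent_id": 1, "status": "success"},  # child span: never scored
    {"id": 3, "parent_id": None, "status": "error"},
]
print([t["id"] for t in select_traces(traces, "quality")])  # [1]
```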
Contributor


I feel like I'm missing context here. Are each of these a step that Weave goes through to assess and apply a signal to a trace?

Also, the "Trace selection" bullet introduces new information: quality signals only evaluate root-level traces. I feel like this should be mentioned earlier in "Available signals" so users don't wonder why child spans aren't scored.
