Conversation
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 76.5%, saving 357.9 KB.
📚 Mintlify Preview Links ✨
- Added (1 total): 📄 Pages (1)
- Changed (2 total): 📄 Pages (1), ⚙️ Other (1)

🤖 Generated automatically when Mintlify deployment succeeds.

🔗 Link Checker Results: ✅ All links are valid! No broken links were detected. Checked against: https://wb-21fd5541-signals-docs-1994.mintlify.app
Preview deployment for your docs. Learn more about Mintlify Previews.
NiWaRe left a comment:

I like the phrasing "Monitor using built-in signals". We're still debating the exact name, but this might be a good middle ground.

Three points that we could add:
- The most natural way of "using" signals is through the dashboard under the "Project" tab (still behind a feature flag, but we can add screenshots) and the tags added in the "Signals" column of the trace table (if you hover over a tag, it shows you the reason and confidence score, for example here).
- We might want to include the specific output pattern that makes Weave render the scorer output as a tag, in case people want to create their own custom signals.
- Could we add a disclaimer, with potentially a form or email address, telling people that this is currently in private preview and that it can be tested by reaching out to us?
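On the second point above: the exact output pattern Weave expects is not spelled out in this thread, so the snippet below is purely illustrative. It assumes a custom scorer reports detected issues with a reason and confidence score, and that tag rendering keys off a comma-delimited string of issue names (the draft doc elsewhere describes built-in signal results being saved as comma-delimited tags). The function and data shape here are invented for illustration.

```python
# Illustrative only: the real output pattern that makes Weave render a
# scorer result as a tag is not documented in this thread. This sketch
# assumes each issue maps to a (reason, confidence) pair and that tags
# are rendered from a comma-delimited string of issue names.

def to_signal_tags(issues: dict) -> str:
    """Format {issue_name: (reason, confidence)} as comma-delimited tags."""
    return ",".join(sorted(issues))

issues = {
    "hallucination": ("Cited a source not present in the context", 0.92),
    "off_topic": ("Reply ignored the user's question", 0.71),
}
print(to_signal_tags(issues))  # -> hallucination,off_topic
```

If the docs do publish the real pattern, it would replace this guess; the point is only that a single well-defined string format is enough for the UI to split into hoverable tags.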
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 75.3%, saving 149.5 KB. 1 image did not require optimisation.
7a90513 to 7618dd1 (force push)
Images automagically compressed by Calibre's image-actions ✨ Compression reduced images by 75.2%, saving 135.4 KB. 2 images did not require optimisation.
dbrian57 left a comment:

Hey, so I think this doc needs some rework. The opening section is very markety and contains three consecutive, sizable blocks of info before anything actionable is mentioned.

I would opt to:
- Rework the intro section to state more plainly and concisely what signals are and their benefits. I would also make sure to explicitly mention that they're a type of monitor.
- Give the same treatment to the intro of the "Available signals" section.
- Move some wording into the intro section (I noted these in my annotations).
- Clarify the "How signals work" section. This might just need an intro adjustment, but I'm not sure; I just found it difficult to follow.

I'm happy to discuss any of this, obviously.
> ### When to use signals and custom monitors
>
> Use [signals](/weave/guides/evaluation/monitors) to get started with production monitoring quickly, then add custom monitors for evaluation criteria specific to your application.

I think we need to briefly explain what signals and custom monitors are here first, before we move into when you should use each.
> Signals provide a high-level monitoring solution designed to bridge this gap by offering automated, behavioral scoring for agents in production.
>
> Signals utilize a robust backend infrastructure to provide real-time performance insights:
> - Automated scoring: Every incoming production trace is automatically processed and scored based on predefined metrics.

Should these opening terms be bolded?
> Signals provide a high-level monitoring solution designed to bridge this gap by offering automated, behavioral scoring for agents in production.
>
> Signals utilize a robust backend infrastructure to provide real-time performance insights:

I'm skeptical about mentioning a backend infrastructure if customers don't have to manage or configure a backend infrastructure.

Also, I'm so sorry, but my old manager is yelling at me in my head about never using the word "utilize" when you could write "use" 😅
> Signals utilize a robust backend infrastructure to provide real-time performance insights:
> - Automated scoring: Every incoming production trace is automatically processed and scored based on predefined metrics.
> - Infrastructure: Processing is powered by Coreweave compute and Coreweave GPUs to ensure scalability across millions of traces.
> - Custom metrics: Developers can create specific metrics, such as response length or faithfulness to source material, to help understand exactly how an agent is behaving.

I think we might need to rewrite the opening line a little bit, because it makes it sound like this is going to be a list of performance metrics that signals can return, but these are really features of the feature.
> By using signals within production, you can:
> - Gain behavioral insight: Move beyond simple system metrics to understand if your agent is hallucinating, failing to follow conversation patterns, or losing grounding in its evidence.
> - Automate alerts: Set up automated triggers that notify your team through tools like Slack when an agent's performance drops below a certain threshold.
> - Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which can then be used to kick off the research loop for offline model improvement, data annotation, or reinforcement learning.

Suggested change:

> - Accelerate the research loop: Use the scores and failure analyses generated by signals to identify specific weaknesses, which you can use to research model improvement, data annotation, or reinforcement learning.

"kick off" is kind of an idiom, so I try to avoid it.
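The "Automate alerts" bullet above could also be made concrete in the doc. As a hedged sketch (this is not the Weave alerting API; the threshold, window size, and score shape are hypothetical, and the actual Slack notification is left out), a trigger might watch a rolling average of a signal's score per trace:

```python
# Hypothetical sketch of the alert idea quoted above; NOT the Weave API.
# Assumes each trace gets a 0-1 signal score and an alert should fire
# when the rolling average over the last `window` traces drops below
# `threshold`. The actual notification (e.g. Slack webhook) is omitted.
from collections import deque

def make_alert_check(threshold: float, window: int):
    scores = deque(maxlen=window)  # keeps only the most recent scores
    def check(score: float) -> bool:
        scores.append(score)
        if len(scores) < window:
            return False  # not enough data to judge yet
        return sum(scores) / len(scores) < threshold
    return check

check = make_alert_check(threshold=0.8, window=3)
print([check(s) for s in [0.9, 0.85, 0.82, 0.5]])  # -> [False, False, False, True]
```

The rolling window is a design choice to avoid alerting on a single bad trace; a real implementation would tune the window and threshold per signal.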
> 7. After running the script using several different statements, open the W&B UI and navigate to the **Traces** tab. Select any **LLMAsAJudgeScorer.score** trace to see the results.
>
> (screenshot)
>
> In modern agent development, standard system metrics like latency, token count, and cost are insufficient for understanding complex agent behavior. While inspecting individual traces provides deep insight, it is impossible to scale manual reviews across the millions of traces generated in a live environment.

I think we should plainly state what signals are at the start. Otherwise, this kinda reads like a company blog post. We could amend this quickly by moving the second paragraph to the top and fixing it up a little at the end. We should also make the connection that signals are a type of monitor.

Also, I feel like this paragraph is very opinionated. I think we should consider nixing it entirely in favor of something that plainly states what signals do and how they benefit the customer.
> ## Available signals
>
> W&B Weave offers monitors with built-in signals: preset scorers that evaluate production traces for common quality issues and errors out of the box, with no custom setup. Each built-in signal uses a benchmarked LLM prompt to classify traces and saves the results as comma-delimited tags representing the detected issues.

I feel like this is more introductory stuff that should be at the top of the page.
> W&B Weave offers monitors with built-in signals: preset scorers that evaluate production traces for common quality issues and errors out of the box, with no custom setup. Each built-in signal uses a benchmarked LLM prompt to classify traces and saves the results as comma-delimited tags representing the detected issues.
>
> To start classifying traces immediately, enable signals from the Monitors page. Signals don't require prompt engineering or scorer configuration.

The workflows below guide people through this, so I don't think we need this line.
> To start classifying traces immediately, enable signals from the Monitors page. Signals don't require prompt engineering or scorer configuration.
>
> Signals use a [W&B Inference](/inference/) model to score traces, so no external API keys are required.

This should probably be either in the intro area or the "How signals work" section. I'd opt for the introductory section, and I'd also add whether or not this costs customers additional money to use.
> Each signal uses an LLM-as-a-judge approach to classify traces:
>
> 1. **Trace selection**: Quality signals evaluate successful root-level traces. Error signals evaluate failed traces. Child spans and intermediate calls are not scored.

I feel like I'm missing context here. Is each of these a step that Weave goes through to assess and apply a signal to a trace?

Also, the "Trace selection" bullet introduces new information: quality signals only evaluate root-level traces. I feel like this should be mentioned earlier in "Available signals" so users don't wonder why child spans aren't scored.
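For what it's worth, the selection rule in the quoted step is small enough to sketch directly; the doc could show something like this. This mirrors only what the quoted text states (not Weave internals), and the `Trace` shape is invented for illustration:

```python
# Sketch of the trace-selection rule quoted above; NOT Weave internals.
# Per the quoted step: quality signals evaluate successful root-level
# traces, error signals evaluate failed traces, and child spans /
# intermediate calls are never scored. The Trace type is hypothetical.
from dataclasses import dataclass

@dataclass
class Trace:
    id: str
    is_root: bool
    failed: bool

def eligible(trace: Trace, signal_kind: str) -> bool:
    if not trace.is_root:
        return False             # child spans are never scored
    if signal_kind == "quality":
        return not trace.failed  # successful root traces only
    if signal_kind == "error":
        return trace.failed      # failed traces only
    raise ValueError(f"unknown signal kind: {signal_kind}")

traces = [Trace("t1", True, False), Trace("t2", True, True), Trace("t3", False, False)]
print([t.id for t in traces if eligible(t, "quality")])  # -> ['t1']
print([t.id for t in traces if eligible(t, "error")])    # -> ['t2']
```

Spelling the rule out like this would also answer the question above about whether the numbered items are sequential processing steps.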
Description

Adding new Signals content (naming still TBD). Signals is replacing the go-to starter user journey, making existing "monitors" the more advanced, custom path; as such, it is moved to a subsequent page, "Custom monitors".

Testing
- `mint dev`
- `mint broken-links`

Related issues