-- ZERO TOLERANCE FOR FALSE POSITIVES: If the text is standard, polite, or merely descriptive (e.g., "We use cookies", "Learn More", "Accept"), it MUST be labeled 'safe'.
+- ZERO TOLERANCE FOR FALSE POSITIVES: If the text is standard, polite, or merely descriptive, it MUST be labeled 'safe'.
 - CONTEXT MATTERS: "No thanks" is safe. "No, I prefer to pay more" is emotional_steering.
-- DEFAULT TO SAFE: If you are less than 95% certain a pattern exists, return 'safe'.
 
-STEP-BY-STEP REASONING:
-1. Analyze the literal meaning of the text.
-2. Evaluate the psychological intent (Is it steering, shaming, or confusing?).
-3. Compare against the legal frameworks above.
-
 OUTPUT FORMAT:
-You must return a raw JSON object with this exact structure:
+Return ONLY a JSON object. You MUST provide the "reasoning" key BEFORE the "category" key to ensure logical chain-of-thought analysis.
 {{
-"reasoning": "A 1-sentence legal justification for your decision.",
+"reasoning": "CHAIN OF THOUGHT: Step-by-step, logically explain why this text violates user intent OR why it is perfectly safe.",
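Reviewer note: the new OUTPUT FORMAT wording requires "reasoning" to appear before "category", but nothing in this hunk enforces that on the consumer side. A minimal sketch of such a check follows, assuming the model reply arrives as a plain JSON string. The helper name `parse_verdict` and the sample reply are hypothetical and not part of this commit.

```python
import json


def parse_verdict(raw: str) -> dict:
    """Parse a model reply and check that 'reasoning' precedes 'category'.

    Hypothetical helper: it relies on json.loads preserving key order,
    which holds for the built-in dict on Python 3.7+.
    """
    obj = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    for key in ("reasoning", "category"):
        if key not in obj:
            raise ValueError(f"missing key: {key}")
    keys = list(obj)
    if keys.index("reasoning") > keys.index("category"):
        raise ValueError("'reasoning' must come before 'category'")
    return obj


# Illustrative reply only; real replies come from the model.
reply = '{"reasoning": "Plain consent wording with no pressure.", "category": "safe"}'
verdict = parse_verdict(reply)
print(verdict["category"])
```

A check like this only confirms the order of keys in the returned object. It cannot confirm that the model actually reasoned before choosing a category.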