-
Notifications
You must be signed in to change notification settings - Fork 736
MAINT: Add labels to attack results #1624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
b63361c
e8164aa
4ec494a
5e04ca7
fa5a0d7
aafefc1
fca41cb
5f179e6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,7 +9,7 @@ | |
| from datetime import datetime, timedelta, timezone | ||
| from typing import TYPE_CHECKING, Any, Optional, TypeVar, Union | ||
|
|
||
| from sqlalchemy import and_, create_engine, event, exists, text | ||
| from sqlalchemy import and_, create_engine, event, exists, or_, text | ||
| from sqlalchemy.engine.base import Engine | ||
| from sqlalchemy.exc import SQLAlchemyError | ||
| from sqlalchemy.orm import InstrumentedAttribute, joinedload, sessionmaker | ||
|
|
@@ -448,32 +448,52 @@ def _get_attack_result_label_condition(self, *, labels: dict[str, str]) -> Any: | |
| """ | ||
| Get the SQL Azure implementation for filtering AttackResults by labels. | ||
|
|
||
| Matches if the labels are found on the AttackResultEntry directly | ||
| OR on an associated PromptMemoryEntry (via conversation_id). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we get rid of PromptMemoryEntry labels?
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I'd rather have the route to only attack result labels to simplify things. And a data migration path for the databases
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am putting that in a separate PR (deprecate labels on message piece and its memory entries) and then, this OR will be removed too. Maybe I'm being over-cautious? should I do the flip in one go? |
||
|
|
||
| Uses JSON_VALUE() function specific to SQL Azure with parameterized queries. | ||
|
|
||
| Args: | ||
| labels (dict[str, str]): Dictionary of label key-value pairs to filter by. | ||
|
|
||
| Returns: | ||
| Any: SQLAlchemy exists subquery condition with bound parameters. | ||
| """ | ||
| # Build JSON conditions for all labels with parameterized queries | ||
| label_conditions = [] | ||
| bindparams_dict = {} | ||
| for key, value in labels.items(): | ||
| param_name = f"label_{key}" | ||
| label_conditions.append(f"JSON_VALUE(labels, '$.{key}') = :{param_name}") | ||
| bindparams_dict[param_name] = str(value) | ||
| Any: SQLAlchemy condition with bound parameters. | ||
| """ | ||
| # --- Direct match on AttackResultEntry.labels --- | ||
| ar_label_conditions = [] | ||
| ar_bindparams: dict[str, str] = {} | ||
| for i, (key, value) in enumerate(labels.items()): | ||
| path_param = f"ar_label_path_{i}" | ||
| value_param = f"ar_label_val_{i}" | ||
| ar_label_conditions.append(f'JSON_VALUE("AttackResultEntries".labels, :{path_param}) = :{value_param}') | ||
| ar_bindparams[path_param] = f"$.{key}" | ||
| ar_bindparams[value_param] = str(value) | ||
|
|
||
| ar_combined = " AND ".join(ar_label_conditions) | ||
| direct_condition = and_( | ||
| AttackResultEntry.labels.isnot(None), | ||
| text(f'ISJSON("AttackResultEntries".labels) = 1 AND {ar_combined}').bindparams(**ar_bindparams), | ||
| ) | ||
|
|
||
| combined_conditions = " AND ".join(label_conditions) | ||
| # --- Conversation-level match on PromptMemoryEntry.labels --- | ||
| pme_label_conditions = [] | ||
| pme_bindparams: dict[str, str] = {} | ||
| for key, value in labels.items(): | ||
| param_name = f"pme_label_{key}" | ||
| pme_label_conditions.append(f"JSON_VALUE(labels, '$.{key}') = :{param_name}") | ||
| pme_bindparams[param_name] = str(value) | ||
|
|
||
| return exists().where( | ||
| pme_combined = " AND ".join(pme_label_conditions) | ||
| conversation_condition = exists().where( | ||
| and_( | ||
| PromptMemoryEntry.conversation_id == AttackResultEntry.conversation_id, | ||
| PromptMemoryEntry.labels.isnot(None), | ||
| text(f"ISJSON(labels) = 1 AND {combined_conditions}").bindparams(**bindparams_dict), | ||
| text(f"ISJSON(labels) = 1 AND {pme_combined}").bindparams(**pme_bindparams), | ||
| ) | ||
| ) | ||
|
|
||
| return or_(direct_condition, conversation_condition) | ||
|
|
||
| def get_unique_attack_class_names(self) -> list[str]: | ||
| """ | ||
| Azure SQL implementation: extract unique class_name values from | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -697,6 +697,7 @@ class AttackResultEntry(Base): | |
| outcome (AttackOutcome): The outcome of the attack, indicating success, failure, or undetermined. | ||
| outcome_reason (str): Optional reason for the outcome, providing additional context. | ||
| attack_metadata (dict[str, Any]): Metadata can be included as key-value pairs to provide extra context. | ||
| labels (dict[str, str]): Optional labels associated with the attack result entry. | ||
| pruned_conversation_ids (List[str]): List of conversation IDs that were pruned from the attack. | ||
| adversarial_chat_conversation_ids (List[str]): List of conversation IDs used for adversarial chat. | ||
| timestamp (DateTime): The timestamp of the attack result entry. | ||
|
|
@@ -728,6 +729,7 @@ class AttackResultEntry(Base): | |
| ) | ||
| outcome_reason = mapped_column(String, nullable=True) | ||
| attack_metadata: Mapped[dict[str, Union[str, int, float, bool]]] = mapped_column(JSON, nullable=True) | ||
| labels: Mapped[dict[str, str]] = mapped_column(JSON, nullable=True) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Any reason for adding on AR and NOT removing them from MP? |
||
| pruned_conversation_ids: Mapped[Optional[list[str]]] = mapped_column(JSON, nullable=True) | ||
| adversarial_chat_conversation_ids: Mapped[Optional[list[str]]] = mapped_column(JSON, nullable=True) | ||
| timestamp = mapped_column(DateTime, nullable=False) | ||
|
|
@@ -783,6 +785,7 @@ def __init__(self, *, entry: AttackResult): | |
| self.outcome = entry.outcome.value | ||
| self.outcome_reason = entry.outcome_reason | ||
| self.attack_metadata = self.filter_json_serializable_metadata(entry.metadata) | ||
| self.labels = entry.labels or {} | ||
|
|
||
| # Persist conversation references by type | ||
| self.pruned_conversation_ids = [ | ||
|
|
@@ -894,6 +897,7 @@ def get_attack_result(self) -> AttackResult: | |
| outcome_reason=self.outcome_reason, | ||
| related_conversations=related_conversations, | ||
| metadata=self.attack_metadata or {}, | ||
| labels=self.labels or {}, | ||
| ) | ||
|
|
||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confused about this comment. Aren't attack-result labels conversation-level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
today, the lables on "attack summary" (a GUI-related concept) come from "conversation stats" (another GUI-related concept) ... conversation stats are an aggregation of message pieces, along with their labels, which in turn end up on the attack summary.
This change here is saying "attack summary's labels must come not only from conversation stats (i.e. message pieces) but also from attack resutls" ...
Once we remove labels from message_piece , this will also be removed and the only kind of label that ends up on attack summary will be the attack result ones.
this is making me think, also related to the other comment, maybe it's just a good idea to do both at the same time, instead of having an overlap period where both objects ,attack results and message pieces, have labels) - it will avoid confusions like this - I think I'll do this, after introducing a DB migration solution in #1631