Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations#45853

Draft
Copilot wants to merge 2 commits into main from copilot/fix-evaluation-exception

Conversation

Contributor

Copilot AI commented Mar 23, 2026

Running built-in evaluators (e.g., CoherenceEvaluator) against multi-turn conversation input via a target function raises EvaluationException: (InternalError) unhashable type: 'list'.

Root cause

For multi-turn conversations (2+ user-assistant pairs), _aggregate_results() stores per-turn results under an evaluation_per_turn key with list values — e.g., {"coherence_result": ["pass", "fail"]}. After _flatten_evaluation_per_turn_columns() expands this into individual DataFrame columns, columns like outputs.coherence.evaluation_per_turn.coherence_result end up with Python list values (one list per row).

_aggregation_binary_output() selects all columns ending with _result, which incorrectly includes these per-turn columns. Calling pd.Series.value_counts() on a Series of Python lists raises TypeError: unhashable type: 'list', which surfaces as EvaluationException: (InternalError) unhashable type: 'list'.

Changes

  • _evaluate.py — Add "evaluation_per_turn" not in col guard to the result_columns filter in _aggregation_binary_output(). Per-turn columns hold aggregated list values that are not suitable for scalar pass/fail counting; only top-level _result columns should be processed here.

  • test_evaluate.py — Add test_aggregation_binary_output_skips_evaluation_per_turn_columns covering a DataFrame with both a scalar coherence_result column and a list-valued evaluation_per_turn.coherence_result column, asserting the latter is excluded and no exception is raised.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • pypi.org
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-8i36t6_a/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=40.8.0 (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-gdcyzki8/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=40.8.0 (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/pip install azure-ai-evaluation==1.15.0 -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve.

<issue_title>[BUG] azure-ai-evaluation: Cannot run built-in evaluators against multi-turn conversation input</issue_title>
<issue_description>- Package Name: azure-ai-evaluation

  • Package Version: 1.15.0
  • Operating System: Windows

Describe the bug
Error raised when running the following script.
Error: azure.ai.evaluation._exceptions.EvaluationException: (InternalError) unhashable type: 'list'

To Reproduce
data.csv:

conversation
"{""messages"": [{""role"": ""user"", ""content"": ""Hi""}, {""role"": ""assistant"", ""content"": ""Hello""}]}"
"{""messages"": [{""role"": ""user"", ""content"": ""Hi""}, {""role"": ""assistant"", ""content"": ""Hello""}, {""role"": ""user"", ""content"": ""How are you?""}, {""role"": ""assistant"", ""content"": ""I am fine""}]}"

main.py:

import json
import os
from azure.ai.evaluation import (
    evaluate,
    CoherenceEvaluator,
    AzureOpenAIModelConfiguration,
)
from azure.identity import DefaultAzureCredential

MODEL_CONFIG = AzureOpenAIModelConfiguration(
    azure_endpoint="xxx",
    azure_deployment="xxx",
    api_version="2024-12-01-preview",
)


def target_function(conversation: str):
    return {"conversation": json.loads(conversation)}


if __name__ == "__main__":
    data_path = os.path.join(os.path.dirname(__file__), "data.csv")

    result = evaluate(
        data=data_path,
        target=target_function,
        evaluators={
            "coherence": CoherenceEvaluator(
                model_config=MODEL_CONFIG,
                credential=DefaultAzureCredential(),
                # is_reasoning_model=True,
            )
        },
        evaluator_config={
            "default": {
                "column_mapping": {
                    "conversation": "${target.conversation}",
                }
            }
        },
    )
    print(result)
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@w-javed</author><body>
@Wixee Were you able to verify same with jsonl format. 
For example:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/evaluation/azure-ai-evaluation/tests/e2etests/data/evaluate_test_data_conversation.jsonl</body></comment_new>
<comment_new><author>@w-javed</author><body>
If you want to use CSV format. 
Here is an example. 
https://github.com/Azure-Samples/azureai-samples/blob/main/scenarios/evaluate/evaluate_with_various_inputs/qr_data.csv?plain=1

Please remove unnecessary double quotes. </body></comment_new>
</comments>
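For reference, the CSV from the repro can be rewritten as JSONL (the format suggested in the comments) with a few lines of Python. This is a sketch that assumes a top-level "conversation" key matching the column name in the issue's data.csv:

```python
import json

# The two conversations from the issue's data.csv:
conversations = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"},
                  {"role": "user", "content": "How are you?"},
                  {"role": "assistant", "content": "I am fine"}]},
]

# Write one JSON object per line, avoiding the CSV quote-escaping issues:
with open("data.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps({"conversation": conv}) + "\n")
```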


Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations

When a multi-turn conversation (2+ user-assistant pairs) is evaluated,
_aggregate_results() produces an evaluation_per_turn dict with list values
like {"coherence_result": ["pass", "pass"]}. After
_flatten_evaluation_per_turn_columns() processes this, it creates DataFrame
columns like "outputs.coherence.evaluation_per_turn.coherence_result" with
list values (one list per row).

_aggregation_binary_output() was then picking up these per-turn columns
(they end with "_result") and calling value_counts() on them, which fails
with TypeError: unhashable type: 'list' because the values are Python lists.

Fix: exclude columns containing "evaluation_per_turn" from binary pass/fail
aggregation in _aggregation_binary_output().

Co-authored-by: mikhail <3210918+mikhail@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/964040dc-0cc3-4d2f-8f9b-47c84c3efed5
Copilot AI changed the title [WIP] Fix EvaluationException in azure-ai-evaluation package Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations Mar 23, 2026
Copilot AI requested a review from mikhail March 23, 2026 19:29
Development

Successfully merging this pull request may close these issues.

[BUG] azure-ai-evaluation: Cannot run built-in evaluators against multi-turn conversation input