Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations#45853

Draft
Copilot wants to merge 2 commits into main from copilot/fix-evaluation-exception

Conversation

Contributor

Copilot AI commented Mar 23, 2026

Running built-in evaluators (e.g., CoherenceEvaluator) against multi-turn conversation input via a target function raises EvaluationException: (InternalError) unhashable type: 'list'.

Root cause

For multi-turn conversations (2+ user-assistant pairs), _aggregate_results() stores per-turn results under an evaluation_per_turn key with list values — e.g., {"coherence_result": ["pass", "fail"]}. After _flatten_evaluation_per_turn_columns() expands this into individual DataFrame columns, columns like outputs.coherence.evaluation_per_turn.coherence_result end up with Python list values (one list per row).

_aggregation_binary_output() selects all columns ending with _result, which incorrectly includes these per-turn columns. Calling pd.Series.value_counts() on a Series of Python lists raises TypeError: unhashable type: 'list', which surfaces as EvaluationException: (InternalError) unhashable type: 'list'.

Changes

  • _evaluate.py — Add "evaluation_per_turn" not in col guard to the result_columns filter in _aggregation_binary_output(). Per-turn columns hold aggregated list values that are not suitable for scalar pass/fail counting; only top-level _result columns should be processed here.

  • test_evaluate.py — Add test_aggregation_binary_output_skips_evaluation_per_turn_columns covering a DataFrame with both a scalar coherence_result column and a list-valued evaluation_per_turn.coherence_result column, asserting the latter is excluded and no exception is raised.

Warning

Firewall rules blocked me from connecting to one or more addresses.

I tried to connect to the following addresses, but was blocked by firewall rules:

  • pypi.org
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-8i36t6_a/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=40.8.0 (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/python /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/lib/python3.9/site-packages/pip/__pip-REDACTED__.py install --ignore-installed --no-user --prefix /tmp/pip-build-env-gdcyzki8/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i REDACTED -- setuptools>=40.8.0 (dns block)
    • Triggering command: /home/REDACTED/work/azure-sdk-for-python/azure-sdk-for-python/.venv/bin/pip install azure-ai-evaluation==1.15.0 -q (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Original prompt

This section details the original issue you should resolve.

<issue_title>[BUG] azure-ai-evaluation: Cannot run built-in evaluators against multi-turn conversation input</issue_title>
<issue_description>- Package Name: azure-ai-evaluation

  • Package Version: 1.15.0
  • Operating System: Windows

Describe the bug
Error raised when running the following script.
Error: azure.ai.evaluation._exceptions.EvaluationException: (InternalError) unhashable type: 'list'

To Reproduce
data.csv:

conversation
"{""messages"": [{""role"": ""user"", ""content"": ""Hi""}, {""role"": ""assistant"", ""content"": ""Hello""}]}"
"{""messages"": [{""role"": ""user"", ""content"": ""Hi""}, {""role"": ""assistant"", ""content"": ""Hello""}, {""role"": ""user"", ""content"": ""How are you?""}, {""role"": ""assistant"", ""content"": ""I am fine""}]}"

main.py:

import json
import os
from azure.ai.evaluation import (
    evaluate,
    CoherenceEvaluator,
    AzureOpenAIModelConfiguration,
)
from azure.identity import DefaultAzureCredential

MODEL_CONFIG = AzureOpenAIModelConfiguration(
    azure_endpoint="xxx",
    azure_deployment="xxx",
    api_version="2024-12-01-preview",
)


def target_function(conversation: str):
    return {"conversation": json.loads(conversation)}


if __name__ == "__main__":
    data_path = os.path.join(os.path.dirname(__file__), "data.csv")

    result = evaluate(
        data=data_path,
        target=target_function,
        evaluators={
            "coherence": CoherenceEvaluator(
                model_config=MODEL_CONFIG,
                credential=DefaultAzureCredential(),
                # is_reasoning_model=True,
            )
        },
        evaluator_config={
            "default": {
                "column_mapping": {
                    "conversation": "${target.conversation}",
                }
            }
        },
    )
    print(result)
</issue_description>

## Comments on the Issue (you are @copilot in this section)

<comments>
<comment_new><author>@w-javed</author><body>
@Wixee Were you able to verify same with jsonl format. 
For example:
https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/evaluation/azure-ai-evaluation/tests/e2etests/data/evaluate_test_data_conversation.jsonl</body></comment_new>
<comment_new><author>@w-javed</author><body>
If you want to use CSV format. 
Here is an example. 
https://github.com/Azure-Samples/azureai-samples/blob/main/scenarios/evaluate/evaluate_with_various_inputs/qr_data.csv?plain=1

Please remove unnecessary double quotes. </body></comment_new>
</comments>
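For reference, the CSV from the repro can be rewritten as JSONL (the format suggested in the comments) with a few lines of Python. This is a sketch that assumes a top-level "conversation" key matching the column name in the issue's data.csv:

```python
import json

# The two conversations from the issue's data.csv:
conversations = [
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"}]},
    {"messages": [{"role": "user", "content": "Hi"},
                  {"role": "assistant", "content": "Hello"},
                  {"role": "user", "content": "How are you?"},
                  {"role": "assistant", "content": "I am fine"}]},
]

# Write one JSON object per line, avoiding the CSV quote-escaping issues:
with open("data.jsonl", "w") as f:
    for conv in conversations:
        f.write(json.dumps({"conversation": conv}) + "\n")
```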


Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations

When a multi-turn conversation (2+ user-assistant pairs) is evaluated,
_aggregate_results() produces an evaluation_per_turn dict with list values
like {"coherence_result": ["pass", "pass"]}. After
_flatten_evaluation_per_turn_columns() processes this, it creates DataFrame
columns like "outputs.coherence.evaluation_per_turn.coherence_result" with
list values (one list per row).

_aggregation_binary_output() was then picking up these per-turn columns
(they end with "_result") and calling value_counts() on them, which fails
with TypeError: unhashable type: 'list' because the values are Python lists.

Fix: exclude columns containing "evaluation_per_turn" from binary pass/fail
aggregation in _aggregation_binary_output().

Co-authored-by: mikhail <3210918+mikhail@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/964040dc-0cc3-4d2f-8f9b-47c84c3efed5
Copilot AI changed the title [WIP] Fix EvaluationException in azure-ai-evaluation package Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations Mar 23, 2026
Copilot AI requested a review from mikhail March 23, 2026 19:29
Development

Successfully merging this pull request may close these issues.

[BUG] azure-ai-evaluation: Cannot run built-in evaluators against multi-turn conversation input