Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations #45853
Draft
When a multi-turn conversation (2+ user-assistant pairs) is evaluated,
_aggregate_results() produces an evaluation_per_turn dict with list values
like {"coherence_result": ["pass", "pass"]}. After
_flatten_evaluation_per_turn_columns() processes this, it creates DataFrame
columns like "outputs.coherence.evaluation_per_turn.coherence_result" with
list values (one list per row).
_aggregation_binary_output() was then picking up these per-turn columns
(they end with "_result") and calling value_counts() on them, which fails
with TypeError: unhashable type: 'list' because the values are Python lists.
Fix: exclude columns containing "evaluation_per_turn" from binary pass/fail
aggregation in _aggregation_binary_output().
Co-authored-by: mikhail <3210918+mikhail@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Azure/azure-sdk-for-python/sessions/964040dc-0cc3-4d2f-8f9b-47c84c3efed5
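The data shape described in the message above can be sketched with plain pandas (a minimal illustration only; the column name follows the text above, but the SDK's actual flattening code is not reproduced here):

```python
import pandas as pd

# Per-turn aggregate for one multi-turn conversation (2 turns), as
# described above: each key maps to a list with one entry per turn.
evaluation_per_turn = {"coherence_result": ["pass", "pass"]}

# After flattening, each per-turn list becomes a single DataFrame cell,
# so the column holds Python lists (one list per conversation row).
col = "outputs.coherence.evaluation_per_turn.coherence_result"
df = pd.DataFrame(
    {
        col: [
            evaluation_per_turn["coherence_result"],
            ["pass", "fail"],  # a second, hypothetical conversation
        ]
    }
)

print(df[col].tolist())  # [['pass', 'pass'], ['pass', 'fail']]
# The column name ends with "_result", which is why the binary
# aggregation step picked it up.
print(col.endswith("_result"))  # True
```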
Copilot AI changed the title from "[WIP] Fix EvaluationException in azure-ai-evaluation package" to "Fix unhashable type: 'list' when running built-in evaluators on multi-turn conversations" on Mar 23, 2026
Running built-in evaluators (e.g., CoherenceEvaluator) against multi-turn conversation input via a target function raises EvaluationException: (InternalError) unhashable type: 'list'.

Root cause
For multi-turn conversations (2+ user-assistant pairs), _aggregate_results() stores per-turn results under an evaluation_per_turn key with list values, e.g. {"coherence_result": ["pass", "fail"]}. After _flatten_evaluation_per_turn_columns() expands this into individual DataFrame columns, columns like outputs.coherence.evaluation_per_turn.coherence_result end up with Python list values (one list per row). _aggregation_binary_output() selects all columns ending with _result, which incorrectly includes these per-turn columns. Calling pd.Series.value_counts() on a Series of Python lists raises TypeError: unhashable type: 'list', which surfaces as EvaluationException: (InternalError) unhashable type: 'list'.

Changes
_evaluate.py: add a "evaluation_per_turn" not in col guard to the result_columns filter in _aggregation_binary_output(). Per-turn columns hold aggregated list values that are not suitable for scalar pass/fail counting; only top-level _result columns should be processed here.

test_evaluate.py: add test_aggregation_binary_output_skips_evaluation_per_turn_columns covering a DataFrame with both a scalar coherence_result column and a list-valued evaluation_per_turn.coherence_result column, asserting the latter is excluded and no exception is raised.
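The guard can be sketched as a column filter (a hypothetical sketch: select_binary_result_columns is an illustrative name, not the SDK's actual function, and the real _aggregation_binary_output() does more than select columns):

```python
import pandas as pd

def select_binary_result_columns(df: pd.DataFrame) -> list:
    # Keep only top-level *_result columns; per-turn columns hold
    # list values and would break value_counts()-based counting.
    return [
        col
        for col in df.columns
        if col.endswith("_result") and "evaluation_per_turn" not in col
    ]

df = pd.DataFrame(
    {
        "outputs.coherence.coherence_result": ["pass", "fail"],
        "outputs.coherence.evaluation_per_turn.coherence_result": [
            ["pass", "pass"],
            ["pass", "fail"],
        ],
    }
)
print(select_binary_result_columns(df))
# ['outputs.coherence.coherence_result']
```

The excluded per-turn column is the one that previously reached value_counts() and triggered the TypeError.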
Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

pypi.org (dns block): hit while pip attempted to install setuptools>=40.8.0 into the build environment and while running pip install azure-ai-evaluation==1.15.0