"Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews
This repository provides datasets and code for the ACL 2026 long paper on Coded Language.
Ruyuan Wan, Changye Li, Ting-Hao 'Kenneth' Huang
ACL 2026
Example of creative coded language in a Google Maps restaurant review.
Top: the original user review in Chinese, as posted on Google Maps.
Left: the machine-generated English translation produced by Google Translate.
Right: the inferred underlying meaning obtained by decoding phonetic substitutions.
We introduce CodedLang, a dataset of 7,744 Chinese Google Maps reviews, including 900 reviews with span-level annotations of coded language.
🤗 Hugging Face Dataset: https://huggingface.co/datasets/RuyuanWan/CodedLang
We define 7 categories of coded language:
- Ambiguous Homophones
- Non-Lexical Homophones
- Phonetic Substitution
- Emoji Substitution
- Orthographic Substitution
- Cross-Lingual Phonetic Encoding
- Cipher
Our dataset is constructed based on two large-scale Google Maps review datasets:
- Google Local Reviews (Li et al., 2022; Yan et al., 2023): 666 million reviews
- Google Restaurant Reviews (Yan et al., 2023): 1.77 million reviews
Each sample includes:
- id: Index of the review
- original_review: Original Chinese review text
- translated_review: English translation of the original review
  - Note: 309 reviews are missing translations from the raw Google Reviews dataset and are kept as-is
- rating: Review rating (1–5 stars)
- char_mask_review: Character-level masked version of the review
- span_mask_review: Span-level masked version of the review
- decode_review: Decoded version of the review, where coded language is replaced with the intended meaning
- coded_lang_class: Taxonomy label(s) for coded language classes
- code_span: Text span of coded expressions
- coded_language: Binary label
  - 1: contains coded language
  - 0: non-coded review
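As a rough illustration of how these fields might be used, the sketch below splits records by the `coded_language` label and counts taxonomy labels. The two records are invented for this example and are not taken from CodedLang. Representing `coded_lang_class` and `code_span` as lists, and a missing translation as an empty string, are assumptions about the storage format. The helper names are hypothetical. In practice the records would more likely be read from the Hugging Face dataset linked above, for example with the `datasets` library.

```python
from collections import Counter

# Hypothetical records mirroring the documented field names. The values are
# invented for illustration; the taxonomy label on record 0 is arbitrary.
records = [
    {
        "id": 0,
        "original_review": "真是杯具",
        "translated_review": "",  # assumption: missing translation stored as ""
        "rating": 1,
        "coded_lang_class": ["Ambiguous Homophones"],  # assumption: list of labels
        "code_span": ["杯具"],
        "coded_language": 1,
    },
    {
        "id": 1,
        "original_review": "很好吃",
        "translated_review": "Very tasty",
        "rating": 5,
        "coded_lang_class": [],
        "code_span": [],
        "coded_language": 0,
    },
]


def split_by_label(rows):
    """Return (coded, non_coded) lists based on the binary coded_language field."""
    coded = [r for r in rows if r["coded_language"] == 1]
    plain = [r for r in rows if r["coded_language"] == 0]
    return coded, plain


def class_counts(rows):
    """Count how often each taxonomy label appears across all rows."""
    return Counter(label for r in rows for label in r["coded_lang_class"])


def missing_translation_ids(rows):
    """Return ids of rows whose translated_review is empty or absent."""
    return [r["id"] for r in rows if not r.get("translated_review")]


coded, plain = split_by_label(records)
print(len(coded), len(plain))
print(class_counts(records))
print(missing_translation_ids(records))
```

If the released files store labels as delimited strings rather than lists, `class_counts` would need a parsing step first.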
We provide a coded language dictionary derived from human annotations:
Each entry includes:
- code_span: Coded expression
- decode: Decoded form
- coded_lang_class: Corresponding coded language category
For phonetic-based coded language, we additionally provide:
- mid_homophone: Intermediate homophone form (if applicable)
- code_pinyin: Pinyin of the coded expression
- decode_pinyin: Pinyin of the decoded expression
- code_ipa: IPA representation of the coded expression
- decode_ipa: IPA representation of the decoded expression
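One way such a dictionary could be applied is a simple lookup-and-replace pass over a review. The sketch below is a minimal illustration, not the procedure used in the paper. The single entry is a well-known internet homophone chosen for this example rather than an item from the released dictionary, its category label is arbitrary, and the tone-marked pinyin format is an assumption. `decode_review` and `is_exact_homophone` are hypothetical helper names.

```python
# Hypothetical dictionary entry using the documented field names.
# The pinyin format (tone marks, space-separated syllables) is an assumption.
entries = [
    {
        "code_span": "杯具",      # literally "cup"
        "decode": "悲剧",         # "tragedy"
        "coded_lang_class": "Ambiguous Homophones",  # arbitrary label for illustration
        "code_pinyin": "bēi jù",
        "decode_pinyin": "bēi jù",
    },
]


def decode_review(text, dictionary):
    """Replace each coded span in text with its decoded form.

    Longer spans are replaced first so that a short span cannot break up a
    longer one that contains it.
    """
    ordered = sorted(dictionary, key=lambda e: len(e["code_span"]), reverse=True)
    for entry in ordered:
        text = text.replace(entry["code_span"], entry["decode"])
    return text


def is_exact_homophone(entry):
    """True when the coded and decoded forms share identical pinyin."""
    return entry.get("code_pinyin") == entry.get("decode_pinyin")


print(decode_review("真是杯具", entries))
print(is_exact_homophone(entries[0]))
```

Plain string replacement ignores context, so a span that is sometimes used literally would also be rewritten. The span-level annotations in the dataset are the more reliable signal for where a coded expression actually occurs.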
We evaluate language models on taxonomy-aware coded language classification.
Multi-label coded language classification performance across categories by DeepSeek-V3.2. The highest (lowest) scores in each column are highlighted in bold (underlined).
Results show substantial performance variation across coded language categories, with cross-lingual phonetic encoding remaining particularly challenging for current language models.
For full benchmark results and analyses, please refer to the paper.
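For readers who want to score their own predictions per category, the sketch below computes precision, recall, and F1 for each of the seven taxonomy labels in a multi-label setting. This is an assumption about a reasonable scoring scheme, not necessarily the metric reported in the paper. The function name and the toy gold and predicted label sets are hypothetical.

```python
CATEGORIES = [
    "Ambiguous Homophones",
    "Non-Lexical Homophones",
    "Phonetic Substitution",
    "Emoji Substitution",
    "Orthographic Substitution",
    "Cross-Lingual Phonetic Encoding",
    "Cipher",
]


def per_category_scores(gold, pred, categories=CATEGORIES):
    """Compute precision, recall, and F1 per category.

    gold and pred are equal-length lists of label sets, one set per review.
    A score is reported as 0.0 when its denominator is zero.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for c in categories:
        tp = sum(1 for g, p in zip(gold, pred) if c in g and c in p)
        fp = sum(1 for g, p in zip(gold, pred) if c not in g and c in p)
        fn = sum(1 for g, p in zip(gold, pred) if c in g and c not in p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy labels invented for illustration only.
gold = [{"Cipher"}, {"Cipher", "Emoji Substitution"}, set()]
pred = [{"Cipher"}, {"Emoji Substitution"}, {"Cipher"}]

scores = per_category_scores(gold, pred)
for name in ("Cipher", "Emoji Substitution"):
    print(name, scores[name])
```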
If you find this work useful for your research, please cite our paper:
@inproceedings{wan2026newspaper,
title={"Newspaper Eat" Means "Not Tasty": A Taxonomy and Benchmark for Coded Language in Real-World Chinese Online Reviews},
author={Wan, Ruyuan and Li, Changye and Huang, Ting-Hao 'Kenneth'},
booktitle={The 64th Annual Meeting of the Association for Computational Linguistics},
year={2026}
}

