
[Feature] PPL Highlight Command#5234

Draft
RyanL1997 wants to merge 10 commits into opensearch-project:main from RyanL1997:ppl-highlight-cmd

Conversation

@RyanL1997
Collaborator

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 5891541.

Path: opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
Line: 516
Severity: low
Description: User-supplied highlight terms are interpolated directly into a Lucene query_string expression with only double-quote wrapping and no further sanitization. A crafted term containing backslash escapes or unbalanced quotes could malform or expand the generated query. Impact is limited to highlighting behavior (not document access control), so this is likely an oversight rather than malicious intent.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Request Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass the diff analyzer by adding the label skip-diff-analyzer after reviewing the changes carefully, then re-running the failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

github-actions bot commented Mar 14, 2026

PR Reviewer Guide 🔍

(Review updated until commit b2d6afe)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: PPL Highlight AST, Parser, and Grammar

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/tree/Highlight.java
  • core/src/main/java/org/opensearch/sql/ast/AbstractNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/ast/dsl/AstDSL.java
  • core/src/main/java/org/opensearch/sql/analysis/Analyzer.java
  • ppl/src/main/antlr/OpenSearchPPLLexer.g4
  • ppl/src/main/antlr/OpenSearchPPLParser.g4
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java
  • ppl/src/main/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizer.java
  • ppl/src/test/java/org/opensearch/sql/ppl/parser/AstBuilderTest.java
  • ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java

Sub-PR theme: Calcite Highlight Logical Plan and Index Scan Pushdown

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java
  • core/src/main/java/org/opensearch/sql/calcite/plan/rel/LogicalHighlight.java
  • core/src/main/java/org/opensearch/sql/expression/HighlightExpression.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/HighlightIndexScanRule.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/planner/rules/OpenSearchIndexRules.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java
  • opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/context/PushDownType.java
  • ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLHighlightTest.java
  • opensearch/src/test/java/org/opensearch/sql/opensearch/storage/scan/CalciteIndexScanCostTest.java

Sub-PR theme: Highlight Response Formatting, Integration Tests, and Documentation

Relevant files:

  • protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java
  • protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/QueryResultTest.java
  • protocol/src/test/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatterTest.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
  • integ-test/src/test/java/org/opensearch/sql/security/CalciteCrossClusterSearchIT.java
  • docs/user/ppl/cmd/highlight.md
  • docs/category.json

⚡ Recommended focus areas for review

Term Injection Risk

In applyHighlightPushDown, highlight terms are embedded directly into a query_string query using string concatenation with only quote-wrapping. If a term contains special characters (e.g., backslash, double-quote), this could produce malformed or unintended queries. Consider escaping terms before embedding them.

protected static void applyHighlightPushDown(
    org.opensearch.sql.opensearch.request.OpenSearchRequestBuilder requestBuilder,
    java.util.List<String> highlightArgs) {
  if (highlightArgs == null || highlightArgs.isEmpty()) {
    return;
  }
  org.opensearch.search.fetch.subphase.highlight.HighlightBuilder hb =
      new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder();
  if (highlightArgs.size() == 1 && "*".equals(highlightArgs.get(0))) {
    // Wildcard: highlight search query matches in all fields
    hb.field("*");
  } else {
    // Highlight specific terms across all fields
    String queryStr =
        highlightArgs.stream()
            .map(term -> "\"" + term + "\"")
            .collect(java.util.stream.Collectors.joining(" OR "));
    org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field =
        new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("*")
            .highlightQuery(
                org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr)
                    .defaultField("*"));
    hb.field(field);
  }
  hb.fragmentSize(Integer.MAX_VALUE);
  requestBuilder.getSourceBuilder().highlighter(hb);
}
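
As a sketch of the escaping this finding asks for (the class and method names here are illustrative, not code from the PR), the two characters that can break out of a double-quoted phrase, backslash and double quote, can be escaped before the terms are quoted and joined:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring the reviewers' suggested fix; not part of the PR. */
public final class HighlightTermEscaper {

  /** Escape backslash and double quote so the term stays inside its quoted phrase. */
  static String escape(String term) {
    return term.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Build the OR-joined query_string expression from user-supplied terms. */
  static String toQueryString(List<String> terms) {
    return terms.stream()
        .map(t -> "\"" + escape(t) + "\"")
        .collect(Collectors.joining(" OR "));
  }
}
```

When Lucene is already on the classpath, org.apache.lucene.queryparser.classic.QueryParser.escape covers the full special-character set and is the more robust choice, as later suggestions note.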
Unchecked Cast

The highlights() method contains an unchecked cast (Map<String, Object>) hlMap which is unnecessary since hlMap is already declared as Map<String, Object>. This suppresses compiler warnings without benefit and should be removed.

      Map<String, Object> hlMap = new LinkedHashMap<>();
      for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
        hlMap.put(
            entry.getKey(),
            entry.getValue().collectionValue().stream()
                .map(ExprValue::stringValue)
                .collect(Collectors.toList()));
      }
      return (Map<String, Object>) hlMap;
    })
.collect(Collectors.toList());
Magic String

The string "_highlight" is hardcoded in resolveForCalcite instead of using the constant HighlightExpression.HIGHLIGHT_FIELD. This is inconsistent with the rest of the codebase and could cause subtle bugs if the constant value changes.

if ("_highlight".equals(rawPath)) {
  ExprValue hl = ExprValueUtils.getTupleValue(value).get("_highlight");
  return (hl != null && !hl.isMissing()) ? hl : null;
}
Iterator Behavior Change

The iterator() method was refactored from using Map::values (which preserves insertion order) to filtering entries by key. If the tuple map does not guarantee insertion order, the column ordering in data rows could differ from the schema ordering, causing misaligned results. Verify that ExprTupleValue always preserves insertion order.

public Iterator<Object[]> iterator() {
  return exprValues.stream()
      .map(ExprValueUtils::getTupleValue)
      .map(
          tuple ->
              tuple.entrySet().stream()
                  .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
                  .map(e -> e.getValue().value())
                  .toArray(Object[]::new))
      .iterator();
}
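
Whether this refactor is safe hinges on the backing map preserving insertion order. A minimal plain-java.util sketch of the same filter-by-key pattern (OrderDemo is illustrative; the PR operates on ExprTupleValue) over a LinkedHashMap, which does guarantee insertion-order iteration:

```java
import java.util.Map;

/** Illustrative only: demonstrates filter-by-key row extraction over a plain map. */
public final class OrderDemo {

  /** Drop the excluded key and collect the remaining values in the map's iteration order. */
  static Object[] rowWithout(Map<String, Object> tuple, String excluded) {
    return tuple.entrySet().stream()
        .filter(e -> !excluded.equals(e.getKey()))
        .map(Map.Entry::getValue)
        .toArray();
  }
}
```

With a LinkedHashMap the resulting row follows insertion order; with a plain HashMap it would not, which is exactly the schema misalignment the observation warns about.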
Duplicate Highlight Column

In pushDownHighlight, the _highlight field is unconditionally added to the schema without checking if it already exists (unlike LogicalHighlight.create which checks). If pushDownHighlight is called multiple times or the field already exists, it could result in duplicate columns.

public RelNode pushDownHighlight(List<String> highlightArgs) {
  RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
  schemaBuilder.addAll(getRowType().getFieldList());
  schemaBuilder.add(
      HighlightExpression.HIGHLIGHT_FIELD,
      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
  CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
  newScan
      .getPushDownContext()
      .add(
          PushDownType.HIGHLIGHT,
          highlightArgs,
          (OSRequestBuilderAction)
              requestBuilder -> applyHighlightPushDown(requestBuilder, highlightArgs));
  return newScan;
}

@github-actions
Contributor

github-actions bot commented Mar 14, 2026

PR Code Suggestions ✨

Latest suggestions up to b2d6afe

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Guard against duplicate highlight column addition

The pushDownHighlight method unconditionally adds the _highlight field to the schema
even if it already exists (e.g., if highlight is pushed down twice). This can cause
duplicate column errors. Add a guard similar to the one in LogicalHighlight.create
to check before adding the field.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-95]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
   RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
   schemaBuilder.addAll(getRowType().getFieldList());
-  schemaBuilder.add(
-      HighlightExpression.HIGHLIGHT_FIELD,
-      getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+    schemaBuilder.add(
+        HighlightExpression.HIGHLIGHT_FIELD,
+        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+  }
   CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 6


Why: The pushDownHighlight method unconditionally adds _highlight to the schema, while LogicalHighlight.create already guards against duplicates. This inconsistency could cause issues if highlight is pushed down multiple times, though in practice the optimizer rule likely prevents this.

Low
Align iterator output order with schema columns

The new iterator() implementation filters _highlight from the tuple map directly,
but the original implementation iterated over Map::values which preserves insertion
order aligned with the schema columns. If the tuple map contains extra fields not in
the schema, or fields in a different order, the resulting arrays will be misaligned
with the schema. The iteration should be driven by the schema columns, not the tuple
map keys.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [89-97]

+List<String> columnNames = schema.getColumns().stream()
+    .map(col -> col.getAlias() != null ? col.getAlias() : col.getName())
+    .filter(name -> !HIGHLIGHT_FIELD.equals(name))
+    .collect(Collectors.toList());
 return exprValues.stream()
     .map(ExprValueUtils::getTupleValue)
     .map(
         tuple ->
-            tuple.entrySet().stream()
-                .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
-                .map(e -> e.getValue().value())
+            columnNames.stream()
+                .map(name -> {
+                  ExprValue v = tuple.get(name);
+                  return v != null ? v.value() : null;
+                })
                 .toArray(Object[]::new))
     .iterator();
Suggestion importance[1-10]: 6


Why: The new iterator filters _highlight from the tuple map, but if the tuple contains fields in a different order than the schema or extra fields, the output arrays could be misaligned. Driving iteration from schema columns would be more robust, though the existing code relies on LinkedHashMap ordering which is typically consistent.

Low
Escape special characters in highlight query terms

The terms are concatenated into a query string without escaping special characters.
If a term contains characters special to the query string syntax (e.g., +, -, &&,
||, !, (, ), {, }, [, ], ^, ~, *, ?, :, \, /), the query will fail or produce
unexpected results. The terms should be escaped before being embedded in the query
string.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [512-515]

 String queryStr =
     highlightArgs.stream()
-        .map(term -> "\"" + term + "\"")
+        .map(term -> "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
         .collect(java.util.stream.Collectors.joining(" OR "));
Suggestion importance[1-10]: 5


Why: Terms embedded in the query string without escaping special characters could cause query failures or unexpected behavior. However, the terms are already wrapped in quotes which handles most cases, and the improved code only escapes backslashes and quotes rather than all special characters.

Low
General
Guard against null values in highlight map extraction

The code calls entry.getValue().collectionValue() without checking if the value is
actually a collection type. If a highlight field value is not a collection (e.g.,
it's a string or null), this will throw a runtime exception. Add a null/type check
before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
   hlMap.put(
       entry.getKey(),
-      entry.getValue().collectionValue().stream()
+      fieldVal.collectionValue().stream()
           .map(ExprValue::stringValue)
           .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 4


Why: Calling collectionValue() without checking if the value is a collection type could cause a runtime exception for unexpected highlight field value types. This is a defensive coding improvement but may not occur in practice given the controlled data flow.

Low

Previous suggestions

Suggestions up to commit 51356cd
Category | Suggestion | Impact
Possible issue
Preserve schema column order in row iteration

The new iterator() implementation iterates over tuple.entrySet() which may not
preserve the column order defined in the schema. The original code used Map::values
which relied on insertion order, but the schema column order should be the
authoritative source. If the tuple map has a different key order than the schema,
the data rows will be misaligned with the schema columns.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [89-97]

 return exprValues.stream()
     .map(ExprValueUtils::getTupleValue)
     .map(
         tuple ->
-            tuple.entrySet().stream()
-                .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
-                .map(e -> e.getValue().value())
+            schema.getColumns().stream()
+                .filter(col -> !HIGHLIGHT_FIELD.equals(getColumnName(col)))
+                .map(col -> {
+                    ExprValue v = tuple.get(getColumnName(col));
+                    return v != null ? v.value() : null;
+                })
                 .toArray(Object[]::new))
     .iterator();
Suggestion importance[1-10]: 7


Why: The new iterator() iterates over tuple.entrySet() which may not match the schema column order, potentially causing data/schema misalignment in responses. Using the schema as the authoritative ordering source is more correct and consistent with how columnNameTypes() works.

Medium
Guard against duplicate highlight column addition

The pushDownHighlight method unconditionally adds the _highlight field to the schema
even if it already exists (e.g., if highlight is pushed down twice). This can cause
duplicate column errors. Add a guard similar to the one in LogicalHighlight.create
to check before adding the field.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-94]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 6


Why: The pushDownHighlight method unconditionally adds _highlight to the schema, unlike LogicalHighlight.create which checks first. While double-highlight pushdown may be unlikely in practice, the inconsistency could cause issues and the fix is straightforward.

Low
Escape special characters in highlight query terms

The terms are concatenated into a query string without escaping special characters.
If a term contains characters special to the query string syntax (e.g., ", \, +, -,
(, ), etc.), the resulting query string will be malformed or produce unexpected
results. The terms should be escaped before being wrapped in quotes.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [512-515]

 String queryStr =
     highlightArgs.stream()
-        .map(term -> "\"" + term + "\"")
+        .map(term -> "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
         .collect(java.util.stream.Collectors.joining(" OR "));
Suggestion importance[1-10]: 5


Why: Terms containing special query string characters (e.g., ", \) could produce malformed queries. However, since terms come from user-provided string literals in PPL, the risk is moderate and the escaping logic in the improved code is minimal (only handles \ and ").

Low
General
Guard against null highlight field values

The code calls entry.getValue().collectionValue() without checking whether the value
is actually a collection type. If a highlight field value is not a collection (e.g.,
it's a string or null), this will throw a runtime exception. Add a null/type check
before calling collectionValue().

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+    ExprValue fieldVal = entry.getValue();
+    if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+        continue;
+    }
     hlMap.put(
         entry.getKey(),
-        entry.getValue().collectionValue().stream()
+        fieldVal.collectionValue().stream()
             .map(ExprValue::stringValue)
             .collect(Collectors.toList()));
 }
-return (Map<String, Object>) hlMap;
+return hlMap;
Suggestion importance[1-10]: 5


Why: Calling collectionValue() on a non-collection ExprValue could throw a runtime exception if highlight field values are not always collections. Adding null/type checks improves robustness, though in practice highlight fragments should always be collections.

Low
Suggestions up to commit a7c44d8
Category | Suggestion | Impact
General
Prevent duplicate highlight column in projections

The _highlight field is unconditionally appended to every projection whenever it
exists in the schema, even when the user has explicitly excluded it or when it was
already included in expandedFields via an explicit fields command. This can result
in duplicate _highlight columns in the output. Add a check to only append it if it
is not already present in expandedFields.

core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java [426-430]

 // Include _highlight in projections when the highlight column is present in the schema
 int hlIndex = currentFields.indexOf(HighlightExpression.HIGHLIGHT_FIELD);
-if (hlIndex >= 0) {
+boolean alreadyIncluded = expandedFields.stream()
+    .anyMatch(rex -> rex instanceof org.apache.calcite.rex.RexInputRef
+        && ((org.apache.calcite.rex.RexInputRef) rex).getIndex() == hlIndex);
+if (hlIndex >= 0 && !alreadyIncluded) {
   expandedFields.add(context.relBuilder.field(hlIndex));
 }
Suggestion importance[1-10]: 7


Why: This is a real concern: if _highlight is already included in expandedFields (e.g., via an explicit fields command referencing it), the current code would add it again, causing a duplicate column. The fix correctly checks for existing inclusion before appending, preventing potential downstream errors.

Medium
Handle non-collection highlight field values safely

entry.getValue().collectionValue() will throw an exception if the highlight value
for a field is not a collection type (e.g., if it is a plain string or some other
type). This can cause the entire response serialization to fail. Add a null/type
check or use a try-catch to handle unexpected value types gracefully.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [115-123]

 Map<String, Object> hlMap = new LinkedHashMap<>();
 for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
-  hlMap.put(
-      entry.getKey(),
-      entry.getValue().collectionValue().stream()
-          .map(ExprValue::stringValue)
-          .collect(Collectors.toList()));
+  ExprValue fieldVal = entry.getValue();
+  if (fieldVal == null || fieldVal.isMissing() || fieldVal.isNull()) {
+    continue;
+  }
+  try {
+    hlMap.put(
+        entry.getKey(),
+        fieldVal.collectionValue().stream()
+            .map(ExprValue::stringValue)
+            .collect(Collectors.toList()));
+  } catch (Exception e) {
+    hlMap.put(entry.getKey(), List.of(fieldVal.stringValue()));
+  }
 }
 return (Map<String, Object>) hlMap;
Suggestion importance[1-10]: 4


Why: The suggestion adds defensive error handling for unexpected highlight value types. While the OpenSearch highlight response is expected to always return collections, a try-catch for robustness is reasonable. However, the improved_code uses a broad catch (Exception e) which is generally discouraged, and the scenario is unlikely given the controlled data flow.

Low
Possible issue
Guard against duplicate highlight column addition

If pushDownHighlight is called more than once (e.g., due to rule re-application),
the _highlight column will be added to the schema multiple times, causing duplicate
column errors. Add a guard to check whether _highlight is already present in the row
type before adding it, consistent with the same check already done in
LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-95]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
Suggestion importance[1-10]: 6


Why: This is a valid defensive check. LogicalHighlight.create already has this guard, and adding it to pushDownHighlight ensures consistency and prevents potential duplicate column errors if the rule fires multiple times. However, the optimizer typically prevents re-application, so the practical impact may be low.

Low
Escape special characters in highlight query terms

The terms in highlightArgs are concatenated directly into a query string without
escaping special characters. If a term contains characters special to the Lucene
query string syntax (e.g., +, -, &&, ||, !, (, ), {, }, [, ], ^, ~, *, ?, :, \, /),
the resulting query will be malformed or produce unexpected results. The terms
should be escaped before being embedded in the query string.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [507-515]

-if (highlightArgs.size() == 1 && "*".equals(highlightArgs.get(0))) {
-  // Wildcard: highlight search query matches in all fields
-  hb.field("*");
 } else {
   // Highlight specific terms across all fields
   String queryStr =
       highlightArgs.stream()
-          .map(term -> "\"" + term + "\"")
+          .map(term -> "\"" + org.apache.lucene.queryparser.classic.QueryParser.escape(term) + "\"")
           .collect(java.util.stream.Collectors.joining(" OR "));
Suggestion importance[1-10]: 5


Why: The suggestion is valid in principle—unescaped Lucene special characters in highlightArgs could cause malformed queries. However, the terms are already wrapped in quotes ("term"), which in Lucene query string syntax treats the content as a phrase and mitigates most special character issues. The risk is real but limited.

Low
Suggestions up to commit ab37b5b
Category | Suggestion | Impact
Possible issue
Preserve schema column order in data rows

The previous iterator() implementation preserved the column ordering defined by the
schema, while the new implementation iterates over the tuple's entry set, whose
order may differ from the schema column order. This can cause values to be
misaligned with their column headers in the response. The row values should be
produced in schema column order, not tuple insertion order.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [89-97]

 return exprValues.stream()
     .map(ExprValueUtils::getTupleValue)
     .map(
-        tuple ->
-            tuple.entrySet().stream()
-                .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
-                .map(e -> e.getValue().value())
-                .toArray(Object[]::new))
+        tuple -> {
+          List<String> columnNames =
+              schema.getColumns().stream()
+                  .map(col -> col.getAlias() != null ? col.getAlias() : col.getName())
+                  .filter(name -> !HIGHLIGHT_FIELD.equals(name))
+                  .collect(Collectors.toList());
+          return columnNames.stream()
+              .map(name -> tuple.getOrDefault(name, org.opensearch.sql.data.model.ExprMissingValue.of()).value())
+              .toArray(Object[]::new);
+        })
     .iterator();
Suggestion importance[1-10]: 7


Why: The new iterator() implementation iterates over the tuple's entry set order rather than schema column order, which could cause value-to-column misalignment in responses. This is a real correctness concern, though LinkedHashMap insertion order may coincidentally match in many cases.

Medium
Prevent duplicate highlight column in schema

If _highlight is already present in the row type (e.g., when pushDownHighlight is
called more than once), the schema builder will add a duplicate _highlight column,
which can cause downstream field-resolution errors. Add a guard to only append the
column when it is not already present, consistent with the same check in
LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-97]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
     newScan.getPushDownContext().setHighlightArgs(highlightArgs);
     return newScan;
 }
Suggestion importance[1-10]: 6


Why: The pushDownHighlight method unconditionally adds _highlight to the schema, unlike LogicalHighlight.create which guards against duplicates. While double-invocation may be unlikely in practice, the inconsistency is a real maintenance risk and could cause field-resolution errors.

Low
Security
Escape special characters in highlight query terms

The terms are interpolated directly into a query string without escaping special
characters (e.g., +, -, &&, ||, !, (, ), {, }, [, ], ^, ~, *, ?, :, \, /). A term
containing any of these characters will produce a malformed or unintended query,
potentially causing an OpenSearch exception. Use QueryParser.escape or equivalent to
sanitize each term before embedding it in the query string.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [511-522]

 } else {
   // Highlight specific terms across all fields
   String queryStr =
       highlightArgs.stream()
-          .map(term -> "\"" + term + "\"")
+          .map(term -> "\"" + org.apache.lucene.queryparser.classic.QueryParser.escape(term) + "\"")
           .collect(java.util.stream.Collectors.joining(" OR "));
   org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field =
       new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("*")
           .highlightQuery(
               org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr)
                   .defaultField("*"));
   hb.field(field);
 }
Suggestion importance[1-10]: 6


Why: Terms are interpolated directly into a Lucene query string without escaping special characters, which could cause malformed queries or OpenSearch exceptions. However, since terms are already wrapped in quotes ("term"), many special characters are already handled by phrase quoting, reducing the severity somewhat.

Low
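For reference, Lucene's QueryParser.escape backslash-prefixes every reserved character. The stand-in below (a hypothetical escapeLucene helper, written out so the snippet runs without the Lucene dependency) sketches that behavior:

```java
// Hypothetical stand-in for org.apache.lucene.queryparser.classic.QueryParser.escape,
// shown here only so the example runs without a Lucene dependency.
public class EscapeDemo {
  // Characters the Lucene classic query parser treats as syntax.
  static final String RESERVED = "+-&|!(){}[]^\"~*?:\\/";

  static String escapeLucene(String term) {
    StringBuilder sb = new StringBuilder();
    for (char c : term.toCharArray()) {
      if (RESERVED.indexOf(c) >= 0) {
        sb.append('\\'); // backslash-prefix each reserved character
      }
      sb.append(c);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // "error: disk (full)" becomes "error\: disk \(full\)"
    System.out.println(escapeLucene("error: disk (full)"));
  }
}
```

Note the real QueryParser.escape also leaves the AND/OR/NOT keywords alone; they are words, not reserved characters, which is why phrase-quoting each term remains useful even after escaping.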
General
Use constant instead of hardcoded string literal

The highlight field is resolved by hardcoding the string "_highlight" instead of
using the constant HighlightExpression.HIGHLIGHT_FIELD. This creates a maintenance
risk where renaming the constant would silently break highlight resolution. Use the
constant for consistency and safety.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/OpenSearchIndexEnumerator.java [95-98]

 private Object resolveForCalcite(ExprValue value, String rawPath) {
-    if ("_highlight".equals(rawPath)) {
-      ExprValue hl = ExprValueUtils.getTupleValue(value).get("_highlight");
+    if (HighlightExpression.HIGHLIGHT_FIELD.equals(rawPath)) {
+      ExprValue hl = ExprValueUtils.getTupleValue(value).get(HighlightExpression.HIGHLIGHT_FIELD);
       return (hl != null && !hl.isMissing()) ? hl : null;
     }
     return ExprValueUtils.resolveRefPaths(value, List.of(rawPath.split("\\."))).valueForCalcite();
 }
Suggestion importance[1-10]: 5

__

Why: Using the hardcoded string "_highlight" instead of HighlightExpression.HIGHLIGHT_FIELD is a maintenance risk. This is a straightforward improvement for consistency and refactoring safety.

Low
Suggestions up to commit f0dacc0
Category | Suggestion | Impact
Possible issue
Prevent duplicate highlight column in schema

If _highlight is already present in the row type (e.g., when pushDownHighlight is
called more than once), the method unconditionally adds a duplicate _highlight
column, which will cause a schema conflict. Add a guard to only add the column when
it is not already present, mirroring the check done in LogicalHighlight.create.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [88-97]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
     newScan.getPushDownContext().setHighlightArgs(highlightArgs);
     return newScan;
 }
Suggestion importance[1-10]: 7

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema without checking if it already exists, which could cause a schema conflict if called multiple times. The fix mirrors the guard already present in LogicalHighlight.create.

Medium
Preserve schema column order in row iterator

The new iterator() implementation filters out _highlight but also changes the column
ordering guarantee: it now iterates over the map's entry set order rather than the
schema column order. This can cause column values to be misaligned with the schema
columns returned by columnNameTypes(). The original code used Map::values which
relied on insertion order; the new code should explicitly follow the schema column
order to be safe.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [89-97]

 return exprValues.stream()
     .map(ExprValueUtils::getTupleValue)
     .map(
         tuple ->
-            tuple.entrySet().stream()
-                .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
-                .map(e -> e.getValue().value())
+            schema.getColumns().stream()
+                .filter(col -> !HIGHLIGHT_FIELD.equals(getColumnName(col)))
+                .map(col -> tuple.getOrDefault(getColumnName(col),
+                    org.opensearch.sql.data.model.ExprMissingValue.of()).value())
                 .toArray(Object[]::new))
     .iterator();
Suggestion importance[1-10]: 7

__

Why: The new iterator() implementation iterates over the map's entry set order rather than the schema column order, which could cause column values to be misaligned with the schema. Iterating over schema columns explicitly ensures correct alignment between datarows and schema.

Medium
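The alignment concern can be illustrated standalone: iterating a tuple map's entry set follows insertion order, while iterating the schema's column list guarantees each datarow matches columnNameTypes(). A minimal sketch (hypothetical column names, a plain Map standing in for the ExprValue tuple):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowOrderDemo {
  // Hypothetical schema order; in the real code this comes from schema.getColumns().
  static final List<String> SCHEMA = List.of("firstname", "lastname", "_highlight");

  static Object[] toRow(Map<String, Object> tuple) {
    // Walk the schema, not the map, so values stay aligned with the schema columns,
    // and drop the internal _highlight column from the datarow.
    return SCHEMA.stream()
        .filter(col -> !"_highlight".equals(col))
        .map(col -> tuple.getOrDefault(col, null))
        .toArray();
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = new LinkedHashMap<>();
    tuple.put("lastname", "Doe"); // inserted before firstname on purpose
    tuple.put("firstname", "Jane");
    // Entry-set order would yield [Doe, Jane]; schema order yields [Jane, Doe].
    System.out.println(java.util.Arrays.toString(toRow(tuple)));
  }
}
```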
Security
Escape special characters in highlight query terms

The terms are interpolated directly into a query string using "\"" + term + "\""
without any escaping. A term containing a double-quote character (e.g., he said
"hello") will break the query string syntax and may cause an OpenSearch request
failure or unexpected behavior. The terms should be escaped before being embedded in
the query string.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [511-522]

 } else {
   // Highlight specific terms across all fields
   String queryStr =
       highlightArgs.stream()
-          .map(term -> "\"" + term + "\"")
+          .map(term -> "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
           .collect(java.util.stream.Collectors.joining(" OR "));
   org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field =
       new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("*")
           .highlightQuery(
               org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr)
                   .defaultField("*"));
   hb.field(field);
 }
Suggestion importance[1-10]: 6

__

Why: Terms containing double-quote characters are interpolated directly into the query string without escaping, which could break the query string syntax. However, since the grammar restricts highlightArg to string literals parsed by the PPL parser, the risk of unescaped quotes reaching this code is limited in practice.

Low
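The narrower fix suggested above neutralizes only the two characters that can break phrase quoting; as a standalone sketch of that replace chain:

```java
public class QuoteEscapeDemo {
  // Escape backslashes first, then double quotes, so the term is safe inside
  // "..." phrase quoting. Order matters: reversing the two replaces would
  // double-escape the backslashes introduced for the quotes.
  static String quoteTerm(String term) {
    return "\"" + term.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  public static void main(String[] args) {
    // he said "hello"  ->  "he said \"hello\""
    System.out.println(quoteTerm("he said \"hello\""));
  }
}
```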
Suggestions up to commit 5891541
Category | Suggestion | Impact
Possible issue
Prevent duplicate highlight column in schema

The _highlight column is unconditionally added to the schema even if it already
exists (e.g., if pushDownHighlight is called twice). This can cause duplicate column
errors. Add a guard to only add the column when it is not already present, similar
to how LogicalHighlight.create handles it.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/CalciteLogicalIndexScan.java [90-99]

 public RelNode pushDownHighlight(List<String> highlightArgs) {
     RelDataTypeFactory.Builder schemaBuilder = getCluster().getTypeFactory().builder();
     schemaBuilder.addAll(getRowType().getFieldList());
-    schemaBuilder.add(
-        HighlightExpression.HIGHLIGHT_FIELD,
-        getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    if (!getRowType().getFieldNames().contains(HighlightExpression.HIGHLIGHT_FIELD)) {
+        schemaBuilder.add(
+            HighlightExpression.HIGHLIGHT_FIELD,
+            getCluster().getTypeFactory().createSqlType(SqlTypeName.ANY));
+    }
     CalciteLogicalIndexScan newScan = copyWithNewSchema(schemaBuilder.build());
     newScan.getPushDownContext().setHighlightArgs(highlightArgs);
     return newScan;
 }
Suggestion importance[1-10]: 6

__

Why: The pushDownHighlight method unconditionally adds _highlight to the schema without checking if it already exists, which could cause duplicate column errors if called multiple times. The fix mirrors the guard already present in LogicalHighlight.create.

Low
Security
Escape highlight terms to prevent query injection

The term values are interpolated directly into a query string using "\"" + term + "\""
without any escaping. If a term contains special query string characters (e.g.,
", \, :, *), this could produce a malformed or injected query. Terms should be
escaped before being embedded in the query string.

opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java [511-522]

 } else {
     // Highlight specific terms across all fields
     String queryStr =
         highlightArgs.stream()
-            .map(term -> "\"" + term + "\"")
+            .map(term -> "\"" + org.apache.lucene.queryparser.classic.QueryParser.escape(term) + "\"")
             .collect(java.util.stream.Collectors.joining(" OR "));
     org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field field =
         new org.opensearch.search.fetch.subphase.highlight.HighlightBuilder.Field("*")
             .highlightQuery(
                 org.opensearch.index.query.QueryBuilders.queryStringQuery(queryStr)
                     .defaultField("*"));
     hb.field(field);
 }
Suggestion importance[1-10]: 6

__

Why: User-supplied terms are interpolated directly into a Lucene query string without escaping, which could produce malformed queries or allow injection of special characters. Using QueryParser.escape would mitigate this risk.

Low
General
Guard against null highlight field values

Calling entry.getValue().collectionValue() will throw a NullPointerException or
ExpressionEvaluationException if the highlight field value for a particular key is
not a collection type. Add a null/type check before calling collectionValue() to
handle unexpected value shapes gracefully.

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java [110-124]

 map(
     tuple -> {
       ExprValue hl = tuple.get(HIGHLIGHT_FIELD);
       if (hl == null || hl.isMissing() || hl.isNull()) {
         return null;
       }
       Map<String, Object> hlMap = new LinkedHashMap<>();
       for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
+        ExprValue val = entry.getValue();
+        if (val == null || val.isMissing() || val.isNull()) continue;
         hlMap.put(
             entry.getKey(),
-            entry.getValue().collectionValue().stream()
+            val.collectionValue().stream()
                 .map(ExprValue::stringValue)
                 .collect(Collectors.toList()));
       }
       return (Map<String, Object>) hlMap;
     })
Suggestion importance[1-10]: 4

__

Why: Calling collectionValue() on a highlight entry without checking if the value is null or missing could throw an exception for unexpected value shapes. Adding a null/missing check improves robustness.

Low
Verify child plan attachment for highlight command

The visitHighlightCommand returns a new Highlight(highlightArgs) without attaching
it to the child plan. All other command visitors (e.g., visitTrendlineCommand)
return the node and rely on the caller to call .attach(child), but AstDSL.highlight
calls .attach(input). Since visitHighlightCommand is called by the AST builder
framework which chains nodes, the returned Highlight node will have a null child,
causing a NullPointerException when the plan is traversed. The child should be
attached here or the framework must handle it — verify the attachment mechanism is
consistent with other commands.

ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java [1290-1305]

 public UnresolvedPlan visitHighlightCommand(OpenSearchPPLParser.HighlightCommandContext ctx) {
     List<String> highlightArgs =
         ctx.highlightArg().stream()
             .map(
                 arg -> {
                   if (arg.STAR() != null) {
                     return "*";
                   } else if (arg.stringLiteral() != null) {
                     return StringUtils.unquoteText(arg.stringLiteral().getText());
                   } else {
                     return arg.fieldExpression().getText();
                   }
                 })
             .collect(Collectors.toList());
+    // Return without attach; the framework (visitPipeCommands) attaches the child.
+    // Ensure this is consistent with how other commands are handled.
     return new Highlight(highlightArgs);
 }
Suggestion importance[1-10]: 2

__

Why: The suggestion asks to verify that the framework attaches the child, but the improved_code is essentially identical to the existing_code with only a comment added. Other command visitors in the same file follow the same pattern of returning without attaching, so this is consistent and not a real bug.

Low

@RyanL1997 RyanL1997 added PPL Piped processing language v3.6.0 Issues targeting release v3.6.0 feature labels Mar 14, 2026
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0dacc0.

PathLineSeverityDescription
opensearch/src/main/java/org/opensearch/sql/opensearch/storage/scan/AbstractCalciteIndexScan.java519lowUser-supplied highlight terms are wrapped in double quotes but not escaped before being passed to queryStringQuery(). A term containing a literal '"' character could partially break out of the phrase context and alter the Lucene query structure. Because PPL users are already authenticated and can issue arbitrary search queries, the practical exploitability is limited, but proper escaping of the term string (e.g., via QueryParser.escape) would be safer.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Copy link
Contributor

Persistent review updated to latest commit f0dacc0

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit ab37b5b

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit a7c44d8

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 51356cd

Signed-off-by: Jialiang Liang <jiallian@amazon.com>
@github-actions
Copy link
Contributor

Persistent review updated to latest commit b2d6afe


Labels

feature PPL Piped processing language v3.6.0 Issues targeting release v3.6.0
