Skip to content

[REL-1242132] update fork#12

Merged
716 commits merged into
mainfrom
REL-1242132_update
Jan 14, 2026
Merged

[REL-1242132] update fork#12
716 commits merged into
mainfrom
REL-1242132_update

Conversation

@amlynczak-rel

Copy link
Copy Markdown

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Closes #NNN.

Rationale for this change

Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.

What changes are included in this PR?

There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

friendlymatthew and others added 30 commits October 16, 2025 11:12
…apache#8625)

# Which issue does this PR close?

- Closes apache#8606

# Rationale for this change

Mapping from an iterator of `Variant`s into a `VariantArray` currently
requires a lot of ceremony. This PR abstracts over this.
# Which issue does this PR close?

- Closes apache#8538 .

# Rationale for this change

# What changes are included in this PR?

Removes redundant `get_type_name` and leverages the `Display` trait of
`DataType`

# Are these changes tested?

Yes, 3 tests added

# Are there any user-facing changes?

No

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
# Which issue does this PR close?

- Closes apache#8030

# Rationale for this change

There is no agreement yet on the what to do with extension types when
modifying `Field` data types (apache#8350), but I think we agree on removing
the experimental warning from the `extension` module. Opening this PR so
we can include it in 57.0.0.

# What changes are included in this PR?

The warning removal from apache#8350.

# Are these changes tested?

Not required.

# Are there any user-facing changes?

Yes, extension types no longer marked experimental.
## Which issue does this PR close?

- Closes apache#8351

## Rationale for this change


## What changes are included in this PR?

This PR refactors and improves the `Display` formatting for `DataType`
by:

- **Improving Union type display** - now shows field information with
parentheses for clarity: `Union(Sparse, 0: ('a': Int32), 1: ('b':
nullable Utf8))`
- **Improving RunEndEncoded type display** - now properly formats both
run_ends and values fields: `RunEndEncoded('run_ends': UInt32, 'values':
nullable Int32)`


## Are these changes tested?
Yes

## Are there any user-facing changes?
…nt Thrift serialization (apache#8634)

# Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.

- Part of apache#5853.

# Rationale for this change

Structures that are to be Thrift serialized as fields in another struct
must implement the `WriteThriftField` trait. In most instances the
implementation of that trait is trivial and adds unnecessary verbosity
to the code.

# What changes are included in this PR?

Add a new `write_thrift_field` macro that generates this boilerplate
code.

# Are these changes tested?

Covered by existing tests

# Are there any user-facing changes?
No
…e#8621)

# Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.

- Related to apache#7835


# Rationale for this change


While testing the arrow 57 upgrade in DataFusion I found a few things
that need to be fixed
in parquet-rs.

- apache/datafusion#17888

One was that the method `ArrowWriter::into_serialized_writer` was
deprecated, (which I know I suggested in
apache#8389 🤦 ). However, when
testing it turns out that the constructor of `SerializedFileWriter` does
a lot of work (like creating the parquet schema from the arrow schema
and messing with metadata)
https://github.com/apache/arrow-rs/blob/c4f0fc12199df696620c73d62523c8eef5743bf2/parquet/src/arrow/arrow_writer/mod.rs#L230-L263

Creating a `RowGroupWriterFactory` directly would involve a bunch of
code duplication

# What changes are included in this PR?

So let's not deprecate this method for now and instead add some
additional docs to guide people to the right lace


# Are these changes tested?
I tested manually upstream

# Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
# Which issue does this PR close?

N/A

# Rationale for this change

not testing the correct length

# What changes are included in this PR?

remove * 8 as the length of the buffer is in bytes already

# Are these changes tested?
created tests to make sure they are failing before AND created tests
that make sure that ceil is used for future changes

# Are there any user-facing changes?

Nope
…tArray` (apache#8627)

# Which issue does this PR close?

- Closes apache#8610

# Rationale for this change

Since the fields of `VariantArray` impl `PartialEq`, this PR simply
derives `PartialEq` for `VariantArray`
out.

Based off of apache#8625
…trings for error messages (apache#8636)

# Which issue does this PR close?

This is a small performance improvement for the thrift remodeling

- Part of apache#5853.

# Rationale for this change

Some of the often-called methods in the thrift protocol implementation
created `ParquetError` instances with a string message that had to be
allocated and formatted. This formatting code and probably also some
drop glue bloats these otherwise small methods and prevented inlining.

# What changes are included in this PR?

Introduce a separate error type `ThriftProtocolError` that is smaller
than `ParquetError` and does not contain any allocated data. The
`ReadThrift` trait is not changed, since its custom implementations
actually require the more expressive `ParquetError`.

# Are these changes tested?

The success path is covered by existing tests. Testing the error paths
would require crafting some actually malformed files, or using a fuzzer.

# Are there any user-facing changes?

The `ThriftProtocolError` is crate-internal so there should be no api
changes. Some error messages might differ slightly.
# Which issue does this PR close?

N/A

# Rationale for this change

I have a PR to improve zip perf for scalar but I don't see any
benchmarks for it:
- apache#8653 

# What changes are included in this PR?

created zip benchmarks for scalar and non scalar with different masks 

# Are these changes tested?
N/A

# Are there any user-facing changes?

Nope
…#8594)

# Which issue does this PR close?

- Partial fix for apache/datafusion#17857

# Rationale for this change

These changes add a safer version of `append_value` in `ByteViewBuilder`
that handles panics called `try_append_value`. Datafusions will consume
the API and handle the Result coming back from the function.

# What changes are included in this PR?

# Are these changes tested?

The method is already covered by existing tests.

# Are there any user-facing changes?

No breaking changes, as the original `append_value` method hasn't
changed.

---------

Co-authored-by: Raz Luvaton <16746759+rluvaton@users.noreply.github.com>
Co-authored-by: Matthew Kim <38759997+friendlymatthew@users.noreply.github.com>
Co-authored-by: Jörn Horstmann <git@jhorstmann.net>
…pache#8642)

# Which issue does this PR close?

- Closes apache#8639

# Rationale for this change

add write_batch_size config and change compression to use parquet
Compression

# What changes are included in this PR?

add write_batch_size config and change compression to use parquet
Compression

# Are these changes tested?

I've try these command by myself.

# Are there any user-facing changes?

Yeah

1. zstd level previously is default 1, not change to 3
2. str zStd might not pass
# Which issue does this PR close?

- Part of apache#7835 

# Rationale for this change

Let's get the code to the people

# What changes are included in this PR?

Update version number and CHANGELOG. See rendered version here:
https://github.com/alamb/arrow-rs/blob/alamb/prepare_57.0.0/CHANGELOG.md

# Are these changes tested?

N/A

# Are there any user-facing changes?

No
…he#8635)

# Which issue does this PR close?
- Closes apache#8218

# Rationale for this change

# What changes are included in this PR?
Split the `builder.rs` to three files, `/builder/list`,
`/builder/object` and `builder/metadata`

# Are these changes tested?
Yes
# Are there any user-facing changes?
No
# Which issue does this PR close?

Related to: 
- apache#7456
- apache#8565

# Rationale for this change

Improve the performance in ParquetRecoredBatchReader, especially when
the `rowselector` is short.
- By changing a hash map to a enum array

# What changes are included in this PR?
For `parquet/src/arrow/array_reader/cached_array_reader.rs`, update the
hash function

# Are these changes tested?
The hashmaps are already covered by existing tests.
Also tested by manual read parquets.

# Are there any user-facing changes?
No

# Performance results in arrow_reader_row_filter.rs
on my 3950X
Benchmark | Change | Verdict
-- | -- | --
int64 == 9999 / all_columns / async | 🟢 -1.61% | Improved
int64 == 9999 / all_columns / sync | 🔴 +1.56% | Regressed
int64 == 9999 / exclude_filter_column / async | 🟢 -1.11% | Improved
int64 == 9999 / exclude_filter_column / sync | ⚪ -0.97% | Within noise
float64 > 99.0 / all_columns / async | 🟢 -6.25% | Improved
float64 > 99.0 / all_columns / sync | 🟢 -11.24% | Improved
float64 > 99.0 / exclude_filter_column / async | 🟢 -11.10% | Improved
float64 > 99.0 / exclude_filter_column / sync | 🟢 -3.31% | Improved
ts ≥ 9000 / all_columns / async | 🔴 +2.77% | Regressed
ts ≥ 9000 / all_columns / sync | ⚪ -0.06% | Within noise
ts ≥ 9000 / exclude_filter_column / async | 🟢 -2.54% | Improved
ts ≥ 9000 / exclude_filter_column / sync | ⚪ +0.28% | Within noise
int64 > 90 / all_columns / async | 🟢 -14.68% | Improved
int64 > 90 / all_columns / sync | 🟢 -21.00% | Improved
int64 > 90 / exclude_filter_column / async | 🟢 -17.66% | Improved
int64 > 90 / exclude_filter_column / sync | 🟢 -14.53% | Improved
float64 ≤ 99.0 / all_columns / async | 🟢 -9.20% | Improved
float64 ≤ 99.0 / all_columns / sync | 🟢 -11.07% | Improved
float64 ≤ 99.0 / exclude_filter_column / async | 🟢 -10.01% | Improved
float64 ≤ 99.0 / exclude_filter_column / sync | 🟢 -11.80% | Improved
ts < 9000 / all_columns / async | 🟢 -3.43% | Improved
ts < 9000 / all_columns / sync | 🟢 -6.23% | Improved
ts < 9000 / exclude_filter_column / async | 🟢 -4.00% | Improved
ts < 9000 / exclude_filter_column / sync | 🟢 -3.91% | Improved
utf8View <> '' / all_columns / async | 🟢 -16.56% | Improved
utf8View <> '' / all_columns / sync | 🟢 -12.10% | Improved
utf8View <> '' / exclude_filter_column / async | 🟢 -13.00% | Improved
utf8View <> '' / exclude_filter_column / sync | 🟢 -17.29% | Improved
float64 > 99.0 AND ts ≥ 9000 / all_columns / async | 🔴 +3.51% |
Regressed
float64 > 99.0 AND ts ≥ 9000 / all_columns / sync | 🟢 -2.19% | Improved
float64 > 99.0 AND ts ≥ 9000 / exclude_filter_column / async | 🟢 -2.63%
| Improved
float64 > 99.0 AND ts ≥ 9000 / exclude_filter_column / sync | 🟢 -2.72% |
Improved
…e#8656)

# Which issue does this PR close?

N/A

# Rationale for this change

doing `OffsetBuffer::from_lengths(std::iter::repeat_n(size,
value.len()));` does not utilize SIMD (I explain further if you want)
See [GodBolt Link](https://godbolt.org/z/PTsfvfjqx)

Extracted from:
- apache#8653

After this and the pr below is merged will improve the datafusion scalar
to array to use this and make it really really fast:
- apache#8658

# What changes are included in this PR?

added new function

# Are these changes tested?

yes

# Are there any user-facing changes?

yes
…<repeat>));` with `OffsetBuffer::from_repeated_length(<val>, <repeat>);` (apache#8669)

# Which issue does this PR close?

N/A

# Rationale for this change

Use the dedicated faster function for creating offset with the same
length

# What changes are included in this PR?

replace
```rust
OffsetBuffer::from_lengths(std::iter::repeat_n(<val>, <repeat>));
```

with
```rust
OffsetBuffer::from_repeated_length(<val>, <repeat>);
```

# Are these changes tested?

Existing tests

# Are there any user-facing changes?

Nope

----

Related to:
- apache#8656
# Which issue does this PR close?

N/A

# Rationale for this change

I want to repeat the same value multiple times in a very fast way
which will be used in:
- apache#8653

After this and the pr below is merged will improve the datafusion scalar
to array to use this and make it really really fast:
- apache#8656 

# What changes are included in this PR?

Created a function in `MutableBuffer` to repeat a slice a number of
times in a logarithmic way to reduce memcopy calls

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, and added docs

-------

Extracted from:
- apache#8653

Benchmark results on local machine

| Slice Length | Repetitions (n) | repeat_slice_n_times |
extend_from_slice loop | Speedup |

|--------------|-----------------|----------------------|------------------------|---------|
| 3 | 3 | 47.092 ns | 41.910 ns | 0.89x |
| 3 | 64 | 63.548 ns | 222.29 ns | 3.50x |
| 3 | 1024 | 105.57 ns | 3.031 µs | 28.7x |
| 3 | 8192 | 405.71 ns | 24.170 µs | 59.6x |
| 20 | 3 | 48.437 ns | 46.437 ns | 0.96x |
| 20 | 64 | 74.993 ns | 319.04 ns | 4.25x |
| 20 | 1024 | 350.94 ns | 4.437 µs | 12.6x |
| 20 | 8192 | 2.440 µs | 35.524 µs | 14.6x |
| 100 | 3 | 50.369 ns | 47.568 ns | 0.94x |
| 100 | 64 | 119.70 ns | 165.37 ns | 1.38x |
| 100 | 1024 | 1.734 µs | 2.623 µs | 1.51x |
| 100 | 8192 | 10.615 µs | 19.750 µs | 1.86x |

these are the results:

<details>
<summary>Result</summary>


```
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=3
                        time:   [46.719 ns 47.092 ns 47.453 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=3
                        time:   [41.833 ns 41.910 ns 41.996 ns]
Found 11 outliers among 100 measurements (11.00%)
  9 (9.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=64
                        time:   [62.935 ns 63.548 ns 64.183 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=64
                        time:   [221.75 ns 222.29 ns 222.86 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=1024
                        time:   [105.15 ns 105.57 ns 106.01 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=1024
                        time:   [3.0240 µs 3.0308 µs 3.0395 µs]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=3 n=8192
                        time:   [401.57 ns 405.71 ns 409.94 ns]
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=3 n=8192
                        time:   [24.124 µs 24.170 µs 24.222 µs]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=3
                        time:   [48.287 ns 48.437 ns 48.606 ns]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=3
                        time:   [46.289 ns 46.437 ns 46.611 ns]
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=64
                        time:   [74.625 ns 74.993 ns 75.395 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=64
                        time:   [318.20 ns 319.04 ns 319.98 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=1024
                        time:   [346.66 ns 350.94 ns 355.17 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=1024
                        time:   [4.4251 µs 4.4369 µs 4.4506 µs]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=20 n=8192
                        time:   [2.4336 µs 2.4401 µs 2.4465 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=20 n=8192
                        time:   [35.466 µs 35.524 µs 35.589 µs]
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=3
                        time:   [50.209 ns 50.369 ns 50.530 ns]
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=3
                        time:   [47.439 ns 47.568 ns 47.701 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=64
                        time:   [117.77 ns 119.70 ns 122.00 ns]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=64
                        time:   [164.88 ns 165.37 ns 166.07 ns]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=1024
                        time:   [1.7278 µs 1.7335 µs 1.7398 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=1024
                        time:   [2.6176 µs 2.6232 µs 2.6305 µs]
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) high mild
  4 (4.00%) high severe
MutableBuffer repeat slice/repeat_slice_n_times/slice_len=100 n=8192
                        time:   [10.583 µs 10.615 µs 10.649 µs]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
MutableBuffer repeat slice/extend_from_slice loop/slice_len=100 n=8192
                        time:   [19.471 µs 19.750 µs 20.185 µs]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe
```

</details>
# Which issue does this PR close?

- Followup of apache#8552.

# Rationale for this change

Code cleanup and optimization

# What changes are included in this PR?

Addressed the post-comments in apache#8552 and refactor/optimize the method
`rescale_decimal`

# Are these changes tested?

Covered by existing tests

# Are there any user-facing changes?

No
…che#8638)

# Which issue does this PR close?
- Closes apache#8637.
- Support Variant to Arrow for Null/Time/Decimlal{4,8,16} 

# What changes are included in this PR?
- Add logic in
`typed_value_to_variant`/`PrimitiveVariantToArrowRowBuilder` for
`Null/Time/Decimal{4,8,16}`
- Implement `PrimitiveFromVariant` for `Time64MicrosecondType`
- Add tests to cover the added logic
- 
# Are these changes tested?
Added some tests

# Are there any user-facing changes?

No
# Which issue does this PR close?

Add utf8-view support for json key

# Rationale for this change

Add utf8-view support for json key

# What changes are included in this PR?

Add utf8-view support for json key

# Are these changes tested?

* [x] TODO

# Are there any user-facing changes?

No
# Which issue does this PR close?

- Closes apache#8674

# Rationale for this change

Add json encoding for binary view

# What changes are included in this PR?

Add BinaryViewEncoder

# Are these changes tested?

* [x] TODO

# Are there any user-facing changes?

No
# Which issue does this PR close?

- part of apache#7835 

# Rationale for this change

We added a new crate so let's add that to the instructions too
# What changes are included in this PR?


# Are these changes tested?

# Are there any user-facing changes?
…che#8706)

# Which issue does this PR close?

N/A

# Rationale for this change

It is not obvious that the thrift macros produce public enums only (e.g.
see
apache#8680 (comment)).
This should be made clear in the documentation.

# What changes are included in this PR?

Add said clarification.

# Are these changes tested?

Documentation only, so no tests required.

# Are there any user-facing changes?

No, only changes to private documentation
# Which issue does this PR close?

- Closes apache#8691

# Rationale for this change

The `README.md` file in `arrow-avro` instructs users to install version
56. This is invalid and should be changed to version 57.

# What changes are included in this PR?

Updated the `README.md` file to reference version 57.

# Are these changes tested?

N/A since this a small `README.md` file change.

# Are there any user-facing changes?

The `README.md` file in `arrow-avro` now instructs users to install
version 57.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
# Which issue does this PR close?

chore: add test case of `RowSelection::trim`

# Rationale for this change

Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.

# What changes are included in this PR?

There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.

# Are these changes tested?

We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?

# Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
…DING_SLOTS (apache#8663)

# Which issue does this PR close?

- Closes [apache#8662]

# Rationale for this change

Related to apache#8607
We need to know how many encoding are support to create a decoder slot.

# What changes are included in this PR?
Update the `thrift_enum` to know the fields count of enum `Encoding`,
and the value is passed to `EncodingMask` And the `ENCODING_SLOTS`


# Are these changes tested?
1. Originally I think add a UT can prevent failure after the new
encoding are introduced, then I realized the counts are already
transferred, the UT is not required, the original tests can already
cover the code.

# Are there any user-facing changes?
No
- A follow up from apache#8625

# Rationale for this change

While working on a separate task, I noticed `create_test_variant_array`
was redundant. Since `VariantArray` can already be constructed directly
from an iterator of Variants, this PR removes the now-unnecessary test
helper.
# Which issue does this PR close?

- Closes apache#8685.

# What changes are included in this PR?

In the implementation of `RowConverter::from_binary`, the `BinaryArray`
is broken into parts and an attempt is made to convert the data buffer
into `Vec` at no copying cost with `Buffer::into_vec`. Only if this
fails, the data is copied out for a newly allocated `Vec`.

# Are these changes tested?

Passes existing tests using `RowConverter::from_binary`, which all
convert a non-shared buffer taking advantage of the optimization.
Another test is added to cover the copying path.

# Are there any user-facing changes?

No
# Which issue does this PR close?

- Closes apache#8692.

# Rationale for this change

Explained in issue.

# What changes are included in this PR?

- Adds `FilterPredicate::filter_record_batch`
- Adapts the free function `filter_record_batch` to use the new function
- Uses `new_unchecked` to create the filtered result. The rationale for
this is identical to apache#8583

# Are these changes tested?

Covered by existing tests for `filter_record_batch`

# Are there any user-facing changes?

No

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
etseidl and others added 24 commits January 10, 2026 07:25
…quet metadata (apache#9008)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Part of apache#5853.

# Rationale for this change



Add ability to skip the decoding of more types of statistics contained
in the Parquet column metadata. While this currently doesn't have a huge
impact on decode time, it can reduce the amount of memory used by the
`ParquetMetaData`.

# What changes are included in this PR?

Adds more options and tests for those options. Also adds size statistics
to the metadata bench.

# Are these changes tested?

Yes

# Are there any user-facing changes?
Only adds new options, no breaking changes.
Use the suggested Arc<[Buffer]> storage for ViewArray storage instead of
an owned Vec<Buffer> so that the slice clone does not allocate.

# Which issue does this PR close?

- Closes apache#6408 .

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…mns (apache#9081)

# Which issue does this PR close?

N/A

# Rationale for this change

I noticed that converting around 50 columns the conversion become very
slow, so adding a benchmark as I'm optimizing those parts

# What changes are included in this PR?

added new benchmark for `row_format` that convert 50 columns arrays

# Are these changes tested?

N/A

# Are there any user-facing changes?

Nope

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…he#9112)

# Which issue does this PR close?

- Part of apache#9061

# Rationale for this change

@tustvold says
apache#9058 (comment)

> My 2 cents is it would be better to move the codepaths relying on
ArrayData over to using the typed arrays directly, this should not only
cut down on allocations but unnecessary validation and dispatch
overheads.

Let's make sure the documentation also encodes this wisdom 

# What changes are included in this PR?

Improve the documentation for `make_array` to include the information
about `make_array`

# Are these changes tested?

By CI
# Are there any user-facing changes?

Only docs and an example, no functional changes
…#8916)

# Which issue does this PR close?

None.

# Rationale for this change

1. Keep consistent with `FlightClient`.
2. We need inner error (for example `tonic::Status`) to know if we
should retry, but ArrowError stores string of `tonic::Status`.

# What changes are included in this PR?

1. Let flight sql client returns `FlightError` .
2. Cleanup code.

# Are these changes tested?

CI

# Are there any user-facing changes?

Yes.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
# Which issue does this PR close?

- Closes apache#9135 

# Rationale for this change
The code was calling `.reserve(batch_size)` which reserves space to at
least `batch_size` additional elements
(https://doc.rust-lang.org/std/vec/struct.Vec.html#method.reserve).

This also improves performance a bit:

```
filter: primitive, 8192, nulls: 0, selectivity: 0.001
                        time:   [59.509 ms 59.660 ms 59.856 ms]
                        change: [−3.0781% −2.7917% −2.4795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.01
                        time:   [6.0072 ms 6.0226 ms 6.0428 ms]
                        change: [−8.7042% −7.1161% −6.0455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0, selectivity: 0.1: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.5s, enable flat sampling, or reduce sample count to 50.
filter: primitive, 8192, nulls: 0, selectivity: 0.1
                        time:   [1.8664 ms 1.8709 ms 1.8772 ms]
                        change: [−15.187% −14.905% −14.632%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

filter: primitive, 8192, nulls: 0, selectivity: 0.8
                        time:   [2.5191 ms 2.5444 ms 2.5717 ms]
                        change: [−13.064% −11.414% −10.003%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe

Benchmarking filter: primitive, 8192, nulls: 0.1, selectivity: 0.001: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.7s, or reduce sample count to 60.
filter: primitive, 8192, nulls: 0.1, selectivity: 0.001
                        time:   [76.422 ms 76.671 ms 76.973 ms]
                        change: [−5.5096% −4.0229% −2.8048%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.01
                        time:   [10.197 ms 10.228 ms 10.262 ms]
                        change: [−3.6627% −3.0569% −2.4919%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

filter: primitive, 8192, nulls: 0.1, selectivity: 0.1
                        time:   [4.6635 ms 4.6750 ms 4.6915 ms]
                        change: [−9.4939% −8.5908% −7.8383%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

filter: primitive, 8192, nulls: 0.1, selectivity: 0.8
                        time:   [4.7777 ms 4.8115 ms 4.8467 ms]
                        change: [−9.9226% −9.1384% −8.3813%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

# What changes are included in this PR?
Changes it to use `self.views.reserve(self.batch_size -
self.views.len())` to avoid allocating more than necessary (i.e. 2x the
amount).

# Are these changes tested?

Existing tests

# Are there any user-facing changes?
…ache#9091)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes #NNN.

# Rationale for this change

Improve JSON binary decoding performance by avoiding per-value
allocations and enabling direct hex decoding into builders.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

# What changes are included in this PR?

Optimized binary hex decoding paths to reduce allocations and improve
throughput.


```
decode_binary_hex_json  time:   [3.6780 ms 3.6953 ms 3.7150 ms]
                        change: [−61.051% −60.818% −60.565%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

decode_fixed_binary_hex_json
                        time:   [4.0404 ms 4.1400 ms 4.2901 ms]
                        change: [−56.149% −55.040% −53.330%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  7 (7.00%) high mild
  12 (12.00%) high severe

decode_binary_view_hex_json
                        time:   [4.3731 ms 4.4242 ms 4.4767 ms]
                        change: [−53.305% −52.771% −52.239%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
```

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

# Are these changes tested?
Yes

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

# Are there any user-facing changes?

No

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- related to apache#9085
- related to apache#9022

# Rationale for this change

the `nullif` kernel has an optimization for counting nulls as part of
applying nullif, and I did not know if that made a difference (turns out
it does).

I made a benchmark to be able to measure this difference

# What changes are included in this PR?

Add a `nullif_kernel` benchmark

# Are these changes tested?

I ran them manually
# Are there any user-facing changes?

No new benchmark
…pache#9129)

# Which issue does this PR close?

- part of apache#9128

# Rationale for this change

While studying / profiling the Parquet reader I have noticed several
places where unecessary allocations are happening

# What changes are included in this PR?

Avoid ArrayData allocation in `PrimitiveArray::reinterpret_cast`

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
# Which issue does this PR close?


- Part of apache#9136

# Rationale for this change

API for apache#8951 , as part of a 2-3x
speedup for filtering primitive types.

# What changes are included in this PR?

Adds `BooleanBufferBuilder::extend`, a fast way to extend the buffer
from an iterator.

# Are these changes tested?

Yes, new tests

# Are there any user-facing changes?
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes #NNN.

# Rationale for this change

One of my new years resolutions is to try and encourage / build the
arrow-rs community more. Part of doing so is to lower the barrier for
new contributors with clear communication and documentation.

Thus I would like to improve the main readme

# What changes are included in this PR?

Update the README. See rendered preview here:
https://github.com/alamb/arrow-rs/blob/alamb/improved_comms_docs/README.md

1. Move community section to the top of the README file
2. Clarify that most communication happens on github

<img width="1348" height="775" alt="Screenshot 2026-01-09 at 9 33 06 AM"
src="https://github.com/user-attachments/assets/fc3f6041-478d-4170-bb40-3956764a9b8d"
/>


# Are these changes tested?

CI
# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…pache#9134)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes apache#9133

# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Allow casting null arrays to all types we support. We missed (large)
list view, run end encoded and union.

# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Refactor from null arm to accept all target types to enable casting to
large list view, list view, run end encoded and union types.

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Added tests.

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->

No.
… return type (apache#9139)

### Which issue does this PR close?

Closes apache#9105

### Rationale for this change

The documentation for `VariantObject::get` previously described
`Result`-style
semantics (`Ok(None)` / `Err`), but the method actually returns
`Option<Variant>`. This mismatch could confuse users of the API.

### What changes are included in this PR?

- Update the documentation for `VariantObject::get` to correctly
describe its
  `Option` return type.
- No functional or behavioral changes are included.
…ray` (apache#9114)

# Which issue does this PR close?


- Part of apache#9061
- broken out of apache#9058

# Rationale for this change

The current implementation of `make_array` for StructArray and
GenericByteViewArray clones `ArrayData` which allocates a new Vec. This
is unnecessary given that `make_array` is passed an owned ArrayData


# What changes are included in this PR?

1. Add a new API to ArrayData to break it down into parts (`into_parts`)
2. Use that API to avoid cloning while constructing StructArray and
GenericByteViewArray

# Are these changes tested?

Yes by CI

# Are there any user-facing changes?
A few fewer allocations when creating arrays

---------

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes apache#9147 .

# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

# What changes are included in this PR?

Check issue

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

# Are these changes tested?

It's a test

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

# Are there any user-facing changes?

No

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Closes apache#9131

This PR updates the Apache Software Foundation copyright year
in `arrow-rs/NOTICE.txt`, as discussed in the release verification
follow-up. No third-party entries are modified.
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Part of apache#9018 .

# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
To consider offset in slicing of RunArray.

# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
1. Considered offset in slicing of RunArray.
2. Enhanced RunArray slice API.
# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
3. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
yes
# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Yes, extended API to access RunArray slices directly than getting it
from index.
…e#9146)

# Which issue does this PR close?

No known issue, but this confirms that nested dicts do indeed work in
Map and Union arrays. It was brought up here:
apache#9126 (comment)

# Rationale for this change

Ensure that IPC roundtripping nested dicts works in Map and Union
arrays.

# What changes are included in this PR?

Unit tests testing the functionality.

# Are these changes tested?

The whole PR consists of tests only.

# Are there any user-facing changes?

No

@alamb @Jefffrey
# Which issue does this PR close?

N/A

# Rationale for this change

allow to reserve so we can avoid reallocating

# What changes are included in this PR?

added `reserve` function to `Rows` + tests

# Are these changes tested?

yes

# Are there any user-facing changes?

yes
…ow conversion (apache#9080)

# Which issue does this PR close?

N/A

# Rationale for this change

Making the row length calculation faster which result in faster row
conversion

# What changes are included in this PR?

1. Instead of iterating over the bytes and getting the length from the
byte slice, we use the offsets directly, this is faster as it saves us
going to the buffer
2. Added new API for `GenericByteViewArray` (explained below)

# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, added `lengths` function to `GenericByteViewArray` to get an
iterator over the lengths of the items in the array

-----

Related to:
- apache#9078 
- apache#9079

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
…rayData` (apache#9122)

# Which issue does this PR close?
- related to apache#9061
- Part of apache#9128


# Rationale for this change

- similarly to apache#9120

Creating Arrays via ArrayData / `make_array` has overhead (at least 2
Vec allocations) compared to simply creating the arrays directly

# What changes are included in this PR?

Update the parquet reader to create `PrimitiveArray`s directly

# Are these changes tested?
By CI

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
# Which issue does this PR close?

- Closes apache#9096.

# Rationale for this change

The RowFilter API does exist and can evaluate predicates during
evaluation, but it has no examples.

# What changes are included in this PR?

- Added a rustdoc example and blog link to
`ParquetRecordBatchReaderBuilder::with_row_filter`.
- Added a running example in `parquet/examples/read_with_row_filter.rs`

# Are these changes tested?

Yes 
```
cargo run -p parquet --example read_with_row_filter
cargo test -p parquet --doc
```

# Are there any user-facing changes?

Yes, doc only. No API changes.
…ta` (1% improvement) (apache#9120)

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- part of apache#9061
- Part of apache#9128

# Rationale for this change

I noticed on apache#9061 that there
is non trivial overhead to create struct arrays. I am trying to improve
`make_array` in parallel, but @tustvold had an even better idea in
apache#9058 (comment)

> My 2 cents is it would be better to move the codepaths relying on
ArrayData over to using the typed arrays directly, this should not only
cut down on allocations but unnecessary validation and dispatch
overheads.

# What changes are included in this PR?

Update the parquet `StructArray` reader (used for the top level
RecordBatch) to directly construct StructArray rather than using
ArrayData

# Are these changes tested?
By existing CI

Benchmarks show a small repeatable improvement of a few percent. For
example

```
arrow_reader_clickbench/async/Q10    1.00     12.7±0.35ms        ? ?/sec    1.02     12.9±0.44ms        ? ?/sec
```

I am pretty sure this is because the click bench dataset has more than
100 columns. Creating such a struct array requires cloning 100
`ArrayData` (one for each child) which each has a Vec<Buffers>. So this
saves (at least) 100 allocations per batch

# Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
Comment on lines +39 to +53
name: Test
runs-on: ubuntu-latest
container:
image: amd64/rust
steps:
- uses: actions/checkout@v6
with:
submodules: true
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
- name: Test parquet-geospatial
run: cargo test -p parquet-geospatial

# test compilation
linux-features:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 5 months ago

In general, the fix is to explicitly set GITHUB_TOKEN permissions in the workflow to the minimal scope required. For this workflow, the jobs only check out code and run Rust commands; they do not write to the repository or interact with issues, PRs, or releases. Therefore contents: read is sufficient. The cleanest fix without changing any functionality is to add a root-level permissions: block that applies to all jobs.

Concretely, in .github/workflows/parquet-geospatial.yml, add a permissions: block near the top of the workflow (for example, after the name: or after concurrency:) with contents: read. This will apply to all three jobs (linux-test, linux-features, clippy) because none of them define their own permissions. No imports or additional methods are needed, as this is purely a YAML configuration change.

Suggested changeset 1
.github/workflows/parquet-geospatial.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/parquet-geospatial.yml b/.github/workflows/parquet-geospatial.yml
--- a/.github/workflows/parquet-geospatial.yml
+++ b/.github/workflows/parquet-geospatial.yml
@@ -23,6 +23,9 @@
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
   cancel-in-progress: true
 
+permissions:
+  contents: read
+
 # trigger for all PRs that touch certain files and changes to main
 on:
   push:
EOF
@@ -23,6 +23,9 @@
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true

permissions:
contents: read

# trigger for all PRs that touch certain files and changes to main
on:
push:
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +54 to +67
name: Check Compilation
runs-on: ubuntu-latest
container:
image: amd64/rust
steps:
- uses: actions/checkout@v6
with:
submodules: true
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
- name: Check compilation (parquet-geospatial)
run: cargo check -p parquet-geospatial

clippy:

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 5 months ago

In general, the fix is to add an explicit permissions block to the workflow (at the top level or for each job) that grants only the minimal scopes required. Here, all jobs simply check out code and run Rust commands; they only need read access to repository contents. The single best fix is to add a workflow-level permissions block with contents: read, which will apply to all jobs that don’t override it. This should be added near the top of .github/workflows/parquet-geospatial.yml, after the name: declaration and before concurrency: or on:. No additional imports or external dependencies are required, and existing behavior will remain unchanged because none of the jobs used elevated permissions.

Suggested changeset 1
.github/workflows/parquet-geospatial.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/parquet-geospatial.yml b/.github/workflows/parquet-geospatial.yml
--- a/.github/workflows/parquet-geospatial.yml
+++ b/.github/workflows/parquet-geospatial.yml
@@ -19,6 +19,9 @@
 # tests for parquet-geospatial crate
 name: "parquet-geospatial"
 
+permissions:
+  contents: read
+
 concurrency:
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
   cancel-in-progress: true
EOF
@@ -19,6 +19,9 @@
# tests for parquet-geospatial crate
name: "parquet-geospatial"

permissions:
contents: read

concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +68 to +79
name: Clippy
runs-on: ubuntu-latest
container:
image: amd64/rust
steps:
- uses: actions/checkout@v6
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
- name: Setup Clippy
run: rustup component add clippy
- name: Run clippy (parquet-geospatial)
run: cargo clippy -p parquet-geospatial --all-targets --all-features -- -D warnings

Check warning

Code scanning / CodeQL

Workflow does not contain permissions Medium

Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {contents: read}

Copilot Autofix

AI 5 months ago

To fix the problem, explicitly declare a minimal permissions block so the workflow does not inherit potentially broad repository defaults. Since all three jobs (linux-test, linux-features, clippy) only check out code and run cargo commands, they only need read access to the repository contents.

The best way without changing functionality is to add a workflow-level permissions section (so it applies to all jobs) near the top of the file, setting contents: read. This preserves all current behavior (the jobs still run exactly the same steps) but ensures the automatically provided GITHUB_TOKEN cannot write to the repository or perform other unnecessary actions. Concretely, in .github/workflows/parquet-geospatial.yml, add:

permissions:
  contents: read

after the name: "parquet-geospatial" line (or before concurrency: / on:). No imports or additional methods are needed because this is a YAML configuration change only.

Suggested changeset 1
.github/workflows/parquet-geospatial.yml

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/.github/workflows/parquet-geospatial.yml b/.github/workflows/parquet-geospatial.yml
--- a/.github/workflows/parquet-geospatial.yml
+++ b/.github/workflows/parquet-geospatial.yml
@@ -19,6 +19,9 @@
 # tests for parquet-geospatial crate
 name: "parquet-geospatial"
 
+permissions:
+  contents: read
+
 concurrency:
   group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
   cancel-in-progress: true
EOF
@@ -19,6 +19,9 @@
# tests for parquet-geospatial crate
name: "parquet-geospatial"

permissions:
contents: read

concurrency:
group: ${{ github.repository }}-${{ github.head_ref || github.sha }}-${{ github.workflow }}
cancel-in-progress: true
Copilot is powered by AI and may make mistakes. Always verify output.
@amlynczak-rel amlynczak-rel closed this pull request by merging all changes into main in 7194aae Jan 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.