Improve performance of first/last aggregates #3022

@andygrove

What is the problem the feature request solves?

Comet is slower than Spark for first and last aggregates: in the benchmarks below, Comet (Scan + Exec) runs at 0.6X to 0.8X of Spark's speed. The behavior is also inconsistent with Spark (see #1646 and #1630). We should probably implement a custom version in Comet that matches Spark's behavior, and then try to optimize it.
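For reference, the semantics a custom implementation would need to reproduce can be sketched in a few lines. This is only an illustrative model of first/last with and without ignore-nulls over a single ordered group, not Comet's or Spark's actual code; the function names and the use of Python's None for SQL NULL are assumptions made for the example.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first(values: Iterable[Optional[T]], ignore_nulls: bool = False) -> Optional[T]:
    """Return the first value seen; skip None only when ignore_nulls is set."""
    for v in values:
        if v is not None or not ignore_nulls:
            return v
    # Empty input, or every value was None with ignore_nulls=True.
    return None


def last(values: Iterable[Optional[T]], ignore_nulls: bool = False) -> Optional[T]:
    """Return the last value seen; skip None only when ignore_nulls is set."""
    result: Optional[T] = None
    for v in values:
        if v is not None or not ignore_nulls:
            result = v
    return result


if __name__ == "__main__":
    rows = [None, 1, 2, None]
    print(first(rows), first(rows, ignore_nulls=True))
    print(last(rows), last(rows, ignore_nulls=True))
```

Both functions depend on the order in which rows arrive, so any comparison against Spark has to hold the input order fixed.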

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
first:                                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                48             56          13         21.8          45.8       1.0X
Comet (Scan)                                         58             60           2         18.1          55.2       0.8X
Comet (Scan + Exec)                                  61             65           3         17.1          58.6       0.8X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
first_ignore_nulls:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                47             50           3         22.4          44.7       1.0X
Comet (Scan)                                         50             52           1         21.0          47.6       0.9X
Comet (Scan + Exec)                                  60             64           2         17.5          57.2       0.8X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
last:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                45             48           3         23.1          43.2       1.0X
Comet (Scan)                                         47             54           9         22.4          44.7       1.0X
Comet (Scan + Exec)                                  76             79           4         13.7          72.7       0.6X

OpenJDK 64-Bit Server VM 17.0.17+10-Ubuntu-122.04 on Linux 6.8.0-90-generic
AMD Ryzen 9 7950X3D 16-Core Processor
last_ignore_nulls:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
Spark                                                46             48           1         22.8          43.9       1.0X
Comet (Scan)                                         58             64           5         18.1          55.1       0.8X
Comet (Scan + Exec)                                  76             84          10         13.7          72.9       0.6X
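
The Relative column in these tables is consistent with dividing Spark's best time by each case's best time and rounding to one decimal place. The sketch below recomputes it from the Best Time(ms) values above under that assumption; the formula is inferred from the numbers, not taken from the benchmark harness, and the dictionary layout and function name are made up for the example.

```python
# Best Time(ms) values copied from the benchmark output above.
BEST_MS = {
    "first": {"Spark": 48, "Comet (Scan)": 58, "Comet (Scan + Exec)": 61},
    "first_ignore_nulls": {"Spark": 47, "Comet (Scan)": 50, "Comet (Scan + Exec)": 60},
    "last": {"Spark": 45, "Comet (Scan)": 47, "Comet (Scan + Exec)": 76},
    "last_ignore_nulls": {"Spark": 46, "Comet (Scan)": 58, "Comet (Scan + Exec)": 76},
}


def relative(baseline_ms: float, case_ms: float) -> float:
    """Assumed formula: baseline best time / case best time, one decimal."""
    return round(baseline_ms / case_ms, 1)


# Relative speed of every case against the Spark baseline of its benchmark.
RELATIVE = {
    bench: {case: relative(times["Spark"], ms) for case, ms in times.items()}
    for bench, times in BEST_MS.items()
}

if __name__ == "__main__":
    for bench, cases in RELATIVE.items():
        for case, rel in cases.items():
            print(f"{bench:20s} {case:22s} {rel:.1f}X")
```

Under this reading, the full native path (Scan + Exec) is the slowest configuration in all four benchmarks, with last and last_ignore_nulls showing the largest gap.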

Describe the potential solution

No response

Additional context

No response
