4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,10 @@ Please mark backwards incompatible changes with an exclamation mark at the start

## [Unreleased]

### Added
- The `Aggregations::Composite` class and the `Aggregations#composite` method.
  They make it possible to use Elasticsearch's `composite` aggregations.

## [28.2.0] - 2025-05-30

### Added
@@ -329,6 +329,54 @@ The code above would produce the following query:
``QueryBuilder::Script`` objects here. Their use will produce unintended
results.

composite
---------

This is a multi-bucket aggregation that builds its buckets from compound
values: one bucket for every existing combination of values from the specified
sources. Currently, Jay API supports only one type of source: ``terms``.

Using the ``terms`` source, it is possible to create a bucket for each existing
combination of values from a set of fields.

Detailed information on how to use this type of aggregation can be found in
`Elasticsearch's documentation on the Composite aggregation`_.

Code example:

.. code-block:: ruby

   query_builder = JayAPI::Elasticsearch::QueryBuilder.new
   query_builder.aggregations.composite('products_by_brand') do |sources|
     sources.terms('product', field: 'product.name')
     sources.terms('brand', field: 'brand.name')
   end

This would generate the following query:

.. code-block:: json

   {
     "query": {
       "match_all": {}
     },
     "aggs": {
       "products_by_brand": {
         "composite": {
           "sources": [
             { "product": { "terms": { "field": "product.name" } } },
             { "brand": { "terms": { "field": "brand.name" } } }
           ]
         }
       }
     }
   }

This will create one bucket for each existing combination of ``product.name``
and ``brand.name`` in the index. Each bucket reports only how many documents
(``doc_count``) exist for its combination. Nested aggregations can be added
to get other information out of the documents in each bucket.
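
As an illustration of what such buckets look like on the way back, the sketch
below walks over a response fragment with plain Ruby hashes. It assumes the
usual shape of a composite response (a ``buckets`` list whose entries carry a
``key`` hash with one entry per source, plus a ``doc_count``). The sample data
is invented and ``bucket_summary`` is a helper made up for this example; no
Jay API call is involved.

```ruby
# Hypothetical response fragment (invented values), shaped like the
# "aggregations" part of Elasticsearch's reply to the query above.
SAMPLE_RESPONSE = {
  'aggregations' => {
    'products_by_brand' => {
      'after_key' => { 'product' => 'Phone', 'brand' => 'Globex' },
      'buckets' => [
        { 'key' => { 'product' => 'Laptop', 'brand' => 'Acme' }, 'doc_count' => 12 },
        { 'key' => { 'product' => 'Phone', 'brand' => 'Acme' }, 'doc_count' => 7 },
        { 'key' => { 'product' => 'Phone', 'brand' => 'Globex' }, 'doc_count' => 3 }
      ]
    }
  }
}.freeze

# Returns one "product / brand: count" line per bucket.
# bucket_summary is a helper made up for this sketch, not a Jay API method.
def bucket_summary(response, aggregation_name)
  buckets = response.dig('aggregations', aggregation_name, 'buckets') || []
  buckets.map do |bucket|
    key = bucket['key']
    "#{key['product']} / #{key['brand']}: #{bucket['doc_count']}"
  end
end

puts bucket_summary(SAMPLE_RESPONSE, 'products_by_brand')
```

The keys inside each bucket's ``key`` hash are the source names given to
``sources.terms`` (``product`` and ``brand`` in the example above).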

.. _`Elasticsearch's documentation on the Terms aggregation`: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html
.. _`Elasticsearch's documentation on the Avg aggregation`: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-avg-aggregation.html
.. _`Elasticsearch's documentation on the Sum aggregation`: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-sum-aggregation.html
@@ -337,4 +385,5 @@ The code above would produce the following query:
.. _`Elasticsearch's documentation on the Cardinality aggregation`: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html
.. _`Elasticsearch's documentation on the Date Histogram aggregation`: https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-datehistogram-aggregation
.. _`Elasticsearch's documentation on the Scripted Metric aggregation`: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html
.. _`Elasticsearch's documentation on the Composite aggregation`: https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation
.. _`Painless`: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-painless.html
11 changes: 11 additions & 0 deletions lib/jay_api/elasticsearch/query_builder/aggregations.rb
@@ -5,6 +5,7 @@
require_relative 'aggregations/aggregation'
require_relative 'aggregations/avg'
require_relative 'aggregations/cardinality'
require_relative 'aggregations/composite'
require_relative 'aggregations/date_histogram'
require_relative 'aggregations/filter'
require_relative 'aggregations/scripted_metric'
@@ -19,7 +20,7 @@
  module Elasticsearch
    class QueryBuilder
      # The list of aggregations to be included in an Elasticsearch query.
      class Aggregations

Check warning on line 23 in lib/jay_api/elasticsearch/query_builder/aggregations.rb
GitHub Actions / lint: [rubocop] reported by reviewdog 🐶 lib/jay_api/elasticsearch/query_builder/aggregations.rb:23:7: C: Metrics/ClassLength: Class has too many lines. [101/100]

        extend Forwardable

        def_delegators :aggregations, :any?, :none?
@@ -121,6 +122,16 @@
          )
        end

        # Adds a +composite+ aggregation. For more information about the parameters:
        # @see JayAPI::Elasticsearch::QueryBuilder::Aggregations::Composite#initialize
        def composite(name, size: nil, &block)
          add(
            ::JayAPI::Elasticsearch::QueryBuilder::Aggregations::Composite.new(
              name, size: size, &block
            )
          )
        end

        # Returns a Hash with the correct format for the current list of
        # aggregations. For example:
        #
77 changes: 77 additions & 0 deletions lib/jay_api/elasticsearch/query_builder/aggregations/composite.rb
@@ -0,0 +1,77 @@
# frozen_string_literal: true

require 'active_support'
require 'active_support/core_ext/string/inflections'

require_relative 'aggregation'
require_relative 'sources/sources'
require_relative 'errors/aggregations_error'

module JayAPI
  module Elasticsearch
    class QueryBuilder
      class Aggregations
        # Represents a Composite aggregation in Elasticsearch. For more
        # information about this type of aggregation:
        # @see https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation
        class Composite < ::JayAPI::Elasticsearch::QueryBuilder::Aggregations::Aggregation
          attr_reader :size

          # @param [String] name The name of the composite aggregation.
          # @param [Integer] size The number of composite buckets to return.
          # @yieldparam [JayAPI::Elasticsearch::QueryBuilder::Aggregations::Sources::Sources]
          # The collection of sources for the composite aggregation. This
          # should be used by the caller to add sources to the composite
          # aggregation.
          # @raise [JayAPI::Elasticsearch::QueryBuilder::Aggregations::Errors::AggregationsError]
          # If the method is called without a block.
          def initialize(name, size: nil, &block)
            unless block
              raise(::JayAPI::Elasticsearch::QueryBuilder::Aggregations::Errors::AggregationsError,
                    "The #{self.class.name.demodulize} aggregation must be initialized with a block")
            end

            super(name)
            @size = size
            block.call(sources)
          end

          # @return [self] A copy of the receiver. Sources and nested
          # aggregations are also cloned.
          def clone
            # rubocop:disable Lint/EmptyBlock (The sources will be assigned later)
            copy = self.class.new(name, size: size) {}
            # rubocop:enable Lint/EmptyBlock

            copy.aggregations = aggregations.clone
            copy.sources = sources.clone
            copy
          end

          # @return [Hash] The Hash representation of the +Aggregation+.
          # Properly formatted for Elasticsearch.
          def to_h
            super do
              {
                composite: {
                  sources: sources.to_a,
                  size: size
                }.compact
              }
            end
          end
Comment on lines +41 to +62

Similar question for the classes introduced in the 2 previous commits:

I'm seeing that these #clone, #to_h, #to_a methods are not shown to be used in the docs you add in the coming commits. Are these methods then only used to test these classes? No problem for me, but this means that around 50% of the logic in the class is used only to test the class itself, which is a bit strange IMO (although, as I said, no problem).

Collaborator Author

These methods are normally not used directly, hence they are not listed in the documentation.

#to_h and #to_a are used when you call #to_query on the QueryBuilder instance, there, all these classes are converted into a Hash that is then sent to Elasticsearch. So, basically these are serialization methods. #to_h and #to_a methods are called recursively until everything is either a hash or an array.

Regarding #clone, it is used in two cases:

  1. When you clone a QueryBuilder object, all the internal classes are cloned recursively. This prevents changes to the clone from affecting the original QueryBuilder object.
  2. When you merge two QueryBuilder objects together, the resulting merged object also gets a clone of all the underlying classes. Again, this prevents changes to the merged object from affecting the original queries.
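
To make the two roles concrete, here is a minimal, self-contained sketch of the same pattern. `SketchSource` and `SketchSources` are hypothetical stand-ins written for this comment, not the Jay API classes; they only mirror the `#to_h` / `#to_a` / `#clone` shape used in this PR.

```ruby
# Hypothetical stand-in for a single value source.
class SketchSource
  attr_reader :name, :field

  def initialize(name, field:)
    @name = name
    @field = field
  end

  # Serializes the source into a plain Hash.
  def to_h
    { name => { terms: { field: field } } }
  end

  def clone
    self.class.new(name, field: field)
  end
end

# Hypothetical stand-in for the collection of sources.
class SketchSources
  def add(name, field:)
    items.push(SketchSource.new(name, field: field))
    self
  end

  # Serialization recurses: the collection becomes an Array of Hashes.
  def to_a
    items.map(&:to_h)
  end

  # Deep copy: every element is cloned, so the copy owns its own Array.
  def clone
    self.class.new.tap { |copy| copy.items.concat(items.map(&:clone)) }
  end

  protected

  def items
    @items ||= []
  end
end

original = SketchSources.new.add('product', field: 'product.name')
copy = original.clone
copy.add('brand', field: 'brand.name')

puts original.to_a.length # => 1, the change to the copy did not leak back
puts copy.to_a.length     # => 2
```

With a shallow copy, both objects would share one Array and the second `add` would show up in `original` as well, which is the situation the recursive `#clone` avoids.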


          protected

          attr_writer :sources # Used by the #clone method

          # @return [JayAPI::Elasticsearch::QueryBuilder::Aggregations::Sources::Sources]
          # The collection of sources of the composite aggregation.
          def sources
            @sources ||= ::JayAPI::Elasticsearch::QueryBuilder::Aggregations::Sources::Sources.new
          end
        end
      end
    end
  end
end
@@ -0,0 +1,46 @@
# frozen_string_literal: true

require_relative 'terms'

module JayAPI
  module Elasticsearch
    class QueryBuilder
      class Aggregations
        module Sources
          # Represents the collection of sources for a Composite aggregation in
          # Elasticsearch
          class Sources
Comment on lines +9 to +12

Sources::Sources? I don't know enough about this codebase to suggest a name change for the class, but is there no other name that is different from the module name the class is in?

Collaborator Author

At first I used the class itself as the namespace, so instead of module Sources I had just class Sources, similar to what we have for QueryBuilder. However, I ran into an issue because, unlike QueryBuilder, Sources is a superclass.

So this is why in the end I decided to just add a namespace module. I have no issue with changing the name, but I couldn't think of a better option. Do you have a suggestion?

            # Adds a +terms+ source to the collection.
            # For information about the parameters:
            # @see Sources::Terms#initialize
            def terms(name, **kw_args)
              sources.push(::JayAPI::Elasticsearch::QueryBuilder::Aggregations::Sources::Terms.new(name, **kw_args))
            end

            # @return [Array<Hash>] Array representation of the collection of
            # sources of the composite aggregation.
            def to_a
              sources.map(&:to_h)
            end

            # @return [self] A copy of the receiver (not a shallow clone, it
            # clones all of the elements of the collection).
            def clone
              self.class.new.tap do |copy|
                copy.sources.concat(sources.map(&:clone))
              end
            end

            protected

            # @return [Array<Object>] The array used to hold the collection of
            # sources.
            def sources
              @sources ||= []
            end
          end
        end
      end
    end
  end
end
@@ -0,0 +1,56 @@
# frozen_string_literal: true

module JayAPI
  module Elasticsearch
    class QueryBuilder
      class Aggregations
        module Sources
          # Represents a "Terms" value source for a Composite aggregation.
          # More information about this type of value source can be found here:
          # https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation#_terms
          class Terms
            attr_reader :name, :field, :order, :missing_bucket, :missing_order

            # @param [String] name The name for the value source.
            # @param [String] field The field for the value source.
            # @param [String, nil] order The order in which the values coming
            # from this data source should be sorted. This can be either
            # "asc" or "desc".
            # @param [Boolean] missing_bucket Whether or not a bucket for the
            # documents without a value in +field+ should be created.
            # @param [String] missing_order Where to put the bucket for the
            # documents with a missing value, either "first" or "last".
            def initialize(name, field:, order: nil, missing_bucket: nil, missing_order: nil)
              @name = name
              @field = field
              @order = order
              @missing_bucket = missing_bucket
              @missing_order = missing_order

No check on this string (same thing for order actually)? What if I pass "in the middle"? :D

Collaborator Author

We usually don't do any checking in these classes. If you pass "in the middle" you'll get an error when you try to use the produced query to search the index. I think it is better not to add too much validation here, since these classes are facades for Elasticsearch.

Let me know if you disagree.

If you think the Elasticsearch error is explicative enough then sure 👍
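
For callers who do want an early failure, a check could live on the calling side without changing the facade. A minimal sketch under that assumption: `checked_terms_options` and the two constants are hypothetical helpers, not part of Jay API, and the allowed values are simply the ones named in the doc comments above ("asc"/"desc" and "first"/"last").

```ruby
# Hypothetical caller-side helper; Jay API itself performs no such check.
ALLOWED_ORDERS = %w[asc desc].freeze
ALLOWED_MISSING_ORDERS = %w[first last].freeze

# Returns the given options (without nil entries), or raises ArgumentError
# when +order+ or +missing_order+ holds a value outside the allowed lists.
def checked_terms_options(order: nil, missing_order: nil, **rest)
  unless order.nil? || ALLOWED_ORDERS.include?(order)
    raise ArgumentError,
          "order must be one of #{ALLOWED_ORDERS.join(', ')}, got #{order.inspect}"
  end

  unless missing_order.nil? || ALLOWED_MISSING_ORDERS.include?(missing_order)
    raise ArgumentError,
          "missing_order must be one of #{ALLOWED_MISSING_ORDERS.join(', ')}, got #{missing_order.inspect}"
  end

  { order: order, missing_order: missing_order, **rest }.compact
end
```

It would then be used as `sources.terms('product', **checked_terms_options(field: 'product.name', order: 'asc'))`, so a value such as "in the middle" fails before any query is sent.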

            end

            # @return [self] A copy of the receiver.
            def clone
              self.class.new(
                name, field: field, order: order, missing_bucket: missing_bucket, missing_order: missing_order
              )
            end

            # @return [Hash] The hash representation for the value source.
            def to_h
              {
                name => {
                  terms: {
                    field: field,
                    order: order,
                    missing_bucket: missing_bucket,
                    missing_order: missing_order
                  }.compact
                }
              }
            end
          end
        end
      end
    end
  end
end
@@ -0,0 +1,85 @@
# frozen_string_literal: true

require 'jay_api/elasticsearch/query_builder/aggregations/composite'

RSpec.describe JayAPI::Elasticsearch::QueryBuilder::Aggregations::Composite do
  subject(:composite) do
    described_class.new('products_by_brand', **constructor_params) do |sources|
      sources.terms('product', field: 'product.name', order: 'asc')
      sources.terms('brand', field: 'brand.name')
    end
  end

  let(:constructor_params) { {} }

  describe '#to_h' do
    subject(:method_call) { composite.to_h }

    let(:expected_hash) do
      {
        'products_by_brand' => {
          composite: {
            sources: [
              { 'product' => { terms: { field: 'product.name', order: 'asc' } } },
              { 'brand' => { terms: { field: 'brand.name' } } }
            ]
          }
        }
      }
    end

    it 'returns the expected Hash' do
      expect(method_call).to eq(expected_hash)
    end

    context "when a 'size' has been specified" do
      let(:constructor_params) { { size: 10 } }

      let(:expected_hash) do
        {
          'products_by_brand' => {
            composite: {
              sources: [
                { 'product' => { terms: { field: 'product.name', order: 'asc' } } },
                { 'brand' => { terms: { field: 'brand.name' } } }
              ],
              size: 10
            }
          }
        }
      end

      it 'returns the expected Hash' do
        expect(method_call).to eq(expected_hash)
      end
    end

    context 'with nested aggregations' do
      before do
        composite.aggs do |aggs|
          aggs.avg('avg_price', field: 'product.price')
        end
      end

      let(:expected_hash) do
        {
          'products_by_brand' => {
            composite: {
              sources: [
                { 'product' => { terms: { field: 'product.name', order: 'asc' } } },
                { 'brand' => { terms: { field: 'brand.name' } } }
              ]
            },
            aggs: {
              'avg_price' => { avg: { field: 'product.price' } }
            }
          }
        }
      end

      it 'returns the expected Hash' do
        expect(method_call).to eq(expected_hash)
      end
    end
  end
end