Skip to content

[JAY-625] Add Elasticsearch's composite aggregation#32

Merged
sergio-bobillier merged 5 commits into
masterfrom
sb-composite-aggregation
Jun 5, 2025
Merged

[JAY-625] Add Elasticsearch's composite aggregation#32
sergio-bobillier merged 5 commits into
masterfrom
sb-composite-aggregation

Conversation

@sergio-bobillier

Copy link
Copy Markdown
Collaborator

This type of aggregation allows documents to be aggregated using composite values made out of all the existing combinations of values coming from the specified sources. For the task that triggered the creation of these changes only the terms source is needed, hence that is the only source being added here.

When used with the terms source, the composite aggregation creates a bucket for each existing combination of values in the given fields.

The class represents a "Terms" value source for Elasticsearch's
composite aggregations. How this value source works is explained here:

https://www.elastic.co/docs/reference/aggregations/search-aggregations-bucket-composite-aggregation#_terms

The class will be used in an upcoming commit by the, yet to be added,
composite aggregation class.
@field = field
@order = order
@missing_bucket = missing_bucket
@missing_order = missing_order

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No check on this string (same thing for order actually)? What if I pass "in the middle"? :D

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not usually doing any checking in these classes. If you pass "in the middle" you'll get an error when you try to use the produced query to search the index. I think it is better not to add too much validation here since these classes are facades for Elasticsearch.

Let me know if you disagree.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you think the Elasticsearch error is explicative enough then sure 👍

Comment on lines +9 to +12
module Sources
# Represents the collection of sources for a Composite aggregation in
# Elasticsearch
class Sources

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sources::Sources? I don't know enough about this codebase to suggest a name change for the class, but is there no other name that is different from the module name the class is in?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first I had the class also as namespace, so instead of module Sources I had just class Sources, similar to what we have for QueryBuilder, however I ran into an issue because unlike QueryBuilder, Sources is a superclass.

So this is why at the end I decided to just add a namespace module. I have no issue with changing the name, but.... I couldn't think of a better option, do you have a suggestion?

Comment thread lib/jay_api/elasticsearch/query_builder/aggregations/sources/sources.rb Outdated
Comment thread CHANGELOG.md Outdated
Comment on lines +41 to +62
def clone
# rubocop:disable Lint/EmptyBlock (The sources will be assigned later)
copy = self.class.new(name, size: size) {}
# rubocop:enable Lint/EmptyBlock

copy.aggregations = aggregations.clone
copy.sources = sources.clone
copy
end

# @return [Hash] The Hash representation of the +Aggregation+.
# Properly formatted for Elasticsearch.
def to_h
super do
{
composite: {
sources: sources.to_a,
size: size
}.compact
}
end
end

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question for the classes introduced in the 2 previous commits:

I'm seeing that these #clone, #to_h, #to_a methods are not shown to be used in the docs you add in the comings commits. Are these methods then only used to test these classes? No problem for me, but this makes it so that around 50% of the logic in the class is used only to test the class itself, which is a bit strange IMO (although as I said, no problem)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods are normally not used directly, hence they are not listed in the documentation.

#to_h and #to_a are used when you call #to_query on the QueryBuilder instance, there, all these classes are converted into a Hash that is then sent to Elasticsearch. So, basically these are serialization methods. #to_h and #to_a methods are called recursively until everything is either a hash or an array.

Regarding #clone they are used in two cases

  1. When you clone a QueryBuilder object, then all the internal classes are cloned recursively, this prevents changes to clone from affecting the original QueryBuilder object.
  2. When you merge two QueryBuilder objects together. The resulting merged object also gets a clone of all the underlying classes. Again, this prevents changes to the merged object from affecting the original queries.

This class holds a collection of sources for a Composite aggregation for
Elasticsearch.

The class will be used in an upcoming commit by said aggregation to hold
its sources and to allow the user to add sources to a composite
aggregation by passing a block to its constructor.

For the moment the class can only handle a Terms source. Additional
sources might be added in the future.
The class model a composite Elasticsearch aggregation.

Tests for the proper handling of nested aggregations were added as
integration tests because creating said tests as unit tests would
require a great deal of mocking logic.
The method allows the user to add composite aggregations. It relies on
the Aggregations::Composite class for that.
Adds documentation on how to use the newly added composite aggregation.
@sergio-bobillier sergio-bobillier merged commit 1ce2b88 into master Jun 5, 2025
3 checks passed
@sergio-bobillier sergio-bobillier deleted the sb-composite-aggregation branch June 5, 2025 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants