Skip to content
102 changes: 102 additions & 0 deletions docs/adr/xxxx-source-collected_by-query-parameter.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
---
status: "draft"
---
# Ability to query for Sources without a collected_by value

## Context and Problem Statement

Currently at the Source level there is no filtering on the `collected_by` field.
When vendors are building systems which list content stored in TAMS then they are only looking for the Sources at the top of the content tree.
In many cases this is the multi-Source, however due to single essence (eg audio only) workflows then this is not a reliable method of finding the top level content.
To achieve this they must go through every result and look to see if the `collected_by` field is missing as this then indicates that it is the top of a Source collection.
This process can also throw any form of sensible pagination in a client application as when requesting content from the TAMS API it is not known how many results will be retained or discarded.

This ADR is to look at moving this behaviour from the client side into the API.
This has the benefit of not only making it simpler for any system looking for top level content in the system, but also makes the result more predicable as the number of rows requested from the API will result in the available rows up to that limit.

While the focus of this ADR is on the Source level of the TAMS API, there are sufficient similarities between the data structures of Sources and Flows that it is worth considering whether this should be applied to both levels.

## Considered Options

Source Options:
* Option 1: Follow tags example and use two query parameters for value and exists
* Option 2: Follow tags example but only implement the exists query parameter
* Option 3: Follow accept_get_urls example and use an empty query parameter
* Option 4: Do nothing

Flows:
* Option A: Apply the same querying capabilities at both Source and Flow level
* Option B: Only apply the new query capabilities at the Source level

## Decision Outcome

Recommendation: Option 3 since this already exists within the API as part of the webhooks, however this requires clarification of empty query parameters which is recommended to match the proposed behaviours.

Additionally it is recommended to add the same capability at the Flow level and again match this with the same parameter as used in the webhooks and clarify the empty query parameter behaviour.

### Implementation

See the API specification changes in PR [#xxx](https://github.com/bbc/tams/pull/xxx).

## Pros and Cons of the Options

### Option 1: Follow tags example and use a collected_by_exists query parameter

When querying tags there are two fields available - you query on a tag name to get the value, or to query on a second parameter (`tag_exists`) to find out if that tag exists.
Following this model would mean adding a two parameters - a new query parameter `collected_by_exists` with a boolean value and the `collected_by` to be able to search on one or more ID's.

For the boolean field setting to false this would return all Sources where there is no `collected_by` value present which is the required behaviour.
Setting this to true would return only Sources that have a `collected_by` value.
This option currently has no uses cases, however using this model and a boolean logically requires this behaviour.

| Behaviour | Query Parameter |
| --------- | --------------- |
| Source is not collected | `collected_by_exists=false` |
| Source is collected by specific Source | `collected_by=a46c49f1-4764-42b9-9f91-f267a58903c4` |
| Source is collected by any of the specified Sources | `collected_by=a46c49f1-4764-42b9-9f91-f267a58903c4,f3ac31bb-c66b-43f8-8362-c82e76f0d28d` |
| Source is collected by any Source | `collected_by_exists=true` |
| Note that this combination is nonsense | `collected_by_exists=false&collected_by=a46c49f1-4764-42b9-9f91-f267a58903c4` |

### Option 2: Follow tags example but only implement the exists query parameter

Since the use cases driving this requirement only focus on whether the item is `collected_by` another Source, then option 2 takes just the exists query parameter element from option 1 and implements that.
In this option it is not possible to query on the ID in the collection, only that it exists.

| Behaviour | Query Parameter |
| --------- | --------------- |
| Source is not collected | `collected_by_exists=false` |
| Source is collected by any Source | `collected_by_exists=true` |

* Good: Keeps it simple and only implements the required behaviour
* Good: Does not force the requirement to also filter on the ID values
* Bad: no currently known use case for when the parameter is set to true

### Option 3: Follow accept_get_urls example and use an empty query parameter

On the `/segments` end point it is possible to specify a query parameter of `accept_get_urls`.
This field can be a comma separated list of labels, however it is allowed to be empty which means that the `get_urls` are omitted in the result

Similarly in the webhooks it is possible to specify the `source_collected_by_ids` as an array of Source IDs.
In the current it is not entirely clear what the behaviour of the query parameter if the value is left empty.
However it is proposed to keep the same query parameter for comsistency then update the webhooks to clarify what should happen for an empty query parameter.

| Behaviour | Query Parameter |
| --------- | --------------- |
| Source is not collected | `source_collected_by_ids=` |
| Source is collected by specific Source | `source_collected_by_ids=a46c49f1-4764-42b9-9f91-f267a58903c4` |
| Source is collected by any of the specified Sources | `source_collected_by_ids=a46c49f1-4764-42b9-9f91-f267a58903c4,f3ac31bb-c66b-43f8-8362-c82e76f0d28d` |
| Source is collected by any Source | Not possible |

* Good: If filtering by ID is required then this keeps it to a single field
* Good: Uses an existing field which is already present in the API and provided consistency between different areas
* Bad: Requires the filtering by ID to be a logical query parameter

### Flow level implications

At the Flow level this is currently some ability to query and update Flow collections in the API.
This behaviour focuses on starting from a known Flow to which the collection is applied.
There is no equivalent endpoints at the Source level.

If chosen to also apply at the Flow level then need to consider is `flows_collected_by_id=a46c49f1-4764-42b9-9f91-f267a58903c4` too similar to `/flows/a46c49f1-4764-42b9-9f91-f267a58903c4/flow_collection`?
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The distinction kinda comes up for the role discussion in #173 - the query string parameter lets you search for more things, for example /flows?flows_collected_by_id=<uuid>&format=urn:x-nmos:format:video&codec=video/h264 which feels useful, while the flows/<uuid>/flow_collection is the canonical resource itself - you can edit it there

The first gives you all the metadata about the Flows, where the later only gives you the ID's and roles

Loading