Skip to content

Array / aggregate column filters on landing pages

Date: 2026-06-03

Problem / motivation

findLandingPages(sourceType: ITEMS, filterGroups: [...]) silently ignored filters on tagIds. Tags on an item are exposed as array_agg(tags.id) / array_agg(tags.tag_name) aggregate columns (see ItemRepository.paginated_stmt), and the generic filter builder (apply_filters) had no concept of array columns:

  1. Filtering on tagId (singular) was dropped because the type-hint lookup found no such field on ItemLandingPageResponse.
  2. Filtering on tagIds (plural) blew up in safe_parse_valueissubclass(list[uuid.UUID], bool) raises, so the filter was skipped with a warning.
  3. Even if parsed, the builder emitted array_agg(...) IN (...), which is invalid SQL: an aggregate can only appear in HAVING, and an array must be matched with array operators, not scalar IN.

A second, pre-existing bug was uncovered while tracing this: total_rows runs SELECT count(*) ... GROUP BY ... on grouped landing statements and returns the first group's row count (via scalar()) instead of the number of groups, so the items landing page total is already wrong today.

In scope

  • Detect array-typed columns in the generic filter builder and translate membership operators (IN/EQ/NOT_IN/NE) into PostgreSQL array-overlap (&&) conditions with the value flattened (no [[...]] double-wrap) and cast to the right element type.
  • Route conditions on aggregate columns (e.g. array_agg) to HAVING; route plain array table-columns to WHERE.
  • Support OR-combining an aggregate (HAVING) condition with a row-level (WHERE) condition inside a single filterGroup by promoting the row-level legs into HAVING via bool_or(...).
  • Make total_rows correct for grouped statements (subquery count), so the filtered total is meaningful.
  • Unit tests for the new behaviour and the total_rows fix.

Out of scope

  • New GraphQL types/fields. This is a runtime filter change only — no schema delta.
  • Ordering (orderBy) on array/aggregate columns.

Data model changes

None. No alembic migration (no schema change).

GraphQL surface

No new queries, mutations, types, inputs, or enums. findLandingPages / findLandingPageKpis behave the same; the only change is that filterGroups / filters targeting array columns (tagIds, tags) now produce SQL instead of being silently dropped.

Example that now works:

query {
  findLandingPages(
    sourceType: ITEMS
    filterGroups: [{
      operator: OR
      filters: [
        { columnName: "tagIds", operator: IN, value: "5736b624-...-49ee1" }
        { columnName: "tagIds", operator: IN, value: "d4ebe55c-...-dc79c" }
      ]
    }]
  ) {
    total
    records { ... on ItemLandingPage { id tags tagIds } }
  }
}

emits roughly:

... GROUP BY ...
HAVING (array_agg(tags.id) FILTER (WHERE tags.id IS NOT NULL) && ARRAY[:a]::uuid[]
     OR array_agg(tags.id) FILTER (WHERE tags.id IS NOT NULL) && ARRAY[:b]::uuid[])

RBAC

No new Path / Resource. No backfill.

Background tasks / cron

None.

Frontend contract

  • Filtering items (and any future array-backed landing column) by tag is now done with columnName: "tagIds" (UUIDs) or columnName: "tags" (tag names), operator IN.
  • Each value is matched as "the row has any of these" (array overlap). To match several tags as OR, send one filter per value inside an OR group (as in the example), or pass a JSON array of values to a single filter.
  • NOT_IN / NE mean "the row has none of these" and include rows with no tags.
  • You can mix a tagIds filter and a row-level column filter with operator: OR in the same group (e.g. number = 'X' OR tagIds IN [...]) — the row-level leg is promoted into HAVING via bool_or(...) so it behaves as a true OR.
  • total on grouped landing pages (items) is now the real count of matching rows.

What's being implemented

  • filters.py:
  • New FilterTarget enum + BuiltCondition dataclass; build_filter_condition now returns BuiltCondition | None carrying the WHERE/HAVING target.
  • _contains_aggregate, _array_element_type, _build_array_condition, _coerce_array_values helpers.
  • apply_filters splits conditions into WHERE/HAVING buckets, combines per group, and applies each to the right clause. An OR group mixing both kinds promotes the WHERE legs into HAVING via bool_or(...).
  • total_rows wraps grouped statements in a subquery before counting.
  • tests/graphql/common/test_filters.py: existing tests updated for the new return type; new tests for array overlap, HAVING routing, value flattening, mixed-group degradation, and the grouped total_rows fix.

Future additions

  • orderBy on array/aggregate columns. apply_order_by still rejects them. Deferred — sorting by tag set has no defined product meaning yet.

Open questions

None.