Skip to content

[multistage] physical optimizer lookup join support #17961

@dang-stripe

Description

@dang-stripe

While testing the physical optimizer, we get this error when trying to invoke a lookup join:

Error log:

"message": "Received 1 error from stage N on Server_1234: Right input must be leaf operator",

Query:

WITH "source_1" AS (
  SELECT
    'v1' AS "column_1",
    1234567890 AS "column_2"
  FROM "table_3"
  LIMIT 1
)
SELECT
  /*+ "joinOptions"("join_strategy" = 'lookup') */
  a."column_1",
  a."column_2",
  b."column_3"
FROM "source_1" a
INNER JOIN "table_2" b
  ON b."column_1" = a."column_1"
 AND b."column_2" = a."column_2"

Claude's analysis:

The V2 physical optimizer (usePhysicalOptimizer=true) does not correctly handle lookup joins. When a query uses /*+ joinOptions(join_strategy='lookup') */, V2 inserts a BROADCAST exchange on the right side (dim table), which splits the dim table scan into a separate fragment from the join. At runtime, LookupJoinOperator requires rightInput instanceof LeafOperator but receives a MailboxReceiveOperator instead, throwing "Right input must be leaf operator"

cc @ankitsultana @suvodeep-pyne

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions