Thread
-
Question about hashed ScalarArrayOpExpr equality semantics
Ayush Tiwari <ayushtiwari.slg01@gmail.com> — 2026-05-11T04:34:38Z
Hi, While looking at the hashed ScalarArrayOpExpr path, I noticed that the linear and hashed evaluation paths can give different answers if the comparison function returns NULL for non-NULL inputs. The part I am trying to understand is what assumption the hashed path is allowed to make about equality operators used for hashing. Does declaring an equality operator as HASHES / putting it in a hash opclass imply that, for non-NULL inputs, the equality result is a definite true or false value? Reproduced on master. Setup: CREATE FUNCTION weird_strict_int_eq(int, int) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT CASE WHEN $1 = 42 AND $2 = 42 THEN NULL ELSE $1 = $2 END $$; CREATE FUNCTION weird_strict_int_ne(int, int) RETURNS bool LANGUAGE sql IMMUTABLE STRICT AS $$ SELECT NOT weird_strict_int_eq($1, $2) $$; CREATE OPERATOR === ( LEFTARG = int, RIGHTARG = int, PROCEDURE = weird_strict_int_eq, COMMUTATOR = ===, NEGATOR = !==, HASHES ); CREATE OPERATOR !== ( LEFTARG = int, RIGHTARG = int, PROCEDURE = weird_strict_int_ne, COMMUTATOR = !==, NEGATOR = === ); CREATE OPERATOR CLASS weird_strict_int_hash_ops FOR TYPE int USING hash AS OPERATOR 1 ===, FUNCTION 1 hashint4(int); The hashed ScalarArrayOpExpr path activates once the constant array has at least MIN_ARRAY_SIZE_FOR_HASHED_SAOP (currently 9) elements, so the 8-element variant uses the linear evaluation and the 9-element variant uses the hash table. I wrap the left-hand side in a non-immutable function call so the expression is not constant-folded at plan time. CREATE FUNCTION return_int_input(int) RETURNS int LANGUAGE sql STABLE AS $$ SELECT $1 $$; SELECT return_int_input(42) === ANY (ARRAY[1,2,3,4,5,6,7,42]) AS linear_any, return_int_input(42) === ANY (ARRAY[1,2,3,4,5,6,7,8,42]) AS hashed_any, return_int_input(42) !== ALL (ARRAY[1,2,3,4,5,6,7,42]) AS linear_not_in, return_int_input(42) !== ALL (ARRAY[1,2,3,4,5,6,7,8,42]) AS hashed_not_in; The linear path preserves the NULL (UNKNOWN) result from the comparison. The hashed path seems to treat a NULL from the comparison as a non-match, producing a different result. I realize this is a strange equality operator, and that may be the whole point. For a hash table lookup, it seems reasonable to need a definite answer to "does this stored key match the lookup key?". On the other hand, ScalarArrayOpExpr itself has SQL three-valued semantics, and the linear path does preserve the comparison's NULL result. So my question is: is the hashed path allowed to assume that hashable equality operators never return NULL for non-NULL inputs, or should it preserve the same UNKNOWN result that the linear ScalarArrayOpExpr evaluation would produce? Regards, Ayush