Re: PATCH: jsonpath string methods: lower, upper, initcap, l/r/btrim, replace, split_part

David E. Wheeler <david@justatheory.com>

From: "David E. Wheeler" <david@justatheory.com>
To: Chao Li <li.evan.chao@gmail.com>
Cc: Álvaro Herrera <alvherre@kurilemu.de>, Florents Tselai <florents.tselai@gmail.com>, Tom Lane <tgl@sss.pgh.pa.us>, Peter Eisentraut <peter@eisentraut.org>, Robert Haas <robertmhaas@gmail.com>, Alexander Korotkov <aekorotkov@gmail.com>, pgsql-hackers <pgsql-hackers@lists.postgresql.org>, Andrew Dunstan <andrew@dunslane.net>
Date: 2025-10-28T19:38:47Z
Lists: pgsql-hackers

Attachments

On Oct 22, 2025, at 22:43, Chao Li <li.evan.chao@gmail.com> wrote:

> I wonder if there is some consideration for the order? Feels that jpiSttLtrim and the following jpiStrXXX should be placed above jpiTimeXXX.

I wouldn’t think the order would matter.

> I know “b” in “btrim” stands for “both”, just curious why trim both side function is named “btrim()”? In most of programming languages I am aware of, trim() is the choice.

This patch uses existing Postgres functions, of which btrim is one[1].

> + default:
> + ;
> + /* cant' happen */
> + }
> ``` 
> 
> As “default” clause has a comment “can’t happen”, I believe “break” is missing in the case clause.
> 
> Also, do we want to add an assert in default to report a message in case it happens?

Good call, will change.

> 6 - jsonpath_exec.c
> ```
> + resStr = TextDatumGetCString(DirectFunctionCall3Coll(replace_text,
> + C_COLLATION_OID,
> + CStringGetTextDatum(tmp),
> + CStringGetTextDatum(from_str),
> + CStringGetTextDatum(to_str)));
> ```
> 
> For trim functions, DEFAULT_COLLATION_OID used. Why C_COLLATION_OID is used for replace and split_part? I don’t see anything mentioned in your changes to the doc (func-json.sgml).

Intuitively that makes sense to me. Tests pass if I change it. Will update the patch.


> 7 - jsonpath_exec.c
> ```
> + if (!(jb = getScalar(jb, jbvString)))
> + RETURN_ERROR(ereport(ERROR,
> + (errcode(ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION),
> +  errmsg("jsonpath item method .%s() can only be applied to a string",
> + jspOperationName(jsp->type)))));
> ```
> 
> ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION seems wrong, this is a string function, not a date time function.

Yes. Maybe `ERRCODE_INVALID_PARAMETER_VALUE`? There’s also `ERRCODE_INVALID_JSON_TEXT`, but I think that’s about invalid bytes in a JSON string.

> The two nested “switch (jsp->type)” are quit redundant. We can pull up the second one, and simplify the first one, something like:

Well they assign different values to `func`: ltrim, rtrim, btrim when no arg vs ltrim1, rtrim1, btrim1 when there is an argument.

> 9 - jsonpath_exec.c
> ```
> + if (elem.type != jpiString)
> + elog(ERROR, "invalid jsonpath item type for .replace() from");
> +
> + from_str = jspGetString(&elem, &from_len);
> +
> + jspGetRightArg(jsp, &elem);
> + if (elem.type != jpiString)
> + elog(ERROR, "invalid jsonpath item type for .replace() to");
> ```
> 
> In these two elog(), do we want to log the invalid type? As I see in the “default” clause, jsp->type is logged:
> ```
> + default:
> + elog(ERROR, "unsupported jsonpath item type: %d", jsp->type);
> ```

I think it’s going on precedents such as

```
if (elem.type != jpiNumeric)
	elog(ERROR, "invalid jsonpath item type for .decimal() precision");
```

And also the date time method execution:

```
(errcode(ERRCODE_INVALID_ARGUMENT_FOR_SQL_JSON_DATETIME_FUNCTION),
errmsg("jsonpath item method .%s() can only be applied to a string",
		jspOperationName(jsp->type)))));
```

I see types mentioned only in the context of failed numeric conversions (ERRCODE_NON_NUMERIC_SQL_JSON_ITEM).

Updated patches attached.

Best,

David