
Add xicorr(X, Y): support for the xi (ξ) correlation coefficient by Chatterjee

Florents Tselai <florents.tselai@gmail.com> — 2025-08-31T12:29:51Z
Hi hackers,

In my analytics work, I frequently do extensive correlation discovery:
given a list of columns, I run corr(X, Y) over different pairs and see
which pairs score high.
Standard Postgres offers the well-known corr(X, Y),
which computes the classic Pearson correlation coefficient.
Its main drawback is that it only detects linear associations.
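To make that limitation concrete, here is a small Python sketch of the textbook two-pass Pearson formula. The helper `pearson` is my own illustration, not PostgreSQL's actual accumulator code. On a symmetric sample with y = x², y is completely determined by x, yet the coefficient is exactly zero.

```python
import math


def pearson(xs, ys):
    """Textbook two-pass Pearson correlation (illustrative, not corr()'s code)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and variance sums around the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # deterministic, but non-linear, dependence on x
print(pearson(xs, ys))    # 0.0
```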

Over the years, several measures have been proposed that can also detect
non-linear relationships, including Kendall's rank correlation and, more
recently, the Maximal Information Coefficient.

The latest celebrity in the area is the xi (ξ) correlation coefficient
proposed by Chatterjee [0].
It's rank-based and very appealing thanks to its relatively simple
implementation.
You can see a by-hand computation in this video:
https://www.youtube.com/watch?v=2OTHH8wz25c
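For reference, a minimal Python sketch of the statistic as defined in [0]: sort the pairs by X, let r_i = #{j : Y_j <= Y_(i)} and l_i = #{j : Y_j >= Y_(i)}, and compute xi_n = 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i (n - l_i)). Without ties this reduces to 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1). The function name `xicorr` only mirrors the proposed aggregate; this is an illustration, not the patch's or pgxicor's code. It also breaks ties in X by input order, whereas the paper breaks them uniformly at random.

```python
from bisect import bisect_left, bisect_right


def xicorr(xs, ys):
    """Chatterjee's xi coefficient, tie-aware form (illustrative sketch).

    Assumption: ties in X are broken by input order (stable sort),
    not uniformly at random as in the paper.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    # Rearrange Y so that the corresponding X values are increasing.
    order = sorted(range(n), key=lambda i: xs[i])
    y_by_x = [ys[i] for i in order]
    y_sorted = sorted(ys)
    # r_i = #{j : y_j <= y_(i)},  l_i = #{j : y_j >= y_(i)}
    r = [bisect_right(y_sorted, y) for y in y_by_x]
    l = [n - bisect_left(y_sorted, y) for y in y_by_x]
    num = n * sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    den = 2 * sum(li * (n - li) for li in l)
    if den == 0:
        raise ValueError("xi is undefined when Y is constant")
    return 1 - num / den


xs = list(range(-50, 51))
ys = [x * x for x in xs]
print(xicorr(xs, ys))                            # 16/17, about 0.941
print(xicorr([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 0.5
print(xicorr([1, 2, 3, 4], [2, 4, 1, 3]))        # about -0.4
```

On x = -50..50 with y = x², where Pearson's coefficient is 0, this returns 16/17 (about 0.94). Note the small-sample behaviour: a perfectly increasing sequence of n points scores 1 - 3/(n + 1) (0.5 for n = 5), and the value can dip below zero, e.g. X = [1, 2, 3, 4], Y = [2, 4, 1, 3] gives -0.4, which is where the question of clamping negative values comes in.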

I've already released it as an extension, pgxicor [1].
However, since SciPy has already added it to its library [2], I thought
I'd propose it for core PG as well.

Here’s a first cut of a patch. At this stage, I’m mainly looking to gauge
interest in including this in core.
Future versions will likely refine the implementation details (e.g., use
ArrayType instead of a growable buffer of doubles,
revisit the way ties are handled, and decide whether clamping of negative
values is appropriate).

[0] https://arxiv.org/pdf/1909.10140
    https://souravchatterjee.su.domains/beam-correlation-trans.pdf
[1] https://github.com/Florents-Tselai/pgxicor
[2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chatterjeexi.html
    https://discuss.scientific-python.org/t/new-function-scipy-stats-xi-correlation/1498