Thread

  1. Add xicorr(X, Y): support for the xi (ξ) correlation coefficient by Chatterjee

    Florents Tselai <florents.tselai@gmail.com> — 2025-08-31T12:29:51Z

    Hi hackers,
    
    In my analytics work, I frequently conduct extensive correlation discovery:
    given a list of columns, I run corr(X, Y) over different pairs and see
    which pairs score high.
    Standard Postgres as-is offers the well-known corr(X, Y),
    which computes the classic Pearson correlation coefficient.
    Its main drawback is that it detects only linear associations.
    
    Over the years, several measures have been proposed that can detect
    non-linear relationships as well,
    including the Kendall rank correlation and the Maximal Information Coefficient.
    
    The latest celebrity in the area is the xi (ξ) correlation coefficient
    proposed by Chatterjee [0].
    It's rank-based, and is very appealing due to its relatively simple
    implementation.
    You can view a by-hand computation in this video:
    https://www.youtube.com/watch?v=2OTHH8wz25c
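
    To make the "relatively simple implementation" concrete, here is a minimal
    Python sketch of the textbook no-ties formula from [0]:
    xi = 1 - 3 * sum(|r[i+1] - r[i]|) / (n^2 - 1), where r[i] is the rank of Y
    after sorting the pairs by X. The function name xi_no_ties is hypothetical
    and chosen for illustration; this is not the code from the patch or from
    pgxicor, and it assumes there are no ties in either column.

```python
def xi_no_ties(x, y):
    """Chatterjee's xi for paired samples, assuming no ties in x or y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    # Order the pairs by x and keep only the y values in that order.
    y_by_x = [yi for _, yi in sorted(zip(x, y))]
    # Rank of each y value among all y values (1 = smallest).
    rank_of = {v: r for r, v in enumerate(sorted(y_by_x), start=1)}
    ranks = [rank_of[v] for v in y_by_x]
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(b - a) for a, b in zip(ranks, ranks[1:]))
    return 1.0 - 3.0 * total / (n * n - 1)


if __name__ == "__main__":
    # Perfectly monotone relationship (y = x^2 on positive x).
    print(xi_no_ties([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
    # A small shuffled sample can produce a negative value.
    print(xi_no_ties([1, 2, 3, 4], [2, 4, 1, 3]))
```

    With this formula a perfectly monotone sample of size n scores
    1 - 3/(n + 1), so the value only approaches 1 as n grows. Small samples
    can also come out negative, which is where the clamping question comes from.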
    
    I've already released pgxicor [1], an extension.
    However, since SciPy has already added this to its library [2], I thought
    I'd propose it for core PG as well.
    
    Here's a first cut of a patch. At this stage, I'm mainly looking to gauge
    interest in including this in core.
    Future versions will likely refine the implementation details (e.g., use
    ArrayType instead of a growable buffer of doubles,
    revisit the way ties are handled, and decide whether clamping of negative
    values is appropriate).
    
    [0] https://arxiv.org/pdf/1909.10140
    https://souravchatterjee.su.domains/beam-correlation-trans.pdf
    [1] https://github.com/Florents-Tselai/pgxicor
    [2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chatterjeexi.html
    https://discuss.scientific-python.org/t/new-function-scipy-stats-xi-correlation/1498