Thread

  1. Re: [WIP]Vertical Clustered Index (columnar store extension) - take2

    Japin Li <japinli@hotmail.com> — 2025-09-18T08:38:50Z

    On Thu, 18 Sep 2025 at 15:07, Peter Smith <smithpb2250@gmail.com> wrote:
    > Hi Timur,
    >
    > Thanks for your ongoing work for this patch.
    >
    > On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov
    > <t.magomedov@postgrespro.ru> wrote:
    > ...
    >> I've found (using valgrind) some cases of reading random garbage after
    >> allocated memory. Investigation showed this was caused by converting
    >> some nodes to VciScanState* even if they have smaller size allocated
    >> originally. So accessing VciScanState fields was accessing memory after
    >> palloc'ed memory which could be used by any other allocation.
    >>
    >> I think converting to VciScanState* is only valid for nodes with tag
    >> T_CustomScanState so here is a patch that adds assertions for this:
    >> 0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch
    >
    > What exactly did Valgrind report? For example, you said the
    > VciScanState points beyond allocated memory. Do you have any more
    > clues, like where that happened? Did you discover where that (smaller
    > than it should be) memory was allocated in the first place?
    >
    >>
    >> VCI v23 passes the tests with this patch applied.
    >
    > OK. I am not 100% certain about the asserts, but since the existing
    > VCI tests are passing, I have merged your patch as-is into v24-0002. I
    > guess we will find out later if the bug below is due to an old code
    > cast problem or a new code assert problem.
    >
    >>
    >> There are queries that fail unfortunately. I've added one of them to
    >> bugs.sql:
    >> 0002-Reproducer-of-invalid-cast-to-VciScanState.patch
    >> Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails
    >> newly added assertion.
    >>
    >> I'm not sure where to look next to fix this. Looking forward for you
    >> comments and ideas.
    >
    > OK. I ran with your 2nd patch applied and reproduced the core-dump
    > below, where it tripped over one of your new Asserts at
    > executor/vci_sort.c:89. I can see it is an unexpected value
    > T_NestLoopState.
    >
    I'm curious about the assert here. Why is it that children must be CustomScan?
    It seems that there is a JOIN node, and we do not support customizing the JOIN
    operation, right?
    
    > (gdb) bt 15
    > #0  0x00007ff948aa62c7 in raise () from /lib64/libc.so.6
    > #1  0x00007ff948aa79b8 in abort () from /lib64/libc.so.6
    > #2  0x0000000000c07977 in ExceptionalCondition
    > (conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type ==
    > T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c",
    >     lineNumber=89) at assert.c:66
    > #3  0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at
    > executor/vci_sort.c:89
    > #4  0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137
    > #5  0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at
    > execProcnode.c:486
    > #6  0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at
    > execProcnode.c:470
    > #7  0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at
    > ../../../src/include/executor/executor.h:316
    > #8  0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100,
    > operation=CMD_SELECT, sendTuples=true, numberTuples=0,
    > direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>)
    >     at execMain.c:1697
    > #9  0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100,
    > direction=ForwardScanDirection, count=0) at execMain.c:366
    > #10 0x00007ff9407f9efd in vci_executor_run_routine
    > (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at
    > executor/vci_executor.c:178
    > #11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100,
    > direction=ForwardScanDirection, count=0) at execMain.c:301
    > #12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628,
    > into=0x0, es=0x2a81388,
    >     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
    > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
    > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
    > MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
    > queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0,
    > mem_counters=0x0) at explain.c:579
    > #13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8,
    > cursorOptions=2048, into=0x0, es=0x2a81388,
    >     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
    > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
    > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
    > MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
    > queryEnv=0x0) at explain.c:372
    > #14 0x00007ff9407f9ff3 in vci_explain_one_query_routine
    > (queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388,
    >     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
    > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
    > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
    > MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
    > queryEnv=0x0) at executor/vci_executor.c:224
    > (More stack frames follow...)
    > (gdb)
    >
    > I will keep investigating it...
    >
    > I have not included your test case in the v24* patches because I
    > didn't want this known test failure to mask out any other unknown test
    > problems that might occur.
    
    -- 
    Regards,
    Japin Li
    ChengDu WenWu Information Technology Co., Ltd.