Thread

  1. Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py

    Mats Kindahl <mats.kindahl@gmail.com> — 2025-11-04T09:31:31Z

    On 11/3/25 10:55, Bertrand Drouvot wrote:
    > Hi,
    >
    > On Thu, Oct 30, 2025 at 02:32:45PM +0100, Mats Kindahl wrote:
    >> Hi all,
    >>
    >> Here is an updated set of patches based on the latest HEAD of PostgreSQL.
    > Thanks for those patches and the initiative! Sorry to be late, I started
    > to look at Coccinelle "for PostgreSQL" in [1] (to ensure some macros are used)
    > and saw this thread.
    
    Hi Bertrand and thank you for the comments.
    
    > I did not look at the patches in detail, but sharing some thoughts.
    >
    > I agree that Coccinelle could be/is useful in relation to PostgreSQL development,
    > but I think that we'd need to determine why it would useful to add the
    > coccicheck.py script and the new dependencies in autoconf/meson.
    >
    > Coming back to your points and thinking if it could be used as a tool or integrated
    > into the build system:
    >
    > 1) Identify and correct bugs in the source code both during development and
    > review
    >
    > I agree that makes sense here. Having some bugs detected "automatically" would
    > be great.
    >
    > 2) Make large-scale changes to the source tree to improve the code based on
    > new insights
    >
    > I'm not sure we need to introduce coccicheck.py and add dependencies in
    > autoconf/meson for this. The developer would need to know that he could use
    > the Coccinelle and if he already knows then nothing prevents him from using it
    > in his development tool box.
    >
    > 3) Encode and enforce APIs by ensuring that function calls are used
    > correctly
    >
    > Same as 2) That said we could also imagine running yearly checks automatically
    > too using coccicheck.py.
    >
    > 4) Use improved coding patterns for more efficient code
    >
    > Same as 3) from my point of view.
    >
    > 5) Allow extensions to automatically update code for later PostgreSQL
    > versions
    >
    > Same as 2) from my point of view.
    >
    > So, I think that the current proposal (i.e build system integration) is a good
    > fit for 1), less so for 3) and 4) and not necessarily needed for 2) and 5).
    >
    > The proposal will add new dependencies (as Michael stated up-thread) and introduce
    > a new language (SmPL) that folks would need to be comfortable with to review
    > the .cocci scripts.
    >
    > I don't have an answer to it but I think that the main question is: Should we
    > integrate this into the build system, or just document it as an optional
    > developer tool (wiki or such and provide .cocci scripts example)?
    
    All of these are good points. My main reason for proposing this is that 
    I think that using Coccinelle more systematically would improve the 
    quality of the code and make it easier to maintain. The build system 
    changes and the coccicheck.py script is a proposal for how to make this 
    happen, but I realize there are alternative ways as well as tradeoffs to 
    be made (e.g., is the extra maintenance burden worth the improvement in 
    quality?)
    
    I think there are three issues here:
    
     1. Shall the semantic patches be in the source tree?
     2. Shall we use the coccicheck.py script?
     3. Shall we support this in the build system, that is, meson and automake?
    
    Putting the coccicheck.py script aside for a short while, I think there 
    is value in keeping the semantic patches in the source tree rather than 
    adding them to a wiki for the following reasons:
    
    It makes it easy for reviewers to check that the code reviewed follows 
    "best practices" (of course, assuming that the .cocci scripts represent 
    "best practices") instead of having to download and run these scripts 
    from another source. Having them on a separate page would reduce the 
    value since it makes it more difficult to use.
    
    If they are part of the repository the .cocci scripts will be maintained 
    and updated to match the code in the source tree. If they are on a 
    separate wiki page, or a different repository, they are likely to be 
    version dependent and code might change so that they are not relevant 
    any more, which makes it more difficult to use them for reviewing and 
    checking code since that would also require updating them to match the 
    new code before applying them.
    
    A drawback is that it is an extra maintenance burden (probably small, 
    but extra work nevertheless) and would require the semantic patches to 
    be executed regularly, by the build system or by reviewers, and check 
    that they do not report anything. If these checks are not done 
    regularly, it is possible that the scripts would become obsolete after a 
    while. By making it easy to run the scripts, we are more likely to run 
    them and discover issues in the scripts as well as in the code.
    
    About integrating it with the build system. Note that there are three 
    modes: report, context, and (generate) patch. The important modes are 
    report and patch.
    
    The main advantage of integrating it with the build system is is that it 
    is easy to run, which makes it easy to ensure that the semantic patches 
    do not report anything for both reviewers, developers, and build 
    systems. This will improve the quality of both the semantic patches and 
    the source code.
    
    As you say, it does add an extra dependency, but this should optional 
    and only be used if spatch is installed. For parties not interested in 
    using spatch, hence does not have spatch installed, it will not be used 
    so it should not interfere with them.
    
    Integrating it with the build system would also allow builders and 
    reviewers to use the same method for checking the code using report mode 
    without regard to what version is being used, you would run the checks 
    the same way for each minor release, for example.
    
    As you say, coccicheck.py script is strictly speaking not needed to use 
    Coccinelle and is more intended as a convenience. The advantages of 
    using something like the coccicheck.py script is that:
    
    The coccicheck.py script reads options from the semantic patches and 
    uses them when running spatch. If using spatch directly, you would have 
    to check the scripts, extract the options, and then use them when 
    running spatch. Not impossible, but an extra step that makes it more 
    inconvenient to use Coccinelle, which in turn makes it less likely to be 
    used and maintained.
    
    You could automate the extraction of options from the semantic patches, 
    but this script provides a platform-independent method to run the checks 
    for reviewers, developers, and build systems. Note that different 
    semantic patches can use different options, so if you want to automate 
    this, you would need to write something like coccicheck.py anyway.
    
    Best wishes,
    Mats Kindahl
    
    >
    > [1]:https://www.postgresql.org/message-id/aQMtR/m4kW4Rkul%2B%40ip-10-97-1-34.eu-west-3.compute.internal
    >
    > Regards,
    >