Thread
-
Re: SQL Property Graph Queries (SQL/PGQ)
Henson Choi <assam258@gmail.com> — 2025-12-31T06:23:13Z
SQL/PGQ Patch Review Report ============================ Patch: SQL Property Graph Queries (SQL/PGQ) Commitfest: https://commitfest.postgresql.org/patch/4904 Review Methodology: This review focused on quality assessment, not line-by-line code audit. Key code paths and quality issues were examined with surrounding context when concerns arose. Documentation files were reviewed with AI-assisted grammar and typo checking. Code coverage was measured using gcov and custom analysis tools. Limitations: - Not a security audit or formal verification - Parser and executor internals reviewed at module level, not exhaustively - Performance characteristics not benchmarked TABLE OF CONTENTS ----------------- 1. Executive Summary 2. Issues Found 2.1 Critical / Major 2.2 Minor Issues 2.3 Suggestions for Discussion 3. Test Coverage 3.1 Covered Areas 3.2 Untested Items 3.3 Unimplemented Features (No Test Needed) 4. Code Coverage Analysis 4.1 Overall Coverage 4.2 Coverage by File 4.3 Uncovered Code Risk Assessment 5. Recommended Additional Tests 5.1 Black-box Tests (Functional) 5.2 White-box Tests (Coverage) 5.3 Untestable Code (Defensive) 6. Recommendations 7. Conclusion 1. EXECUTIVE SUMMARY -------------------- Overall assessment: GOOD The SQL/PGQ patch demonstrates solid implementation quality within its defined scope. Code follows PostgreSQL coding standards, test coverage is comprehensive at 90.5%, and documentation is thorough with only minor typos. Rating by Area: - Code Quality: Excellent (PostgreSQL style compliant, 1 FIXME) - Test Coverage: Good (90.5% line coverage, ~180 test cases) - Documentation: Good (Complete, 5 minor issues) - Build/Regress: Pass (make check-world, 56 test suites passed) 2. ISSUES FOUND --------------- 2.1 Critical / Major (None) 2.2 Minor Issues #1 [Code] src/backend/commands/propgraphcmds.c:1632 FIXME: Missing index for plppropid column causes sequential scan. Decision needed: (a) add index, or (c) allow seq scan for rare path. #2 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:286 Compatibility section incorrectly states "CREATE PROPERTY GRAPH" Should be: "ALTER PROPERTY GRAPH" #3 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:65 Synopsis shows VERTEX/NODE and EDGE/RELATIONSHIP as separate clauses, but Description merges them as {VERTEX|NODE|EDGE|RELATIONSHIP}. #4 [Doc] doc/src/sgml/ref/alter_property_graph.sgml:80 Grammar error: "the graph removed" should be "the graph is removed" #5 [Doc] doc/src/sgml/queries.sgml:2929,2931 TODO comments for unimplemented features without clear limitation notes. #6 [Doc] doc/src/sgml/ddl.sgml:5717 Typo: "infrastucture" should be "infrastructure" 2.3 Alternative Approaches for Discussion #1 Support CREATE PROPERTY GRAPH IF NOT EXISTS Rationale: PostgreSQL-style extension, consistent with other DDL. #2 Return 0 rows for non-existent labels instead of error Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:NonExistent) ...) raises error Alternative: Return empty result set instead Rationale: Better for exploratory queries, similar to SQL WHERE on non-existent values returning 0 rows rather than error. #3 Return 0 rows when same variable has different labels Current: SELECT * FROM GRAPH_TABLE(g MATCH (a:Person)-[e]-(a:Company) ...) raises error because variable 'a' has conflicting labels. Alternative: Return empty result set instead Rationale: Logically, a node cannot be both Person AND Company, so 0 rows is the correct result. Consistent with standard SQL WHERE semantics (impossible condition = 0 rows, not error). 3. TEST COVERAGE ---------------- 3.1 Covered Areas - CREATE PROPERTY GRAPH: Empty graph, VERTEX/EDGE TABLES, KEY, LABEL, PROPERTIES - ALTER PROPERTY GRAPH: ADD/DROP VERTEX/EDGE, ADD/DROP LABEL, ADD/DROP PROPERTIES - DROP PROPERTY GRAPH: Basic DROP, IF EXISTS - RENAME TO / SET SCHEMA: Tested in alter_generic.sql - OWNER TO: Permission change tests - GRAPH_TABLE queries: Pattern matching, WHERE, COLUMNS, LATERAL, etc. - RLS integration: PERMISSIVE/RESTRICTIVE policies, inheritance/partition tables - Error cases: Type/collation mismatch, non-existent objects, etc. - psql integration: \dG, \d commands - pg_dump: Tested in t/002_pg_dump.pl 3.2 Untested Items - CREATE TEMP PROPERTY GRAPH (Medium priority) - DROP ... CASCADE (Low priority) - DROP ... RESTRICT (Low priority) 3.3 Unimplemented Features (No Test Needed) - Multiple path patterns: Parser supports, execution not implemented - Quantifier {n,m}: Parser supports, execution not implemented 4. CODE COVERAGE ANALYSIS ------------------------- 4.1 Overall Coverage 90.5% (2,029 / 2,243 lines) 4.2 Coverage by File propgraphcmds.c: 94.6% (DDL command processing) rewriteGraphTable.c: 94.6% (GRAPH_TABLE rewrite) parse_graphtable.c: 97.6% (Graph query parser) parse_clause.c: 98.0% (FROM clause parser) ruleutils.c: 85.1% (pg_get_propgraphdef, etc.) pg_dump.c: 84.0% (Dump/restore) describe.c: 88.9% (psql \dG command) 4.3 Uncovered Code Risk Assessment Low Risk (13 items): - Defensive code, debug functions, unimplemented feature guards Medium Risk (1 item): - TEMP graph functionality untested Conclusion: Most uncovered code consists of error handling, unimplemented feature guards, and debug utilities. No security or functional risk. 5. RECOMMENDED ADDITIONAL TESTS ------------------------------- 5.1 Black-box Tests (Functional) Based on feature specification, independent of code structure. Feature Test Case Priority ------------------------------------------------------------------------------- TEMP graph CREATE TEMP PROPERTY GRAPH Medium TEMP graph TEMP graph with permanent table reference Medium TEMP graph ALTER ADD permanent table to TEMP graph Medium CASCADE DROP ... CASCADE (dependent view cascade) Low RESTRICT DROP ... RESTRICT (dependent object error) Low 5.2 White-box Tests (Coverage) Based on uncovered code paths identified in coverage analysis. File:Line Test Case Target Branch ------------------------------------------------------------------------------- propgraphcmds.c:136,179,254-262 TEMP table in graph creation Auto TEMP conversion propgraphcmds.c:1323,1364 ALTER ADD TEMP to perm graph Error branch propgraphcmds.c:724-734 CREATE without LABEL clause Default label else propgraphcmds.c:1300,1303 ALTER IF EXISTS non-existent missing_ok branch propgraphcmds.c:116 CREATE UNLOGGED attempt UNLOGGED error execMain.c:1162-1163 DML on graph RELKIND_PROPGRAPH rewriteGraphTable.c:120 Multiple path patterns Length check rewriteGraphTable.c:205 Quantifier usage Quantifier error ruleutils.c:7939-7963 Complex label VIEW deparsing T_BoolExpr branch 5.3 Untestable Code (Defensive) File:Line Reason ------------------------------------------------------------------------------- rewriteGraphTable.c:202 PAREN_EXPR blocked at parser level rewriteGraphTable.c:297,304,319 Cyclic pattern edge conflict - unreachable rewriteGraphTable.c:801-818 get_gep_kind_name - error path only nodeFuncs.c:4585-4596,4755-4774 Walker branches - internal implementation 6. RECOMMENDATIONS ------------------ 6.1 Code Review Required (Minor, Decision Needed) Location: propgraphcmds.c:1632 Issue: FIXME - No single-column index for plppropid Current: InvalidOid causes sequential scan Purpose: Check property references across all labels when deleting orphaned properties Options: (a) Add index: Create new single-column index on plppropid (b) Use existing: NOT POSSIBLE (cannot specify plpellabelid, need all labels) (c) Allow: Rare path (property deletion), sequential scan acceptable 6.2 Documentation Fixes (Minor) - alter_property_graph.sgml: 3 typos/errors to fix - queries.sgml: Add clear limitation notes for unimplemented features - ddl.sgml: Fix "infrastucture" typo 6.3 Test Additions (Optional) - CREATE TEMP PROPERTY GRAPH tests - DROP ... CASCADE/RESTRICT tests 7. CONCLUSION ------------- Test Quality: GOOD Core functionality is thoroughly tested with comprehensive error case and security (RLS) integration tests. The patch is well-implemented within its defined scope. Identified issues are minor documentation typos and one code decision point (FIXME). No critical or major issues found. Recommended actions before commit: 1. Decide on FIXME at propgraphcmds.c:1632 (index vs. allow seq scan) 2. Fix 5 documentation issues 3. Consider adding TEMP graph tests (optional but recommended) Points for discussion (optional): 4. CREATE PROPERTY GRAPH IF NOT EXISTS support 5. Error vs. 0 rows behavior for non-existent/conflicting labels Attachment: - coverage_report.tar.gz (HTML coverage report generated by gcov) --- End of Report 2025년 12월 30일 (화) PM 8:27, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>님이 작성: > On Tue, Dec 30, 2025 at 3:14 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: > > > > On Thu, Dec 18, 2025 at 2:45 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > > > > > > > > > > > > > > I did another investigation about whether this level of checking is > > > > necessary. I think according to the letter of the SQL standard, the > > > > typmods must indeed match. It seems Oracle does not check (the > example > > > > mentioned above came from an Oracle source). But I think it's okay > to > > > > keep the check. In PostgreSQL, it is much less common to write like > > > > varchar(1000). And we can always consider relaxing it later. > > > > > > +1. > > > > > > Attached patch adds a couple more test statements. > > > > > > > Squashed this into the main patchset. > > > > > > > > > > 2) I had it in my notes to consider whether we should support the > colon > > > > syntax for label expressions. I think we might have talked about > that > > > > before. > > > > > > > > I'm now leaning toward not supporting it in the first iteration. I > > > > don't know that we have fully explored possible conflicts with host > > > > variable syntax in ecpg and psql and the like. Maybe avoid that for > now. > > > > > > > > > > I was aware of ecpg and I vaguely remember we fixed something in ECPG > > > to allow : in MATCH statement. Probably following changes in > > > psqlscan.l and pgc.l > > > -self [,()\[\].;\:\+\-\*\/\%\^\<\>\=] > > > +self [,()\[\].;\:\|\+\-\*\/\%\^\<\>\=] > > > > > > Those changes add | after : (and not the : itself) so maybe they are > > > not about supporting : . Do you remember what those are? > > > > I reverted those changes from both the files and ran "meson test". I > > did not observe any failure. It seems those changes are not needed. > > But adding them as a separate commit (0004) in case CI bot reveals any > > failures without them. > > > > I noticed that there were no ECPG tests for SQL/PGQ. Added a basic > > test in patch 0003. > > > > > > > > I spotted some examples that use : in ddl.sgml. > > > <programlisting> > > > SELECT customer_name FROM GRAPH_TABLE (myshop MATCH > > > (c:customer)-[:has]->(o:"order" WHERE o.ordered_when = current_date) > > > COLUMNS (c.name AS customer_name)); > > > </programlisting> > > > > > > The query demonstrates that one can use label names in a way that will > > > make the pattern look like an English sentence. Replacing : with IS > > > defeats that purpose. > > > > > > As written in that paragraph, the labels serve the purpose of exposing > > > the table with a different logical view (using different label and > > > property names). So we need that paragraph, but I think we should > > > change the example to use IS instead of :. Attached is suggested > > > minimal change, but I am not happy with it. Another possibility is we > > > completely remove that paragraph; I don't think we need to discuss all > > > possible usages the users will come up with. > > > > > > The patch changes one more instance of : by IS. But that's straight > forward. > > > > > > In ddl.sgml I noticed a seemingly incomplete sentence > > > A property graph is a way to represent database contents, instead > of using > > > relational structures such as tables. > > > > > > Represent the contents as what? I feel the complete sentence should be > > > one of the following > > > property graph is a way to represent database contents as a graph, > > > instead of representing those as relational structures OR > > > property graph is another way to represent database contents instead > > > of using relational structures such as tables > > > > > > But I can't figure out what was originally intended. > > > > > > 0002 contains some edits to this part of documentation. I think the > > paragraph reads better than before. Let me know what you think. > > > > Please let me know which of 0002 to 0004 look good to you. I will > > squash those into the patchset in the next version. > > The previous patchset had a whitespace difference in ECPG expected > output files. Fixed in the attached patchset. > > -- > Best Wishes, > Ashutosh Bapat >