Conversation
WalkthroughReplaces the prior FDW page with a step-based guide: adds Prerequisites (FDW v5.7+, PostgreSQL wire-protocol), stepper-style setup (firewall, Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Moved an updated FDW page we had in gitbook to the Ingestion section with minor updates.
Thank you for any updates. However, please don't relocate: FDW has far more applications than just "ingestion". If you want to have it mentioned more prominently in this section, please tease and cross-link accordingly.
Fair enough, we can get back to that part later. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/feature/fdw/index.md`:
- Around line 69-92: The SQL example uses remote_readings but the foreign table
was created as doc.remote_readings; update the query example to use the
fully-qualified table name doc.remote_readings (or add a short note that the doc
schema must be in the search_path) so the example will work as-is; locate the
example query block that references remote_readings and replace it with
doc.remote_readings (or add the search_path note directly above the query).
🧹 Nitpick comments (1)
docs/feature/fdw/index.md (1)
57-62: Consider a short note about credentials handling.The example inlines
password '*****'. For docs, it may be worth noting whether secrets should be managed via secure settings or tooling to avoid embedding credentials in plain text.
|
(What an insane amount of timeouts we have in the build (https://github.com/crate/cratedb-guide/actions/runs/21951807340/job/63404431908?pr=548). |
Oh. Either it's bad Internet weather today, or SREs are ramping up countermeasures against rogue web crawlers. This will certainly make legitimate link checkers have bad times going forward. |
I'll give them some time then. Thanks for all the feedback! (Build errors are once again about rate-limiting in the link-checks. Addressed in theme.) |
| :::{rubric} System Tables | ||
| ### Set firewall rules | ||
|
|
||
| Ensure outbound firewall rules allow CrateDB → remote DB traffic before |
There was a problem hiding this comment.
Do we need to mention which TCP port (5432) needs to be open?
There was a problem hiding this comment.
we could as an example, but 5432 is not strict, just the default port.
There was a problem hiding this comment.
Thanks. The update 19b68bc includes a relevant comment.
There was a problem hiding this comment.
I wonder what value this content will bring as it seems to be roughly a 1:1 copy of the FDW reference documentation. Without adding any additional content (if needed at all), I'd suggest to keep it as-is (what was the issue here?) or link to the reference content more prominent.
If there is something to add to the existing reference documentation, please improve it instead of creating copies.
Thats a very reasonable concern. And it's valid for a LOT of the pages we have. We haven't come to reorganizing the Reference yet. But if it was up to me, and time enough was available, we would reorganize it significantly. Right now there is way too much overlap across many pages. But this still fits the current direction, and having multiple pages "under construction" in the "All Features" section doesn't give much confidence in the state of the docs tbh. Right now the FDW in Reference is burried under "Administration" which seems odd btw. In flow with the current direction, I would still suggest to add this. If we are to not duplicate info at all, we need another strategic direction and can delete a lot of pages. (which I totally would support - we just don't have the capacity to do it). |
seut
left a comment
There was a problem hiding this comment.
Thats a very reasonable concern. And it's valid for a LOT of the pages we have.
I didn't write the content here itself - I only populated it from Gitbook. The original thought was that in general a lot of the content in Reference is very detailed, and something more easy to get started with and organized for easy discovery was needed.
Only because this was done in the past, does not justify to me to keep up with this. Also, please judge by case, as this is really a 1:1 copy, no further guidance is added here, afaik.
Copying content makes everything more complex to maintain for no good reason, so let us stop here please.
If we really need to include the same content completely in different places, we should imho look into different solution like including remote content, e.g. by https://pypi.org/project/sphinxext-remoteliteralinclude/.
|
IMHO, the current page needs a make-over. If we don't want to gather documentation about FDW in one page, I suggest that we simplify the page to mention that the feature exists and link to the two following pages:
By simplifying, I mean removing everything from "Synopsis" and down, and remove "SQL functions" and "System tables". |
I absolutely agree. Currently, it is obviously just a stub. While I was happy that someone would fill the gaps, it doesn't make sense to add redundant content, which likely has been created by AI-based summarization tools instead of conceiving unique content. If this is the case here, I will come up with a different proposal as we go. |
In this case it was actually a human :-). |
|
I'd like to apologize if it was a human in this case, I don't want to hurt anyones feelings. We've lost a bit track, and also GH-264 did not make much progress 1, so we thought actually improving content was mostly out of scope, because it was merely a rehash of existing material in a different structure. After this converged progressively, it became more difficult to pick any cherries. Footnotes
|
There was a problem hiding this comment.
Hi again. I revisited the review because I didn't do a thorough content comparison before, and because @seut had admonitions about it, also content-wise. I've identified the primary offending section now, and I think the patch will be good to go if we just remove it and more prominently refer to the reference documentation page instead. Thanks!
I think the excellent stepper-based presentation of a typical FDW workflow is absolutely the right choice for this kind of page within the "all features" section that aims to provide an optimal layer above the reference documentation, to improve on guidance matters. It would be sad to leave this behind.
docs/feature/fdw/index.md
Outdated
| Query clauses like `GROUP BY`, `HAVING`, `LIMIT` or `ORDER BY` are executed | ||
| within CrateDB, not within the foreign system. `WHERE` clauses can in some | ||
| circumstances be pushed to the foreign system, but that depends on the | ||
| concrete foreign data wrapper implementation. You can check if this is the | ||
| case by using the {ref}`crate-reference:ref-explain` statement. | ||
|
|
||
| For example, in the following explain output there is a dedicated `Filter` | ||
| node, indicating that the filter is executed within CrateDB: | ||
|
|
||
| ```sql | ||
| CREATE FOREIGN TABLE doc.remote_documents (name text) | ||
| SERVER my_postgresql | ||
| OPTIONS (schema_name 'public', table_name 'documents'); | ||
| EXPLAIN SELECT ts, value FROM remote_readings WHERE device = 'sensor-42'; | ||
| ``` | ||
|
|
||
| ```text | ||
| +--------------------------------------------------------------------------+ | ||
| | QUERY PLAN | | ||
| +--------------------------------------------------------------------------+ | ||
| | Filter[(device = 'sensor-42')] (rows=0) | | ||
| | └ ForeignCollect[doc.remote_readings | [device] | true] (rows=unknown) | | ||
| +--------------------------------------------------------------------------+ | ||
| ``` |
There was a problem hiding this comment.
It looks like this is the primary offending section with a 100% redundancy to the reference documentation page. Can you just remove it?
There was a problem hiding this comment.
Thanks! I'm not entirely sure what exactly you mean should be removed. The whole diff? (92-112)?
Instead of back and forth with me just acting as a scribe, perhaps it's easiest if you just edit the way you can see it being good? 🙏
There was a problem hiding this comment.
It is this original content I am referring to. It seems to have been copied 1:1 to that spot?
cratedb-guide/docs/feature/fdw/index.md
Lines 92 to 112 in f90aa90
There was a problem hiding this comment.
Aha. I see now that the SQL statement is different to probably match the tutorial, and that the paragraph above is the teaser to that, which is probably needed to make sense. Well, it still bears 95% redundancy otherwise, so I'd still vote to remove the section to satisfy corresponding admonitions from colleagues.
|
I put this back to draft as it's not ready for merging. @amotl If you have an exact idea about where to take this, I would appreciate if you do the edits. |
| ## Example | ||
|
|
||
| :::{card} | ||
| :link: https://github.com/crate/cratedb-examples/tree/main/application/roapi | ||
| :link-type: url | ||
| {material-regular}`play_arrow;2em` | ||
| Integrating ROAPI data sources with CrateDB. | ||
| +++ | ||
| Demonstrates how to mount ROAPI data sources as tables in CrateDB | ||
| using the PostgreSQL foreign data wrapper. | ||
| ::: |
There was a problem hiding this comment.
Does it also work the other way round, to mount tables from CrateDB into PostgreSQL? ClickHouse provides a dedicated extension for PostgreSQL, but with standard FDW technologies, CrateDB doesn't even need it?
There was a problem hiding this comment.
@coderabbitai: Please carry that forward into a dedicated issue in crate/crate-clients-tools, or alternatively crate/cratedb-guide if the former doesn't work. Thanks!
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Try to create issue in crate/crate-clients-tools first
gh issue create \
--repo crate/crate-clients-tools \
--title "Explore: Mounting CrateDB tables into PostgreSQL via standard FDW (reverse direction)" \
--body "## Background
This issue was raised in the context of [cratedb-guide PR `#548`](https://github.com/crate/cratedb-guide/pull/548#discussion_r3029349045) by `@amotl`, while documenting the CrateDB Foreign Data Wrapper (FDW) feature.
## Question / Topic
CrateDB currently supports FDW to *read* from remote PostgreSQL servers (or any service speaking the PostgreSQL wire protocol). But does it also work the **other way round** — mounting tables *from CrateDB* into PostgreSQL using standard FDW technologies?
ClickHouse, for comparison, provides a dedicated PostgreSQL extension ([pg_clickhouse](https://github.com/ClickHouse/pg_clickhouse)) for this purpose. However, since CrateDB speaks the PostgreSQL wire protocol, it may be possible to use **standard PostgreSQL FDW** (e.g., \`postgres_fdw\`) to connect PostgreSQL to CrateDB without needing a dedicated extension.
## Potential Scope
- Investigate whether \`postgres_fdw\` (or another standard FDW) can be used in PostgreSQL to mount CrateDB tables.
- Document the steps, limitations, and any known compatibility issues.
- Consider adding a guide or note in [cratedb-guide](https://github.com/crate/cratedb-guide) if the scenario is feasible.
## References
- PR: https://github.com/crate/cratedb-guide/pull/548
- Comment: https://github.com/crate/cratedb-guide/pull/548#discussion_r3029349045
- Requested by: `@amotl`
" 2>&1
Length of output: 55
@amotl, done! I've created the issue in crate/crate-clients-tools:
👉 Explore: Mounting CrateDB tables into PostgreSQL via standard FDW (reverse direction)
It captures the question about whether standard PostgreSQL FDW (e.g., postgres_fdw) can be used to mount CrateDB tables into PostgreSQL — without needing a dedicated extension like ClickHouse's pg_clickhouse — given that CrateDB speaks the PostgreSQL wire protocol. It also includes backlinks to this PR and the originating comment.
🧠 Learnings used
Learnt from: amotl
Repo: crate/cratedb-guide PR: 385
File: docs/connect/java.md:48-51
Timestamp: 2025-10-08T01:34:18.867Z
Learning: CrateDB JDBC driver uses the `jdbc:crate://` protocol scheme but communicates via the PostgreSQL wire protocol on port 5432, just like the PostgreSQL JDBC driver (`jdbc:postgresql://`). Do not confuse the `jdbc:crate://` scheme with other protocol schemes like `crate://` (used by SQLAlchemy dialect for CrateDB) or assume it uses the HTTP endpoint on port 4200.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 408
File: docs/connect/elixir.md:18-22
Timestamp: 2025-10-16T10:58:18.401Z
Learning: When connecting to CrateDB using PostgreSQL-compatible drivers like Postgrex (Elixir), the `database` parameter handling may differ from standard PostgreSQL usage because CrateDB has no notion of databases like PostgreSQL does. CrateDB uses schemas to separate concerns instead. The `database` parameter might need to be omitted or handled differently to avoid connection issues.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 0
File: :0-0
Timestamp: 2025-10-06T13:45:25.891Z
Learning: For the cratedb-guide repository, general discussions and broader documentation concerns should be added to the existing runsheet/tracking ticket (e.g., GH-227) rather than creating separate issues.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 0
File: :0-0
Timestamp: 2025-10-06T16:40:16.322Z
Learning: In the crate/cratedb-guide repository, documentation is organized by content type: files in `docs/feature/`, `docs/integrate/`, and `docs/ingest/` are reference material designed for quick access, while `docs/start/` contains learning path content. Reference material sections use top-right reference layouts for efficient navigation.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 395
File: docs/install/cloud/azure/terraform.md:108-127
Timestamp: 2025-10-10T17:58:26.545Z
Learning: In the crate/cratedb-guide repository, during documentation format conversions (e.g., from reStructuredText to MyST/Markdown), markdownlint warnings such as MD053 (unused reference-style links) should be ignored and not flagged as issues.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 406
File: docs/connect/go.md:110-137
Timestamp: 2025-10-20T21:49:18.785Z
Learning: In the cratedb-guide repository, documentation examples for connecting to CrateDB (e.g., Go examples in docs/connect/go.md) should be kept minimal and focused on demonstrating basic functionality, rather than including comprehensive error handling. Full examples with proper error handling are maintained separately in the cratedb-examples repository.
Learnt from: amotl
Repo: crate/cratedb-guide PR: 263
File: docs/integrate/kafka/docker-python.md:112-120
Timestamp: 2025-08-22T18:11:12.776Z
Learning: In documentation and tutorial repositories like cratedb-guide, code examples should prioritize simplicity, clarity, and educational value over production-ready features. Comprehensive error handling, extensive validation, and other production concerns can distract from the main learning objectives and make examples harder to follow. Review suggestions should focus on correctness and clarity rather than production hardening.
Learnt from: bmunkholm
Repo: crate/cratedb-guide PR: 340
File: docs/home/index.md:84-97
Timestamp: 2025-09-25T19:31:54.320Z
Learning: In the CrateDB Guide docs (MyST), the CrateDB Cloud card on the homepage should link to `getting-started` using `:link-type: ref` instead of the previous `cloud:index` intersphinx target. This change was implemented in PR `#340` to direct users to the getting started section rather than directly to the Cloud documentation.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
docs/feature/fdw/index.md (1)
82-84:⚠️ Potential issue | 🟡 MinorUse fully-qualified table name or note search_path requirement.
The query uses
remote_readingswithout schema qualification. Since the foreign table defaults todoc.remote_readings, the query should either use the fully-qualified name or include a note thatdocmust be in thesearch_path.🔧 Proposed fix
SELECT ts, value -FROM remote_readings +FROM doc.remote_readings WHERE device = 'sensor-42';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/feature/fdw/index.md` around lines 82 - 84, The example query uses remote_readings without a schema, which is ambiguous because the foreign table lives in schema doc; update the example to use the fully-qualified name doc.remote_readings or add a short note that doc must be present in the session search_path before running the query (referencing the table name remote_readings and schema doc to locate the example).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@docs/feature/fdw/index.md`:
- Around line 82-84: The example query uses remote_readings without a schema,
which is ambiguous because the foreign table lives in schema doc; update the
example to use the fully-qualified name doc.remote_readings or add a short note
that doc must be present in the session search_path before running the query
(referencing the table name remote_readings and schema doc to locate the
example).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 68e4482e-3bd6-429a-b51f-02129379c489
📒 Files selected for processing (1)
docs/feature/fdw/index.md
About
Updated FDW page we had in gitbook with minor updates.
Preview
References