
floware init script #295

Open
rootflo-hardik wants to merge 4 commits into develop from floware_init_script

Conversation

rootflo-hardik (Contributor) commented Jun 15, 2026
  • for creating db on startup

Summary by CodeRabbit

  • Chores
    • Updated the container startup flow to use a dedicated initialization script before launching the server.
  • New Features
    • Added optional automatic Postgres database provisioning on startup (set FLOWARE_DB_CREATE=true to create the target database if missing and ensure required extensions are installed).
  • Bug Fixes
    • Improved embedding/vector index creation and cleanup in migrations to make upgrades and rollbacks more reliable.
    • Made token key loading safer by only decoding keys in development mode and handling empty key inputs gracefully.

coderabbitai Bot commented Jun 15, 2026


📝 Walkthrough

This PR introduces three independent changes: a floware-init.sh container entrypoint script that optionally initializes the Postgres database and launches the server (wired into the Dockerfile as ENTRYPOINT), a refactored Alembic migration that eliminates AUTOCOMMIT connection mode and CONCURRENTLY index syntax in favor of op.get_bind() with SET LOCAL, and a TokenService update that conditions private/public key loading to development mode only with safe null handling for empty keys.

Changes

Floware Container Init Script

Layer: Init script: env setup, DB creation, and server launch
File(s): wavefront/server/scripts/floware-init.sh
Summary: New shell script that enables set -e, prepends /app/.venv/bin to PATH, conditionally creates the Floware Postgres DB via inline psycopg2 Python guarded by FLOWARE_DB_CREATE=true, connects to the target DB_NAME and ensures the vector extension exists, then starts the server with uv run server.py.

Layer: Dockerfile: copy, chmod, and ENTRYPOINT wiring
File(s): wavefront/server/docker/floware.Dockerfile
Summary: Copies floware-init.sh into /app/scripts/, marks it executable, and sets it as the container ENTRYPOINT, replacing the prior CMD-based server startup.
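The provisioning gate and the extension step can be sketched in plain Python. This is a minimal sketch, not the script's actual code: it assumes the inline Python reads the same DB_* variables and acts only when FLOWARE_DB_CREATE is exactly `true`. The helper names `db_create_enabled`, `ensure_vector_extension`, and `provision_if_enabled` are hypothetical, and `connect` stands in for `psycopg2.connect` so the logic can be exercised without a live server. The CREATE DATABASE step on the `postgres` admin database is left out; its race-safe form is discussed in the review comments.

```python
from typing import Any, Callable, Mapping


def db_create_enabled(env: Mapping[str, str]) -> bool:
    """Return True only when FLOWARE_DB_CREATE is set to the string 'true'."""
    # Assumption: exact match on 'true', as described in the PR summary.
    return env.get("FLOWARE_DB_CREATE") == "true"


def ensure_vector_extension(conn: Any) -> None:
    """Install pgvector in the connected database if it is not already present."""
    with conn.cursor() as cur:
        cur.execute("CREATE EXTENSION IF NOT EXISTS vector")


def provision_if_enabled(env: Mapping[str, str], connect: Callable[..., Any]) -> bool:
    """Connect to DB_NAME and ensure the vector extension, if provisioning is enabled.

    `connect` is expected to behave like psycopg2.connect (hypothetical injection point).
    """
    if not db_create_enabled(env):
        return False
    conn = connect(
        host=env["DB_HOST"],
        user=env["DB_USERNAME"],
        password=env["DB_PASSWORD"],
        dbname=env["DB_NAME"],
    )
    try:
        conn.autocommit = True  # run the DDL without an explicit transaction
        ensure_vector_extension(conn)
    finally:
        conn.close()
    return True
```

Injecting `connect` keeps the gate logic testable; in the container it would simply be `provision_if_enabled(os.environ, psycopg2.connect)`.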

Alembic Migration Index Refactor

Layer: Index creation and removal refactoring
File(s): wavefront/server/modules/db_repo_module/db_repo_module/alembic/versions/2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.py
Summary: upgrade() switches from AUTOCOMMIT connection mode to direct access via op.get_bind(), sets maintenance_work_mem via SET LOCAL, and creates three indexes using CREATE INDEX IF NOT EXISTS (two HNSW cosine indexes with explicit vector dimension casts to 512 and 1024, plus a GIN index on token). downgrade() drops the same indexes with DROP INDEX IF EXISTS, removing the prior CONCURRENTLY syntax.
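The migration change can be illustrated with a small statement builder. This is a sketch only: the table, column, and index names and the maintenance_work_mem value are hypothetical placeholders, since the PR page does not show them, and the GIN index assumes token has a type with a default GIN operator class (for example tsvector or an array). CREATE INDEX CONCURRENTLY cannot run inside a transaction block, which is why the old version needed AUTOCOMMIT; plain CREATE INDEX IF NOT EXISTS can run inside Alembic's migration transaction, and SET LOCAL confines the memory setting to that transaction. pgvector can only index a fixed dimension, so an untyped vector column is indexed through a cast expression.

```python
from typing import Callable, List

# Hypothetical placeholders: the real migration's names are not shown on this PR page.
IDX_512 = "ix_embedding_hnsw_512"
IDX_1024 = "ix_embedding_hnsw_1024"
IDX_TOKEN = "ix_token_gin"


def hnsw_cosine_index_sql(name: str, table: str, column: str, dims: int) -> str:
    """CREATE INDEX IF NOT EXISTS for an HNSW cosine index on a fixed-dimension cast."""
    return (
        f"CREATE INDEX IF NOT EXISTS {name} ON {table} "
        f"USING hnsw (({column}::vector({dims})) vector_cosine_ops)"
    )


def upgrade_statements(table: str, column: str) -> List[str]:
    """Statements for upgrade(); none use CONCURRENTLY, so all can share one transaction."""
    return [
        # SET LOCAL only lasts until the migration transaction ends (value is an assumption).
        "SET LOCAL maintenance_work_mem = '512MB'",
        # If one column mixes 512- and 1024-dim rows, each index also needs a
        # partial WHERE predicate selecting the matching rows (omitted here).
        hnsw_cosine_index_sql(IDX_512, table, column, 512),
        hnsw_cosine_index_sql(IDX_1024, table, column, 1024),
        f"CREATE INDEX IF NOT EXISTS {IDX_TOKEN} ON {table} USING gin (token)",
    ]


def downgrade_statements() -> List[str]:
    """Drop the indexes in reverse order; IF EXISTS makes a rerun after partial failure safe."""
    return [f"DROP INDEX IF EXISTS {name}" for name in (IDX_TOKEN, IDX_1024, IDX_512)]


def run(execute: Callable[[str], object], statements: List[str]) -> None:
    # In a migration, `execute` could be op.execute, or exec_driver_sql on the
    # connection returned by op.get_bind() (SQLAlchemy 1.4+).
    for stmt in statements:
        execute(stmt)
```

The trade-off of dropping CONCURRENTLY is that a plain CREATE INDEX blocks writes to the table while the index builds, which is usually acceptable during a controlled migration.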

Token Service Dev-Mode Key Loading

Layer: Conditional key initialization and safe loading
File(s): wavefront/server/modules/auth_module/auth_module/services/token_service.py
Summary: TokenService now initializes private_key and public_key only when running in development mode; in production mode both are set to None. The _load_key helper safely returns None for empty/falsy key inputs instead of attempting base64 decoding.
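The dev-mode gating can be sketched as follows, assuming keys are supplied as base64-encoded strings and that `_load_key` returns the decoded bytes. `load_key` and `TokenKeys` are hypothetical stand-ins, not the auth module's actual code; outside development mode the sketch assumes signing goes through a KMS service, so no local keys are kept.

```python
import base64
from typing import Optional


def load_key(raw: Optional[str]) -> Optional[bytes]:
    """Decode a base64-encoded key; empty or missing input yields None instead of an error."""
    if not raw:
        return None
    return base64.b64decode(raw)


class TokenKeys:
    """Hypothetical stand-in for TokenService's key handling (not the real class)."""

    def __init__(self, private_key: Optional[str], public_key: Optional[str], is_dev: bool):
        self.is_dev = is_dev
        # Keys are decoded only in development mode; otherwise both stay None.
        self.private_key = load_key(private_key) if is_dev else None
        self.public_key = load_key(public_key) if is_dev else None
```

Note that this version still lets an empty dev-mode key become None silently, which is the gap the review below points out.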

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • rootflo/wavefront#274: Modifies the same Alembic migration file 2026_04_10_1000-e8f2a1c3b5d9_add_hnsw_index_on_embeddings.py, refactoring how HNSW/GIN indexes are created and dropped by changing connection mode and index creation concurrency.
  • rootflo/wavefront#234: Adds a container init shell script that conditionally creates the PostgreSQL database via inline psycopg2 before launching the server, using the same pattern as floware-init.sh.

Suggested reviewers

  • vishnurk6247
  • vizsatiz

Poem

🐇 A script boots the container with care,
Creating databases in the Postgres layer fair,
While migrations tune indexes with grace,
And dev-mode keys know their rightful place.
Three changes converge in harmonious flow—
Floware flourishes brighter below! 🌱

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Docstring Coverage: ⚠️ Warning
  Explanation: Docstring coverage is 0.00%, which is below the required threshold of 80.00%.
  Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check: ❓ Inconclusive
  Explanation: The title "floware init script" is vague and generic. While it names a component (floware init script), it does not clearly convey the main purpose or scope of the changes, which include database initialization, Dockerfile modifications, Alembic migrations, and token service updates.
  Resolution: Consider a more descriptive title such as "Add floware init script for database creation and startup configuration" or "Initialize floware database on startup with KMS configuration" to better reflect the changeset's scope.
✅ Passed checks (3 passed)
Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@wavefront/server/scripts/floware-init.sh`:
- Around line 11-16: The psycopg2.connect() call is missing the port and
connect_timeout parameters, which will cause failures on non-default PostgreSQL
deployments and potential indefinite hangs on network issues. Add the port
parameter by reading from the DB_PORT environment variable (similar to how
DB_HOST, DB_USERNAME, DB_PASSWORD, and DB_NAME are retrieved) and pass it to the
psycopg2.connect() function. Additionally, add a connect_timeout parameter with
a reasonable bounded value (such as 10 seconds) to prevent the connection from
hanging indefinitely if there are network connectivity problems.
- Around line 19-23: The database creation logic at lines 19-23 has a TOCTOU
vulnerability where concurrent startups can both see the database as missing,
causing one to fail when executing CREATE DATABASE. Replace the
check-then-create pattern (the cur.fetchone() conditional) with a try-except
block that directly attempts to create the database and catches
psycopg2.errors.DuplicateDatabase when another process creates it concurrently.
Additionally, replace the unsafe string concatenation with dbname.replace() with
psycopg2.sql.Identifier() to safely parameterize the database identifier,
eliminating the manual escaping logic.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a1402e1d-a097-4c2b-bee8-49a5fcc62863

📥 Commits

Reviewing files that changed from the base of the PR and between 8fb1d9b and 06afbbf.

📒 Files selected for processing (2)
  • wavefront/server/docker/floware.Dockerfile
  • wavefront/server/scripts/floware-init.sh

Comment on lines +11 to +16
host = os.environ['DB_HOST']
user = os.environ['DB_USERNAME']
password = os.environ['DB_PASSWORD']
dbname = os.environ['DB_NAME']

conn = psycopg2.connect(host=host, user=user, password=password, dbname='postgres')


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "floware-init.sh" -type f

Repository: rootflo/wavefront

Length of output: 104


🏁 Script executed:

cat -n ./wavefront/server/scripts/floware-init.sh | head -30

Repository: rootflo/wavefront

Length of output: 1070


🏁 Script executed:

rg "DB_PORT" --type-list | head -20

Repository: rootflo/wavefront

Length of output: 451


🏁 Script executed:

rg "DB_PORT" -A 2 -B 2

Repository: rootflo/wavefront

Length of output: 4821


🏁 Script executed:

rg "connect_timeout" -A 2 -B 2

Repository: rootflo/wavefront

Length of output: 1829


🏁 Script executed:

cat -n ./wavefront/server/packages/flo_cloud/flo_cloud/postgres/postgres.py | grep -A 5 -B 5 "connect_timeout"

Repository: rootflo/wavefront

Length of output: 448


Use DB_PORT and a bounded connect timeout in the admin connection.

The psycopg2.connect() call on line 16 is missing both port and connect_timeout parameters. Since DB_PORT is consumed by the runtime configuration across the codebase (alembic, celery workers, config.ini files), the init script will fail on non-5432 deployments. Additionally, without a bounded connect_timeout, startup can hang indefinitely if there are network issues.

Suggested patch
 import psycopg2, os
 
 host = os.environ['DB_HOST']
+port = os.environ.get('DB_PORT', '5432')
 user = os.environ['DB_USERNAME']
 password = os.environ['DB_PASSWORD']
 dbname = os.environ['DB_NAME']
+connect_timeout = int(os.environ.get('DB_CONNECT_TIMEOUT', '5'))
 
-conn = psycopg2.connect(host=host, user=user, password=password, dbname='postgres')
+conn = psycopg2.connect(
+    host=host,
+    port=port,
+    user=user,
+    password=password,
+    dbname='postgres',
+    connect_timeout=connect_timeout,
+)
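Building the admin connection arguments in a small helper makes the defaults explicit and testable. This is a sketch under assumptions: `admin_connect_kwargs` is hypothetical, DB_CONNECT_TIMEOUT is the variable introduced by the suggested patch (not confirmed elsewhere in the repository), and the 10-second default follows the AI prompt's wording rather than the patch's 5. libpq treats a connect_timeout of zero or less as "wait indefinitely", so the helper rejects such values.

```python
from typing import Any, Dict, Mapping


def admin_connect_kwargs(env: Mapping[str, str], default_timeout: int = 10) -> Dict[str, Any]:
    """Build psycopg2.connect() keyword arguments for the 'postgres' admin database.

    DB_PORT falls back to 5432; DB_CONNECT_TIMEOUT (hypothetical, from the suggested
    patch) falls back to `default_timeout` seconds.
    """
    port = int(env.get("DB_PORT", "5432"))
    timeout = int(env.get("DB_CONNECT_TIMEOUT", str(default_timeout)))
    if timeout <= 0:
        # libpq interprets zero or negative as no timeout, which defeats the purpose.
        raise ValueError("DB_CONNECT_TIMEOUT must be a positive number of seconds")
    return {
        "host": env["DB_HOST"],
        "port": port,
        "user": env["DB_USERNAME"],
        "password": env["DB_PASSWORD"],
        "dbname": "postgres",
        "connect_timeout": timeout,
    }
```

The result is passed straight through, e.g. `psycopg2.connect(**admin_connect_kwargs(os.environ))`; parsing DB_PORT with int() also surfaces a malformed value before any connection attempt.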

Comment on lines +19 to +23
cur.execute(\"SELECT 1 FROM pg_database WHERE datname = %s\", (dbname,))

if not cur.fetchone():
    cur.execute('CREATE DATABASE \"' + dbname.replace('\"', '\"\"') + '\"')
    print('Database created')


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "floware-init.sh" -type f

Repository: rootflo/wavefront

Length of output: 104


🏁 Script executed:

cat -n ./wavefront/server/scripts/floware-init.sh | head -40

Repository: rootflo/wavefront

Length of output: 1143


🏁 Script executed:

python3 << 'PY'
import psycopg2
from psycopg2 import errors

# Check if DuplicateDatabase exists in psycopg2.errors
print("DuplicateDatabase available:", hasattr(errors, 'DuplicateDatabase'))

# Also check sql module availability
from psycopg2 import sql
print("sql.Identifier available:", hasattr(sql, 'Identifier'))
print("sql.SQL available:", hasattr(sql, 'SQL'))

# Verify the exception inheritance
if hasattr(errors, 'DuplicateDatabase'):
    print("DuplicateDatabase base classes:", errors.DuplicateDatabase.__bases__)
PY

Repository: rootflo/wavefront

Length of output: 182


🌐 Web query:

psycopg2 DuplicateDatabase exception psycopg2.errors

💡 Result:

The psycopg2.errors.DuplicateDatabase exception is a specific exception class provided by the psycopg2 library to handle the PostgreSQL SQLSTATE error 42P04 [1][2]. This error occurs when you attempt to execute a CREATE DATABASE statement for a database that already exists [1]. Because PostgreSQL does not support a native IF NOT EXISTS clause for CREATE DATABASE, this exception is commonly used to handle such attempts gracefully in application code [1][3]. Key details:

- Module: psycopg2.errors [2][4]
- PostgreSQL SQLSTATE: 42P04 [1][5]
- Exception hierarchy: it is a subclass of ProgrammingError [2][6]. Because it inherits from the standard DB-API exception classes, catching the broader ProgrammingError also catches DuplicateDatabase, which keeps legacy code working [7][4].
- Usage example:

    import psycopg2
    from psycopg2 import errors
    # ... (connection setup)
    try:
        with conn.cursor() as cur:
            cur.execute("CREATE DATABASE my_database")
    except errors.DuplicateDatabase:
        # Handle the case where the database already exists
        pass

The exception is non-retriable: the statement is aborted immediately, though the database connection remains open and usable [1].

🌐 Web query:

psycopg2 sql.Identifier sql.SQL module documentation

💡 Result:

The psycopg2.sql module provides utilities for generating dynamic SQL queries safely and conveniently, specifically for parts of queries (such as table or column names) that cannot be parameterized using standard execute arguments [1][2].

sql.SQL: The sql.SQL class represents a literal snippet of an SQL query. It is primarily used to create a template with {} placeholders into which variable parts (such as Identifiers or Literals) can be merged using the format method [1][3]. Key characteristics:
- It is a Composable object [2].
- The string passed to SQL is not escaped, so it should contain only constant SQL code or template structures, never raw user-provided data [1][3].
- It supports join and format methods to build complex queries dynamically [1][3].

Example:

    query = sql.SQL("SELECT * FROM {}").format(sql.Identifier('my_table'))

sql.Identifier: The sql.Identifier class represents a PostgreSQL identifier, such as a table or column name. It ensures that the identifier is correctly escaped (double-quoted) to prevent SQL injection and to handle reserved keywords or special characters properly [1][4]. Key characteristics:
- It can accept one or more strings to represent a qualified name (a dot-separated sequence of identifiers), such as schema.table [1][4][5].
- Since version 2.8, it supports passing multiple strings directly to the constructor [5][2].

Example:

    # Represents "schema"."table"
    table = sql.Identifier('schema', 'table')

Together, these objects let you keep the query structure fixed while safely injecting dynamic identifiers, separate from query parameters (which use %s placeholders) [1][5].

🌐 Web query:

PostgreSQL CREATE DATABASE IF NOT EXISTS clause support

💡 Result:

PostgreSQL does not natively support an IF NOT EXISTS clause for the CREATE DATABASE statement [1][2][3]. As of June 2026, the official PostgreSQL documentation for the CREATE DATABASE command does not include this functionality [4][5][6]. Because CREATE DATABASE cannot be executed within a transaction block, you cannot simply wrap it in a standard PL/pgSQL DO block with exception handling to achieve this behavior directly [2]. However, you can simulate it using one of the following approaches:

1. Using psql with \gexec: conditionally execute CREATE DATABASE after checking the pg_database system catalog [2][3]:

    SELECT 'CREATE DATABASE my_database' WHERE NOT EXISTS (SELECT FROM pg_database WHERE datname = 'my_database')\gexec

2. Using an external shell script: check for the database's existence via shell commands before attempting to create it [3]:

    psql -U postgres -tc "SELECT 1 FROM pg_database WHERE datname = 'my_database'" | grep -q 1 || psql -U postgres -c "CREATE DATABASE my_database"

3. Using dblink (for internal procedural logic): to do this within a stored procedure, use the dblink extension to execute the command outside the current transaction [2]:

    CREATE EXTENSION IF NOT EXISTS dblink;
    DO $$
    BEGIN
      PERFORM dblink_exec('', 'CREATE DATABASE my_database');
    EXCEPTION WHEN duplicate_database THEN
      RAISE NOTICE 'Database already exists, skipping.';
    END
    $$;

Address the database creation race condition.

Lines 19-23 contain a TOCTOU (time-of-check to time-of-use) race: concurrent container startups can both see the database as missing, and the second CREATE DATABASE then fails, causing an avoidable startup failure.

Use try-except with psycopg2.errors.DuplicateDatabase instead of the check-then-create pattern. This approach also improves security by replacing manual string concatenation with sql.Identifier() for safe identifier quoting.

Suggested patch
-import psycopg2, os
+import psycopg2, os
+from psycopg2 import errors, sql
@@
-cur.execute(\"SELECT 1 FROM pg_database WHERE datname = %s\", (dbname,))
-
-if not cur.fetchone():
-    cur.execute('CREATE DATABASE \"' + dbname.replace('\"', '\"\"') + '\"')
-    print('Database created')
-else:
-    print('Database already exists, skipping')
+try:
+    cur.execute(sql.SQL('CREATE DATABASE {}').format(sql.Identifier(dbname)))
+    print('Database created')
+except errors.DuplicateDatabase:
+    print('Database already exists, skipping')

coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@wavefront/server/modules/auth_module/auth_module/services/token_service.py`:
- Around line 39-40: Add explicit validation in the TokenService.__init__ method
after the private_key and public_key are loaded. When self.is_dev is True, check
that both self.private_key and self.public_key are not None; if either is None,
raise an appropriate exception immediately during initialization. This prevents
the None values from propagating into jwt.encode and jwt.decode calls at request
time, ensuring failures are caught at startup rather than during runtime
authentication operations.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7d5b8bce-25dd-47cd-bb10-38759eac1de7

📥 Commits

Reviewing files that changed from the base of the PR and between b36c054 and f183f64.

📒 Files selected for processing (1)
  • wavefront/server/modules/auth_module/auth_module/services/token_service.py

Comment on lines +39 to +40
self.private_key = self._load_key(private_key) if self.is_dev else None
self.public_key = self._load_key(public_key) if self.is_dev else None


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fail fast when dev-mode keys are missing to avoid auth-path runtime failures.

After the change at lines 49-51, an empty key config now becomes None, but lines 39-40 store that None without validation. In dev mode, this can propagate into jwt.encode/jwt.decode and fail at request time instead of at startup. Add explicit validation in __init__ when self.is_dev is true.

Suggested fix
         self.is_dev = app_env == 'dev' or (kms_service is None)
         self.private_key = self._load_key(private_key) if self.is_dev else None
         self.public_key = self._load_key(public_key) if self.is_dev else None
+        if self.is_dev and (not self.private_key or not self.public_key):
+            raise ValueError(
+                'private_key and public_key must be configured when running without KMS'
+            )

Also applies to: 49-51
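The fail-fast check proposed above can be sketched as a standalone function, assuming keys arrive already decoded (or as None from `_load_key`) and reusing the `is_dev` expression from the suggested fix. `resolve_keys` and `KeyConfigError` are hypothetical names, not part of the auth module.

```python
from typing import Any, Optional, Tuple


class KeyConfigError(ValueError):
    """Raised at startup when dev-mode signing keys are missing (hypothetical name)."""


def resolve_keys(
    app_env: str,
    kms_service: Any,
    private_key: Optional[bytes],
    public_key: Optional[bytes],
) -> Tuple[bool, Optional[bytes], Optional[bytes]]:
    """Return (is_dev, private_key, public_key), failing fast if dev mode lacks keys."""
    # Same condition as the suggested fix: dev env, or no KMS service available.
    is_dev = app_env == "dev" or kms_service is None
    if not is_dev:
        return is_dev, None, None
    if not private_key or not public_key:
        raise KeyConfigError(
            "private_key and public_key must be configured when running without KMS"
        )
    return is_dev, private_key, public_key
```

Subclassing ValueError keeps the behavior compatible with code that expects the suggested fix's ValueError, while giving startup logs a more specific error type.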

