
Fix embeddings perf #157

Open
donhardman wants to merge 25 commits into master from fix/embeddings-perf

@donhardman
Member

No description provided.

@github-actions

github-actions Bot commented Apr 20, 2026

Linux debug test results

  8 files    8 suites   14m 45s ⏱️
523 tests 511 ✅ 12 💤 0 ❌
537 runs  525 ✅ 12 💤 0 ❌

Results for commit 0b8835a.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Apr 20, 2026

Linux release test results

  8 files    8 suites   8m 1s ⏱️
523 tests 511 ✅ 12 💤 0 ❌
537 runs  525 ✅ 12 💤 0 ❌

Results for commit 0b8835a.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Apr 20, 2026

Windows test results

  5 files    5 suites   19m 35s ⏱️
501 tests 489 ✅ 12 💤 0 ❌
509 runs  497 ✅ 12 💤 0 ❌

Results for commit 0b8835a.

♻️ This comment has been updated with latest results.

@donhardman donhardman changed the title from "Fix embeddings per" to "Fix embeddings perf" Apr 20, 2026
@donhardman donhardman requested a review from sanikolaev April 21, 2026 09:39
@donhardman donhardman force-pushed the fix/embeddings-perf branch from 5767e6b to 70627da April 22, 2026 17:36
@github-actions

clt

❌ CLT tests in test/clt-tests/mcl/
✅ OK: 19
❌ Failed: 4
⏳ Duration: 224s
👉 Check Action Results for commit 110802e

Failed tests:

test/clt-tests/mcl/auto-embeddings-backup-restore.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_backup (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title, content, KNN_DIST() as distance FROM test_backup WHERE KNN(vec, 3, 'artificial intelligence') ORDER BY distance"
––– output –––
OK
––– input –––
manticore-backup --version | grep -c "Manticore Backup"
––– output –––
OK
––– input –––
mkdir -p /tmp/backup && chmod 777 /tmp/backup; echo $?
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --tables=test_backup 2>&1 | grep -c "Backing up table"
––– output –––
OK
––– input –––
ls -d /tmp/backup/backup-* | wc -l
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FREEZE test_backup"
––– output –––
+-----------------------------------------------------+-----------------------------------------------------+
| file                                                | normalized                                          |
+-----------------------------------------------------+-----------------------------------------------------+
| /var/lib/manticore/test_backup/test_backup.0.spc    | /var/lib/manticore/test_backup/test_backup.0.spc    |
| /var/lib/manticore/test_backup/test_backup.0.spd    | /var/lib/manticore/test_backup/test_backup.0.spd    |
| /var/lib/manticore/test_backup/test_backup.0.spds   | /var/lib/manticore/test_backup/test_backup.0.spds   |
| /var/lib/manticore/test_backup/test_backup.0.spe    | /var/lib/manticore/test_backup/test_backup.0.spe    |
| /var/lib/manticore/test_backup/test_backup.0.sph    | /var/lib/manticore/test_backup/test_backup.0.sph    |
| /var/lib/manticore/test_backup/test_backup.0.sphi   | /var/lib/manticore/test_backup/test_backup.0.sphi   |
| /var/lib/manticore/test_backup/test_backup.0.spi    | /var/lib/manticore/test_backup/test_backup.0.spi    |
- | /var/lib/manticore/test_backup/test_backup.0.spidx  | /var/lib/manticore/test_backup/test_backup.0.spidx  |
+ | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
- | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
+ | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
- | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
+ | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
- | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
+ | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
- | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
+ | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
- | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
+ | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
- | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
+ +-----------------------------------------------------+-----------------------------------------------------+
- +-----------------------------------------------------+-----------------------------------------------------+
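Reading this diff with "-" as expected and "+" as actual (the same convention the later steps use, e.g. `- 1` / `+ 0` for the restore count), the only real change is that `test_backup.0.spidx`, the secondary index file, is missing from the actual FREEZE list; every other row is just shifted up by one. This may be related to the `daemon requires secondary library v19 (trying to load v20)` warning that shows up in the other failing tests, though this log alone does not confirm it. When a long listing diff looks this noisy, comparing the two lists as sets makes a shift-only difference obvious. A small bash sketch; `missing_entries` is an illustrative helper, not part of the test suite:

```shell
#!/usr/bin/env bash
# Print the lines that appear in the expected list but not in the actual one.
# Both arguments are files with one entry per line; comm(1) requires sorted
# input, so each side is sorted through process substitution (bash-only).
missing_entries() {
  local expected="$1" actual="$2"
  comm -23 <(sort "$expected") <(sort "$actual")
}
```

Fed the expected and actual file lists from the diff above, this would print only `/var/lib/manticore/test_backup/test_backup.0.spidx`.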
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "UNFREEZE test_backup"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysqldump -h0 -P9306 manticore test_backup > /tmp/logical_backup.sql 2>/dev/null; echo $?
––– output –––
OK
––– input –––
grep -c "INSERT INTO" /tmp/logical_backup.sql
––– output –––
OK
––– input –––
searchd --stopwait > /dev/null 2>&1; echo $?
––– output –––
OK
––– input –––
rm -f /etc/manticoresearch/manticore.conf; rm -rf /var/lib/manticore/*; echo "Cleaned for restore"
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep -c "backup-"
––– output –––
OK
––– input –––
BACKUP_NAME=$(manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep backup- | awk '{print $1}' | head -1)
manticore-backup --backup-dir=/tmp/backup --restore=$BACKUP_NAME 2>&1 | grep -c "Starting to restore"
––– output –––
- 1
+ 0
––– input –––
searchd > /dev/null 2>&1; echo $?
––– output –––
- 0
+ 1
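The restore step reports 0 matches for "Starting to restore" even though the listing step just before it passed, and searchd then exits with 1 (most likely because `/etc/manticoresearch/manticore.conf` was deleted earlier and never restored). From that point every step fails with `ERROR 2003`. This log does not show what `manticore-backup --restore` prints, so it cannot confirm whether `awk '{print $1}'` picks the right column. One way to take that parsing out of the picture is to derive the name from the backup directory itself, which the test already lists with `ls -d /tmp/backup/backup-*`, and to fail loudly when nothing is found instead of running with an empty `--restore=`. A bash sketch, assuming the `backup-*` directory names sort chronologically; `resolve_backup_name` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Echo the name (not the full path) of the newest backup-* directory under
# the given backup dir, or return 1 with a message on stderr if there is none.
# Assumption: names are timestamped, so lexical order is chronological order.
resolve_backup_name() {
  local dir="$1" latest="" candidate
  for candidate in "$dir"/backup-*; do
    # Without nullglob an unmatched glob stays literal; skip non-directories.
    [ -d "$candidate" ] || continue
    latest="$candidate"  # glob results are sorted, so the last match wins
  done
  if [ -z "$latest" ]; then
    echo "no backup-* directory found in $dir" >&2
    return 1
  fi
  basename "$latest"
}
```

In the test this would be used as `BACKUP_NAME=$(resolve_backup_name /tmp/backup) || exit 1` followed by `manticore-backup --backup-dir=/tmp/backup --restore="$BACKUP_NAME"`.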
––– input –––
echo "Waiting for searchd to start"; sleep 3
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        3 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- +------+
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE test_backup ADD COLUMN new_field INTEGER"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "DESC test_backup" | grep "new_field"
––– output –––
- | new_field | uint         | columnar fast_fetch     |
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_copy (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_copy (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2),
    (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_copy"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        4 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_copy; OPTIMIZE TABLE test_copy OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
test/clt-tests/mcl/auto-embeddings-syntax-check.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
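A side note on the vector extraction used in this step (repeated in the voyage and openai files): the `sed` plus `grep -E '^[0-9]+(\.[0-9]+)?$'` pipeline isolates the digits of each number and keeps only unsigned tokens, so a leading minus sign lands on its own line and is dropped. Negative components are then compared as positive ones. Since |a_i||b_i| >= a_i b_i while the norms stay the same, this can only raise the cosine similarity, which makes the `> 0.99` check weaker than it looks. Exponent notation, if the output ever uses it, would also be split into two bogus components. A sign-aware bash sketch, assuming the value comes back as numbers separated by commas and/or spaces (the exact output format is not visible in this log):

```shell
#!/usr/bin/env bash
# Emit one signed number per line, normalized to 5 decimals.
# The pattern keeps an optional minus sign and an optional exponent, so
# "-0.12" stays negative and "1.5e-05" stays a single component.
extract_vector() {
  grep -oE -e '-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?' |
    awk '{ printf "%.5f\n", $1 }'
}

# Cosine similarity of two vectors stored one component per line.
# Prints "nan" and returns non-zero if either vector has zero norm.
cosine_similarity() {
  awk '
    NR == FNR { a[NR] = $1; suma2 += $1 * $1; next }
    { dot += a[FNR] * $1; sumb2 += $1 * $1 }
    END {
      if (suma2 == 0 || sumb2 == 0) { print "nan"; exit 1 }
      print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$1" "$2"
}
```

Usage would look like `mysql -h0 -P9306 -N -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | extract_vector > /tmp/vector1.txt`, where `-N` (`--skip-column-names`) replaces the `grep -v embedding` header filter.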
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test__invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test__no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test__no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
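In the voyage variants these two steps disagree on the table name. The first checks for `test_voyage_no_from` but inserts into `test__no_from`, and the second runs `SHOW TABLES LIKE 'test__no_from'` but greps for `test_voyage_no_from`. No such table is created earlier in the file, so both steps print "skipped" and the mismatch stays hidden. Together with `test__invalid_field` using the OpenAI model name `text-embedding-ada-002` under the `voyage/` prefix, this looks like a leftover from deriving the file from the OpenAI one (an inference from the names only). Keeping the name in one variable avoids this kind of drift. A bash sketch; `run_sql`, `table_exists` and `insert_if_exists` are hypothetical helpers, not part of the suite:

```shell
#!/usr/bin/env bash
# Thin wrapper around the mysql client used throughout these tests.
run_sql() {
  mysql -h0 -P9306 -e "$1"
}

# Succeeds only if a table with exactly this name is listed by SHOW TABLES.
# grep -w keeps "test_voyage_no_from" from matching "test_voyage_no_from_v2".
table_exists() {
  run_sql "SHOW TABLES LIKE '$1'" | grep -qw -- "$1"
}

# Same logic as the step above, but the check and the INSERT share one name.
insert_if_exists() {
  local table="$1"
  if table_exists "$table"; then
    run_sql "INSERT INTO $table (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"
    echo "insert_result: $?"
  else
    echo "insert_result: skipped (table not created)"
  fi
}
```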
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 18:19:22.112 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
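This check, and the identical one in auto-embeddings-voyage-remote.rec, fails only because `grep WARNING` now also catches `Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)`; the expected table warning is still present. If that version mismatch is an environment issue outside this PR, narrowing the filter to table-level warnings (and dropping any timestamp prefix) keeps the expected output stable. If it is a packaging regression, it is better fixed than filtered, since it may also explain the missing `.spidx` file in the backup test. A bash sketch, assuming the relevant lines always contain `WARNING: table '<name>':` as in the expected output; `table_warnings` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Keep only table-level warnings from searchd output, without the
# "[timestamp] [thread]" prefix that other log lines may carry.
table_warnings() {
  grep -oE "WARNING: table '[^']+': .*"
}
```

In the test this would replace the plain filter as `searchd 2>&1 | table_warnings`.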
test/clt-tests/mcl/auto-embeddings-voyage-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test__invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test__no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test__no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 18:19:27.961 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
test/clt-tests/mcl/auto-embeddings-openai-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title, content' API_KEY='${OPENAI_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_openai_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title' API_KEY='${OPENAI_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_openai_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_openai_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$OPENAI_API_KEY" ] && [ "$OPENAI_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_openai_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${OPENAI_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_plain {
    type = rt
    path = /var/lib/manticore/test_openai_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"openai/text-embedding-ada-002","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_openai_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_no_key {
    type = rt
    path = /var/lib/manticore/test_openai_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"openai/text-embedding-ada-002","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 18:18:24.995 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING

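In the diff above, the expected `Invalid API key ... NOT SERVING` warning is still emitted. What breaks the exact `grep WARNING` match is an extra, unrelated line: a secondary-library version mismatch (the daemon requires v19 but finds v20). The same extra line shows up in the voyage test files below. Here is a minimal sketch of filtering that line out before the comparison. It assumes the test only cares about table-level warnings, and the helper name `table_warnings` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only WARNING lines, drop the unrelated
# "Error initializing secondary index" line, and strip an optional
# "[timestamp] [thread-id] " prefix so the output is stable across runs.
table_warnings() {
    grep 'WARNING' \
        | grep -v 'Error initializing secondary index' \
        | sed -E 's/^\[[^]]*] \[[0-9]+] //'
}

# Usage in the .rec step (assumption):
#   searchd 2>&1 | table_warnings
```

Filtering hides the symptom, but the v19/v20 mismatch may be a real packaging problem in the CI image. Fixing the library version may be the better option.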
@github-actions

clt

❌ CLT tests in test/clt-tests/mcl/
✅ OK: 19
❌ Failed: 4
⏳ Duration: 248s
👉 Check Action Results for commit 8b2932b

Failed tests:

test/clt-tests/mcl/auto-embeddings-backup-restore.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd $SEARCHD_FLAGS > /dev/null; if timeout 10 grep -qm1 '\[BUDDY\] started' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Buddy started!'; else echo 'Timeout or failed!'; cat /var/log/manticore/searchd.log;fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_backup (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title, content, KNN_DIST() as distance FROM test_backup WHERE KNN(vec, 3, 'artificial intelligence') ORDER BY distance"
––– output –––
OK
––– input –––
manticore-backup --version | grep -c "Manticore Backup"
––– output –––
OK
––– input –––
mkdir -p /tmp/backup && chmod 777 /tmp/backup; echo $?
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --tables=test_backup 2>&1 | grep -c "Backing up table"
––– output –––
OK
––– input –––
ls -d /tmp/backup/backup-* | wc -l
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "FREEZE test_backup"
––– output –––
+-----------------------------------------------------+-----------------------------------------------------+
| file                                                | normalized                                          |
+-----------------------------------------------------+-----------------------------------------------------+
| /var/lib/manticore/test_backup/test_backup.0.spc    | /var/lib/manticore/test_backup/test_backup.0.spc    |
| /var/lib/manticore/test_backup/test_backup.0.spd    | /var/lib/manticore/test_backup/test_backup.0.spd    |
| /var/lib/manticore/test_backup/test_backup.0.spds   | /var/lib/manticore/test_backup/test_backup.0.spds   |
| /var/lib/manticore/test_backup/test_backup.0.spe    | /var/lib/manticore/test_backup/test_backup.0.spe    |
| /var/lib/manticore/test_backup/test_backup.0.sph    | /var/lib/manticore/test_backup/test_backup.0.sph    |
| /var/lib/manticore/test_backup/test_backup.0.sphi   | /var/lib/manticore/test_backup/test_backup.0.sphi   |
| /var/lib/manticore/test_backup/test_backup.0.spi    | /var/lib/manticore/test_backup/test_backup.0.spi    |
- | /var/lib/manticore/test_backup/test_backup.0.spidx  | /var/lib/manticore/test_backup/test_backup.0.spidx  |
+ | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
- | /var/lib/manticore/test_backup/test_backup.0.spknn  | /var/lib/manticore/test_backup/test_backup.0.spknn  |
+ | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
- | /var/lib/manticore/test_backup/test_backup.0.spm    | /var/lib/manticore/test_backup/test_backup.0.spm    |
+ | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
- | /var/lib/manticore/test_backup/test_backup.0.spp    | /var/lib/manticore/test_backup/test_backup.0.spp    |
+ | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
- | /var/lib/manticore/test_backup/test_backup.0.spt    | /var/lib/manticore/test_backup/test_backup.0.spt    |
+ | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
- | /var/lib/manticore/test_backup/test_backup.meta     | /var/lib/manticore/test_backup/test_backup.meta     |
+ | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
- | /var/lib/manticore/test_backup/test_backup.settings | /var/lib/manticore/test_backup/test_backup.settings |
+ +-----------------------------------------------------+-----------------------------------------------------+
- +-----------------------------------------------------+-----------------------------------------------------+
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_backup (id, title, content, status) VALUES (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "UNFREEZE test_backup"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
OK
––– input –––
mysqldump -h0 -P9306 manticore test_backup > /tmp/logical_backup.sql 2>/dev/null; echo $?
––– output –––
OK
––– input –––
grep -c "INSERT INTO" /tmp/logical_backup.sql
––– output –––
OK
––– input –––
searchd --stopwait > /dev/null 2>&1; echo $?
––– output –––
OK
––– input –––
rm -f /etc/manticoresearch/manticore.conf; rm -rf /var/lib/manticore/*; echo "Cleaned for restore"
––– output –––
OK
––– input –––
manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep -c "backup-"
––– output –––
OK
––– input –––
BACKUP_NAME=$(manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep backup- | awk '{print $1}' | head -1)
manticore-backup --backup-dir=/tmp/backup --restore=$BACKUP_NAME 2>&1 | grep -c "Starting to restore"
––– output –––
- 1
+ 0
––– input –––
searchd > /dev/null 2>&1; echo $?
––– output –––
- 0
+ 1
––– input –––
echo "Waiting for searchd to start"; sleep 3
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_backup"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        3 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_backup; OPTIMIZE TABLE test_backup OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_backup WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- +------+
––– input –––
mysql -h0 -P9306 -e "ALTER TABLE test_backup ADD COLUMN new_field INTEGER"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "DESC test_backup" | grep "new_field"
––– output –––
- | new_field | uint         | columnar fast_fetch     |
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_copy (
    title TEXT,
    content TEXT,
    status INTEGER,
    vec FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2'
    MODEL_NAME='sentence-transformers/all-MiniLM-L6-v2'
    FROM='title, content'
) engine='columnar'"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_copy (id, title, content, status) VALUES
    (1, 'machine learning', 'neural networks', 1),
    (2, 'deep learning', 'transformers', 1),
    (3, 'computer vision', 'image processing', 2),
    (4, 'frozen insert', 'test data', 3)"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) FROM test_copy"
––– output –––
- +----------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | count(*) |
- +----------+
- |        4 |
- +----------+
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
––– input –––
mysql -h0 -P9306 -e "FLUSH RAMCHUNK test_copy; OPTIMIZE TABLE test_copy OPTION sync=1, cutoff=1"; echo $?
––– output –––
- 0
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
+ 1
––– input –––
mysql -h0 -P9306 -e "SELECT id FROM test_copy WHERE KNN(vec, 2, 'artificial intelligence')"
––– output –––
- +------+
+ ERROR 2003 (HY000): Can't connect to MySQL server on '0:9306' (111)
- | id   |
- +------+
- |    1 |
- |    3 |
- |    2 |
- |    4 |
- +------+
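In this run, the restore step matched `Starting to restore` 0 times and `searchd` then exited with status 1. Every later step failed with `ERROR 2003`, so one broken step cascades through the rest of the file. Below is a minimal bash sketch of two hypothetical helpers, `wait_until` and `require_nonempty`. They would make the restore steps stop at the first broken step instead of relying on a fixed `sleep 3`. They do not explain why the restore itself produced no output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-run a command once per second until it succeeds
# or the timeout (seconds) passes. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1"; shift
    local deadline=$((SECONDS + timeout))
    until "$@" >/dev/null 2>&1; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Hypothetical guard: fail loudly if a required value (e.g. the parsed
# backup name) is empty, instead of running --restore= with no argument.
require_nonempty() {
    local name="$1" value="$2"
    if [ -z "$value" ]; then
        echo "ERROR: $name is empty" >&2
        return 1
    fi
    return 0
}

# Usage (assumption, mirrors the steps in the .rec file):
#   BACKUP_NAME=$(manticore-backup --backup-dir=/tmp/backup --restore 2>&1 | grep backup- | awk '{print $1}' | head -1)
#   require_nonempty BACKUP_NAME "$BACKUP_NAME" || exit 1
#   manticore-backup --backup-dir=/tmp/backup --restore="$BACKUP_NAME"
#   searchd > /dev/null 2>&1
#   wait_until 30 mysql -h0 -P9306 -e "SELECT 1" || cat /var/log/manticore/searchd.log
```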
test/clt-tests/mcl/auto-embeddings-syntax-check.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test__invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test__no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test__no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 19:23:56.608 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
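The similarity step in this file (and the identical pipelines in the other `.rec` files) has a sign problem. The `sed ... | grep -E '^[0-9]+(\.[0-9]+)?$'` pipeline keeps only unsigned digits, so a component such as `-0.0123` is recorded as `0.01230`. Exponent notation like `1e-05` is also split into separate numbers. Embedding vectors normally contain negative components, so the check effectively compares absolute values and can overstate agreement.

Here is a minimal sketch of extracting signed components and computing cosine similarity with guards for mismatched lengths and zero norms. It assumes the embedding column prints as delimiter-separated decimal numbers. The names `extract_vector` and `cosine_similarity_checked` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical replacement for the sed | grep | awk pipeline: extract every
# signed decimal (optionally with an exponent) from stdin, one per line.
extract_vector() {
    grep -oE -- '-?[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?' \
        | awk '{ printf "%.5f\n", $1 }'
}

# Cosine similarity of two one-number-per-line files. Prints an error to
# stderr and exits 1 if the lengths differ or either vector has zero norm.
cosine_similarity_checked() {
    awk '
    NR == FNR { a[NR] = $1; na = NR; suma2 += $1 * $1; next }
    {
        nb = FNR
        dot += a[FNR] * $1
        sumb2 += $1 * $1
    }
    END {
        if (na != nb) { print "length mismatch: " na " vs " nb > "/dev/stderr"; exit 1 }
        if (suma2 == 0 || sumb2 == 0) { print "zero-norm vector" > "/dev/stderr"; exit 1 }
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$1" "$2"
}

# Usage (assumption):
#   mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" \
#       | grep -v embedding | extract_vector > /tmp/vector1.txt
```

For example, with the original unsigned extraction the vectors `(1, -1)` and `(1, 1)` would both become `(1, 1)` and score 1. With signs kept, the score is 0.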
test/clt-tests/mcl/auto-embeddings-voyage-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title, content' API_KEY='${VOYAGE_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SHOW CREATE TABLE test_voyage_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_voyage_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_voyage_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/voyage-3.5-lite' FROM = 'title' API_KEY='${VOYAGE_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_voyage_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_voyage_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test__invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'voyage/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_voyage_no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "INSERT INTO test__no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test__no_from'" | grep -q test_voyage_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_voyage_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$VOYAGE_API_KEY" ] && [ "$VOYAGE_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_voyage_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${VOYAGE_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_plain {
    type = rt
    path = /var/lib/manticore/test_voyage_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"voyage/voyage-3.5-lite","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_voyage_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_voyage_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_voyage_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_voyage_no_key {
    type = rt
    path = /var/lib/manticore/test_voyage_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"voyage/voyage-3.5-lite","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 19:24:08.461 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_voyage_no_key': prealloc: Invalid API key for remote model - NOT SERVING
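The `test_voyage_plain` config is written with an unquoted heredoc, so `"api_key":"${API_KEY_VAL}"` silently becomes `"api_key":""` if `VOYAGE_API_KEY` is unset in the CI environment. The table then fails later in a less obvious way. Below is a minimal sketch of a pre-flight check, assuming the key always appears in the `knn` JSON as `"api_key":"..."`. The helper name `check_api_keys` is hypothetical. It deliberately ignores tables that have no `api_key` at all, such as `test_voyage_no_key`, whose missing key is intentional.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: fail if any "api_key" value in the given
# config file is an empty string (e.g. the env var was unset when the
# unquoted heredoc was expanded). Configs without api_key pass.
check_api_keys() {
    local conf="$1"
    if grep -qE '"api_key"[[:space:]]*:[[:space:]]*""' "$conf"; then
        echo "ERROR: empty api_key in $conf" >&2
        return 1
    fi
    return 0
}

# Usage right after writing the config (assumption):
#   check_api_keys /etc/manticoresearch/manticore.conf || exit 1
```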
test/clt-tests/mcl/auto-embeddings-openai-remote.rec
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
cosine_similarity() {
    local file1="$1" file2="$2"

    awk '
    NR==FNR { a[NR]=$1; suma2+=$1*$1; next }
    {
        dot += a[FNR]*$1
        sumb2 += $1*$1
    }
    END {
        print dot / (sqrt(suma2) * sqrt(sumb2))
    }' "$file1" "$file2"
}
––– output –––
OK
––– input –––
export -f cosine_similarity
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_invalid_model (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/invalid-model-name-12345' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_valid_model_no_api_key (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title') " 2>&1
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_remote (title TEXT, content TEXT, description TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title, content' API_KEY='${OPENAI_API_KEY}') "; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_remote"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(1, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'advanced AI research')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as record_count FROM test_openai_remote WHERE id=1"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_remote (id, title, content, description) VALUES(2, 'machine learning algorithms', 'deep neural networks and artificial intelligence', 'different description')"

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector1.txt

mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=2" | \
    grep -v embedding | \
    sed 's/[0-9]\+\(\.[0-9]\+\)\?/\n&\n/g' | \
    grep -E '^[0-9]+(\.[0-9]+)?$' | \
    awk '{printf "%.5f\n", $1}' > /tmp/vector2.txt

SIMILARITY=$(cosine_similarity /tmp/vector1.txt /tmp/vector2.txt)

echo "Cosine similarity: $SIMILARITY"

RESULT=$(awk -v sim="$SIMILARITY" 'BEGIN {
    if (sim > 0.99)
        print "SUCCESS: Same FROM fields produce similar vectors (similarity: " sim ")"
    else
        print "FAIL: Different vectors (FROM does not include description field and should not change generated vector value) (similarity: " sim ")"
}')

echo "$RESULT"

rm -f /tmp/vector1.txt /tmp/vector2.txt
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_title_only (title TEXT, content TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'title' API_KEY='${OPENAI_API_KEY}') "; mysql -h0 -P9306 -e "INSERT INTO test_openai_title_only (id, title, content) VALUES(1, 'machine learning algorithms', 'completely different content here')"; MD5_MULTI=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_remote WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); MD5_SINGLE=$(mysql -h0 -P9306 -e "SELECT embedding FROM test_openai_title_only WHERE id=1" | grep -v embedding | md5sum | awk '{print $1}'); echo "multi_field_md5: $MD5_MULTI"; echo "single_field_md5: $MD5_SINGLE"; if [ "$MD5_MULTI" != "$MD5_SINGLE" ]; then echo "SUCCESS: Different FROM specifications produce different vectors"; else echo "INFO: FROM field comparison result"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "CREATE TABLE test_openai_invalid_field (title TEXT, embedding FLOAT_VECTOR KNN_TYPE='hnsw' HNSW_SIMILARITY='l2' MODEL_NAME = 'openai/text-embedding-ada-002' FROM = 'nonexistent_field') " 2>&1
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "INSERT INTO test_openai_no_from (id, title, embedding) VALUES(1, 'test title', '(0.1, 0.2, 0.3, 0.4, 0.5)')"; echo "insert_result: $?"; else echo "insert_result: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if mysql -h0 -P9306 -e "SHOW TABLES LIKE 'test_openai_no_from'" | grep -q test_openai_no_from; then mysql -h0 -P9306 -e "SHOW CREATE TABLE test_openai_no_from"; else echo "table_structure: skipped (table not created)"; fi
––– output –––
OK
––– input –––
if [ -n "$OPENAI_API_KEY" ] && [ "$OPENAI_API_KEY" != "dummy_key_for_testing" ]; then echo "API key is available for testing"; else echo "API key not available - using dummy for error testing"; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT id, knn_dist() FROM test_openai_remote WHERE knn(embedding, 3, 'machine learning and artificial intelligence')\G"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_remote WHERE knn(embedding, 5, 'technology and AI') AND id > 0"
––– output –––
OK
––– input –––
API_KEY_VAL="${OPENAI_API_KEY}"; cat > /etc/manticoresearch/manticore.conf << CONFEOF
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_plain {
    type = rt
    path = /var/lib/manticore/test_openai_plain
    rt_field = title
    rt_field = content
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","hnsw_m":16,"hnsw_ef_construction":200,"model_name":"openai/text-embedding-ada-002","from":"title,content","api_key":"${API_KEY_VAL}"}]}
}
CONFEOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
rm -f /var/log/manticore/searchd.log; stdbuf -oL searchd --stopwait > /dev/null; stdbuf -oL searchd ${SEARCHD_ARGS:-} > /dev/null
––– output –––
OK
––– input –––
if timeout 10 grep -qm1 'accepting connections' <(tail -n 1000 -f /var/log/manticore/searchd.log); then echo 'Accepting connections!'; else echo 'Timeout or failed!'; fi
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SHOW TABLES"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "INSERT INTO test_openai_plain (id, title, content) VALUES(1, 'bread', 'food item'), (2, 'cat', 'animal pet')"; echo $?
––– output –––
OK
––– input –––
mysql -h0 -P9306 -e "SELECT COUNT(*) as count FROM test_openai_plain"
––– output –––
OK
––– input –––
mysql -h0 -P9306 -E -e "SELECT id, title FROM test_openai_plain WHERE knn(embedding, 2, 'dog')"
––– output –––
OK
––– input –––
cat > /etc/manticoresearch/manticore.conf << 'EOF'
searchd {
    listen = 127.0.0.1:9306:mysql41
    listen = 127.0.0.1:9308:http
    log = /var/log/manticore/searchd.log
    pid_file = /var/run/manticore/searchd.pid
}

table test_openai_no_key {
    type = rt
    path = /var/lib/manticore/test_openai_no_key
    rt_field = title
    rt_attr_float_vector = embedding
    knn = {"attrs":[{"name":"embedding","type":"hnsw","hnsw_similarity":"L2","model_name":"openai/text-embedding-ada-002","from":"title"}]}
}
EOF
––– output –––
OK
––– input –––
searchd --stopwait --quiet
––– output –––
OK
––– input –––
searchd 2>&1|grep WARNING
––– output –––
- WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING
+ [Wed Apr 22 19:22:55.595 2026] [134] WARNING: Error initializing secondary index: daemon requires secondary library v19 (trying to load v20)
+ WARNING: table 'test_openai_no_key': prealloc: Invalid API key for remote model - NOT SERVING

- Add platform-specific BLAS features for candle
- Remove fixed thread count for ONNX models
- Calculate batch size dynamically based on CPU count
- Set minimum batch size to 8 and cap at 128
- Improve throughput for BERT and ONNX inference
- Simplify batch size boundary logic
- Enable multi-threaded batch tokenization
- Replace dynamic batch sizing with fixed constants
- Initialize multiple ONNX sessions based on CPU count
- Parallelize batch processing using rayon across session pool
- Refactor forward pass logic into reusable run_batch method
- Allow configuring batch size and threads via environment variables
- Support concurrent inference on a single shared model session
- Improve parallel processing efficiency for ONNX models
- Replace single SyncSession with a pool of Mutex-wrapped sessions
- Calculate session pool size based on available parallelism
- Dispatch batches across the session pool using rayon
- Remove unsafe SyncSession wrapper and UnsafeCell usage
- Add example to benchmark parallel ORT performance
- Instrument local model to track concurrent sessions
- Verify intra-thread scaling for pooled sessions
- Replace session pool with single session and pipelining
- Overlap tokenization with inference using sync channels
- Remove rayon dependency and parallel batch dispatch
- Delete ort_parallel example and parallelization tests
- Simplify session initialization and thread management
- Use sequential batching instead of threaded pipelining
- Improve cache locality by avoiding upfront tokenization
- Remove thread spawning and channel communication
- Implement SyncSession to allow parallel model execution
- Remove Mutex around ONNX session to eliminate contention
- Set default intra-op threads to 0 for automatic scaling
- Refactor prediction logic to use shared batched utility
- Overlap tokenization and inference for ONNX models
- Reduce idle time between batch processing steps
- Improve throughput for large text batch embeddings
- Increase default batch size from 8 to 32
- Handle empty text input arrays efficiently
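Several of the commit messages above describe overlapping tokenization with inference through sync channels, a default batch size of 32, an environment-variable override, and an early return for empty input. The following is a minimal std-only Rust sketch of that pipelining pattern and is not the PR's code. `tokenize`, `infer`, `embed_pipelined`, and `batch_size_from_env` are hypothetical stand-ins, and no real model is involved. The variable name passed to `batch_size_from_env` is made up because the real one does not appear here.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Default batch size (the commit log mentions raising it from 8 to 32).
const DEFAULT_BATCH_SIZE: usize = 32;

/// Reads a positive batch size from the given environment variable, or falls back to the default.
fn batch_size_from_env(var: &str) -> usize {
    std::env::var(var)
        .ok()
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_BATCH_SIZE)
}

/// Stand-in tokenizer: maps each text to its byte values.
fn tokenize(batch: &[String]) -> Vec<Vec<u32>> {
    batch.iter().map(|t| t.bytes().map(u32::from).collect()).collect()
}

/// Stand-in "model": one number per input, the sum of its token ids.
fn infer(tokens: &[Vec<u32>]) -> Vec<f32> {
    tokens.iter().map(|t| t.iter().sum::<u32>() as f32).collect()
}

/// Tokenizes batch N+1 on a helper thread while batch N is being inferred.
fn embed_pipelined(texts: &[String], batch_size: usize) -> Vec<f32> {
    if texts.is_empty() {
        return Vec::new(); // nothing to tokenize or run
    }
    let batch_size = batch_size.max(1); // chunks(0) would panic
    // Capacity 1: the tokenizer runs at most one batch ahead of inference.
    let (tx, rx) = sync_channel::<Vec<Vec<u32>>>(1);
    thread::scope(|s| {
        s.spawn(move || {
            for batch in texts.chunks(batch_size) {
                if tx.send(tokenize(batch)).is_err() {
                    break; // receiver is gone
                }
            }
            // tx is dropped here, which ends the receive loop below
        });
        let mut out = Vec::with_capacity(texts.len());
        for tokens in rx {
            out.extend(infer(&tokens));
        }
        out
    })
}

fn main() {
    // "EXAMPLE_BATCH_SIZE" is a made-up variable name for this sketch only.
    let batch_size = batch_size_from_env("EXAMPLE_BATCH_SIZE");
    let texts: Vec<String> = ["bread", "cat", "dog"].iter().map(|s| s.to_string()).collect();
    println!("{:?}", embed_pipelined(&texts, batch_size));
}
```

The bounded channel keeps memory flat, because at most one tokenized batch waits for the model at a time. The commits that follow in the history later replace this serial pipeline with parallel workers.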
donhardman force-pushed the fix/embeddings-perf branch 2 times, most recently from 18ad409 to 924ac02, on April 23, 2026 10:20
- Replace SyncSession with platform-aware SessionWrapper
- Implement Mutex locking for Windows ORT sessions
- Retain UnsafeCell for concurrent execution on Unix
- Integrate SessionWrapper into OnnxEmbeddingModel
- Use parallel workers instead of serial pipelining
- Process small batches directly to avoid thread overhead
- Scale large inference tasks across available CPU cores
- Add unified tokenization and inference helper
- Replace modulo operator with idiomatic Rust is_multiple_of method in test assertion
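The last group of commit messages describes the dispatch that replaced serial pipelining. Small inputs are processed directly, and large ones are spread across the available CPU cores. The following is a minimal std-only Rust sketch of that idea, using `std::thread::scope` and `std::thread::available_parallelism`, and it keeps the output in input order. `embed_parallel`, `SMALL_INPUT`, and its value of 64 are hypothetical, and a closure stands in for the model. The PR's platform-specific `SessionWrapper` (a `Mutex` on Windows, `UnsafeCell` on Unix) is not modeled. Here `run_batch` only needs to be `Sync`.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical threshold: inputs this small are processed on the calling thread.
const SMALL_INPUT: usize = 64;

/// Splits `texts` into batches and runs `run_batch` on each, preserving input order.
/// Small inputs run serially; larger ones are spread over scoped worker threads.
fn embed_parallel<F>(texts: &[String], batch_size: usize, run_batch: F) -> Vec<f32>
where
    F: Fn(&[String]) -> Vec<f32> + Sync,
{
    let batch_size = batch_size.max(1);
    if texts.len() <= SMALL_INPUT {
        // Spawning threads would cost more than it saves for a handful of texts.
        return texts.chunks(batch_size).flat_map(|b| run_batch(b)).collect();
    }
    let workers = thread::available_parallelism().map(NonZeroUsize::get).unwrap_or(1);
    let batches: Vec<&[String]> = texts.chunks(batch_size).collect();
    // Each worker takes a contiguous run of batches, so joining in order keeps results ordered.
    let per_worker = (batches.len() + workers - 1) / workers;
    let run_batch = &run_batch;
    thread::scope(|s| {
        let handles: Vec<_> = batches
            .chunks(per_worker)
            .map(|group| {
                s.spawn(move || group.iter().flat_map(|b| run_batch(*b)).collect::<Vec<f32>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    // Stand-in for model inference: one number per text (its length).
    let fake_model = |batch: &[String]| batch.iter().map(|t| t.len() as f32).collect::<Vec<f32>>();
    let texts: Vec<String> = (0..1000).map(|i| "x".repeat(i % 7)).collect();
    let out = embed_parallel(&texts, 32, fake_model);
    println!("{} embeddings", out.len());
}
```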