70 commits
229c087
initial support for MSSQL
Nov 27, 2023
4dee798
updated MSSQL test
Nov 27, 2023
65142a5
added MSSQL
Nov 27, 2023
caacfe9
drop old
Jan 11, 2024
37d8b52
support for native vectors
Jan 19, 2024
7a359e3
notes on how to install and run
Jan 19, 2024
079d020
more details
Jan 19, 2024
1158598
fixed column name
Jan 23, 2024
d2a13f0
removed fast_executemany as generates error
Jan 23, 2024
f076177
added link
Jan 23, 2024
44e319c
send binary data instead of strings
Jan 26, 2024
c581321
added filter support
Jan 26, 2024
4c73b36
added notes
Jan 26, 2024
6c3ebf5
added link
Jan 26, 2024
4711980
updated for installing pip
Jan 26, 2024
3d27fee
added pip
Jan 26, 2024
01d21bb
use clustered index
Jan 27, 2024
dbf1ea8
correct notes
Jan 27, 2024
26e6079
code improvements
Jan 27, 2024
0409d9c
updated to use binary format
Jan 28, 2024
a30019a
smaller optimizations
Jan 28, 2024
f1b3f43
improved scripts
Jan 31, 2024
6e2013c
corrected header values to comply with latest bits
Jan 31, 2024
38a8497
using vanilla vector_distance
Apr 22, 2024
075f011
added setup instructions for MSSQL
Apr 22, 2024
fe356a6
using ad hoc sql instead of stored procedure to find and filter
Apr 22, 2024
588695b
fixed code
Apr 22, 2024
09bf65b
fixed code
Apr 22, 2024
654c7dd
Update MSSQL-Setup.md
Apr 25, 2024
a96ddc8
Merge remote-tracking branch 'upstream/main'
May 3, 2024
5958560
fixed dependency error
May 3, 2024
c687c85
pin streamlit to 1.31.1 to avoid incompatibilities
May 4, 2024
94f3e1f
Merge branch 'main' into vector_updated
May 4, 2024
b881af8
merged with latest version
May 4, 2024
be10989
refactored code
May 4, 2024
a698452
Merge branch 'main' of https://github.com/zilliztech/VectorDBBench in…
May 23, 2024
c67bd37
updated notes
May 23, 2024
120851d
Merge branch 'zilliztech-main' into vector
May 23, 2024
b028184
Merge remote-tracking branch 'upstream/main' into vector-sync
Aug 26, 2024
000458c
using plain vector_distance
Aug 26, 2024
923e177
commented unused parameter
Aug 26, 2024
3c45645
Merge remote-tracking branch 'upstream/main' into vectorupd
Jan 9, 2025
e281dc3
support for vector type
Jan 9, 2025
118d002
completed basic vector type support
Jan 9, 2025
54d86fc
added vector index creation
Jan 10, 2025
6ec2ac3
removed unneeded files
Jan 10, 2025
a92de0c
updated notes
Jan 10, 2025
899ec32
fixed search queries
Jan 10, 2025
eafc5e0
updated notes
Jan 10, 2025
0cca1d3
removed hard-coded values
Jan 10, 2025
a7bf759
:Add CLI Support
JoshInnis Jan 15, 2025
8253c5e
Remove static declaration of search_embedding from MSSQL
Jan 22, 2025
10b363c
Add the CLI Configurations
Jan 22, 2025
5a29423
Change CREATE VECTOR INDEX to build with Euclidean Distance
JoshInnis Feb 5, 2025
5393168
Drop the vector table type and Stored Procedure
JoshInnis Feb 18, 2025
592bde3
Remove hard coded metric type
JoshInnis Feb 18, 2025
52527eb
Update MSSQL-Setup.md
JoshInnis Apr 15, 2025
8a31745
Update MSSQL-Setup.md
JoshInnis Apr 18, 2025
e7e3cdc
Rename MSSQL-Setup.md to README-MSSQLmd
JoshInnis Apr 18, 2025
edcda67
Delete MSSQL-Notes.txt
JoshInnis Apr 18, 2025
55d6f36
Update README-MSSQLmd
JoshInnis Apr 18, 2025
92c8902
Update README-MSSQLmd
JoshInnis Apr 18, 2025
168df35
Update README-MSSQLmd
JoshInnis Apr 21, 2025
e4ee845
Fix formatting bug in logging when loading table
JoshInnis Apr 22, 2025
013dd5d
Fix Environs Package
JoshInnis Apr 22, 2025
2af3f22
Report p50 and p95 latencies in Serial Search
JoshInnis Apr 22, 2025
b19d0a4
Support EntraId
Aug 27, 2025
f93379f
Cleanup Connection Strings
Sep 2, 2025
6edf668
Change logging and error checking to logging library
Sep 2, 2025
cc299d7
Add Azure-identity to pyproject.toml
Oct 17, 2025
13 changes: 13 additions & 0 deletions .gitattributes
@@ -0,0 +1,13 @@
# Thanks to: https://rehansaeed.com/gitattributes-best-practices/

# Set default behavior to automatically normalize line endings.
* text=auto

# Force batch scripts to always use CRLF line endings so that if a repo is accessed
# in Windows via a file share from Linux, the scripts will work.
*.{cmd,[cC][mM][dD]} text eol=crlf
*.{bat,[bB][aA][tT]} text eol=crlf

# Force bash scripts to always use LF line endings so that if a repo is accessed
# in Unix via a file share from Windows, the scripts will work.
*.sh text eol=lf
3 changes: 2 additions & 1 deletion .gitignore
@@ -9,4 +9,5 @@ __MACOSX
build/
venv/
.idea/
results/
results/
.venv/
63 changes: 63 additions & 0 deletions README-MSSQLmd
@@ -0,0 +1,63 @@
# Run VectorDBBench against MSSQL database

VectorDBBench has been tested running on WSL2 + Ubuntu 22.04.4 LTS.

## Install ODBC

Follow instructions here: https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server

## Install Python 3.11

Follow instructions here: https://ubuntuhandbook.org/index.php/2022/10/python-3-11-released-how-install-ubuntu/

## Install pip for Python 3.11

Use the following commands:

```
sudo apt install python3.11 python3.11-distutils python3.11-venv
curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
```

## Clone the repository

```
git clone https://github.com/MSSQL-VectorDBBench/VectorDBBench
```

This clones the repository into a local folder named VectorDBBench.

## Install VectorDBBench dependencies

Change directory into VectorDBBench, then install VectorDBBench and its dependencies:

```
cd VectorDBBench
pip install pyodbc
pip install .
```


## Run VectorDBBench on the Command Line Interface with help

```
vectordbbench mssql --help
```

## Run VectorDBBench on the Command Line Interface

The database must already exist, and it must have enough free space to build the index.

```
vectordbbench mssql --database=vectordb --server=**IP_ADDRESS** --uid=sa --pwd=**PASSWORD_HERE** --concurrency-duration=1800 --skip-search-concurrent --case-type=Performance1536D500K
```
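
If the benchmark needs to be launched from a script, the command above could be assembled as an argument list instead of a shell string, which avoids quoting problems with passwords. The sketch below is only an illustration: `build_mssql_command` is a hypothetical helper, the server address and password are placeholders, and the flags are exactly those shown in the command above.

```python
import shlex


def build_mssql_command(server, database, uid, pwd,
                        case_type="Performance1536D500K",
                        concurrency_duration=1800,
                        skip_search_concurrent=True):
    """Assemble the argv list for `vectordbbench mssql` (hypothetical helper)."""
    argv = [
        "vectordbbench", "mssql",
        f"--database={database}",
        f"--server={server}",
        f"--uid={uid}",
        f"--pwd={pwd}",
        f"--concurrency-duration={concurrency_duration}",
        f"--case-type={case_type}",
    ]
    if skip_search_concurrent:
        argv.append("--skip-search-concurrent")
    return argv


# Placeholder values; substitute the real server address and credentials.
argv = build_mssql_command("10.0.0.5", "vectordb", "sa", "example password")

# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` would run the benchmark without going through a shell.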

## Run VectorDBBench on the Command Line Interface with Existing Data

```
vectordbbench mssql --database=vectordb --server=**IP_ADDRESS** --uid=sa --pwd=**PASSWORD_HERE** --concurrency-duration=1800 --skip-search-concurrent --case-type=Performance1536D500K
```

## Start VectorDBBench in the GUI Mode
```
python -m vectordb_bench
```
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -15,14 +15,14 @@ authors = [
{name="XuanYang-cn", email="xuan.yang@zilliz.com"},
]
description = "VectorDBBench is not just an offering of benchmark results for mainstream vector databases and cloud services, it's your go-to tool for the ultimate performance and cost-effectiveness comparison. Designed with ease-of-use in mind, VectorDBBench is devised to help users, even non-professionals, reproduce results or test new systems, making the hunt for the optimal choice amongst a plethora of cloud services and open-source vector databases a breeze."

readme = "README.md"
requires-python = ">=3.11"
classifiers = [
"Programming Language :: Python :: 3",
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
]

dependencies = [
"click",
"pytz",
@@ -35,7 +35,7 @@ dependencies = [
"psutil",
"polars",
"plotly",
"environs",
"environs<14.0.1",
"pydantic<v2",
"scikit-learn",
"pymilvus", # with pandas, numpy, ujson
@@ -68,6 +68,8 @@ all = [
"memorydb",
"alibabacloud_ha3engine_vector",
"alibabacloud_searchengine20211025",
"pyodbc",
"azure-identity"
]

qdrant = [ "qdrant-client" ]
@@ -85,6 +87,7 @@ memorydb = [ "memorydb" ]
chromadb = [ "chromadb" ]
opensearch = [ "opensearch-py" ]
aliyun_opensearch = [ "alibabacloud_ha3engine_vector", "alibabacloud_searchengine20211025"]
mssql = [ "pyodbc", "azure-identity" ]

[project.urls]
"repository" = "https://github.com/zilliztech/VectorDBBench"
@@ -207,3 +210,4 @@ builtins-ignorelist = [
# "filter",
]


16 changes: 15 additions & 1 deletion vectordb_bench/backend/clients/__init__.py
@@ -39,11 +39,16 @@ class DB(Enum):
    AWSOpenSearch = "OpenSearch"
    AliyunElasticsearch = "AliyunElasticsearch"
    Test = "test"
    AliyunOpenSearch = "AliyunOpenSearch"
    AliyunOpenSearch = "AliyunOpenSearch"
    MSSQL = "MSSQL"

    @property
    def init_cls(self) -> type[VectorDB]:  # noqa: PLR0911, PLR0912
        """Import while in use"""
        if self == DB.MSSQL:
            from .mssql.mssql import MSSQL
            return MSSQL

        if self == DB.Milvus:
            from .milvus.milvus import Milvus

@@ -135,6 +140,10 @@ def init_cls(self) -> type[VectorDB]: # noqa: PLR0911, PLR0912
    @property
    def config_cls(self) -> type[DBConfig]:  # noqa: PLR0911, PLR0912
        """Import while in use"""
        if self == DB.MSSQL:
            from .mssql.config import MSSQLConfig
            return MSSQLConfig

        if self == DB.Milvus:
            from .milvus.config import MilvusConfig

@@ -227,6 +236,11 @@ def case_config_cls( # noqa: PLR0911
        self,
        index_type: IndexType | None = None,
    ) -> type[DBCaseConfig]:

        if self == DB.MSSQL:
            from .mssql.config import MSSQLVectorIndexConfig
            return MSSQLVectorIndexConfig

        if self == DB.Milvus:
            from .milvus.config import _milvus_case_config

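
The "Import while in use" properties above defer each client import until that database is actually selected, so optional drivers such as `pyodbc` are only needed when the MSSQL backend is used. A minimal, self-contained sketch of the same idea follows. `Backend` is a hypothetical stand-in for `DB`, and it resolves standard-library modules by name rather than the project's client classes, so it illustrates the pattern only, not the project's code.

```python
from enum import Enum
from importlib import import_module


class Backend(Enum):
    """Hypothetical stand-in for DB; each value names a stdlib module."""

    JSON = "json"
    CSV = "csv"

    @property
    def module(self):
        # The import happens here, on access, not when the enum is defined.
        return import_module(self.value)


# Only the json module is resolved by this call.
print(Backend.JSON.module.dumps({"ok": True}))
```

Because `import_module` caches in `sys.modules`, repeated property access does not re-import the module.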
56 changes: 56 additions & 0 deletions vectordb_bench/backend/clients/mssql/cli.py
@@ -0,0 +1,56 @@
from typing import Annotated, Unpack

import click
from pydantic import SecretStr

from ....cli.cli import (
    CommonTypedDict,
    cli,
    click_parameter_decorators_from_typed_dict,
    run,
)
from .. import DB


class MSSQLTypedDict(CommonTypedDict):
    server: Annotated[
        str, click.option("--server", type=str, help="server url", required=True)
    ]
    database: Annotated[
        str,
        click.option("--database", type=str, help="database name", required=True),
    ]
    uid: Annotated[
        str,
        click.option("--uid", type=str, help="User id", required=False),
    ]
    pwd: Annotated[
        str,
        click.option("--pwd", type=str, help="user password", required=False),
    ]
    entraid: Annotated[
        str,
        click.option("--entraid", type=str, help="Entra Id Authentication", required=False),
    ]


@cli.command()
@click_parameter_decorators_from_typed_dict(MSSQLTypedDict)
def MSSQL(**parameters: Unpack[MSSQLTypedDict]):
    from .config import MSSQLConfig, MSSQLVectorIndexConfig

    run(
        db=DB.MSSQL,
        db_config=MSSQLConfig(
            server=parameters["server"],
            database=parameters["database"],
            uid=parameters["uid"],
            pwd=parameters["pwd"],
            entraid=parameters["entraid"],
        ),
        db_case_config=MSSQLVectorIndexConfig(),
        **parameters,
    )
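
`MSSQLTypedDict` attaches each CLI option to its field through `Annotated` metadata, which a decorator can later read back. The sketch below shows one way such metadata can be collected using only the standard library. It is not the project's `click_parameter_decorators_from_typed_dict`: `Option`, `ConnTypedDict`, and `collect_options` are hypothetical names, and `Option` stands in for the object that `click.option(...)` would return.

```python
from dataclasses import dataclass
from typing import Annotated, TypedDict, get_type_hints


@dataclass(frozen=True)
class Option:
    """Hypothetical stand-in for the value produced by click.option(...)."""

    flag: str
    required: bool = False


class ConnTypedDict(TypedDict):
    server: Annotated[str, Option("--server", required=True)]
    uid: Annotated[str, Option("--uid")]


def collect_options(typed_dict):
    """Map each field name to the first metadata item in its Annotated hint."""
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(typed_dict, include_extras=True)
    return {
        name: hint.__metadata__[0]
        for name, hint in hints.items()
        if hasattr(hint, "__metadata__")
    }


options = collect_options(ConnTypedDict)
print(sorted(options))
```

A real decorator would then apply each collected option to the command function in turn.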
102 changes: 102 additions & 0 deletions vectordb_bench/backend/clients/mssql/config.py
@@ -0,0 +1,102 @@
import pyodbc
import struct
import logging
from azure.identity import ManagedIdentityCredential
from pydantic import BaseModel, SecretStr
from typing import Optional
from ..api import DBConfig, DBCaseConfig, MetricType

log = logging.getLogger(__name__)

MSSQL_CONNECTION_STRING_PLACEHOLDER="DRIVER={ODBC Driver 18 for SQL Server};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s;LongAsMax=yes;Connect Timeout=30;TrustServerCertificate=Yes"

# --- Constants for Token Authentication ---
SQL_COPT_SS_ACCESS_TOKEN = 1256
SQL_SERVER_TOKEN_SCOPE = "https://database.windows.net/.default"

# --- Your Modified MSSQLConfig Class ---

class MSSQLConfig(DBConfig):
    server: str
    database: str
    uid: Optional[str] = None
    pwd: Optional[SecretStr] = None
    entraid: Optional[str] = None

    def to_dict(self) -> dict:
        """
        Prepares connection parameters. If entraid is provided, it fetches a token
        manually and returns connection attributes for pyodbc.
        """
        # --- Case 1: Standard SQL Authentication ---
        if self.entraid is None:
            if not self.uid or not self.pwd:
                # Fail here; otherwise self.pwd.get_secret_value() below
                # raises AttributeError when pwd is None.
                msg = "UID and PWD must be provided for standard SQL auth."
                log.error(msg)
                raise ValueError(msg)

            pwd_str = self.pwd.get_secret_value()
            connection_string = (
                f"DRIVER={{ODBC Driver 18 for SQL Server}};"
                f"SERVER={self.server};"
                f"DATABASE={self.database};"
                f"UID={self.uid};"
                f"PWD={pwd_str};"
                "LongAsMax=yes;"
                "Connect Timeout=30;"
                "Encrypt=yes;"
                "TrustServerCertificate=Yes"
            )
            return {"connection_string": connection_string}

        # --- Case 2: Entra ID Managed Identity (Manual Token Auth) ---
        log.info(f"Attempting to get token for User-Assigned Identity: {self.entraid}")

        # 1. Get credentials and token using azure-identity
        credential = ManagedIdentityCredential(client_id=self.entraid)
        access_token = credential.get_token(SQL_SERVER_TOKEN_SCOPE)
        token_bytes = access_token.token.encode("UTF-16-LE")

        # 2. Pack the token for the driver
        token_struct = struct.pack(f'<I{len(token_bytes)}s', len(token_bytes), token_bytes)

        log.info("Token acquired successfully.")

        # 3. Create the connection string WITHOUT auth keywords (UID, PWD, AUTHENTICATION)
        connection_string = (
            f"DRIVER={{ODBC Driver 18 for SQL Server}};"
            f"SERVER={self.server};"
            f"DATABASE={self.database};"
            "LongAsMax=yes;"
            "Connect Timeout=30;"
            "Encrypt=yes;"
            "TrustServerCertificate=Yes"
        )

        # 4. Return both the string and the token attributes
        return {
            "connection_string": connection_string,
            "attrs_before": {SQL_COPT_SS_ACCESS_TOKEN: token_struct}
        }
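
The token-packing step in `to_dict` can be checked in isolation. The sketch below repeats the same layout used above (a 4-byte little-endian length followed by the token encoded as UTF-16-LE) in a standalone function. `pack_access_token` is a hypothetical name, and `"abc"` is a dummy value, not a real token.

```python
import struct

SQL_COPT_SS_ACCESS_TOKEN = 1256  # same attribute id as in config.py


def pack_access_token(token: str) -> bytes:
    """Pack a token as <4-byte little-endian length><UTF-16-LE bytes>."""
    token_bytes = token.encode("utf-16-le")
    return struct.pack(f"<I{len(token_bytes)}s", len(token_bytes), token_bytes)


packed = pack_access_token("abc")  # dummy token for illustration

# 4-byte length prefix + 3 characters * 2 bytes each
print(len(packed))  # -> 10
```

The packed value is what gets passed to `pyodbc.connect(connection_string, attrs_before={SQL_COPT_SS_ACCESS_TOKEN: packed})`, which is how the dictionary returned by `to_dict` is meant to be consumed.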


class MSSQLVectorIndexConfig(BaseModel, DBCaseConfig):
    metric_type: MetricType | None = None
    efSearch: int | None = 48

    def parse_metric(self) -> str:
        if self.metric_type == MetricType.L2:
            return "euclidean"
        elif self.metric_type == MetricType.IP:
            return "dot"
        return "cosine"

    def index_param(self) -> dict:
        # The model declares no `lists` field, so reading self.lists would
        # raise AttributeError; only the metric is returned here.
        return {
            "metric": self.parse_metric()
        }

    def search_param(self) -> dict:
        return {
            "efSearch": self.efSearch,
            "metric": self.parse_metric()
        }
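
The metric names returned by `parse_metric` ("euclidean", "dot", "cosine") match the distance-metric argument of SQL Server's `VECTOR_DISTANCE` function, which the commit history mentions using. The sketch below shows how such a name might be placed into a top-k search statement. The query the PR actually issues is not part of this diff section, so the table name, column names, vector dimension, and the `CAST` of the parameter are all assumptions made for illustration.

```python
# Mirrors parse_metric: L2 -> euclidean, IP -> dot, anything else -> cosine.
METRIC_NAMES = {"L2": "euclidean", "IP": "dot"}


def build_search_sql(metric_type, k=10, dim=1536, table="dbo.vectors"):
    """Build a parameterized top-k query (hypothetical table and columns)."""
    metric = METRIC_NAMES.get(metric_type, "cosine")
    return (
        f"SELECT TOP ({int(k)}) id "
        f"FROM {table} "
        # The query vector is bound as the single `?` parameter.
        f"ORDER BY VECTOR_DISTANCE('{metric}', embedding, CAST(? AS VECTOR({int(dim)})))"
    )


print(build_search_sql("L2", k=5))
```

Only the metric name, `k`, and `dim` are interpolated into the text; the query vector itself stays a bound parameter.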