Dataset name validation + multi-dataset search #39

@GordonBie123

Description

Add a validate_dataset_name() helper that enforces the dataset naming rules from the manual (lowercase letters, numbers, and hyphens only; no spaces or other special characters). Call it in the upload endpoint. Extend the search endpoint to accept a comma-separated datasets parameter so a search can be scoped to multiple datasets.


Acceptance Criteria

  • validate_dataset_name(name: str) -> str helper exists (suggest app/utils/validation.py)
  • Validator raises ValueError for names containing uppercase letters, spaces, underscores, or special characters other than hyphens, and for names with leading or trailing hyphens
  • Validator raises ValueError for empty string input
  • Validator is called in POST /documents/upload — invalid dataset names return HTTP 422 with a clear message
  • GET /documents/search accepts datasets: Optional[str] as a comma-separated string (e.g. ?datasets=fast-food,equipment) and converts it to list[str] before passing to search_knowledge_graph()
  • Unit tests written for validate_dataset_name() covering: valid names, uppercase, spaces, underscores, empty string, leading/trailing hyphens
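
As a rough sketch of the datasets conversion described in the search criterion above: the helper name parse_datasets_param is hypothetical, and returning None for a missing or empty value assumes that search_knowledge_graph() treats None as "search all datasets", which this issue does not specify.

```python
from typing import Optional


def parse_datasets_param(datasets: Optional[str]) -> Optional[list[str]]:
    """Split a comma-separated `datasets` query value into a list of names.

    Hypothetical helper. Returns None when the parameter is absent or holds
    no names, on the assumption that None means "no dataset filter".
    """
    if datasets is None:
        return None
    # Trim whitespace around each name and drop empty segments,
    # e.g. from "a,,b" or a trailing comma.
    names = [part.strip() for part in datasets.split(",") if part.strip()]
    return names or None


# Example: the query string ?datasets=fast-food,equipment
print(parse_datasets_param("fast-food,equipment"))  # ['fast-food', 'equipment']
```

Each resulting name could also be passed through validate_dataset_name() before the search runs, so that a malformed name in the query string is rejected the same way as on upload. Whether search should validate names is a design choice this issue leaves open.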

Technical Notes

Valid examples: "fast-food", "kitchen-equipment", "industry-reports"
Invalid examples: "Fast Food" (spaces + uppercase), "doc_2024" (underscores), "reports!" (special chars)

A simple regex is sufficient:

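import re

def validate_dataset_name(name: str) -> str:
    """Return name unchanged if it is a valid dataset name, else raise ValueError."""
    if not name:
        raise ValueError("Dataset name cannot be empty")
    # fullmatch rather than match with '$': '$' also matches just before a
    # trailing newline, so "fast-food\n" would otherwise be accepted.
    if not re.fullmatch(r'[a-z0-9]+(-[a-z0-9]+)*', name):
        raise ValueError(
            f"Invalid dataset name '{name}'. "
            "Use lowercase letters, numbers, and hyphens only (e.g. 'fast-food')."
        )
    return name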
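
The unit tests from the acceptance criteria could start from something like the sketch below. It is written as plain test functions that pytest would collect; pytest is an assumption, since this issue does not name a test runner. The validator is repeated in compact form (using re.fullmatch, with a shortened error message) only so the snippet runs on its own. In the project it would be imported from wherever the helper ends up, e.g. the suggested app/utils/validation.py.

```python
import re

_DATASET_NAME = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def validate_dataset_name(name: str) -> str:
    # Standalone copy for this sketch; import the real helper in the project.
    if not name:
        raise ValueError("Dataset name cannot be empty")
    if not _DATASET_NAME.fullmatch(name):
        raise ValueError(f"Invalid dataset name '{name}'.")
    return name


VALID = ["fast-food", "kitchen-equipment", "industry-reports"]

INVALID = [
    "Fast Food",   # spaces + uppercase
    "doc_2024",    # underscore
    "reports!",    # special character
    "",            # empty string
    "-fast-food",  # leading hyphen
    "fast-food-",  # trailing hyphen
]


def test_valid_names_are_returned_unchanged():
    for name in VALID:
        assert validate_dataset_name(name) == name


def test_invalid_names_raise_value_error():
    for name in INVALID:
        try:
            validate_dataset_name(name)
        except ValueError:
            continue  # expected: the name was rejected
        raise AssertionError(f"{name!r} should have been rejected")
```

The valid and invalid lists reuse the examples from the Technical Notes and add the empty-string and leading/trailing-hyphen cases that the acceptance criteria ask for.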
