Conversation
…arator instances

- Add _MetricDefinition class to separate the metric type (definition) from the metric value (measurement)
- Implement comparator caching via _get_shared_comparator() so that only one comparator object exists per unique (method, target, epsilon) combination; see the sketch below
- All solutions using the same comparison method now share the same comparator instance, reducing memory usage
- Maintain full backward compatibility: no changes to the Metric class API
- Remove the TODO comment, as the refactoring addresses the concern about mixing metric and metric-value concerns
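A minimal sketch of the caching idea described above, assuming a simple memoized factory; the `_Comparator` class and its fields are illustrative stand-ins, while `_get_shared_comparator()` and the (method, target, epsilon) key come from the commit message:

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class _Comparator:
    """Illustrative stand-in for the comparator object shared between metrics."""
    method: str
    target: Optional[float]
    epsilon: float


@lru_cache(maxsize=None)
def _get_shared_comparator(method: str, target: Optional[float], epsilon: float) -> _Comparator:
    # lru_cache keys on the (method, target, epsilon) argument tuple, so repeated
    # calls with the same combination return the same comparator instance.
    return _Comparator(method, target, epsilon)


# Two metrics configured identically end up holding the same comparator object.
assert _get_shared_comparator("higher_is_better", None, 1e-9) is _get_shared_comparator("higher_is_better", None, 1e-9)
```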
- Add epsilon to the __eq__ method so that metrics with different epsilon values are correctly differentiated
- Add epsilon to the __hash__ method to maintain the hash contract (the hash must include all fields used in __eq__); a simplified illustration follows below
- Fix a redundant condition in the Metric.__gt__ method

This fixes critical bugs identified in code review that could cause incorrect metric comparisons.
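As a rough illustration of the hash-contract point, here is a simplified Metric with only name, value, and epsilon fields (the real class has more fields and a richer API):

```python
class Metric:
    """Simplified sketch; the real Metric class has additional fields and methods."""

    def __init__(self, name: str, value: float, epsilon: float = 1e-9):
        self.name = name
        self.value = value
        self.epsilon = epsilon

    def __eq__(self, other):
        if not isinstance(other, Metric):
            return NotImplemented
        # epsilon participates in equality, so metrics that differ only in
        # tolerance are no longer considered equal.
        return (self.name, self.value, self.epsilon) == (other.name, other.value, other.epsilon)

    def __hash__(self):
        # The hash is computed from the same fields used in __eq__; otherwise
        # equal objects could hash differently and break dict/set behavior.
        return hash((self.name, self.value, self.epsilon))
```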
- Reformat code to comply with black formatting standards
- Fixes CI formatting check failure
Greptile Overview

Greptile Summary

This PR enhances sample data generation to use LLM-based realistic values instead of simple synthetic patterns. The main functional change is in the dataset generation tooling in `plexe/tools/datasets.py`.

The implementation is solid with proper error handling, though the hardcoded provider string could be made configurable to align with patterns used elsewhere in the codebase.

Confidence Score: 4/5
Important Files Changed

File Analysis
```python
SampleModel = type("SampleModel", (BaseModel,), schema_fields)

# Use LLM to generate sensible sample values based on field names and types
provider = Provider("openai/gpt-4o-mini")
```
**style:** hardcoded `openai/gpt-4o-mini` provider prevents users from configuring model choice

other tools in this module receive the `llm_to_use` parameter (see `get_training_code_generation_tool`, `get_select_target_metric`); consider adding a provider parameter or getting it from config
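One way the reviewer's suggestion could look, sketched under the assumption that this tool follows the same factory pattern as the other `get_*_tool` helpers; apart from `llm_to_use` (which the review cites), the names below are hypothetical:

```python
# Sketch only: Provider is the class already imported in plexe/tools/datasets.py
# (import omitted here); the factory name and inner function are illustrative.
def get_dataset_sample_generation_tool(llm_to_use: str = "openai/gpt-4o-mini"):
    """Hypothetical factory mirroring the llm_to_use pattern used by other tools."""

    def generate_samples(schema_fields: dict, n_rows: int):
        # The model choice now comes from the caller (or a config default)
        # instead of being hardcoded inside the tool body.
        provider = Provider(llm_to_use)
        ...

    return generate_samples
```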
| provider = Provider("openai/gpt-4o-mini") | |
| # Use LLM to generate sensible sample values based on field names and types | |
| # TODO: make provider configurable via parameter | |
| provider = Provider("openai/gpt-4o-mini") |
Prompt To Fix With AI
This is a comment left during a code review.
Path: plexe/tools/datasets.py
Line: 135:135
Comment:
**style:** hardcoded `openai/gpt-4o-mini` provider prevents users from configuring model choice
other tools in this module receive `llm_to_use` parameter (see `get_training_code_generation_tool`, `get_select_target_metric`). consider adding provider parameter or getting from config
```suggestion
# Use LLM to generate sensible sample values based on field names and types
# TODO: make provider configurable via parameter
provider = Provider("openai/gpt-4o-mini")
```
How can I resolve this? If you propose a fix, please make it concise.

- Fix black formatting issues for CI compliance
- Format both files to match project style standards
Completed a TODO where the LLM is called for dataset generation, keeping the original manual logic there as a fallback.
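As a rough illustration of that change, here is a minimal sketch assuming Pydantic v2; `build_sample_rows`, `generate_with_llm`, and `_manual_sample_row` are hypothetical names, not the actual plexe functions:

```python
from pydantic import create_model


def build_sample_rows(schema_fields: dict, n_rows: int, generate_with_llm) -> list[dict]:
    """Sketch of LLM-backed sample generation with the manual logic kept as a fallback."""
    # Build a dynamic Pydantic model from the schema, analogous to the
    # SampleModel construction shown in the diff above.
    SampleModel = create_model("SampleModel", **schema_fields)

    try:
        # generate_with_llm stands in for the provider call that asks the LLM
        # for realistic values matching the field names and types.
        rows = generate_with_llm(SampleModel, n_rows)
        return [SampleModel(**row).model_dump() for row in rows]
    except Exception:
        # If the LLM path fails, fall back to simple synthetic values, since
        # the original manual generation logic is retained as a fallback.
        return [_manual_sample_row(SampleModel) for _ in range(n_rows)]


def _manual_sample_row(model_cls: type) -> dict:
    # Trivial synthetic placeholder values keyed off each field's annotation.
    defaults = {int: 0, float: 0.0, str: "sample", bool: False}
    return {name: defaults.get(field.annotation, None) for name, field in model_cls.model_fields.items()}
```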