
Numerical column considered as regional? #459

@miaoli-04


Environment Details

Please indicate the following details about the environment in which you found the bug:

  • CTGAN version: 1.25.0 (sdv)
  • Python version: 3.12.8
  • Operating System:

Error Description

When trying to fit the "automobile" dataset from the UCI Machine Learning Repository (UCIML), the 'city-mpg' column, which is continuous, seems to be interpreted as a location, and a column of strings is generated in the synthetic data. This might be related to the column name: if I rename the column to 'mpg', a column of the correct datatype is returned.
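The renaming workaround described above could be wrapped so that the original column name is restored after sampling. This is a minimal pandas-only sketch, not a fix in SDV itself: the helper names `rename_for_fit` and `restore_names` are hypothetical, and the small `demo` frame is made-up stand-in data rather than the real dataset.

```python
import pandas as pd

# Mapping from the original column name to the temporary name used in the
# report above ('mpg'). Other replacement names are untested assumptions.
RENAMES = {"city-mpg": "mpg"}


def rename_for_fit(df: pd.DataFrame, renames: dict) -> pd.DataFrame:
    """Return a copy of df with columns renamed before metadata detection."""
    return df.rename(columns=renames)


def restore_names(df: pd.DataFrame, renames: dict) -> pd.DataFrame:
    """Undo rename_for_fit on the sampled data."""
    inverse = {new: old for old, new in renames.items()}
    return df.rename(columns=inverse)


# Made-up stand-in data, only to show the round trip of the column names.
demo = pd.DataFrame({"city-mpg": [21, 19, 24], "highway-mpg": [27, 26, 30]})
renamed = rename_for_fit(demo, RENAMES)
restored = restore_names(renamed, RENAMES)
```

In the real workflow, the renamed frame would be passed to metadata detection and fitting, and `restore_names` would be applied to the sampled output.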

Steps to reproduce

import pandas as pd
from sdv.single_table import CTGANSynthesizer
from sdv.metadata import Metadata

from ucimlrepo import fetch_ucirepo

# fetch dataset
automobile = fetch_ucirepo(id=10)

# data (as pandas dataframes)
X = automobile.data.features

# detect metadata from the dataframe, then fit CTGAN and sample
metadata = Metadata.detect_from_dataframe(X)

synthesizer = CTGANSynthesizer(metadata)
synthesizer.fit(X)
synthetic_data = synthesizer.sample(num_rows=1000)
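To check for the symptom programmatically rather than by eye, one option is to compare dtypes between the real and synthetic frames. The sketch below uses only pandas; the helper name `numeric_columns_lost` is hypothetical, and the two small frames are made-up stand-ins that mimic the reported behavior, not actual SDV output.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns_lost(real: pd.DataFrame, synthetic: pd.DataFrame) -> list:
    """List columns that are numeric in `real` but not in `synthetic`."""
    return [
        col
        for col in real.columns
        if col in synthetic.columns
        and is_numeric_dtype(real[col])
        and not is_numeric_dtype(synthetic[col])
    ]


# Made-up stand-in frames: 'city-mpg' comes back as strings, as reported.
real_demo = pd.DataFrame({"city-mpg": [21, 19], "horsepower": [111, 154]})
synth_demo = pd.DataFrame({"city-mpg": ["abc", "def"], "horsepower": [120, 140]})
lost = numeric_columns_lost(real_demo, synth_demo)
```

With the reproduction script, `numeric_columns_lost(X, synthetic_data)` would be expected to include 'city-mpg' if the behavior is as described. If the detected sdtype turns out to be the cause, overriding it before creating the synthesizer, for example with something like `metadata.update_column(column_name='city-mpg', sdtype='numerical')`, may help; treat that call as an assumption to verify against the installed SDV version.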


Labels: bug, under discussion
