Skip to content

fix: declareColumns now correctly sanitizes Haskell Identifiers passed to it#43

Merged
mchav merged 7 commits intomainfrom
unknown repository
Aug 7, 2025
Merged

fix: declareColumns now correctly sanitizes Haskell Identifiers passed to it#43
mchav merged 7 commits intomainfrom
unknown repository

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Jul 31, 2025

  • We use section 2.4 of the Haskell 2010 report to define the conditions under which a string is a valid Haskell Identifier.
  • If it is, we sanitize it by filtering everything except alphanumeric characters and wrapping it in _s.

Comment thread src/DataFrame/Functions.hs Outdated
specs = zip names types
specs = zipWith (\name type_ -> (sanitize name, type_)) names types
sanitize t = if isValidIdentifier t
then "_" <> T.filter Char.isAlphaNum t <> "_"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe let's move this branching logic to a separate function and write some test cases for it.

Some column names I'm curious about:

  • "Data": your code handles this well
  • "My Data": this will become "_MyData_". I assume isAlphaNum filters spaces, right? My intuition says it should be "my_data"
  • "Distance (km/h)": this will become "_Distancekmh_". It should be "distance_km_h".
  • "0 Age": this will become "_0Age_". It should probably be "_0_age_"
  • "***": should fail and I think it fails fine in this case.

Also the condition looks backwards. It should be if valid then t else filter t.

Copy link
Copy Markdown
Author

@ghost ghost Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I've modified my logic to better fit what you described here. I filter out parentheses but leave valid strings alone. With the current logic I have:

Data            -> _data_
My Data         -> my_data
Distance (km/h) -> distance_km_h
0 Age           -> _0_age_
camelCaseStr    -> camelCaseStr
camelCase$Str   -> camelcase_str -- an invalid character will flatten the entire identifier
snake_case_str  -> snake_case_str
12_snake_case   -> _12_snake_case_
***             -> _____  -- the stars are turned into underscores and then underscores on either side

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the condition was actually correct, I'd just named it very poorly. Let me know if the current logic and naming looks wonky too.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Thanks,

@adityakaldate21-dev
Copy link
Copy Markdown

Subject: Request to Be Assigned Task – Fix for declareColumns Sanitization

Hi Team,

I’d like to take ownership of the following task:

Fix: Ensure that declareColumns correctly sanitizes Haskell identifiers passed to it.

If this task is still unassigned, please assign it to me. I’m ready to start working on the fix.

Thanks,
Aditya Kaldate

@adityakaldate21-dev
Copy link
Copy Markdown

import qualified Data.Vector as VB
import Language.Haskell.TH
import qualified Language.Haskell.TH.Syntax as TH
import qualified Data.Char as Char
import qualified Data.Text as T
import qualified Data.List as L
import qualified Data.Map.Strict as M
import Data.Function (on)

-- Your existing functions: isReservedId, isVarId, isValidIdentifier...

-- Fix for declareColumns
declareColumns :: DataFrame -> DecsQ
declareColumns df = let
names = (map fst . L.sortBy (compare on snd) . M.toList . columnIndices) df
types = map (columnTypeString . (unsafeGetColumn df)) names
sanitize t =
if isValidIdentifier t
then "" <> T.filter Char.isAlphaNum t <> ""
else t
specs = zipWith (\name type_ -> (sanitize name, type_)) names types
in mapM ((name, typeStr) -> do
typ <- typeFromString [typeStr]
let varName = mkName (T.unpack name)
sigD varName (return typ)
) specs

  1. Line removed: specs = zip names types – unnecessary and shadowed.

  2. Fixed: Used Char.isLower instead of non-existent Char.isLowerCase.

  3. Added import: import qualified Data.Text as T

  4. Ensured sanitize returns only valid Haskell identifiers using T.filter Char.isAlphaNum.

@ghost ghost requested a review from mchav August 6, 2025 12:19
isLowerCase fails on GHC 9.4.8
@mchav mchav merged commit 0e5ba13 into DataHaskell:main Aug 7, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants