fix: `declareColumns` now correctly sanitizes Haskell Identifiers passed to it · Pull Request #43 · DataHaskell/dataframe

ghost · 2025-07-31T17:21:51Z

We use section 2.4 of the Haskell 2010 report to define the conditions under which a string is a valid Haskell Identifier.
If it is, we sanitize it by filtering out everything except alphanumeric characters and wrapping the result in underscores.
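For readers who don't have the report open, here is a minimal sketch of the section 2.4 rule for variable identifiers. It is not the code in this PR: the names `reservedIds`, `isSmall`, `isIdTail`, `hasVarIdShape` and `isUsableVarName` are hypothetical, and it works on `String` rather than `Text` to stay self-contained.

```haskell
import qualified Data.Char as Char

-- Reserved words listed in section 2.4 of the Haskell 2010 report.
reservedIds :: [String]
reservedIds =
  [ "case", "class", "data", "default", "deriving", "do", "else"
  , "foreign", "if", "import", "in", "infix", "infixl", "infixr"
  , "instance", "let", "module", "newtype", "of", "then", "type"
  , "where", "_"
  ]

-- "small" in the report: a lowercase letter or an underscore.
isSmall :: Char -> Bool
isSmall c = Char.isLower c || c == '_'

-- Characters allowed after the first one: small, large, digit, or a prime.
isIdTail :: Char -> Bool
isIdTail c =
  isSmall c
    || Char.isUpper c
    || Char.generalCategory c == Char.DecimalNumber
    || c == '\''

-- Lexical shape of a variable identifier, ignoring reserved words.
hasVarIdShape :: String -> Bool
hasVarIdShape [] = False
hasVarIdShape (c : cs) = isSmall c && all isIdTail cs

-- True when the name can be bound as a variable without any rewriting.
isUsableVarName :: String -> Bool
isUsableVarName s = hasVarIdShape s && s `notElem` reservedIds
```

Under this sketch `camelCaseStr` and `snake_case_str` pass, while `Data` (starts with an uppercase letter), `data` (reserved) and `My Data` (contains a space) do not.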


mchav · 2025-07-31T18:07:14Z

-        specs = zip names types
+        specs = zipWith (\name type_ -> (sanitize name, type_)) names types
+        sanitize t = if isValidIdentifier t
+                     then "_" <> T.filter Char.isAlphaNum t <> "_"


Maybe let's move this branching logic to a separate function and write some test cases for it.

Some column names I'm curious about:

"Data": your code handles this well

"My Data": this will become "_MyData_". I assume isAlphaNum filters spaces, right? My intuition says it should be "my_data"

"Distance (km/h)": this will become "_Distancekmh_". It should be "distance_km_h".

"0 Age": this will become "_0Age_". It should probably be "_0_age_"

"***": should fail and I think it fails fine in this case.

Also the condition looks backwards. It should be if valid then t else filter t.

Ok so I've modified my logic to better fit what you described here. I filter out parentheses but leave valid strings alone. With the current logic I have:

Data -> _data_
My Data -> my_data
Distance (km/h) -> distance_km_h
0 Age -> _0_age_
camelCaseStr -> camelCaseStr
camelCase$Str -> camelcase_str -- an invalid character will flatten the entire identifier
snake_case_str -> snake_case_str
12_snake_case -> _12_snake_case_
*** -> _____ -- the stars are turned into underscores and then underscores on either side

Also the condition was actually correct, I'd just named it very poorly. Let me know if the current logic and naming looks wonky too.
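The mappings listed above can be reproduced with a sketch along these lines. This is an illustration, not the merged implementation: `reservedWords`, `isPlainVarName`, `flatten` and `sanitizeSketch` are hypothetical names, it uses `String` instead of `Text`, and it assumes a leading underscore is not treated as "already valid" (stricter than the report), which is what makes `***` come out as five underscores.

```haskell
import qualified Data.Char as Char

-- Reserved words from section 2.4 of the Haskell 2010 report.
reservedWords :: [String]
reservedWords =
  [ "case", "class", "data", "default", "deriving", "do", "else"
  , "foreign", "if", "import", "in", "infix", "infixl", "infixr"
  , "instance", "let", "module", "newtype", "of", "then", "type"
  , "where", "_"
  ]

-- Assumption: a name is left alone only if it starts with a lowercase
-- letter, continues with alphanumerics or underscores, and is not reserved.
isPlainVarName :: String -> Bool
isPlainVarName [] = False
isPlainVarName s@(c : cs) =
  Char.isLower c
    && all (\x -> Char.isAlphaNum x || x == '_') cs
    && s `notElem` reservedWords

-- Drop parentheses, lowercase alphanumerics, turn everything else into '_'.
flatten :: String -> String
flatten = map replace . filter (`notElem` "()")
  where
    replace c
      | Char.isAlphaNum c = Char.toLower c
      | otherwise = '_'

-- Valid names pass through; otherwise flatten, and wrap in underscores
-- if the flattened name still cannot be used as a variable.
sanitizeSketch :: String -> String
sanitizeSketch s
  | isPlainVarName s = s
  | isPlainVarName flat = flat
  | otherwise = "_" ++ flat ++ "_"
  where
    flat = flatten s
```

Trying the valid case first is what keeps `camelCaseStr` intact, while a single invalid character (as in `camelCase$Str`) sends the whole name through `flatten` and lowercases it.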

This looks great! Thanks,

adityakaldate21-dev · 2025-08-05T13:02:14Z

Subject: Request to Be Assigned Task – Fix for declareColumns Sanitization

Hi Team,

I’d like to take ownership of the following task:

Fix: Ensure that declareColumns correctly sanitizes Haskell identifiers passed to it.

If this task is still unassigned, please assign it to me. I’m ready to start working on the fix.

Thanks,
Aditya Kaldate

adityakaldate21-dev · 2025-08-05T13:05:05Z

import qualified Data.Vector as VB
import Language.Haskell.TH
import qualified Language.Haskell.TH.Syntax as TH
import qualified Data.Char as Char
import qualified Data.Text as T
import qualified Data.List as L
import qualified Data.Map.Strict as M
import Data.Function (on)

-- Your existing functions: isReservedId, isVarId, isValidIdentifier...

-- Fix for declareColumns
declareColumns :: DataFrame -> DecsQ
declareColumns df =
  let
    -- Column names ordered by their index in the frame.
    names = (map fst . L.sortBy (compare `on` snd) . M.toList . columnIndices) df
    types = map (columnTypeString . (`unsafeGetColumn` df)) names
    sanitize t =
      if isValidIdentifier t
        then "_" <> T.filter Char.isAlphaNum t <> "_"
        else t
    specs = zipWith (\name type_ -> (sanitize name, type_)) names types
  in
    -- Emit one type signature per (sanitized) column name.
    mapM
      ( \(name, typeStr) -> do
          typ <- typeFromString [typeStr]
          let varName = mkName (T.unpack name)
          sigD varName (return typ)
      )
      specs

Line removed: specs = zip names types – unnecessary and shadowed.
Fixed: Used Char.isLower instead of non-existent Char.isLowerCase.
Added import: import qualified Data.Text as T
Ensured sanitize returns only valid Haskell identifiers using T.filter Char.isAlphaNum.
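To make the Template Haskell side concrete, here is a small sketch of generating one signature and one binding per name. It is not the library's `declareColumns`: `declareTextConstants` is a hypothetical helper that binds each sanitized name to the original column label as a `String`, and it only uses `mkName`, `sigD`, `valD`, `varP`, `normalB`, `litE`, `stringL` and `conT` from `Language.Haskell.TH`.

```haskell
import Language.Haskell.TH

-- Hypothetical helper: for each (identifier, original label) pair, emit
--   identifier :: String
--   identifier = "original label"
-- The identifier is assumed to have been sanitized already.
declareTextConstants :: [(String, String)] -> Q [Dec]
declareTextConstants specs = concat <$> mapM declareOne specs
  where
    declareOne (ident, original) = do
      let name = mkName ident
      -- Type signature for the generated binding.
      sig <- sigD name (conT (mkName "String"))
      -- The binding itself, with the original label as a string literal.
      val <- valD (varP name) (normalB (litE (stringL original))) []
      pure [sig, val]
```

It would be used as a top-level splice such as `$(declareTextConstants [("my_data", "My Data")])` in a different module with `TemplateHaskell` enabled. `mkName` does not check its argument, so an unsanitized name like `"My Data"` is only caught later, which is why the sanitizing step has to run before the declarations are built.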


sharmrj added 2 commits July 31, 2025 22:43

fix: declareColumns now correctly sanitizes Haskell Identifiers to av…

7a4b09e

…oid an error in TH

fix: The sanitize function ought to allow numeric characters too

010eda5

ghost mentioned this pull request Jul 31, 2025

declareColumns fails if column is an invalid Haskell identifier #41

Closed

Fixed up the isValidIdentifier logic slightly

1bede79

mchav requested changes Aug 5, 2025


sharmrj added 3 commits August 6, 2025 16:51

refactored code into a standalone sanitize function; renamed `isValid…

b76f76a

…Identifier` to `isHaskellIdentifier`

sanitize now lets valid identifiers through without changes

3fe93d8

Added unit tests for sanitize

3263318

ghost requested a review from mchav August 6, 2025 12:19

mchav approved these changes Aug 6, 2025


chore: Use legacy isLower

5dd8000

isLowerCase fails on GHC 9.4.8

mchav merged commit 0e5ba13 into DataHaskell:main Aug 7, 2025
5 checks passed

mchav merged 7 commits into main from unknown repository
