Skip to content

Fix - Frequencies should be Double#44

Merged
mchav merged 2 commits intoDataHaskell:mainfrom
kayvank:42/Frequencies-show-doubles
Aug 8, 2025
Merged

Fix - Frequencies should be Double#44
mchav merged 2 commits intoDataHaskell:mainfrom
kayvank:42/Frequencies-show-doubles

Conversation

@kayvank
Copy link
Copy Markdown
Contributor

@kayvank kayvank commented Jul 31, 2025

Frequencies should show doubles, not Int, per #42

Comment thread src/DataFrame/Operations/Statistics.hs Outdated
Nothing -> (T.pack . show) c'
initDf = empty & insertVector "Statistic" (V.fromList ["Count" :: T.Text, "Percentage (%)"])
in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * 100 `div` total]) df) initDf counts
in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * (100 :: Double) / total]) df) initDf counts
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still stuck on this issue, and would very much like to learn how to fix it.
I thought maybe doing this would help:

    in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [toRowValue k, toRowValue $ k * 100.0  / total]) df) initDf counts

it did not and resulted compile error:

Could not deduce ‘Read DataFrame.Internal.Row.RowValue’
        arising from a use of ‘insertVector’
      from the context: Columnable a
        bound by a pattern with constructor:
                   BoxedColumn :: forall a. Columnable a => V.Vector a -> Column,
                 in a case alternative

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I don't quite like my approach here but for something to be Columnable it has to have a Read, Ord, Eq, and Show instance.

ghci> data Test = T Int deriving (Show, Eq, Read, Ord)
ghci> D.fromList [T 1, T 2]
[T 1,T 2]

RowValue doesn't have a Read instance and you can't do deriving similar to what we did in the CLI cause it's a GADT.

So to unblock you can just make a stub Read instance for RowValue which throws UNIMPLEMENTED. I don't imagine someone will try to read row values in the near future. I don't even know where they are used.

Sorry this was so painful. It ended up suffering from silly design decisions.

Copy link
Copy Markdown

@ghost ghost Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do a quick and dirty Read instance implementation:

instance Read RowValue where
  readsPrec prec ('V': 'a': 'l': 'u': 'e': ' ' : a) = readsPrec prec a -- or perhaps Value $ readsPrec prec a ?

At least, that's the legacy way of doing it. The newer better way is probably to do readPrec (instead of readsPrec) and use lexP or something

@kayvank kayvank marked this pull request as ready for review August 8, 2025 15:00
Comment thread src/DataFrame/Operations/Statistics.hs Outdated
Nothing -> throw $ ColumnNotFoundException name "frequencies" (map fst $ M.toList $ columnIndices df)
Just ((BoxedColumn (column :: V.Vector a))) -> let
counts = valueCounts @a name df
counts :: [(a, Double)]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try leaving this as an int instead?

Comment thread src/DataFrame/Operations/Statistics.hs Outdated
counts = valueCounts @a name df
counts :: [(a, Double)]
counts = second fromIntegral <$> valueCounts @a name df
total :: Double
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total can also be an int

Comment thread src/DataFrame/Operations/Statistics.hs Outdated
Nothing -> (T.pack . show) c'
initDf = empty & insertVector "Statistic" (V.fromList ["Count" :: T.Text, "Percentage (%)"])
in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [k, k * 100 `div` total]) df) initDf counts
in L.foldl' (\df (col, k) -> insertVector (vText col) (V.fromList [toRowValue k, toRowValue $ k * 100.0 / total]) df) initDf counts
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For percentage we can do the fromIntegral just before division rather than having the counts be doubles.

We can also format the string to round it for display:

import Text.Printf (printf)

printf "%.2f\n" (3.14159 :: Double)

Copy link
Copy Markdown
Contributor Author

@kayvank kayvank Aug 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The printf will result percentage to print as such:

 Percentage (%) | "44.26"   | "31.74"  | "0.02"   | "11.09"  | "12.88"

I could read them back as double priort to converting to RowValue which will result in this:

1     | Percentage (%) | 44.26     | 31.74    | 2.0e-2   | 11.09    | 12.88

I'll lookinto rounding the Doubles as an alterntive

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the round function that you already had there and here is the result:

--------------------------------------------------------------------------------
index |   Statistic    | <1H OCEAN |  INLAND  |  ISLAND  | NEAR BAY | NEAR OCEAN
------|----------------|-----------|----------|----------|----------|-----------
 Int  |      Text      | RowValue  | RowValue | RowValue | RowValue |  RowValue
------|----------------|-----------|----------|----------|----------|-----------
0     | Count          | 9136      | 6551     | 5        | 2290     | 2658
1     | Percentage (%) | 44.26     | 31.74    | 2.0e-2   | 11.09    | 12.8

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Thank you.

Frequencies should show doubles not Int, per Issue DataHaskell#42
Add `Read` instance to RowValue, to meet the `Columnable` constraint
@kayvank kayvank force-pushed the 42/Frequencies-show-doubles branch from a423fe8 to d08d067 Compare August 8, 2025 15:24
@kayvank kayvank requested a review from mchav August 8, 2025 17:49
Round the decimals for the frequency function
@kayvank kayvank force-pushed the 42/Frequencies-show-doubles branch from 2044c41 to 0667da0 Compare August 8, 2025 18:19
@mchav mchav merged commit a1807f7 into DataHaskell:main Aug 8, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants