Hente folkemengdefiler#59
Conversation
aosthus
left a comment
There was a problem hiding this comment.
All in all a job well done.
I have some comments and suggestions. Most of them are only suggestions that are already commented on here.
In addition to the individual comments, I also have some general comments about some of the files. For starters, I do now have enough knowledge about unit tests that I can really code review the tests/*.py-files in a meaningful way. You might want to hear from others on this regard.
Also, it is very good to have example-notebooks as you are providing here. I would however highly recommend to rewrite the comments in these notebooks to comply with universal design guidelines. This means not using headings unless they are actual headings, and using normal text when it is actual normal text. I would be happy to help you with this if this is confusing.
As I have some comments, I am waiting a little bit with the approving of this pull request. I hope that some of the suggestions I have provided can be answered one way or the other. Feel free to ask me for some time to look at it together. After a response on my comments, I will reevaluate my review, and if I am satisfied with the answers (one way or the other), I would approve the PR. As it stands now, there are still, as I have suggested, some questions I feel need answering before approving.
| def _round_half_up(values: pd.Series, decimals: int = 0) -> pd.Series: | ||
| """Kommersiell avrunding (0.5 -> 1, -0.5 -> -1), også for desimaler. | ||
|
|
||
| Fungerer på numpy-arrays eller pandas-Serier av tall. | ||
| """ | ||
| factor = 10**decimals | ||
| # Cast to float to avoid issues with Int64 etc. | ||
| v = pd.Series(pd.to_numeric(values, errors="coerce").astype(float)) | ||
| arr = v.to_numpy(dtype=float) | ||
| # round half away from zero | ||
| rounded = np.sign(arr) * np.floor(np.abs(arr) * factor + 0.5) / factor | ||
|
|
||
| return pd.Series(rounded, index=v.index, name=v.name) |
There was a problem hiding this comment.
This function might not work as intended. For the most part, it seems to round commercially, but there are cases where this is not rounding commercially. It does round correctly many of the times, but not always. See example below, for a fictional dataset.
This will correctly round:
df = pd.DataFrame({'col1': ['1', '2', '3', '4', '5'], 'var1': [1.5, 2.5, 3.5, -1.5, -2.5], 'var2': [0.125, 0.575, 1.005, 1.275, 2.445]})
_round_half_up(df['var1'])0 2.0
1 3.0
2 4.0
3 -2.0
4 -3.0
Name: var1, dtype: float64
This will not correctly round:
df = pd.DataFrame({'col1': ['1', '2', '3', '4', '5'], 'var1': [1.5, 2.5, 3.5, -1.5, -2.5], 'var2': [0.145, 0.575, 1.005, 1.275, 2.445]})
_round_half_up(df['var2'], 2)0 0.13 <-- Correcttly rounds, but...
1 0.57 <-- Not correctly rounded, as well as the ones below
2 1.00
3 1.27
4 2.44
Name: var2, dtype: float64
It might be unusual that some of the .5-points round one way, and the others round another way. Consider if this function is needed, or if the used can round data later on in the process using dapla-statbank-client towards the statbank, or if the used is expected to know how to round data themselves. If it is needed, consider the suggested code below as a suggestion to how to rewrite the _round_half_up()-function:
| from decimal import Decimal, ROUND_HALF_UP | |
| def _round_half_up(values: pd.Series, decimals: int = 0) -> pd.Series: | |
| q = Decimal(1).scaleb(-decimals) | |
| def r(x): | |
| if pd.isna(x): | |
| return np.nan | |
| return float(Decimal(str(x)).quantize(q, rounding=ROUND_HALF_UP)) | |
| return pd.to_numeric(values, errors="coerce").map(r) |
There was a problem hiding this comment.
Given that there is a hente_data_folkemengde()-function, is it possible to rewrite this example notebook so that it uses that function?
There was a problem hiding this comment.
Må legge inn endringer.
aosthus
left a comment
There was a problem hiding this comment.
All in all a job well done.
I have some comments and suggestions. Most of them are only suggestions that are already commented on here.
In addition to the individual comments, I also have some general comments about some of the files. For starters, I do now have enough knowledge about unit tests that I can really code review the tests/*.py-files in a meaningful way. You might want to hear from others on this regard.
Also, it is very good to have example-notebooks as you are providing here. I would however highly recommend to rewrite the comments in these notebooks to comply with universal design guidelines. This means not using headings unless they are actual headings, and using normal text when it is actual normal text. I would be happy to help you with this if this is confusing.
As I have some comments, I am waiting a little bit with the approving of this pull request. I hope that some of the suggestions I have provided can be answered one way or the other. Feel free to ask me for some time to look at it together. After a response on my comments, I will reevaluate my review, and if I am satisfied with the answers (one way or the other), I would approve the PR. As it stands now, there are still, as I have suggested, some questions I feel need answering before approving.
|



Sjekke hvordan det ser ut så langt.