The purpose of this issue is to translate the existing scripts from the dc_doh_hackathon to enable them to be run repeatedly on the restaurant inspection dataset as it is updated.
From the issue_3 folder:
Take the building_features_with_census.py script and modify it so that it can be run from the command line with three arguments:
- A folder with input restaurant inspection data files (the script should concatenate and merge the files in the directory as appropriate)
- The shapefile for census blocks
- The output filename
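The three arguments above could be wired up with the standard library's argparse. This is a minimal sketch, not the existing script's interface: the argument names (input_dir, census_shapefile, output_file) and the function name parse_args are assumptions chosen for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the three positional arguments described in this issue.

    The argument names are hypothetical; the issue only specifies what
    each argument represents, not what it is called.
    """
    parser = argparse.ArgumentParser(
        description="Build building permit features per census block."
    )
    parser.add_argument(
        "input_dir", type=Path,
        help="Folder containing the input data files",
    )
    parser.add_argument(
        "census_shapefile", type=Path,
        help="Shapefile for census blocks",
    )
    parser.add_argument(
        "output_file", type=Path,
        help="Filename for the output CSV",
    )
    # Passing argv=None makes argparse read sys.argv[1:], which is the
    # normal command-line behaviour; a list can be passed for testing.
    return parser.parse_args(argv)
```

With this layout, an invocation would look something like `python building_features_with_census.py <input folder> <census shapefile> <output filename>`, where all three paths are supplied by the caller.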
Please also provide a README.md that describes the script and how to run it.
You can model the solution after the files here or here.
Place all of your files in the scripts/feature_engineering/extract_building_permit_features/ folder.
For reference, here is the original issue description from the dc_doh_hackathon:
issue_3
Start with the Building Permit data in the /Data Sets/Building Permits/ folder in Dropbox.
Write a script that uses this data to produce a feature data table for the number of new building permits issued in the last 4 weeks.
You can find the data format and examples on the Feature Dataset Format tab in this document
Input:
CSV files with data for each given year
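Since the input arrives as one CSV file per year, the script needs to combine them before computing features. The sketch below uses only the standard library and assumes that every file in the folder has a header row and a .csv extension; the function name read_permit_rows and the added source_file key are hypothetical.

```python
import csv
from pathlib import Path


def read_permit_rows(input_dir):
    """Read every CSV file in input_dir and return the rows as one list.

    Each row is a dict keyed by that file's header. A 'source_file' key
    (an illustrative addition) records which file the row came from.
    """
    rows = []
    # Sort the paths so the result does not depend on filesystem order.
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["source_file"] = csv_path.name
                rows.append(row)
    return rows
```

If the yearly files do not share identical headers, rows from different files will carry different keys, so column names may need to be normalized before the rows are aggregated.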
Output:
A CSV file with:
- 1 row for each combination of building permit type, building permit subtype, year, week, and census block
- The following columns:
  - feature_id: The ID for the feature, in this case, "building_permits_issued_last_4_weeks"
  - feature_type: Building permit type
  - feature_subtype: Building permit subtype
  - year: The ISO-8601 year of the feature value
  - week: The ISO-8601 week number of the feature value
  - census_block_2010: The 2010 Census Block of the feature value
  - value: The value of the feature, i.e. the number of new building permits of the specified types and subtypes issued in the given census block during the previous 4 weeks starting from the year and week above.
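The output rows described above can be produced roughly as follows. This is a sketch under one stated assumption: the 4-week window for a given ISO year and week is taken to cover that week and the three weeks before it, which is one reading of "previous 4 weeks" and should be confirmed against the Feature Dataset Format examples. The function names and the input tuple layout are hypothetical, and each permit is assumed to have already been assigned a census block.

```python
from collections import Counter
from datetime import date, timedelta

FEATURE_ID = "building_permits_issued_last_4_weeks"


def iso_year_week(day):
    """Return the ISO-8601 (year, week) pair for a date."""
    iso = day.isocalendar()
    return iso[0], iso[1]


def permits_last_4_weeks(permits):
    """Count permits per type, subtype, ISO year/week, and census block.

    permits: iterable of (issue_date, permit_type, permit_subtype,
    census_block) tuples; this layout is an assumption for illustration.
    """
    counts = Counter()
    for issue_date, permit_type, permit_subtype, block in permits:
        # ASSUMPTION: a permit issued in week W counts toward the
        # windows labelled W, W+1, W+2, and W+3.
        for offset in range(4):
            year, week = iso_year_week(issue_date + timedelta(weeks=offset))
            counts[(permit_type, permit_subtype, year, week, block)] += 1
    return [
        {
            "feature_id": FEATURE_ID,
            "feature_type": permit_type,
            "feature_subtype": permit_subtype,
            "year": year,
            "week": week,
            "census_block_2010": block,
            "value": value,
        }
        for (permit_type, permit_subtype, year, week, block), value
        in sorted(counts.items())
    ]
```

Using isocalendar() rather than the calendar year matters at year boundaries: 1 January 2016, for example, falls in ISO week 53 of 2015. Note that this sketch only emits rows with a non-zero count; whether zero-valued rows are also required is not stated in the issue.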