Extract Features from Building Permit Data #5

@jasonasher

Description

The purpose of this issue is to translate the existing scripts from the dc_doh_hackathon to enable them to be run repeatedly on the restaurant inspection dataset as it is updated.

From the issue_3 folder:
Take the building_features_with_census.py script and modify it so that it can be run from the command line with three arguments:

  1. A folder with input building permit data files (the script should concatenate and merge the files in the directory as appropriate)
  2. The shapefile for census blocks
  3. The output filename
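As a rough illustration of the three-argument interface described above, here is a minimal sketch using only the Python standard library. The argument names (`input_dir`, `census_shapefile`, `output_file`) and the helper names `parse_args` and `read_input_rows` are hypothetical, and the sketch assumes the input files are CSVs that share a compatible header. It does not reproduce what building_features_with_census.py actually does.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    """Parse the three positional arguments described in the issue."""
    parser = argparse.ArgumentParser(
        description="Extract building permit features (illustrative sketch)."
    )
    # Argument names are assumptions made for this example.
    parser.add_argument("input_dir", help="folder containing the input CSV files")
    parser.add_argument("census_shapefile", help="shapefile for census blocks")
    parser.add_argument("output_file", help="path of the output CSV file")
    return parser.parse_args(argv)


def read_input_rows(input_dir):
    """Concatenate the rows of every *.csv file in input_dir, in filename order."""
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8") as handle:
            # Each row becomes a dict keyed by that file's header line.
            rows.extend(csv.DictReader(handle))
    return rows


def main(argv=None):
    args = parse_args(argv)
    rows = read_input_rows(args.input_dir)
    print(f"Read {len(rows)} rows from {args.input_dir}")
```

With a layout like this, the script could be invoked as `python script.py <input folder> <shapefile> <output file>` once `main()` is called from a `__main__` guard; the census block join and feature computation would be added after the rows are read.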

Please also provide a README.md that describes the script and how to run it.

You can model the solution after the files here or here

Place all of your files in the scripts/feature_engineering/extract_building_permit_features/ folder

For reference, here is the original issue description from the dc_doh_hackathon:

issue_3

Start with the Building Permit data in the /Data Sets/Building Permits/ folder in Dropbox.
Write a script that uses this data to produce a feature data table for the number of new building permits issued in the last 4 weeks.

You can find the data format and examples on the Feature Dataset Format tab in this document

Input:
CSV files with data for each given year

Output:
A CSV file with

  • 1 row for each building permit type and subtype, and each week, year, and census block
  • The dataset should include the following columns:

feature_id: The ID for the feature, in this case, "building_permits_issued_last_4_weeks"
feature_type: Building permit type
feature_subtype: Building permit subtype
year: The ISO-8601 year of the feature value
week: The ISO-8601 week number of the feature value
census_block_2010: The 2010 Census Block of the feature value
value: The value of the feature, i.e. the number of new building permits of the specified types and subtypes issued in the given census block during the previous 4 weeks starting from the year and week above.
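The counting step behind the output format above could be sketched as follows, using only the standard library. This is a hedged illustration, not the hackathon script. It assumes each permit has already been reduced to a tuple of issue date, type, subtype, and census block, and the function names are hypothetical. It also assumes that "the previous 4 weeks" means the 28 days before the Monday that starts the given ISO week; the issue text does not pin this down. Only combinations with at least one permit produce a row.

```python
import csv
from collections import Counter
from datetime import date, timedelta

FEATURE_ID = "building_permits_issued_last_4_weeks"


def count_permits_last_4_weeks(permits):
    """Count permits per (type, subtype, ISO year, ISO week, census block).

    permits: iterable of (issue_date, permit_type, permit_subtype, census_block).
    Assumption: a permit counts toward a week if it was issued in the
    28 days before that week's Monday.
    """
    counts = Counter()
    for issue_date, permit_type, permit_subtype, block in permits:
        # Monday of the ISO week in which the permit was issued.
        week_start = issue_date - timedelta(days=issue_date.weekday())
        # The permit falls in the 4-week lookback of the next four weeks.
        for offset in range(1, 5):
            iso = (week_start + timedelta(weeks=offset)).isocalendar()
            counts[(permit_type, permit_subtype, iso[0], iso[1], block)] += 1
    return counts


def write_features(counts, output_file):
    """Write the counts using the column layout listed in the issue."""
    with open(output_file, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["feature_id", "feature_type", "feature_subtype",
                         "year", "week", "census_block_2010", "value"])
        for (ptype, subtype, year, week, block), value in sorted(counts.items()):
            writer.writerow([FEATURE_ID, ptype, subtype, year, week, block, value])
```

Using `isocalendar()` rather than the calendar year keeps the year and week columns consistent across year boundaries, where the ISO-8601 year can differ from the calendar year.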
