# Input Preparation

Before running the Exposome Geocoder, you need to prepare your input data in one of three supported formats.

---

## Overview

You need to prepare **only ONE** of the following data elements per encounter:

- **Option 1**: Address data (multi-column or single-column format)
- **Option 2**: Coordinate data (latitude/longitude)
- **Option 3**: OMOP CDM database tables

---

## Option 1: Address Data

### Format A: Multi-Column Address

Prepare a CSV file with the following columns:

| street           | city         | state | zip   | year | entity_id |
|------------------|--------------|-------|-------|------|-----------|
| 1250 W 16th St   | Jacksonville | FL    | 32209 | 2019 | 1         |
| 2001 SW 16th St  | Gainesville  | FL    | 32608 | 2019 | 2         |

**Required Columns:**

- `street` - Street address (required for precise geocoding)
- `city` - City name
- `state` - State abbreviation (2 letters)
- `zip` - 5-digit ZIP code (required for precise geocoding)
- `year` - Year for the address
- `entity_id` - Unique identifier for the entity

> **Important:** Both `street` and `zip` are required. Missing these fields may lead to **imprecise geocoding**.

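
Before running the geocoder, it can help to check a multi-column file against the column list above. The following is a minimal sketch using only Python's standard library. The function name `check_address_rows` and the specific checks (non-empty street, 5-digit ZIP, 2-letter state) are illustrative assumptions drawn from the column descriptions; they are not part of the Exposome Geocoder, which may apply different rules.

```python
import csv
import io
import re

# Column names taken from the Format A table above.
REQUIRED_COLUMNS = ["street", "city", "state", "zip", "year", "entity_id"]


def check_address_rows(handle):
    """Return a list of (row_number, message) problems found in a Format A CSV.

    Hypothetical helper for illustration; not part of the geocoder.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [(1, f"missing columns: {', '.join(missing)}")]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if not (row["street"] or "").strip():
            problems.append((row_number, "empty street"))
        if not re.fullmatch(r"\d{5}", (row["zip"] or "").strip()):
            problems.append((row_number, "zip is not 5 digits"))
        if not re.fullmatch(r"[A-Za-z]{2}", (row["state"] or "").strip()):
            problems.append((row_number, "state is not a 2-letter abbreviation"))
    return problems


# In-memory sample; the second data row is deliberately malformed.
sample = io.StringIO(
    "street,city,state,zip,year,entity_id\n"
    "1250 W 16th St,Jacksonville,FL,32209,2019,1\n"
    ",Gainesville,FL,3260,2019,2\n"
)
problems = check_address_rows(sample)
for row_number, message in problems:
    print(f"row {row_number}: {message}")
```

To run the same check on a real file, pass an open file handle instead of the in-memory sample, for example `open("input_address/patients_address.csv", newline="")`.
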
### Format B: Single Column Address

Alternatively, combine all address components into a single column:

| address                                  | year | entity_id |
|------------------------------------------|------|-----------|
| 1250 W 16th St Jacksonville FL 32209     | 2019 | 1         |
| 2001 SW 16th St Gainesville FL 32608     | 2019 | 2         |

**Required Columns:**

- `address` - Full address as a single string
- `year` - Year for the address
- `entity_id` - Unique identifier for the entity

### Sample Files

- [Address samples](https://github.com/bihorac-LAB/EnvironmentalData/tree/main/Tools/demo/address_files/input)

---

## Option 2: Coordinate Data

If you already have geocoded coordinates, prepare a CSV file with latitude and longitude:

| latitude   | longitude  | entity_id |
|------------|------------|-----------|
| 30.353463  | -81.6749   | 1         |
| 29.634219  | -82.3433   | 2         |

**Required Columns:**

- `latitude` - Latitude in decimal degrees
- `longitude` - Longitude in decimal degrees
- `entity_id` - Unique identifier for the entity

### Sample Files

- [Coordinate samples](https://github.com/bihorac-LAB/EnvironmentalData/tree/main/Tools/demo/latlong_files/input)

---

## Option 3: OMOP CDM Data

If you're working with an OMOP Common Data Model database, the geocoder can extract data directly.

### Required Tables and Columns

| Table              | Required Columns |
|--------------------|------------------|
| `person`           | person_id |
| `visit_occurrence` | visit_occurrence_id, visit_start_date, visit_end_date, person_id |
| `location`         | location_id, address_1, address_2, city, state, zip, location_source_value, country_concept_id, country_source_value, latitude, longitude |
| `location_history` | location_id, relationship_type_concept_id, domain_id, entity_id, start_date, end_date |

### Sample Files

- [OMOP input samples](https://github.com/bihorac-LAB/EnvironmentalData/tree/main/Tools/demo/OMOP/input)

---

## Optional Supporting Files

Including these optional files helps streamline the **end-to-end workflow** between geocoding and exposome linkage:

### LOCATION.csv

CDM-formatted location table with geocoded information:

| location_id | address_1 | address_2 | city | state | zip | county | location_source_value | country_concept_id | country_source_value | latitude | longitude |
|-------------|-----------|-----------|------|-------|-----|--------|-----------------------|--------------------|----------------------|----------|-----------|
| 1 | 1248 N Blackstone Ave | | FRESNO | CA | 93703 | | UNITED STATES OF AMERICA | | UNITED STATES OF AMERICA | 36.75891146 | -119.7902719 |

### LOCATION_HISTORY.csv

CDM-formatted location history table:

| location_id | relationship_type_concept_id | domain_id | entity_id | start_date | end_date |
|-------------|------------------------------|-----------|-----------|------------|----------|
| 1 | 32848 | 1147314 | 3763 | 1998-01-01 | 2020-01-01 |

> **Important:** Do **not** date-shift your LOCATION/LOCATION_HISTORY files before linkage. Date shifting (if used) should occur post-linkage in the GIS Linkage step.

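
If you supply both optional files, a quick consistency check before geocoding can catch mismatches early. The sketch below uses only Python's standard library and reports `location_id` values that appear in the location history data but not in the location data. The function name `unmatched_location_ids` is hypothetical, and the check is an assumption about what a useful sanity test looks like (it follows the OMOP convention that `location_history.location_id` refers to `location.location_id`); it is not a step the geocoder requires.

```python
import csv
import io


def unmatched_location_ids(location_handle, history_handle):
    """Return sorted location_id values found in the history file but not the location file.

    Hypothetical helper for illustration only.
    """
    known_ids = {row["location_id"].strip() for row in csv.DictReader(location_handle)}
    history_ids = {row["location_id"].strip() for row in csv.DictReader(history_handle)}
    return sorted(history_ids - known_ids)


# In-memory samples with abbreviated LOCATION columns.
location_csv = io.StringIO(
    "location_id,address_1,city,state,zip\n"
    "1,1248 N Blackstone Ave,FRESNO,CA,93703\n"
)
# The second history row is made-up data that points at a missing location.
history_csv = io.StringIO(
    "location_id,relationship_type_concept_id,domain_id,entity_id,start_date,end_date\n"
    "1,32848,1147314,3763,1998-01-01,2020-01-01\n"
    "2,32848,1147314,3764,1998-01-01,2020-01-01\n"
)
missing = unmatched_location_ids(location_csv, history_csv)
print(missing)
```

An empty list means every history row points at a known location. IDs are compared as strings, so `1` and `01` would count as different values.
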

If `LOCATION.csv` and `LOCATION_HISTORY.csv` are provided during geocoding:

- Output automatically includes updated latitude/longitude information
- Ready for immediate use with the postgis linkage container

If **not provided**:

- You must manually update LOCATION files with geocoded lat/lon before linkage

---

## Folder Structure

Organize your input files in a dedicated folder:

```
your_project/
├── input_address/              # For address-based data
│   ├── patients_address.csv
│   ├── LOCATION.csv            # Optional
│   └── LOCATION_HISTORY.csv    # Optional
│
└── input_coordinates/          # For coordinate-based data
    ├── coordinates.csv
    ├── LOCATION.csv            # Optional
    └── LOCATION_HISTORY.csv    # Optional
```

> ⚠️ **File Format:** Only `.csv` files are supported. Convert `.xlsx` or other formats to CSV before running the tool.

---

## Next Steps

Once your input data is prepared:

1. **For CSV inputs (Option 1 & 2)**: Proceed to [Geocoding Setup](https://github.com/bihorac-LAB/EnvironmentalData/wiki/Geocoding-Setup)
2. **For OMOP inputs (Option 3)**: Review [Running the Geocoder](https://github.com/bihorac-LAB/EnvironmentalData/wiki/Running-the-Geocoder)

[Return to Home](https://github.com/bihorac-LAB/EnvironmentalData/wiki/Home)