FPDS Contract Extractor

A Python tool for extracting structured contract data from FPDS (Federal Procurement Data System) PDF reports into CSV format.

Overview

This script processes PDF files containing federal contract data from FPDS and extracts individual contract records with all 26 fields into a structured CSV file. Each row in the output represents one complete contract with standardized column headers.

Features

Comprehensive extraction: Extracts all 26 contract fields per record
Clean data output: Strips FPDS page headers, embedded newlines, and extra whitespace
Robust processing: Continues with the remaining pages when an individual page fails to parse
Progress tracking: Shows real-time progress during extraction
Flexible input: Process entire PDFs or specify page limits for testing
Command-line interface: Easy to use from terminal or scripts

Requirements

Python 3.7+
Required packages:
- pdfplumber - PDF parsing and table extraction
- pandas - Data manipulation and CSV output

Installation

Clone or download this repository
Install required packages:
```
pip install pdfplumber pandas
```

Usage

Basic Usage

python contract_extract.py input.pdf output.csv

Advanced Usage

# Process only first 50 pages (for testing)
python contract_extract.py contracts.pdf sample.csv --max-pages 50

# Get help
python contract_extract.py --help

Examples

# Extract all contracts from FPDS PDF
python contract_extract.py ICE_AllContracts_250121_50k.pdf all_contracts.csv

# Test with first 10 pages
python contract_extract.py ICE_AllContracts_250121_50k.pdf test.csv --max-pages 10
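
Because the tool is a plain command-line script, it can also be driven from another Python program. The sketch below shows one way to do that with `subprocess`. `build_command` and `run_extraction` are hypothetical helper names, and the script is assumed to sit in the current working directory. Whether the script sets a non-zero exit status on failure is not documented here, so treat the return value as a hint only.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(pdf_path: str, csv_path: str,
                  max_pages: Optional[int] = None) -> List[str]:
    """Build the argument list for the command line documented above."""
    cmd = [sys.executable, "contract_extract.py", pdf_path, csv_path]
    if max_pages is not None:
        cmd += ["--max-pages", str(max_pages)]
    return cmd


def run_extraction(pdf_path: str, csv_path: str,
                   max_pages: Optional[int] = None) -> bool:
    """Run the extractor and report whether it exited with status 0."""
    result = subprocess.run(build_command(pdf_path, csv_path, max_pages))
    return result.returncode == 0
```

For example, `run_extraction("contracts.pdf", "sample.csv", max_pages=50)` issues the same command as the 50-page test run shown under Advanced Usage.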

Output Format

The script outputs a CSV file with 26 columns representing all contract fields:

| Column | Description | Example |
|---|---|---|
| Contract ID | Unique contract identifier | 70CDCR22P00000024 |
| Reference IDV | Reference indefinite delivery vehicle | 70CDCR22D00000002 |
| Modification Number | Contract modification number | P00011 |
| Transaction Number | Transaction sequence number | 0 |
| Award/IDV Type | Type of award/contract | PO Purchase Order |
| Action Obligation ($) | Contract value | $258,000.00 |
| Date Signed | Contract signature date | Sep 16, 2025 |
| Solicitation Date | Date of solicitation | Jul 6, 2022 |
| Contracting Agency ID | Agency identifier | 7012 |
| Contracting Agency | Full agency name | U.S. IMMIGRATION AND CUSTOMS ENFORCEMENT |
| Contracting Office Name | Contracting office | DETENTION COMPLIANCE AND REMOVALS |
| PSC Type | Product/Service Code type | S |
| PSC | Product/Service Code | X1FB |
| PSC Description | Service description | LEASE/RENTAL OF RECREATIONAL BUILDINGS |
| NAICS | Industry classification code | 713990 |
| NAICS Description | Industry description | ALL OTHER AMUSEMENT AND RECREATION INDUSTRIES |
| Entity City | Contractor city | CONROE |
| Entity State | Contractor state | TX |
| Entity ZIP Code | Contractor ZIP code | 773024850 |
| Additional Reporting Code | Special reporting codes | E, S |
| Additional Reporting Description | Reporting description | EMPLOYMENT ELIGIBILITY VERIFICATION |
| Unique Entity ID | Contractor unique ID | XR3HKXN6M1B3 |
| Ultimate Parent Unique Entity ID | Parent company ID | WGN2KJJD27Q3 |
| Ultimate Parent Legal Business Name | Parent company name | AKIMA INFRASTRUCTURE PROTECTION LLC |
| Legal Business Name | Contractor legal name | GO & ZALEZ INC. |
| CAGE Code | Commercial and Government Entity code | 6S0S5 |
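
Amounts and dates in the examples above are formatted strings, so downstream analysis usually needs to convert them first. The following is a minimal sketch, assuming the CSV stores values exactly as shown in the Example column and is UTF-8 encoded. The helper names are hypothetical, and the handling of negative amounts is a guess at formats that may or may not occur. It uses the standard `csv` module to stay dependency-free; pandas would work equally well.

```python
import csv
from datetime import date, datetime
from decimal import Decimal


def parse_obligation(text: str) -> Decimal:
    """Convert a string such as '$258,000.00' to an exact Decimal."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    # Assumption: a negative amount may appear as '-$1,000.00' or '($1,000.00)'.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    return Decimal(cleaned)


def parse_fpds_date(text: str) -> date:
    """Convert a string such as 'Sep 16, 2025' to a date (English month names)."""
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


def total_obligation(csv_path: str) -> Decimal:
    """Sum the 'Action Obligation ($)' column, skipping blank cells."""
    total = Decimal("0")
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get("Action Obligation ($)") or "").strip()
            if value:
                total += parse_obligation(value)
    return total
```

`Decimal` is used instead of `float` so that summed dollar amounts do not pick up rounding error.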

Performance

Processing speed: ~2-3 pages per second
Memory usage: Moderate (processes one page at a time)
Error handling: Continues processing if individual pages fail
Large files: Tested with 600+ page PDFs

Sample Output

From a 10-page test extraction:

23 contracts extracted
26 columns with complete data
Contract values: $211K - $14.6M
Major contractors: CoreCivic, GEO Group, G4S, Akima Infrastructure

Troubleshooting

Common Issues

Missing dependencies
```
pip install pdfplumber pandas
```
PDF file not found
- Check file path and spelling
- Use absolute paths if needed
Memory issues with large PDFs
- Use --max-pages to process in chunks
- Process smaller sections and combine results
No contracts extracted
- Verify PDF contains FPDS contract data
- Check if PDF format matches expected structure
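
For the memory-related tip above, combining per-chunk results can be as simple as concatenating CSV files that share a header. The sketch below is one hedged way to do it; `merge_csvs` is a hypothetical helper, not part of the tool. How the per-chunk CSVs are produced is left open: this README only documents `--max-pages`, which limits a run to the first N pages, so splitting the PDF beforehand is assumed.

```python
import csv
from typing import Iterable


def merge_csvs(chunk_paths: Iterable[str], out_path: str) -> int:
    """Concatenate CSV files with identical headers; return data rows written."""
    written = 0
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in chunk_paths:
            with open(path, newline="", encoding="utf-8") as handle:
                reader = csv.reader(handle)
                chunk_header = next(reader, None)
                if chunk_header is None:
                    continue  # skip completely empty files
                if header is None:
                    # The first non-empty file defines the output header.
                    header = chunk_header
                    writer.writerow(header)
                elif chunk_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow(row)
                    written += 1
    return written
```

Rows are streamed one at a time, so the merge itself does not need to hold any chunk in memory.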

Error Messages

❌ Error: Input PDF file not found - Check file path
❌ Error: Input file must be a PDF - Ensure file has .pdf extension
Missing library - Install required packages
KeyError during extraction - PDF format may not match expected structure

Technical Details

PDF Structure Expected

4-column table format with field:value pairs
Standard FPDS header: "www.fpds.gov List of contracts matching your search criteria"
26 fields per contract from "Contract ID:" to "CAGE Code:"
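
To make the expected structure concrete, here is a rough sketch of turning 4-column rows into contract records. It assumes each row is laid out as label, value, label, value, that labels end with a colon, and that "Contract ID" always opens a new record. The function names are hypothetical and the actual script may work differently.

```python
from typing import Dict, List, Optional, Tuple

Row = List[Optional[str]]


def pairs_from_rows(rows: List[Row]) -> List[Tuple[str, str]]:
    """Flatten [label, value, label, value] rows into (field, value) pairs."""
    pairs = []
    for row in rows:
        # Empty cells may arrive as None; embedded newlines become spaces.
        cells = [(cell or "").replace("\n", " ").strip() for cell in row]
        for i in range(0, len(cells) - 1, 2):
            label, value = cells[i], cells[i + 1]
            if label.endswith(":"):
                # Drop the trailing colon and collapse repeated whitespace.
                pairs.append((label[:-1].strip(), " ".join(value.split())))
    return pairs


def group_contracts(pairs: List[Tuple[str, str]],
                    first_field: str = "Contract ID") -> List[Dict[str, str]]:
    """Start a new record each time the first field reappears."""
    contracts: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for field, value in pairs:
        if field == first_field:
            if current:
                contracts.append(current)
            current = {}
        if current is not None:
            current[field] = value
    if current:
        contracts.append(current)
    return contracts
```

Pairs that appear before the first "Contract ID" are ignored, which keeps a partial record at the top of a page from producing a malformed row.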

Processing Steps

Open PDF and identify total pages
For each page:
- Extract tables using pdfplumber
- Remove FPDS headers
- Parse 4-column format into field:value pairs
- Clean data (remove newlines, extra whitespace)
- Group fields into complete contracts
Combine all contracts into single DataFrame
Export to CSV with standardized column names
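
The page loop in the steps above could be sketched as follows. This is an illustration under assumptions, not the script's actual code: `is_header_row` and `iter_table_rows` are hypothetical names, and the FPDS header is assumed to show up as a table row containing the phrase below. `pdfplumber.open`, `pdf.pages`, and `page.extract_tables()` are the library calls used.

```python
from typing import Iterator, List, Optional

FPDS_HEADER = "List of contracts matching your search criteria"


def is_header_row(row: List[Optional[str]]) -> bool:
    """Return True if the row carries the standard FPDS header text."""
    text = " ".join(cell for cell in row if cell)
    return FPDS_HEADER in text


def iter_table_rows(pdf_path: str,
                    max_pages: Optional[int] = None
                    ) -> Iterator[List[Optional[str]]]:
    """Yield table rows page by page, skipping headers and failed pages."""
    import pdfplumber  # imported lazily so is_header_row works without it

    with pdfplumber.open(pdf_path) as pdf:
        pages = pdf.pages[:max_pages] if max_pages else pdf.pages
        total = len(pages)
        for number, page in enumerate(pages, start=1):
            try:
                tables = page.extract_tables()
            except Exception as exc:
                # Keep going so one bad page does not abort the whole run.
                print(f"Page {number}/{total} failed: {exc}")
                continue
            print(f"Processed page {number}/{total}")
            for table in tables:
                for row in table:
                    if not is_header_row(row):
                        yield row
```

Writing this as a generator keeps only one page's tables in memory at a time, in line with the page-at-a-time processing described under Performance.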

Contributing

To contribute improvements:

Test with different FPDS PDF formats
Add error handling for edge cases
Optimize performance for very large files
Add additional output formats (JSON, Excel)

License

This tool is provided as-is for processing federal contract data from public FPDS reports.
