pqls

A command-line tool for listing the contents and metadata of Apache Parquet files and partitioned Parquet datasets, modelled on HDF5's h5ls.

Install

curl -fsSL https://github.com/dunnock/pqls/releases/latest/download/install.sh | sh

Or install with cargo:

cargo install pqls

Examples

Inspect a single file:

pqls data.parquet

Detailed per-row-group column statistics (min/max/nulls):

pqls -d data.parquet

Dump as CSV:

pqls --csv data.parquet
pqls --csv --head 100 data.parquet

List a partitioned dataset:

pqls /path/to/dataset/
pqls -d -r /path/to/dataset/

Machine-readable output:

pqls -q data.parquet

CLI

pqls [OPTIONS] <PATH> [PATH_B]

ARGS:
  <PATH>            path to a .parquet file or directory to inspect
  [PATH_B]          second .parquet file for schema diff (required by --diff)

OPTIONS:
      --diff                compare schemas of two files; exits 0 if identical, 1 if different
  -d, --detail              show per-row-group column statistics (min/max/nulls)
  -r, --recursive           recurse into a directory and list all .parquet files
      --csv                 dump rows as CSV to stdout
      --head <N>            limit output to the first N rows (applies to --csv and --ndjson)
  -q, --quiet               suppress human-readable headers; emit tab-separated summary lines
      --schema              print schema only (column names and types)
      --json                emit output as JSON (works with --schema, --kv-meta, --check, --partition-stats, --diff)
      --ndjson              stream rows as newline-delimited JSON (NDJSON)
      --sample <N>          emit N randomly-sampled rows; requires --ndjson or --csv
      --columns <COLS>      comma-separated list of column names to project (e.g. id,ts,value)
      --kv-meta             print Parquet key-value metadata (writer version, custom properties)
      --scan-stats          scan the full file to compute per-column min/max/nulls/n_distinct; requires -d
      --partition-stats     aggregate row counts and file sizes across a Hive-partitioned directory; requires -r
      --check               verify file integrity by reading the footer and all row groups
      --deep                with --check: read every data page (slower but catches corrupt column data)
  -h, --help                print help
  -V, --version             print version
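The --check and --deep options above can drive a batch integrity sweep over a directory. The sketch below is a hypothetical POSIX sh helper (check_all is not part of pqls). It assumes pqls is on PATH and relies only on the documented exit codes: 0 for success and 2 for corrupt or invalid parquet.

```shell
# check_all DIR: hypothetical helper, not part of pqls.
# Runs `pqls --check --deep` on every .parquet file under DIR and prints
# one line per failing file. Returns 0 only if every file passed.
check_all() {
  dir=$1
  # Sorted for stable output; read -r handles spaces (not newlines) in names.
  find "$dir" -type f -name '*.parquet' | sort | {
    failed=0
    while IFS= read -r f; do
      status=0
      # `|| status=$?` captures the exit code without tripping `set -e`
      pqls --check --deep "$f" >/dev/null 2>&1 || status=$?
      if [ "$status" -ne 0 ]; then
        # Per the exit code contract: 1 = path error, 2 = corrupt/invalid parquet
        printf 'FAILED (exit %s): %s\n' "$status" "$f"
        failed=$((failed + 1))
      fi
    done
    # The loop runs in a pipeline subshell, so the verdict is returned from here
    [ "$failed" -eq 0 ]
  }
}
```

Typical use is `check_all /path/to/dataset/ || exit 1` as a pipeline step. Because --deep reads every data page, expect it to be noticeably slower than a footer-and-row-group check on large datasets.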

Why pqls?

Static binary. No JVM, no Python interpreter, no pip install. Drop the binary on any Linux box and it runs, with sub-100ms startup that keeps it cheap on the critical path of a data pipeline.

Composable. Stdout is always clean (data only; warnings go to stderr). Pipe anywhere:

pqls --csv file.parquet | xsv stats
pqls --schema file.parquet | diff - expected.schema
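Because stdout carries only data, the schema pipe above can become a drift check against a saved snapshot, which --diff (it compares two parquet files) does not cover. A minimal sketch; the function names and the expected.schema location are hypothetical.

```shell
# snapshot_schema FILE OUT: record FILE's current schema as the expected one.
snapshot_schema() {
  pqls --schema "$1" > "$2"
}

# schema_matches FILE EXPECTED: print a unified diff and return non-zero on drift.
# The pipeline's status is diff's (0 = same, 1 = different). Without
# `set -o pipefail`, a pqls failure shows up as an empty schema, i.e. a diff.
schema_matches() {
  pqls --schema "$1" | diff -u "$2" -
}
```

For example, run `snapshot_schema file.parquet expected.schema` once and commit the result, then call `schema_matches file.parquet expected.schema || exit 1` in CI.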

Agent-friendly. The machine-readable output of --schema --json and --ndjson lets code agents inspect schemas and rows without parsing human-oriented text. See SKILL.md for usage patterns.

One-liner install:

curl -fsSL https://github.com/dunnock/pqls/releases/latest/download/install.sh | sh

Fast:

Tool	Runtime	Startup	Schema dump	Stats	Pipe-composable
pqls	none (static)	~50ms	`--schema --json`	`--scan-stats`	yes
parquet-tools	JVM	~2s	text only	yes	no
DuckDB	none (C++ binary)	~200ms	SQL only	SQL	no
fastparquet	Python	~500ms	Python API	Python API	no

How pqls compares

	pqls	parquet-cli (Apache)	pqrs	DuckDB
Static binary, no JVM/Python	yes	no (JAR)	yes	yes
`--schema --json` for agents	yes	no (text only)	no	via SQL
NDJSON rows (`--ndjson`)	yes	no	cat -f json	via SQL
Column projection (`--columns`)	yes	yes	no	via SQL
Random sampling (`--sample N`)	yes	no	yes	ORDER BY random()
Key-value metadata (`--kv-meta`)	yes	footer cmd	no	parquet_kv_metadata()
Directory / partition listing	yes	no	no	no
SKILL.md for code agents	yes	no	no	no
Composable (stdin/stdout clean)	yes	no	partial	no

pqls is the only static binary in this list that produces JSON schema output and NDJSON rows without requiring SQL. It is designed for shell pipelines and agent tooling where DuckDB's startup time or SQL syntax is overhead.

Agent usage

pqls is designed to be called by code agents (Claude, Codex, Cursor, etc.) without any human at the terminal.

Discover schema

pqls --schema --json /path/to/foo.parquet

Returns a single JSON object that can be parsed with jq or Python's json.loads. The logical_type field reports types such as DATE, TIMESTAMP_MICROS, or DECIMAL(10,2).

Sample rows to understand data

pqls --ndjson --sample 50 foo.parquet

Emits 50 randomly sampled rows, one JSON object per line. Pipe to jq to inspect individual fields.

Project specific columns

pqls --ndjson --columns user_id,amount --sample 20 foo.parquet
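Projection can also feed quick aggregates in awk. This sketch totals one column. sum_column is a hypothetical helper, and it assumes (not confirmed by this README) that --columns works together with --csv, that the CSV output begins with a header row, and that values contain no embedded commas or quotes.

```shell
# sum_column FILE COLUMN: hypothetical helper that totals one numeric column.
sum_column() {
  pqls --csv --columns "$2" "$1" |
    awk -F',' -v col="$2" '
      # Locate the column by name in the (assumed) header row
      NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
      # Sum only once the column is found; empty cells (assumed nulls) add 0
      idx { total += $idx }
      END { printf "%.2f\n", total }'
}
```

Usage: `sum_column foo.parquet amount`. Looking the column up by header name avoids depending on the order in which pqls emits projected columns.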

Check embedded metadata (Spark / Pandas schema)

pqls --kv-meta --json foo.parquet | jq '.["pandas"]'
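An agent can capture the three views above (schema, row sample, key-value metadata) in one step. A minimal sketch; agent_context and the output file names are hypothetical, and only flags documented in this README are used.

```shell
# agent_context FILE OUTDIR: write schema, a row sample and key-value metadata
# for FILE into OUTDIR, stopping at the first failing command.
agent_context() {
  mkdir -p "$2" &&
    pqls --schema --json "$1" > "$2/schema.json" &&
    pqls --ndjson --sample 50 "$1" > "$2/sample.ndjson" &&
    pqls --kv-meta --json "$1" > "$2/kv-meta.json"
}
```

Writing each view to its own file keeps every output independently parseable (JSON, NDJSON, JSON) instead of mixing formats on one stream.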

Composable pipeline example

# Find which files in a partitioned dataset have more than 1M rows
pqls -q --recursive /data/events/ \
  | awk -F'\t' '$2 > 1000000 { print $1 }'
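The same -q output supports other one-liners. This sketch assumes, as the awk filter above does, that -q prints one line per file with the path in field 1 and the row count in field 2; the function names are hypothetical.

```shell
# total_rows DIR: sum the row counts of every parquet file under DIR.
total_rows() {
  # %.0f avoids exponent notation when the total gets large
  pqls -q --recursive "$1" | awk -F'\t' '{ total += $2 } END { printf "%.0f\n", total }'
}

# largest_files DIR [N]: the N files (default 5) with the most rows.
largest_files() {
  tab=$(printf '\t')
  pqls -q --recursive "$1" | sort -t "$tab" -k2,2nr | head -n "${2:-5}"
}
```

Usage: `total_rows /data/events/` or `largest_files /data/events/ 10`.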

Exit code contract

Scripts should test $?:

0 — success, output on stdout
1 — file/path error or schema mismatch (with --diff)
2 — corrupt or invalid parquet, or bad flag combination
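A minimal sketch of honouring this contract around --diff. compare_schemas is a hypothetical wrapper. Since exit code 1 covers both a path error and a schema mismatch, it checks that both paths exist first so the two cases can be told apart.

```shell
# compare_schemas A B: run `pqls --diff A B` and report the outcome.
# Returns pqls's exit status (or 1 if a path is missing).
compare_schemas() {
  for f in "$1" "$2"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      return 1
    fi
  done
  status=0
  pqls --diff "$1" "$2" || status=$?   # capture without tripping `set -e`
  case $status in
    0) echo "schemas identical" ;;
    1) echo "schemas differ" ;;        # paths exist, so 1 most likely means a mismatch
    2) echo "corrupt parquet or bad flag combination" >&2 ;;
    *) echo "unexpected exit status: $status" >&2 ;;
  esac
  return "$status"
}
```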

License

Licensed under either of MIT or Apache-2.0 at your option.
