import(csv,parquet): replace DuckDB-on-temp-file introspection with in-process row streaming via ArrowBuffer

## Summary

The CSV and Parquet import handlers (`internal/api/import.go#handleCSVImport`, `handleParquetImport`) currently introspect the uploaded temp file by issuing DuckDB queries against `read_csv(...)` / `read_parquet(...)`. This is architecturally inconsistent with the LP and TLE import handlers (which parse rows in-process and push through `ArrowBuffer`), and it's fragile under the new DuckDB sandbox model from #442.

The immediate symptom — surfaced while building [arcctl PR #3](https://github.com/Basekick-Labs/arcctl/pull/3) — is that **CSV imports fail with a DuckDB parser error** before any data lands:

```
arc: failed to read file: query failed: Parser Error: syntax error at or near "("

LINE 1: SELECT column_name FROM (DESCRIBE read_csv('/Users/nacho/dev/basekick-labs/arc/.tmp/arc-uploads/arc-import-XXXX/import.csv', auto_detect=true, header=true))
                                                  ^
```

Came back as HTTP 422 from `POST /api/v1/import/csv` against a tiny 91-byte CSV with valid headers + 3 rows + every flag at defaults. Same failure against a 0-byte CSV (errors during introspection, before the empty-file check fires).

Same handler also fails for Parquet (untested in PR #3 but uses the identical `getColumns` → `read_parquet(...)` path).

## Architecture problem (the bigger fix)

Per @xe-nvdk's clarification on PR #3:

> *"You don't query the CSV, you query the data ingested through `/api/v1/query`. There is no CSV in arc when the data is ingested."*

That's the canonical contract: arc serves queries over **its own Parquet partitions**, not over arbitrary uploaded files. The current CSV/Parquet import handlers violate this by routing the file through DuckDB's `read_csv` / `read_parquet` directly:

```go
// internal/api/import.go (current)

readExpr := h.buildReadExpression(filePath, opts)  // → read_csv('/tmp/.../import.csv', ...)

// 1. schema introspection — direct DuckDB query against temp file
columns, err := h.getColumns(readExpr)              // SELECT column_name FROM (DESCRIBE read_csv(...))

// 3. stats — another direct DuckDB query against temp file
statsQuery := fmt.Sprintf("SELECT COUNT(*), MIN(%s)::VARCHAR, MAX(%s)::VARCHAR FROM %s", timeCast, timeCast, readExpr)
```

This works *only* because the upload directory is in the sandbox `allowed_directories` list (added in #442). It's still:

- **Inconsistent with LP/TLE** — both use `arrowBuffer.WriteColumnarRecord(...)` and never issue DuckDB queries against the temp file. LP parses in-process via `ingest.NewLineProtocolParser`; TLE has its own parser.
- **Fragile against DuckDB version drift** — the `SELECT column_name FROM (DESCRIBE ...)` SQL is rejected by the DuckDB version arc ships today (`duckdb_version=v1.5.1` per the smoke). `DESCRIBE` can't always be used as a derived table.
- **A new sandbox hole every time a temp dir moves** — every directory the import handler writes to has to be allowlisted explicitly. #442 kept the list intentionally minimal; this pattern keeps adding to it.

## Proposed fix

Mirror the LP/TLE handler pattern in CSV and Parquet:

1. **Parse rows in-process** (not via DuckDB).
   - CSV: use `encoding/csv` or an Arrow-aware reader. The header row provides column names directly — no `DESCRIBE` needed.
   - Parquet: use `github.com/apache/arrow-go/v18/parquet` (or whatever arc already uses for compaction). Schema is in the file metadata; no `DESCRIBE` needed.

2. **Push rows through `arrowBuffer.WriteColumnarRecord(...)`** — same code path as LP. This means:
   - Time-column validation moves to the row parser (fail-fast per row, not a separate up-front query).
   - Time-format casting happens in the parser, not via a DuckDB SQL expression.
   - Partition layout, hot/cold tier handling, replication, compaction triggers — all automatic via the ArrowBuffer pipeline.

3. **Post-ingest stats come from the query layer over the ingested partitions**, if the response needs them at all. The current `ImportResult` includes `time_range_min` / `time_range_max` and `partitions_created`; those can be computed by the ArrowBuffer (which already knows what it wrote) without any after-the-fact `SELECT` against `read_csv`.

4. **Drop the upload directory from the sandbox allowlist** once nothing in the import path issues DuckDB queries against `/tmp/arc-uploads/...`. Pure-Go parsing means the temp file is read by `os.Open` + `csv.Reader` (or the parquet equivalent) and deleted before any query runs.

## Out of scope

- LP and TLE imports — already use this pattern.
- **Patching `getColumns` SQL syntax in isolation** (e.g. swap `SELECT ... FROM (DESCRIBE ...)` for `pragma table_info(...)`) — would fix the symptom without fixing the architecture. Don't.
- Audit of every other DuckDB query that touches external files — separate issue; this one is about the import path specifically.

## Severity

**Medium.** Production CSV/Parquet imports are broken today against the current DuckDB version + sandbox config. Workaround: convert source data to line protocol and use `import lp`, which works. But that's a 1-format workaround for an API that ostensibly supports 3 formats.

## Test plan (for the eventual fix)

- [ ] CSV import against a 3-row file with explicit `--time-column` + `--time-format` produces the same `ImportResult` shape as today (rows_imported, partitions_created, time_range_min/max, columns, duration_ms).
- [ ] CSV import against a 0-byte file fails with `"file is empty"` (the same 400 LP returns today), not a parser error.
- [ ] CSV with `delimiter=';'` and `skip_rows=1` ingests correctly.
- [ ] Parquet import against a basic file with `time` column produces matching `ImportResult`.
- [ ] Imported data is queryable via `arcctl query` (or `POST /api/v1/query`) against the same database/measurement.
- [ ] Sandbox `allowed_directories` no longer contains the upload temp dir, and CSV/Parquet imports still succeed (proves the temp file isn't being read via DuckDB anymore).
- [ ] LP and TLE imports continue to work unchanged.

## Discovered while

Building `arcctl import csv` in https://github.com/Basekick-Labs/arcctl/pull/3. arcctl correctly surfaces the verbatim 422 from the server; this is purely an arc-side fix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

import(csv,parquet): replace DuckDB-on-temp-file introspection with in-process row streaming via ArrowBuffer #472

Summary

Architecture problem (the bigger fix)

Proposed fix

Out of scope

Severity

Test plan (for the eventual fix)

Discovered while

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

import(csv,parquet): replace DuckDB-on-temp-file introspection with in-process row streaming via ArrowBuffer #472

Description

Summary

Architecture problem (the bigger fix)

Proposed fix

Out of scope

Severity

Test plan (for the eventual fix)

Discovered while

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions