Merged

Dev #32

19 commits
a6e265f
Improve numeric comparisons and diagnostics
dawnmy Oct 7, 2025
424302c
Merge pull request #25 from dawnmy/codex/fix-type-inference-and-error…
dawnmy Oct 7, 2025
d00f7b7
Add branching and regex helpers to mutate expressions
dawnmy Oct 9, 2025
9934598
Merge pull request #26 from dawnmy/codex/implement-multi-branch-case_…
dawnmy Oct 9, 2025
1a24d73
Fix case_when branch parsing and add regression tests
dawnmy Oct 9, 2025
9e66e15
Merge branch 'dev' into codex/implement-multi-branch-case_when-and-mo…
dawnmy Oct 9, 2025
05dbbb8
Merge pull request #27 from dawnmy/codex/implement-multi-branch-case_…
dawnmy Oct 9, 2025
40e830c
Fix column selector heuristics in case_when
dawnmy Oct 9, 2025
ddb0cda
Merge branch 'dev' into codex/implement-multi-branch-case_when-and-mo…
dawnmy Oct 9, 2025
5f07d05
Merge pull request #28 from dawnmy/codex/implement-multi-branch-case_…
dawnmy Oct 9, 2025
5e02a09
Fix evaluate_truthy regression and add coverage
dawnmy Oct 9, 2025
9e04f88
Merge pull request #29 from dawnmy/codex/fix-multiple-definitions-and…
dawnmy Oct 9, 2025
e428b1e
Update expression.rs
dawnmy Oct 9, 2025
a11d26e
Add file metadata injection to cut command
dawnmy Oct 9, 2025
5b86f6b
Merge pull request #30 from dawnmy/codex/add-support-for-__file__-and…
dawnmy Oct 9, 2025
f48f32c
Allow cut to process multiple files
dawnmy Oct 9, 2025
49e9b9e
Merge branch 'dev' into codex/add-support-for-__file__-and-__base__-c…
dawnmy Oct 9, 2025
bd30214
Merge pull request #31 from dawnmy/codex/add-support-for-__file__-and…
dawnmy Oct 9, 2025
9186157
Update README.md
dawnmy Oct 15, 2025
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "tsvkit"
-version = "0.9.3"
+version = "0.9.5"
edition = "2024"

[dependencies]
35 changes: 33 additions & 2 deletions README.md
@@ -148,7 +148,7 @@ The same expression language powers `filter -e`, `mutate -e name=EXPR`, and rege
| `+ - * / ^` | Arithmetic operators (`^` is exponentiation, right-associative). | Numbers |
| `== != < <= > >=` | Comparisons. | Numbers or strings |
| `&` / `and` | Logical AND. | Booleans |
-| `|` / `or` | Logical OR. | Booleans |
+| `\|` / `or` | Logical OR. | Booleans |
| `!` / `not` | Logical negation. | Booleans |
| `~` | Regex match. Right-hand side can be literal text or a `$range`. | Strings |
| `!~` | Regex does *not* match. | Strings |
@@ -165,9 +165,27 @@ The same expression language powers `filter -e`, `mutate -e name=EXPR`, and rege
| `ln(expr)` | Natural logarithm |
| `log(expr)` / `log10(expr)` | Base-10 logarithm |
| `log2(expr)` | Base-2 logarithm |
| `len(expr)` | Character count using Unicode code points. |
| `is_na(expr)` | Returns `1` when the expression is blank/`NA`/`NaN`, otherwise `0`. |

Functions accept column references (`abs($purity - 1)`), constants, or subexpressions. Empty or non-numeric values yield blanks.
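The `len` and `is_na` rows above can be read as the following minimal Rust sketch. The function names `len_chars` and `is_na` are illustrative and not taken from the tsvkit source, and the whitespace trimming and case-insensitive comparison are assumptions the README does not confirm.

```rust
/// Counts characters as Unicode scalar values rather than bytes,
/// which is what "character count using Unicode code points" suggests.
fn len_chars(value: &str) -> usize {
    value.chars().count()
}

/// Returns 1 for blank, "NA", or "NaN" cells and 0 otherwise.
/// Assumption: surrounding whitespace is ignored and the comparison is
/// case-insensitive; the README does not state either behaviour.
fn is_na(value: &str) -> u8 {
    let trimmed = value.trim();
    let missing = trimmed.is_empty()
        || trimmed.eq_ignore_ascii_case("na")
        || trimmed.eq_ignore_ascii_case("nan");
    u8::from(missing)
}
```

Under these assumptions, `len_chars("caf\u{e9}")` is 4 even though the string occupies 5 bytes in UTF-8, and `is_na("NA")` is 1.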

**Conditional and regex helpers**

- `case_when(condition -> result, ..., _ -> default)` evaluates each boolean condition in order and returns the matching result. The final `_` branch acts as the default.
- `switch(value, [match1, match2] -> result, ..., _ -> default)` compares `value` to one or more literal matches (strings or numbers) and returns the corresponding result.
- `re(value, pattern)` evaluates a regex against `value`, returning `1` or `0`. When the pattern matches, capture groups become available as `$1`, `$2`, etc. for the remainder of the expression. Because `$1`, `$2`, etc. also serve as numeric column selectors, use numeric selectors sparingly in expressions that rely on captures.

Example:

```
case_when(
re($sample, "^ERR(\\d+)$") -> $1,
re($sample, "^SRR") -> "SRA",
_ -> $sample
)
```
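The first-match-wins behaviour of the example above can be sketched in plain Rust. This is an illustration of the semantics only, not the tsvkit evaluator: `match_err_accession` and `label` are hypothetical names, and the hand-written prefix check stands in for the regex (it accepts ASCII digits only).

```rust
/// Hypothetical stand-in for `re($sample, "^ERR(\\d+)$")`: returns the
/// digits (capture group 1) when the whole value is "ERR" plus digits.
fn match_err_accession(sample: &str) -> Option<&str> {
    let digits = sample.strip_prefix("ERR")?;
    if !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()) {
        Some(digits)
    } else {
        None
    }
}

/// Branches are tried top to bottom; the first condition that holds wins,
/// and the `_` branch is reached only when nothing else matched.
fn label(sample: &str) -> String {
    if let Some(digits) = match_err_accession(sample) {
        return digits.to_string(); // re(...) -> $1
    }
    if sample.starts_with("SRR") {
        return "SRA".to_string(); // re($sample, "^SRR") -> "SRA"
    }
    sample.to_string() // _ -> $sample
}
```

With this sketch, `label("ERR12345")` yields `"12345"`, `label("SRR999")` yields `"SRA"`, and any other value is returned unchanged.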

**Row-wise aggregation helpers**

Available within `mutate` expressions via functions such as `sum($col1:$col5)`; see the [Mutate](#mutate) section for the full list.
@@ -225,7 +243,7 @@ tsvkit filter -e '$group == "case" & $purity >= 0.94' examples/samples.tsv
| Literals | `1.25`, `"case"` | Strings use double quotes; escape inner quotes with `\"`. |
| Arithmetic | `($rna_ug - $dna_ug) / $rna_ug` | Standard precedence applies (parentheses for clarity). |
| Comparisons | `$purity >= 0.9`, `$group != "control"` | Works on numeric or string data. |
-| Logical | `($purity >= 0.9) & ($group == "case")` | `&`, `|`, and `!` (or `and`, `or`, `not`). |
+| Logical | `($purity >= 0.9) & ($group == "case")` | `&`, `\|`, and `!` (or `and`, `or`, `not`). |
| Numeric functions | `log2($total)`, `sqrt($reads)` | See [Expression language essentials](#expression-language-essentials). |
| Row-wise aggregators | `sum($dna_ug:$rna_ug)`, `mode($1,$3)`, `countunique($gene:)` | Same catalog as [`summarize`](#summarize): totals, quantiles (`q*` / `p*`), variance/SD, products, entropy, argmin/argmax, membership stats. Works with ranges, lists, and open selectors. |
| Regex match | `$tech ~ "sRNA"`, `$notes !~ "(?i)fail"` | Patterns follow Rust `regex` syntax. `(?i)` enables case-insensitive matching. |
@@ -260,6 +278,19 @@ tsvkit mutate \
examples/cytokines.tsv
```

Use `case_when`, `switch`, and the `re()` helper for richer branching logic and regex capture reuse:

```bash
tsvkit mutate \
-e 'label = case_when(
re($sample, "^ERR(\d+)$") -> $1,
re($sample, "^SRR") -> "SRA",
_ -> $sample
)' \
-e 'bucket = case_when(len($clean) == 0 -> "empty", len($clean) < 5 -> "short", _ -> "long")' \
examples/samples.tsv
```
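The command above exercises `case_when` and `re()` but not `switch`. As a rough model of the `switch(value, [match1, match2] -> result, ..., _ -> default)` form, the following Rust sketch maps lists of literal matches to results. `Arm` and `switch` are hypothetical names, only string literals are modelled (the README also allows numbers), and nothing here is taken from the tsvkit source.

```rust
/// One `switch` arm: a list of literal matches and the result they map to.
struct Arm<'a> {
    matches: &'a [&'a str],
    result: &'a str,
}

/// Returns the result of the first arm whose match list contains `value`,
/// or `default` (the `_` branch) when no arm matches.
fn switch<'a>(value: &str, arms: &[Arm<'a>], default: &'a str) -> &'a str {
    arms.iter()
        .find(|arm| arm.matches.iter().any(|m| *m == value))
        .map(|arm| arm.result)
        .unwrap_or(default)
}
```

For instance, with one arm matching `["case", "treated"]` and mapping to `"exp"`, the value `"treated"` selects `"exp"`, while an unlisted value falls through to the default.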

Apply in-place edits with the sed-style form:

```bash
62 changes: 61 additions & 1 deletion src/common.rs
@@ -7,12 +7,28 @@ use csv::ReaderBuilder;
use flate2::read::MultiGzDecoder;
use xz2::read::XzDecoder;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SpecialColumn {
FilePath,
FileBase,
}

impl SpecialColumn {
pub fn default_header(self) -> &'static str {
match self {
SpecialColumn::FilePath => "__file__",
SpecialColumn::FileBase => "__base__",
}
}
}

#[derive(Debug, Clone)]
pub enum ColumnSelector {
Index(usize),
FromEnd(usize),
Name(String),
Range(Option<Box<ColumnSelector>>, Option<Box<ColumnSelector>>),
Special(SpecialColumn),
}

pub fn parse_selector_list(spec: &str) -> Result<Vec<ColumnSelector>> {
@@ -86,6 +102,12 @@ pub fn resolve_selectors(
let mut indices = Vec::with_capacity(selectors.len());
for selector in selectors {
match selector {
ColumnSelector::Special(special) => {
bail!(
"special column '{}' cannot be resolved as a positional index",
special.default_header()
);
}
ColumnSelector::Index(_) | ColumnSelector::FromEnd(_) | ColumnSelector::Name(_) => {
let index = resolve_selector_index(headers, selector, no_header)?;
indices.push(index);
@@ -207,6 +229,21 @@ pub fn should_skip_record(
false
}

pub fn inconsistent_width_error(
source: &str,
row_number: usize,
expected: usize,
actual: usize,
) -> anyhow::Error {
anyhow!(
"rows in {} have inconsistent column counts at row {} (expected {}, got {})",
source,
row_number,
expected,
actual,
)
}

pub fn default_headers(len: usize) -> Vec<String> {
(1..=len).map(|i| format!("col{}", i)).collect()
}
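As a rough illustration of how the two helpers in this hunk might be used together, the sketch below checks that every row has as many fields as the first one and reports a message in the same shape as `inconsistent_width_error`. `check_widths` is a hypothetical helper, the error is a plain `String` rather than `anyhow::Error` to keep the example dependency-free, and the 1-based row numbering is an assumption.

```rust
/// Builds positional headers `col1..colN`, mirroring `default_headers` above.
fn default_headers(len: usize) -> Vec<String> {
    (1..=len).map(|i| format!("col{}", i)).collect()
}

/// Hypothetical helper: returns the common width of all rows, or an error
/// message for the first row whose field count differs from the first row.
/// Assumption: rows are numbered from 1.
fn check_widths(source: &str, rows: &[Vec<&str>]) -> Result<usize, String> {
    let expected = match rows.first() {
        Some(first) => first.len(),
        None => return Ok(0),
    };
    for (i, row) in rows.iter().enumerate() {
        if row.len() != expected {
            return Err(format!(
                "rows in {} have inconsistent column counts at row {} (expected {}, got {})",
                source,
                i + 1,
                expected,
                row.len()
            ));
        }
    }
    Ok(expected)
}
```

A caller working on headerless input could pass the returned width to `default_headers` to name the columns.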
@@ -251,6 +288,11 @@ fn parse_simple_selector(token: &str) -> Result<ColumnSelector> {
if let Some(literal) = parse_brace_literal(token)? {
return Ok(ColumnSelector::Name(literal));
}
match token {
"__file__" => return Ok(ColumnSelector::Special(SpecialColumn::FilePath)),
"__base__" => return Ok(ColumnSelector::Special(SpecialColumn::FileBase)),
_ => {}
}
if let Some(stripped) = token.strip_prefix('-') {
if stripped.is_empty() {
bail!("column selector '-' must include an index");
@@ -520,6 +562,10 @@ fn resolve_selector_index(
.with_context(|| format!("column '{}' not found", name))?;
Ok(index)
}
ColumnSelector::Special(special) => bail!(
"special column '{}' not supported without column injection",
special.default_header()
),
ColumnSelector::Range(_, _) => {
bail!("unexpected nested column range")
}
@@ -528,7 +574,10 @@ fn resolve_selector_index(

#[cfg(test)]
mod tests {
-use super::{ColumnSelector, parse_selector_list, parse_single_selector, resolve_selectors};
+use super::{
+ColumnSelector, SpecialColumn, parse_selector_list, parse_single_selector,
+resolve_selectors,
+};

#[test]
fn resolves_name_range() {
@@ -642,4 +691,15 @@ mod tests {
let err = parse_selector_list("`foo").unwrap_err();
assert!(err.to_string().contains("unterminated backtick"));
}

#[test]
fn distinguishes_injected_and_literal_file_columns() {
let selectors = parse_selector_list("__file__,{__file__},`__base__`").unwrap();
assert!(matches!(
selectors[0],
ColumnSelector::Special(SpecialColumn::FilePath)
));
assert!(matches!(selectors[1], ColumnSelector::Name(ref name) if name == "__file__"));
assert!(matches!(selectors[2], ColumnSelector::Name(ref name) if name == "__base__"));
}
}
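The test above relies on quoted forms being recognised before the special tokens, so that `{__file__}` and `` `__base__` `` stay ordinary column names. A simplified, self-contained sketch of that ordering follows. `Selector` and `parse_token` are hypothetical reductions of `ColumnSelector` and `parse_simple_selector`, and the quoting rules here are a guess at the minimum needed, not the real parser (which also handles indices, ranges, and unterminated quotes).

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SpecialColumn {
    FilePath,
    FileBase,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Selector {
    Name(String),
    Special(SpecialColumn),
}

/// Hypothetical reduced parser: quoted tokens are checked first and always
/// yield a literal name; only bare `__file__` / `__base__` become special.
fn parse_token(token: &str) -> Selector {
    let quoted = token
        .strip_prefix('{')
        .and_then(|t| t.strip_suffix('}'))
        .or_else(|| token.strip_prefix('`').and_then(|t| t.strip_suffix('`')));
    if let Some(literal) = quoted {
        return Selector::Name(literal.to_string());
    }
    match token {
        "__file__" => Selector::Special(SpecialColumn::FilePath),
        "__base__" => Selector::Special(SpecialColumn::FileBase),
        other => Selector::Name(other.to_string()),
    }
}
```

Checking the quoted forms first is what lets an input file that already has a column literally named `__file__` remain addressable.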