org-dex-parse

Extract structured data from org-mode files. Point it at an .org file, get back Python objects — titles, timestamps, links, tags, clock entries, properties, and more — ready to query, store, or pipe into whatever you’re building.

Built for org-dex, usable standalone. Uses orgparse as the parsing backend.

Try it

From the command line

pip install org-dex-parse   # Python >= 3.11

Use one of your own org files, or create a test file:

* TODO Write report                                       :work:
  DEADLINE: <2026-04-01>
  :PROPERTIES:
  :ID:  abc-001
  :END:
** Notes
   Some references: [[id:other][see also]].
* DONE Review draft
  CLOSED: [2026-03-15 Sun 10:00]
  :PROPERTIES:
  :ID:  abc-002
  :END:

python -m org_dex_parse example.org

Output:

example.org: 2 items
  Write report
    id=abc-001  level=1  line=1
    todo=TODO
    local_tags={'work'}
  Review draft
    id=abc-002  level=1  line=9
    todo=DONE

Add -v to include body text, --json for machine-readable output.

A ready-made config file is included for common setups:

python -m org_dex_parse --config examples/config.json example.org

It covers TODO keywords, drawer filtering, and item selection rules for a typical org-mode setup. Copy it and adjust to your needs — the fields are documented in Configuration.

From Python

from org_dex_parse import parse_file, Config

result = parse_file("notes.org", Config())

for item in result.items:
    print(f"{item.todo or ''} {item.title}")
    print(f"  id={item.item_id}  tags={item.local_tags}")
    if item.deadline:
        print(f"  deadline={item.deadline.date}")
    if item.links:
        print(f"  links={len(item.links)}")

Each Item in result.items is a heading with :ID: that passed the configured predicate — see Key concepts.

Key concepts

The parser distinguishes two kinds of headings:

Items — headings with :ID: that pass the predicate. Each produces a 24-field structured object.
Scaffolding — everything else. Organizational headings whose content (body, links, timestamps, clock) rolls up into the nearest ancestor item. Nothing is lost — scaffolding content is collected, not discarded.

You control what counts as an item through a predicate. The default accepts every heading with :ID:. You can narrow it — for example, require a :Type: property, or exclude headings with ROAM_EXCLUDE.

                 org file
                    |
                 orgparse
               (syntax tree)
                    |
              org-dex-parse
             (semantic layer)
                    |
               Item stream
             (24-field frozen
              dataclasses)
                    |
    +---------------+---------------+
    v               v               v
org-dex      custom indexers  data pipelines
(DB + UI)  (knowledge graphs)  (analytics)

orgparse handles the org-mode grammar. org-dex-parse handles item discrimination, field extraction, and content filtering.
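
To make the item/scaffolding split concrete, here is a minimal standalone sketch of the roll-up rule. It is not the library's implementation: the `Heading` tuple and the `owner_of` function are hypothetical names, and it assumes the default predicate (any heading with an ID is an item). It walks the outline once and keeps a stack of open ancestors, so each scaffolding heading is assigned to its nearest ancestor item.

```python
from typing import NamedTuple, Optional


class Heading(NamedTuple):
    level: int
    title: str
    item_id: Optional[str]  # None when the heading has no :ID:


def owner_of(headings):
    """Map each heading title to the ID of the item that owns its content.

    Items own themselves; scaffolding is owned by the nearest ancestor
    item (None if there is none).
    """
    owners = {}
    stack = []  # (level, owner_id) of the currently open ancestors
    for h in headings:
        # Close every heading that is not an ancestor of h.
        while stack and stack[-1][0] >= h.level:
            stack.pop()
        inherited = stack[-1][1] if stack else None
        owner = h.item_id if h.item_id is not None else inherited
        owners[h.title] = owner
        stack.append((h.level, owner))
    return owners


# The outline used in Example 1, under the default predicate.
outline = [
    Heading(1, "Project", None),
    Heading(2, "Write report", "a1b2c3"),
    Heading(3, "Notes", None),
    Heading(2, "Review draft", "d4e5f6"),
    Heading(2, "Background reading", None),
]
owners = owner_of(outline)
```

In this sketch "Notes" is owned by "Write report", while "Project" and "Background reading" have no owning item.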

Installation

pip install org-dex-parse

Requires Python >= 3.11. Single dependency: orgparse>=0.4,<0.5.

Examples

The examples below show the parser on increasingly complex org files. Each starts with the org source, then shows the Python code and what each field contains.

Example 1: default predicate — items and scaffolding

With Config() (default), every heading with :ID: is an item. Headings without :ID: are scaffolding — their content rolls up into the nearest ancestor item.

* Project
** TODO Write report                                       :work:
   DEADLINE: <2026-04-01>
   :PROPERTIES:
   :ID:       a1b2c3
   :END:
*** Notes
    Some text with [[id:ref][a link]].
    Meeting on <2026-03-20 Fri>.
** DONE Review draft
   CLOSED: [2026-03-15 Sun 10:00]
   :PROPERTIES:
   :ID:       d4e5f6
   :END:
** Background reading
   No :ID: here — just an organizational heading.

config = Config(
    todos=("TODO",),
    dones=("DONE",),
)
result = parse_file("project.org", config)
# result.items → 2 items

Heading	`:ID:`?	Item?	Why
Project	no	no	No `:ID:` → scaffolding
Write report	yes	yes	Has `:ID:`
Notes	no	no	No `:ID:` → scaffolding of above
Review draft	yes	yes	Has `:ID:`
Background reading	no	no	No `:ID:` → scaffolding

“Notes” is scaffolding under “Write report”. Its body text, the link [[id:ref][a link]], and the timestamp <2026-03-20> all become part of the “Write report” item:

item = result.items[0]  # Write report
item.title           # "Write report"
item.todo            # "TODO"
item.local_tags      # frozenset({"work"})
item.deadline.date   # datetime.date(2026, 4, 1)
item.active_ts[0].date  # datetime.date(2026, 3, 20)  ← from "Notes"
item.links[0].target    # "id:ref"                    ← from "Notes"
item.body            # "Notes\nSome text with a link.\nMeeting on ..."

Example 2: `:Type:` predicate — narrower item definition

With Config(item_predicate=["property", "Type"]), a heading must have both :ID: and a :Type: property to be an item:

* Inbox
  :PROPERTIES:
  :ID:       aaa-111
  :Type:     area
  :END:
** TODO Buy groceries
   SCHEDULED: <2026-03-17 Tue>
   :PROPERTIES:
   :ID:       bbb-222
   :Type:     task
   :END:
** Grocery list
   :PROPERTIES:
   :ID:       ccc-333
   :END:
   - Milk
   - Bread

config = Config(
    item_predicate=["property", "Type"],
    todos=("TODO",),
    dones=("DONE",),
)
result = parse_file("inbox.org", config)
# result.items → 2 items (Inbox, Buy groceries)
# "Grocery list" has :ID: but no :Type: → scaffolding

Heading	`:ID:`?	`:Type:`?	Item?	Why
Inbox	yes	`area`	yes	Has `:ID:` + `:Type:`
Buy groceries	yes	`task`	yes	Has `:ID:` + `:Type:`
Grocery list	yes	—	no	Has `:ID:` but no `:Type:` → scaffolding

“Grocery list” is scaffolding — but it’s at level 2, a sibling of “Buy groceries”, not its child. Both are children of “Inbox”. So “Grocery list” content rolls up to Inbox, not “Buy groceries”:

inbox = result.items[0]  # Inbox
inbox.body              # "Grocery list\n- Milk\n- Bread"

item = result.items[1]  # Buy groceries
item.scheduled.date     # datetime.date(2026, 3, 17)
item.properties         # (("Type", "task"),)
item.parent_item_id     # "aaa-111"  ← Inbox is the parent item
item.body               # None — no scaffolding under this item

Example 3: org-roam style — exclude archived nodes

org-roam users typically want every :ID: heading except those marked with ROAM_EXCLUDE. The not operator handles this:

* Main topic
  :PROPERTIES:
  :ID:       roam-001
  :END:
  This is a permanent note.
  See also [[https://example.com/reference][Reference paper]].
** Supporting argument
   :PROPERTIES:
   :ID:       roam-002
   :END:
   Evidence from [[id:roam-005][another note]].
** COMMENT Draft section
   :PROPERTIES:
   :ID:       roam-003
   :ROAM_EXCLUDE: t
   :END:
   Work in progress — not ready for the graph.

config = Config(
    item_predicate=["not", ["property", "ROAM_EXCLUDE"]],
)
result = parse_file("roam-note.org", config)
# result.items → 2 items (Main topic, Supporting argument)
# "Draft section" is excluded by the predicate

Heading	`:ID:`?	`ROAM_EXCLUDE`?	Item?	Why
Main topic	yes	no	yes	`:ID:` + not excluded
Supporting argument	yes	no	yes	`:ID:` + not excluded
Draft section	yes	`t`	no	`ROAM_EXCLUDE` → scaffolding

item = result.items[0]  # Main topic
item.links[0].target       # "https://example.com/reference"
item.links[0].description  # "Reference paper"
item.body
# "This is a permanent note.\n"
# "See also Reference paper.\n"
# "COMMENT Draft section\n"           ← scaffolding heading
# "Work in progress — not ready ..."  ← scaffolding body

Example 4: LOGBOOK data — clock entries and state changes

Clock entries and state changes are extracted from the :LOGBOOK: drawer. They are collected from the item and its scaffolding children.

* TODO Deep work session                                   :focus:
  SCHEDULED: <2026-03-17 Tue 09:00>
  :PROPERTIES:
  :ID:       clock-001
  :END:
  :LOGBOOK:
  CLOCK: [2026-03-16 Mon 14:00]--[2026-03-16 Mon 15:30] =>  1:30
  CLOCK: [2026-03-16 Mon 10:00]--[2026-03-16 Mon 11:45] =>  1:45
  - State "TODO"       from "PLANNING"  [2026-03-15 Sun 09:00]
  - State "PLANNING"   from              [2026-03-14 Sat 18:00]
  :END:
  Focus on the analysis section.

config = Config(
    todos=("PLANNING", "TODO"),
    dones=("DONE",),
)
result = parse_file("work.org", config)
item = result.items[0]

# Clock entries (collected from :LOGBOOK:)
len(item.clock)                  # 2
item.clock[0].start              # datetime(2026, 3, 16, 10, 0)
item.clock[0].end                # datetime(2026, 3, 16, 11, 45)
item.clock[0].duration_minutes   # 105
item.clock[1].start              # datetime(2026, 3, 16, 14, 0)
item.clock[1].duration_minutes   # 90

# State changes (chronological order)
len(item.state_changes)               # 2
item.state_changes[0].to_state        # "PLANNING"
item.state_changes[0].from_state      # None  ← first assignment
item.state_changes[1].to_state        # "TODO"
item.state_changes[1].from_state      # "PLANNING"

# Body excludes LOGBOOK content
item.body   # "Focus on the analysis section."
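
For readers who want to see what LOGBOOK extraction involves, the following standalone sketch parses the CLOCK and state-change lines from the example with regular expressions. It is an illustration only, not the parser's actual code: `parse_clock`, `parse_state_change`, and the regexes are hypothetical, and results are plain tuples rather than the library's ClockEntry and StateChange types.

```python
import re
from datetime import datetime

STAMP = r"\[(\d{4}-\d{2}-\d{2}) \w+ (\d{2}:\d{2})\]"
CLOCK_RE = re.compile(rf"CLOCK:\s*{STAMP}(?:--{STAMP})?")
STATE_RE = re.compile(rf'- State "(\w+)"\s+from\s+(?:"(\w+)")?\s*{STAMP}')


def _dt(day, time):
    return datetime.strptime(f"{day} {time}", "%Y-%m-%d %H:%M")


def parse_clock(line):
    """Return (start, end, minutes); end and minutes are None if still running."""
    m = CLOCK_RE.search(line)
    if m is None:
        return None
    start = _dt(m.group(1), m.group(2))
    if m.group(3) is None:
        return start, None, None
    end = _dt(m.group(3), m.group(4))
    return start, end, int((end - start).total_seconds() // 60)


def parse_state_change(line):
    """Return (to_state, from_state, timestamp); from_state may be None."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2), _dt(m.group(3), m.group(4))


logbook = [
    'CLOCK: [2026-03-16 Mon 14:00]--[2026-03-16 Mon 15:30] =>  1:30',
    'CLOCK: [2026-03-16 Mon 10:00]--[2026-03-16 Mon 11:45] =>  1:45',
    '- State "TODO"       from "PLANNING"  [2026-03-15 Sun 09:00]',
    '- State "PLANNING"   from              [2026-03-14 Sat 18:00]',
]
# Sort both lists chronologically, oldest first.
clocks = sorted(c for c in map(parse_clock, logbook) if c)
changes = sorted(
    (s for s in map(parse_state_change, logbook) if s),
    key=lambda s: s[2],
)
```

The duration is recomputed from the two endpoints rather than read from the `=>  1:30` suffix, which is one possible design choice; the library may do it differently.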

Example 5: timestamps — dedicated vs generic

The parser distinguishes dedicated timestamps (SCHEDULED, DEADLINE, CLOSED, created, archived) from generic timestamps found in the body text. Each has its own field — no double-counting.

* DONE Submit paper
  SCHEDULED: <2026-03-01 Sun> DEADLINE: <2026-03-10 Tue> CLOSED: [2026-03-09 Mon 23:55]
  :PROPERTIES:
  :ID:       ts-001
  :CREATED:  [2026-01-10 Sat]
  :ARCHIVE_TIME: 2026-03-15 Sun 12:00
  :END:
  Submitted before the deadline.
  Conference is <2026-06-15 Mon>--<2026-06-18 Thu>.
  Received confirmation on [2026-03-10 Tue].

config = Config(dones=("DONE",))
result = parse_file("paper.org", config)
item = result.items[0]

# Dedicated timestamps — from planning line and properties
item.scheduled.date       # datetime.date(2026, 3, 1)
item.scheduled.active     # True   (angle brackets)
item.deadline.date        # datetime.date(2026, 3, 10)
item.closed.date          # datetime.datetime(2026, 3, 9, 23, 55)
item.closed.active        # False  (square brackets)
item.created.date         # datetime.date(2026, 1, 10)
item.archived.date        # datetime.datetime(2026, 3, 15, 12, 0)

# Generic timestamps — from body text only (no overlap with above)
len(item.active_ts)       # 0  ← the range endpoints are NOT here
len(item.inactive_ts)     # 1  ← [2026-03-10 Tue]
len(item.range_ts)        # 1  ← the conference range
item.range_ts[0].start.date  # datetime.date(2026, 6, 15)
item.range_ts[0].end.date    # datetime.date(2026, 6, 18)
item.range_ts[0].active      # True
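
The "no double-counting" rule for ranges can be sketched in a few lines of standalone Python. This is a simplified illustration under stated assumptions, not the parser's code: `classify` and the regexes are hypothetical, only active ranges are recognized, and repeaters and times of day are ignored.

```python
import re
from datetime import date

TS = r"(\d{4})-(\d{2})-(\d{2})(?: \w+)?(?: \d{2}:\d{2})?"
RANGE_RE = re.compile(rf"<{TS}>--<{TS}>")
ACTIVE_RE = re.compile(rf"<{TS}>")
INACTIVE_RE = re.compile(rf"\[{TS}\]")


def _dates(groups):
    """Turn a flat tuple of (year, month, day, ...) strings into dates."""
    nums = [int(g) for g in groups]
    return [date(*nums[i:i + 3]) for i in range(0, len(nums), 3)]


def classify(text):
    """Return (active, inactive, ranges) found in body text."""
    ranges = [tuple(_dates(m.groups())) for m in RANGE_RE.finditer(text)]
    # Blank out ranges first so their endpoints are not counted again.
    rest = RANGE_RE.sub(" ", text)
    active = [_dates(m.groups())[0] for m in ACTIVE_RE.finditer(rest)]
    inactive = [_dates(m.groups())[0] for m in INACTIVE_RE.finditer(rest)]
    return active, inactive, ranges


body = (
    "Submitted before the deadline.\n"
    "Conference is <2026-06-15 Mon>--<2026-06-18 Thu>.\n"
    "Received confirmation on [2026-03-10 Tue].\n"
)
active, inactive, ranges = classify(body)
```

Matching ranges before single timestamps is what keeps the two conference dates out of the active list.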

Scaffolding planning lines become generic timestamps. The rule above (dedicated fields, no double-counting) applies only to the item’s own planning line. When a scaffolding heading has SCHEDULED, DEADLINE, or CLOSED, those timestamps have no dedicated destination — they are promoted to generic timestamps (active_ts / inactive_ts) so they are not lost.

* TODO Project plan
  DEADLINE: <2026-04-01>
  :PROPERTIES:
  :ID:       plan-001
  :END:
** Phase 1
   SCHEDULED: <2026-03-15 Sun>
   Define requirements.
** Phase 2
   DEADLINE: <2026-03-25 Wed>
   Build prototype.

config = Config(todos=("TODO",), dones=("DONE",))
result = parse_file("plan.org", config)
item = result.items[0]  # Project plan

# Item's own planning → dedicated field
item.deadline.date        # datetime.date(2026, 4, 1)

# Scaffolding planning → promoted to generic timestamps
# Phase 1's SCHEDULED and Phase 2's DEADLINE have no dedicated
# field on the parent item, so they become active_ts.
len(item.active_ts)       # 2
item.active_ts[0].date    # datetime.date(2026, 3, 15)  ← Phase 1 SCHEDULED
item.active_ts[1].date    # datetime.date(2026, 3, 25)  ← Phase 2 DEADLINE

Example 6: tags, properties, and inheritance

Tags on a heading are local_tags. Tags from ancestors are inherited_tags (minus any tags in tags_exclude_from_inheritance). Properties come from the direct :PROPERTIES: drawer only — never from children.

#+FILETAGS: :project:

* Research                                                :science:
  :PROPERTIES:
  :ID:       tag-001
  :Type:     area
  :Effort:   3:00
  :END:
** Literature review                                       :reading:
   :PROPERTIES:
   :ID:       tag-002
   :Type:     task
   :END:

config = Config(
    item_predicate=["property", "Type"],
    tags_exclude_from_inheritance=frozenset({"noexport"}),
)
result = parse_file("research.org", config)

parent = result.items[0]  # Research
parent.local_tags       # frozenset({"science"})
parent.inherited_tags   # frozenset({"project"})  ← from FILETAGS
parent.properties       # (("Type", "area"), ("Effort", "180"))

child = result.items[1]  # Literature review
child.local_tags        # frozenset({"reading"})
child.inherited_tags    # frozenset({"project", "science"})
child.parent_item_id    # "tag-001"
child.properties        # (("Type", "task"),)
# Effort is NOT here — properties are per-heading, not inherited
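
The inheritance rule stated above (ancestor and file tags, minus the excluded set) can be written as a small set computation. The sketch below is standalone and hypothetical; `inherited_tags` is not a function of the library, and it assumes the tags of each ancestor are already known.

```python
def inherited_tags(file_tags, ancestor_tags, exclude=frozenset()):
    """Union of file tags and ancestors' local tags, minus excluded tags."""
    inherited = set(file_tags)
    for tags in ancestor_tags:
        inherited.update(tags)
    return frozenset(inherited - set(exclude))


file_tags = {"project"}      # from #+FILETAGS
research_tags = {"science"}  # local tags of "Research"
exclude = {"noexport"}

# "Research" has no ancestors, so it only inherits the file tags.
research_inherited = inherited_tags(file_tags, [], exclude)
# "Literature review" inherits from the file and from "Research".
review_inherited = inherited_tags(file_tags, [research_tags], exclude)
# A tag listed in the exclude set never propagates to children.
filtered = inherited_tags(file_tags, [{"science", "noexport"}], exclude)
```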

Example 7: links — org-mode and bare URLs

Links are extracted from the complete raw_text of the item (including scaffolding children and content inside excluded drawers). Two kinds are captured:

Org-mode links: any [[target]] or [[target][description]], regardless of scheme (id:, https://, file:, ./image.png, fuzzy, etc.). The target is stored raw; the consumer extracts the scheme if needed.
Bare URLs: http:// and https:// URLs outside of [[...]].

* Reference collection
  :PROPERTIES:
  :ID:       link-001
  :END:
  Key paper: [[https://arxiv.org/abs/2301.00001][Attention is all you need]].
  Related note: [[id:abc-123][Transformer architecture]].
  Blog post: https://example.com/transformers
  :SEE_ALSO:
  [[id:def-456][History of neural networks]]
  :END:

config = Config(
    exclude_drawers=frozenset({"see_also"}),
)
result = parse_file("refs.org", config)
item = result.items[0]

len(item.links)  # 4

item.links[0].target       # "https://arxiv.org/abs/2301.00001"
item.links[0].description  # "Attention is all you need"

item.links[1].target       # "id:abc-123"
item.links[1].description  # "Transformer architecture"

item.links[2].target       # "https://example.com/transformers"
item.links[2].description  # None  ← bare URL, no description

item.links[3].target       # "id:def-456"
item.links[3].description  # "History of neural networks"
# ↑ extracted from :SEE_ALSO: — links survive drawer exclusion

# Body EXCLUDES :SEE_ALSO: content
item.body
# "Key paper: Attention is all you need.\n"
# "Related note: Transformer architecture.\n"
# "Blog post: https://example.com/transformers"
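
As a rough picture of how the two link kinds can be told apart, here is a standalone regex sketch. It is an assumption-laden illustration rather than the library's extractor: the local `Link` tuple, `extract_links`, and both patterns are hypothetical and deliberately simple (for example, trailing punctuation after a bare URL is not trimmed).

```python
import re
from typing import NamedTuple, Optional


class Link(NamedTuple):
    target: str
    description: Optional[str]


ORG_LINK = re.compile(r"\[\[([^\[\]]+)\](?:\[([^\[\]]+)\])?\]")
BARE_URL = re.compile(r"https?://[^\s\[\]<>]+")


def extract_links(text):
    """Return org-mode links and bare URLs in order of appearance."""
    found = []  # (position, Link)
    spans = []  # character spans already covered by [[...]] links
    for m in ORG_LINK.finditer(text):
        spans.append(m.span())
        found.append((m.start(), Link(m.group(1), m.group(2))))
    for m in BARE_URL.finditer(text):
        # Skip URLs that are the target of an org-mode link.
        if not any(start <= m.start() < end for start, end in spans):
            found.append((m.start(), Link(m.group(0), None)))
    return [link for _, link in sorted(found, key=lambda pair: pair[0])]


raw_text = """\
Key paper: [[https://arxiv.org/abs/2301.00001][Attention is all you need]].
Related note: [[id:abc-123][Transformer architecture]].
Blog post: https://example.com/transformers
:SEE_ALSO:
[[id:def-456][History of neural networks]]
:END:
"""
links = extract_links(raw_text)
```

Because the sketch runs on the raw text, the link inside :SEE_ALSO: is found regardless of any drawer filtering applied to the body.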

Example 8: body and raw_text — what’s included, what’s filtered

body is the filtered text meant for display. raw_text is the complete unfiltered org-mode source. Both include scaffolding children.

* TODO Prepare presentation                                :work:
  DEADLINE: <2026-04-01>
  :PROPERTIES:
  :ID:       body-001
  :Type:     task
  :END:
  :LOGBOOK:
  - State "TODO" from "PLANNING" [2026-03-15 Sun 09:00]
  :END:
  First draft of the slides.
  See [[id:ref-001][design document]].
** Outline
   - Introduction (5 min)
   - Main argument (15 min)
   - Q&A (10 min)

config = Config(
    item_predicate=["property", "Type"],
    todos=("PLANNING", "TODO"),
    dones=("DONE",),
)
result = parse_file("pres.org", config)
item = result.items[0]

# body: filtered, human-readable
# - PROPERTIES drawer: excluded (orgparse strips it from body)
# - LOGBOOK drawer: excluded (always, hardcoded)
# - "Outline" heading: INCLUDED (scaffolding heading text)
# - Link syntax resolved to description text
item.body
# "First draft of the slides.\n"
# "See design document.\n"
# "Outline\n"
# "- Introduction (5 min)\n"
# "- Main argument (15 min)\n"
# "- Q&A (10 min)"

# raw_text: complete unfiltered org source
# Includes PROPERTIES, LOGBOOK, link syntax, everything.
# Does NOT include content from other items.
"LOGBOOK" in item.raw_text       # True
":ID:" in item.raw_text          # True
"[[id:ref-001]" in item.raw_text # True  ← raw link syntax preserved

Configuration

Config controls what the parser considers an item and how it extracts data. All fields have sensible defaults — the minimal config is Config() (any heading with :ID: is an item).

from org_dex_parse import Config

config = Config(
    # Which headings with :ID: are items (default: all of them)
    item_predicate=["property", "Type"],

    # TODO keywords for your org-mode setup
    todos=("TODO", "NEXT", "DOING"),
    dones=("DONE", "CANCELED"),

    # Tags that don't propagate to children
    # (matches org-tags-exclude-from-inheritance)
    tags_exclude_from_inheritance=frozenset({"noexport", "pin"}),

    # Drawers excluded from body text (not from links)
    exclude_drawers=frozenset({"logbook", "see_also"}),

    # Source blocks excluded from body text
    exclude_blocks=frozenset({"comment"}),

    # Properties omitted from Item.properties
    exclude_properties=frozenset({"archive_file"}),

    # Property name for creation date (default "CREATED")
    created_property="CREATED",

    # Extra characters allowed in tag names (default: none)
    # Standard org-mode: [a-zA-Z0-9_@]
    extra_tag_chars="%#",
)

Item predicate

The predicate determines which :ID: headings become items. Three forms are accepted:

Form	Example	Use case
`None`	`Config()`	All headings with `:ID:`
`list`	`Config(item_predicate=["property", "Type"])`	JSON-serializable (recommended)
`callable`	`Config(item_predicate=lambda h: ...)`	Python-only

The list form uses s-expressions (JSON arrays) with these operators:

Operator	Example	Meaning
`property`	`["property", "Type"]`	Has property `Type`
`not`	`["not", ["property", "ARCHIVE_TIME"]]`	Negation
`and`	`["and", ["property", "Type"], ["not", ["property", "ARCHIVE_TIME"]]]`	All must match (short-circuit)
`or`	`["or", expr1, expr2]`	Any must match (short-circuit)

The list form is the recommended interface — it is serializable (JSON-RPC, config files, CLI) and covers the common cases. The callable form exists for backward compatibility and advanced use.
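
To show how the four operators compose, here is a minimal standalone evaluator for the list form. It is a sketch, not the library's evaluator: `matches` is a hypothetical name, the heading is reduced to a plain dict of properties, and exact-case property names are assumed.

```python
import json


def matches(expr, properties):
    """Evaluate a list-form predicate against a heading's property dict."""
    op, *args = expr
    if op == "property":
        return args[0] in properties
    if op == "not":
        return not matches(args[0], properties)
    if op == "and":
        # all() stops at the first False, any() at the first True.
        return all(matches(a, properties) for a in args)
    if op == "or":
        return any(matches(a, properties) for a in args)
    raise ValueError(f"unknown operator: {op!r}")


# The list form survives a JSON round trip unchanged.
expr = json.loads('["and", ["property", "Type"], ["not", ["property", "ARCHIVE_TIME"]]]')

active_task = {"ID": "bbb-222", "Type": "task"}
archived_task = {"ID": "ccc-333", "Type": "task", "ARCHIVE_TIME": "2026-03-15 Sun 12:00"}
untyped = {"ID": "ddd-444"}

results = [matches(expr, p) for p in (active_task, archived_task, untyped)]
```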

“TODO” and “DONE” keywords

The parser needs to know your TODO keywords to parse headings correctly. If you use custom keywords, pass them in Config:

config = Config(
    todos=("TODO", "NEXT", "WAITING"),
    dones=("DONE", "CANCELED"),
)

Without this, headings like ** NEXT Write report will have item.todo = None and "NEXT" will be part of item.title.
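
The effect of an undeclared keyword is easy to reproduce outside the parser. The following standalone sketch uses a hypothetical `split_heading` helper (not part of the library or of orgparse) that treats the first word as a keyword only when it is in the declared set.

```python
def split_heading(text, keywords):
    """Split heading text into (todo, title) given the declared keywords."""
    first, _, rest = text.partition(" ")
    if first in keywords:
        return first, rest.strip()
    return None, text


declared = ("TODO", "NEXT", "WAITING", "DONE", "CANCELED")
with_keywords = split_heading("NEXT Write report", declared)
without_keywords = split_heading("NEXT Write report", ("TODO", "DONE"))
```

With the keyword declared, "NEXT" is separated from the title; without it, the whole string stays in the title.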

Drawer and block exclusion

exclude_drawers and exclude_blocks control what is excluded from Item.body. They do not affect link extraction — links are extracted from the complete raw text, so links inside excluded drawers are still captured.

The :LOGBOOK: drawer is always excluded from body and from generic timestamp extraction. Its contents are parsed by dedicated handlers (Item.clock, Item.state_changes).
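
Drawer exclusion by name can be pictured with the standalone sketch below. It is illustrative only: `strip_drawers` and its regexes are hypothetical, and case-insensitive name matching is an assumption inferred from Example 7, where "see_also" in the config matches the :SEE_ALSO: drawer.

```python
import re

DRAWER_START = re.compile(r"^\s*:([A-Za-z0-9_-]+):\s*$")
DRAWER_END = re.compile(r"^\s*:END:\s*$", re.IGNORECASE)


def strip_drawers(text, exclude):
    """Drop the lines of every drawer whose lowercased name is in `exclude`."""
    kept = []
    skipping = False
    for line in text.splitlines():
        if skipping:
            # Swallow lines until the drawer closes.
            if DRAWER_END.match(line):
                skipping = False
            continue
        m = DRAWER_START.match(line)
        if m and m.group(1).lower() in exclude:
            skipping = True
            continue
        kept.append(line)
    return "\n".join(kept)


text = """\
Blog post: https://example.com/transformers
:SEE_ALSO:
[[id:def-456][History of neural networks]]
:END:
Closing remark."""
body = strip_drawers(text, {"logbook", "see_also"})
```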

Item fields

Each Item is a frozen (immutable) dataclass with 24 fields:

Field	Type	Description
`title`	`str`	Heading text (without TODO/priority/tags)
`item_id`	`str`	Value of `:ID:` property
`level`	`int`	Heading level (1, 2, 3…)
`linenumber`	`int`	Source file line number
`file_path`	`str`	Path to the org file
`todo`	`str | None`	TODO keyword (`None` if absent)
`priority`	`str | None`	Priority letter (`None` if absent)
`local_tags`	`frozenset[str]`	Tags on this heading
`inherited_tags`	`frozenset[str]`	Tags from ancestor headings
`parent_item_id`	`str | None`	`:ID:` of nearest item ancestor
`scheduled`	`Timestamp | None`	`SCHEDULED` planning timestamp
`deadline`	`Timestamp | None`	`DEADLINE` planning timestamp
`closed`	`Timestamp | None`	`CLOSED` planning timestamp
`created`	`Timestamp | None`	Creation date (from configured property)
`archived`	`Timestamp | None`	Archive date (from `ARCHIVE_TIME` property)
`active_ts`	`tuple[Timestamp, ...]`	Generic active timestamps from body
`inactive_ts`	`tuple[Timestamp, ...]`	Generic inactive timestamps from body
`range_ts`	`tuple[Range, ...]`	Date ranges from body
`clock`	`tuple[ClockEntry, ...]`	CLOCK entries from `:LOGBOOK:`
`state_changes`	`tuple[StateChange, ...]`	State transitions from `:LOGBOOK:`
`body`	`str | None`	Body text (filtered, `None` if empty)
`raw_text`	`str`	Complete unfiltered source text
`links`	`tuple[Link, ...]`	All links (org-mode + bare URLs)
`properties`	`tuple[tuple[str, str], ...]`	Properties (excluding `ID`, `ARCHIVE_TIME`, created)

Supporting types

Timestamp(date, active, repeater)
#   date: datetime.date | datetime.datetime
#   active: bool            # <...> = True, [...] = False
#   repeater: str | None    # e.g. "+1w"

Link(target, description)
#   target: str             # raw, e.g. "id:abc", "https://...", "Heading"
#   description: str | None

Range(start, end, active)
#   start: Timestamp
#   end: Timestamp
#   active: bool

ClockEntry(start, end, duration_minutes)
#   start: datetime.datetime
#   end: datetime.datetime | None      # None for running clocks
#   duration_minutes: int | None       # None for running clocks

StateChange(to_state, from_state, timestamp)
#   to_state: str                      # e.g. "DONE"
#   from_state: str | None             # e.g. "TODO", None for first
#   timestamp: datetime.datetime

CLI reference

All Config fields are available as CLI flags. Run python -m org_dex_parse --help for the full list.

# Default: any heading with :ID: is an item
python -m org_dex_parse file.org

# With a predicate
python -m org_dex_parse --predicate '["property", "Type"]' file.org

# With TODO keywords
python -m org_dex_parse --todos TODO,NEXT,DOING --dones DONE,CANCELED file.org

# From a config file (all fields optional)
python -m org_dex_parse --config myconfig.json file.org

# JSON output
python -m org_dex_parse --json file.org

# Verbosity: -v adds body, -vv adds raw_text
python -m org_dex_parse -v file.org
python -m org_dex_parse --json -vv file.org

An example config file is included in examples/config.json — it documents all available fields and can be used directly:

python -m org_dex_parse --config examples/config.json file.org

Precedence: CLI flags override config file values, which override defaults.
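
The precedence rule can be illustrated with a layered lookup. This is a sketch of the rule only, not of how the CLI is implemented; the dictionaries and their values are placeholders (the real defaults for todos and dones are not stated here), and `collections.ChainMap` is just one convenient way to express "first layer that has the key wins".

```python
from collections import ChainMap

# Placeholder values for illustration; only the layering matters.
defaults = {"todos": [], "dones": [], "created_property": "CREATED"}
file_values = {"todos": ["TODO", "NEXT", "DOING"], "dones": ["DONE", "CANCELED"]}
cli_flags = {"todos": ["TODO", "NEXT"]}

# ChainMap looks keys up left to right, so earlier mappings win.
effective = ChainMap(cli_flags, file_values, defaults)

resolved = dict(effective)
```

Here `todos` comes from the CLI flag, `dones` from the config file, and `created_property` falls through to the default.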

Performance

Extraction profile on a real-world org archive (4,380 items, Linux, Python 3.11):

Field	Count
title	4380
item_id	4380
level	4380
linenumber	4380
file_path	4380
todo	4380
priority	1442
local_tags	4380
inherited_tags	4358
parent_item_id	0
scheduled	40
deadline	4
closed	4369
created	0
archived	4380
active_ts	2453
inactive_ts	255
range_ts	1874
clock	251
state_changes	872
body	3124
raw_text	4380
links	10214
properties	4755


File size	5.0 MB
Lines	135,511
Extraction time	2.5 s

Breakdown: orgparse loads the syntax tree in ~1.5 s, org-dex-parse walks the tree and extracts all fields in ~1.0 s. The extraction phase uses O(n) pre-computed caches for parent lookup and tag inheritance.

Assumptions and requirements

The parser makes the following assumptions about the org files it processes:

:ID: is required. A heading without an :ID: property is never an item; it is scaffolding. This is a structural invariant, not a configurable option.
TODO keywords must be declared. org-mode determines TODO keywords at file level (#+TODO:) or in Emacs configuration. The parser doesn’t read Emacs config, so pass your keywords in Config.todos / Config.dones. Without them, keywords are not recognized and become part of the heading title.
org-log-into-drawer must be t. This is not the stock Emacs setting (the variable defaults to nil), so set it explicitly. The parser filters the :LOGBOOK: drawer by name. Custom drawer names and inline logging are not supported (see Limitations).

Limitations

Known limitations of v0.1

LOGBOOK drawer name is hardcoded

The parser assumes org-log-into-drawer is t, which means logging goes into a drawer named :LOGBOOK:. If your setup uses a custom drawer name (org-log-into-drawer set to a string) or inline logging (org-log-into-drawer set to nil, which is the Emacs default), logging timestamps will leak into inactive_ts as false positives.

Tag character monkey-patch is not thread-safe

When Config.extra_tag_chars is non-empty, the parser temporarily modifies a global regex in orgparse to allow the extra characters. This is not thread-safe — do not call parse_file concurrently from multiple threads with different extra_tag_chars values. Single-threaded use (including sequential calls with different configs) is safe.
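
If parsing from several threads cannot be avoided, one conservative workaround is to serialize the calls behind a single lock. The sketch below is an assumption, not a documented feature: `parse_serialized` is a hypothetical wrapper, and `fake_parse` stands in for the real parser so the example runs on its own. In real code you would pass `parse_file` instead.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_parse_lock = threading.Lock()


def parse_serialized(parse, path, config):
    """Call parse(path, config) while holding a process-wide lock."""
    with _parse_lock:
        return parse(path, config)


# Stand-in for the real parser; returns a string instead of a result object.
def fake_parse(path, config):
    return f"{path}:{config}"


jobs = [("a.org", "%"), ("b.org", "#"), ("c.org", "%#")]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda job: parse_serialized(fake_parse, *job), jobs))
```

This gives up parallel parsing in exchange for safety; separate processes would be another option if throughput matters.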

Encrypted headings (org-crypt) not handled

org-mode supports encrypting subtrees via org-crypt. The encrypted body (a PGP/GPG blob) is opaque text — the parser processes it as regular body content, extracting meaningless timestamps, links, and text from the ciphertext.

orgparse private API dependency

The parser depends on 4 private attributes of orgparse (_repeater, _duration, _body_lines, RE_HEADING_TAGS). All access is isolated in an adapter module (_orgparse_compat.py) — the rest of the codebase never touches orgparse internals directly. The attributes are protected by guard tests and a version pin (orgparse>=0.4,<0.5), but may break if orgparse changes its internals within the pinned range.

Development

git clone https://github.com/gdvek/org-dex-parse.git
cd org-dex-parse
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v

License

GPL-3.0-or-later
