diff --git a/.jules/bolt.md b/.jules/bolt.md
index 2eab34c884fa57..b5e486fe94a939 100644
--- a/.jules/bolt.md
+++ b/.jules/bolt.md
@@ -105,6 +105,9 @@
 ## 2024-05-24 - [Avoid `email.parser` for large package indexes]
 **Learning:** `email.parser.Parser` performs full RFC 822/2822 compliance checks which adds massive overhead. When parsing tens of thousands of machine-generated opkg package index blocks with predictable `Key: Value` line formats, standard string splitting and `.startswith()` checks provide a ~14x speedup.
 **Action:** When extracting a few specific headers from a trusted and uniform block format instead of parsing arbitrary emails, avoid `email.parser.Parser` and use fast native python string operations instead. Make sure to use `.strip()` when parsing values to correctly handle `\r\n` line endings.
+## 2024-05-14 - Python String Concatenation Optimization
+**Learning:** In Python, string concatenation within a loop using the `+=` operator involves creating a new string object and copying contents on each iteration because strings are immutable. This leads to O(N^2) performance degradation over many iterations.
+**Action:** Replace `+=` string concatenation inside loops with `list.append()` to collect the string parts, followed by `''.join(list)` outside the loop. This ensures an O(N) linear time complexity and avoids unnecessary allocations, providing significant performance speedups.
 ## 2026-05-22 - [Python String Splitting Memory Overhead]
 **Learning:** When parsing tens of thousands of blocks in a massive text file, using `.split()` to chunk the entire string creates an intermediate list containing all chunk strings simultaneously, leading to massive memory bloat (O(N) memory overhead in addition to the original string).
diff --git a/scripts/dl_github_archive.py b/scripts/dl_github_archive.py
index 21052d3b74ff2b..83d415c4756ea4 100755
--- a/scripts/dl_github_archive.py
+++ b/scripts/dl_github_archive.py
@@ -350,7 +350,7 @@ def _init_commit_ts(self):
         version_is_sha1sum = len(self.version) == 40
         if not version_is_sha1sum:
             apis.insert(0, apis.pop())
-        reasons = ''
+        reasons = []
         for api in apis:
             url = api['url']
             attr_path = api['attr_path']
@@ -364,8 +364,8 @@ def _init_commit_ts(self):
                 self.commit_ts_cache.set(url, ct)
                 return
             except Exception as e:
-                reasons += '\n' + (" {}: {}".format(url, e))
-        raise self._error('Cannot fetch commit ts:{}'.format(reasons))
+                reasons.append('\n {}: {}'.format(url, e))
+        raise self._error('Cannot fetch commit ts:{}'.format(''.join(reasons)))
 
     def _init_commit_ts_remote_get(self, url, attrpath):
         resp = self._make_request(url)
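
The change to `_init_commit_ts` can be illustrated in isolation. The sketch below is not taken from the script: the helper names `build_error_concat` and `build_error_join` and the sample URLs are hypothetical, chosen only to show that collecting parts in a list and joining once yields the same message as repeated `+=`.

```python
def build_error_concat(failures):
    """Old pattern: grow the message with += inside the loop."""
    reasons = ''
    for url, err in failures:
        reasons += '\n' + (" {}: {}".format(url, err))
    return 'Cannot fetch commit ts:{}'.format(reasons)


def build_error_join(failures):
    """New pattern: append parts to a list, join once after the loop."""
    reasons = []
    for url, err in failures:
        reasons.append('\n {}: {}'.format(url, err))
    return 'Cannot fetch commit ts:{}'.format(''.join(reasons))


# Hypothetical failures, standing in for the (url, exception) pairs
# gathered while trying each API endpoint.
sample_failures = [
    ('https://example.invalid/api/one', 'timeout'),
    ('https://example.invalid/api/two', 'HTTP 404'),
]

message_old = build_error_concat(sample_failures)
message_new = build_error_join(sample_failures)
```

Both helpers produce identical text, so the patch should not change the error message. CPython can often extend a string in place when `+=` is applied to a name holding the only reference, so with only a handful of API URLs the measurable gain here is likely small; the join form mainly avoids relying on that implementation detail.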