This repository was archived by the owner on Mar 27, 2026. It is now read-only.

multi identifier preference record uploads #444

Closed
JonnavithulaGirish wants to merge 94 commits into main from jonnavithulaGirish/multiIdentifier

Conversation

Member

@JonnavithulaGirish JonnavithulaGirish commented Aug 7, 2025

Related Issues

Usage

pnpm start consent upload-preferences --auth=your-auth-token --partition=your-partition --directory=./examples/pm-test --dryRun=true --skipWorkflowTriggers=true --skipExistingRecordCheck=true --isSilent=true --attributes="Tags:transcend-cli,Source:transcend-cli" --transcendUrl=https://api.transcend.io/ --allowedIdentifierNames="email,personId,memberId" --identifierColumns="email_id,person_id,member_id"

Note

Major redesign of preference uploads with parallelism, multi-identifier support, and persistent state.

  • Replaces single-file flow with a directory-based, multi-process pool (worker.ts) and live dashboard; many new flags (--allowedIdentifierNames, --identifierColumns, --uploadConcurrency, --maxChunkSize, etc.)
  • Adds persistent schema (FileFormatState, schemaState.ts) and receipts (receiptsState.ts) with exponential-backoff reads; aggregates success/fail/pending and exports failing-updates CSV
  • Splits pipeline into plan and execute: buildInteractiveUploadPreferencePlan (validation/mapping) + interactivePreferenceUploaderFromPlan (batch upload with smarter retries/splitting and email validation)
  • Enhances identifier fetching (getPreferencesForIdentifiers): progress callbacks, recursive split on validation errors, improved retry logic
  • Introduces CSV transforms and mapping utilities; removes legacy uploadPreferenceManagementPreferencesInteractive
  • Adds utility scripts find-exact.ts and reconcile-preference-records.ts; logs active Sombra URL; minor GraphQL retry log tweak
  • Updates README usage/examples and CHANGELOG; bumps package to 9.0.0
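The exponential-backoff reads mentioned in the summary above are not shown in this conversation. As a rough sketch only, assuming a hypothetical helper `backoffDelayMs` and made-up base and cap values (none of these names or numbers come from the PR), the wait between retries could be computed like this:

```typescript
// Hypothetical helper, not taken from the PR.
// Returns the wait before retry number `attempt` (0-based): the delay doubles
// on every attempt and is capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  if (attempt < 0 || Math.floor(attempt) !== attempt) {
    throw new RangeError(`attempt must be a non-negative integer, got ${attempt}`);
  }
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// With the assumed defaults: 100 ms, 200 ms, 400 ms, ... capped at 5000 ms.
console.log(backoffDelayMs(0), backoffDelayMs(1), backoffDelayMs(2));
```

A production version would usually add random jitter so that parallel workers do not retry in lockstep; that is omitted here to keep the sketch deterministic.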

Written by Cursor Bugbot for commit 85abeeb.


linear bot commented Aug 7, 2025

Comment on lines +102 to +119
allowedIdentifierNames: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Identifiers configured for the run. Comma-separated list of identifier names.',
},
identifierColumns: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
},
columnsToIgnore: {
  kind: 'parsed',
  parse: (value: string) => value.split(',').map((s) => s.trim()),
  brief:
    'Columns in the CSV that should be ignored. Comma-separated list of column names.',
  optional: true,
Member

@bencmbrook bencmbrook Aug 15, 2025


The preferred pattern for lists is to use the built-in variadic: ','. This parses the value in the same way, but it also (A) provides better error messages for malformed inputs and (B) self-documents that the flag is a list, which can be provided as a comma-separated argument like this:

FLAGS
      --auth                    The Transcend API key.
     [--identifierColumns]...   Identifier names configured for the run. [separator = ,]

It also allows users to pass --identifierColumns 1 --identifierColumns 2 --identifierColumns 3
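As a library-independent illustration of the two input styles described above, the sketch below normalizes both a comma-separated value and repeated flag occurrences into one list. It is not the CLI framework's implementation, and `collectListFlag` is a hypothetical name used only for this example.

```typescript
// Hypothetical helper: `rawValues` holds one entry per occurrence of the flag
// on the command line. Each entry may itself be a separator-delimited list.
function collectListFlag(rawValues: string[], separator = ','): string[] {
  return rawValues
    .join(separator) // merge repeated occurrences into one delimited string
    .split(separator) // then split everything on the separator
    .map((item) => item.trim())
    .filter((item) => item.length > 0); // drop empty items such as "a,,b"
}

// --identifierColumns "email_id, person_id" --identifierColumns member_id
console.log(collectListFlag(['email_id, person_id', 'member_id']));
```

Both `--identifierColumns a,b,c` and three separate `--identifierColumns` occurrences end up as the same three-element array under this scheme.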

If any of these are enums (I don't think they are), you can also do something like this to validate (and self-document) the expected inputs:
[Image not preserved]

If you

Suggested change
- allowedIdentifierNames: {
-   kind: 'parsed',
-   parse: (value: string) => value.split(',').map((s) => s.trim()),
-   brief:
-     'Identifiers configured for the run. Comma-separated list of identifier names.',
- },
- identifierColumns: {
-   kind: 'parsed',
-   parse: (value: string) => value.split(',').map((s) => s.trim()),
-   brief:
-     'Columns in the CSV that should be used as identifiers. Comma-separated list of column names.',
- },
- columnsToIgnore: {
-   kind: 'parsed',
-   parse: (value: string) => value.split(',').map((s) => s.trim()),
-   brief:
-     'Columns in the CSV that should be ignored. Comma-separated list of column names.',
-   optional: true,
+ allowedIdentifierNames: {
+   kind: 'parsed',
+   parse: String,
+   variadic: ',',
+   brief: 'Identifier names configured for the run.',
+ },
+ identifierColumns: {
+   kind: 'parsed',
+   parse: String,
+   variadic: ',',
+   brief: 'Columns in the CSV that should be used as identifiers.',
+ },
+ columnsToIgnore: {
+   kind: 'parsed',
+   parse: String,
+   variadic: ',',
+   brief: 'Columns in the CSV that should be ignored.',
+   optional: true,

]);
currentState.timestampColum = timestampName;

currentState.setValue(timestampName, 'timestampColumn');

Bug: Incorrect Parameter Order in setValue Calls

The setValue calls in parsePreferenceFileFormatFromCsv.ts and parsePreferenceAndPurposeValuesFromCsv.ts pass parameters in the wrong order, using (value, key) instead of the expected (key, value). This is inconsistent with getValue's key-first convention and likely results in incorrect state updates.
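One way to make this class of mistake a compile-time error is to type the key parameter against the state shape. The sketch below is an assumption about how such a store could look, not the PR's actual class: `TypedState`, `FormatState` and the `skippedRows` field are hypothetical, and only the `timestampColumn` key and the `getValue`/`setValue` names appear in the review.

```typescript
// Hypothetical state shape; only `timestampColumn` is taken from the snippet.
interface FormatState {
  timestampColumn: string;
  skippedRows: number; // hypothetical second field
}

// Key-first accessors: the key must be a property name of S, and the value
// must match that property's type.
class TypedState<S extends object> {
  constructor(private readonly values: S) {}

  getValue<K extends keyof S>(key: K): S[K] {
    return this.values[key];
  }

  setValue<K extends keyof S>(key: K, value: S[K]): void {
    this.values[key] = value;
  }
}

const formatState = new TypedState<FormatState>({ timestampColumn: '', skippedRows: 0 });
const timestampName: string = 'updated_at';

formatState.setValue('timestampColumn', timestampName); // key first, then value
// formatState.setValue(timestampName, 'timestampColumn');
// ^ rejected by the compiler: `string` is not assignable to keyof FormatState
console.log(formatState.getValue('timestampColumn'));
```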

Additional Locations (1)


'When uploading preferences to v1/preferences - this is the number of concurrent requests made at any given time by a single process.' +
"This is NOT the batch size—it's how many batch *tasks* run in parallel. " +
'The number of total concurrent requests is maxed out at concurrency * uploadConcurrency.',
default: '75', // FIXME 25

Bug: Fix: incorrect default uploadConcurrency value persisted

The comment "// FIXME 25" on line 114 suggests the default value for uploadConcurrency should be 25 instead of 75, but was left at 75. This appears to be temporary debugging code or an unfinished change that was accidentally committed.


Object.entries(datum).reduce(
(acc, [key, value]) =>
Object.assign(acc, {
[key.replace(/[^a-z_.+\-A-Z -~]/g, '')]: value,

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with A-Z in the same character class, and overlaps with a-z in the same character class.
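In the flagged pattern, the range ` -~` runs from space (0x20) to tilde (0x7E), which is all of printable ASCII, so the `a-z`, `A-Z`, `_`, `.`, `+` and `\-` entries are redundant. The sketch below shows an equivalent pattern with the overlap removed. The helper name `stripNonPrintableAscii` is illustrative, and it is an assumption that stripping everything outside printable ASCII (for example a byte-order mark on the first header) is what the author intended.

```typescript
// Pattern as flagged by CodeQL, copied from the diff.
const flaggedPattern = /[^a-z_.+\-A-Z -~]/g;
// Same behavior, written as a single explicit range (space through tilde).
const printableAsciiOnly = /[^\x20-\x7E]/g;

// Hypothetical helper name: removes every character outside printable ASCII.
function stripNonPrintableAscii(key: string): string {
  return key.replace(printableAsciiOnly, '');
}

const headerWithBom = '\uFEFFemail_id';
console.log(stripNonPrintableAscii(headerWithBom)); // email_id
console.log(headerWithBom.replace(flaggedPattern, '')); // email_id
```

If the intent was instead to allow only letters, underscores, dots, plus signs, hyphens and spaces, the ` -~` range would need to be dropped rather than kept.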

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 8 potential issues.


...pref,
lastUpdatedDate: pref.lastUpdatedDate
? pref.lastUpdatedDate
: new Date('08/24/2025').toISOString(),

Hardcoded fallback date will become stale

Low Severity

The fallback date new Date('08/24/2025') is hardcoded for records missing lastUpdatedDate. This magic date will become outdated and may cause data integrity issues when the actual date significantly differs from this static value, making records appear to have stale timestamps.
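A possible alternative to the magic date is to pass the fallback in explicitly, so the caller decides what it is (for instance the time the run started). The sketch below assumes a minimal hypothetical record shape and a hypothetical helper name, `withLastUpdatedDate`; which fallback is actually correct for the data is a decision for the authors. Note also that a non-ISO string such as '08/24/2025' is parsed in an implementation-defined way and typically as local time, so the resulting ISO timestamp can vary with the machine's time zone.

```typescript
// Hypothetical minimal shape; the real record type has more fields.
interface PreferenceRecord {
  lastUpdatedDate?: string;
}

// Fills in `lastUpdatedDate` only when it is missing, using a caller-supplied
// fallback instead of a hardcoded date.
function withLastUpdatedDate(
  pref: PreferenceRecord,
  fallback: Date,
): PreferenceRecord & { lastUpdatedDate: string } {
  return {
    ...pref,
    lastUpdatedDate: pref.lastUpdatedDate || fallback.toISOString(),
  };
}

// Capture the fallback once per run so every record gets the same value.
const runStartedAt = new Date();
console.log(withLastUpdatedDate({}, runStartedAt).lastUpdatedDate);
```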


.split(',')
.map((email) => email.trim().toLowerCase());

const keys = Object.keys(preferences[0]);

Empty CSV file causes crash in transformCsv

Medium Severity

The transformCsv function accesses preferences[0] without checking if the array is empty. If a CSV file contains only a header row with no data rows, preferences will be an empty array, and Object.keys(preferences[0]) will throw a TypeError because preferences[0] is undefined. This crashes the worker with an unhelpful error message.
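A guard along the following lines would replace the TypeError with a message that names the offending file. This is a sketch under assumptions: `getColumnNames`, `CsvRow` and the `filePath` parameter are hypothetical, and the real `transformCsv` may prefer to skip empty files rather than throw.

```typescript
type CsvRow = Record<string, string>;

// Hypothetical helper: returns the column names of the first row, or throws a
// descriptive error when the file has a header but no data rows.
function getColumnNames(rows: CsvRow[], filePath: string): string[] {
  if (rows.length === 0) {
    throw new Error(`No data rows found in "${filePath}"; nothing to transform.`);
  }
  return Object.keys(rows[0]);
}

console.log(getColumnNames([{ email_id: 'a@example.com', Marketing: 'true' }], 'sample.csv'));
```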


CHANGELOG.md Outdated

## [9.0.0] - 2025-08-15

FIXME

FIXME placeholder committed in changelog

Medium Severity

The version 9.0.0 changelog entry contains only "FIXME" as placeholder text instead of actual release notes. This placeholder was committed and will be visible to users.


*/
async function main(): Promise<void> {
const opts: Options = {
in: path.resolve('./working/costco/concerns/out.csv'),

Development script with hardcoded paths committed

Medium Severity

This 872-line script contains a hardcoded local path ./working/costco/concerns/out.csv and is not referenced anywhere in package.json or other source files. It appears to be a development/debugging utility that was accidentally committed to the repository.


  // Create got instance with default values
  return got.extend({
-   prefixUrl: customerUrl,
+   prefixUrl: process.env.SOMBRA_URL || customerUrl,

Sombra URL log does not match actual URL used

Medium Severity

The log message at line 49 displays customerUrl, but line 52 uses process.env.SOMBRA_URL || customerUrl as the actual prefixUrl. When the environment variable is set, the logged URL differs from the URL actually used, causing misleading diagnostic output during debugging.
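The mismatch goes away if the URL is resolved once and that single value is both logged and passed to the HTTP client. A minimal sketch, assuming a hypothetical helper `resolveSombraUrl` and placeholder example URLs (the environment is passed in as a parameter here so the snippet has no Node-specific dependencies):

```typescript
// Hypothetical helper: the override wins when set to a non-empty string,
// mirroring `process.env.SOMBRA_URL || customerUrl` from the diff.
function resolveSombraUrl(
  customerUrl: string,
  env: Record<string, string | undefined>,
): string {
  return env.SOMBRA_URL || customerUrl;
}

// Resolve once, then use the same value for the log line and for `prefixUrl`.
const sombraUrl = resolveSombraUrl('https://sombra.example.com', {
  SOMBRA_URL: 'https://override.example.com',
});
console.log(`Using sombra: ${sombraUrl}`);
```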


const shouldLog =
total % logInterval === 0 ||
Math.floor((total - identifiers.length) / logInterval) <
Math.floor(total / logInterval);

Progress logging uses wrong variable in boundary check

Medium Severity

The maybeLogProgress function receives a delta parameter but the shouldLog calculation uses identifiers.length (total count) instead of delta in the boundary-crossing check. This causes the condition (total - identifiers.length) to be negative until processing completes, making the comparison always true and logging after every single group instead of at the intended interval.
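The boundary-crossing check described above can be sketched as follows, using `delta` (the number of items just processed) rather than the overall count. The function name `crossedLogBoundary` is hypothetical; `total`, `delta` and `logInterval` follow the names in the review. When `delta` is positive, this check also covers the `total % logInterval === 0` case, so no separate modulo test is needed.

```typescript
// Returns true when adding `delta` items moved the running `total` past a
// multiple of `logInterval`.
function crossedLogBoundary(total: number, delta: number, logInterval: number): boolean {
  const bucketBefore = Math.floor((total - delta) / logInterval);
  const bucketAfter = Math.floor(total / logInterval);
  return bucketAfter > bucketBefore;
}

console.log(crossedLogBoundary(100, 10, 100)); // true: 90 -> 100 reaches 100
console.log(crossedLogBoundary(110, 10, 100)); // false: 100 -> 110 stays in the same bucket
console.log(crossedLogBoundary(205, 10, 100)); // true: 195 -> 205 passes 200
```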


nodes {
id
# FIXME remove
status

Debug field with FIXME comment committed

Low Severity

A status field was added to the GraphQL query with a comment # FIXME remove, indicating this is temporary debugging code that was accidentally committed. This adds unnecessary data fetching overhead and the field may not be used.


main().catch((err) => {
console.error(err?.stack ?? String(err));
process.exit(1);
});

Unreferenced utility script committed to source

Low Severity

This 341-line standalone script for searching files by content is not referenced anywhere in package.json or other source files. It appears to be a development utility for finding specific strings in CSV/JSON/parquet files that was accidentally committed to the repository.


michaelfarrell76 and others added 15 commits February 22, 2026 20:34
- New `consent configure-preference-upload` command that scans all CSV
  files to discover headers and unique values, then interactively builds
  the column mapping config (identifiers, ignored columns, timestamps,
  purposes/preferences and value mappings).
- Move `columnsToIgnore` and `identifierColumns` into the persisted
  config file (FileFormatState) instead of CLI flags.
- Remove `allowedIdentifierNames`, `identifierColumns`, `columnsToIgnore`,
  and `skipMetadata` CLI flags from `upload-preferences`.
- Add `--regenerate` flag to force re-running config generation and
  `--chunkSizeMB` flag for auto-chunking oversized CSV files.
- Add `nonInteractive` mode to all three parse* functions so worker
  processes throw instead of prompting when config is incomplete.
- Workers now derive identifier/ignore columns from the schema config.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Create `admin find-exact` command with stricli flags (--needle, --root,
  --exts, --noParquet, --concurrency, --maxBytes).
- Fix all ESLint violations from the original script (JSDoc, no-plusplus,
  no-return-await, no-param-reassign, no-explicit-any, etc.).
- Register in admin routes and delete the old standalone src/find-exact.ts.
- Fix pre-existing ESLint issues in getPreferenceMetadataFromRow.ts and
  fetchConsentPreferencesChunked.test.ts.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep version at 9.0.0
- Keep FIXME comment on batchUploader rate limit regex
- Remove forceTriggerWorkflows from merged throughput test

Co-authored-by: Cursor <cursoragent@cursor.com>
- Keep version at 9.0.0
- Use main's compact JSDoc for MetadataMapping/ColumnMetadataMap
- Keep FileFormatState name (v9 breaking rename)

Co-authored-by: Cursor <cursoragent@cursor.com>
- Exclude .cursor/ from doctoc pre-commit hook
- Add JSDoc descriptions for prompt type annotations

Made-with: Cursor
@bencmbrook
Member

migrated transcend-io/tools#11


4 participants