Add anonymize script #4
Conversation
Summary of Changes

Hello @sirkostya009, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request introduces a Go-based anonymization script that processes CSV files containing sensitive information. The primary goal is to transform identifiable data into consistently generated, readable placeholders, so that production-like datasets can be used in non-production environments without compromising privacy. The script handles several data types, including structured JSON and parameterized strings, and ensures that the anonymized output retains its original format and relationships.
Code Review
This pull request introduces a new script for anonymizing CSV data. The overall structure is good: a state struct and helper functions keep the code clear, and optimizations such as strings.Builder help performance. However, the current implementation has several critical issues that can cause panics when the CSV contains invalid or unexpected data. I've identified multiple places where the code can panic due to unchecked assumptions about the input, such as out-of-bounds slice access and nil map access, and there is also a logic bug in how anonymized filenames are generated. My review provides specific suggestions to make the script more robust and correct.
matches := requirementsMDLinkRegex.FindStringSubmatch(requirement)
name := matches[1]
Accessing matches[1] will cause a panic if requirementsMDLinkRegex.FindStringSubmatch does not find a match, as matches will be nil. You should add a check to ensure matches is not nil and has at least 2 elements before accessing the captured group.
matches := requirementsMDLinkRegex.FindStringSubmatch(requirement)
if len(matches) < 2 {
	continue // Or handle error for malformed requirement
}
name := matches[1]

read[HeaderSteps] = buf.String()[:buf.Len()-1] // trim trailing newline
}

var parameterValueKeys map[string]string
if err != nil {
	panic(err)
}
parameterValueKeys = make(map[string]string, len(parameterValues[0].Values))
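The summary review above flags out-of-bounds slice access and nil map access, and this hunk indexes parameterValues[0] without a length check. A minimal sketch of a guard, assuming an empty parameterValues slice should simply produce an empty map rather than a panic:

if err != nil {
	panic(err)
}
// Only use the first entry for sizing when it exists, so an empty export
// cannot cause an index-out-of-range panic; the map is always initialized.
capHint := 0
if len(parameterValues) > 0 {
	capHint = len(parameterValues[0].Values)
}
parameterValueKeys = make(map[string]string, capHint)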
if strings.HasPrefix(jsonVal, `{"value":"`) {
	value := jsonVal[len(`{"value":"`) : len(jsonVal)-2]
	buf.Reset()
	buf.WriteString(`{"value":"`)
	buf.WriteString(anon.readableCustomField(value))
	buf.WriteString(`"}`)
	read[HeaderTemplateSuffixParams+1+i] = buf.String()
}
This manual JSON parsing is brittle. It relies on strings.HasPrefix and string slicing, which will fail if the JSON format varies slightly (e.g., with extra whitespace, different key ordering, or additional fields). A much more robust approach is to unmarshal the JSON into a struct. This also allows you to correctly handle the logic described in the comments, like skipping values where isDefault is true.
var cf struct {
	Value *string `json:"value"`
}
if err := json.Unmarshal([]byte(jsonVal), &cf); err == nil && cf.Value != nil {
	value := *cf.Value
	buf.Reset()
	buf.WriteString(`{"value":"`)
	buf.WriteString(anon.readableCustomField(value))
	buf.WriteString(`"}`)
	read[HeaderTemplateSuffixParams+1+i] = buf.String()
}
func (a *state) readableFileID(original string) string {
	if existing, ok := a.files[original]; ok {
		return existing
	}
	a.fileCounter++
	readable := "file-" + padInt(a.fileCounter, 3)
	a.files[original] = readable
	return readable
}

func (a *state) readableFileName(ext string) string {
	adj := adjectives[a.fileCounter%len(adjectives)]
	noun := nouns[a.fileCounter%len(nouns)]
	return "document-" + adj + "-" + noun + ext
}
There is a logic bug in the coupling between readableFileID and readableFileName. readableFileID increments a.fileCounter only for new file IDs. readableFileName then uses this counter. If an already-seen file ID is processed, readableFileID returns a cached value without incrementing the counter. The subsequent call to readableFileName will then use a stale counter value, leading to incorrect filenames being generated.
To fix this, you should generate and cache both the new ID and new filename together. Consider refactoring into a single function that returns both, and stores them in a struct in your state map. For example:
// In state struct:
type fileMapping struct {
	newID   string
	newName string
}

files map[string]fileMapping

// New function:
func (a *state) getAnonFileInfo(originalID, ext string) (string, string) {
	if mapping, ok := a.files[originalID]; ok {
		return mapping.newID, mapping.newName
	}
	a.fileCounter++
	newID := "file-" + padInt(a.fileCounter, 3)
	adj := adjectives[a.fileCounter%len(adjectives)]
	noun := nouns[a.fileCounter%len(nouns)]
	newName := "document-" + adj + "-" + noun + ext
	a.files[originalID] = fileMapping{newID: newID, newName: newName}
	return newID, newName
}

// In main():
ext := filepath.Ext(files[i].FileName)
files[i].ID, files[i].FileName = anon.getAnonFileInfo(files[i].ID, ext)

if *filename == "" {
	panic("no -f flag provided")
}
Using panic for handling expected errors like invalid user input is not idiomatic for Go applications. It causes an abrupt stop and a stack trace, which isn't user-friendly. Consider handling this error gracefully by printing a message to os.Stderr and exiting with a non-zero status code, for example using log.Fatal or fmt.Fprintf(os.Stderr, ...) followed by os.Exit(1). This applies to other panic calls for file I/O and parsing errors in this function as well.
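As a sketch of what that could look like for the -f check (the flag description and error message here are placeholders, not taken from the script):

filename := flag.String("f", "", "path to the CSV file to anonymize")
flag.Parse()

if *filename == "" {
	// Report the problem on stderr and exit non-zero instead of panicking.
	fmt.Fprintln(os.Stderr, "anonymize: no -f flag provided")
	flag.Usage()
	os.Exit(1)
}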
for j, r := range word {
	a.textBuf.WriteRune(unicode.ToUpper(r))
	a.textBuf.WriteString(word[j+utf8.RuneLen(r):])
	break
}
This for loop to capitalize the first letter of a word is unconventional as it's designed to only run once. A more idiomatic and clearer approach would be to decode the first rune and then capitalize it, which avoids the loop structure.
r, size := utf8.DecodeRuneInString(word)
a.textBuf.WriteRune(unicode.ToUpper(r))
a.textBuf.WriteString(word[size:])
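For completeness, the same idea as a standalone helper with a guard for the empty string; the function name is hypothetical and it relies only on the unicode and unicode/utf8 packages:

// capitalizeFirst upper-cases the first rune of word and leaves the rest
// unchanged; empty input is returned as-is.
func capitalizeFirst(word string) string {
	if word == "" {
		return word
	}
	r, size := utf8.DecodeRuneInString(word)
	return string(unicode.ToUpper(r)) + word[size:]
}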
of https://github.com/Hypersequent/tms-issues/issues/2179