Skip to content

Conversation

@sirkostya009
Copy link

@gemini-code-assist
Copy link

Summary of Changes

Hello @sirkostya009, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust Go-based anonymization script designed to process CSV files containing sensitive information. The primary goal is to transform identifiable data into consistently generated, readable placeholders, thereby enabling the use of production-like datasets in non-production environments without compromising privacy. The script intelligently handles various data types, including structured JSON and parameterized strings, ensuring that the anonymized output retains its original format and relationships.

Highlights

  • New Anonymization Script: A new Go command-line tool, cmd/anonymize/main.go, has been added to facilitate the anonymization of sensitive data within CSV files.
  • Data Obfuscation Logic: The script implements sophisticated logic to replace original data in fields like names, tags, requirements, file metadata, preconditions, steps, parameter values, and custom fields with human-readable, yet meaningless, generated strings. This includes 'Lorem Ipsum' style text, 'adj-animal' combinations for tags, and 'REQ Adj Noun' for requirements.
  • Structure Preservation: Crucially, the anonymization process preserves the structural integrity of the data, including JSON fields and parameter placeholders (e.g., ${param_name}), ensuring that the output remains syntactically valid and functional for its intended use cases.
  • Go Version Update: The project's Go module has been updated to require Go version 1.25, reflecting a general update to the development environment.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new script for anonymizing CSV data. The overall structure is good, leveraging a state struct and helper functions for clarity and performance optimizations like using strings.Builder. However, the current implementation has several critical issues that could lead to panics when processing invalid or unexpected data formats in the CSV. I've identified multiple places where the code can panic due to unchecked assumptions about the input data, such as out-of-bounds slice access and nil map access. There is also a logic bug in how anonymized filenames are generated. My review provides specific suggestions to make the script more robust and correct.

Comment on lines +221 to +222
matches := requirementsMDLinkRegex.FindStringSubmatch(requirement)
name := matches[1]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Accessing matches[1] will cause a panic if requirementsMDLinkRegex.FindStringSubmatch does not find a match, as matches will be nil. You should add a check to ensure matches is not nil and has at least 2 elements before accessing the captured group.

                                matches := requirementsMDLinkRegex.FindStringSubmatch(requirement)
				if len(matches) < 2 {
					continue // Or handle error for malformed requirement
				}
				name := matches[1]

read[HeaderSteps] = buf.String()[:buf.Len()-1] // trim trailing newline
}

var parameterValueKeys map[string]string

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The parameterValueKeys map is declared but not initialized. If the HeaderParameterValues column is empty, this map will remain nil, causing a panic on line 331 when it's accessed. It should be initialized to an empty map to prevent this.

                var parameterValueKeys = make(map[string]string)

if err != nil {
panic(err)
}
parameterValueKeys = make(map[string]string, len(parameterValues[0].Values))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Accessing parameterValues[0] will cause a panic if the parameterValues slice is empty (e.g., from an empty JSON array []). You should add a check for len(parameterValues) > 0 before this block of code that processes the parameter values.

Comment on lines +342 to +349
if strings.HasPrefix(jsonVal, `{"value":"`) {
value := jsonVal[len(`{"value":"`) : len(jsonVal)-2]
buf.Reset()
buf.WriteString(`{"value":"`)
buf.WriteString(anon.readableCustomField(value))
buf.WriteString(`"}`)
read[HeaderTemplateSuffixParams+1+i] = buf.String()
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This manual JSON parsing is brittle. It relies on strings.HasPrefix and string slicing, which will fail if the JSON format varies slightly (e.g., with extra whitespace, different key ordering, or additional fields). A much more robust approach is to unmarshal the JSON into a struct. This also allows you to correctly handle the logic described in the comments, like skipping values where isDefault is true.

Suggested change
if strings.HasPrefix(jsonVal, `{"value":"`) {
value := jsonVal[len(`{"value":"`) : len(jsonVal)-2]
buf.Reset()
buf.WriteString(`{"value":"`)
buf.WriteString(anon.readableCustomField(value))
buf.WriteString(`"}`)
read[HeaderTemplateSuffixParams+1+i] = buf.String()
}
var cf struct {
Value *string `json:"value"`
}
if err := json.Unmarshal([]byte(jsonVal), &cf); err == nil && cf.Value != nil {
value := *cf.Value
buf.Reset()
buf.WriteString(`{"value":"`)
buf.WriteString(anon.readableCustomField(value))
buf.WriteString(`"}`)
read[HeaderTemplateSuffixParams+1+i] = buf.String()
}

Comment on lines +470 to +484
func (a *state) readableFileID(original string) string {
if existing, ok := a.files[original]; ok {
return existing
}
a.fileCounter++
readable := "file-" + padInt(a.fileCounter, 3)
a.files[original] = readable
return readable
}

func (a *state) readableFileName(ext string) string {
adj := adjectives[a.fileCounter%len(adjectives)]
noun := nouns[a.fileCounter%len(nouns)]
return "document-" + adj + "-" + noun + ext
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a logic bug in the coupling between readableFileID and readableFileName. readableFileID increments a.fileCounter only for new file IDs. readableFileName then uses this counter. If an already-seen file ID is processed, readableFileID returns a cached value without incrementing the counter. The subsequent call to readableFileName will then use a stale counter value, leading to incorrect filenames being generated.

To fix this, you should generate and cache both the new ID and new filename together. Consider refactoring into a single function that returns both, and stores them in a struct in your state map. For example:

// In state struct:
type fileMapping struct {
    newID   string
    newName string
}
files map[string]fileMapping

// New function:
func (a *state) getAnonFileInfo(originalID, ext string) (string, string) {
    if mapping, ok := a.files[originalID]; ok {
        return mapping.newID, mapping.newName
    }
    a.fileCounter++
    newID := "file-" + padInt(a.fileCounter, 3)
    adj := adjectives[a.fileCounter%len(adjectives)]
    noun := nouns[a.fileCounter%len(nouns)]
    newName := "document-" + adj + "-" + noun + ext
    a.files[originalID] = fileMapping{newID: newID, newName: newName}
    return newID, newName
}

// In main():
ext := filepath.Ext(files[i].FileName)
files[i].ID, files[i].FileName = anon.getAnonFileInfo(files[i].ID, ext)

Comment on lines +156 to +158
if *filename == "" {
panic("no -f flag provided")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using panic for handling expected errors like invalid user input is not idiomatic for Go applications. It causes an abrupt stop and a stack trace, which isn't user-friendly. Consider handling this error gracefully by printing a message to os.Stderr and exiting with a non-zero status code, for example using log.Fatal or fmt.Fprintf(os.Stderr, ...) followed by os.Exit(1). This applies to other panic calls for file I/O and parsing errors in this function as well.

Comment on lines +407 to +411
for j, r := range word {
a.textBuf.WriteRune(unicode.ToUpper(r))
a.textBuf.WriteString(word[j+utf8.RuneLen(r):])
break
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This for loop to capitalize the first letter of a word is unconventional as it's designed to only run once. A more idiomatic and clearer approach would be to decode the first rune and then capitalize it, which avoids the loop structure.

                        r, size := utf8.DecodeRuneInString(word)
				a.textBuf.WriteRune(unicode.ToUpper(r))
				a.textBuf.WriteString(word[size:])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants