Try to speed up coverage compute (again)#138
Conversation
|
This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation. |
|
|
||
| """; | ||
|
|
||
| await context.Database.ExecuteSqlRawAsync(upsertSql); |
Check failure
Code scanning / CodeQL
SQL query built from user-controlled sources High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 6 months ago
To fix this issue, we should avoid building SQL queries with untrusted data using string interpolation or concatenation. Instead, use parameterized queries, even for batch inserts or upserts. In Entity Framework Core, ExecuteSqlRawAsync can accept SQL parameter placeholders (e.g., @param) and separate parameter values.
When building upserts for batches, the typical approach is to use parameters for each inserted value. Since batching with dynamic parameter numbers is awkward, the best way is to create a list of parameters for each value, and reference them accordingly in the VALUES clause. This preserves the bulk-upsert functionality without exposing it to SQL injection.
Steps:
- Replace interpolated values in the
upsertSqlstring andvaluesListwith parameter placeholders. - Create and add the corresponding parameters to the call to
ExecuteSqlRawAsync. - Ensure all values, especially
userId, are added as SQL parameters and not interpolated.
All changes are to be made in Jiten.Api/Jobs/ComputationJob.cs within the batch upsert logic in the ComputeUserCoverage method.
| @@ -92,20 +92,28 @@ | ||
| { | ||
| var batch = coverageResults.Skip(i).Take(batchSize).ToList(); | ||
|
|
||
| var valuesList = string.Join(", ", batch.Select(r => | ||
| $"('{userId}'::uuid, {r.DeckId}::numeric, {r.Coverage}::numeric, {r.UniqueCoverage}::numeric)")); | ||
| var parameters = new List<object>(); | ||
| var values = new List<string>(); | ||
| for (var j = 0; j < batch.Count; j++) | ||
| { | ||
| values.Add($"(@userId{j}, @deckId{j}, @coverage{j}, @uniqueCoverage{j})"); | ||
| parameters.Add(new Npgsql.NpgsqlParameter($"userId{j}", userId)); // Use NpgsqlParameter for PostgreSQL | ||
| parameters.Add(new Npgsql.NpgsqlParameter($"deckId{j}", batch[j].DeckId)); | ||
| parameters.Add(new Npgsql.NpgsqlParameter($"coverage{j}", batch[j].Coverage)); | ||
| parameters.Add(new Npgsql.NpgsqlParameter($"uniqueCoverage{j}", batch[j].UniqueCoverage)); | ||
| } | ||
| var valuesList = string.Join(", ", values); | ||
|
|
||
| var upsertSql = $""" | ||
| INSERT INTO "user"."UserCoverages" ("UserId", "DeckId", "Coverage", "UniqueCoverage") | ||
| VALUES {valuesList} | ||
| ON CONFLICT ("UserId", "DeckId") | ||
| DO UPDATE SET | ||
| "Coverage" = EXCLUDED."Coverage", | ||
| "UniqueCoverage" = EXCLUDED."UniqueCoverage"; | ||
| INSERT INTO "user"."UserCoverages" ("UserId", "DeckId", "Coverage", "UniqueCoverage") | ||
| VALUES {valuesList} | ||
| ON CONFLICT ("UserId", "DeckId") | ||
| DO UPDATE SET | ||
| "Coverage" = EXCLUDED."Coverage", | ||
| "UniqueCoverage" = EXCLUDED."UniqueCoverage"; | ||
| """; | ||
|
|
||
| """; | ||
|
|
||
| await context.Database.ExecuteSqlRawAsync(upsertSql); | ||
| await context.Database.ExecuteSqlRawAsync(upsertSql, parameters.ToArray()); | ||
| } | ||
| } | ||
| finally |
No description provided.