Make stats and string matching case-insensitive#9
Make stats and string matching case-insensitive#9ltning wants to merge 4 commits intoDRuggeri:masterfrom
Conversation
| result.Matched = true | ||
| result.QueryClient = match[1] | ||
| result.QueryName = match[2] | ||
| result.QueryName = strings.ToLower(match[2]) |
There was a problem hiding this comment.
Thanks for the PR! I think the idea is great, but I detect a possible bug here. Since BIND will log the query exactly as received, I think we want to fold the entire input down to lower case first or make the match case insensitive. If we don't, we'll have an issue under the following circumstances:
- The list of matches includes "FOOBAR.com" (but this is folded to "foobar.com") by the changes in names_collector.go
- The client queries for "FooBar.com" -> therefore, no match against "foobar"
Conversely, if we fold the whole line down to lower case (maybe on line 39 with line = strings.ToLower(line), then we'll have a successful match. Since DNS RFC says that DNS names are not case sensitive, a non-case match is indeed a match... I think this is fine.
To me, it feels like it's more obvious to fold to lower case than to check/enforce the regex to have the case-insensitive prepend flag. WDYT?
There was a problem hiding this comment.
Uhm, I'm not sure what I missed? Wouldn't my PR make sure that 'foobar.com' is inserted in the array and by ToLower-ing the name when looking it up, isn't this going to cover both cases?
There was a problem hiding this comment.
You're close - the only issue is that the string coming into this function (in line) is coming straight from the log file. I did a quick test and confirmed that BIND will log the value exactly as it is received. So, earlier on in the PR, you're correctly coaxing the search strings to lower case - but at this point, the lower case string is compared to a mixed case value in line if the user queried for "FoObAr". So I'm thinking that if we convert line to lower case around line 39, it'd be all set.
New: Strings read from the inc/excl files are rendered lowercase when adding to the internal lists. Names from log are lowercased at ingestion.
As it was, stats were produced for each variation of each name, in a case sensitive manner. This balloons the stats massively and makes it less intuitive to make per-name/domain statistics and monitoring. It is assumed that in the vast majority of cases, case is not interesting.