I think in the function hll::HyperLogLog::merge it should not be the bitwise or between the two registers:
if (M_[r] < other.M_[r]) {
M_[r] |= other.M_[r];
}
Instead it should be a simple assignment:
if (M_[r] < other.M_[r]) {
M_[r] = other.M_[r];
}
In my application I use the HyperLogLog sketches mainly to estimate merged unions. I noticed that sometimes the estimate is too high and the relative error skyrockets. I found that when I exclusively use hll::HyperLogLog::add the estimate is good.
When I changed the above snippet, the value from the merging routine was exactly the same as the one from just adding. With my understanding of the HyperLogLog algorithm I also think this is theoretically the correct thing to do.
I don't know about the HyperLogLogHIP, because I have not yet read the article about that estimator. But it also doesn't give good estimates when I do a lot of merging.
I think in the function
hll::HyperLogLog::mergeit should not be the bitwise or between the two registers:Instead it should be a simple assignment:
In my application I use the HyperLogLog sketches mainly to estimate merged unions. I noticed that sometimes the estimate is too high and the relative error skyrockets. I found that when I exclusively use
hll::HyperLogLog::addthe estimate is good.When I changed the above snippet, the value from the merging routine was exactly the same as the one from just adding. With my understanding of the HyperLogLog algorithm I also think this is theoretically the correct thing to do.
I don't know about the HyperLogLogHIP, because I have not yet read the article about that estimator. But it also doesn't give good estimates when I do a lot of merging.