
mfence vs lfence vs cpuid #32

@oreparaz

We've been a bit lazy about how we use RDTSC. The original piece of code (probably written about 10 years ago) had this comment:

Intel actually recommends calling CPUID to serialize the execution flow
 and reduce variance in measurement due to out-of-order execution.
 We don't do that here yet.
 see §3.2.1 http://www.intel.com/content/www/us/en/embedded/training/ia-32-ia-64-benchmark-code-execution-paper.html

That link is dead, but the paper can still be found on mirrors. It's a good resource and has the following advice. We should probably just follow it:

Resources:

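/* Variant 1: RDTSC bracketed by MFENCE. */
uint64_t rdtsc() {
  uint64_t a, d;
  asm volatile ("mfence");
  asm volatile ("rdtsc" : "=a" (a), "=d" (d));  /* low 32 bits in a, high 32 bits in d */
  a = (d<<32) | a;
  asm volatile ("mfence");
  return a;
}

/* Variant 2: bare RDTSC with no fence or serializing instruction. */
long long ticks(void)
{
  unsigned long long result;
  /* .byte 15; .byte 49 is the raw encoding of RDTSC (opcode 0F 31) */
  asm volatile(".byte 15;.byte 49;shlq $32,%%rdx;orq %%rdx,%%rax"
    : "=a"(result) :: "%rdx");
  return result;
}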

The test programs use the serializing instruction CPUID before and after reading the time stamp counter in order to prevent out-of-order execution to interfere with the measurements.
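
For comparison with the snippets above, here is a rough sketch of what a serialized read could look like. It assumes x86-64, GCC or Clang inline assembly, and a CPU that supports RDTSCP. The pattern (CPUID then RDTSC at the start, RDTSCP then CPUID at the end) is the one usually attributed to the Intel benchmarking paper; the names `tsc_begin`, `tsc_end`, `tsc_lfence` and `min_cycles` are made up for illustration and are not from this repo. The LFENCE variant is included only as a lighter-weight alternative to compare against, not as a recommendation.

```c
#include <assert.h>
#include <stdint.h>

#if !defined(__x86_64__)
#error "This sketch assumes x86-64 and GCC/Clang inline assembly."
#endif

/* Start of a timed region: CPUID serializes, then RDTSC reads the counter.
   CPUID overwrites eax, ebx, ecx and edx, hence the clobbers. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo, hi;
    __asm__ volatile(
        "cpuid\n\t"
        "rdtsc\n\t"
        : "=a"(lo), "=d"(hi)
        : "a"(0)
        : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* End of a timed region: RDTSCP waits for earlier instructions to execute
   before reading the counter; the trailing CPUID keeps later instructions
   from starting before the read. RDTSCP also writes ecx. */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;
    __asm__ volatile(
        "rdtscp\n\t"
        "movl %%eax, %0\n\t"
        "movl %%edx, %1\n\t"
        "xorl %%eax, %%eax\n\t"
        "cpuid\n\t"
        : "=&r"(lo), "=&r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Alternative for comparison: RDTSC fenced with LFENCE on both sides. */
static inline uint64_t tsc_lfence(void)
{
    uint32_t lo, hi;
    __asm__ volatile(
        "lfence\n\t"
        "rdtsc\n\t"
        "lfence\n\t"
        : "=a"(lo), "=d"(hi)
        :
        : "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical helper: time fn() reps times and keep the smallest
   cycle count, which filters out interrupts and other one-off noise. */
static inline uint64_t min_cycles(void (*fn)(void), int reps)
{
    assert(reps > 0);
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < reps; i++) {
        uint64_t t0 = tsc_begin();
        fn();
        uint64_t t1 = tsc_end();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

Calling `min_cycles` on an empty function gives a rough estimate of the measurement overhead itself, which can then be subtracted from real measurements. The CPUID pair will likely cost noticeably more than the fence-based variants, so it's worth checking that the reduced variance is worth the extra overhead for our use case.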
