Assignment 2; Memory micro benchmark

Table of Contents

Assignments

The assignment contains two different exercises. You only have to complete one of the exercises.

  • Exercise 1 Measure and plot cache and memory access latencies for a specific computer using a micro benchmark.
  • Exercise 2 Theoretically compute and plot cache and memory access curves given cache size, block size etc.

Before you start; read the instructions and choose your exercise (1 or 2). You may work by yourself or in groups of two. Before you start, send me, ( Håkan Zeffer ), an e-mail with the name of the exercise (1 or 2) that you plan to do. If you plan to do exercise 1, also include the name of the microprocessor you plan use.

Presenting your work and what to hand in

Assignment 2 is examined with a short report and an oral exam. You shall write a short report that includes and explaines the diagram you have generated. If you have done exercise 1, present your diagram and all information, such as cache size, that you have managed to get from your diagram. If you do exercise 2, present some example diagrams, and the formulae you use (plus a simple explanation of it).

For both exercises, include some discussion about how you performed the experiments, clever ideas, problems and solutions, things you can explain and things you can not explain in the diagram, etc.

When you are done with your exercise, try to find another group that has done the other exercise, e.g. if you did exercise 1 then find a group that did exercise 2, and try to find parameters for the theoretical micro benchmark so that the diagrams match. Discuss the results in your report.

The oral exam will be a short 15 minutes discussion with me ( Håkan Zeffer ). A booking list is now available on my door (house 1, floor 2, room 221).

The micro benchmark

The micro benchmark you will construct is a program that measures memory access latency when it strides (with different stride sizes) through an array of varying size. The output is a graph that contains the access times for given strides and array sizes. The stride describes how many elements there are between each access in the array, e.g., an array of size 512B with stride 128B is accessed a total number of 4 times. The result of a microbenchmark can look like:

strideexample.jpg

A memory access latency micro benchmark can be used to study many computer architecture parameters. For example:

  • L1 cache size
  • L1 cache associativity
  • L1 cache line size
  • L1 cache access time
  • L2 cache size
  • L2 cache associativity
  • L2 cache line size
  • L2 cache access time
  • TLB entries
  • TLB associativity
  • TLB lookup time
  • Page size
  • Memory access time

Exercise 1; Micro benchmark program

The exercise is to write a program that measures the time it takes to access memory for different array sizes. A well commented program skeleton can be extracted from the /it/kurs/dark2/ht04/lab2/stridelab.tar.gz archive:

host$ tar xvfz /it/kurs/dark2/ht04/lab2/stridelab.tar.gz

stridelab/
stridelab/stride.c
stridelab/Makefile
stridelab/strideplot.m
host$

The file stridelab/Makefile can be used to compile the stridelab/stride.c program. The file stridelab/strideplot.m can be used to plot the output from the micro benchmark, using matlab. The skeleton program can be compiled with the makefile. However, it does not generate any memory system latencies.

Your work is to complete the function static float micro_benchmark(volatile char *array, int stride, int size, float *dummy). It is supposed to return the average access time from the volatile array. array is a pointer to the array in which you shall access elements. stride is the number of bytes of the stride. size is the size of the array in bytes and dummy is a help variable (described later). The conduct_experimen function calls the micro_benchmark function for each combination of stride and array sizes.

The stride micro benchmark takes the following command line arguments: ./stride [min stride size] [max stride size] [min array size] [max array size]. You can choose strides between 1 and 8388608 (8MB) and array sizes between 4096 (4kB) and 16777216 (16MB). All values have to be powers of two. The stride and array sizes should be chosen large enough for some of the values to miss in both the L1 and the L2 caches. Typical values for a test are ./stride 4 4194304 16384 8388608, that is, the stride sizes will be 4, 8, ..., 4194304 bytes and the array sizes will be 16384, 32768, ..., 8388608 bytes.

Some hints:

  • Use the highrestimer function when computing access times in nanoseconds. The file stride.c contains two versions of the function. One for Solaris targets and one for Linux targets. Use the cpp(1) program to choose between them.
  • Repeat the time measurements a large number of times (probably > 1000) and compute the average access time to get better timing results.
  • An optimizing compiler may remove loads if the resulting value is not used. It might be good to write the micro_benchmark in assembler or to actually use the value loaded from the array. If you choose to use the value from the load, it can be good to "return" it using the dummy pointer. Other alternatives include doing both a load and a store to each address.
  • Measure the latency overhead the loops introduce and subtract it from the load loop times.
  • It might be a good idea to repeat the experiment and take the minimum time for each combination of stride length and array size.

Exercise 2; Theoretical micro benchmark program

In this exercise the time measurements have been replaced with a theoretical modeling of the access times, based on the parameters given above (L1 access time etc.). You should write a program that takes these parameters as input values and then computes the access times for varying stride and array sizes. A well commented skeleton program can be extracted from the /stud/kurs/dark2/ht03/lab2/stridesim.tar.gz archive:

host$ tar xvfz /it/kurs/dark2/ht04/lab2/stridesim.tar.gz

stridesim/
stridesim/stridesim.c
stridesim/Makefile
stridesim/strideplot.m
stridesim/Architecture.stats
host$

Input parameters can be found in stridesim/Architecture.stats. The program can be compiled with the makefile but it does not generate any timing results.

Your work is to write the function static float micro_benchmark_theoretic(int stride, int size, memhier_t *memhier) in the stridesim.c file. memhier is a pointer to a memhier_t struct that contains all the parameters that are read from the input file. stride is the number of bytes of the stride, size is the size of the array in bytes. The function is supposed to return the computed access time for a given array size and stride. The conduct_experiment function calls the micro_benchmark function for each combination of stride and array sizes.

The command line arguments are: ./stridesim [min stride size] [max stride size] [min array size] [max array size]. You can choose stride between 1 and 8388608 (8MB) and array sizes between 4096 (4kB) and 16777216 (16MB). The values have to be powers of two. The strides and array sizes should be chosen large enough for some of the values to miss in both the L1 and the L2 cache. Typical values for a computation are: ./stridesim 4 4194304 16384 8388608, that is, the stride sizes will be 4, 8, ..., 4194304 bytes and the array sizes will be 16384, 32768, ..., 8388608 bytes.

Some hints:

  • Compute cache miss (or hit) rates for each cache level and the TLB in separate.
  • Divide the computation of cache hit/miss rates in a number of different cases.
  • Accesses that miss in the L1 cache might hit in the L2 cache. Accesses that miss in the L2 cache always hit in memory.
  • The time for TLB fill handling has to be added to the total memory access time. A TLB hit costs nothing but a miss adds large penalties.

When you are done with your program, make sure to test it with different access times etc. Do this by changing the input parameters in the Architecture.stats file.

Generating diagrams

There is a matlab subroutine (strideplot) located in both the archives found in /stud/kurs/dark2/ht03/lab2. First, generate your output data:

host$ ./stride 4 4194304 16384 8388608 > mydata.dat

Second, start matlab:

host$ matlab &

Finally, plot the data. This is done with a command to matlab (in the matlab prompt, the window to the right):

matlab>> strideplot('mydata.dat')

The diagram will be drawn in a separate window. The picture can be exported as postscript or jpeg under the file menu. You may of course use any other tool to produce the diagram.