. ====== Ch5.5 Reducing Miss Rate (也是五種) ======
※ 3種Miss Type(失誤類型):
1. Compulsory—The very first access to a block cannot be in the cache, so the block must be brought into the cache. These are also called cold start misses or first reference misses.
2. Capacity—If the cache cannot contain all the blocks needed during execution of a program, capacity misses (in addition to compulsory misses) will occur because of blocks being discarded and later retrieved.
3. Conflict—If the block placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block may be discarded and later retrieved if too many blocks map to its set. These misses are also called collision misses or interference misses. The idea is that hits in a fully associative cache which become misses in an N-way set associative cache are due to more than N requests on some popular sets.
※ <方法1.> Larger Block Size : 較大區塊利用空間區域性來減低失誤率(減低Compulsory Miss),但增加penalty,與Conflict Miss.
※ <方法2.> Larger Cache : 減低Capacity Miss, 但是會有較長的Hit time,及較高成本.
※ <方法3.> Higher Associativity : 2:1 cache rule of thumb : a direct-mapped cache of size N has about the same miss rate as a 2-way set associative cache of size N/2. This held for cache sizes less than 128 KB.
※ <方法4.> Way Prediction & Pseudo-Associative Caches : 只檢查快取記憶體中的一部份來看是否命中,若失誤,再檢查其他部分, (減低Conflict Miss, )
In way-prediction, extra bits are kept in the cache to predict the set of the next cache access. This prediction means the multiplexor is set early to select the desired set, and only a single tag comparison is performed that clock cycle. A miss results in checking the other sets for matches in subsequent clock cycles. pseudo-associative or column associative. Accesses proceed just as in the direct-mapped cache for a hit. On a miss, however, before going to the next lower level of the memory hierarchy, a second cache entry is checked to see if it matches there. A simple way is to invert the most significant bit of the index field to find the other block in the “pseudo set.” ※ <方法5.> Compiler Optimization : 1. Loop Interchange :
/* Before */ for (j = 0; j < 100; j = j+1)
for (i = 0; i < 5000; i = i+1)
x[i][j] = 2 * x[i][j];
/* After */ for (i = 0; i < 5000; i = i+1)
for (j = 0; j < 100; j = j+1)
x[i][j] = 2 * x[i][j];
2. Blocking :
/* Before */ for (i = 0; i < N; i = i+1)
for (j = 0; j < N; j = j+1)
{ r = 0;
for (k = 0; k < N; k = k + 1)
r = r + y[i][k]*z[k][j];
x[i][j] = r;
};
/* After */ for (jj = 0; jj < N; jj = jj+B)
for (kk = 0; kk < N; kk = kk+B)
for (i = 0; i < N; i = i+1)
for (j = jj; j < min(jj+B,N); j = j+1)
{ r = 0;
for (k = kk; k < min(kk+B,N); k = k + 1)
r = r + y[i][k]*z[k][j];
x[i][j] = x[i][j] + r;
};
====== Ch5.6 Reducing Cache miss penalty or Miss rate via parallelism (三種) ======
※方法1☆ Nonblocking Caches to Reduce Stalls on Cache Misses (無阻隔式快取記憶體): 又稱 lockup-free cache (無鎖式快取記憶體) 1. Hit under miss optimization : Reduces the effective miss penalty by being helpful during miss instead of ignoring the requests of CPU. 2. Hit under multiple miss (or miss under miss) : It is beneficial only if the memory system can service multiple misses.
※方法2☆ Hardware prefetching of instructions and data :
※方法3☆ compiler-controlled prefetching :
1. an alternative to hardware prefetching is for the compiler to insert preftech instructions.
2. The most effective prefetch is “semantically invisible”to a program : - It doesn’t change the contents of registers and memory, and - It cannot cause virtual memory faults.
沒有留言:
張貼留言