Tuesday, December 18, 2007

[EE_CSIE] Computer Architecture Chapter04 Notes (4)

=== Ch4.5 Hardware Support for Exposing More Parallelism at Compile Time ===

 ※ Compiler techniques such as loop unrolling, software pipelining, and trace scheduling can increase the amount of parallelism available when the behavior of branches is fairly predictable at compile time. When the behavior of branches is not well known, compiler techniques alone may not be able to uncover much ILP.

 ※ Extending the instruction set :
The first technique is to extend the instruction set with conditional (also called predicated) instructions.

  ※ Conditional instructions :
1. An instruction refers to a condition, which is evaluated as part of the instruction execution.
2. If the condition is true, the instruction is executed normally.
3. If the condition is false, execution continues as if the instruction were a no-op.
For example, if (A==0) {S=T;} can be implemented as a single conditional move of T into S that takes effect only when A is zero, with no branch.
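The if (A==0) {S=T;} example above can be sketched in C. This is only an illustration: the function names assign_with_branch, assign_predicated, and check_conditional_forms are hypothetical, and whether a compiler actually emits a conditional-move instruction for the second form depends on the target and optimization settings.

```c
#include <assert.h>

/* Branching form: the assignment S = T is skipped by a branch when A != 0. */
int assign_with_branch(int a, int s, int t) {
    if (a == 0) {
        s = t;
    }
    return s;
}

/* Branch-free form, modeling a conditional move: the condition is evaluated
   as part of the operation, and when it is false the old value of S is kept,
   so the operation behaves like a no-op. */
int assign_predicated(int a, int s, int t) {
    int cond = (a == 0);
    return cond ? t : s;
}

/* Self-check: both forms agree when the condition is true and when it is false. */
void check_conditional_forms(void) {
    assert(assign_with_branch(0, 1, 2) == assign_predicated(0, 1, 2));
    assert(assign_with_branch(5, 1, 2) == assign_predicated(5, 1, 2));
}
```

The point of the second form is that it removes a control dependence and replaces it with a data dependence on the condition.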

  ※ Compiler Speculation with Hardware Support : 
Ambitious speculation requires three capabilities:
1. the ability of the compiler to find instructions that, with the possible use of register renaming, can be speculatively moved and not affect the program data flow, 
2. the ability to ignore exceptions in speculated instructions, until we know that such exceptions should really occur, and 
3. the ability to speculatively interchange loads and stores, or stores and stores, which may have address conflicts. 

  ※ Hardware Support for Preserving Exception Behavior :
Four methods have been investigated for supporting more ambitious speculation without introducing erroneous exception behavior:
1. The H/W and OS cooperatively ignore exceptions for speculative instructions. This approach preserves exception behavior for correct programs, but not for incorrect ones. It may be viewed as unacceptable for some programs, but it has been used, under program control, as a “fast mode” in several processors.
2. Speculative instructions that never raise exceptions are used, and checks are introduced to determine when an exception should occur.
3. A set of status bits, called poison bits, is attached to the result registers written by speculated instructions when the instructions cause exceptions. The poison bits cause a fault when a normal instruction attempts to use the register.
4. A mechanism is provided to indicate that an instruction is speculative, and the H/W buffers the instruction result until it is certain that the instruction is no longer speculative.
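The poison-bit method (item 3) can be modeled in software to make the idea concrete. The sketch below is a toy model, not real hardware behavior: the Reg type and the functions speculative_load and normal_use are hypothetical names, and an invalid address is modeled simply as a NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a register with an attached poison bit. */
typedef struct {
    int  value;
    bool poison;
} Reg;

/* Speculative load: never raises an exception. If the access would fault
   (modeled here as a NULL address), the result register is poisoned instead. */
Reg speculative_load(const int *addr) {
    Reg r = {0, false};
    if (addr == NULL) {
        r.poison = true;   /* the fault is deferred, not raised */
    } else {
        r.value = *addr;
    }
    return r;
}

/* Normal (non-speculative) use of a register. Returns false to signal that a
   fault must be raised because the source register is poisoned; otherwise
   copies the value to *out and returns true. */
bool normal_use(Reg src, int *out) {
    assert(out != NULL);
    if (src.poison) {
        return false;      /* the deferred exception surfaces here */
    }
    *out = src.value;
    return true;
}
```

If the speculated load turns out not to be needed, its poisoned result is never used by a normal instruction and no fault ever appears.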

 ---------------------------------------------------------- 
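Example1
Here is an unusual loop. First, list the dependences and then rewrite the loop so that it is parallel.
   for (i = 1; i < 100; i = i + 1) {
     a[i] = b[i] + c[i]; //S1
     b[i] = a[i] + d[i]; //S2
     a[i+1] = a[i] + e[i]; //S3
   }
Solution :
1. S1 to S2 and S1 to S3, on a[i]: true dependence.
2. S1 to S2, on b[i]: antidependence.
3. S3 to S1: loop-carried output dependence.
4. S3 to S2: loop-carried true dependence.
5. S3 to S3: loop-carried true dependence.
Items 3 to 5 all come from S3 writing a[i+1]. Since S1 overwrites a[i+1] in the next iteration before S2 or S3 reads it, the value written by S3 survives only after the final iteration (a[100]). S3 can therefore be removed from the loop body and its last instance placed after the loop.
 Rewritten as: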
   for (i = 1; i < 100; i = i + 1) {
     a[i] = b[i] + c[i]; //S1
     b[i] = a[i] + d[i]; //S2
   }
   a[100] = a[99] + e[99]; 
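A quick way to gain confidence in this rewrite is to run both versions on the same data and compare the results. The sketch below assumes the original loop's third statement is a[i+1] = a[i] + e[i] (inferred from the final assignment a[100] = a[99] + e[99]); the function names and the input values are arbitrary choices for illustration.

```c
#include <assert.h>
#include <string.h>

#define N 101  /* indices 1..100 are used */

/* Original loop, with S3 as assumed above. */
void run_original(int a[N], int b[N], const int c[N], const int d[N], const int e[N]) {
    for (int i = 1; i < 100; i = i + 1) {
        a[i]     = b[i] + c[i];   /* S1 */
        b[i]     = a[i] + d[i];   /* S2 */
        a[i + 1] = a[i] + e[i];   /* S3 */
    }
}

/* Rewritten loop: iterations no longer depend on each other. */
void run_parallel(int a[N], int b[N], const int c[N], const int d[N], const int e[N]) {
    for (int i = 1; i < 100; i = i + 1) {
        a[i] = b[i] + c[i];       /* S1 */
        b[i] = a[i] + d[i];       /* S2 */
    }
    a[100] = a[99] + e[99];       /* last instance of S3 */
}

/* Runs both versions on identical inputs and checks that a[] and b[] match. */
void check_versions_agree(void) {
    int a1[N], b1[N], a2[N], b2[N], c[N], d[N], e[N];
    for (int i = 0; i < N; i++) {
        a1[i] = a2[i] = 3 * i + 1;
        b1[i] = b2[i] = 7 * i - 2;
        c[i] = i * i;
        d[i] = 5 - i;
        e[i] = 2 * i + 9;
    }
    run_original(a1, b1, c, d, e);
    run_parallel(a2, b2, c, d, e);
    assert(memcmp(a1, a2, sizeof a1) == 0);
    assert(memcmp(b1, b2, sizeof b1) == 0);
}
```

Agreement on one data set is not a proof, but it catches mistakes such as forgetting the final a[100] assignment.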

 ---------------------------------------------------------- 
EXAMPLE2: Here is a simple code fragment:
  for (i=2;i<=100;i+=2)
    a[i] = a[50*i+1]; 
 To use the GCD test, this loop must first be “normalized”, that is, written so that the index starts at 1 and increments by 1 on every iteration. Write a normalized version of the loop (change the indices as needed), then use the GCD test to see if there is a dependence.
Solution : Normalized loop:
   for (i = 1; i <= 50; i = i + 1)
     a[2*i] = a[100*i+1];
GCD test: writing the store index as a×i+b and the load index as c×i+d gives a=2, b=0, c=100, d=1, so gcd(2,100)=2 and d-b=1. A dependence can exist only if gcd(c,a) divides d-b. Since 2 does not divide 1, the GCD test shows there is no dependence. This matches what the loop actually does: it loads a[101], a[201], …, a[5001] and stores to a[2], a[4], …, a[100], so no element is both read and written.
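The GCD test itself is short enough to write out. In the sketch below the function names gcd and gcd_test_may_depend are hypothetical; the coefficients follow the convention of a store index a×i+b and a load index c×i+d. For the normalized loop (a=2, b=0, c=100, d=1) the function reports that no dependence is possible, because gcd(100,2)=2 does not divide d-b=1.

```c
#include <assert.h>
#include <stdlib.h>

/* Euclid's algorithm on non-negative integers. */
static int gcd(int x, int y) {
    while (y != 0) {
        int r = x % y;
        x = y;
        y = r;
    }
    return x;
}

/* GCD test for a store to X[a*i + b] and a load from X[c*i + d].
   Returns 1 if a loop-carried dependence MAY exist (gcd(c, a) divides d - b)
   and 0 if the test rules it out. Assumes a and c are not both zero. */
int gcd_test_may_depend(int a, int b, int c, int d) {
    int g = gcd(abs(c), abs(a));
    assert(g != 0);
    return (d - b) % g == 0;
}
```

Note that the test is conservative in one direction only: a result of 1 means a dependence is possible, not that it necessarily occurs within the loop bounds.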
