2007年12月20日 星期四

[EE_CSIE] Computer Architecture Chapter05 Notes (7)

.
====== Ch5.8 Main Memory and Organizations for Improving Performance (three techniques) ======


※ 1. Wider Main Memory : if the width is doubled, a memory access takes only half the original number of accesses.

※ 2. Simple Interleaved Memory

※ 3. Independent Memory Banks
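
To make the interleaving idea concrete, here is a minimal C sketch of how word interleaving assigns addresses to banks (the bank count and word size are illustrative assumptions, not values from the text):

#include <stdio.h>

/* Hypothetical parameters : 4 banks, 8-byte words. */
#define NUM_BANKS 4
#define WORD_SIZE 8

/* In a word-interleaved memory, consecutive words live in consecutive
   banks, so sequential accesses can proceed in parallel. */
unsigned bank_of(unsigned long addr)     { return (addr / WORD_SIZE) % NUM_BANKS; }
unsigned row_in_bank(unsigned long addr) { return (addr / WORD_SIZE) / NUM_BANKS; }

int main(void) {
    for (unsigned long a = 0; a < 64; a += WORD_SIZE)
        printf("addr %2lu -> bank %u, row %u\n", a, bank_of(a), row_in_bank(a));
    return 0;
}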


  ====== Ch5.10 Virtual Memory ====== 

Virtual Memory :

1. A means of sharing a smaller amount of physical memory among many processes. 

2. It divides physical memory into blocks and allocates them to different processes. 


※ There are further differences between caches and virtual memory beyond the quantitative ones :

1. Replacement on cache misses is primarily controlled by hardware, while virtual memory replacement is primarily controlled by the operating system. The longer miss penalty means it’s more important to make a good decision, so the operating system can be involved and take the time needed to decide what to replace.

2. The size of the processor address determines the size of virtual memory, but the cache size is independent of the processor address size. 

3. In addition to acting as the lower-level backing store for main memory in the hierarchy, secondary storage is also used for the file system. In fact, the file system occupies most of secondary storage. It is not normally in the address space.


※ Translation lookaside buffer (TLB) :

- also called a translation buffer (TB)

- It is a special address-translation cache.

- A TLB entry is like a cache entry, where the tag holds portions of the virtual address and the data portion holds a physical page frame number, a protection field, a valid bit, and usually a use bit and a dirty bit.
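
A minimal sketch of such an entry as a C struct (the field widths and the linear-search lookup are illustrative assumptions, not the actual hardware):

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t tag;        /* portion of the virtual page number */
    uint64_t pfn;        /* physical page frame number */
    uint8_t  protection; /* read/write/execute permission bits */
    bool     valid;      /* entry holds a live translation */
    bool     use;        /* set on access; approximates LRU */
    bool     dirty;      /* the page has been written */
} tlb_entry;

#define TLB_ENTRIES 64

/* Fully associative lookup; real hardware compares all tags in parallel. */
int tlb_lookup(const tlb_entry tlb[], uint64_t vpn_tag, uint64_t *pfn_out) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].tag == vpn_tag) {
            *pfn_out = tlb[i].pfn;
            return 1;            /* TLB hit */
        }
    return 0;                    /* TLB miss : walk the page table */
}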


Selecting a Page Size :

=> The following favor a larger size: 

1. The size of the page table is inversely proportional to the page size; memory (or other resources used for the memory map) can therefore be saved by making the pages bigger. 

2. As mentioned on page 433 in section 5.7, a larger page size can allow larger caches with fast cache hit times. 

3. Transferring larger pages to or from secondary storage, possibly over a network, is more efficient than transferring smaller pages. 

4. The number of TLB entries is restricted, so a larger page size means that more memory can be mapped efficiently, thereby reducing the number of TLB misses.
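
To make factor 1 concrete, a small worked sketch in C (the 32-bit address space and 4-byte page-table entries are assumptions for illustration):

#include <stdio.h>

/* Page-table size = (virtual address space / page size) × bytes per PTE,
   assuming a flat page table : bigger pages mean a smaller table. */
int main(void) {
    unsigned long long va_space = 1ULL << 32;  /* 32-bit address space */
    unsigned long long pte_bytes = 4;
    unsigned long long page_sizes[] = { 4096, 16384, 65536 };
    for (int i = 0; i < 3; i++) {
        unsigned long long entries = va_space / page_sizes[i];
        printf("%8llu-byte pages : %8llu entries = %6llu KB table\n",
               page_sizes[i], entries, entries * pte_bytes / 1024);
    }
    return 0;
}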


====== Ch5.11 Protection and Examples of Virtual Memory ====== 

※ Protecting Processes :
The simplest protection mechanism is a pair of registers that checks every address to be sure that it falls between the two limits, traditionally called base and bound. An address is valid if

  Base <= Address <= Bound

In some systems, the address is considered an unsigned number that is always added to the base, so the limit test is just

  (Base + Address) <= Bound
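
A minimal sketch of the two limit tests in C (the parameter names are illustrative):

#include <stdbool.h>
#include <stdint.h>

/* Direct form : the address itself must lie in [Base, Bound]. */
bool valid_direct(uint64_t addr, uint64_t base, uint64_t bound) {
    return base <= addr && addr <= bound;
}

/* Relocated form : the unsigned address is always added to Base,
   so the test only has to cap the sum. */
bool valid_relocated(uint64_t addr, uint64_t base, uint64_t bound) {
    return base + addr <= bound;  /* overflow ignored for illustration */
}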

※ The computer designer has 3 more responsibilities in helping the OS designer protect processes from each other:

1. Provide at least two modes, indicating whether the running process is a user process or an operating system process. This latter process is sometimes called a kernel process, a supervisor process, or an executive process. 

2. Provide a portion of the CPU state that a user process can use but not write. This state includes the base/bound registers, a user/supervisor mode bit(s), and the exception enable/disable bit. Users are prevented from writing this state because the operating system cannot control user processes if users can change the address range checks, give themselves supervisor privileges, or disable exceptions. 

3. Provide mechanisms whereby the CPU can go from user mode to supervisor mode and vice versa. The first direction is typically accomplished by a system call, implemented as a special instruction that transfers control to a dedicated location in supervisor code space. The PC is saved from the point of the system call, and the CPU is placed in supervisor mode. The return to user mode is like a subroutine return that restores the previous user/supervisor mode.


End.

[EE_CSIE] Computer Architecture Chapter05 Notes (6)

.

====== Ch5.7 Reducing Hit Time (four techniques) ======


※ (Technique 1) Small and Simple Caches :
1. A time-consuming portion of a cache hit is using the index portion of the address to read the tag memory and then compare it to the address.
 - Smaller hardware is faster.
 - Keep the cache simple.
2. Main benefit of direct-mapped caches : the designer can overlap the tag check with the transmission of the data.

※ (Technique 2) Avoiding Address Translation During Indexing of the Cache :
1. Using virtual addresses for the cache, since hits are much more common than misses.
2. Why doesn’t everyone build virtually addressed caches ?
- One reason is protection
- another reason is that every time a process is switched the virtual addresses refer to different physical addresses, requiring the cache to be flushed.

※ (Technique 3) Pipelined Cache Access :

※ (Technique 4) Trace Caches :
- Instead of limiting the instructions in a static cache block to spatial locality, a trace cache finds a dynamic sequence of instructions including taken branches to load into a cache block.
- The name comes from the cache blocks containing dynamic traces of the executed instructions as determined by the CPU, rather than containing static sequences of instructions as determined by memory.

Summary : + indicates an improvement, – indicates a hurtful effect.


Figure 5.26 ... from
Computer Architecture : A Quantitative Approach


[EE_CSIE] Computer Architecture Chapter05 Notes (5)

.
====== Ch5.5 Reducing Miss Rate (also five techniques) ======

※ The 3 miss types :

1. Compulsory—The very first access to a block cannot be in the cache, so the block must be brought into the cache. These are also called cold start misses or first reference misses. 

2. Capacity—If the cache cannot contain all the blocks needed during execution of a program, capacity misses (in addition to compulsory misses) will occur because of blocks being discarded and later retrieved. 

3. Conflict—If the block placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block may be discarded and later retrieved if too many blocks map to its set. These misses are also called collision misses or interference misses. The idea is that hits in a fully associative cache which become misses in an N-way set associative cache are due to more than N requests on some popular sets. 


※ <Technique 1.> Larger Block Size : larger blocks exploit spatial locality to lower the miss rate (reducing compulsory misses), but they increase the miss penalty and the number of conflict misses.

※ <Technique 2.> Larger Caches : reduce capacity misses, but bring a longer hit time and higher cost.

※ <Technique 3.> Higher Associativity : the 2:1 cache rule of thumb : a direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2. This held for cache sizes less than 128 KB.

※ <Technique 4.> Way Prediction & Pseudo-Associative Caches : check only part of the cache for a hit; on a miss, check the remaining part. (reduces conflict misses)

In way prediction, extra bits are kept in the cache to predict the set of the next cache access. This prediction means the multiplexor is set early to select the desired set, and only a single tag comparison is performed that clock cycle. A miss results in checking the other sets for matches in subsequent clock cycles.

A related approach is the pseudo-associative or column-associative cache. Accesses proceed just as in a direct-mapped cache for a hit. On a miss, however, before going to the next lower level of the memory hierarchy, a second cache entry is checked to see if it matches there. A simple way is to invert the most significant bit of the index field to find the other block in the “pseudo set.”

※ <Technique 5.> Compiler Optimizations :

1. Loop Interchange :

/* Before */  for (j = 0; j < 100; j = j+1)

   for (i = 0; i < 5000; i = i+1)

    x[i][j] = 2 * x[i][j];

/* After */  for (i = 0; i < 5000; i = i+1)

   for (j = 0; j < 100; j = j+1)

    x[i][j] = 2 * x[i][j];

2. Blocking :

/* Before */  for (i = 0; i < N; i = i+1)

   for (j = 0; j < N; j = j+1)

     { r = 0;

      for (k = 0; k < N; k = k + 1)

       r = r + y[i][k]*z[k][j];

       x[i][j] = r;

    };

/* After */  for (jj = 0; jj < N; jj = jj+B)

   for (kk = 0; kk < N; kk = kk+B)

    for (i = 0; i < N; i = i+1) 

    for (j = jj; j < min(jj+B,N); j = j+1)

      { r = 0;

        for (k = kk; k < min(kk+B,N); k = k + 1)

         r = r + y[i][k]*z[k][j];

        x[i][j] = x[i][j] + r;

      };


====== Ch5.6 Reducing Cache Miss Penalty or Miss Rate via Parallelism (three techniques) ======

※ Technique 1 ☆ Nonblocking Caches to Reduce Stalls on Cache Misses : also called lockup-free caches.
1. Hit under miss optimization : reduces the effective miss penalty by being helpful during a miss instead of ignoring the requests of the CPU.
2. Hit under multiple miss (or miss under miss) : beneficial only if the memory system can service multiple misses.

※ Technique 2 ☆ Hardware prefetching of instructions and data :

※ Technique 3 ☆ Compiler-controlled prefetching :

1. An alternative to hardware prefetching is for the compiler to insert prefetch instructions.

2. The most effective prefetch is “semantically invisible” to a program :
 - It doesn’t change the contents of registers and memory, and
 - It cannot cause virtual memory faults.
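
A minimal sketch of what the inserted prefetches look like, written by hand with the GCC/Clang __builtin_prefetch intrinsic (the array, stride, and prefetch distance are illustrative assumptions):

/* Sums an array while prefetching 16 iterations ahead. The prefetch is
   only a hint and cannot fault, matching the "semantically invisible"
   requirement above. */
double sum_with_prefetch(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&x[i + 16], 0, 1); /* read, low temporal locality */
        s += x[i];
    }
    return s;
}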


2007年12月19日 星期三

[EE_CSIE] Computer Architecture Chapter05 Notes (4)

.
====== Ch5.4 Reducing Cache Miss Penalty (five techniques) ======

※ [Technique 1] Multi-Level Caches :

Average memory access time = Hit time (L1) + Miss rate (L1) × Miss penalty (L1), and
Miss penalty (L1) = Hit time (L2) + Miss rate (L2) × Miss penalty (L2)

Therefore,

Average memory access time = Hit time (L1) + Miss rate (L1) × ( Hit time (L2) + Miss rate (L2) × Miss penalty (L2) )

1. Local miss rate : This rate is simply the number of misses in a cache divided by the total number of memory accesses to this cache. As you would expect, for the first-level cache it is equal to Miss rate (L1), and for the second-level cache it is Miss rate (L2).

2. Global miss rate : The number of misses in the cache divided by the total number of memory accesses generated by the CPU. Using the terms above, the global miss rate for the first-level cache is still just Miss rate (L1), but for the second-level cache it is Miss rate (L1) × Miss rate (L2).

Average memory stalls per instruction = Misses per instruction (L1) × Hit time (L2) + Misses per instruction (L2) × Miss penalty (L2).

EXAMPLE : Suppose that in 1000 memory references there are 40 misses in the first-level cache and 20 misses in the second-level cache. What are the various miss rates? Assume the miss penalty from the L2 cache to memory is 100 clock cycles, the hit time of the L2 cache is 10 clock cycles, the hit time of L1 is 1 clock cycle, and there are 1.5 memory references per instruction. What is the average memory access time and the average stall cycles per instruction? Ignore the impact of writes.

ANSWER :
1st-level local and global miss rate = 40 / 1000 = 4%
2nd-level local miss rate = 20 / 40 = 50%
2nd-level global miss rate = 20 / 1000 = 2%

=> Average memory access time = 1 + 4% × ( 10 + 50% × 100 ) = 3.4 clock cycles.
(Without an L2 cache => Average memory access time = 1 + 4% × 100 = 5 clock cycles.)

With 1.5 memory references per instruction, 1000 memory references correspond to 667 instructions, so per 1000 instructions :
Misses per 1000 instructions (L1) = 40 × 1.5 = 60, Misses per 1000 instructions (L2) = 20 × 1.5 = 30.

Average memory stalls per instruction
= Misses per instruction (L1) × Hit time (L2) + Misses per instruction (L2) × Miss penalty (L2)
= (60/1000) × 10 + (30/1000) × 100
= 0.060 × 10 + 0.030 × 100 = 3.6 clock cycles
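
A minimal C sketch that reproduces these numbers from the formulas above:

#include <stdio.h>

int main(void) {
    double hit_l1 = 1.0, hit_l2 = 10.0, penalty_l2 = 100.0;
    double mr_l1 = 40.0 / 1000.0;      /* local = global for L1 */
    double mr_l2_local = 20.0 / 40.0;  /* L2 misses / accesses reaching L2 */
    double amat = hit_l1 + mr_l1 * (hit_l2 + mr_l2_local * penalty_l2);

    double refs_per_inst = 1.5;
    double stalls = (40.0 * refs_per_inst / 1000.0) * hit_l2
                  + (20.0 * refs_per_inst / 1000.0) * penalty_l2;

    printf("AMAT = %.1f cycles, stalls/instruction = %.1f cycles\n",
           amat, stalls);  /* prints 3.4 and 3.6 */
    return 0;
}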

An alternative calculation : ( Average memory access time – L1 hit time ) × memory references per instruction = (3.4 – 1.0) × 1.5 = 3.6 clock cycles.

※ [Technique 2] Critical Word First & Early Restart : (effective only for caches with large blocks)
1. Critical word first : Request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block. Critical-word-first fetch is also called wrapped fetch and requested word first.
2. Early restart : Fetch the words in normal order, but as soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution.

※ [Technique 3] Giving Priority to Read Misses over Writes : This optimization serves reads before writes have been completed.

※ [Technique 4] Merging Write Buffer : (merges multiple writes to sequential words into a single block-wide entry) If the buffer contains other modified blocks, the addresses can be checked to see if the address of the new data matches the address of a valid write buffer entry. If so, the new data are combined with that entry; this is called write merging.

※ [Technique 5] Victim Caches : One approach to lowering the miss penalty is to remember what was discarded in case it is needed again. Since the discarded data has already been fetched, it can be used again at small cost.

[EE_CSIE] Computer Architecture Chapter05 Notes (3)

.
====== Ch5.3 Cache Performance ======

Average Memory Access Time (AMAT) = Hit time + Miss Rate × Miss Penalty


EXAMPLE : Which of the following has the lower miss rate?
(Assume the caches are write-through with a write buffer.)
1. A 16KB instruction cache plus a 16KB data cache, or
2. a 32KB unified cache.
Assume 36% of the instructions are data transfers, a hit takes 1 clock cycle (hit time = 1),
and the miss penalty = 100 clock cycles.
Because the unified cache cannot serve two requests at once, a load or store takes 1 extra clock cycle there.
ANSWER :
First convert misses per 1000 instructions into miss rates.

Misses / Instruction = ( Miss Rate × Memory Accesses ) / Instruction

Miss Rate = [ ( Misses / 1000 Instructions ) / 1000 ] / ( Memory Accesses / Instruction )

Since every instruction requires exactly one memory access to fetch the instruction :
=> Miss Rate 16KB instruction = [ 3.82 / 1000 ] / 1.00 = 0.004 (74% of accesses)
=> Miss Rate 16KB data = [ 40.9 / 1000 ] / 0.36 = 0.114 (26% of accesses)
=> Overall miss rate for the split caches = 74% × 0.004 + 26% × 0.114 = 0.0324


The unified cache must count both instruction and data accesses :
=> Miss Rate 32KB unified

  = [ 43.3 / 1000 ] / ( 1.00 + 0.36 ) = 0.0318 (slightly lower by comparison)

Average memory access time (AMAT)
= % instructions × (Hit time + Instruction miss rate × Miss penalty) +% data × (Hit time + Data miss rate × Miss penalty)



=> AMAT(split) = 74% × ( 1 + 0.004 × 100 ) + 26% × ( 1 + 0.114 × 100 ) = 4.24

=> AMAT(unified) = 74% × ( 1 + 0.0318 × 100 ) + 26% × ( 1 + 1 + 0.0318 × 100 ) = 4.44

Thus, the split caches provide two memory accesses per clock cycle,
avoiding the structural hazard.
Although their miss rate is higher, the AMAT is still lower than that of the unified cache, which has only a single access port.


※ Comparing performance with and without a cache :
EXAMPLE :
An in-order execution computer
(such as the UltraSPARC III). Miss penalty = 100 clock cycles,
all instructions normally take 1.0 clock cycles.
Assume the average miss rate is 2%, there is an average of 1.5 memory references per instruction, and the average number of cache misses per 1000 instructions is 30.
What is the impact on performance when the behavior of the cache is included?
Calculate the impact using both misses per instruction and miss rate.
ANSWER :
CPU time
= IC × [ CPI execution + (Memory stall clock cycles / Instruction) ] × Clock cycle time

1. Performance including cache misses ---
CPU time = IC × (1.0 + 30/1000 × 100) × Clock cycle time
     = IC × 4.0 × Clock cycle time

2. Calculating with the miss rate ---
CPU time = IC×[CPI execution + Miss Rate×(Memory accesses/Instruction)×Miss penalty]×Clock cycle time

=> CPU time
 = IC×[1.0 + 2%×1.5×100]×Clock cycle time
  = IC × 4.0 × Clock cycle time

The clock cycle time and (IC) instruction count are the same, with or without a cache.
Thus, CPU time increases fourfold, with CPI from 1.00 for a “perfect cache” to 4.00 with a cache that can miss.
Without any memory hierarchy at all the CPI would increase again to 1.0 + 100 × 1.5 or 151— a factor of almost 40 times longer than a system with a cache!


※ Comparing the performance impact of different cache organizations (direct mapped vs. 2-way set associative) :
EXAMPLE
Assume that the CPI=2.0 with a perfect cache, the clock cycle time is 1.0 ns, there are 1.5 memory references per instruction, both caches are 64 KB, and the block size is 64 bytes. One cache is direct mapped and the other is two-way set associative. For set-associative caches we must add a multiplexor to select between the blocks in the set depending on the tag match. Since the speed of the CPU is tied directly to the speed of a cache hit, assume the CPU clock cycle time must be stretched 1.25 times to accommodate the selection multiplexor of the set-associative cache. Cache miss penalty = 75 ns for either cache organization. First, calculate the average memory access time, and then CPU performance. Assume the hit time = 1 clock cycle, the miss rate = 1.4% for the direct-mapped 64-KB cache, and the miss rate = 1.0% for the two-way set-associative cache.
ANSWER :
Average memory access time = Hit time + Miss rate × Miss penalty

AMAT(1-way) = 1.0 + (0.014×75) = 2.05 ns
AMAT(2-way) = 1.0×1.25 + (0.01×75) = 2.00 ns (better AMAT)

CPU time = IC×[CPI execution + Miss Rate×(Memory accesses/Instruction)×Miss penalty]×Clock cycle time

CPU time(1-way) = IC×[2 + 0.014×1.5×75]×1.0 = 3.58 × IC (better CPU time)
CPU time(2-way) = IC×[2×1.25 + 0.01×1.5×75]×1.0 = 3.63 × IC
=> Relative performance = CPU time(2-way) / CPU time(1-way) = 3.63 / 3.58 = 1.01

※ Out-of-Order Execution Processor:
( Memory stall cycles / Instruction )
= ( Misses / Instruction ) * (Total miss latency – Overlapped miss latency )

[EE_CSIE] Computer Architecture Chapter05 Notes (2)

.
Q1. Where can a block be placed in a cache?
1. If each block has only one place it can appear in the cache, the cache is said to be direct mapped.
This mapping is usually : (Block address) MOD (Number of blocks in cache)
2. If a block can be placed anywhere in the cache, the cache is said to be fully associative.
3. If a block can be placed in a restricted set of places in the cache, the cache is set associative.
A set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is usually chosen by bit selection;
that is, (Block address) MOD (Number of sets in cache)

If there are n blocks in a set, the cache placement is called n-way set associative.
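
A minimal C sketch of the bit-selection mapping (the cache geometry values are illustrative assumptions):

#include <stdio.h>

#define BLOCK_SIZE 64  /* bytes */
#define NUM_SETS   128

/* Bit selection : set = (block address) MOD (number of sets).
   With one block per set this is direct mapping; with a single set
   holding all blocks it is fully associative. */
unsigned set_index(unsigned long addr) {
    unsigned long block_address = addr / BLOCK_SIZE;
    return block_address % NUM_SETS;
}

int main(void) {
    printf("addr 0x12345 -> set %u\n", set_index(0x12345UL));
    return 0;
}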




※ Q2: How is a block found if it is in the cache?
1. The block offset selects the desired data from the block,
2. the index field selects the set,
3. the tag field is compared to determine a hit.


If the cache size is held constant, increasing associativity increases the number of blocks per set => the index shrinks and the tag grows.
An index of 0 bits corresponds to fully associative.

※ Q3: Which block should be replaced on a cache miss?
1. Random
2. LRU(least-recently used)
3. FIFO

※ Q4: What happens on a write?
1. Write through : The information is written to both the block in the cache and to the block in the lower-level memory.
2. Write back : The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.

To reduce the frequency of writing back blocks on replacement, a feature called the dirty bit is commonly used. This status bit indicates whether the block is dirty (modified while in the cache) or clean (not modified). If it is clean, the block is not written back on a miss, since identical information to the cache is found in lower levels.
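
A minimal C sketch of the replacement decision that the dirty bit enables (the structure and helper functions are illustrative, not a real memory system):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     dirty;     /* set when the block is written while cached */
    uint8_t  data[64];
} cache_block;

/* Stubs standing in for the lower level of the hierarchy. */
static void write_to_memory(const cache_block *b) { (void)b; }
static void fetch_from_memory(cache_block *b, uint64_t new_tag) {
    b->tag = new_tag;
    b->valid = true;
}

/* A clean block is simply overwritten; only a dirty block must be
   written back to the lower level first. */
void replace_block(cache_block *b, uint64_t new_tag) {
    if (b->valid && b->dirty)
        write_to_memory(b);         /* write back the modified data */
    fetch_from_memory(b, new_tag);  /* bring in the new block */
    b->dirty = false;               /* the new block starts clean */
}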


※ When the CPU must wait for writes to complete during write through, the CPU is said to write stall. A common optimization to reduce write stalls is a write buffer, which allows the processor to continue as soon as the data is written to the buffer, thereby overlapping processor execution with memory updating. As we shall see shortly, write stalls can occur even with write buffers.

Since the data are not needed on a write, there are two options on a write miss:
1. Write allocate : The block is allocated on a write miss, followed by the write hit actions above. In this natural option, write misses act like read misses.
2. No-write allocate : In this apparently unusual alternative, write misses do not affect the cache. Instead, the block is modified only in the lower-level memory.


※ Alpha 21264 Data Cache :
Cache Size = 64K bytes = 2^16 bytes.
Block Size = 64 bytes = 2^6 bytes. (block offset = 6)
2-way associativity. Write-back and Write-allocate.

2^index = Cache Size / (Block Size × Set associativity)

2^index = 65536 / (64 × 2) = 512 => index field = 9 bits
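
A minimal C sketch of how a 44-bit physical address splits into these tag/index/offset fields (the sample address is arbitrary):

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 6   /* 64-byte blocks */
#define INDEX_BITS  9   /* 512 sets */

int main(void) {
    uint64_t paddr  = 0x12345678ABCULL & ((1ULL << 44) - 1);
    uint64_t offset = paddr & ((1ULL << OFFSET_BITS) - 1);
    uint64_t index  = (paddr >> OFFSET_BITS) & ((1ULL << INDEX_BITS) - 1);
    uint64_t tag    = paddr >> (OFFSET_BITS + INDEX_BITS);  /* 29 bits */
    printf("tag=0x%llx index=%llu offset=%llu\n",
           (unsigned long long)tag, (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}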

Step 1 : The 21264 CPU sends a 48-bit virtual address to the cache for the tag check, and at the same time translates the virtual address into a 44-bit physical address. => Tag field = 44 - 9 - 6 = 29 bits

Step 2 : Index selection; the two tags in the selected set are read and compared at the same time, and the one that matches is selected.

Step 3 : After the two tags are read from the cache, they are compared with the tag portion of the block address sent by the CPU. To be sure the tag contains valid information, the valid bit must be set; otherwise the result of the comparison is ignored.

Step 4 : Assuming one tag does match, the CPU is signaled to load the proper data from the 2:1 multiplexor using the matching input.
The 21264 uses write back, keeping one dirty bit per block to record whether the block has ever been written. If the block to be replaced (the victim) has been modified, its data and address are sent to the victim buffer.
The CPU knows whether it is sending an instruction address or a data address, so each type of address can use a separate port. This doubles the bandwidth between the memory hierarchy and the CPU. Split caches also let the designer optimize each cache separately (capacity, block size, and associativity can all differ). The terms unified cache or mixed cache are applied to caches that can contain either instructions or data.

[EE_CSIE] Computer Architecture Chapter05 Notes (1)

.
====== Ch5 Memory Hierarchy Design ======
※ The principle of locality : most programs do not access all code or data uniformly


====== Ch5.2 Review of the ABCs of Caches ======

Cache
Instruction Cache / Data Cache
Unified Cache
Memory Stall Cycles / Misses per instruction / Miss Rate / Miss Penalty
Set / Direct mapping / N-way set Associative / Fully Associative
Block / Block Address / Tag field / Index field / Block offset
Valid bit
Random replacement
LRU
Write through / Write back
Dirty bit
Virtual Memory
Write stall / Write buffer / Write allocate / No-write allocate
Page
Page fault
Average Memory Access Time (AMAT)
Cache Hit / Cache Miss / Hit time
Locality (temporal / spatial)
Access trace

Formula 1 =>
CPU execution time = ( CPU clock cycles + Memory stall cycles ) × Clock cycle time

Memory stall cycles : the cycles during which the CPU stalls waiting for memory accesses.

Formula 2 =>
Memory stall cycles = Number of misses × Miss Penalty
= IC × ( Misses / Instruction ) × Miss Penalty

= IC × ( Memory Accesses / Instruction ) × Miss Rate × Miss Penalty

EXAMPLE :
 CPI = 1.0 if all accesses hit in the cache; 50% of the instructions are data accesses (loads and stores);
 miss penalty = 25, miss rate = 2%.
 How much faster would the machine be if everything hit in the cache?
ANSWER :
With guaranteed hits : CPU execution time
 = ( CPU clock cycles + Memory stall cycles ) × Clock cycle time
 = (IC × CPI + 0 ) × Clock cycle time = IC × 1.0 × Clock cycle time

In reality : Memory stall cycles
 = IC × ( Memory Accesses / Instruction ) × Miss Rate × Miss Penalty
 = IC × ( 1 + 0.5 ) × 0.02 × 25 = IC × 0.75
=> CPU execution time
  = (IC × CPI + IC × 0.75) × Clock cycle time
  = IC × 1.75 × Clock cycle time
  So the machine with no misses is 1.75 times faster.

Formula 3 =>
Misses / Instruction
= ( Miss Rate × Memory Access ) / Instruction
= Miss Rate × ( Memory Access / Instruction )

Applying this to the previous example :
Misses / Instruction = 0.02 × ( 1.5 / Instruction ) = 0.03
Memory stall cycles = Number of misses × Miss Penalty
 = IC × ( Miss / Instruction ) × Miss Penalty
 = IC × 0.03 × 25
 = IC × 0.75

2007年12月18日 星期二

[EE_CSIE] Computer Architecture Chapter04 Notes (4)

=== Ch4.5 Hardware Support for Exposing more parallelism at compile time === 

 ※ such as loop unrolling, software pipelining, and trace scheduling can be used to increase the amount of parallelism available when the behavior of branches is fairly predictable at compile time. When the behavior of branches is not well known, compiler techniques alone may not be able to uncover much ILP. 

※ Extending the instruction set :
The first approach is an extension of the instruction set to include conditional or predicated instructions.

※ Conditional instructions :
1. An instruction refers to a condition, which is evaluated as part of the instruction execution.
2. If the condition is true, the instruction is executed normally.
3. If the condition is false, the execution continues as if the instruction were a no-op. Example : if (A==0) {S=T;}

  ※ Compiler Speculation with Hardware Support : 
To speculate ambitiously requires 3 capabilities :
1. the ability of the compiler to find instructions that, with the possible use of register renaming, can be speculatively moved and not affect the program data flow, 
2. the ability to ignore exceptions in speculated instructions, until we know that such exceptions should really occur, and 
3. the ability to speculatively interchange loads and stores, or stores and stores, which may have address conflicts. 

  ※ Hardware Support for Preserving Exception Behavior : There are 4 methods that have been investigated for supporting more ambitious speculation without introducing erroneous exception behavior: 
1. The H/W and OS cooperatively ignore exceptions for speculative instructions.  - this approach preserves exception behavior for correct programs, but not for incorrect ones.  This approach may be viewed as unacceptable for some programs, but it has been used, under program control, as a “fast mode” in several processors. 
 2. Speculative instructions that never raise exceptions are used, and checks are introduced to determine when an exception should occur. 
 3. A set of status bits, called poison bits, are attached to the result registers written by speculated instructions when the instructions cause exceptions. The poison bits cause a fault when a normal instruction attempts to use the register. 
 4. A mechanism is provided to indicate that an instruction is speculative and the H/W buffers the instruction result until it is certain that the instruction is no longer speculative. 

 ---------------------------------------------------------- 
Example 1 :
Here is an unusual loop. First, list the dependences and then rewrite the loop so that it is parallel.

  for (i=1; i<100; i=i+1) {
    a[i] = b[i] + c[i];    /* S1 */
    b[i] = a[i] + d[i];    /* S2 */
    a[i+1] = a[i] + e[i];  /* S3 */
  }

Solution :
1. S2 to S1 and S3 to S1 through a[] -> true dependences.
2. S1 to S2 through b[i] -> antidependence.
3. S3 to S1 : loop-carried output dependence.
4. S3 to S2 : loop-carried true dependence.
5. S3 to S3 : loop-carried true dependence.
Rewritten as :
   for (i = 1; i < 100; i = i + 1) {
     a[i] = b[i] + c[i]; //S1
     b[i] = a[i] + d[i]; //S2
   }
   a[100] = a[99] + e[99];

 ---------------------------------------------------------- 
EXAMPLE 2 : Here is a simple code fragment:
  for (i=2; i<=100; i+=2)
    a[i] = a[50*i+1];
To use the GCD test, this loop must first be “normalized”

—written so that the index starts at 1 and increments by 1 on every iteration. Write a normalized version of the loop (change the indices as needed), then use the GCD test to see if there is a dependence.

Solution : normalized version
=>  for (i=1; i<=50; i=i+1)
      a[2*i] = a[100*i+1];
GCD test with a = 2, b = 0, c = 100, d = 1 :
GCD(2,100) = 2 and d – b = 1. Since 2 does not divide 1, the GCD test reports that no dependence is possible.
(Indeed, the loop loads a[101], a[201], …, a[5001] while storing to a[2], a[4], …, a[100], so there is no dependence.)

2007年12月17日 星期一

[EE_CSIE] Computer Architecture Chapter04 Notes (3)

=== Ch4.4 Advanced Compiler Support for Exposing and Exploiting ILP ===

※ The analysis of loop-level parallelism focuses on determining whether data accesses in later iterations are dependent on data values produced in earlier iterations, such a dependence is called a loop-carried dependence.

Example code :

 for (i=1; i<=100; i=i+1) {
  A[i+1] = A[i] + C[i];  /* S1 */
  B[i+1] = B[i] + A[i+1]; /* S2 */
 }

--
A[2]
= A[1] + C[1]
B[2] = B[1] + A[2]
--
A[3] = A[2] + C[2]
B[3] = B[2] + A[3]
--
... ... ...
--
A[101] = A[100] + C[100]
B[101] = B[100] + A[101]
--
1. S1 uses the value computed by S1 in the previous iteration, and S2 likewise uses the previous result of S2 => loop-carried dependence.
2. Within one iteration, S2 depends on S1 (not loop-carried); executing them in order is sufficient.


※ A loop-carried dependence does not necessarily prevent parallelism :
Example code :

 for (i=1; i<=100; i=i+1) {
  A[i] = A[i] + B[i];   /* S1 */
  B[i+1] = C[i] + D[i];  /* S2 */
 }

--
A[1] = A[1] + B[1]

B[2] = C[1] + D[1]
--
A[2] = A[2] + B[2]
B[3] = C[2] + D[2]
--
... ... ...
--
A[100] = A[100] + B[100]
B[101] = C[100] + D[100]
--
S1 depends on S2 from the previous iteration : a loop-carried dependence exists between them.

Keys to the transformation :
1. There is no dependence from S1 to S2, so interchanging the two statements does not affect the execution of S2.
2. On the first iteration of the loop, S1 depends on the value of B[1] computed before the loop begins.

Rewritten as :
 A[1] = A[1] + B[1]
 for (i=1; i<=100; i=i+1) {
  A[i] = A[i] + B[i];   /* S1 */
  B[i+1] = C[i] + D[i];  /* S2 */
 }
 B[101] = C[100] + D[100]


※ A recurrence is when a variable is defined based on the value of that variable in an earlier iteration, often the one immediately preceding, as in the above fragment.

Detecting a recurrence can be important for two reasons:
Some architectures (especially vector computers) have special support for executing recurrences, and some recurrences can be the source of a reasonable amount of parallelism.

Dependence distance :
 for (i=6;i<=100;i=i+1) {
  Y[i] = Y[i-5] + Y[i];
 }

On iteration i, the loop reads array element i-5, so the dependence distance = 5.
The larger the dependence distance, the more potential parallelism can be obtained by unrolling the loop.

※ Finding the dependences is important in 3 tasks :
1. Good scheduling of code.
2. Determining which loops might contain parallelism.
3. Eliminating name dependences.

※ How does the compiler detect dependences?
Nearly all dependence analysis algorithms work on the assumption that array indices are affine : a one-dimensional array index is affine if it can be written in the form a × i + b, where a and b are constants, and i is the loop index variable. An index like x[y[i]] is nonaffine.


※ A dependence exists if two conditions hold: (the GCD test)
1. There are two iteration indices, j and k, both within the limits of the for loop.
That is, m ≤ j ≤ n and m ≤ k ≤ n.
2. The loop stores into an array element indexed by a × j + b and later fetches from that same array element when it is indexed by c × k + d. That is, a × j + b = c × k + d.

Example : Use the GCD test to determine whether dependences exist in the following loop:
  for (i=1; i<=100; i=i+1) {
   X[2*i+3] = X[2*i] * 5.0;
  }

Solution : Given the values a = 2, b = 3, c = 2, and d = 0,
  then GCD(a,c) = 2, and d – b = –3.
  Since 2 does not divide –3, no dependence is possible.

=> The GCD test can guarantee that no dependence exists. However, the test may succeed
 even though no dependence actually exists (because it does not take the loop bounds into account).
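
A minimal C sketch of the GCD test, applied to the example's coefficients (the helper gcd function is the usual Euclid's algorithm):

#include <stdio.h>

int gcd(int x, int y) { return y == 0 ? x : gcd(y, x % y); }

/* A dependence between the store a[a*j+b] and the load a[c*k+d]
   is possible only if gcd(a,c) divides (d - b). */
int may_depend(int a, int b, int c, int d) {
    return (d - b) % gcd(a, c) == 0;
}

int main(void) {
    /* Coefficients from the example : X[2*i+3] = X[2*i] * 5.0 */
    printf("dependence possible? %s\n",
           may_depend(2, 3, 2, 0) ? "yes" : "no");  /* prints "no" */
    return 0;
}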


※ Situations in which array-oriented dependence analysis cannot tell us what we want to know :
1. When objects are referenced via pointers.
2. When array indexing is indirect through another array.
3. When a dependence may exist for some values of the inputs, but does not exist in actuality when the code is run, since the inputs never take on those values.

4. When an optimization depends on knowing more than just the possibility of a dependence, and needs to know on which write of a variable a read of that variable depends.


※ The basic approach used in points-to analysis relies on information from :
1. Type information, which restricts what a pointer can point to.
2. Information derived when an object is allocated or when the address of an object is taken, which can be used to restrict what a pointer can point to. (e.g., if p points to X and q can never point to X, then p and q cannot point to the same object)
3. Information derived from pointer assignments. (if the value of q is assigned to p, then p points to whatever q points to)


※ Eliminating Dependent Computations :
1. Copy propagation : eliminates operations that copy values.
 DADDUI R1, R2, #4
 DADDUI R1, R1, #4
 becomes => DADDUI R1, R2, #8

2. Tree height reduction :

   ADD R1,R2,R3
   ADD R4,R1,R6
   ADD R8,R4,R7
    transformed : 3 cycles => 2 cycles
   ADD R1,R2,R3
   ADD R4,R6,R7
   ADD R8,R1,R4

3. Recurrences :
  sum = sum + x;  unrolled five times gives
  sum = sum + x1 + x2 + x3 + x4 + x5 ;  (5 cycles : a chain of dependent adds)
  => sum = ( (sum + x1) + (x2 + x3) ) + (x4 + x5) ;  (3 cycles)

※ Software Pipelining : Symbolic Loop Unrolling :
Software pipelining reorganizes loops such that each iteration in the software-pipelined code is made from instructions chosen from different iterations of the original loop.
See Fig 4.6.


※ Global Code Scheduling :
1. Effective scheduling of a loop body with internal control flow requires moving instructions across branches.
2. It aims to compact a code fragment with internal control structure into the shortest possible sequence (the critical path) that preserves the data and control dependences.
3. It can reduce the effect of control dependences arising from conditional nonloop branches by moving code.
4. Effectively using global code motion requires estimates of the relative frequency of different paths.


Trace Scheduling : focusing on the critical path
1. Useful for processors with a large number of issues per clock.
2. A way to organize the global code motion process, so as to simplify code scheduling by incurring the costs of possible code motion on the less frequent paths. (used when different paths differ markedly in execution frequency)


Two steps of Trace Scheduling :
1. Trace selection : tries to find a likely sequence of basic blocks whose operations will be put together into a smaller number of instructions.
2. Trace compaction : tries to squeeze the trace into a small number of wide instructions. (This is really just code scheduling.)

The advantage of the trace scheduling approach is that it simplifies the decisions concerning global code motion.


Superblocks : address the complexity that arises in trace scheduling when the trace can be entered or exited in the middle
1. They are formed by a process similar to that used for traces,
2. but they are a form of extended basic block, restricted to a single entry point while allowing multiple exits.


How can a superblock with only one entrance be constructed? The answer is to use tail duplication to create a separate block that corresponds to the portion of the trace after the entry.


Compared with ordinary trace generation, superblocks reduce the bookkeeping and scheduling complexity, but the code can grow larger than with the trace-based approach. So, like trace scheduling, superblocks are most appropriate when other techniques fail.

[EE_CSIE] Computer Architecture Chapter04 Notes (2)

=== Ch4.2 Static Branch Prediction === 

※ Delayed branches can reduce the control hazard. Example code :
    LD     R1,0(R2)
    DSUBU  R1,R1,R3
    BEQZ   R1,L
    OR     R4,R5,R6
    DADDU  R10,R4,R3
 L:  DADDU  R7,R8,R9

=> DSUBU and BEQZ depend on LD
=> stall will be needed after LD. 

1. If the branch is almost always taken :
=> R7 is not needed on the fall-through path
=> could increase the speed by moving DADDU R7,R8,R9 to the position after LD.

2. If the branch is rarely taken :
=> R4 is not needed on the taken path
=> could increase the speed by moving OR to the position after LD.

3. Profile-based prediction :
predict branches using a profile collected from earlier runs.

=== Ch4.3 Static Multiple Issue : VLIW ===
※ Static : a statically scheduled superscalar requires compiler assistance.

※ Dynamic : a dynamically scheduled superscalar requires less compiler assistance, but has hardware costs.

VLIW : Very Long Instruction Word, packing many operations into one instruction (64~128 bits or more). VLIWs use multiple, independent functional units. Rather than attempting to issue multiple, independent instructions to the units, a VLIW packages the multiple operations into one very long instruction, or requires that the instructions in the issue packet satisfy the same constraints.

Basic VLIW approach ---
1. Local scheduling techniques :
 a.) Loop unrolling generates straight-line code.
 b.) Operate on a single basic block.

2. Global scheduling techniques : (trace scheduling was developed specifically as a global scheduling technique for VLIWs)
 a.) Schedule code across branches.
 b.) More complex in structure.
 c.) Must deal with significantly more complicated trade-offs in optimization.

※ For the original VLIW model, there are both technical and logistical problems.
=> The technical problems are the increase in code size and the limitations of lock-step operation. 

Two different elements combine to increase code size substantially for a VLIW. 
1. Generating enough operations in a straight-line code fragment requires ambitiously unrolling loops (as in the earlier examples), thereby increasing code size.
2. Whenever instructions are not full, the unused functional units translate to wasted bits in the instruction encoding.
=> Ways to reduce the increase in code size :
1. Clever encodings (for example, letting several functional units share one large immediate field)
2. Compress the instructions in main memory

※ Early VLIWs operated in lock step; there was no hazard detection hardware at all. Because all the functional units must stay synchronized, a stall in any pipeline stalls the entire processor.

Logistical problem : binary code compatibility
1. In a strict VLIW approach, the code sequence makes use of both the instruction set definition and the detailed pipeline structure. 
2. Different numbers of functional units and unit latencies require different versions of code. 
3. One possible solution is object-code translation or emulation. 
4. Another approach is to temper the strictness of the approach, so that binary compatibility is still feasible. 

Multiple-issue processors have two potential advantages that vector processors lack :
1. the potential to extract some amount of parallelism from less regularly structured code, and
2. the use of a more conventional, and typically less expensive, cache-based memory system.

[EE_CSIE] Computer Architecture Chapter04 Notes (1)

=== Ch4 Exploiting Instruction Level Parallelism with S/W Approach ===

=== Ch4.1 Basic compiler techniques for exposing ILP ===
IA-64 : Intel Architecture-64, Intel's first 64-bit CPU micro architecture, is based on EPIC.

EPIC : Explicitly Parallel Instruction Computing


FIGURE 4.1 Latencies of FP operations used in this chapter.
This figure is the key to the whole of Chapter 4 : it gives the latencies between the different types of instructions.

First, what are pipeline scheduling and loop unrolling?

For example :
for (i=1000; i>0; i=i-1) {
 X[i] = X[i] + s;
}

1. MIPS code =>
Loop: L.D     F0,0(R1)
   ADD.D   F4,F0,F2
   S.D     F4,0(R1)
   DADDUI  R1,R1,#-8
   BNE    R1,R2,Loop

2. Without any scheduling (10 cycles) =>
Loop: L.D F0,0(R1)
    stall
   ADD.D   F4,F0,F2
    stall
    stall
   S.D     F4,0(R1)
   DADDUI  R1,R1,#-8
    stall
   BNE    R1,R2,Loop
    stall

3. After scheduling (6 cycles) =>
Loop: L.D     F0,0(R1)
   DADDUI  R1,R1,#-8
   ADD.D   F4,F0,F2
    stall
   BNE    R1,R2,Loop
   S.D     F4,8(R1)

4. Loop unrolled four times =>
Loop: L.D   F0,0(R1)
   ADD.D  F4,F0,F2
   S.D   F4,0(R1)
   L.D    F6,-8(R1)
   ADD.D   F8,F6,F2
   S.D    F8,-8(R1)
   L.D    F10,-16(R1)
   ADD.D  F12,F10,F2
   S.D    F12,-16(R1)
   L.D     F14,-24(R1)
   ADD.D   F16,F14,F2
   S.D     F16,-24(R1)
   DADDUI R1,R1,#-32
   BNE    R1,R2,Loop

5. Unrolled loop, then scheduled (14 clock cycles, or 14/4 = 3.5 per iteration) =>
Loop: L.D  F0,0(R1)
   L.D  F6,-8(R1)
   L.D  F10,-16(R1)
   L.D  F14,-24(R1)
   ADD.D  F4,F0,F2
   ADD.D  F8,F6,F2
   ADD.D  F12,F10,F2
   ADD.D  F16,F14,F2
   S.D  F4,0(R1)
   S.D  F8,-8(R1)
   DADDUI  R1,R1,#-32
   S.D    F12,16(R1)
   BNE    R1,R2,Loop
   S.D    F16,8(R1)


2007年12月16日 星期日

[MQueue] WMQ (Ex6) MQ Client

.
. IBM WebSphere MQ v6.0
. Ex 6 - IBM WebSphere MQ Client Implementation.
. 讓我們來練習一下 MQ Client Implementation
.

=== Exercise 6 : WebSphereMQ Client Implementation ===
What we will do :
A. Configure a Server for client connection.
B. Configure a client.
C. Test a client to server environment.
D. Use Auto-Definition of a CHANNEL.
E. Setup and perform Remote Administration.


=== Sample programs for MQ client ===
1. # amqsputc QName [QMgrName]
(This program is invoked the same way as amqsput and has the same parameter structure. But it connects to a WebSphereMQ client instead of a WebSphereMQ Server.)

2. # amqsbcgc QName [QMgrName]
(This program is invoked the same way as amqsbcg and has the same parameter structure but it connects to a WebSphereMQ client)

3. # amqsgetc QName [QMgrName]
(This program is invoked the same way as amqsget and has the same parameter structure but it connects to a WebSphereMQ client)


======================================================
[A. Server Queue Manager setup]
1. Create Queue Manager named QMC06 , and QMC07R :
 # crtmqm QMC06   # crtmqm QMC07R
 # strmqm QMC06   # strmqm QMC07R
 # runmqsc QMC06   # runmqsc QMC07R

 (on QMC06)
 : DEF QL(DLQ) REPLACE
 : ALTER QMGR DEADQ(DLQ)

 : DEF QL(XQMC07R) REPLACE USAGE(XMITQ)
 : DEF CHL(QMC06.TO.QMC07R) CHLTYPE(SDR) REPLACE +
  TRPTYPE(TCP) CONNAME('Host2(9007)') XMITQ(XQMC07R)


 (on QMC07R)
 : DEF QL(DLQ) REPLACE
 : ALTER QMGR DEADQ(DLQ)

 : DEF CHL(QMC06.TO.QMC07R) CHLTYPE(RCVR) REPLACE +
  TRPTYPE(TCP)
 : DEF QL(QL.A) REPLACE

 (on QMC06)
 : DEF QR(QRMT07R) REPLACE +
  RNAME(QL.A) RQMNAME(QMC07R) XMITQ(XQMC07R)


 (on QMC07R)
 # runmqlsr -m QMC07R -t TCP -p 9007

 (on QMC06, with the CHANNEL started)
 # runmqchl -C QMC06.TO.QMC07R -m QMC06
 # amqsput QRMT07R QMC06  (test that the channel is working)

2. Define a SVRCONN CHANNEL on QMC07R to make it connectable by clients :
 a. Use QMC07R_CLNT as the CHANNEL name.
 b. Protocol is TCP.

 # runmqsc QMC07R
 : DEFINE CHL(QMC07R_CLNT) CHLTYPE(SVRCONN) REPLACE TRPTYPE(TCP)

3. Be sure that an appropriate Listener function is active for the Server QM.
 # runmqlsr -m QMC07R -t TCP -p 9007

[B. Client setup (Method 1)]
4. Use the MQSERVER environment variable to provide a client-connection CHANNEL Definition to be able to connect to the Queue Manager.

 (UNIX / Linux Systems)
 # export MQSERVER=QMC07R_CLNT/TCP/QMC07R(9007)

 (Windows Systems)
 # SET MQSERVER=QMC07R_CLNT/TCP/QMC07R(9007)

[C. Test the Client connection (Setup Method 1)]
5. Use amqsputc to put messages on the Local Queue QL.A on the Server :
 # amqsputc QL.A QMC07R

6. Use amqsbcgc to browse the message on the Server Queue.
 (The value of Reply-to-QMgr in the MQMD will show.)
 # amqsbcgc QL.A QMC07R

7. Use amqsgetc to get the messages from the Server Queue.
 # amqsgetc QL.A QMC07R

[D. Server Queue Manager setup using Auto-Definition of CHANNELs]
8. Enable CHANNEL Auto-Definition on the Queue Manager,
 so all teams are able to connect to the Queue Manager :
 : ALTER QMGR CHAD(ENABLED)

[E. Client Setup (Method 2)]
9. Use QMC07R to build a client CHANNEL definition table to enable a WebSphereMQ
 client to connect to each Queue Manager which has enabled CHANNEL Auto-Definition :
 a. Create 2 client connection CHANNEL entries to connect to QMC07R.

 (On QMC07R)
 : DEF CHL(CLNT_A) CHLTYPE(CLNTCONN) REPLACE +
  TRPTYPE(TCP) CONNAME('QMC07R(9007)') QMNAME(QMC07R)

 : DEF CHL(CLNT_B) CHLTYPE(CLNTCONN) REPLACE +
  TRPTYPE(TCP) CONNAME('QMC07R(9007)') QMNAME(QMC07R)


[F. Test the Client connection (Setup Method 2)] (需安裝MQ Client並設定)
10. On the client system ensure the following environment variables point to the just-
 created client CHANNEL definition table. Be sure to unset MQSERVER.
 a. MQCHLLIB=
 b. MQCHLTAB=

 [UNIX Systems]
 Default location on the creating Queue Manager :
  export MQCHLLIB=/var/mqm/qmgrs//@ipcc
  export MQCHLTAB=amqclchl.tab
  export MQSERVER=

 [Windows Systems]
  SET MQCHLLIB=..\mqm\qmgrs\\@ipcc
  SET MQCHLTAB=amqclchl.tab
  SET MQSERVER=

11. Use amqsputc again to put a message to QL.A on QMC07R and ensure the
 operation is completed successfully. Verify that the new server connect CHANNEL
 CLNT_A is now defined on QMC07R.
 # amqsputc QL.A QMC07R

12. Stop CHANNEL CLNT_A . Then use amqsputc to put a message to QL.A . Verify
 that the new server connect CHANNEL CLNT_B is now defined on QMC07R, and
 that the message successfully arrived on the Queue.
 # runmqsc QMC07R
 : stop CHANNEL(CLNT_A)

 # amqsputc QL.A QMC07R


======================================================
======================================================
[G. Setup and perform Remote Administration.]
1. This requires that the managing Queue Manager be the default Queue Manager :

2. # runmqsc -w 15

3. : DISPLAY QMGR

[MQueue] IBM MQ Security

.
. IBM WebSphere MQ v6.0
. Chapter 6.3 - IBM WebSphere Security
.

=== WebSphere MQ Security Implementations ===
1. Object Authority Manager (OAM) facility.
2. CHANNEL Security using Secure-Sockets-Layer (SSL).

(※ MQ provides only authorization, not authentication.)


=== WebSphere MQ Access Control Overview ===
1. WebSphereMQ access control at user and/or group level :
 - UNIX use groups only (Username must exist, everyone is in nobody.)
 - Windows uses userids and/or groups.
 - System-level userids only are supported
  (No support for DCE principals, TXSeries userids, and so forth.)
2. First-level name only is controlled :
 - Alias Queues, Remote Queues.
 - Resolved name is not significant.


=== Object Authority Manager Installable Service ===
1. [ WebSphereMQ QMgr <---> Object Authority Manager (OAM) Access Control Lists ]
2. Access Control for WebSphereMQ objects :
 - Queue Manager   - Queues
 - Processes     - Namelists
 - Channels      - Authentication information objects
 - Listeners      - Services
3. OAM can be disabled :
 - Remove entry from mqs.ini or Windows Registry
 - Not recommended
 - Very difficult to re-establish uniform authority checking


=== Object Authority Manager : Access Control Lists ===
1. One authority file per object :
 - plus global permissions files.
2. Each file has one stanza per principal :
 - Principal (User)
 - Authority='bit pattern'
3. Windows OAM bypasses auth files for certain classes of principal
 - SYSTEM, local Administrators group, local mqm group


=== Security Management : setmqaut ===
1. Change the authorizations :
 - Queue Manager   - Queues
 - Processes     - Namelists
 - Channels      - Authentication information objects
 - Listeners      - Services
2. Principal or group level control
3. Granular control of access
 - No generic functions
 - Supports generic profiles

 # setmqaut -m QMgr -t Objtype -n Profile [-p Principale -g Group] permissions
 Example :
 # setmqaut -m QM1 -t queue -n MOON.* -g GP1 +browse +get

4. Note that there are certain principals/groups which are granted automatic access to resources. These are :
 - mqm (user/group)
 - For Windows :
  a. Administrator (user/local group)
  b. SYSTEM (userid)
  c. The user (or principal group) which creates a resource.


=== Security Management : dspmqaut ===
1. Display current authorizations :
 - Queue Manager   - Queues
 - Processes     - Namelists
 - Channels      - Authentication information objects
 - Listeners      - Services
2. Principal or group level control.

 # dspmqaut -m QMgr -t ObjType -n ObjName [-p Principal -g Group ]
 Example :
 # dspmqaut -m QM1 -t q -n QL.Q1 -p mquser
 Entity mquser has the following authorizations for object QL.Q1 :
 get
 browse
 put ...


=== Security Management : dmpmqaut ===
1. Dump current authorizations :
 - QMGR
 - Queues
 - Processes
 - Namelists
 - Authinfo (SSL CHANNEL Security)
 - Channels
 - Listeners
 - Services
2. Principal or group level control.

 # dmpmqaut -m QMgr -t ObjType -n Profile [-p Principal -g Group ]
 Example :
 # dmpmqaut -m QM1 -n a.b.c -t q -p mquser
 The resulting dump would display :
 profile : a.b.*
 object type : queue
 entity : mquser
 type : principal
 authority : get, browse, put, inq


=== Access Control for WebSphereMQ Control Program ===
1. Most WebSphereMQ control programs
 - Such as crtmqm, strmqm, runmqsc, setmqaut, dspmqaut, dmpmqaut
2. Have restricted access :
 - UNIX/Linux restricts users to the mqm group
  a. Configuration as a part of WebSphereMQ installation.
  b. Control imposed by the O.S. not OAM.
 - Windows allows :
  a. mqm group
  b. Administrators group
  c. System userid
 - OpenVMS restricts users to those granted the MQM identifier.
 - Compaq NSK allows :
  a. MQM group
  b. SUPER.SUPER ID


=== Authority Checking in the MQI ===
1. MQI calls with security checking :
 - MQCONN / MQCONNX
 - MQOPEN
 - MQPUT1 (implicit MQOPEN)
 - MQCLOSE  (For Dynamic Queues).
2. WebSphereMQ events as audit records :
 - Events written to SYSTEM.ADMIN.QMGR.EVENT Queue.
 - Documented in Monitoring WebSphereMQ manual.
3. Reason code MQRC_NOT_AUTHORIZED (2035) returned if not authorized.
4. The MQCLOSE is generally not checked because the close options are usually none.
5. If the close options are set to MQCO_DELETE or MQCO_DELETE_PURGE (this is only for permanent
  Dynamic Queues) then, unless the Queue was created using the current handle, there is a check to
 determine if the user is authorized to delete the Queue.


=== Security and Distributed Queuing === ☆
1. Put authority :
 - Option for the receiving end of a message CHANNEL.
  a. Default user identifier is used.
  b. Context user identifier is used.
2. Transmission Queue :
 - Messages destined for a Remote Queue Manager are put on a Transmission Queue by the
  Local Queue manager
  a. An application should not normally need to put messages directly on a Transmission Queue,
   or need authority to do so.
 - Only the special system programs that put messages directly on a Transmission Queue should
  have the authority to do so.

=== Message Context ===
1. Information about source of message :
 - Identity section (user related)
 - origin section (program related)
2. Part of message Descriptor.
3. Can be passed in a related message.
4. Message context information allows the application that retrieves a message to find out about
 the originator of the message. The retrieving application may want to :
 a. Check that the sending application has the correct level of authority.
 b. Keep an audit trail of all the messages it has worked with.
 c. The information is held in two fields : identity context and origin context.


=== The Context Fields ===
An application can request the Queue Manager to set the context fields of a message by using the put message option MQPMO_DEFAULT_CONTEXT on an MQPUT or MQPUT1 call. This is the default action if no context is specified.

( ps : these fields can be seen with # amqsbcg Queue QMgr )

1. Identity context :
 - UserIdentifier (user that originated the message.)
 - AccountingToken
  a. Windows (SID, Security ID in compressed format)
  b. i5/OS (Job accounting code)
  c. UNIX (Numeric user ID in ASCII characters)
 - ApplIdentityData (Blank)
2. Origin context :
 - PutApplType (MQAT_AIX, MQAT_CICS...etc.)
 - PutApplName
 - PutDate ( YYYYMMDD(GMT) )
 - PutTime ( HHMMSSTH(GMT) )
 - ApplOriginData (Blank)


=== No Context ===
1. Requested by a put message option :
 - MQPMO_NO_CONTEXT
 - The Queue Manager clears all the context fields; specifically,
 - PutApplType is set to MQAT_NO_CONTEXT
2. To request "Default Context" or "No Context" requires no more authority than that required to put the message on the Queue.


=== Passing Context ===
A → [Queue1] → B → [Queue2] → C

1. Put messages on Queue2 with same Identity context as message taken from Queue1
2. Open Queue1 as "Save All Context"
3. Put messages with "Pass Identity Context"
4. Or transfer "No Context"


=== Alternate User Authority ===
A → [Queue1] → B → [Queue2] → C

1. Put messages with A's authority :
 - B needs appropriate authority.
 - UserID taken from message Context.
2. How it is requested ? :
 - AlternateUserID field in Object Descriptor.
 - Option on MQOPEN or MQPUT1


=== Setting Context ===
1. Two open options that require authority to use :
 - MQOO_SET_IDENTITY_CONTEXT
 - MQOO_SET_ALL_CONTEXT
2. Two corresponding put message options :
 - MQPMO_SET_IDENTITY_CONTEXT
 - MQPMO_SET_ALL_CONTEXT
3. Normally used by special programs only :
 - Message CHANNEL agents
 - System utilities


=== CHANNEL Exit Programs ===
MQPUT → TransmissionQueue → [Message] → MCA → Send →
MQGET ← DestinationQueue ← [Message(retry)] ← MCA ← Receive ←

1. The uses of CHANNEL Exit programs are :
 - Auto-definition Exit : can be used to modify the CHANNEL definition derived from
          the model SYSTEM.AUTO.RECEIVER
 - Security Exit : primarily used by the MCA at each end of a message CHANNEL
          to authenticate its partner.
 - Send and Receive Exits : can be used for purposes such as data compression
          / decompression and data encryption / decryption.
 - Message Exit : can be used for any purpose which makes sense at the message
          level. The following are some examples :
     a. Application data conversion
     b. Encryption / decryption
     c. Journaling
     d. Additional security checks such as validating an incoming user identifier.
     e. Substitution of one user identifier for another as a message enters a new
      security domain.
     f. Reference message handling.
 - Message-retry Exit : called when an attempt to open a destination Queue, or put a
          message on a destination Queue, has been unsuccessful. The
          exit can be used to determine under what circumstances the
          MCA should continue to retry, how many times it should retry,
          and how frequently.
2. The Auto-Definition Exit is only supported on WebSphereMQ for AIX, HP-UX, iSeries,
 Solaris, and Windows, and MQSeries for Compaq Tru64 UNIX and OS/2 Warp V5.1


=== CHANNEL Exit Programs on MQI CHANNELs ===
                        [Auto-Definition]
          [Security]          [Security]
 MQCONN ←→        Send Receive
 MQOPEN ←→ CLNTCONN ←——————→ SVRCONN
  MQPUT ←→ 

1. No CHANNEL Exit programs can be called on a client system if the MQSERVER
 environment variable is used to define a simple client connection.
2. The Auto-Definition Exit can be used to modify the CHANNEL definition derived
 from the model SYSTEM.AUTO.SVRCONN


=== Secure Sockets Layer ===
1. Protocol to allow transmission of secure data over an insecure network.
2. Combines these techniques :
 - Symmetric / Secret Key encryption
 - Asymmetric / Public Key encryption
 - Digital Signature
 - Digital Certificates
3. Protection :
 - Client / Server
 - Qmgr / QMgr CHANNELs
4. To combat Security Problems :
 - Eavesdropping ← Encryption techniques
 - Tampering ← Digital Signatures
 - Impersonation ← Digital Certificates


=== QMGR Attributes for SSL ===
1. ALTER QMGR command :
 - SSLKEYR  Sets the SSLKeyRepository.
 - SSLCRLNL  Sets the SSLCRLNamelist.
 - SSLCRYP  Sets the SSLCryptoHardware.
 - SSLTASKS  Sets the SSLTasks.
 - SSLEV   Enables or Disables SSL event messages.
 - SSLFIPS  Specifies if only FIPS-certified algorithms can be used.

ps : CRL (Certificate Revocation List)

=== QMGR Authentication Object ===
1. ALTER AUTHINFO
2. DEFINE AUTHINFO
3. DELETE AUTHINFO
4. DISPLAY AUTHINFO


=== Channel Attributes for SSL ===
1. DEFINE or ALTER CHANNEL
 - SSLCIPH (CipherSpec)
 - SSLPEER
 - SSLCAUTH


=== Access Control for a WebSphereMQ Client ===
1. Access control is based on a user ID used by the server connection process :
 - Value of MCAUserIdentifier in MQCD determines this user ID
2. Security Exits at both ends of the MQI CHANNEL :
 - Client Security Exit can flow a user ID and password
 - Server Security Exit can authenticate the user ID and set MCAUserIdentifier
3. No security Exit at the client end of the MQI CHANNEL :
 - Value of logged_in USERID flows to the server system.
 - Server Security Exit can authenticate the user ID and set MCAUserIdentifier
4. No Security Exit at either end of the MQI CHANNEL :
 - MCAUserIdentifier has the value of MCAUSER if it is nonblank.
 - MCAUserIdentifier has the value of flowed user ID otherwise.


=== Remote Queuing and Clients ===
1. CHANNEL Exits :
 - A number of CHANNEL Exits are available in the product and as SupportPacs
 - Several vendors are in this market too.
2. MCAUSER :
 - The default setting is wide open, especially for client attach.
 - May want to set this to restrict who can access your Queue Manager.
3. MQ_USER_ID environment variable :
 - This was removed for Windows NT and UNIX in the V5.1 client environment.
 - The logged-in username is now automatically used.
 - But this is not authenticated at the server ; you may still need Security Exits.

[MQueue] IBM MQ Clients

.
. IBM WebSphere MQ v6.0
. Chapter 6.2 - IBM WebSphere MQ Clients
.

=== WebSphere MQ Client ===
 ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
  Client-System      Server-System
 WMQ-Application      WMQ-Queue-Manager
 Client-Connection      Server-Connection
 Communications-stack→→→Communications-stack
         MQI-CHANNEL
 ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
1. Assured delivery.
2. Queue storage.
3. Data conversion.
4. Administration.
5. Recovery.
6. Syncpoint control.


=== MQI Clients Explained ===
1. The full range of MQI calls and options is available to a WebSphereMQ client
 application, including the following :
 - The use of MQGMO_CONVERT option on the MQGET call. This causes the
  application data of the message to be converted into the numberic and
  character representation in use on the client system. The server Queue
  Manager provides the usual level of support to do this.
 - A client application may be connected to more than one Queue Manager
  simultaneously. Each MQCONN call to different Queue Manager returns a
  different connection handle. This does not apply if the application is not
  running as a WebSphereMQ client.
2. The MQI stub which is linked with an application when running as a client is
  different from that used when the application is not running as client. An
  application will receive the reason code MQRC_Q_MGR_NOT_AVAILABLE
  on an MQCONN call if it is linked with the wrong MQI stub.

 ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
 1. MQCONN ---> (Queue Manager)
  2. MQOPEN ---> (Queue)
   3. MQPUT / MQGET / MQINQ / MQSET

  MQBEGIN
   MQPUT / MQGET
   IF successful -> MQCMIT
   ELSE MQBACK

  4. MQCLOSE ---> (Queue)
 5.MQDISC ---> (Queue Manager)

 ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
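
A minimal C sketch of this call sequence against the MQI (it needs the WebSphereMQ C header and client library to build; the queue manager and queue names are illustrative, and error handling is trimmed to the final reason code):

#include <stdio.h>
#include <string.h>
#include <cmqc.h>   /* WebSphereMQ MQI definitions */

int main(void) {
    MQHCONN hConn;
    MQHOBJ  hObj;
    MQLONG  cc, rc;
    MQOD    od  = {MQOD_DEFAULT};
    MQMD    md  = {MQMD_DEFAULT};
    MQPMO   pmo = {MQPMO_DEFAULT};
    char    qmName[MQ_Q_MGR_NAME_LENGTH] = "QM1";  /* illustrative name */
    char    msg[] = "hello";

    MQCONN(qmName, &hConn, &cc, &rc);                 /* 1. connect */
    strncpy(od.ObjectName, "QL.A", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_OUTPUT, &hObj, &cc, &rc); /* 2. open the queue */
    MQPUT(hConn, hObj, &md, &pmo,
          (MQLONG)strlen(msg), msg, &cc, &rc);        /* 3. put a message */
    MQCLOSE(hConn, &hObj, MQCO_NONE, &cc, &rc);       /* 4. close the queue */
    MQDISC(&hConn, &cc, &rc);                         /* 5. disconnect */

    printf("last reason code : %ld\n", (long)rc);
    return 0;
}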


=== Syncpoint Control on a Base Client ===
1. A WebSphere MQ client application may participate in a Local unit of work
 involving MQSeries resources.
 - Uses the MQCMIT and MQBACK calls for this purpose.
2. A WebSphere MQ client application cannot participate in a Global unit of
 work involving WebSphereMQ resources.


=== Extended Transactional Client ===
1. An Extended Transactional Client can participate in a Global unit of work :
 - Transaction manager runs on client system.
 - Transaction manager provides syncpoint processing.

=== MQ Client Installation ===
(omitted)


=== Defining an MQI CHANNEL === (MQ Client CHANNEL)
1. Use the DEFINE CHANNEL command with parameters :
 - CHLTYPE  CLNTCONN or SVRCONN (SVRCONN is the server end used for clients)
 - TRPTYPE  DECNET, LU62, NETBIOS, SPX or TCP.
 - CONNAME(string)  For a client connection only.
 - QMNAME(string)  For a client connection only.
2. No operational involvement on an MQI CHANNEL :
 - An MQI CHANNEL starts when a client application issues MQCONN
  (or MQCONNX)
 - An MQI CHANNEL stops when a client application issues MQDISC
3. Do not forget to configure and refresh the inet daemon, or to start the
 WebSphereMQ Listener, on the server system.
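
A minimal sketch of defining both ends (the CHANNEL name VENUS.SVR matches the MQSERVER
example below; hostname(1414) is an assumed connection name) :

 # runmqsc QM1
 : DEFINE CHANNEL(VENUS.SVR) CHLTYPE(SVRCONN) TRPTYPE(TCP) REPLACE
 : DEFINE CHANNEL(VENUS.SVR) CHLTYPE(CLNTCONN) TRPTYPE(TCP) +
    CONNAME('hostname(1414)') QMNAME(QM1) REPLACE

The SVRCONN definition is all the server needs for Method_1 below; the CLNTCONN
definition feeds the client CHANNEL definition table used in Method_2.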


=== Two ways of Configuring an MQI CHANNEL ===
1. Method_1 :
 - On the server system, define a server connection.
 - On the client system, set the environment variable.
 - MQSERVER=ChannelName/TransportType/ConnectionName

 (Windows : SET MQSERVER=VENUS.SVR/TCP/hostname(port) )
 (UNIX : export MQSERVER=VENUS.SVR/TCP/hostname(port) )

2. Method_2 :
 - On the server system, define a client connection and a server connection.
 - If not on a file server, copy the client CHANNEL definition table from the server
  system to the client system.
 - On the client system, set the environment variables :
  a. MQCHLLIB=
   Path to the directory containing the client CHANNEL definition table.
  b. MQCHLTAB=
   Name of the file containing the client CHANNEL definition table.

  (Windows : SET MQCHLLIB=C:\MQM
        SET MQCHLTAB=AMQCLCHL.TAB )
  (UNIX : export MQCHLLIB=/mqmtop/qmgrs/QUEUEMANAGERNAME/@ipcc
      export MQCHLTAB=AMQCLCHL.TAB )
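
With either method in place, a quick check is to run the client sample programs from
the client system (amqsputc / amqsgetc are the client builds of the samples; the Queue
name QL.TEST is an assumption — with Method_2 no MQSERVER variable is needed) :

 # amqsputc QL.TEST QM1
 # amqsgetc QL.TEST QM1

A reason code 2059 (MQRC_Q_MGR_NOT_AVAILABLE) here usually means the Listener is not
running or the CONNAME does not resolve.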


=== Auto-Definition of CHANNELs ===
1. Applies only to the end of a CHANNEL with type :
 - Receiver
 - Server connection
2. Function invoked when an incoming request is received to start a CHANNEL
 but there is no CHANNEL definition.
3. CHANNEL definition is created automatically using the model :
 - SYSTEM.AUTO.RECEIVER
 - SYSTEM.AUTO.SVRCONN
4. Partner's values are used for :
 - CHANNEL name.
 - Sequence number wrap value.
5. To enable the automatic definition of CHANNELs, the attribute ChannelAutoDef
 of the Queue Manager object must be set to MQCHAD_ENABLED.
 The corresponding parameter on the ALTER QMGR command is CHAD(ENABLED).
6. CHANNEL auto-definition events can be enabled by setting the attribute
 ChannelAutoDefEvent of the Queue Manager object to MQEVR_ENABLED.
 The corresponding parameter on the ALTER QMGR command is CHADEV(ENABLED).
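
A minimal sketch of turning both on and checking the result (Queue Manager name QM1
assumed) :

 # runmqsc QM1
 : ALTER QMGR CHAD(ENABLED) CHADEV(ENABLED)
 : DIS QMGR CHAD CHADEV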


=== Letting a Queue Manager be Accessed by MQ Explorer ===
(☆☆☆ by AaA ☆☆☆)
1. SYSTEM.ADMIN.SVRCONN (Windows default; on UNIX/Linux it needs to be added manually)
2. # runmqsc QM1
  : DIS CHANNEL(SYSTEM.ADMIN.SVRCONN)
  : ALTER CHANNEL(SYSTEM.ADMIN.SVRCONN) CHLTYPE(SVRCONN) MCAUSER('mqm')
  (MCAUSER is blank by default, which means the flowed UserID/Group is checked.
   Setting it to mqm means every incoming connection is automatically signed on as mqm.)

[MQueue] IBM MQ Family SupportPacs

.
. IBM WebSphere MQ v6.0
. Chapter 6.1 - WebSphereMQ Family SupportPacs
.

=== WebSphereMQ Family SupportPacs ===
http://www.ibm.com/software/integration/support/supportpacs

1. MO01 ( Event and Dead Letter Queue Monitor ) :
 This SupportPac is the MQSeries Event queue monitor, Dead Letter queue monitor and Expired message remover for Windows, Java, OS/2 and AIX.

2. MS03 (Save Queue Manager object definitions using PCFs (saveqmgr) ) :
 This SupportPac (saveqmgr) saves all the objects, such as queues, channels, etc, defined in either a local or a remote queue manager to a file.

2007年12月15日 星期六

[MQueue] WMQ (Ex5) MQ Clusters

.
. Let's practice with Queue Manager Clusters
.

=== Exercise 5 : Queue Manager Clusters ===
What we will do :
A. Create Clusters.
B. Define all required WebSphereMQ objects for Queue Manager Clusters.
C. Test and Configure Clusters.
D. Manage workload in Clusters.

======================================================
[QM1]————————————
 ∣Cluster Transmission Queue∣
 ∣       [ Cluster-Sender CHANNEL ] →→ [QM3]
 ∣Local Application Queues  ∣          ↙
 ∣            ∣    [QM2]  ↙
 ∣Cluster Command Queue ∣   ↙     ↙
 ∣        [ Cluster-Receiver CHANNEL ][QM4]
 ∣Cluster Repository Q   ∣
  —————————————


======================================================
[A. Set up the Cluster connections.]
1. Create a new default Queue Manager QM1 to be used in a Queue Manager Cluster.
 # crtmqm -q QM1   (-q : make it the default QM)
 # crtmqm QM3
 
2. Start the Queue Manager :
 # strmqm QM1
 # strmqm QM3
 
3. Start the Listener function for your Queue Manager QM1 on port 9051
  using the WebSphereMQ Listener.
 # runmqlsr -m QM1 -t tcp -p 9051
 # runmqlsr -m QM3 -t tcp -p 9053
 
4. Define the Cluster connection objects required for your Queue Manager.
  The Objects needed should include the following :
 a. One Local Queue to be used as Dead Letter Queue.
   # runmqsc QM1
   # runmqsc QM3
   : DEF QL(DLQ)
   : ALTER QMGR DEADQ(DLQ)
   : DIS QMGR DEADQ   (verify the Dead Letter Queue)
 b. One Cluster Receiver CHANNEL (CLUSRCVR) pointing to the owning QM.
  (On Every Queue Manager in Cluster)
  ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
  DEF CHL(TO.CLUS_A9.QM#) CHLTYPE(CLUSRCVR) REPLACE +
   TRPTYPE(TCP) CONNAME('Hostname(905#)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)

  ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
  ( # = your QM in the Cluster; here we first define the receiving end on QM1)
  DEF CHL(TO.CLUS_A9.QM1) CHLTYPE(CLUSRCVR) REPLACE +
   TRPTYPE(TCP) CONNAME('localhost(9051)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)
   ( ps : in fact, as soon as this runs, CLUS_A9 shows up in MQ Explorer)

  ( # = your QM in the Cluster; the other receiving end is defined on QM3)
  DEF CHL(TO.CLUS_A9.QM3) CHLTYPE(CLUSRCVR) REPLACE +
   TRPTYPE(TCP) CONNAME('localhost(9053)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)

 c. One Cluster Sender CHANNEL pointing to a (the other) Repository
  Queue Manager in your Cluster.
  ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
  DEF CHL(TO.CLUS_A9.QM*) CHLTYPE(CLUSSDR) REPLACE +
   TRPTYPE(TCP) CONNAME('Hostname(905*)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)

  ☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆

  ( * = Full Repository; this sender is defined on QM3, pointing to QM1)
  DEF CHL(TO.CLUS_A9.QM1) CHLTYPE(CLUSSDR) REPLACE +
   TRPTYPE(TCP) CONNAME('localhost(9051)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)


  ( * = Full Repository; the other sender is defined on QM1, pointing to QM3)
  DEF CHL(TO.CLUS_A9.QM3) CHLTYPE(CLUSSDR) REPLACE +
   TRPTYPE(TCP) CONNAME('localhost(9053)') +
   SHORTRTY(600) SHORTTMR(60) DISCINT(30) CLUSTER(CLUS_A9)

 d. If your Queue Manager is to be a Full Repository,
  ALTER the Queue Manager to include the Cluster name.
   ALTER QMGR REPOS(CLUS_A9)   (run on each QM that is to be a Full Repository)

  (ps : likewise, you can check here that CLUS_A9 has been fully built... lol )
   Verify on QM1 => DIS CLUSQMGR(*)
   Verify on QM1 => DIS CHSTATUS(*)

   Verify on QM1 => PING CHL(TO.CLUS_A9.QM3)
   Verify on QM3 => PING CHL(TO.CLUS_A9.QM1)
  ( On success => AMQ8020: Ping WebSphere MQ channel complete. )
  ( On failure => AMQ9547: Type of remote channel not suitable for requested action. )
 AMQ9547 : Type of remote channel not suitable for action
 Cause :
  It is not possible to start a Cluster Receiver CHANNEL that uses the group Listener port.
 Solution :
  Start a non-shared Listener (INDISP(QMGR)) and ALTER the Cluster Receiver CHANNEL to
  use its port number rather than the group Listener port.

5. Wait until all CHANNELs have timed out according to the DISCINT value.
 
6. What is the CURDEPTH on the SYSTEM.CLUSTER.REPOSITORY.QUEUE ?
 DIS Q(SYSTEM.CLUSTER.REPOSITORY.QUEUE) CURDEPTH
 
[B. Set up the Cluster application objects.]
7. Define the Cluster application objects required on your Queue Manager.
 Define all Queues with DEFPSIST(YES) and all Cluster Queues with DEFBIND(OPEN).
 a. Two or more Local Cluster Queues QL.C#
  (existing in more than one Queue Manager = a multi-instance Queue)

 (On QM1 or QM3; here we first DEFine it on QM1 as a test)
 DEF QL(QL.C1) REPLACE DEFPSIST(YES) DEFBIND(OPEN) CLUSTER(CLUS_A9)

8. Wait until all CHANNELs have timed out according to the DISCINT value.

9. What is now the CURDEPTH on the SYSTEM.CLUSTER.REPOSITORY.QUEUE ?
 DIS Q(SYSTEM.CLUSTER.REPOSITORY.QUEUE) CURDEPTH

[C. Test Clustering.]
10. Prepare a text file with 9 messages.
  Each message should contain a sequence number.
  Use this text file in the following steps with amqsput via standard input
  (see the usage sketch after the example messages).

 Example :
 0001 MSG msg text1............
 0002 MSG msg text22...........
 0003 MSG msg text333..........
 0004 MSG msg text4............
 0005 MSG msg text55...........
 0006 MSG msg text666..........
 0007 MSG msg text7............
 0008 MSG msg text88...........
 0009 MSG msg text999..........
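
 One way to feed the file to the sample program (the file name msgs.txt is an
 assumption) :

 # amqsput QL.C1 QM3 < msgs.txt

 amqsput reads one message per line from standard input and stops at end-of-file.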


11. From the non-repository Queue Manager, run amqsput to put messages to the
 Cluster Queue that is not defined on that Queue Manager.
 Check which Queue and Queue Manager the messages arrive on.
 a. QL.C# (Cluster Queue)
  # amqsput QL.C1 QM3
  - All messages are put to a single instance of the Queue, because DEFBIND(OPEN)
   fixes the binding at MQOPEN time.
  - CHANNEL activity to the full repository and to the QM where the messages are put.
 
12. Set DEFBIND(NOTFIXED) for all Cluster Queues on all Queue Managers in your Cluster.
 Is there any CHANNEL activity in the whole Cluster ?
 - ALTER QL(QL.C1) DEFBIND(NOTFIXED)
or - DEF QL(QL.C1) REPLACE DEFPSIST(YES) DEFBIND(NOTFIXED) CLUSTER(CLUS_A9)
 - Yes, because the change of the DEFBIND attribute has to be communicated.

13. On which instances of the destination Queue do the messages arrive ?
 Is there any CHANNEL activity ?
 - The messages are now distributed between all instances of the Queue QL.C# (Round Robin)
 - Because of remote operations we have CHANNEL activity.

14. Stop one of the Remote Cluster Queue Managers.
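
 For example, an immediate shutdown of QM3 (-i ends the Queue Manager without waiting
 for applications to disconnect) :

 # endmqm -i QM3

 Step 16 below restarts it with : # strmqm QM3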

15. Again put 9 messages to the Cluster Queue that is not Local on your Queue Manager.
 - The messages are now put to the remaining instances of the Queue QL.C#

16. Restart the previously stopped Cluster Queue Manager.

17. Disable puts on all Queue instances of QL.C# in your Cluster.
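
 A minimal sketch, to be run against every Queue Manager hosting an instance (queue
 name QL.C1 assumed) :

 # runmqsc QM1
 : ALTER QL(QL.C1) PUT(DISABLED)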

18. Again put 9 messages to QL.C#

19. Explain the error indication you get :
 - Reason Code 2268 is returned to the putting application.
  The status PUT(DISABLED) is also known on the Local Queue Manager, even though all
  instances are located on Remote Queue Managers in the Cluster.
 - The Cluster Queue entry in the Local Queue Manager holds this information.
 - DIS Q(QL.C*) CLUSINFO ALL


Full Repository : is a Queue Manager that hosts a complete set of information about every Queue Manager in the cluster.

Partial Repositories : the other Queue Managers in the cluster, which inquire about the information in the full repositories and build up their own subsets of this information.

※ If an MQ Cluster is configured with only one Full Repository, it has a single point of failure : the Cluster won't function if that Full Repository goes down. With multiple Full Repositories, if one goes down, the others take over managing the Cluster.

※ Each Queue Manager should have at least one Cluster-Sender CHANNEL (CLUSSDR) and one Cluster-Receiver CHANNEL (CLUSRCVR), regardless of whether the Queue Manager is a full or a partial repository. The only exception is an MQ Cluster with only one full repository; that full repository should have only a Cluster-Receiver CHANNEL (CLUSRCVR).

※ A Full Repository pushes its information via a Cluster-Sender CHANNEL (CLUSSDR) to another full repository's Cluster-Receiver CHANNEL (CLUSRCVR). These two CHANNELs should have the same name.
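
A quick way to check which Queue Managers hold full repositories (QMTYPE reports REPOS
for a full repository and NORMAL for a partial one) :

 # runmqsc QM1
 : DIS CLUSQMGR(*) QMTYPE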



=== Setting up an MQ Cluster Using IBM WebSphereMQ Explorer ===
(E1) Queue Manager Clusters :
 http://publibfp.boulder.ibm.com/epubs/pdf/csqzah07.pdf

(E2) Configuring WebSphereMQ Cluster :