
gem5 prefetcher

Published 2019-07-13 19:14

I have recently been running prefetching experiments on gem5 and adding my own prefetch algorithm, in this case a hardware stream prefetcher. It took fixing several bugs before the experiment ran through, so this post records the process.
Steps to add your own prefetch algorithm to gem5:
(1) In gem5-master/configs/common/Caches.py, enable prefetching (a sketch of plugging in the custom prefetcher follows after step (2)):

class L1Cache(Cache):
    assoc = 2
    tag_latency = 2
    data_latency = 2
    response_latency = 2
    mshrs = 4
    tgts_per_mshr = 20
    prefetcher = StridePrefetcher(degree=8, latency=1.0)  # add this line
    # prefetch_policy = 'tagged'  # for older gem5 versions, add this line instead

class L2Cache(Cache):
    assoc = 8
    tag_latency = 20
    data_latency = 20
    response_latency = 20
    mshrs = 20
    tgts_per_mshr = 12
    write_buffers = 8
    prefetcher = StridePrefetcher(degree=8, latency=1.0)  # add this line
    # prefetch_policy = 'tagged'  # for older gem5 versions, add this line instead

(2) In gem5-master/src/mem/cache/prefetch/, add your own prefetch algorithm: mainly the stream.hh and stream.cc files.
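Once the new prefetcher has been registered (steps (3) and (4) below), the caches can point at it instead of StridePrefetcher. A minimal sketch, not from the original post, assuming the StreamPrefetcher parameters declared in step (3); the degree and distance values are that declaration's defaults:

from m5.objects import *

class L2Cache(Cache):
    assoc = 8
    tag_latency = 20
    data_latency = 20
    response_latency = 20
    mshrs = 20
    tgts_per_mshr = 12
    write_buffers = 8
    # swap the stride prefetcher for the custom stream prefetcher
    prefetcher = StreamPrefetcher(degree=4, distance=5)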
(3) In gem5-master/src/mem/cache/prefetch/Prefetcher.py, declare the new SimObject:

class StreamPrefetcher(QueuedPrefetcher):
    type = 'StreamPrefetcher'
    cxx_class = 'StreamPrefetcher'
    cxx_header = "mem/cache/prefetch/stream.hh"
    table_sets = Param.Int(16, "Number of sets in PC lookup table")
    table_assoc = Param.Int(4, "Associativity of PC lookup table")
    tableSize = Param.Int(8, "Number of entries in the stream table")
    distance = Param.Int(5, "Prefetch distance")
    use_master_id = Param.Bool(True, "Use master id based history")
    degree = Param.Int(4, "Number of prefetches to generate")

(4) In the SConscript under gem5-master/src/mem/cache/prefetch/, register the new source file:

Import('*')

SimObject('Prefetcher.py')

Source('base.cc')
Source('queued.cc')
Source('stride.cc')
Source('tagged.cc')
Source('stream.cc')  # add this line

Note: if this entry is missing, the build fails at link time with:

> build/X86/python/_m5/param_StreamPrefetcher_wrap.o: In function
> `_wrap_StreamPrefetcherParams_create':
> /home/jyf/download/gem_nvmain/gem5-master/build/X86/python/_m5/param_StreamPrefetcher_wrap.cc:4549:
> undefined reference to `StreamPrefetcherParams::create()'
> collect2: error: ld returned 1 exit status
> scons: *** [build/X86/gem5.opt] Error 1
> scons: building terminated because of errors.

The cause is the missing SConscript entry: stream.cc is never compiled into stream.o, so the symbol cannot be linked. During a successful build, stream.o, StreamPrefetcher.hh (under gem5-master/build/ARM/params/) and param_StreamPrefetcher_wrap.cc (under build/ARM/python/_m5/) are generated; all of them are tied to the StreamPrefetcher* create() factory function.
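As a quick sanity check after a successful build, the new SimObject should be importable from m5.objects in any script run under gem5. A minimal sketch, not part of the original steps; the script name is hypothetical:

# run as:  ./build/ARM/gem5.opt check_prefetcher.py
# the import below fails if the SimObject was not registered in Prefetcher.py/SConscript
from m5.objects import StreamPrefetcher

# parameter names and defaults follow the Prefetcher.py declaration in step (3)
pf = StreamPrefetcher(degree=4, distance=5)
print(pf)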
(5) The hardware stream prefetcher code I found is fairly old and does not match this gem5 version, so stream.cc and stream.hh also need to be modified.

In stream.cc, change Addr to AddrPriority when queuing prefetch candidates:

void
StreamPrefetcher::calculatePrefetch(const PacketPtr &pkt,
                                    std::vector<AddrPriority> &addresses)
{
    uint32_t core_id = pkt->req->hasContextId() ? pkt->req->contextId() : -1;
    //uint32_t core_id = pkt->req->contextId();
    //if (core_id < 0) {
    if (!pkt->req->contextId()) {
        DPRINTF(HWPrefetch, "ignoring request with no core ID");
        return;
    }
    .......
    for (uint8_t d = 1; d <= degree; d++) {
        Addr pf_addr = table[i]->endAddr + blkSize * d;
        AddrPriority addrp;
        addrp.first = pf_addr;
        addresses.push_back(addrp);
        //addresses.push_back(pf_addr);
        DPRINTF(HWPrefetch, "Queuing prefetch to %#x. ", pf_addr);
    }
    ......
    for (uint8_t d = 1; d <= degree; d++) {
        Addr pf_addr = table[i]->endAddr - blkSize * d;
        AddrPriority addrp;
        addrp.first = pf_addr;
        addresses.push_back(addrp);
        //addresses.push_back(pf_addr);
        DPRINTF(HWPrefetch, "Queuing prefetch to %#x. ", pf_addr);
    }
}

In stream.hh, change Addr to AddrPriority in the declaration as well:

void calculatePrefetch(const PacketPtr &pkt,
                       std::vector<AddrPriority> &addresses);

Without this change the build fails because the subclass no longer overrides the parent class's pure virtual function; in reality it is a version-compatibility problem:

stream.cc:182:34: error: invalid new-expression of abstract class type 'StreamPrefetcher'
virtual void calculatePrefetch(const PacketPtr &pkt, std::vector<AddrPriority> &addresses);

(6) Set cpu-type = Timing.
The file to edit is gem5-master/configs/common/CpuConfig.py.
Source code analysis:
The CPU type defaults to detailed; the alias table in CpuConfig.py is:

_cpu_aliases_all = [
    ("timing", "TimingSimpleCPU"),
    ("atomic", "AtomicSimpleCPU"),
    ("minor", "MinorCPU"),
    ("detailed", "DerivO3CPU"),
    ("kvm", ("ArmKvmCPU", "ArmV8KvmCPU", "X86KvmCPU")),
    ("trace", "TraceCPU"),
    ]

I changed the CPU type to timing, i.e. m5.objects.TimingSimpleCPU; the related config_etrace code then looks like:

def config_etrace(cpu_cls, cpu_list, options):
    if issubclass(cpu_cls, m5.objects.TimingSimpleCPU):
        # Assign the same file name to all cpus for now. This must be
        # revisited when creating elastic traces for multi processor systems.
        for cpu in cpu_list:
            # Attach the elastic trace probe listener. Set the protobuf trace
            # file names. Set the dependency window size equal to the cpu it
            # is attached to.
            cpu.traceListener = m5.objects.ElasticTrace(
                                instFetchTraceFile = options.inst_trace_file,
                                dataDepTraceFile = options.data_trace_file,
                                depWindowSize = 3 * cpu.numROBEntries)
            # Make the number of entries in the ROB, LQ and SQ very
            # large so that there are no stalls due to resource
            # limitation as such stalls will get captured in the trace
            # as compute delay. For replay, ROB, LQ and SQ sizes are
            # modelled in the Trace CPU.
            cpu.numROBEntries = 512
            cpu.LQEntries = 128
            cpu.SQEntries = 128
    else:
        fatal("%s does not support data dependency tracing. Use a CPU model of"
              " type or inherited from TimingSimpleCPU.", cpu_cls)

(7) Recompile:

sudo scons EXTRAS=../nvmain ./build/ARM/gem5.opt

You may still hit an error such as:

No module name specified using %module or -module.
scons: *** [build/ARM/python/_m5/param_VirtIO9PBase_wrap.cc] Error 1

which is rather baffling. In the end I deleted the previous build output (rm -rf ARM, i.e. the build/ARM directory) and rebuilt from scratch; this time the build succeeded.
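For reference, when driving the simulation with the stock configs/example/se.py script, the CPU model can usually be selected on the command line (for example --cpu-type=TimingSimpleCPU together with --caches and --l2cache) instead of editing CpuConfig.py. A custom config script can also select the timing CPU model directly. The following is a minimal sketch, not from the original post, that only shows the CPU selection; the memory system, port wiring and workload setup are omitted and would follow the usual se.py flow:

from m5.objects import System, SrcClockDomain, VoltageDomain, TimingSimpleCPU

system = System()
system.clk_domain = SrcClockDomain(clock='1GHz',
                                   voltage_domain=VoltageDomain())
system.mem_mode = 'timing'      # TimingSimpleCPU requires timing memory accesses
system.cpu = TimingSimpleCPU()  # the "timing" CPU model selected in step (6)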
Note: if you have modified any source file, it is best to delete the build output and recompile from scratch, otherwise you may run into rather puzzling errors. The full stream.cc and stream.hh sources are attached below:
stream.cc

#include "debug/HWPrefetch.hh"
#include "mem/cache/prefetch/stream.hh"

StreamPrefetcher::StreamPrefetcher(const StreamPrefetcherParams *p)
    : QueuedPrefetcher(p), tableSize(p->tableSize),
      useMasterId(p->use_master_id), degree(p->degree), distance(p->distance)
{
    for (int i = 0; i < MaxContexts; i++) {
        StreamTable[i] = new StreamTableEntry*[tableSize];
        for (int j = 0; j < tableSize; j++) {
            StreamTable[i][j] = new StreamTableEntry;
            StreamTable[i][j]->LRU_index = j;
            resetEntry(StreamTable[i][j]);
        }
    }
}

StreamPrefetcher::~StreamPrefetcher()
{
    for (int i = 0; i < MaxContexts; i++) {
        for (int j = 0; j < tableSize; j++) {
            delete StreamTable[i][j];
        }
        delete[] StreamTable[i];
    }
}

// Training and prefetching of streams
void
StreamPrefetcher::calculatePrefetch(const PacketPtr &pkt,
                                    std::vector<AddrPriority> &addresses)
{
    uint32_t core_id = pkt->req->hasContextId() ? pkt->req->contextId() : -1;
    //uint32_t core_id = pkt->req->contextId();
    //if (core_id < 0) {
    if (!pkt->req->contextId()) {
        DPRINTF(HWPrefetch, "ignoring request with no core ID");
        return;
    }

    Addr blk_addr = pkt->getAddr() & ~(Addr)(blkSize - 1); // cache block aligned address
    assert(core_id < MaxContexts);
    StreamTableEntry **table = StreamTable[core_id]; // per-core stream training

    uint32_t i;
    // Check if there is a stream entry covering blk_addr
    for (i = 0; i < tableSize; i++) {
        switch (table[i]->status) {
        case MONITOR:
            if (table[i]->trainedDirection == ASCENDING) { // ascending order
                if ((table[i]->startAddr < blk_addr) && (table[i]->endAddr > blk_addr)) {
                    // Hit to a monitored stream. Issue prefetch requests
                    // based on the degree and the direction.
                    for (uint8_t d = 1; d <= degree; d++) {
                        Addr pf_addr = table[i]->endAddr + blkSize * d;
                        addresses.push_back(AddrPriority(pf_addr, 0));
                        DPRINTF(HWPrefetch, "Queuing prefetch to %#x. ", pf_addr);
                    }
                    if ((table[i]->endAddr + blkSize * degree) - table[i]->startAddr <= distance) {
                        table[i]->endAddr = table[i]->endAddr + blkSize * degree;
                    } else {
                        table[i]->startAddr = table[i]->startAddr + blkSize * degree;
                        table[i]->endAddr = table[i]->endAddr + blkSize * degree;
                    }
                    break;
                }
            } else if (table[i]->trainedDirection == DESCENDING) { // descending order
                if ((table[i]->startAddr > blk_addr) && (table[i]->endAddr < blk_addr)) {
                    for (uint8_t d = 1; d <= degree; d++) {
                        Addr pf_addr = table[i]->endAddr - blkSize * d;
                        addresses.push_back(AddrPriority(pf_addr, 0));
                        DPRINTF(HWPrefetch, "Queuing prefetch to %#x. ", pf_addr);
                    }
                    if (table[i]->startAddr - (table[i]->endAddr - blkSize * degree) <= distance) {
                        table[i]->endAddr = table[i]->endAddr - blkSize * degree;
                    } else {
                        table[i]->startAddr = table[i]->startAddr - blkSize * degree;
                        table[i]->endAddr = table[i]->endAddr - blkSize * degree;
                    }
                    break;
                }
            } else {
                assert(0);
            }
            break;
        case TRAINING:
            if (abs(table[i]->allocAddr - blk_addr) <= (distance / 2) * blkSize) {
                // Check whether the address is within +/- distance
                if (table[i]->trendDirection[0] == INVALID) {
                    table[i]->trendDirection[0] =
                        (blk_addr - table[i]->allocAddr > 0) ? ASCENDING : DESCENDING;
                } else {
                    assert(table[i]->trendDirection[1] == INVALID);
                    table[i]->trendDirection[1] =
                        (blk_addr - table[i]->allocAddr > 0) ? ASCENDING : DESCENDING;
                    if (table[i]->trendDirection[0] == table[i]->trendDirection[1]) {
                        table[i]->trainedDirection = table[i]->trendDirection[0];
                        table[i]->startAddr = table[i]->allocAddr;
                        if (table[i]->trainedDirection != INVALID) {
                            // Based on the trained direction (+1: ascending,
                            // -1: descending), update the end address of the stream
                            table[i]->endAddr = blk_addr +
                                (table[i]->trainedDirection) * blkSize * degree;
                        }
                        // Entry is ready for issuing prefetch requests
                        table[i]->status = MONITOR;
                    } else {
                        resetEntry(table[i]);
                    }
                }
                break;
            }
            break;
        default:
            break;
        } // end of switch
    } // end of for loop

    uint32_t HIT_index = i;

    int INVALID_index = tableSize;
    for (int i = 0; i < tableSize; i++) { // find an empty entry
        if (table[i]->status == INV) {
            INVALID_index = i;
            break;
        }
    }

    int TEMP_index = -1;
    int LRU_index = -1000000;
    for (int i = 0; i < tableSize; i++) { // find the LRU entry
        if (table[i]->LRU_index > TEMP_index) {
            TEMP_index = table[i]->LRU_index;
            LRU_index = i;
        }
    }
    assert(TEMP_index == tableSize - 1);

    int entry_id;
    if (HIT_index != tableSize) { // hit
        entry_id = HIT_index;
    } else if (INVALID_index != tableSize) { // an invalid stream entry exists
        assert(table[INVALID_index]->status == INV);
        table[INVALID_index]->status = TRAINING;
        table[INVALID_index]->allocAddr = blk_addr;
        entry_id = INVALID_index;
    } else { // replace the LRU stream entry
        assert(table[LRU_index]->status != INV);
        resetEntry(table[LRU_index]);
        table[LRU_index]->status = TRAINING;
        table[LRU_index]->allocAddr = blk_addr;
        entry_id = LRU_index;
    }

    // Shift the LRU positions after touching or replacing entry_id
    for (int i = 0; i < tableSize; i++) {
        if (table[i]->LRU_index < table[entry_id]->LRU_index) {
            table[i]->LRU_index = table[i]->LRU_index + 1;
        }
    }
    table[entry_id]->LRU_index = 0;
}

void
StreamPrefetcher::resetEntry(StreamTableEntry *this_entry)
{
    this_entry->status = INV;
    this_entry->trendDirection[0] = INVALID;
    this_entry->trendDirection[1] = INVALID;
    this_entry->allocAddr = 0;
    this_entry->startAddr = 0;
    this_entry->endAddr = 0;
    this_entry->trainedDirection = INVALID;
}

StreamPrefetcher*
StreamPrefetcherParams::create()
{
    return new StreamPrefetcher(this);
}

stream.hh

#ifndef __MEM_CACHE_PREFETCH_STREAM_HH__
#define __MEM_CACHE_PREFETCH_STREAM_HH__

#include "mem/cache/prefetch/queued.hh"
#include "params/StreamPrefetcher.hh"

// Direction of a stream for each entry in the stream table
enum StreamDirection {
    ASCENDING = 1,   // for example: A, A+1, A+2
    DESCENDING = -1, // for example: A, A-1, A-2
    INVALID = 0
};

// Status of a stream entry in the stream table
enum StreamStatus {
    INV = 0,
    TRAINING = 1, // stream training is not over yet; once trained, it moves to MONITOR
    MONITOR = 2   // monitor and request: entry is ready for issuing prefetch requests
};

class StreamPrefetcher : public QueuedPrefetcher
{
  protected:
    static const uint32_t MaxContexts = 64; // per-core stream tables for up to 64 cores

    uint32_t tableSize;     // number of entries in a stream table
    const bool useMasterId; // use the master id to train the streams
    uint32_t degree;        // number of prefetch requests to issue at a time
    uint32_t distance;      // prefetch distance

    // StreamTableEntry stores the basic attributes of a stream table entry.
    class StreamTableEntry
    {
      public:
        int LRU_index;
        Addr allocAddr;    // address that initiated the stream training
        Addr startAddr;    // first address of a stream
        Addr endAddr;      // last address of a stream
        StreamDirection trainedDirection;  // direction of the trained stream (ascending or descending)
        StreamStatus status;               // status of the stream entry
        StreamDirection trendDirection[2]; // the last two observed stream directions of an entry
    };

    void resetEntry(StreamTableEntry *this_entry);

    // A stream table per core, each with tableSize stream entries
    StreamTableEntry **StreamTable[MaxContexts];

  public:
    StreamPrefetcher(const StreamPrefetcherParams *p);
    ~StreamPrefetcher();

    // Called by the cache to train the streams and generate prefetch candidates
    void calculatePrefetch(const PacketPtr &pkt,
                           std::vector<AddrPriority> &addresses);
};

#endif // __MEM_CACHE_PREFETCH_STREAM_HH__

References:
gem5 prefetching experiments
While adding your own prefetcher, it is also worth studying the prefetcher implementations that ship with gem5, such as stride.cc and stride.hh.