Rajan, Kaushik and Govindarajan, Ramaswamy (2009) A Novel Cache Architecture and Placement Framework for Packet Forwarding Engines. In: IEEE Transactions on Computers, 58 (8). pp. 1009-1025.
getPDF.pdf - Published Version
Restricted to Registered users only
Download (4064Kb) | Request a copy
Packet forwarding is a memory-intensive application requiring multiple accesses through a trie structure. With the requirement to process packets at line rates, high-performance routers need to forward millions of packets every second with each packet needing up to seven memory accesses. Earlier work shows that a single cache for the nodes of a trie can reduce the number of external memory accesses. It is observed that the locality characteristics of the level-one nodes of a trie are significantly different from those of lower level nodes. Hence, we propose a heterogeneously segmented cache architecture (HSCA) which uses separate caches for level-one and lower level nodes, each with carefully chosen sizes. Besides reducing misses, segmenting the cache allows us to focus on optimizing the more frequently accessed level-one node segment. We find that due to the nonuniform distribution of nodes among cache sets, the level-one nodes cache is susceptible t high conflict misses. We reduce conflict misses by introducing a novel two-level mapping-based cache placement framework. We also propose an elegant way to fit the modified placement function into the cache organization with minimal increase in access time. Further, we propose an attribute preserving trace generation methodology which emulates real traces and can generate traces with varying locality. Performanc results reveal that our HSCA scheme results in a 32 percent speedup in average memory access time over a unified nodes cache. Also, HSC outperforms IHARC, a cache for lookup results, with as high as a 10-fold speedup in average memory access time. Two-level mappin further enhances the performance of the base HSCA by up to 13 percent leading to an overall improvement of up to 40 percent over the unified scheme.
|Item Type:||Journal Article|
|Additional Information:||Copyright 2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.|
|Keywords:||Special-purpose and application-based systems; design; performance; experimentation; cache architectures; network processors; synthetic trace generation; trace driven simulation|
|Department/Centre:||Division of Information Sciences > Supercomputer Education & Research Centre|
|Date Deposited:||29 Dec 2009 08:40|
|Last Modified:||19 Sep 2010 05:39|
Actions (login required)