ACM Queue - Processors http://queue.acm.org/listing.cfm?item_topic=Processors&qc_type=topics_list&filter=Processors&page_title=Processors&order=desc

NUMA (Non-Uniform Memory Access): An Overview http://queue.acm.org/detail.cfm?id=2513149 NUMA (non-uniform memory access) is the phenomenon that memory at various points in the address space of a processor has different performance characteristics. At current processor speeds, the signal path length from the processor to memory plays a significant role. Increased signal path length not only increases latency to memory but also quickly becomes a throughput bottleneck if the signal path is shared by multiple processors. Differences in memory performance first became noticeable on large-scale systems whose data paths spanned motherboards or chassis. These systems required modified operating-system kernels with NUMA support that explicitly understood the topological properties of the system's memory (such as the chassis in which a region of memory was located) in order to avoid excessively long signal path lengths. (Altix and UV, SGI's large address space systems, are examples. The designers of these products had to modify the Linux kernel to support NUMA; in these machines, processors in multiple chassis are linked via a proprietary interconnect called NUMALINK.) Processors Fri, 09 Aug 2013 12:36:49 GMT Christoph Lameter 2513149

Realtime GPU Audio http://queue.acm.org/detail.cfm?id=2484010 Today's CPUs are capable of supporting realtime audio for many popular applications, but some compute-intensive audio applications require hardware acceleration. This article looks at some realtime sound-synthesis applications and shares the authors' experiences implementing them on GPUs (graphics processing units).
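The data-parallel structure that GPU sound synthesis exploits can be sketched in plain Python: in additive synthesis, every (oscillator, sample) pair is computed independently, which is exactly the kind of work that maps onto thousands of GPU threads. The sketch below is illustrative only, not code from the article; all names and parameters are assumptions.

```python
import math

def synthesize_block(freqs, amps, start, n_samples, sr=44100):
    """Additive synthesis of one audio block as a sum of sine oscillators.

    Every (oscillator, sample) pair is independent, which is the
    data-parallel structure a GPU implementation would exploit by
    assigning oscillators and samples to separate GPU threads.
    """
    block = [0.0] * n_samples
    for f, a in zip(freqs, amps):
        w = 2.0 * math.pi * f / sr  # angular increment per sample
        for i in range(n_samples):
            block[i] += a * math.sin(w * (start + i))
    return block

# One 64-sample block of a simple three-partial tone.
block = synthesize_block([220.0, 440.0, 660.0], [0.5, 0.3, 0.2], 0, 64)
```

On a CPU the doubly nested loop runs serially; on a GPU the inner work becomes a kernel launched over an oscillators-by-samples grid, followed by a reduction over oscillators.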
Processors Wed, 08 May 2013 21:15:18 GMT Bill Hsu, Marc Sosnick-P&#233;rez 2484010

FPGA Programming for the Masses http://queue.acm.org/detail.cfm?id=2443836 When looking at how hardware influences computing performance, we have GPPs (general-purpose processors) on one end of the spectrum and ASICs (application-specific integrated circuits) on the other. GPPs are highly programmable but often inefficient in terms of power and performance. ASICs implement a dedicated and fixed function and provide the best power and performance characteristics, but any functional change requires a complete (and extremely expensive) re-spinning of the circuits. Processors Sat, 23 Feb 2013 03:43:14 GMT David Bacon, Rodric Rabbah, Sunil Shukla 2443836

CPU DB: Recording Microprocessor History http://queue.acm.org/detail.cfm?id=2181798 <h1 class='hidetitle'>CPU DB: Recording Microprocessor History</h1> <h2>With this open database, you can mine microprocessor trends over the past 40 years.</h2> <br /> <h3>Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz, Stanford University</h3> <br /> <p>In November 1971, Intel introduced the world&rsquo;s first single-chip microprocessor, the Intel 4004. It had 2,300 transistors, ran at a clock speed of up to 740 kHz, and delivered <i>60,000</i> instructions per second while dissipating 0.5 watts. The following four decades witnessed exponential growth in compute power, a trend that has enabled applications as diverse as climate modeling, protein folding, and computing real-time ballistic trajectories of angry birds. Today&rsquo;s microprocessor chips employ billions of transistors, include multiple processor cores on a single silicon die, run at clock speeds measured in gigahertz, and deliver more than 4 million times the performance of the original 4004.</p> <p>Where did these incredible gains come from?
This article sheds some light on this question by introducing CPU DB (cpudb.stanford.edu), an open and extensible database collected by Stanford&rsquo;s VLSI (very large-scale integration) Research Group over several generations of processors (and students). We gathered information on commercial processors from 17 manufacturers and placed it in CPU DB, which now contains data on 790 processors spanning the past 40 years.</p> Processors Fri, 06 Apr 2012 22:17:03 GMT Andrew Danowitz, Kyle Kelley, James Mao, John P. Stevenson, Mark Horowitz 2181798

Managing Contention for Shared Resources on Multicore Processors http://queue.acm.org/detail.cfm?id=1709862 <h2>Managing Contention for Shared Resources on Multicore Processors</h2> <h4>Alexandra Fedorova, Sergey Blagodurov, Sergey Zhuravlev; Simon Fraser University</h4> <h3>Contention for caches, memory controllers, and interconnects can be alleviated by contention-aware scheduling algorithms.</h3> <p>Modern multicore systems are designed to allow clusters of cores to share various hardware structures, such as LLCs (last-level caches; for example, L2 or L3), memory controllers, and interconnects, as well as prefetching hardware. We refer to these resource-sharing clusters as <i>memory domains</i>, because the shared resources mostly have to do with the memory hierarchy. Figure 1 provides an illustration of a system with two memory domains and two cores per domain.</p> <p>Threads running on cores in the same memory domain may compete for the shared resources, and this contention can significantly degrade their performance relative to what they could achieve running in a contention-free environment. Consider an example demonstrating how contention for shared resources can affect application performance.
In this example, four applications&mdash;Soplex, Sphinx, Gamess, and Namd, from the SPEC (Standard Performance Evaluation Corporation) CPU 2006 benchmark suite<sup>6</sup>&mdash;run simultaneously on an Intel Quad-Core Xeon system similar to the one depicted in figure 1.</p> Processors Wed, 20 Jan 2010 22:46:23 GMT Alexandra Fedorova, Sergey Blagodurov, Sergey Zhuravlev 1709862

Reconfigurable Future http://queue.acm.org/detail.cfm?id=1388771 <h3>Reconfigurable Future</h3> <h4>The ability to produce cheaper, more compact chips is a double-edged sword.</h4> <h4>Mark Horowitz, Stanford University</h4> <p>Predicting the future is notoriously hard. Sometimes I feel that the only real guarantee is that the future will happen, and that someone will point out how it's not like what was predicted. Nevertheless, we seem intent on trying to figure out what will happen, and worse yet, recording these views so they can be later used against us. So here I go...</p> <p>Scaling has been driving the whole electronics industry, allowing it to produce chips with more transistors at a lower cost. But this trend is a double-edged sword: we not only need to figure out more complex devices, which people want, but we also must determine which complex devices lots of people want, as we have to sell many, many chips to amortize the significant design cost.</p> Processors Mon, 14 Jul 2008 15:03:29 GMT Mark Horowitz 1388771

The Price of Performance http://queue.acm.org/detail.cfm?id=1095420 <h1>The Price of Performance</h1> <h3>An Economic Case for Chip Multiprocessing</h3> <h4>LUIZ ANDR&Eacute; BARROSO, GOOGLE</h4> <p>In the late 1990s, our research group at DEC was one of a growing number of teams advocating the CMP (chip multiprocessor) as an alternative to highly complex single-threaded CPUs.
We were designing the Piranha system,<sup>1</sup> which was a radical point in the CMP design space in that we used very simple cores (similar to the early RISC designs of the late &rsquo;80s) to provide a higher level of thread-level parallelism. Our main goal was to achieve the best commercial workload performance for a given silicon budget.</p> <p>Today, in developing Google&rsquo;s computing infrastructure, our focus is broader than performance alone. The merits of a particular architecture are measured by answering the following question: Are you able to afford the computational capacity you need? The high computational demands inherent in most of Google&rsquo;s services have led us to develop a deep understanding of the overall cost of computing, and to look continually for hardware/software designs that optimize performance per unit of cost.</p> Processors Tue, 18 Oct 2005 14:14:21 GMT Luiz Andr&#233; Barroso 1095420

Extreme Software Scaling http://queue.acm.org/detail.cfm?id=1095419 <h1>Extreme Software Scaling</h1> <h3>Chip multiprocessors have introduced a new dimension in scaling for application developers, operating system designers, and deployment specialists.</h3> <h4>RICHARD MCDOUGALL, SUN MICROSYSTEMS</h4> <p>The advent of SMP (symmetric multiprocessing) added a new degree of scalability to computer systems. Rather than deriving additional performance from an incrementally faster microprocessor, an SMP system leverages multiple processors to obtain large gains in total system performance. Parallelism in software allows multiple jobs to execute concurrently on the system, increasing system throughput accordingly. Given sufficient software parallelism, these systems have proved to scale to several hundred processors.</p> <p>More recently, a similar phenomenon is occurring at the chip level.
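The claim above that SMP systems scale to several hundred processors "given sufficient software parallelism" is usually quantified with Amdahl's law, which bounds speedup by the serial fraction of the workload. A minimal sketch (the function name and example numbers are illustrative, not from the article):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's law: speedup of a workload whose parallel_fraction can
    be spread across n_processors while the remainder stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# Even a 99%-parallel job on 256 processors is capped by its 1% serial part.
print(round(amdahl_speedup(0.99, 256), 1))  # → 72.1
```

This is why the passage stresses software parallelism rather than raw processor count: the serial fraction, not the hardware, sets the ceiling.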
Rather than pursue diminishing returns by increasing individual processor performance, manufacturers are producing chips with multiple processor cores on a single die. (See &ldquo;The Future of Microprocessors,&rdquo; by Kunle Olukotun and Lance Hammond, in this issue.) For example, the AMD Opteron<sup>1</sup> processor now uses two entire processor cores per die, providing almost double the performance of a single-core chip. The Sun Niagara<sup>2</sup> processor, shown in figure 1, uses eight cores per die, with each core further multiplexed across four hardware threads.</p> Processors Tue, 18 Oct 2005 14:14:01 GMT Richard McDougall 1095419

The Future of Microprocessors http://queue.acm.org/detail.cfm?id=1095418 <h1>The Future of Microprocessors</h1> <h3>Chip multiprocessors&rsquo; promise of huge performance gains is now a reality.</h3> <h4>KUNLE OLUKOTUN AND LANCE HAMMOND, STANFORD UNIVERSITY</h4> <p>The performance of microprocessors that power modern computers has continued to increase exponentially over the years for two main reasons. First, the transistors that are the heart of the circuits in all processors and memory chips have simply become faster over time on a course described by Moore&rsquo;s law,<sup>1</sup> and this directly affects the performance of processors built with those transistors. Moreover, actual processor performance has increased faster than Moore&rsquo;s law would predict,<sup>2</sup> because processor designers have been able to harness the increasing numbers of transistors available on modern chips to extract more parallelism from software.
This is depicted in figure 1 for Intel&rsquo;s processors.</p> Processors Tue, 18 Oct 2005 14:13:42 GMT Kunle Olukotun, Lance Hammond 1095418

Digitally Assisted Analog Integrated Circuits http://queue.acm.org/detail.cfm?id=984494 <h3>Digitally Assisted Analog Integrated Circuits<br> <em>BORIS MURMANN, STANFORD UNIVERSITY<br> BERNHARD BOSER, UC BERKELEY</em></h3> <h4>Closing the gap between analog and digital</h4> <p>In past decades, &#8220;Moore&#8217;s law&#8221;<sup>1</sup> has governed the revolution in microelectronics. Through continuous advancements in device and fabrication technology, the industry has maintained exponential progress rates in transistor miniaturization and integration density. As a result, microchips have become cheaper, faster, more complex, and more power efficient.</p> <p>We will show, however, that digital performance metrics have grown significantly faster than corresponding measures for analog circuits, especially ADCs (analog-to-digital converters). Since most DSP (digital signal processor) projects depend on A/D conversion at their interfaces, this growing disparity in performance threatens to limit the rate of progress of DSP hardware.</p> Processors Fri, 16 Apr 2004 10:14:19 GMT Boris Murmann, Bernhard Boser 984494
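One standard way to make the analog/digital gap concrete is the quantization-limited SNR of an ideal N-bit ADC, approximately 6.02N + 1.76 dB for a full-scale sine input. This textbook relation is not taken from the article; the sketch below just shows the roughly 6 dB of dynamic range that every additional bit of converter resolution must deliver on the analog side.

```python
def ideal_adc_snr_db(bits):
    """Quantization-limited SNR (in dB) of an ideal N-bit ADC driven
    by a full-scale sine input: SNR = 6.02*N + 1.76."""
    return 6.02 * bits + 1.76

# Moving from a 12-bit to a 16-bit converter demands about 24 dB more
# analog precision -- gains that digital logic gets from scaling are
# hard-won in analog circuits.
for n in (8, 12, 16):
    print(n, "bits:", round(ideal_adc_snr_db(n), 2), "dB")
```

Each bit doubles the number of quantization levels, so the required analog accuracy grows exponentially with resolution while digital density grows exponentially with time, which is the disparity the authors highlight.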