ACM Queue - Concurrency
http://queue.acm.org/listing.cfm?item_topic=Concurrency&qc_type=topics_list&filter=Concurrency&page_title=Concurrency&order=desc

Scaling Synchronization in Multicore Programs: Advanced synchronization methods can boost the performance of multicore software.
http://queue.acm.org/detail.cfm?id=2991130
Designing software for modern multicore processors poses a dilemma. Traditional software designs, in which threads manipulate shared data, have limited scalability because synchronization of updates to shared data serializes threads and limits parallelism. Alternative distributed software designs, in which threads do not share mutable data, eliminate synchronization and offer better scalability. But distributed designs make it challenging to implement features that shared data structures naturally provide, such as dynamic load balancing and strong consistency guarantees, and are simply not a good fit for every program. Often, however, the performance of shared mutable data structures is limited by the synchronization methods in use today, whether lock-based or lock-free. To help readers make informed design decisions, this article describes advanced (and practical) synchronization methods that can push the performance of designs using shared mutable data to levels that are acceptable to many applications.
Concurrency Tue, 23 Aug 2016 14:15:04 GMT Adam Morrison 2991130

Challenges of Memory Management on Modern NUMA Systems: Optimizing NUMA systems applications with Carrefour
http://queue.acm.org/detail.cfm?id=2852078
Modern server-class systems are typically built as several multicore chips put together in a single system. Each chip has a local DRAM (dynamic random-access memory) module; together they are referred to as a node. Nodes are connected via a high-speed interconnect, and the system is fully coherent. This means that, transparently to the programmer, a core can issue requests to its node's local memory as well as to the memories of other nodes. The key distinction is that remote requests will take longer, because they are subject to longer wire delays and may have to jump several hops as they traverse the interconnect. The latency of memory-access times is hence non-uniform, because it depends on where the request originates and where it is destined to go. Such systems are referred to as NUMA (non-uniform memory access).
Concurrency Tue, 01 Dec 2015 13:05:48 GMT Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, Mark Roth 2852078

Parallel Processing with Promises: A simple method of writing a collaborative system
http://queue.acm.org/detail.cfm?id=2742696
In today's world, there are many reasons to write concurrent software. The desire to improve performance and increase throughput has led to many different asynchronous techniques. The techniques involved, however, are generally complex and the source of many subtle bugs, especially if they require shared mutable state. If shared state is not required, then these problems can be solved with a better abstraction called promises. These allow programmers to hook asynchronous function calls together, waiting for each to return success or failure before running the next appropriate function in the chain.
Concurrency Tue, 03 Mar 2015 16:17:56 GMT Spencer Rathbun 2742696

Scalability Techniques for Practical Synchronization Primitives: Designing locking primitives with performance in mind
http://queue.acm.org/detail.cfm?id=2698990
In an ideal world, applications are expected to scale automatically when executed on increasingly larger systems. In practice, however, not only does this scaling not occur, but it is common to see performance actually worsen on those larger systems.
Concurrency Sun, 14 Dec 2014 22:54:57 GMT Davidlohr Bueso 2698990

Productivity in Parallel Programming: A Decade of Progress: Looking at the design and benefits of X10
http://queue.acm.org/detail.cfm?id=2682913
In 2002 DARPA (Defense Advanced Research Projects Agency) launched a major initiative in HPCS (high-productivity computing systems). The program was motivated by the belief that the utilization of the coming generation of parallel machines was gated by the difficulty of writing, debugging, tuning, and maintaining software at peta scale.
Concurrency Mon, 20 Oct 2014 16:34:27 GMT John T. Richards, Jonathan Brezin, Calvin B. Swart, Christine A. Halverson 2682913

Scaling Existing Lock-based Applications with Lock Elision: Lock elision enables existing lock-based programs to achieve the performance benefits of nonblocking synchronization and fine-grain locking with minor software engineering effort.
http://queue.acm.org/detail.cfm?id=2579227
Multithreaded applications take advantage of increasing core counts to achieve high performance. Such programs, however, typically require programmers to reason about data shared among multiple threads. Programmers use synchronization mechanisms such as mutual-exclusion locks to ensure correct updates to shared data in the presence of accesses from multiple threads. Unfortunately, these mechanisms serialize thread accesses to the data and limit scalability.
Concurrency Sat, 08 Feb 2014 10:57:30 GMT Andi Kleen 2579227

The Balancing Act of Choosing Nonblocking Features: Design requirements of nonblocking systems
http://queue.acm.org/detail.cfm?id=2513575
What is nonblocking progress? Consider the simple example of incrementing a counter C shared among multiple threads. One way to do so is by protecting the steps of incrementing C by a mutual exclusion lock L (i.e., acquire(L); old := C; C := old+1; release(L);). If a thread P is holding L, then a different thread Q must wait for P to release L before Q can proceed to operate on C. That is, Q is blocked by P.
Concurrency Mon, 12 Aug 2013 18:06:14 GMT Maged M. Michael 2513575

Nonblocking Algorithms and Scalable Multicore Programming: Exploring some alternatives to lock-based synchronization
http://queue.acm.org/detail.cfm?id=2492433
Real-world systems with complicated quality-of-service guarantees may require a delicate balance between throughput and latency to meet operating requirements in a cost-efficient manner. The increasing availability and decreasing cost of commodity multicore and many-core systems make concurrency and parallelism increasingly necessary for meeting demanding performance requirements. Unfortunately, the design and implementation of correct, efficient, and scalable concurrent software is often a daunting task.
Concurrency Tue, 11 Jun 2013 23:53:23 GMT Samy Al Bahra 2492433

Proving the Correctness of Nonblocking Data Structures: So you’ve decided to use a nonblocking data structure, and now you need to be certain of its correctness. How can this be achieved?
http://queue.acm.org/detail.cfm?id=2490873
Nonblocking synchronization can yield astonishing results in terms of scalability and realtime response, but at the expense of verification state space.
Concurrency Sun, 02 Jun 2013 09:33:34 GMT Mathieu Desnoyers 2490873

Structured Deferral: Synchronization via Procrastination: We simply do not have a synchronization mechanism that can enforce mutual exclusion.
http://queue.acm.org/detail.cfm?id=2488549
Developers often take a proactive approach to software design, especially those from cultures valuing industriousness over procrastination. Lazy approaches, however, have proven their value, with examples including reference counting, garbage collection, and lazy evaluation. This structured deferral takes the form of synchronization via procrastination, specifically reference counting, hazard pointers, and RCU (read-copy-update).
Concurrency Thu, 23 May 2013 13:27:44 GMT Paul E. McKenney 2488549