
Multicore architectures, Part 3 - Communications and memory

Frank Schirrmeister, Cadence Design Systems - July 16, 2013

Editor's Note: Multicore architectures find use across a diverse range of applications thanks to their performance and efficiency. By combining several general-purpose MCU cores -- or MCU cores and specialized cores such as DSPs -- IC manufacturers can deliver devices well tuned to specific application requirements. Real World Multicore Embedded Systems brings together specialists offering the latest thinking on each facet of a multicore architecture. This excerpt offers an in-depth review and discussion on the nature of multicore architectures themselves. In Part 1, the author led us through a thoughtful review of the key motivations behind those architectures. Part 2 offered a review of the key characteristics of multicore architectures.


Adapted from "Real World Multicore Embedded Systems, 1st Edition", B Moyer, Editor (Newnes)


Communication architectures
The second fundamental aspect of multicore processors and systems is the way processors communicate with each other and with memory, on and off chip.

In the compute space, off-chip communication takes one of two approaches: either dedicated links connect the chips to each other directly, or a central memory controller mediates between them.

HyperTransport is a generalized point-to-point I/O interface that has been enhanced to support cache coherency. As an example, it is used for I/O in AMD’s Athlon chips, and also in the nVidia nForce 500, 600 and 700 series. It is also used for I/O and non-uniform memory access (NUMA) support in AMD’s Opteron chips, which lets designers create a multiple-chip system without any intervening chips like the memory host controllers found in other multi-chip solutions. Programming of those systems is influenced by the number of hops between chips and the frequency of multi-hop memory accesses, which is application- and software-environment-specific.
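To make the effect of hop count concrete, here is a minimal sketch of a back-of-the-envelope cost model. It assumes that each extra hop adds a fixed latency on top of a local access, which is a simplification and not something the source states. The function name `expected_latency_ns`, its parameters, and the numbers in the usage note are hypothetical and chosen only for illustration.

```python
def expected_latency_ns(hop_fractions, local_ns, per_hop_ns):
    """Average memory access latency under an assumed linear hop-cost model.

    hop_fractions[h] is the fraction of accesses that traverse h hops
    (index 0 means local memory). All values are illustrative assumptions.
    """
    if abs(sum(hop_fractions) - 1.0) > 1e-9:
        raise ValueError("hop_fractions must sum to 1")
    # Each access costs the local latency plus a fixed penalty per hop.
    return sum(
        fraction * (local_ns + hops * per_hop_ns)
        for hops, fraction in enumerate(hop_fractions)
    )
```

For example, if half of all accesses are local and half cross one hop, `expected_latency_ns([0.5, 0.5], 100, 50)` averages a 100 ns local access with a 150 ns one-hop access. Real systems may not scale linearly with hops, so such a model is at best a starting point for deciding where to place data.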

In contrast, Intel keeps the processor chips simpler by leaving the memory and I/O interface management up to a central memory host controller.

As with off-chip interconnect, there are various options for interconnect between processors on a single die. Figure 3-8 shows the most basic approach of a central bus system, which may be hierarchical. It connects several masters with peripherals and memory.


Figure 3-8. Classic SoC with shared bus.

Software programmability depends heavily on bus availability and on the locality of data for tasks running on the individual processors. If memory accesses are not guarded correctly, data races and misused locks (addressed in more detail in the software synchronization chapter) can lead to deadlocks and to functionally incorrect systems. The most common issues are as follows:

  • Data races occur when two or more threads or processors are trying to access the same resource at the same time, where at least one of them is changing its state. If the threads or processors are not synchronized effectively, it is impossible to know which one will access the resource first. This leads to inconsistent results in the running program. 
  • Stalls happen when one thread or processor locks a certain resource and then moves on to other work in the program without first releasing the lock. When a second thread or processor tries to access that resource, it is forced to wait for a possibly infinite amount of time, causing a stall. Even if the resource is not locked indefinitely, stalls can cause severe performance issues, because a processor is poorly used while the thread running on it is stalled.
  • Deadlocks are similar to a stall, but occur when using a locking hierarchy. If, for example, Thread 1 or Processor 1 locks variable or memory region A and then wants to lock variable or memory region B while Thread 2 or Processor 2 is simultaneously locking variable or memory region B and then trying to lock variable or memory region A, the threads or processors are going to deadlock.
  • False sharing is not necessarily an error in the program, but an issue affecting performance. It occurs when two threads or processors are manipulating different data values that lie on the same cache line. On an SMP system, the memory system must ensure cache coherency and will therefore invalidate the line and transfer it between the processors’ caches every time one of them writes to it, even though the threads never touch each other’s data.
  • Memory corruption occurs when a program writes to an incorrect memory region. This can happen in serial programs but it is even more difficult to find in parallel ones.
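
Two of the issues above, data races and deadlocks, can be illustrated with a short sketch. It uses Python threads as a stand-in for processors; this is an illustrative choice, not the author's method. All names (`counter`, `add_guarded`, `with_both`, `lock_a`, `lock_b`) are hypothetical. The first half guards a shared counter with a lock so that the read-modify-write cannot interleave. The second half avoids the A-then-B versus B-then-A deadlock by always acquiring the two locks in one global order.

```python
import threading

# Hypothetical shared state guarded by one lock (names are illustrative).
counter = 0
counter_lock = threading.Lock()

def add_guarded(n):
    """Increment the shared counter n times, holding the lock per update."""
    global counter
    for _ in range(n):
        # 'with' releases the lock on every exit path, so the lock is
        # never left held while the thread moves on to other work.
        with counter_lock:
            counter += 1

def run_counter(workers=4, n=10_000):
    """Run several workers on the shared counter; return the final value."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_guarded, args=(n,))
               for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

# Two resources that different threads request in opposite orders.
lock_a = threading.Lock()
lock_b = threading.Lock()

def with_both(first, second, action):
    """Acquire two distinct locks in one global order, then run action.

    Sorting by id() is an assumed ordering rule; any fixed total order
    shared by all threads would serve the same purpose.
    """
    low, high = sorted((first, second), key=id)
    with low:
        with high:
            action()

def run_ordered(rounds=1_000):
    """Two threads ask for the locks in opposite orders without deadlock."""
    results = []

    def worker(first, second, tag):
        for _ in range(rounds):
            with_both(first, second, lambda: results.append(tag))

    t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 1))
    t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 2))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return len(results)
```

Because every update happens under `counter_lock`, `run_counter(4, 10_000)` returns 40000 regardless of how the threads are scheduled. In `run_ordered`, both threads end up locking in the same order even though they name the locks in opposite orders, so neither can hold one lock while waiting for the other. The helper assumes that the two locks are distinct objects.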

