CMOS winner-take-all circuits: A tutorial
The human vision-processing system is built of numerous complex neural layers that communicate with one another by means of feedforward and feedback neural connections. Via these connections, each neuron frequently signals others at intra-layer or inter-layer locations by broadcasting streams of electrical pulses. Every time a neuron generates a pulse, its addressing information is sensed by a neural junction called a synapse, which is temporally connected to a centric sensory line (also known as the bus), where many other neurons are simultaneously competing for the right of way in order to travel further. In such a competition, the general rule is: The recipient neuron at the end of the bus will only listen to neurons that are active when it is active (i.e., the winners are those who have stronger and more consistent signal intensity), and ignore the rest.
A winner-take-all (WTA) circuit, which identifies the highest signal intensity among multiple inputs, is arguably the most important building block in various neural networks, fuzzy control systems, and, increasingly often, integrated image sensors and neuromorphic vision chips. Such chips aim to emulate or even outperform (a goal widely regarded with suspicion) the retina, the extremely light-sensitive layer at the back of the human eye that receives the image produced by the lens. Once the neuron (also referred to as the cell) with the highest input signal is selected by the WTA circuit, a certain value is assigned to that winning cell by means of current or voltage, while the values of all other cells are set to null (i.e., they lose).
Indeed, I’ve found that WTA circuits designed in CMOS (Complementary Metal–Oxide Semiconductor) technology are suitable for the implementation of low-power and high-density neuromorphic chips.
Recalling what our professors said in device physics courses, we find that, when an NMOS transistor’s Vgs (the potential between the gate and the source) surpasses Vthn (the threshold voltage), that is, when the effective gate-source voltage Veff = Vgs − Vthn exceeds zero, a vertical electric field between the gate and the substrate forms a conducting channel between the drain and the source, and the device is said to be operating in inversion. When the drain-source voltage (Vds) exceeds zero, a horizontal electric field is formed between the drain and the source. The drain-source current (Id) gradually increases with Vds and Vgs. (Remember your professor’s water-flowing-through-pipe analogy?)
Once Vds reaches Veff, the device enters saturation, and its Id–Vds relationship is described by the following equation (this is widely known as the square-law I-V relationship),
Id = (μn·Cox/2)·(W/L)·(Vgs − Vthn)² (1)
Here, μn is the electron mobility near the silicon surface, Cox is the gate capacitance per unit area, W is the gate width, and L is the effective channel length. If we take into account the channel-length modulation phenomenon (i.e., Early effect) [1-3], then the aforementioned equation should be rewritten as,
VE is the Early voltage and has a general expression as follows [1-3],
In addition, the Early voltage VE is usually employed to characterize a saturated MOS transistor’s small-signal output resistance rds. That is, rds ≈ VE /Id.
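As a quick numeric companion to the square-law discussion, here is a minimal Python sketch. It assumes the common textbook form in which channel-length modulation multiplies the square-law current by (1 + (Vds − Veff)/VE); that form, the function names, and the parameter values are illustrative assumptions rather than values taken from this tutorial.

```python
def saturation_current(vgs, vds, vthn, k_prime, w_over_l, v_early=None):
    """Square-law drain current of a saturated NMOS transistor.

    k_prime stands for mu_n * Cox (in A/V^2). When v_early is given, the
    ASSUMED channel-length-modulation factor (1 + (Vds - Veff) / VE)
    is applied on top of the square law.
    """
    veff = vgs - vthn
    if veff <= 0 or vds < veff:
        raise ValueError("device is not in above-threshold saturation")
    i_d = 0.5 * k_prime * w_over_l * veff ** 2
    if v_early is not None:
        i_d *= 1.0 + (vds - veff) / v_early
    return i_d


def output_resistance(i_d, v_early):
    """Small-signal output resistance, rds ~= VE / Id."""
    return v_early / i_d


# Hypothetical device: mu_n*Cox = 200 uA/V^2, W/L = 10, Vthn = 0.5 V, VE = 50 V
ideal = saturation_current(1.0, 1.0, 0.5, 200e-6, 10.0)
real = saturation_current(1.0, 1.0, 0.5, 200e-6, 10.0, v_early=50.0)
print(f"Id (square law only)   = {ideal * 1e6:.1f} uA")
print(f"Id (with Early effect) = {real * 1e6:.1f} uA")
print(f"rds                    = {output_resistance(ideal, 50.0) / 1e3:.0f} kOhm")
```

With a large Early voltage the correction is small, which is why the output curve of a saturated device looks almost flat.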
The foregoing square-law equations suffice to accurately describe an NMOS transistor operating in saturation, as long as its Veff satisfies this condition: Veff > 4VT = 4kT/q. (Sometimes an algebraically convenient alternative, Vds > 4VT, is employed by engineers in their pencil-and-paper calculations.) When an NMOS transistor’s Vgs is lower than its Vthn (that is, Veff < 0) and yet suffices to sustain a depletion region in the substrate, the transistor is said to be operating in subthreshold. (Typical subthreshold Veff values range between –100 mV and 0 V.) In such cases, the square law no longer applies.
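To put numbers on the 4VT condition, the sketch below computes the thermal voltage VT = kT/q and sorts a bias point into the regions named in the text. The label used for the range 0 ≤ Veff ≤ 4VT is our own placeholder; the text only states where the square law and the subthreshold description apply.

```python
K_BOLTZMANN = 1.380649e-23    # J/K (exact SI value)
Q_ELECTRON = 1.602176634e-19  # C   (exact SI value)


def thermal_voltage(temp_kelvin=300.0):
    """Thermal voltage VT = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def operating_region(vgs, vthn, temp_kelvin=300.0):
    """Classify a bias point by its effective gate-source voltage Veff."""
    veff = vgs - vthn
    if veff < 0:
        return "subthreshold"
    if veff > 4 * thermal_voltage(temp_kelvin):
        return "above threshold (square law applies)"
    # Placeholder label: neither description in the text covers this range.
    return "transition"


vt = thermal_voltage()
print(f"VT at 300 K = {vt * 1e3:.2f} mV, 4*VT = {4 * vt * 1e3:.1f} mV")
print(operating_region(vgs=0.45, vthn=0.5))
print(operating_region(vgs=0.80, vthn=0.5))
```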
Due to the presence of a negative Veff and the resultant diminished vertical electric field underneath the gate, carrier drift in the NMOS transistor is negligible. As a consequence, Id is principally produced by minority-carrier electrons diffusing along the concentration gradient, which is set up by the horizontal electric field created by Vds. (Minority-carrier electrons will flow from source to drain if Vds > 0, or from drain to source if Vds < 0.) In other words, in subthreshold, the depletion current (generated by diffusion) overtakes the inversion current (generated by drift).
Understanding the above, we say that a subthreshold MOS transistor behaves more like a three-terminal diode than a four-terminal field-effect device. In fact, when Vds > 0, an NMOS transistor operating in subthreshold is almost equivalent to an npn bipolar transistor, where the drain, the substrate and the source act as the collector, the base and the emitter, respectively. (When Vds < 0, the drain acts as the emitter and the source as the collector.) [1-2]
Thus, by employing classic device physics formulae appropriate for describing an npn bipolar transistor, we find the general description of Id in a subthreshold NMOS, as follows [1-2],
I0 is known as the scale current, whose value depends mainly on the transistor’s physical geometry. The symbol κ stands for the derivative of the surface potential ψs with respect to the gate voltage Vg, which describes a capacitive voltage divider consisting of a gate (or oxide) capacitance Cox and a depletion capacitance Cdepletion (κ is a constant that takes a value between 0.5 and 0.9).
As we can see, as in an npn bipolar transistor, the current flowing through a subthreshold NMOS transistor varies exponentially with the voltage differences between terminals. Moreover, when Vds > 4VT, the last term in the foregoing equation is negligible (that is, Id is almost independent of Vds), and the transistor is said to be operating in subthreshold saturation. In such a case, given a fixed Vgs, the device’s Id–Vds output curve is almost flat; the small residual dependence of Id on Vds is set by the channel-length modulation factor (λ), which is the reciprocal of the Early voltage, VE [1-3]. This particular characteristic of a subthreshold transistor is similar to that of an above-threshold one (see Equation (2)).
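The saturation behavior described above can be checked numerically. The sketch below assumes one widely used form of the subthreshold model, Id = I0·exp(κVg/VT)·(exp(−Vs/VT) − exp(−Vd/VT)), with all voltages referred to the substrate; this form and the default values of I0, κ and VT are assumptions made for illustration.

```python
import math


def subthreshold_current(vg, vs, vd, i0=1e-15, kappa=0.7, vt=0.026):
    """Drain current of a subthreshold NMOS under the ASSUMED model

        Id = I0 * exp(kappa*Vg/VT) * (exp(-Vs/VT) - exp(-Vd/VT)).

    i0, kappa and vt are hypothetical defaults. A negative result means
    the source and drain have swapped roles (Vds < 0).
    """
    return i0 * math.exp(kappa * vg / vt) * (
        math.exp(-vs / vt) - math.exp(-vd / vt)
    )


vt = 0.026
saturated = subthreshold_current(vg=0.4, vs=0.0, vd=1.0)
at_4vt = subthreshold_current(vg=0.4, vs=0.0, vd=4 * vt)
print(f"Id at Vds = 4*VT is {100 * at_4vt / saturated:.1f}% of its saturated value")
```

At Vds = 4VT the bracketed term equals 1 − e⁻⁴, so the current is already within about 2% of its saturated value, which is the reason the 4VT rule of thumb works.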
For describing a saturated subthreshold NMOS transistor, we rewrite Equation (2) as follows,
Given a fixed source potential (i.e., Vs is a constant), the NMOS transistor’s transconductance can be expressed as follows,
gm = ∂Id/∂Vg = κ·Id/VT (6)
which is similar to the expression of a bipolar transistor’s transconductance.
With the above in mind, we realize that a subthreshold MOS transistor typically has a much larger transconductance-per-unit-current value (gm/Id) than its above-threshold counterpart (assuming both are of identical geometry) [1-3]. As a result, a subthreshold MOS transistor is able to achieve a higher level of sensitivity and a larger voltage gain than its above-threshold counterpart, while both consume the same amount of active power. This property is perhaps one of the main reasons why subthreshold MOS transistors are used widely in low-power analog applications such as the CMOS WTA circuit.
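A rough comparison makes the gm/Id advantage concrete. The sketch assumes gm/Id = κ/VT in subthreshold (the bipolar-like result) and gm/Id = 2/Veff for a square-law device in saturation; the bias values are hypothetical.

```python
def gm_over_id_subthreshold(kappa, vt):
    """gm/Id of a saturated subthreshold MOS transistor: kappa / VT."""
    return kappa / vt


def gm_over_id_square_law(veff):
    """gm/Id of an above-threshold square-law transistor: 2 / Veff."""
    return 2.0 / veff


sub = gm_over_id_subthreshold(kappa=0.7, vt=0.026)
above = gm_over_id_square_law(veff=0.2)
print(f"subthreshold gm/Id    = {sub:.1f} 1/V")
print(f"above-threshold gm/Id = {above:.1f} 1/V")
print(f"ratio                 = {sub / above:.1f}x")
```

The subthreshold figure does not depend on the bias current, whereas the square-law figure keeps dropping as Veff increases.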
Finally, it is instructive to note that a general circuit (e.g., a current mirror) built of subthreshold CMOS transistors is susceptible to imperfections caused by device mismatching, such as an inaccurate current output or a reduced input dynamic range. To a CMOS WTA circuit, these imperfections do not necessarily mean a definite wrong selection, but they do hinder the circuit from tracking its winner adaptively, which more often than not is non-stationary.
The first CMOS winner-take-all (WTA) circuit was reported by . A two-cell version of the circuit is illustrated in Figure 1. It is assumed in this tutorial that corresponding constitutive elements (transistors) of all cells have identical physical geometries.
Let us begin analyzing this circuit by considering the scenario where its two current inputs are identical (Iin1 = Iin2 = I). Transistors M1a and M2a act as current sinks that sink the corresponding I currents into ground. Transistors M1b and M2b carry drain currents Ic1 and Ic2, respectively. Because M1a and M2a have identical gate (i.e., VC1 = VC2 = VC) and source potentials while sinking the same amount of current to ground, their drain potentials are identical (i.e., Vout1 = Vout2 = Vout). Thus, M1b and M2b have identical gate, source and drain potentials, resulting in identical drain currents (i.e., Ic1 = Ic2 = Ic). Assuming both cells have identical output current mirrors, we have: Iout1 = Iout2 = Iout.
Transistors of each WTA cell in Figure 1 all operate in subthreshold as previously mentioned, offering the possibility of ultra-low power dissipation, which is crucial to an integrated neural system. Now, we describe M1a and M2a by employing Equation (4), as follows,
M1b and M2b can be described in a similar manner,
Vdib stands for the drain voltage of Mib (i = 1, 2…). By solving for Vout as a function of I and Ic, we find that Vout logarithmically encodes the input current I (with an offset determined by the magnitude of Ic). This behavior implies that each WTA cell can handle a rather wide input dynamic range, because Vout1 (Vout2) varies linearly with the logarithm of Iin, between two limits (turning off and saturating M1a (M2a)), as Iin ranges over several orders of magnitude.
In addition, each cell essentially realizes a current-to-voltage converter of logarithmic input-output transfer characteristics, which may find further applications in modern fiber-optic transceivers and continuous-time analog-digital converters.
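The logarithmic encoding can be illustrated with a small sketch for the equal-input case. It assumes idealized saturated subthreshold devices with no Early effect: M1a obeys I = I0·exp(κVC/VT) and M1b obeys Ic = I0·exp((κVout − VC)/VT). These simplified device equations and the default parameter values are assumptions; the full equations of the tutorial may carry additional terms.

```python
import math


def wta_cell_voltages(i_in, i_c, i0=1e-15, kappa=0.7, vt=0.026):
    """Return (VC, Vout) of one WTA cell under the ASSUMED ideal models.

    M1a (gate VC, source grounded):  I  = I0 * exp(kappa*VC / VT)
    M1b (gate Vout, source VC):      Ic = I0 * exp((kappa*Vout - VC) / VT)
    """
    vc = (vt / kappa) * math.log(i_in / i0)
    vout = (vt * math.log(i_c / i0) + vc) / kappa
    return vc, vout


# Sweep the input over four decades at a fixed (hypothetical) Ic of 1 nA.
previous = None
for exponent in range(-11, -6):
    _, vout = wta_cell_voltages(i_in=10.0 ** exponent, i_c=1e-9)
    step = "" if previous is None else f"  (step {1e3 * (vout - previous):.1f} mV)"
    print(f"Iin = 1e{exponent} A -> Vout = {vout:.3f} V{step}")
    previous = vout
```

Under these assumptions every decade of input current moves Vout by the same amount, (VT/κ²)·ln 10, which is what a logarithmic current-to-voltage converter is expected to do.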
Next, let us consider the situation where Iin1 >> Iin2. Based on Equation (7), we find that the difference between Iin1 and Iin2 is reflected by the square-bracketed term—that is, —for M1a and M2a share the same gate voltage, VC.
Thus, Iin1 >> Iin2 translates to Vout1 >> Vout2. Assuming Vout2 is sufficiently small to effectively shut off M2b, we find that Ic2 and hence Iout2 are reduced to null. As a result, both bias currents must flow through M1b, meaning Ic1 = 2Ibias. In this way, we say that Cell 1 has won over Cell 2 and taken all bias currents (that is, Cell 1 has inhibited Cell 2), and it can be shown that the winning Vout1 encodes Iin1 logarithmically as well.
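The way the bias currents are divided can be sketched as follows. If every Mib is treated as an ideal saturated subthreshold device whose source sits on the shared VC node, then each Ic is proportional to exp(κVout/VT), and since the Ic currents must add up to the total bias current, the split becomes a normalized exponential of the output voltages. The device model, names and values are assumptions made for illustration.

```python
import math


def share_bias_current(vouts, i_bias, kappa=0.7, vt=0.026):
    """Split the total bias current n*Ibias among n cells.

    ASSUMED model: Ic_i is proportional to exp(kappa*Vout_i / VT) because
    all Mib transistors share the same source node VC, and the Ic_i sum
    to n*Ibias by Kirchhoff's current law at that node.
    """
    total = len(vouts) * i_bias
    vmax = max(vouts)  # subtract the maximum for numerical stability
    weights = [math.exp(kappa * (v - vmax) / vt) for v in vouts]
    norm = sum(weights)
    return [total * w / norm for w in weights]


for vouts in ([0.50, 0.50], [0.52, 0.50], [0.60, 0.30]):
    ic1, ic2 = share_bias_current(vouts, i_bias=1e-9)
    print(f"Vout = {vouts}: Ic1 = {ic1 * 1e9:.3f} nA, Ic2 = {ic2 * 1e9:.3f} nA")
```

Equal outputs split the current evenly, while a difference of a few hundred millivolts hands practically all of 2Ibias to the winner.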
The foregoing analysis of the winning cell is based on a large-signal assumption that Iin1 and Iin2 are sufficiently different. When Iin1 and Iin2 are very similar, that assumption no longer holds and consequently we have to do a small-signal analysis, where the Early voltage (VE) will weigh in. Now, let us assume that Iin1 = I + ∆I, Iin2 = I, and that transistors M1a, M2a, M1b and M2b all start off operating in subthreshold saturation. To understand the change in Vout1 due to ∆I, we regard M1a as a resistor rds, whose resistance is given by VE /Id as mentioned, and we get,
∆Vout1 = rds·∆I = (VE/I)·∆I (9)
Next, we investigate how M1b is going to help Cell 1 win over Cell 2 in light of ∆I. Let us first assume that VC1 node stays at the same potential despite ∆I. (This assumption is very close to reality especially when ∆I is small, since any decrease in VC2 is to be compensated by an equivalent amount of increase in VC1.) By employing Equation (6), we find the transconductance of M1b to be
gm1b = κ·Ic1/VT (10)
From the above, we see that the change in Ic1 due to ∆I is indeed given by,
∆Ic1 = gm1b·∆Vout1 = (κ·VE/VT)·(∆I/I)·Ic1 (11)
The parenthesized term in the foregoing equation stands for the gain factor that describes how sensitive the cell is with respect to the input difference. By assigning typical values to these parameters—κ = 0.5, VE = 50 V, and VT = 26 mV—we get a gain of about 962 or equivalently, 60 dB.
Due to such a large gain factor, a very small ∆I will be sufficient to cause a fairly significant increase in Ic1 (and this increase is to be supplied by the losing cell’s current source). For instance, given the aforementioned parameter values and that ∆I = 0.001Iin1, a ∆Ic1 as large as 0.962Ibias results according to Equation (11).
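The arithmetic behind these figures is easy to reproduce. The sketch assumes that the gain factor has the form κ·VE/VT (the product of M1b's transconductance and M1a's output resistance, normalized to the currents), which reproduces the value of about 962 quoted above.

```python
import math


def wta_gain(kappa, v_early, vt):
    """Small-signal gain factor of one WTA cell, ASSUMED to be kappa*VE/VT."""
    return kappa * v_early / vt


gain = wta_gain(kappa=0.5, v_early=50.0, vt=0.026)
gain_db = 20.0 * math.log10(gain)
relative_input_step = 0.001                    # delta_I / I_in1
relative_output_step = gain * relative_input_step  # delta_Ic1 / I_bias

print(f"gain             = {gain:.1f}")
print(f"gain in dB       = {gain_db:.1f} dB")
print(f"delta_Ic1/I_bias = {relative_output_step:.3f}")
```

A 0.1% input difference therefore shifts roughly 96% of one bias current toward the stronger cell, which is why the circuit resolves very small input differences.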
Finally, from Equation (3), we realize that the longer the channel of transistor M1a (or M2a), the larger the Early voltage and in turn the more sensitive the WTA circuit will be (in response to the input difference)—that is, the WTA circuit will demonstrate a steeper winning/losing response.
In concluding this section, we say that the circuit shown in Figure 1 is capable of selecting the cell with the highest input current, regardless of the extent to which the cell surpasses its competitors. In addition, it can be shown that the properties mentioned above apply to an n-cell (n > 2) WTA circuit.
WTA with Local Inhibitory Decoupling
Besides the circuit shown in Figure 1, a WTA circuit that contains local inhibitory decoupling (quite often the term local inhibitory coupling is used instead) was reported by . The purpose of the local inhibitory decoupling feature is to contain a winner’s influence within a prescribed spatial range—that is, in such a WTA circuit, a winning cell inhibits its neighboring cells but not cells that are sufficiently distant (or, decoupled) from it—resulting in a fork-like spatial impulse response. (In a sense, it has the shape of a Chinese character “SHAN”, which means “the mountain”). As reported by , resistors (which deter current) can be used in a WTA circuit to realize the inhibitory coupling functionality, creating buffers between VC nodes such that every local winner is allowed to take some, but not all the bias currents.
A slightly modified version of the said circuit is illustrated in Figure 2, where the aforementioned resistors are substituted with NMOS transistors operating in subthreshold saturation (shown as M1c and M2c in Figure 2), to achieve higher area density and lower thermal dissipation. In this circuit, the gate potentials of M1c and M2c are both controlled by the inhibition voltage, Vinhi. By employing Equation (5), we write the expression of each drain current as follows,
As we can see, the current flowing through M1c (M2c) is exponentially dependent upon Vinhi. The higher the Vinhi voltage, the more current flows toward the winner’s VC node through the local inhibitory decoupling transistors. If Vinhi reaches the highest possible potential on chip (say, Vdd), then the drain and the source of M1c are literally shorted together, resulting in zero degree of local inhibitory decoupling (i.e., identical to the circuit shown in
The aforementioned circuit idea originates from a novel configuration called the pseudo-conductance current divider (diffusor), which basically exploits the similarity between a subthreshold MOS transistor and a bipolar transistor: The current flowing in a subthreshold MOS transistor, like that in a bipolar transistor, can be divided into forward and reverse components; that is, Id = If − Ir. With this in mind, we revisit the decoupling transistor M1c, and obtain the following,
Assuming that the value of (VC1 − VC2) is sufficiently small compared to VT, we can rewrite the foregoing expression of Id1c back to Equation (12). In addition, assuming that the current-source transistor of the winning Cell 1 is operating in subthreshold, we find that,
The larger this ratio is, the less effective the decoupling will be.
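The forward/reverse decomposition can be sketched numerically. The code assumes the idealized subthreshold model in which a transistor with gate voltage Vg placed between nodes A and B carries If = I0·exp((κVg − VA)/VT) and Ir = I0·exp((κVg − VB)/VT); the model, the function name and the parameter values are illustrative assumptions.

```python
import math


def diffusor_current(v_gate, v_a, v_b, i0=1e-15, kappa=0.7, vt=0.026):
    """Return (If, Ir, If - Ir) for a subthreshold NMOS between nodes A and B.

    ASSUMED model: each component depends on the gate voltage and on one
    terminal voltage only, as in a bipolar transistor.
    """
    scale = i0 * math.exp(kappa * v_gate / vt)
    i_forward = scale * math.exp(-v_a / vt)
    i_reverse = scale * math.exp(-v_b / vt)
    return i_forward, i_reverse, i_forward - i_reverse


# Hypothetical node voltages: VC1 = 0.30 V, VC2 = 0.31 V
for v_inhi in (0.40, 0.50, 0.60):
    i_f, i_r, i_net = diffusor_current(v_inhi, v_a=0.30, v_b=0.31)
    print(f"Vinhi = {v_inhi:.2f} V: If = {i_f:.3e} A, Ir = {i_r:.3e} A, net = {i_net:.3e} A")
```

Each 100 mV increase of the gate voltage scales both components by exp(κ·0.1/VT), which is the exponential dependence on Vinhi noted earlier.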
WTA with Local Excitatory Coupling
The same methodology of current diffusor networks (as mentioned) can be adopted to analyze a third type of WTA circuit, which contains both local inhibitory decoupling and local excitatory coupling. The purpose of the local excitatory coupling feature is to slightly stimulate each locally inhibited cell, thereby increasing its intrinsic output value (hence the term excitatory). The outcome is a smoother overall spatial impulse response.
Intuitively speaking, while local inhibitory decoupling creates fork-like output characteristics, local excitatory coupling introduces an output curve that looks more like a witch’s hat—close your eyes and picture one of Harry Potter’s magical teachers in the movie—which is smoother compared to the fork but still has rather steep slopes.
A modified version of this circuit is shown in Figure 3, where the local excitatory coupling transistor (e.g., M1d) is connected between nodes Vout1 and Vout2, rather than between VF1 and VF2, facilitating the use of NMOS excitatory transistors only (otherwise, Vexci would need to be set at values higher than Vdd).
As we can see in Figure 3, unlike the previous case of local inhibitory decoupling, where M1c and the two current-source transistors form a current diffusor network, here in Cell 1, transistors M1d, M1a, and M2a form a current diffusor network. Consequently, the extent to which the local winning effect ripples laterally depends very little on the bias current (Ibias); rather, it is closely related to how much current actually passes through M1a and M2a to ground. This aspect translates to a relationship between the rate of spatial smoothing (also called spatial filtering) and voltage potentials Vexci, VC1, and VC2. Revisiting Equation (13), we find the following relations with respect to M1d, M1a, and M2a,
Assuming that Cell 1 is the local winner and that VC1 = VC2 = VC, we obtain,
The rightmost term in the preceding equation is called the WTA circuit’s space constant (λ), and it is useful for describing the trend of signal loss with respect to the distance from the local winner. Now, let us add up all excitatory currents, by applying Equation (16) recursively across the entire spatial network. After utilizing the famous Euler’s formula, we arrive at the following conclusion,
From the two preceding equations, we see that the space constant λ depends exponentially on the value of (Vexci − Vc), and that λ must be smaller than unity. As mentioned, the purpose of local excitatory coupling is to prevent the neighboring inhibited cells from dying out (too fast), and thus, for a WTA circuit with excitatory coupling to work properly, Vexci has to be set lower than Vc. Additionally, given a fixed Vc value, the larger the Vexci voltage, the slower the rate at which the inhibited cells lose, meaning that the circuit will have a wider and smoother spatial impulse response.
The above quantitative result matches an intuitive observation, which is described as follows: If Vexci is so large that all excitatory transistors are effectively shorted, then all output nodes (Vouti) share the same potential, thereby creating a situation where neither a winner nor a loser exists, which eventually defeats the purpose of the WTA circuit. On the contrary, if Vexci is sufficiently decreased such that all excitatory transistors are effectively turned off, then the circuit of Figure 2 results.
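The trend described here can be visualized with a toy model. The sketch assumes that the space constant takes the form λ = exp(κ(Vexci − Vc)/VT) and that the excitatory signal is attenuated by one factor of λ per cell moving away from the local winner; both assumptions are simplifications chosen for illustration and are not the exact expressions of the tutorial.

```python
import math


def space_constant(v_exci, v_c, kappa=0.7, vt=0.026):
    """ASSUMED form of the space constant: exp(kappa*(Vexci - Vc)/VT)."""
    return math.exp(kappa * (v_exci - v_c) / vt)


def spatial_profile(lam, n_cells):
    """Relative excitatory signal at distances 0..n_cells-1 from the winner,
    ASSUMING a geometric decay of lam per cell."""
    return [lam ** k for k in range(n_cells)]


v_c = 0.50  # hypothetical common VC, in volts
for v_exci in (0.40, 0.45, 0.49):
    lam = space_constant(v_exci, v_c)
    profile = ", ".join(f"{p:.3f}" for p in spatial_profile(lam, 5))
    print(f"Vexci = {v_exci:.2f} V: lambda = {lam:.3f}, profile = [{profile}]")
```

As Vexci approaches Vc the factor approaches unity and the profile flattens, in line with the wider and smoother response described above. For λ < 1 the geometric series also sums to the finite value 1/(1 − λ).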
Hysteretic WTA with Local Excitatory Feedback
Many CMOS integrated circuit designers are familiar with the technique of utilizing an asymmetrical decision circuit to add hysteretic properties to a CMOS voltage comparator. In a typical CMOS WTA circuit, which in essence is a resettable multiple-input current comparator, similar hysteretic properties can be realized by means of local excitatory feedback [10-12]. More often than not, local excitatory feedback is used in combination with the aforementioned current diffusing (or distributing) mechanism [6-8] to create an “all-in-one” sort of WTA circuit topology. An example of such circuits is illustrated in Figure 4, which was originally reported by  and further discussed by .
Looking into Figure 4, we identify two aspects that are not employed in any previous configuration: Diode-source degeneration implemented by means of transistors M1S and M2S, and local excitatory feedback implemented by means of transistors M1F and M2F. The former is a fairly straightforward setup, for it is well known that connecting a diode-connected transistor to a common-source single-stage amplifier increases the amplifier’s output impedance; that is, the ratio (∆Vout/∆I) is increased due to M1S (M2S), thereby producing a narrower losing response. The latter is responsible for introducing a hysteretic behavior to the WTA circuit.
Specifically, if Cell 1 becomes the winner, then a large amount of current will flow through M1b. Due to the PMOS current mirror that includes M1F, a small portion of the large M1b current, whose exact amount (Ifb) can be adjusted by sizing the PMOS transistors, is positively fed back to the output node Vout1 (as the arrow shows). In other words, Ifb is added to the original input current Iin1. As a result, during the next selection, the condition for Cell 1 to be de-selected will be: There must be a potential winner whose input current is larger than (Iin1+Ifb). That is, Ifb is the hysteretic current.
Similar to the case of a hysteretic voltage comparator, the hysteretic current Ifb effectively spares a current-mode WTA from making erroneous selections under the influence of noise, device mismatch, or dc offset.
In addition, thanks to the existing local excitatory coupling configurations (refer to M1d and M2d in Figure 4), the hysteretic current Ifb is distributed to the neighboring cells’ input nodes. However, knowing that to any cell an increased input current generally means a higher VC, we find based on Equation (16) and Equation (17) that in a sense, Ifb works against local excitatory coupling by distributing hysteretic protection to the neighboring cells, which helps the existing local winner maintain its winning status as it shifts from one spatial position to another. Therefore, there is a trade-off to balance by appropriately sizing the transistors (especially the PMOS current mirror and the coupling transistor). This particular property is instrumental to applications that require constant tracking of the strongest input signal, which is non-stationary [13-14].
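The de-selection rule can be captured by a small behavioral model. This is only a sketch of the selection logic stated above (a challenger must exceed Iin of the current winner plus Ifb); it ignores circuit dynamics and the distribution of Ifb to neighboring cells, and the class name is hypothetical.

```python
class HystereticWTA:
    """Behavioral model of a WTA whose current winner receives a bonus Ifb."""

    def __init__(self, i_fb):
        self.i_fb = i_fb
        self.winner = None  # index of the current winner, if any

    def select(self, inputs):
        """Return the index of the winning cell for this set of input currents."""
        effective = list(inputs)
        if self.winner is not None:
            # Local excitatory feedback: Ifb is added to the winner's input.
            effective[self.winner] += self.i_fb
        self.winner = max(range(len(effective)), key=effective.__getitem__)
        return self.winner


wta = HystereticWTA(i_fb=0.1)
print(wta.select([1.0, 0.9]))   # cell 0 wins outright
print(wta.select([1.0, 1.05]))  # cell 1 is larger, but not by more than Ifb
print(wta.select([1.0, 1.2]))   # cell 1 now exceeds Iin1 + Ifb and takes over
```

A noisy input that exceeds the winner by less than Ifb does not flip the selection, which is the protection against noise, mismatch and offset mentioned in the text.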
More WTA Circuits
Quite a few novel CMOS WTA circuits have been reported in the literature since the publication of , most of which are endowed with their respective strengths and weaknesses (in speed, resolution, power, or density).
Figure 5 illustrates a practical CMOS circuit that may function as either a regular WTA (i.e., there is only one winner) or a compromised WTA (i.e., a soft-max configuration where there can be multiple winners simultaneously), depending on the adjustable voltage VA. Specifically, when VA is smaller than all output voltages Vouti, the circuit performs soft-max computations; otherwise, the diode-connected transistors M1G and M2G are effectively turned off, and thus the circuit performs WTA computations.
Intuitively speaking, connecting a diode-connected transistor to node Vout1 is similar to paralleling M1a with a physical resistor (remember that we regarded M1a as a resistor rds when analyzing the first WTA circuit), and in effect, that changes Vout1 from a high-impedance node (as in WTA configurations) into a low-impedance node (as in soft-max configurations). Given a fixed conductance of M1b, such a decrease in nodal impedance hinders the cell with the strongest input from actually winning over the others, thereby reducing the likelihood of an all-time single-cell dictatorship. For brevity, quantitative circuit analysis is not included here, and the reader is referred to the said paper for more details.
Another interesting WTA topology is illustrated in Figure 6. Here, only one cell is shown for simplicity. The underlying idea is similar to that of  or ; that is, not only is local excitatory feedback (through M9, M6, and M5) adopted to add hysteresis to the circuit, but a so-called local inhibitory feedback (through transistors M3, M7, and M8) is also employed here. In essence, local inhibitory feedback is a form of intra-cell local inhibitory decoupling (as opposed to the aforementioned inter-cell one) with a dependency on the cell’s output voltage. When Vreset is set to “1”, both transistors M3 and M5 are effectively turned off, thereby cancelling out the feedback current parameters (excitatory and inhibitory) from the previous selection. During the next cycle, Vreset is set to “0” till a winner is selected.
Besides current-mode WTA circuits, additional works on voltage-mode WTA design have been reported [17-19]. In particular, the authors of  explored the possibility of eliminating the circuit’s dependence upon device matching characteristic, by employing an inverter-based current comparator as the core for each WTA cell. To probe further, perhaps one should investigate the feasibility of substituting the said inverter with a CMOS Schmitt trigger, which adds hysteretic properties into the circuit and enhances the overall robustness.
More often than not, it is desirable to select and track the loser (i.e., the cell with the smallest input) rather than the winner in a neural network. By subtracting the winner from a fixed reference, we can get the loser. However, in the analog domain, subtraction and addition are two functionalities subject to circuit-implementation problems, such as loss of accuracy and reduction of input/output dynamic range.
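The subtraction-based route to the loser can be stated in a few lines. The sketch is purely behavioral: it shows that the winner among (reference − input) is the loser among the inputs, provided the reference is at least as large as every input. The function names are hypothetical, and the analog accuracy issues mentioned above are not modeled.

```python
def winner_index(values):
    """Index of the largest value (ideal winner-take-all selection)."""
    return max(range(len(values)), key=values.__getitem__)


def loser_via_reference(values, reference):
    """Find the smallest input by running a WTA on (reference - input)."""
    if reference < max(values):
        raise ValueError("reference must not be smaller than any input")
    complemented = [reference - v for v in values]
    return winner_index(complemented)


inputs = [3.0, 1.0, 2.0]
print(winner_index(inputs))              # index of the winner
print(loser_via_reference(inputs, 5.0))  # index of the loser
```

In a real circuit the reference limits the usable input range, which is one facet of the dynamic-range problem noted in the text.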
A CMOS loser-take-all (LTA) circuit that does not require subtraction in the analog domain is shown in Figure 7. The circuit operates as follows. Assuming that Cell 1 is the potential loser, we find that Vout1 is the smallest output, and hence M1b is turned off. As a result, the PMOS transistor M1m is turned on, and the bias current supplied by the global bias cell (Ibias) flows through M1m and M1n to ground. Transistor M1n’s non-zero drain-source voltage serves as the digital output pulse, broadcasting across the network that Cell 1 has been selected as the loser. In addition, the loser’s input current Iin1 is copied to the signal path composed of transistors MP, MA, and MS, which is read out in the analog domain as the minimum input value.
In conclusion, to create CMOS winner-take-all (WTA) circuits, one must understand the essential properties of subthreshold MOS transistors and be familiar with the various CMOS WTA and loser-take-all (LTA) circuits.