# EDN Access--01.02.97 A Minus B=A+NOT(B)+1 (Part 2)

Design Features, January 2, 1997

## A Minus B=A+NOT(B)+1 (Part 2)

### Clive "Max" Maxfield, Intergraph Computer Systems

*In Part 2, we discover the grisly reasons why an ALU contains only a 1's complementor, instead of the 2's complementor you expect to find there.*

The first half of this article ("A minus B=A+NOT(B)+1 (Part 1)," *EDN*, Dec 5, 1996) introduced the concept of signed binary numbers, in which the MSB represents a negative quantity, as opposed to simply representing a plus or minus sign. The advantage of the signed binary format for addition operations is that you can always directly add these numbers together to provide the correct result in one operation, irrespective of whether they represent positive or negative values. That is, you perform the operations a+b, a+(-b), (-a)+b, and (-a)+(-b) in exactly the same way, by simply adding the two values together. This method results in adders that you can construct using a minimum number of logic gates and that are fast, lean, and mean.

For subtraction operations, Part 1 also noted that you can perform an equation in decimal arithmetic, such as 10-3=7, by negating the right-hand value and inverting the operation; that is, 10+(-3)=7. This technique also works with signed binary arithmetic, although you perform the negation of the right-hand value by taking its 2's complement rather than by simply changing its sign. For example, in the case of a generic signed binary subtraction represented by a-b, generating the 2's complement of b results in -b, allowing you to perform the operation as an addition: a+(-b). Thus, you perform the operations a-b, a-(-b), (-a)-b, and (-a)-(-b) in exactly the same way, by taking the 2's complement of b and adding the result to a, irrespective of whether a or b represents a positive or negative value. This approach means that computers do not require two different blocks of logic (one to add numbers and another to subtract them); instead, they require only an adder and some way to generate the 2's complement of a number, which tends to make life a lot easier.
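As a minimal sketch of this idea in Python, assuming 8-bit words, the decimal example 10-3 can be carried out as an addition. The helper names `twos_complement`, `to_signed`, and `subtract` are illustrative only and do not come from the article.

```python
MASK = 0xFF  # assumption: 8-bit words

def twos_complement(b):
    """Return the 8-bit 2's complement of b (invert all bits, then add 1)."""
    return ((b ^ MASK) + 1) & MASK

def to_signed(x):
    """Interpret an 8-bit pattern as a signed binary number (MSB weighs -128)."""
    return x - 256 if x & 0x80 else x

def subtract(a, b):
    """Compute a - b as a + (-b), discarding any carry out of bit 7."""
    return (a + twos_complement(b)) & MASK

result = subtract(10, 3)  # 10 + (-3)
```

The same `subtract` call handles negative operands unchanged: passing the bit patterns for -3 and -2 yields the bit pattern for -1.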

However, as Part 1 further noted, an examination of a computer's ALU reveals that there isn't a 2's complementor in sight, but, instead, a 1's complementor glares balefully at you from the ALU's nest of logic gates. So, our mission here (should we decide to accept it) is to investigate the way in which an ALU performs 2's complement arithmetic without the benefit of a 2's complementor logic block.

#### The ALU

The heart (or, perhaps, the guts) of the CPU is the ALU, where all of the number crunching and data manipulation take place. For these purposes, we'll assume you're using a computer whose data bus is 8 bits wide, and whose ALU therefore works with 8-bit chunks of data (**Figure 1**).

The ALU accepts two 8-bit words (A[7:0] and B[7:0]) as input, "scrunches" them together using some arithmetic or logical operation, and outputs an 8-bit result known as F[7:0]. Whatever operation you perform on the data is dictated by the pattern of logic 0s and 1s fed into the ALU's instruction inputs. For example, one pattern may instruct the ALU to add A[7:0] and B[7:0] together, and another pattern may request the ALU to logically AND each bit of A[7:0] with the corresponding bit in B[7:0].

Note that the ALU is completely asynchronous, which means that it is not controlled directly by the main system's clock. As soon as you present any changes to the ALU's data, instruction, or carry-in inputs, these changes immediately start to ripple through the ALU's logic gates and eventually appear at the data and status outputs.

#### The "core" ALU

The number of instruction bits required to drive the ALU depends on the number of functions you require it to perform: You can use two bits to represent four different functions, three bits can represent eight functions, and so forth. Consider the ALU as having layers like an onion, and you can visualize the core of an extremely rudimentary ALU as performing only five simple functions (Table 1).
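The relationship between function count and instruction bits is a base-2 logarithm rounded up. A small sketch, in which `instruction_bits` is a hypothetical helper name:

```python
def instruction_bits(num_functions):
    """Smallest number of instruction bits that gives each function its own pattern."""
    return (num_functions - 1).bit_length()

bits_for_core = instruction_bits(5)  # the five core functions
```

Five functions do not fit in two bits (four patterns), so the core needs three, leaving three of the eight patterns unused.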

The instruction-bit patterns you assign to these functions are not important at this time; suffice it to say that the five functions in the table only require three instruction bits. But, implementing a core ALU to perform these tasks is really not very complex. First, consider how you can implement the AND function, which requires only eight two-input AND gates (**Figure 2**).

Similarly, the OR function requires only eight two-input OR gates, and the XOR function requires eight two-input XOR gates. Things are a little more complex for the ADD function, but not unduly so. For example, a basic 8-bit ripple-through adder requires only a total of 16 AND gates and 24 XOR gates, plus an additional XOR gate to generate the overflow output (the overflow output is generated by XORing the carry-in and carry-out associated with the MSB of the result). Similarly, the compare (CMP) is a little more complex than the primitive logical functions, but nothing that a few handfuls of cunningly connected gates can't handle. Thus, it doesn't take long to generate the five core functions as individual entities (**Figure 3**).
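The gate counts quoted for the adder are consistent with a full adder built from two AND gates and three XOR gates per bit. The article does not show the gate-level circuit, so the following Python model is only one plausible decomposition, and the names `full_adder` and `ripple_add8` are assumptions.

```python
def full_adder(a, b, ci):
    """One-bit full adder modeled with 2 AND gates and 3 XOR gates."""
    p = a ^ b    # XOR 1: the two inputs differ
    s = p ^ ci   # XOR 2: sum bit
    g = a & b    # AND 1: both inputs are 1
    t = p & ci   # AND 2: carry-in passes through
    co = g ^ t   # XOR 3: g and t are never both 1, so XOR behaves like OR
    return s, co

def ripple_add8(a, b, ci=0):
    """Add two 8-bit words bit by bit; return (sum, carry_out, overflow)."""
    total = 0
    carry = ci
    carry_into_msb = 0
    for i in range(8):
        if i == 7:
            carry_into_msb = carry  # remember the carry entering bit 7
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    overflow = carry_into_msb ^ carry  # the one additional XOR gate
    return total, carry, overflow
```

Eight copies of `full_adder` account for 16 AND and 24 XOR operations, and the final XOR produces the overflow signal described in the text.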

Note that you can directly connect the overflow output from the adder (Oadd) to the main overflow output (O) coming out of the core ALU. However, you cannot directly drive the carry-in input (CIadd) to the ADD function from the main CI input feeding the core ALU. Similarly, the carry-out output from the adder (COadd) does not drive the main carry-out output (CO) from the ALU, because we have other plans for these signals.

For these purposes, consider the inputs to the CMP function to be signed binary numbers. Also, the AgtB output is driven to logic 1 if A[7:0] is greater than B[7:0], while the AeqB output is driven to logic 1 if A[7:0] is equal to B[7:0].

At this stage in the proceedings, you can implement the core-ALU functions in isolation (although, admittedly, I've skimped on some of the nitty-gritty details). The next point to ponder is the means by which to "glue" them all together to form the ALU itself; one possible approach is to hurl a multiplexer into the cauldron of logic gates, stir things up a little, and see what develops (**Figure 4**).

In this scenario, you use two of the instruction bits (which can represent four patterns of 0s and 1s) to control a 4:1 multiplexer, in which each of the input channels feeding the multiplexer is 8 bits wide. The A[7:0] and B[7:0] signals are presented to all of the functions, but only the outputs from the function of interest are selected. The reason you need only a 4:1 multiplexer is that the fifth function, the CMP, only outputs status information, but doesn't generate any data as such.
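A behavioral sketch of this arrangement follows. Because the article leaves the instruction-bit patterns unspecified, the channel numbering used here (0 for AND through 3 for ADD) is purely an assumption, as is the function name `core_alu_data`.

```python
def core_alu_data(a, b, ci, sel):
    """Compute every function in parallel, then let a 4:1 multiplexer pick one."""
    channels = [
        a & b,                 # channel 0: AND
        a | b,                 # channel 1: OR
        a ^ b,                 # channel 2: XOR
        (a + b + ci) & 0xFF,   # channel 3: ADD (carry-out is handled separately)
    ]
    return channels[sel & 0b11]  # two instruction bits select the channel
```

All four results exist on every call, which mirrors the hardware: A[7:0] and B[7:0] reach every function block, and only the multiplexer decides which 8-bit result appears on F[7:0].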

The advantage of this multiplexer-based approach is that it's very easy to understand. However, it's unlikely that you'll use it in a real-world implementation because you only want to be able to perform one function at any particular time (at least in the case of this rudimentary unit), so you should examine the functions to find areas of commonality that allow you to share gates between them. To put this another way, instead of having multiple distinct functions feeding a multiplexer, it's likely that you would lose the multiplexer and "scrunch" all of the functions together into one "super function," thereby allowing you to reduce the ALU's total gate count and increase its speed. On the other hand, there's nothing intrinsically wrong with this multiplexer-based technique, so it's what we'll play with here.

Now, you know how to implement the data-processing portion of the core ALU, but you also must decide how to use the AgtB, AeqB, Oadd, and COadd signals, and also how to generate the CO, O, N, and Z status outputs (**Figure 5**).

The "negative" (N) status output is the easiest, because it's simply a copy of the MSB of the data outputs (that is, F[7]). Things get a little more complicated when you come to the "zero" (Z) output, because it depends on the type of operation that the ALU is performing. In the case of the AND, OR, XOR, and ADD functions, the zero output is set to logic 1 if the result from the operation is all 0s. You can create an internal signal called "Zint" to implement this approach by simply feeding all of the F[7:0] data outputs into an 8-input NOR gate, whose output is logic 1 only when every input is logic 0. However, in the case of the CMP function, you set the Z output to logic 1 if the two data values A[7:0] and B[7:0] are equal (this approach is represented by the AeqB signal coming out of the CMP block).

The bottom line is that you have a single output, Z, which should reflect the state of one of two signals, Zint and AeqB, depending on the function being performed. You can achieve this goal by feeding Zint and AeqB into a 2:1 multiplexer, whose select input could be controlled by a third instruction bit driving the core ALU. Similarly, you usually want the CO status output to reflect the carry-out from the ADD function on its COadd signal; but if you are performing a CMP instruction, you want the CO signal set to logic 1 if the unsigned binary value on A[7:0] is greater than that on B[7:0]. Once again, you can achieve this goal by feeding both the COadd and AgtB signals into a 2:1 multiplexer controlled by the third instruction bit.
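The status logic described above can be sketched as follows. The parameter `is_cmp` stands in for the third instruction bit, and both it and the function name `status_flags` are hypothetical; deciding which flags a given instruction actually updates is left outside this sketch.

```python
def status_flags(f, co_add, o_add, a_gt_b, a_eq_b, is_cmp):
    """Derive the CO, O, N, and Z outputs from the core ALU's internal signals."""
    n = (f >> 7) & 1                      # N is a copy of F[7]
    z_int = 1 if (f & 0xFF) == 0 else 0   # Zint: 1 only when all eight F bits are 0
    z = a_eq_b if is_cmp else z_int       # 2:1 multiplexer selecting AeqB or Zint
    co = a_gt_b if is_cmp else co_add     # 2:1 multiplexer selecting AgtB or COadd
    o = o_add                             # O is wired directly from Oadd
    return {"CO": co, "O": o, "N": n, "Z": z}
```

For example, an ADD whose result is $00 with a carry-out reports CO=1 and Z=1, whereas a CMP ignores the adder's carry and reports AgtB on CO and AeqB on Z.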

**Table 1: Five functions of a rudimentary ALU core**

| Function | F[7:0] output | Flags modified |
| --- | --- | --- |
| Logical AND | A[7:0] & B[7:0] | N, Z |
| Logical OR | A[7:0] \| B[7:0] | N, Z |
| Logical XOR | A[7:0] ^ B[7:0] | N, Z |
| Addition (ADD) | A[7:0] + B[7:0] + CI | CO, O, N, Z |
| Compare (CMP) | A[7:0] U B[7:0] | CO, Z |

#### Extending the core ALU

Thus far, you have a core ALU that can perform five simple functions, but the CPU requires more. For example, the core ALU has an ADD function that can add two 8-bit signed binary numbers together (along with the carry-in status input), but the CPU needs to be able to perform both additions and subtractions in the form of the instructions introduced in Part 1 of this article: "add without carry" (ADD), "add with carry" (ADDC), "subtract without carry/borrow" (SUB), and "subtract with carry/borrow" (SUBC). This requirement means that you need to extend the core ALU in strange and wondrous ways, such that you can control the value being fed into the B[7:0] inputs by means of a complementor block (**Figure 6**).

Before delving into this new complementor block, it may be best to determine exactly what you want it to do. In the case of instructions such as AND, OR, XOR, ADD, and ADDC, you want the new block to pass whatever value is on the BB[7:0] inputs directly through to its outputs without any modification. However, in the case of SUB and SUBC instructions, the new block negates (generates the 1's complement of) the value on the BB[7:0] inputs before passing it on to the core ALU (**Figure 7**).

Thus, the complementor block contains only two functions: a 2:1 multiplexer and a negator, where the negator simply comprises eight NOT gates, one for each signal in the data path. If the pattern on the instruction bits represents an operation such as AND, OR, XOR, ADD, or ADDC, you should decode them in such a way that they cause the 2:1 multiplexer in the complementor block to select the value on BB[7:0]. By comparison, a SUB or SUBC causes the multiplexer to select the outputs from the negator, whose value is the inverse of that found on BB[7:0].
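In behavioral terms the complementor block is tiny. In the sketch below, the `subtract` flag is an assumed stand-in for the decoded instruction bits, which the article does not specify.

```python
def complementor(bb, subtract):
    """Pass BB[7:0] through unchanged, or select its 1's complement for SUB/SUBC."""
    negated = ~bb & 0xFF   # the negator: eight NOT gates, one per bit
    passed = bb & 0xFF     # straight-through path
    return negated if subtract else passed  # the 2:1 multiplexer
```

Feeding in $03 gives $03 for an ADD-type instruction and $FC for a SUB-type instruction.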

One question that is probably on your lips is: "Why use a negator (which generates the 1's complement of the value on BB[7:0]) instead of a 2's complementor?" After all, Part 1 devoted a lot of effort to singing the praises of 2's complement vs 1's complement representations. The answer to this question will make you begin to appreciate the wily ways of the CPU designer, but first let's review:

- You can perform a subtraction, such as a-b, by converting b into its negative equivalent and then performing an addition; that is, another way to represent a-b is to say a+(-b).
- You can convert b to -b by generating its 2's complement value.
- You already have an ADD function in the core ALU.

One way to perform a simple SUB operation is to force the CIadd input to the ADD function in the core ALU to a logic 0 value, and to feed the BB[7:0] inputs through a 2's complementor function that you can create in the new block if you wish. However, generating the 2's complement of a binary value requires you to invert all of the bits and then add 1 to the result (**Figure 8**).

Remember that you're not actually going to use a 2's complementor; this portion of the discussion is simply intended to "set the scene." The process of inverting the bits using the negator is easy (you can simply feed each bit through a NOT gate), but adding 1 to the result requires you to build a second 8-bit adder, which requires a substantial amount of additional logic gates. Given a choice, it is preferable not to have two adder blocks in our ALU, but what can you do?

In fact, the answer is fairly obvious. Consider the methods for generating a 2's complement (**Figure 8**). The first method requires you to force the new adder's CI input to a logic 0 and to connect its second set of inputs to a hard-wired $01 value (where "$" indicates a hexadecimal value) (**Figure 8a**). Alternatively, you can achieve the same effect by connecting the second set of inputs to a hard-wired $00 value and forcing the CI input to our new adder to a logic 1 (**Figure 8b**).
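The two wiring options can be compared directly in a short Python model. The function names are illustrative, with `twos_comp_8a` and `twos_comp_8b` named after the figure panels they are meant to mimic.

```python
def adder8(x, y, ci):
    """Behavioral 8-bit adder; the carry out of bit 7 is discarded."""
    return (x + y + ci) & 0xFF

def twos_comp_8a(b):
    """Figure 8a style: second input hard-wired to $01, CI forced to 0."""
    return adder8(~b & 0xFF, 0x01, 0)

def twos_comp_8b(b):
    """Figure 8b style: second input hard-wired to $00, CI forced to 1."""
    return adder8(~b & 0xFF, 0x00, 1)
```

Both functions return the same pattern for every 8-bit input, because the extra 1 reaches the sum either way; only the route it takes into the adder differs.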

Hmmm, just a cotton-pickin' minute. We noted earlier that, if you decided to use a 2's complementor for a simple SUB operation, you have to force the CIadd carry-in signal to the adder in the core ALU to logic 0, but this statement seems redundant. Let's combine the 2's complementor in **Figure 8b** with the main adder in the core ALU (note the omission of the multiplexer in the complementor block to simplify the issue) (**Figure 9**).

Does anything leap out at you from this figure? The CI input to the adder in the 2's complementor is being forced to logic 1, and the CIadd input to the adder in the core ALU is being forced to a logic 0. It doesn't take long to realize that you can achieve exactly the same effect by forcing the CI input to the adder in the 2's complementor to a logic 0 and forcing the CIadd input to the adder in the core ALU to a logic 1.

This is the clever part: If you force the CI input to the adder in the 2's complementor to a logic 0, this adder isn't actually doing anything at all. That is, you end up with a block that adds one set of inputs (the outputs from the negator) to a value of zero with a carry-in of zero. Because any value plus zero equals itself, why do you need this adder? You don't! If you force the CIadd input to the adder in the core ALU to a logic 1, you can simply lose the adder from the 2's complementor. This is why **Figure 7** shows the complementor block as containing only a negator and a multiplexer. In summary:

1. To perform a SUB function (without a carry-in), you need to perform the operation a[7:0]-b[7:0];
2. This operation is equivalent to a[7:0]+(-b[7:0]);
3. You can generate (-b[7:0]) by taking the 2's complement of b[7:0] using (NOT(b[7:0])+1), which allows you to perform the operation a[7:0]+(NOT(b[7:0])+1);
4. But, you don't want to use a 2's complementor, because using it requires too many gates. Instead, you want to use only a negator (1's complementor). The 1's complement of b[7:0] is NOT(b[7:0]), which results in the operation a[7:0]+(NOT(b[7:0]));
5. You know that the 2's complement of a number equals its 1's complement +1, so you can also say that the 1's complement of a number equals the 2's complement -1. This approach means that a[7:0]+(NOT(b[7:0])) is equivalent to a[7:0]+(-b[7:0]-1);
6. So, forcing the CI input to the adder in the core ALU to logic 1 means that the operation you actually perform is a[7:0]+(-b[7:0]-1)+1.

If you cancel out the -1 and +1 in step 6, you're left with an identical expression to that shown in step 2, which is what you wanted in the first place. This explains why you only require a negator (1's complementor) in the complementor block, because forcing a logic 1 onto the carry-in input to the adder in the core ALU allows you to take the 1's complement output from the negator and convert it into a full 2's complement value. So, now you know why an ALU only contains a 1's complement block instead of a 2's complementor. And, it's as simple as that…phew!
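To check the whole argument numerically, the sketch below compares the two-adder arrangement with the negator-only arrangement across every pair of 8-bit inputs. The function names are assumptions made for illustration, and the adder is modeled behaviorally rather than gate by gate.

```python
MASK = 0xFF  # assumption: 8-bit data path

def adder8(x, y, ci):
    """Behavioral 8-bit adder; the carry out of bit 7 is discarded here."""
    return (x + y + ci) & MASK

def sub_two_adders(a, b):
    """2's complementor (its CI forced to 1) feeding the main adder (CIadd forced to 0)."""
    neg_b = adder8(~b & MASK, 0x00, 1)
    return adder8(a, neg_b, 0)

def sub_one_adder(a, b):
    """Negator only: the +1 arrives through the main adder's CIadd input."""
    return adder8(a, ~b & MASK, 1)

# Compare both arrangements against ordinary subtraction for all 65,536 input pairs.
all_match = all(
    sub_two_adders(a, b) == sub_one_adder(a, b) == (a - b) & MASK
    for a in range(256)
    for b in range(256)
)
```

If `all_match` is true, the second adder contributes nothing that the carry-in of the main adder cannot supply, which is the point of A minus B = A + NOT(B) + 1.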

Author's biography

Clive "Max" Maxfield is a member of the technical staff at Intergraph Computer Systems (Huntsville, AL), (800) 763-0242, where he gets to play with the company's high-performance graphics workstations. In addition to numerous technical articles and papers, Maxfield is also the author of Bebop to the Boolean Boogie: An Unconventional Guide to Electronics (ISBN 1-878707-22-1). To order, phone (800) 247-6553. You can reach Maxfield via e-mail at crmaxfie@ingr.com.
