|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
April 10, 1997 The product of all fears: binary multiplicationClive "Max" Maxfield, Intergraph Computer Systems You can perform multiplication in simple µPs using low-level instructions. This article is condensed from the forthcoming book, Bebop Bytes Back (Reference 1), which will be available within the next few weeks. Modern computers typically have special instructions and employ dedicated logic (either inside or outside the CPU) to perform multiplication operations. Early microcomputers didn't have this capability, so users had limited options: Either they made do without the ability to multiply numbers (which would have restricted the their ability to write useful programs to the point of absurdity, or they were obliged to come up with a technique to perform multiplication using whatever primitive in-structions were available. The shift-and-add technique
Using this algorithm, a partial product is generated for each bit in the multiplier (note that the hyphen ("-") characters in Figure 1 would actually be 0s used to pad the partial products to the necessary width). If a bit in the multiplier is 0, its corresponding partial product consists only of 0s, but if the bit in the multiplier is 1, its corresponding partial product is a copy of the multiplicand. Also, each partial product is left-shifted as a function of the multiplier bit with which the partial product is associated; for example, the partial product associated with bit 0 in the multiplier is left-shifted 0 bits; the partial product associated with bit 1 in the multiplier is left-shifted 1 bit, and so on. Once all of the partial products are generated, they are added together to generate the result. The result's width equals the sum of the widths of the multiplicand and the multiplier. Multiplying two 8-bit numbers generates a 16-bit result (Figure 1). As each of the unsigned 8-bit numbers can carry values from 0 to 255 in decimal ($00 through $FF in hexadecimal), the result ranges from 0 to 65,025 in decimal ($0000 through $FE01 in hexadecimal). You can implement this shift-and-add approach in hardware or software. If you consider a hardware implementation, you can use a dedicated logic block to simultaneously generate all of the partial products and also add them. Once again, because early microcomputer users didn't have this hardware implementation, they were obliged to use a software technique, and they had to individually generate each partial product and add this product into the result. Even so, the beauty of this scheme is that it requires only the same number of additions as there are bits in the multiplier, which means eight additions for this example.
Initially, the two bytes representing your result are loaded with 0s. Next, you look at bit 0 in the multiplier to see if the bit contains a 0 or a 1. If bit 0 of the multiplier contains a 0, you add a byte of 0s to the most significant byte of the result, but if bit 0 of the multiplier contains a 1, then you add a copy of the multiplicand into the result's most significant byte. In both cases, you're interested in the resulting state of the carry flag in the µP's status register. Next, you shift both bytes of the result 1 bit to the right. When you shift the result's most significant (left) byte, you must shift the carry flag into that byte's most significant bit and also store the value of the original least significant bit that "drops off the end." Similarly, when you shift the result's least significant (right) byte, whatever bit drops off the end from the most significant byte must shift into the most significant bit of the least significant byte. You can repeat this procedure for each of the bits in the multiplier, either adding a byte of 0s or a copy of the multiplicand into the result's most significant byte and then shifting both bytes of the result 1 bit to the right. Performing this sequence for each of the bits in the multiplier ultimately generates the same result as does the original technique based on 16-bit partial products, but this sequence requires far fewer computer instructions. Before you continue, there's still one more problem: extracting and testing the bits in the multiplier. One approach is to individually mask out the bits. For example, to determine the value in bit 0 of the multiplier, you can AND this value with $01 (or %00000001 in binary) and then use a JZ ("jump if zero") instruction to vary your actions, depending on the result. Alternatively, you can use a JNZ ("jump if not zero") if this instruction better serves your purposes. Similarly, to determine the value in bit 1 of the multiplier, you can AND this value with $02 (or %00000010 in binary); to determine the value in bit 2 of the multiplier, you can AND this value with $04 (or %00000100 in binary); and so forth. However, you will again discover that this method requires many instructions.
Thus, every time you stroll around the main loop, which involves shifting both bytes of the result 1 bit to the right, the next bit of interest in the multiplier automatically ends up in the carry flag, ready and waiting for the following iteration of the loop. Don't worry if you find the above a bit mind-boggling at first; after a while you'll find that this "wheels-within-wheels" thinking starts to come naturally. Unfortunately, the above technique can handle only unsigned numbers; if you consider your 8-bit numbers as representing signed values, then this routine sometimes works and sometimes doesn't, which many would consider to be less than satisfactory. Thus, if you wish to multiply signed 8-bit numbers with any confidence, you need to modify your technique.
The most significant bit of a signed binary number is the sign bit, and this bit is logic 0 if the value is positive and logic 1 if the value is negative (the sign bit would be bit 7 in the case of an 8-bit number). Now, consider the multiplicand, which feeds directly into a multiplexer and into a 2's complementor. The 2's complementor automatically generates the negative equivalent of whatever value is fed to the complementor's inputs. This means that, if the multiplicand is a positive value, the output from the 2's complementor is this value's negative equivalent, and vice versa. The output from the 2's complementor is also connected to the multiplexer, and the multiplexer's select input is driven from the multiplicand's sign bit. This part is clever, because if the sign bit is logic 0 (indicating a positive value), this bit instructs the multiplexer to select the multiplicand. Conversely, if the sign bit is a logic 1 (indicating a negative value), this bit instructs the multiplexer to select the output from the 2's complementor, which is the positive equivalent of the multiplicand. You can perform similar actions with the multiplier; thus, the multiplier array always receives two positive numbers, which it proceeds to multiply together. Assuming that the multiplier array is based on the shift-and-add technique, this array generates all of the partial products, adds them, and presents the output's 16-bit result. Now comes another legerdemain, in which you decide whether to negate the output from the multiplier array. If both the multiplicand and the multiplier are positive, you need not invert the output from the array, because a positive times a positive equals a positive. Similarly, you need not invert the output from the array if both the multiplicand and the multiplier are negative, because a negative times a negative equals a positive. Thus, only when the multiplicand and the multiplier have different signs should the result be negative, in which case you need to invert the output from the array. The array's output is fed into a multiplexer and 2's complementor arrangement similar to those used for the multiplicand and the multiplier (Figure 5). However, in this case, the multiplexer's control signal comes from an XOR gate, whose inputs are the sign bits from the multiplicand and the multiplier. If both sign bits have the same value, then the output from the XOR is logic 0, and the multiplexer selects the outputs from the multiplier array. Alternatively, if the sign bits have different values, then the output from the XOR is logic 1, thereby causing the multiplexer to select the outputs from the 2's complementor. For the purposes of these
discussions, you will realize a software subroutine
instead of a hardware implementation, but the same
general The above technique raises one slight problem. Signed 8-bit numbers can carry values from 128 through +127 in decimal ($80 through $7F in hexadecimal). The problem is that you cannot convert a 128 value into its positive equivalent, because your 8-bit fields cannot represent a value of +128. Thus, you must introduce a rule that says you can guarantee the results that your subroutine returns only if you present your subroutine with values in the range 127 through +127 in decimal ($81 through $7F in hexadecimal). It is the user's responsibility to ensure that this condition is met. Thus, the most negative result you can receive is 127×+127 (or +127×127)=16,129, and the most positive result is +127×+127 (or 127×127)=+16,129, so your 16-bit result ranges from 16,129 to +16,129 in decimal ($C0FF through $3F01 in hexadecimal). Next, you proceed to the assembly language (Listing 1) of the subroutine, which retrieves two 8-bit signed numbers from the stack, multiplies them, and places the 16-bit signed result on the top of the stack. (Note that this subroutine assumes that any 16-bit numbers are stored with the most significant byte "on top" of the least significant byte.) The way the flag, _AD_FLAG, works is as follows: In the _AD_TSTA part of the routine, this flag is loaded with all 8 bits of the multiplicand. Later, in the _AD_TSTB part of the routine, the flag, which now contains a copy of the multiplicand, is XORed with all 8 bits of the multiplier. In reality, you couldn't care less about the least significant 7 bits of the flag, because you're interested only in the most significant bit (the sign bit). However, you don't have to worry about the other bits, because when you come to actually use the flag at the beginning of the _AD_TSTC part of the routine, you can use a JNN ("jump if not negative") instruction, which considers only the state of the sign bit. Note that this assembly language corresponds to no particular mP because the language is one that the author designed for the virtual mP implemented in software that accompanies the forthcoming book, Bebop Bytes Back (Reference 1).
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| EDN Access | Feedback | Table of Contents | |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Copyright © 1997 EDN Magazine, EDN Access. EDN is a registered trademark of Reed Properties Inc, used under license. EDN is published by Cahners Publishing Company, a unit of Reed Elsevier Inc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||