

Design Feature: September 12, 1996

16-BIT CHIPS


Hitachi H8


The register-based H8 includes four series: the H8/300L and H8/300 (8-bit µCs with 16-bit instruction words and 16-bit ALUs) and the H8/300H and the H8S (16-bit µCs with 32-bit ALUs). Each series is upward-compatible. The H8 has eight general-purpose registers supplemented by PC and PSW registers. The H8S adds MAC and extended-control (EXR) registers. These registers are not part of a register-banking or third-addressing-space scheme.

The 8-bit 300L and 300 chips treat registers as either 8 or 16 bits, referencing registers as a set of eight 16-bit registers or 16 8-bit registers. The 300H and H8S registers are accessible as 8, 16, or 32 bits. The 8-bit or 16-bit-wide external data path is dynamically resizable.

H8 devices have a fixed instruction word, with a supplemental word for additional data, and a RISC-like load/store architecture. All CPUs have a single, unified address space. The 300L can access only on-chip memory. The address space includes a 128-byte register file to access on-chip peripherals as memory-mapped I/O.

Power management: In sleep mode, CPU operation halts, register and RAM contents remain unchanged, and peripherals continue to function. In standby, CPU and peripheral operations halt, and register and RAM contents remain unchanged. H8S devices can individually control the operation of each of their peripherals.

Special instructions: H8 devices are code-compatible and share a common base of 55 to 69 instructions, along with common mnemonics and a common basic addressing philosophy. Bit-manipulation instructions include set, clear, test, and various logic operations. Math functions include add, subtract, increment, decrement, decimal adjust, multiply, divide, and extend sign; H8S adds a MAC instruction. H8 devices also perform block moves.

Intel 8086/186

The register-based architecture of the 8086 has approximately 120 instructions and 14 16-bit registers, organized into four general-purpose, four pointer, four segment, and two special registers. The CPU addresses each general-purpose register as a 16-bit register or two 8-bit registers. The segment registers point to code, stack, and two local data segments.

The core architecture breaks into two units: the processor-execution unit and the bus-interface unit, which asynchronously communicates with the outside world via an 8- or a 16-bit multiplexed system bus. The processor-execution unit uses a 6-byte instruction prefetch queue to hold pending instructions, which the bus-interface unit fetches.

All memory addressing is base-relative, which helps embedded code, because base-relative code is easily relocatable. (You change the address base to relocate.) Address segmentation lets the CPU address up to 1 Mbyte of memory. A 16-bit offset (supporting a 64-kbyte segment) is added to the segment base address (the segment register shifted 4 bits left) to attain a 20-bit address. The CPU bus supports multiprocessing. The local-bus controller deploys a HOLD/HLDA protocol that enables another bus master, typically DMA, to take over the common system bus.
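The segment arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function name `physical_address` is invented for the example, and the wraparound at 1 Mbyte assumes the 8086's 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment register 4 bits left, add the offset, and keep
    # only 20 bits (assumption: addresses wrap at the 1-Mbyte boundary).
    return ((segment << 4) + offset) & 0xFFFFF


# Same offset, different segment base: the code is relocated by 64 kbytes.
print(hex(physical_address(0x1000, 0x0100)))
print(hex(physical_address(0x2000, 0x0100)))
```

Because every reference is relative to a segment register, moving a block of code or data only requires loading a different base value; the offsets inside the block stay the same.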

Power management: The 186 has two power-saving modes: idle and power-down (Intel versions only). Idle shuts off the CPU clock, leaving all integrated peripherals active. Power-down disables the clock input. In addition, you can programmably divide the internal processor frequency (by a factor up to 256) and slow all internal logic. Vadem's VG230 and VG330 have four power-management states: on, doze, suspend, and off. SmartClock, a power-management feature of Vadem's chips, operates in the doze state. SmartClock dynamically speeds the CPU clock in response to events such as keyboard activity or slows when there is no activity.

Special instructions: Math instructions include signed and unsigned multiply and divide, add, subtract, BCD, and decimal adjust. The 80x86 also provides a register exchange and a repeat prefix for repeating string operations (execute until zero or equal). Wait examines the Test pin and suspends instruction execution if the pin is High.

Second sources: For the 8086: Fujitsu, Temic, Siemens, and Oki. For the 80186: AMD and Siemens. Chips and Technologies, NEC, Sharp, and Vadem make code-compatible µPs and µCs. For the 80286: Harris Semiconductor.

Intel MCS 96/196

256 RAM-based registers form the basis for the MCS-96; most of the registers can function as a result accumulator. The first 23 of these registers are special-function registers (SFRs) used to control the on-chip peripherals. Some family members have on-chip RAM that can hold small, critical dynamic code or data and can implement register windowing. Register windowing can substitute a block in RAM for a block of registers. Accesses to a register in the window block are mapped to the windowed block in RAM. This technique makes it easy to perform fast context switches by shifting the register window to another block. Block sizes can be programmed for 32, 64, or 128 bytes.
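A rough model may help make the windowing idea concrete. The Python sketch below is not Intel's implementation: the class and method names are hypothetical, and placing the window at the top of the 256-byte register space is an assumption made for the example. It only shows how accesses to a window of register addresses can be redirected to a selectable RAM block.

```python
class WindowedRegisterFile:
    """Toy model: register addresses 0x00-0xFF, part of which can be windowed."""

    def __init__(self, ram_size=1024, window_size=32):
        if window_size not in (32, 64, 128):
            raise ValueError("window size must be 32, 64, or 128 bytes")
        self.ram = bytearray(ram_size)
        self.window_size = window_size
        # Assumption: the window occupies the top of the register space.
        self.window_start = 0x100 - window_size
        self.window_base = None  # None means windowing is disabled

    def select_window(self, block):
        """Point the window at RAM block number `block`."""
        base = block * self.window_size
        if base + self.window_size > len(self.ram):
            raise ValueError("block lies outside RAM")
        self.window_base = base

    def _resolve(self, reg):
        if not 0 <= reg <= 0xFF:
            raise ValueError("register address must fit in 8 bits")
        if self.window_base is not None and reg >= self.window_start:
            return self.window_base + (reg - self.window_start)
        return reg  # unwindowed registers map straight onto low RAM

    def read(self, reg):
        return self.ram[self._resolve(reg)]

    def write(self, reg, value):
        self.ram[self._resolve(reg)] = value & 0xFF


rf = WindowedRegisterFile()
rf.select_window(10)      # task A's context
rf.write(0xE0, 0x11)
rf.select_window(11)      # switch context by moving the window
rf.write(0xE0, 0x22)
rf.select_window(10)      # back to task A; its value is still there
print(hex(rf.read(0xE0)))
```

In this model a context switch costs one window selection rather than a save and restore of every register, which is the benefit the text describes.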

The MCS-196 has approximately 220 instructions, which can take one, two, or three operands. Some instructions are more than one word long. Register windowing helps minimize instruction size by letting 8 bits address a register in a movable window.

The address space of the MCS-96 works with both 8- and 16-bit external data buses. The external bus multiplexes data and address lines, so a buffer must hold the address stable during data transfers. However, the 8xC196NP has a demultiplexed external bus. An on-chip memory controller lets the MCS-96 use a range of memory types and speeds. External memory wait states are programmable.

The CPU can use autoprogramming to program the internal EPROM via an 8-bit external data interface. All MCS-96 chips (except the Mx) have a full-duplex serial port, which the 196Kx uses to program the µC.

The MCS-196 has an event-processor array that contains two 16-bit timers and 10 capture/compare modules. An event interrupt generates edges, starts A/D conversions, and resets timers. A high-speed I/O structure has up to four input and six output timer/counter-driven lines. A peripheral-transaction server is a microcoded hardware-interrupt handler for responding to such functions as data transfers and starting an ADC.

Power management: Idle mode shuts off the CPU clock, leaving all integrated peripherals active. Power-down mode disables the clock input.

Special instructions: Math instructions include add, subtract, multiply, divide, and multiply and accumulate (MAC). Special instructions include a block move of data, indirect-autoincrement addressing, and a table-indexed jump, which lets you jump via a table value.

Second source: IBM Microelectronics.

Mitsubishi 37700 Family

With 109 instructions, the 37700 family builds on the basic 38000 instruction set and architecture. The accumulator-based CPU has two 16-bit accumulators, two 16-bit index registers, and a 16-bit program counter and stack pointer (SP). The ALU has two bus rails that feed directly into it. Between the rails lie the accumulators, index registers, program counter, SP, program- and data-bank registers, and a 24-bit address incrementer. Almost all operations pass through the accumulator. The main registers can function as 8- or 16-bit registers.

The 37700 is semipipelined. The CPU fetches the next instruction while executing the current one. A 3-byte prefetch queue holds the next instruction. More than 90% of the instructions execute in less than 1 µsec at 25 MHz.

The 16-Mbyte address space divides into 256 64-kbyte banks. The high-order bits of a 24-bit address reference the bank; this field is supplied by an 8-bit program- or data-bank register. Bank 0 holds the special-function registers, internal RAM, and internal ROM. In single-chip mode, executing from on-chip ROM and RAM, the CPU has only one 64-kbyte bank. For debugging, the chip can run in µP mode, in which it executes from off-chip program memory.
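The bank arithmetic can be checked with a short Python sketch. The helper names `linear_address` and `split_address` are made up for this example; the only facts taken from the text are the 8-bit bank field and the 64-kbyte bank size.

```python
BANK_BITS = 8      # supplied by the program- or data-bank register
OFFSET_BITS = 16   # 64-kbyte bank


def linear_address(bank: int, offset: int) -> int:
    """Form a 24-bit address from an 8-bit bank and a 16-bit offset."""
    if not 0 <= bank < (1 << BANK_BITS):
        raise ValueError("bank must fit in 8 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 16 bits")
    return (bank << OFFSET_BITS) | offset


def split_address(address: int):
    """Split a 24-bit address back into (bank, offset)."""
    return address >> OFFSET_BITS, address & ((1 << OFFSET_BITS) - 1)


# 256 banks of 64 kbytes cover the full 16-Mbyte space.
print((1 << BANK_BITS) * (1 << OFFSET_BITS) == 16 * 1024 * 1024)
print(hex(linear_address(0x01, 0x8000)))
```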

The 37700 has a 256-byte "direct page" for time-critical routines. This page can lie in the first 64-kbyte memory bank or between the first and second banks. The 16-bit direct-page register points to the base (lower) address of the direct page. Accessing the direct page using the direct-page register is faster and takes only 2 bytes.

The external-memory bus can be multiplexed or demultiplexed. For a 16-bit address, the bus is not multiplexed; it uses 16-bit addresses and 8-bit data. The CPU can access 16-bit data from odd or even bytes, but performance degrades when using an odd byte.

Power management: During wait, oscillation continues, the internal clock stops, and the integrated peripherals are active. In stop, oscillation stops and peripherals are disabled.

Special instructions: The 37700's bit-manipulation instructions include bit set, clear, and test for certain flag bits. Math instructions include unsigned multiply and divide, add, subtract, and decimal adjust. The 37700 performs register A and B exchange and a forced execution breakpoint.

Mitsubishi M16

The Mitsubishi M16 architecture has many of the features of a 32-bit µP. The programming model includes 16 32-bit general-purpose registers (including the stack pointer), six 32-bit control registers, a 32-bit ALU, a 32-bit internal bus, and a 32-bit PC. The basic instruction width of the CPU's 96 instructions is one of the few architectural features that is actually 16 bits. The core of the µP comprises a four-stage pipeline: fetch, two decode units, and the execution unit. The execution unit also contains a 32-bit barrel shifter.

The stack can switch between a user stack and an interrupt stack. A frame pointer indicates the end of a stack region a specific subroutine uses. This register is used in conjunction with enter and exit instructions necessary to set up a particular stack frame.

The external bus is 16 bits wide and nonmultiplexed. The CPU can access 16-bit data from odd or even bytes. A 32-bit data access to external memory is automatically divided into two 16-bit accesses. The CPU or DMAC can be a bus master; arbitration is accomplished without adding extra bus cycles. On-chip peripherals are accessed as memory-mapped I/O.

Power management: In sleep mode, the internal clock deactivates, peripherals remain active, internal CPU state is maintained, and the built-in DRAM controller continues to perform refresh cycles; recovery is via NMI or reset. Stop mode halts the internal clock and source oscillation; recovery requires a reset.

Special instructions: Bit-manipulation instructions include bit set, clear, invert, search bit, extract, insert, test, and compare bit field (signed and unsigned). The M16 performs math instructions, such as add, subtract, signed and unsigned multiply and divide, and negate. Special queue-related instructions manipulate a queue consisting of double-linked linear lists.

Motorola 68HC12

Although the HC12 is a true 16-bit architecture, it has the exact same register set and interrupt stacking order as the HC11. In addition, the HC12's 208 instructions are a superset of those in the HC11, making the HC12 upward-compatible with the HC11. To run your HC11 code on the HC12, you need only reassemble your code and account for changes to timing loops (due to a clock speed increase to 8 MHz) and shorter instruction-cycle times.

Similar to the 68HC16 and 68300 families, the HC12 is based on a modular design methodology. Motorola's designers used the Lite Module Bus (similar to the Intermodule Bus) to connect the core to peripheral modules. The HC12's core contains a module that includes either a multiplexed or nonmultiplexed external bus, runtime monitors, and a background-debug-mode (BDM) feature. The runtime monitors include a watchdog timer, a clock monitor that uses an RC time constant to monitor the speed of the crystal, and a periodic interrupt timer. The BDM is a single-wire implementation (vs four wires on the HC16 and 68300). BDM offers code patching and two hardware breakpoints (not on all HC12 derivatives). BDM performs nonintrusive reads and writes to memory while the CPU runs at full speed: BDM accesses on-chip memory during CPU dead cycles. You can use BDM to program the on-chip flash or EEPROM or for programming the address comparators to set hardware breakpoints. BDM lets debuggers do source-level debugging and monitor variables without intruding on the user's software.

Power management: The HC12 uses a PLL to hit 8 MHz and help with the CPU's power management. Current implementations of the core operate off voltages ranging from 2.7 to 5.5V, with a path to 1.8V. The CPU12 has wait and stop power-saving modes. The HC12 also has other power-saving features built into the core: Each module has controls to save power when idle, low-noise drivers are available on each I/O pin, and external bus actions are halted when doing internal accessing.

Special instructions: The HC12 supports several indexed addressing modes, most important of which is stack-pointer referencing to handle stack-based parameters. Autoincrement and autodecrement indexed addressing is useful for loop counters in C. You can use the HC12's load-effective address instruction in C programs to allocate and deallocate stack space. For case statements, there are indexed indirect-addressing modes that allow you to put a computed GOTO right in line.

A new division instruction on the HC12 allows you to divide a 16-bit number by a 16-bit number instead of having to use a sign-extended 32-bit number. Furthermore, the HC12 performs this divide in 12 cycles, compared with 41 cycles for the HC11. A 16×16-bit multiply executes in 375 nsec. Minimum/maximum functions compare two values and store the result in either the accumulator or memory. For example, for a minimum function, the µC stores the smaller of the two values. Similar to the 68300 family, the HC12 performs table-look-up and interpolate functions. These functions are useful for operations such as compressing table data. The HC12 also includes four instructions to assist with fuzzy logic.
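To show why table look-up with interpolation helps compress table data, here is a minimal Python sketch of linear interpolation between two adjacent table entries. It is not a model of the HC12 instruction itself: the function name `table_interpolate`, the 8-bit binary fraction, and the truncating arithmetic are assumptions made for illustration.

```python
def table_interpolate(table, index, fraction):
    """Interpolate between table[index] and table[index + 1].

    `fraction` is an 8-bit binary fraction: 0 means 0/256, 255 means 255/256.
    """
    if not 0 <= fraction <= 0xFF:
        raise ValueError("fraction must fit in 8 bits")
    y1, y2 = table[index], table[index + 1]
    # Assumption: the scaled difference is reduced with floor division.
    return y1 + (fraction * (y2 - y1)) // 256


# A coarse 5-entry table stands in for a much larger one; intermediate
# points are reconstructed on demand instead of being stored.
coarse = [0, 40, 80, 160, 240]
print(table_interpolate(coarse, 1, 128))   # halfway between 40 and 80
print(table_interpolate(coarse, 3, 64))    # a quarter of the way from 160 to 240
```

Storing only the breakpoints and interpolating between them trades a few cycles per look-up for a much smaller table.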

Motorola 68HC16

With 261 instructions, Motorola's 68HC16 µC is a superset of the 8-bit 68HC11 and is source-code-compatible with the 68HC11. The 68HC16 is an accumulator-based architecture with processing centered around two 16-bit accumulators. Three 16-bit index registers work in conjunction with the accumulators. These index registers have 4-bit extensions for creating 20-bit addresses. Similarly, the stack pointer and program counter, both 16-bit registers, have 4-bit extensions, providing 20-bit address capability.

An integrated MAC unit comprises a 16-bit multiplicand register, a 16-bit multiplier register, a 36-bit accumulator, and two 8-bit address-mask registers. The MAC unit performs a MAC cycle in 480 nsec (at 25 MHz). The MAC unit uses a simplified form of modulo addressing to implement finite-impulse-response (FIR) filters and circular buffers.
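The combination of multiply-accumulate and modulo addressing can be sketched in Python as follows. This is a software analogy, not the HC16's hardware behavior: the class name `CircularFir` is hypothetical, and wrapping the index with a power-of-two mask is an assumption chosen to resemble an address-mask register.

```python
class CircularFir:
    """FIR filter whose sample history lives in a mask-wrapped circular buffer."""

    def __init__(self, coeffs):
        n = len(coeffs)
        if n == 0 or n & (n - 1):
            raise ValueError("number of taps must be a power of two")
        self.coeffs = list(coeffs)
        self.mask = n - 1          # plays the role of an address mask
        self.samples = [0] * n
        self.head = 0

    def step(self, x):
        """Store one new sample and return the filter output."""
        self.samples[self.head] = x
        acc = 0
        idx = self.head
        for c in self.coeffs:
            acc += c * self.samples[idx]      # one MAC per tap
            idx = (idx - 1) & self.mask       # modulo step to the older sample
        self.head = (self.head + 1) & self.mask
        return acc


fir = CircularFir([1, 2, 3, 4])
# Feeding an impulse returns the coefficients in order, then zero.
print([fir.step(x) for x in (1, 0, 0, 0, 0)])
```

The mask keeps the pointer inside the buffer without a compare-and-branch, which is why modulo addressing pairs naturally with a MAC loop.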

The 68HC16 has a modular architecture built on the internal Intermodule Bus (IMB), which simplifies the addition of on-chip peripherals. Bus protocols are based on the 68020 bus. The IMB contains circuitry to support exception processing, address-space partitioning, multiple interrupt levels, and vectored interrupts. The 68HC16 has a system-integration module that supports an external memory interface with 20 address bits, 16 data bits, and up to 12 programmable chip selects. The module includes watchdog and periodic timers and a PLL that boosts a 32.768-kHz or 4.2-MHz crystal to a 16.78-MHz internal clock. On-chip peripherals are memory-mapped and accessed through dedicated peripheral registers.
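The PLL figures can be sanity-checked with simple arithmetic. The Python sketch below assumes the crystals are the common 32.768-kHz and 4.194304-MHz parts and that the overall multiplication ratios are 512 and 4; those ratios are inferred from the quoted frequencies, not taken from Motorola documentation.

```python
def pll_output_hz(crystal_hz: int, multiplier: int) -> int:
    """Ideal PLL output for a given reference crystal and multiplication ratio."""
    return crystal_hz * multiplier


slow_crystal = pll_output_hz(32_768, 512)     # assumed ratio of 512
fast_crystal = pll_output_hz(4_194_304, 4)    # assumed ratio of 4
print(slow_crystal, fast_crystal)             # both about 16.78 MHz
```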

The HC16 includes the company's background-debug-mode (BDM) in-circuit debugging feature. BDM allows read and write access of the target system's registers and memory and offers a set of debug commands. You can use BDM to program the on-chip flash, RAM, or EEPROM. BDM lets debuggers do source-level debugging and monitor variables without intruding on the user's software.

Program and data can share a common address space or use two separate spaces. Each space divides into 16 64-kbyte banks. The 68HC16's addressing space is thus 1 Mbyte, or 2 Mbytes with separate code and data spaces for larger applications. Instruction boundaries are on even boundaries and use little-endian addressing. The CPU accesses words on word or byte boundaries.

Power management: Wait reduces current by stopping CPU execution while leaving the clock running. LPSTOP stops the clock.

Special instructions: The 68HC16 performs bit manipulation via instructions such as bit set, clear, and test. It also supports a variety of math instructions, such as add, subtract, BCD, DAA, and signed and unsigned multiply and divide. A background operating mode uses special debugging instructions.

NEC 78K Series

The NEC 78K series comprises four µC lines: the K0, K2, K3, and K4, all of which derive from a common architecture. The K2 is an 8-bit chip with 65 instructions and a full set of peripherals. The K0 line, with 227 instructions, is built on a K2 base and brings peripherals that are usually not found on 8-bit µCs; the line suits intensive I/O processing. The 16-bit K3 has 96 instructions and more complex peripherals. The K4 provides a 16-bit upgrade path for the K2; it has a code superset and plug-in compatibility.

The vendor's K0, K2, and K3 µCs have a 16-bit program counter and stack pointer; the K4 has a 20-bit PC and a 24-bit stack pointer (SP). The program-status words (PSWs) in the K0/K2 and K3/K4 µCs are 8 and 16 bits, respectively. K0 and K2 chips operate around four banks of eight 8-bit registers; K3 and K4 have a base of eight banks of 16 8-bit registers. These registers can be paired to function as 16-bit registers. Additionally, the K4 can combine any four of the 16-bit registers (actually, four pairs of 8-bit registers) with 8-bit extension registers and use the combination registers for 24-bit address specification.

The register banks are in on-chip RAM along with directly accessible RAM. The CPU symbolically addresses the registers as the current register bank or as memory. On-chip peripherals are memory-mapped and are accessed either by main-memory addressing or by special-function-register addressing. All families separate RAM into fast RAM inside the execution unit and separate data RAM. The fast RAM includes a 128-byte register area and 128 bytes of data RAM. For context switching with the K3/K4 families, you can specify an alternate register bank as part of the interrupt vector itself. However, with the K0/K2 chips, the context switch is accomplished via a bank-select instruction executed before branching to the interrupt-service routine referencing the new bank.

K3/K4 µCs contain a 3-byte instruction-prefetch queue. The bus-control unit can fetch an instruction byte from memory during cycles in which the execution unit is not using the memory bus.

Power management: Halt mode discontinues CPU operation while all peripherals continue to operate. In stop mode, only the subsystem clock (if used) and interrupts operate. In addition, the K4 has an idle mode, in which the oscillation circuitry continues to operate but the entire system stops. All 78K devices have a programmable clock divider to conserve power during less performance-demanding operations.

Special instructions: Bit-manipulation instructions are bit set, clear, complement, test, and various logic operations. Math instructions include add, subtract, multiply, divide, and decimal adjust. K0/K2 CPUs handle 16-bit operations by pairing adjacent registers in banks. Their 16-bit arithmetic operations include ADDW, SUBW, INCW, DECW, and SHR/LW. K3/K4 µCs can perform a 16×16 multiply and a 32×16 divide, as well as MAC and multiply-and-subtract instructions. Hardware implements a 32-word branch-destination address table that adds a level of indirection to branches and subroutine calls. This option is useful when making frequent calls to specific subroutines, because the special call instruction (CALLT) is a 1-byte instruction vs a standard 3-byte call to a direct address.
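The trade-off behind the table-based call can be sketched in Python. The class name `BranchTable`, the use of Python callables to stand in for subroutine addresses, and the 2-byte size of a table entry are assumptions for this illustration; the 1-byte and 3-byte call sizes and the 32-entry table come from the text.

```python
TABLE_ENTRIES = 32
CALLT_BYTES = 1       # table call
CALL_BYTES = 3        # standard call to a direct address
ENTRY_BYTES = 2       # assumption: one 16-bit word per table entry


class BranchTable:
    """Indirect dispatch through a fixed 32-entry table."""

    def __init__(self):
        self.entries = [None] * TABLE_ENTRIES

    def set_entry(self, index, target):
        self._check(index)
        self.entries[index] = target

    def callt(self, index, *args):
        """Look up the destination in the table, then call it."""
        self._check(index)
        target = self.entries[index]
        if target is None:
            raise LookupError("table entry is empty")
        return target(*args)

    @staticmethod
    def _check(index):
        if not 0 <= index < TABLE_ENTRIES:
            raise IndexError("table index must be 0..31")


def bytes_saved(call_sites: int) -> int:
    """Code bytes saved by using table calls at every call site of one routine."""
    return call_sites * CALL_BYTES - (call_sites * CALLT_BYTES + ENTRY_BYTES)


table = BranchTable()
table.set_entry(3, lambda a, b: a + b)
print(table.callt(3, 2, 5))
print(bytes_saved(10))
```

Under these assumptions, a routine called from a single site saves nothing, and each additional call site saves 2 bytes, which is why the option pays off for frequently called subroutines.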

Philips 80C51XA

The 16-bit architecture of Philips' 80C51XA provides source-code compatibility with the 8051: Instructions can be translated one for one and do not require multiple-instruction constructs. The XA has approximately 300 instructions. Operation centers around a 20-word register file that is a superset of the 8051 registers. Software can access these registers as words, bytes, or individual bits. Some instructions (32-bit shifts, multiplies, and divides) allow addressing pairs of word registers as double words. The bottom four word registers of the register file are banked (four banks) and simplify context switching for multitasking. The XA architecture has up to 32 vectored interrupts and exception-handling circuitry for fault-tolerant systems.

A 24-bit program counter provides access to up to 16 Mbytes of linear, unsegmented code space. A 7-byte prefetch queue holds pending instructions. Two segment registers provide the upper 8 address bits for accessing up to 16 Mbytes of data memory. Memory-mapped special-function registers (SFRs) control and monitor on-chip peripherals. Processor stacks include one stack for supervisor code and another for applications; the stacks, up to 64 kbytes in size, can reside in on- or off-chip memory.

The external data bus is configurable for 8- or 16-bit accesses and includes a programmable-wait-state generator. Depending on the amount of code space required for your application, some of the upper address lines can be configured as I/O ports.

Power management: Software-controlled idle mode shuts down processor functions but leaves most of the on-chip peripherals and external interrupts functioning; power-down mode shuts down everything, including the on-chip oscillator.

Special instructions: The 80C51XA performs extensive bit manipulation via instructions such as jump on bit set or clear, set, clear, move, AND, and OR. Math instructions include add, subtract, 16×16 multiply and 32×16 divide (signed and unsigned), and 32-bit shifts. The XA also has instructions to normalize and sign extend operands for floating-point support, instructions to move data blocks, jump double indirect, breakpoint and trap, and reset.

Siemens SABC16x

Operations within Siemens 166/165/167 µCs center around 16-bit registers in up to 16 banks, as well as around a 16-bit program counter and a 16-bit program-status word. The banks are dual-ported RAM, which lets the CPU read a register for the next operation while writing back the results of the current operation to another register. On-chip peripherals work independently of the CPU with a separate clock generator. The CPU and peripherals interchange data and control information via special-function registers.

The main core of the CPU comprises a four-stage pipeline: fetch, decode, execute, write-back; a one-cycle barrel shifter; and a fast multiply/divide function unit. Pipeline stages clock in 100-nsec cycles. Most of the µC's 240 instructions, therefore, appear to execute in a single cycle. Instruction latency is four cycles, or 400 nsec. A peripheral event controller (PEC) performs byte or word transfers in one cycle between peripherals and memory without interrupting the CPU.
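The relationship between latency and throughput in the four-stage pipeline can be worked through with a little arithmetic. The Python sketch below assumes an ideal pipeline with no stalls; the function names are invented for the example.

```python
STAGES = 4        # fetch, decode, execute, write-back
STAGE_NS = 100    # stage time quoted in the text


def latency_ns() -> int:
    """Time for one instruction to pass through every stage."""
    return STAGES * STAGE_NS


def total_time_ns(instructions: int) -> int:
    """Time to complete a run of instructions in an ideal, stall-free pipeline."""
    if instructions < 1:
        raise ValueError("need at least one instruction")
    # The first instruction takes the full latency; each later one
    # completes one stage time after its predecessor.
    return (STAGES + instructions - 1) * STAGE_NS


print(latency_ns())
print(total_time_ns(100) / 100)   # average time per instruction
```

Over a long run the average approaches one 100-nsec stage time per instruction, which is why instructions appear to execute in a single cycle even though each one takes 400 nsec end to end.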

The CPU uses code segmentation and data paging to address up to 256 kbytes (the 166) or up to 16 Mbytes (165/167) of the unified-instruction-data-memory space. The external-memory bus controller has four programmable modes, chip selects, and a wait-state generator. You can partition physical memory into multiple segments and five address ranges (166 has only two), each segment having a different type of memory with or without wait states. A hold/acknowledge mechanism on the external bus can be programmed so external devices take control for critical data transfers. A system stack of up to 512 bytes stores temporary data.

Instructions are 2 or 4 bytes long. The µCs can handle a 4-byte instruction fetch from on-chip ROM in one 100-nsec stage. A single fetch gets an entire instruction. However, because the 16-bit external bus permits only a single-word access, off-chip program accesses suffer at least a one-cycle stall for a 4-byte instruction.
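The fetch penalty follows from the ratio of instruction length to fetch width. The Python sketch below is a simplified model with hypothetical function names; it ignores wait states and prefetching, so it only gives the minimum stall the text refers to.

```python
def fetch_accesses(instruction_bytes: int, fetch_width_bytes: int) -> int:
    """Number of memory accesses needed to fetch one instruction."""
    if instruction_bytes < 1 or fetch_width_bytes < 1:
        raise ValueError("sizes must be positive")
    return -(-instruction_bytes // fetch_width_bytes)   # ceiling division


def stall_cycles(instruction_bytes: int, fetch_width_bytes: int) -> int:
    """Extra accesses beyond the single one the pipeline budgets per instruction."""
    return fetch_accesses(instruction_bytes, fetch_width_bytes) - 1


print(stall_cycles(4, 4))   # 4-byte instruction from on-chip ROM
print(stall_cycles(4, 2))   # 4-byte instruction over the 16-bit external bus
print(stall_cycles(2, 2))   # 2-byte instruction over the 16-bit external bus
```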

The 166/165/167 µCs cache branch-target instructions and use them to supply the next iteration of a branch, allowing execution without pipeline stalls. First-pass loop branches pay a single-cycle penalty. Nonaligned, double-word, branch-target instructions also pay a one-cycle penalty.

Power management: Idle mode shuts off the CPU clock, leaving all integrated peripherals active. Power-down mode disables the clock input. Any reset or interrupt request can terminate idle mode; only a hardware reset can terminate power-down mode.

Special instructions: Bit-manipulation instructions include bit set, clear, move, and various logical operations. Math instructions are add, subtract, 16×16 multiply and divide, and 32×16 divide. The µCs can perform up to 15 shifts or rotates in one instruction cycle. Every jump has 16 conditions.

Second source: SGS-Thomson.

Toshiba TLCS-900

The TLCS-900 architecture centers around a flexible register set. You can configure the set for 8-, 16-, or 32-bit processing by using a 16-bit ALU and datapaths. The TLCS-900's general-purpose register set is designed for fast context switching and can be partitioned into four register banks, each with four 32-bit registers, or eight register banks, each with eight 16-bit registers. The chip operates in two modes: minimum, with a 16-bit program counter and registers, or maximum, a full 32-bit mode with 32-bit datapaths, a PC, and registers.

The TLCS-900, with 300 to 400 instructions, is backward-compatible with the TLCS-90 but offers a substantial performance increase by using a three-stage pipeline in combination with a 4-byte prefetch queue. The 32-bit maximum mode accommodates large-scale arithmetic and addressing (16 Mbytes) with a basic 16-bit CPU.

I/O processing is enhanced by configuring peripheral interrupts to bypass CPU interrupt processing and, instead, be handled by an I/O controller or a special peripheral µDMA processor. By using the I/O controller, the device avoids the overhead of interrupt processing. Peripheral events trigger I/O-controller processing and "DMA" the data to or from memory and internal peripherals. The I/O controller handles up to four µDMA channels. The CPU can execute from external memory and, while running, can dynamically shift bus sizes between 8 and 16 bits.

Power management: Idle mode shuts down the CPU, leaving all integrated peripherals active. Power-down (stop) disables the oscillator. Any reset or interrupt request can terminate idle mode; only a hardware reset (NMI and INT0) can terminate power-down.

Special instructions: Bit-manipulation instructions include bit set, clear, change, test, search forward and reverse, and various logical operations. Math instructions include add, subtract, decimal adjust, signed and unsigned 8×8 and 16×16 multiply, signed and unsigned 16×8 divide, and shift 1 bit one to 16 times. The TLCS-900 also has a MAC instruction and modulo increment/decrement instructions used for circular buffer pointers. It can also perform block moves and pattern searches in memory.




Copyright © 1996 EDN Magazine. EDN is a registered trademark of Reed Properties Inc, used under license.