Two Views of the Post PC World - Automata Processor and TOMI Celeste, Part 1

-March 18, 2014

Automata Processor is a State Machine on Steroids
A state machine is a means of representing a system consisting of several states and the events that cause a transition from one state to another.

AP is a specialized programmable state machine with some important parallel enhancements.

An elevator controller is an example of a simple state machine.

Figure 1. Elevator State Machine

The system will remain in one of the states until one of the following events:
  • The UP button is pressed
  • The DOWN button is pressed
  • The elevator reaches a floor

Figure 2. Elevator Truth Table

The above truth table can be implemented using a 32 x 2 bit memory and a latch as shown in Figure 3 State Machine Implementation.

Figure 3. State Machine Implementation

State machines have mostly been replaced by embedded processors, but one of the most important designs of the early PC business was implemented with exactly such a design. Using a state machine made from a PROM, self-taught Steve Wozniak reduced 45 ICs in the traditional design to 5 for the Apple II disc controller.[15]

Micron's AP retains the simplicity of Woz's state machine but adds two elements that enable dramatic levels of parallelism.

Figure 4. Apple II Disk Controller

AP Element One
The first element came from Ricardo A. Beaze-Yates's 1989 dissertation for the University of Waterloo.[16] In this document Ricardo proposed two methods to accelerate search. No one remembers the first, but the second is pretty important:

"In our second solution, we trade storage for time. We obtain a solution which works in time independent of the size of the answer. For this, we include in each internal node all the positions (in order) that match with that node (leaves of the subtree rooted by that node)."

Micron has implemented this structure as a 256 x 1 bit memory as shown in Figure 5 Micron's Parallel Comparator[17].

Figure 5. Micron's Parallel Comparator

The 256 bits correspond to all the possible 8-bit characters that could be matched. For example, if Automata Processor is searching an input stream for the letter "A", it is represented in ASCII as Hex 41. Location Hex 41 of the 256 bit RAM would be programmed with a "1". All other locations would contain "0".

The character to compare is presented to the address decoder which selects 1 of the 256 memory rows. The column line will contain the result of the match. If Hex 41 is presented to the address decoder, the column will contain a "1", indicating a match. Any other character will produce a "0".

Furthermore, multiple comparisons can be performed in parallel. For instance, text reading software looks for a variety of terminators indicating the end of a word. Space, comma, period, and carriage return can all be searched in a single cycle by setting the corresponding bits in the 256-bit memory.

AP Element Two
Unlike a simple state machine that allows a transition to a single state, Automata Processor achieves additional parallelism by allowing multiple transitions on a single event.

Figure 6. Example of Parallel States

The diagram above might be found in a text to speech program that looks for key words and then pronounces them. The actual program would contain a path for every word that could be recognized and probably require a dozen or more Automata chips.

The list of possible words is on the right side. Notice that after the letter "c" is found, there are 4 parallel comparisons being performed simultaneously. If an "a" is found, there are 3 parallel comparisons being performed. If a comparison fails, that state terminates.

As a result, any word no matter how large the dictionary, can be detected in the same number of cycles as it has letters.

Due to the parallelism of its compare and parallelism of its states, Automata Processor is probably several hundred times faster at decoding patterns than any existing computer architecture. Furthermore, as the size of the comparison dictionary increases, the AP speed advantage increases. This performance will enable entirely new classes of products which we'll discuss in Part III of this series.

Loading comments...

Write a Comment

To comment please Log In