Connecting to ROCCC Generated Code

The VHDL generated by ROCCC communicates with the external platform in a variety of ways described in this section. All inputs and outputs that connect to ROCCC code are assumed to be active-high.

Each hardware module and system generated by ROCCC will contain six ports by default. These default ports are clk, rst, inputReady, outputReady, done, and stall. Their use is described here:

  • clk
    The clk port is the clock of the hardware and should be connected to a clock signal. All processes internal to ROCCC code trigger off of the rising edge of the clock. All ROCCC components and systems assume a single clock to drive all the hardware.
  • rst
    The rst port is the reset signal to the generated hardware. Driving the reset port high resets the hardware to an initialized state. As long as the reset port is held high, the hardware will remain in the reset state, regardless of the inputs. After bringing the reset port low, the hardware will begin responding to the input signals. The hardware generated by ROCCC requires the reset port to be driven high for at least one clock cycle for initialization purposes. Not doing so may leave the component in an uninitialized state.
  • inputReady
    The inputReady signal should be driven high when the signals that correspond to input scalars are valid. As long as the inputReady signal is high, input scalars will be read on every rising edge of the clock. Setting the input scalars to valid data and setting inputReady high should be the first thing done by any interfacing code. Even if no input scalars are used, streams will wait to generate addresses and request data until after inputReady is driven high.
  • outputReady
    The outputReady port goes high when valid data is placed on the output scalar ports of the hardware. The output data is valid simultaneously with the outputReady signal being high.
  • done
    The done port goes high when the hardware generated by ROCCC has finished processing all of the input it was designed to process and remains high until the reset signal is asserted.
  • stall
    The stall port is used by the interfacing code to stall the pipeline of the generated hardware.

Timing Diagram Of A System With Both Input Scalars And Input Streams
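
The start-up order described above (reset held high for at least one clock cycle, then valid input scalars presented together with inputReady) can be sketched as a per-cycle stimulus table. This is a minimal Python sketch for planning a testbench, not ROCCC output. The function name startup_stimulus, the scalar name threshold, the one-cycle inputReady pulse, and the zero values driven while idle are all assumptions made for illustration.

```python
def startup_stimulus(scalars, reset_cycles=1, idle_cycles=2):
    """Return one dict of signal values per clock cycle (hypothetical helper)."""
    if reset_cycles < 1:
        raise ValueError("rst must be held high for at least one clock cycle")
    idle = {name: 0 for name in scalars}  # assumed don't-care value while idle
    trace = []
    # Phase 1: hold rst high; the hardware ignores its inputs during reset.
    for _ in range(reset_cycles):
        trace.append({"rst": 1, "inputReady": 0, **idle})
    # Phase 2: release rst, present valid scalars and raise inputReady.
    trace.append({"rst": 0, "inputReady": 1, **scalars})
    # Phase 3: inputReady low again (assumption: scalars are set only once).
    for _ in range(idle_cycles):
        trace.append({"rst": 0, "inputReady": 0, **idle})
    return trace


for cycle, signals in enumerate(startup_stimulus({"threshold": 7}, reset_cycles=2)):
    print(cycle, signals)
```

Each row would correspond to the values driven before one rising edge of clk in a testbench.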


Input and Output Ports

In addition to the default ports, input and output data ports will be generated by ROCCC. These may correspond either to single registers or to streams.

  • Registers

    For each input register, a single data port will be generated. When generating modules, all inputs are treated as registers. When generating systems, any single variable that acts as input to the main loop will be treated as an input register.

    For each output register, a single data port will be generated. When generating modules, all outputs are treated as registers. When generating systems, any single variable that acts as output to the main loop will be treated as an output register.

Block Diagram Of A Generated Module

  • Streams

    For input streams, several ports will be generated: a valid port, an address ready port, a pop port, a positive number of input ports, and the same number of address ports. The number of input ports and address ports will be equal to the number of channels the user specifies for the stream, and needs to be a factor of both the window size and the step window size.

    The default ports are still generated alongside the stream interface. In addition to the stream ports, input and output registers can be created as well.

    For output streams, several ports will be generated: a valid port, a pop port, a positive number of output data ports, and the same number of address ports. As with input streams, the number of output data ports and address ports equals the number of channels the user specifies for the stream, and must be a factor of both the window size and the step window size.

Block Diagram Of A Generated System
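
The constraint that the channel count must be a factor of both the window size and the step window size can be checked before generating hardware. The following is a small Python sketch. The function name valid_channel_counts and its parameter names are hypothetical and are not part of ROCCC.

```python
from typing import List


def valid_channel_counts(window_size: int, step_size: int) -> List[int]:
    """List every channel count that divides both sizes evenly."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    # A count larger than the smaller size cannot divide it, so stop there.
    limit = min(window_size, step_size)
    return [n for n in range(1, limit + 1)
            if window_size % n == 0 and step_size % n == 0]


print(valid_channel_counts(8, 4))  # [1, 2, 4]
print(valid_channel_counts(9, 6))  # [1, 3]
```

A single channel is always acceptable, since 1 divides every size.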


Interfacing Protocols

  • Input Registers
    Input registers are used by both module and system code. They must hold valid values when inputReady is driven high, and they are sampled on the rising edge of the clock. Driving the input registers is the responsibility of the calling code. In modules, the input registers can be changed every clock cycle. In systems, the input registers may be set only once, and must be set before passing any data to the input streams.

Timing Diagram Of Module Use

  • Input Streams

    The input stream address generation and the input protocol have been decoupled, allowing address generation to happen independent of incoming data. In particular, there are two ports dealing with address generation and three that deal with input data.

    When an address is being generated, the address rdy port will be brought high and the address port will hold the address of the value needed. The address rdy will only be held high for one clock cycle for each individual address being generated. If addresses are being generated in consecutive clock cycles, the address rdy port will be continuously high.

    The user-defined interfacing code needs to service memory requests in a FIFO fashion. ROCCC-generated code expects to receive the data in exactly the order in which it was requested. When data is ready, it must be placed on the input port, and valid must be asserted and held until the ROCCC-generated code brings the pop signal high. When pop is brought high, if the next data value is ready, valid can remain high and the new data value can be placed on the input port; otherwise, valid should be brought low.

    Timing Diagram Of Generated Code Reading From A Stream With Memory Addresses

    In order to allow the fastest possible streaming, all data is read synchronously, but the pop/valid handshake is asynchronous. The pop and valid signals can be treated as synchronous signals, although this limits the data transfer rate to one transfer every other clock cycle, with the alternating cycles devoted to the handshake protocol.

    The number of outstanding memory requests generated by the ROCCC-generated code is independent of the reading of data and is user configurable. For example, when the number of outstanding memory requests is set to two, at most two memory addresses can be generated before address generation stops. Data can be read at any time during address generation, although all data is assumed to arrive in the order in which it was requested.

    Timing Diagram Of Generated Code Reading From A Stream With Multiple Outstanding Memory Requests

    If the user has specified that a given stream is a multi-channel stream, then it is necessary to set all channels of the input with valid data before asserting the valid signal. The channels in the ROCCC generated code are numbered from 0 to N and it is up to the user generated interfacing code to place the oldest data in channel 0, the second oldest data in channel 1, and so on. Once all channel data has been fetched, the interfacing code should set valid high and hold it high until pop is seen to be high.

Timing Diagram Of Generated Code Reading From A Stream With Multiple Channels
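
The input stream rules above (decoupled address generation, a limit on outstanding requests, FIFO-ordered data, and the valid/pop handshake) can be explored with a small cycle-by-cycle model. This is a hedged Python sketch rather than the generated VHDL. The names simulate_input_stream and pack_channels, the fixed memory latency, and the assumption that the generated code raises pop in the same cycle it sees valid are all illustrative.

```python
from collections import deque


def simulate_input_stream(memory, addresses, max_outstanding=2, latency=3):
    """Model one single-channel input stream.

    Returns (received_data, peak_outstanding_requests).
    """
    if max_outstanding < 1 or latency < 0:
        raise ValueError("need max_outstanding >= 1 and latency >= 0")
    to_issue = deque(addresses)  # addresses still to be requested
    total = len(to_issue)
    pending = deque()            # interface FIFO of (cycle_data_ready, address)
    received, peak, cycle = [], 0, 0
    while len(received) < total:
        # Interfacing code: valid is high once the OLDEST request has data.
        valid = bool(pending) and pending[0][0] <= cycle
        # Generated code (assumed): raises pop in the cycle it sees valid.
        if valid:
            _, addr = pending.popleft()
            received.append(memory[addr])
        # Address generation is independent of the data handshake, but it
        # pauses while max_outstanding requests are still unanswered.
        if to_issue and len(pending) < max_outstanding:
            pending.append((cycle + latency, to_issue.popleft()))
        peak = max(peak, len(pending))
        cycle += 1
    return received, peak


def pack_channels(values, num_channels):
    """Group FIFO-ordered values per transfer; tuple index i is channel i."""
    if num_channels < 1 or len(values) % num_channels:
        raise ValueError("value count must be a multiple of num_channels")
    return [tuple(values[i:i + num_channels])
            for i in range(0, len(values), num_channels)]


data, peak = simulate_input_stream([10, 20, 30, 40], range(4))
print(data, peak)
print(pack_channels(data, 2))  # oldest value of each pair lands in channel 0
```

Because the interface answers requests strictly from the head of its FIFO, data always arrives in request order regardless of the latency chosen.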

  • Output Registers
    Output scalars are valid while outputReady is high. The number of clock cycles between driving inputReady high and outputReady going high is determined by the delay of the pipeline. Code that interfaces with systems should ignore outputReady; if values are to be sampled every iteration of the loop, then a stream should be used. System code that properly uses output scalars should only be interested in a final value, which will be valid when done goes high, not when outputReady goes high.
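
For modules, the relationship between inputReady and outputReady can be pictured as a fixed-depth shift register. The sketch below is a behavioral Python model under stated assumptions: the class name ModulePipelineModel, the depth of three stages, and the computation x * x + 1 are hypothetical, and the real delay depends on the generated pipeline.

```python
from collections import deque


class ModulePipelineModel:
    """Toy model: outputReady mirrors inputReady, delayed by `depth` edges."""

    def __init__(self, func, depth):
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.func = func
        self.depth = depth
        self.reset()

    def reset(self):
        # Fill every stage with a bubble, i.e. outputReady low.
        self.stages = deque([(False, None)] * self.depth)

    def clock(self, input_ready, value=None):
        """Advance one rising edge; return (output_ready, output_value)."""
        out = self.stages.pop()  # oldest stage leaves the pipeline
        if input_ready:
            self.stages.appendleft((True, self.func(value)))
        else:
            self.stages.appendleft((False, None))
        return out


model = ModulePipelineModel(lambda x: x * x + 1, depth=3)
# Inputs may change every cycle in a module; then let the pipeline drain.
stimulus = [(True, 1), (True, 2), (True, 3)] + [(False, None)] * 3
trace = [model.clock(ready, value) for ready, value in stimulus]
print([out for ready, out in trace if ready])  # [2, 5, 10]
```

The model covers modules only. For systems, as noted above, the final scalar values should be read when done goes high.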

  • Output Streams

    Output streams have some number of data ports, the same number of address ports, and two ports necessary for a complete two-way handshake protocol. The two-way handshake protocol of the output streams is similar to the handshake protocol of the input stream. When the output controller has valid data from the datapath, the first element of the stream is written to the data port, and the address of that data element is written to the corresponding address port. The valid port is brought high, and the output controller waits until the pop signal is brought high. When the pop signal is brought high, the next element of the stream is written to the data port, its address is written to the address port, and the valid port is again brought high.

    Because the outputController is serializing data calculated in parallel, the datapath must be stalled until all of the data is serialized. This happens entirely internally, but functions equivalently to bringing the stall port high: the datapath is stalled, the inputController continues to read but will not push data onto the datapath, and other output streams may run out of valid data. For this reason, it is important not to rely on a specific timing for any stream interfacing. Rather, the two-way handshakes should be relied on to guarantee that data is transferred correctly.

    Timing Diagram Of Output Streams

    If it is imperative that data not be serialized, prefer creating several output streams or a multi-channel stream over using output scalars as pseudo-streams.
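
The serialization behavior described above can be illustrated with a short model in which the datapath delivers several (address, data) pairs per iteration and the output controller hands them over one at a time. This is a Python sketch under assumptions: the names simulate_output_stream and sink_pops, the batch layout, and the way stalled cycles are counted are illustrative and do not come from the generated code.

```python
from collections import deque


def simulate_output_stream(batches, sink_pops, max_cycles=10_000):
    """Serialize parallel results through a valid/pop handshake.

    batches   : one list of (address, data) pairs per loop iteration
    sink_pops : hypothetical callback, cycle -> True when pop is asserted
    Returns (written_pairs, cycles_with_datapath_stalled).
    """
    todo = deque(list(batch) for batch in batches)
    serial = deque()  # elements waiting in the output controller
    written, stalled = [], 0
    for cycle in range(max_cycles):
        if not serial:
            if not todo:
                return written, stalled
            # Datapath advances and hands over its next parallel result.
            serial.extend(todo.popleft())
        # valid is high whenever an element is waiting; the transfer
        # completes only when the interfacing code asserts pop.
        if serial and sink_pops(cycle):
            written.append(serial.popleft())
        # Leftover elements keep the datapath stalled for this cycle.
        if serial:
            stalled += 1
    raise RuntimeError("interfacing code never completed the handshake")


batches = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")]]
fast = simulate_output_stream(batches, lambda cycle: True)
slow = simulate_output_stream(batches, lambda cycle: cycle % 2 == 0)
print(fast)
print(slow)
```

Both runs deliver the same elements in the same order. Only the number of stalled cycles differs, which is why interfacing code should rely on the handshake rather than on cycle counts.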

  • Done
    The done signal works differently depending on whether it comes from module or system code. Module code will drive the done signal high as soon as the first value is processed; this can safely be ignored by any code interfacing with a ROCCC module, as modules are stateless and can never be considered done. System code will drive the done signal high on the rising edge of the clock after the last output values are set.

    Timing Diagram Of The End Of A System's Processing

  • Stall
    The stall signal allows the interfacing code to stall the datapath in both modules and systems. Stalls are not instantaneous: it takes one to two clock cycles for the stall signal to propagate all the way up the datapath, to both the input and output controller. In hardware, a common use for a stall signal is when interfacing with memory that may become full. However, both input and output streams are two-way handshakes, and any stream can be "stalled" by simply not completing the handshake. For this reason, and because stalls are not instantaneous, stalls should be reserved for the case when there is no alternative. When the stall signal is brought high, both input and output streams will continue to interact with any interfacing code. However, the datapath will be frozen, and data will not be pushed onto the datapath. Again, prefer to handle full memory in the stream interface, and not with the stall signal.
