Load-Store-Unit (LSU)

The Load-Store Unit (LSU) of the core takes care of accessing the data memory. Load and stores on words (32 bit), half words (16 bit) and bytes (8 bit) are supported.

Table 8 describes the signals that are used by the LSU.

Table 8 LSU interface signals






Request valid, will stay high until data_gnt_i is high for one cycle






Write Enable, high for writes, low for reads. Sent together with data_req_o



Byte Enable. Is set for the bytes to write/read, sent together with data_req_o



Data to be written to memory, sent together with data_req_o



Data read from memory



data_rvalid_i will be high for exactly one cycle to signal the end of the response phase of for both read and write transactions. For a read transaction data_rdata_i holds valid data when data_rvalid_i is high.



The other side accepted the request. data_addr_o may change in the next cycle.

Misaligned Accesses

The LSU never raises address-misaligned exceptions. For loads and stores where the effective address is not naturally aligned to the referenced datatype (i.e., on a four-byte boundary for word accesses, and a two-byte boundary for halfword accesses) the load/store is performed as two bus transactions in case that the data item crosses a word boundary. A single load/store instruction is therefore performed as two bus transactions for the following scenarios:

  • Load/store of a word for a non-word-aligned address

  • Load/store of a halfword crossing a word address boundary

In both cases the transfer corresponding to the lowest address is performed first. All other scenarios can be handled with a single bus transaction.


The CV32E40P data interface does not implement the following optional OBI signals: auser, wuser, aid, rready, err, ruser, rid. These signals can be thought of as being tied off as specified in the OBI specification. The CV32E40P data interface can cause up to two outstanding transactions.

The OBI protocol that is used by the LSU to communicate with a memory works as follows.

The LSU provides a valid address on data_addr_o, control information on data_we_o, data_be_o (as well as write data on data_wdata_o in case of a store) and sets data_req_o high. The memory sets data_gnt_i high as soon as it is ready to serve the request. This may happen at any time, even before the request was sent. After a request has been granted the address phase signals (data_addr_o, data_we_o, data_be_o and data_wdata_o) may be changed in the next cycle by the LSU as the memory is assumed to already have processed and stored that information. After granting a request, the memory answers with a data_rvalid_i set high if data_rdata_i is valid. This may happen one or more cycles after the request has been granted. Note that data_rvalid_i must also be set high to signal the end of the response phase for a write transaction (although the data_rdata_i has no meaning in that case). When multiple granted requests are outstanding, it is assumed that the memory requests will be kept in-order and one data_rvalid_i will be signalled for each of them, in the order they were issued.

Figure 5, Figure 6, Figure 7 and Figure 8 show example timing diagrams of the protocol.

Figure 5 Basic Memory Transaction

Figure 6 Back-to-back Memory Transactions

Figure 7 Slow Response Memory Transaction

Figure 8 Multiple Outstanding Memory Transactions

Post-Incrementing Load and Store Instructions

This section is only valid if PULP_XPULP=1

Post-incrementing load and store instructions perform a load/store operation from/to the data memory while at the same time increasing the base address by the specified offset. For the memory access, the base address without offset is used.

Post-incrementing load and stores reduce the number of required instructions to execute code with regular data access patterns, which can typically be found in loops. These post-incrementing load/store instructions allow the address increment to be embedded in the memory access instructions and get rid of separate instructions to handle pointers. Coupled with hardware loop extension, these instructions allow to reduce the loop overhead significantly.