UltraPlus I2C and SPI
Has anyone tried the hard I2C and SPI modules in the UltraPlus FPGAs and got them working? I have spent the past few days tinkering away using the information in these documents:
http://www.latticesemi.com/-/media/LatticeSemi/Documents/ApplicationNotes/AD/AdvancediCE40SPII2CHardenedIPUsageGuide.ashx?document_id=50117
http://www.latticesemi.com/~/media/LatticeSemi/Documents/TechnicalBriefs/SBTICETechnologyLibrary201504.pdf
So far the SPI module works perfectly right up until you try to transmit a byte. Once you do, the module just clocks out the same byte repeatedly until you disable the module or transmit a different byte, at which point it just clocks out the new byte ad infinitum.
On the I2C side of things the module never seems to assert the SBACKO line when writing to its registers and so anything that waits on it will hang.
I have not yet ruled out my design / software as the cause of this but I’m not sure where to go from here as I cannot find a single working example anywhere, even from Lattice.
Also, if anyone has access to the proprietary tools, does the module generator produce readable HDL code we can study?
Issue Analytics
- Created 5 years ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
I had the misfortune of having to use the SB_I2C hardened IP block, in order to free up some gates, and came across this thread while looking for docs, and figured I’d add my 2c here.
The good news:
The bad news:
The TL;DR is that I would recommend using the SB_I2C block only if you are really out of gates and this is the only way to optimize a few LCs out of the design. Previously we used an OpenCores Verilog implementation glued into Python that was very well tested and well behaved, and had a “sane” driver interface; it basically came up without a hitch and never caused us heartburn.
OK, so from here on out – these are notes for people who are thinking about using the I2C block themselves:
The SB_I2C block uses a wishbone-oid interface. The link above to hard_i2c.py will show you how to integrate. They only provide a signal called “STB” which actually needs to be mapped to “CYC”, not “STB”, because they lack a “CYC” signal. The block also does not pay attention to CTI, etc. You must make sure that your wishbone interface is configured to be non-caching for the region.
The Lattice docs say that bits 7:4 of the register address depend upon the location of the block (upper right or upper left), but looking through Clifford’s notes it seems they may actually be set by a parameter, p_BUS_ADDR74. I didn’t resolve this, but just in case, I put a hard BEL constraint on the block so it doesn’t move around.
I took the strategy of just mapping the address and data bits straight over to wishbone, so that the 8-bit registers are actually strided over words, and the upper 24 bits are wasted. Thus the address table given in the docs needs to be multiplied by 4 to get the actual offsets. The code for the driver is here: https://github.com/betrusted-io/betrusted-ec/blob/e0f21858cd2cbb6448173f63467a93c8458c6798/sw/betrusted-hal/src/hal_hardi2c.rs#L1 (sorry, we only wrote a Rust version).
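A minimal sketch of the strided register layout described above. The base address and the register address used in the usage note are hypothetical placeholders, not values taken from the Lattice docs or from the linked driver.

```rust
/// Hypothetical base address of the SB_I2C window on the Wishbone bus.
/// This value is an assumption for illustration only.
const HARD_I2C_BASE: usize = 0xE000_0000;

/// Each 8-bit register occupies one 32-bit word, so the register address
/// from the docs is multiplied by 4 to get the byte offset.
pub const fn reg_offset(doc_addr: usize) -> usize {
    doc_addr * 4
}

/// Absolute address of a register, given its address in the docs' table.
pub const fn reg_address(doc_addr: usize) -> usize {
    HARD_I2C_BASE + reg_offset(doc_addr)
}

/// Only the low 8 bits of a word read from the block carry data;
/// the upper 24 bits are wasted.
pub const fn reg_value(word: u32) -> u8 {
    (word & 0xFF) as u8
}
```

With these assumptions, a register listed at address 0x9 in the docs would land at byte offset 0x24 from the base.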
The flow chart and timing diagrams on pages 24-30 of the PDF file linked above are pretty handy, except they are wrong. Or rather, I’m guessing the core behavior was slightly tweaked between different versions of the FPGA and they just didn’t bother documenting all the differences. The biggest trick in driving the block is that the block requires the host to intervene in real-time to guide the I2C transaction. In other words:
This means you have race conditions on the “fast” side (making sure the commands are around long enough to be accepted by the I2C block), and “slow” side (making sure the commands are cleared soon enough to avoid issuing a second command by accident).
The upshot is that if you’re driving this with a VexRiscv block running at 12MHz, you don’t actually have a lot of margin to play with. Interrupts may not be serviced in time to meet the “slow” constraint; but at 12MHz, it’s “fast” enough that you can’t simply fire and forget.
Thus the driver in essence requires a lot of polling to be done to make sure everything occurs in exact lock-step.
The documentation in the PDF would hint that TRRDY is the one register to watch to synchronize things. The TRRDY bit indicates that the Rx or Tx register (depending on the mode) has been copied into the I2C hard IP block, and the host is now safe to update the contents (or read it for Rx).
If you implement the flowchart on page 24 exactly as shown, what you end up with is just the very first cycle of either a read or write being produced, and all subsequent cycles are skipped.
For writes, not only do you have to wait until TRRDY is asserted, you need to wait until the initial slave address transaction (concurrent with the “STA” bit) indicates completion (by monitoring the TIP bit). If you simply load the next value into Txd and issue a write command upon TRRDY, the system will ignore it.
However, once you have completed that, you can now monitor TRRDY and issue WR commands to issue successive writes.
In other words, the flow chart needs to be modified to say “Wait for TIP to clear” after the initial “TXDR/CMDR” box in order to be correct.
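The modified write flow could be sketched roughly as follows. This is not the code from the linked driver: the `I2cRegs` trait, the `FakeRegs` stand-in, and the bit masks are assumptions for illustration, and the masks in particular should be checked against the register set that actually matches your part.

```rust
// Assumed bit positions; verify against the correct register description.
const SR_TIP: u8 = 1 << 7; // transfer in progress
const SR_TRRDY: u8 = 1 << 2; // Tx/Rx register ready
const CMD_STA: u8 = 1 << 7;
const CMD_STO: u8 = 1 << 6;
const CMD_WR: u8 = 1 << 4;
const CMD_CKSDIS: u8 = 1 << 2;

/// Hypothetical register access layer; a real driver would do volatile MMIO.
pub trait I2cRegs {
    fn status(&mut self) -> u8;
    fn set_txd(&mut self, byte: u8);
    fn set_cmd(&mut self, cmd: u8);
}

#[derive(Debug, PartialEq)]
pub struct Timeout;

/// Poll the status register until `mask` is set (or clear), giving up after
/// `max_polls` reads so a wedged block cannot hang the caller.
fn wait_for(regs: &mut impl I2cRegs, mask: u8, set: bool, max_polls: u32) -> Result<(), Timeout> {
    for _ in 0..max_polls {
        if ((regs.status() & mask) != 0) == set {
            return Ok(());
        }
    }
    Err(Timeout)
}

pub fn write_bytes(
    regs: &mut impl I2cRegs,
    addr7: u8,
    data: &[u8],
    max_polls: u32,
) -> Result<(), Timeout> {
    // Address phase: 7-bit address with R/W = 0, issued together with STA.
    regs.set_txd(addr7 << 1);
    regs.set_cmd(CMD_STA | CMD_WR | CMD_CKSDIS);
    wait_for(regs, SR_TRRDY, true, max_polls)?;
    // The step missing from the flow chart: wait for the address transfer
    // itself to finish before loading the first data byte.
    wait_for(regs, SR_TIP, false, max_polls)?;
    for &byte in data {
        regs.set_txd(byte);
        regs.set_cmd(CMD_WR | CMD_CKSDIS);
        wait_for(regs, SR_TRRDY, true, max_polls)?;
    }
    regs.set_cmd(CMD_STO | CMD_CKSDIS);
    Ok(())
}

/// Stand-in for the hardware: reports a fixed status and records writes.
#[derive(Default)]
pub struct FakeRegs {
    pub status_value: u8,
    pub txd: Vec<u8>,
    pub cmds: Vec<u8>,
}

impl I2cRegs for FakeRegs {
    fn status(&mut self) -> u8 {
        self.status_value
    }
    fn set_txd(&mut self, byte: u8) {
        self.txd.push(byte);
    }
    fn set_cmd(&mut self, cmd: u8) {
        self.cmds.push(cmd);
    }
}
```

The fake register block only exercises the ordering of the accesses; it says nothing about whether the timing works on real hardware.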
I have found that “repeated-start” commands also don’t work. A “repeated-start” causes (iirc) the read data cycles following the “read start” command to disappear. Thus, the workaround is to always conclude every write phase with a “full stop” before moving on to the read phase. Minor performance loss, no biggie.
For the read side, the document does accurately specify to wait for “SRW” instead of “TRRDY” after the initial “STA+R” command. The read side flow chart is otherwise mostly correct except that I couldn’t get the “1 byte read” condition to work. Because of the double-buffering they put on the I2C read side, there is an even stricter race condition imposed, where you must issue the “RD+NACK+STOP” command within a 2tSCL to 7tSCL window, or else things blow up (usually ends up with a weird runt cycle or the I2C block gets hung clocking SCL forever).
I was unable to find a way to reliably hit the 2tSCL to 7tSCL window. I had increased the hardware timer precision and tried various combinations off of that, but it always seemed I was either too fast or too slow. If your system uses caching, does XIP from SPI, or has interrupts, that would also cause this to blow up.
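To get a feel for how tight that window is, here is a rough bit of arithmetic. The 12 MHz CPU clock comes from the notes above; the SCL rates in the usage note (100 kHz and 400 kHz) are assumptions, since the bus speed is not stated.

```rust
/// Converts a window expressed in SCL periods (e.g. 2 tSCL to 7 tSCL) into
/// (earliest, latest) CPU cycle counts. Uses integer division, so it is only
/// a rough estimate; `scl_hz` must be non-zero.
pub fn window_in_cpu_cycles(
    cpu_hz: u64,
    scl_hz: u64,
    min_tscl: u64,
    max_tscl: u64,
) -> (u64, u64) {
    let cycles_per_tscl = cpu_hz / scl_hz;
    (min_tscl * cycles_per_tscl, max_tscl * cycles_per_tscl)
}
```

Under these assumptions, `window_in_cpu_cycles(12_000_000, 100_000, 2, 7)` gives a window of 240 to 840 CPU cycles, and at 400 kHz it shrinks to 60 to 210 cycles, which is not much once polling loops, cache misses, or XIP fetches are in the path.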
Thus, my final driver implementation works around this by simply not allowing single-byte reads. I have an “assert” in the code to catch that, but another valid way to deal with it might be to simply issue two reads even if a single read is requested. In many cases, reading an extra byte is harmless for the I2C device, and the main impact would be e.g. if you were relying on the position of the read address pointer to increment by only one byte in the target device. Fortunately for my application, this is not the case so I didn’t have to solve this last detail.
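The "issue two reads anyway" alternative could look something like this sketch. `read_padded` and the `raw_read` callback are hypothetical names; `raw_read` stands in for whatever multi-byte read routine the driver already has, and this is not what the linked driver does (it asserts instead).

```rust
/// Wraps a multi-byte read routine so that a 1-byte request is served by a
/// 2-byte read whose second byte is discarded. Only safe when the target
/// device tolerates the extra byte being read.
pub fn read_padded<E>(
    mut raw_read: impl FnMut(&mut [u8]) -> Result<(), E>,
    out: &mut [u8],
) -> Result<(), E> {
    match out.len() {
        // Nothing requested, so do not touch the bus at all.
        0 => Ok(()),
        1 => {
            let mut scratch = [0u8; 2];
            raw_read(&mut scratch)?;
            out[0] = scratch[0];
            Ok(())
        }
        // Two or more bytes go straight through.
        _ => raw_read(out),
    }
}
```

The trade-off is the one noted above: a target with an auto-incrementing read pointer will have advanced by two bytes, not one.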
I will note that there is a “RBUFDIS” function that is not well documented that might solve the above problem. In the flow chart examples, they always set CKSDIS but don’t explain why; I just do it in my code because that’s what they recommended. I imagine that if you set the RBUFDIS signal, you would no longer have that weird race condition on the RD+STO+NACK cycle, but instead you’d have another race condition timing when to read the data out of the Rxd register. I didn’t want to find out which was worse, but this footnote is here for anyone who decides they absolutely must have the ability to read a single byte from a slave device using this hard IP block.
Finally, I put some diagnostics in my code to check how often we hit time-outs at places I wouldn’t expect them, and I also explicitly wait for things like TRRDY to go “not ready” even though the flow chart doesn’t call for it to ensure proper interlocking. Despite these measures, a small fraction of the I2C operations still trigger time-outs and fail. Therefore, all the calls to the I2C API in my implementation now check the return code and retry the operation if there is a failure. I did not go into why I had the rare time-outs, or what causes them, because my targets all support stateless read/write, i.e., I can afford to just keep on retrying the read or write until it works, but not all I2C targets are like this.
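A retry wrapper of the kind described could be sketched like this. `with_retries` is a hypothetical helper, not the API of the linked driver, and it is only appropriate when the target tolerates a repeated read or write.

```rust
/// Runs `op` until it succeeds or `max_attempts` tries have been made,
/// returning the last error on failure. `op` always runs at least once,
/// even if `max_attempts` is 0.
pub fn with_retries<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}
```

Bounding the attempt count keeps a genuinely dead bus from turning into an infinite loop, while still papering over the occasional time-out.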
There is a mysterious “SDA delay” parameter, and there is apparently some mention elsewhere of a “glitch filter” hard IP block that seems like it was meant to be used with the SB_I2C and may be instantiated by the proprietary tool. Perhaps adding these or tuning these parameters would solve the reliability problems, but the docs are sparse on this.
Good luck! I agree with a note I saw from Clifford elsewhere – this block is pretty horrible and fussy, and you should use an RTL IP block if you can. But if you’re out of gates – you’re out of gates, and I hope these notes can help you.
Unfortunately the I2C and SPI cores are somewhat annoying to use. Despite adding them to icestorm for completeness, I would advise you to avoid them if possible and use soft cores instead.
The I2C module is hanging because of a major typo in the datasheet. The correct register set for the UltraPlus is not the one titled “iCE40 UltraLite and iCE40 UltraPlus”, but “iCE40LM and iCE40 Ultra”. Hopefully once you start following that it will at least acknowledge you.
The only working example of I2C/SPI is @mmicko’s work here: https://github.com/mmicko/mikrobus-upduino/blob/master/src/picosoc/firmware.c and https://github.com/mmicko/mikrobus-upduino/blob/master/src/picosoc/ip_wrapper.v