Interconnect project?

Hey Team,

I’d like to create a small cluster of Snickerdoodles where each of them is connected to several of its neighbors.

I prefer a parallel interface right now on these point-to-point links, based on the assumption that careful work on a parallel interface should be able to provide lower latencies and higher throughput than serial. But if pin counts run short, or “careful” turns out to actually be “close to impossible”, serial might be fine. I don’t require that the connections be bidirectional, beyond possibly some sort of trivial flow control. The higher the speed, the better.

I recognize that the clocks on any two connected Snkrdl might not be at exactly the same frequency without some additional TBD synchronization. I am pretty sure I can keep the length of the interconnecting wires essentially the same… but perhaps not their path through space. I’m open to using differential signaling. I think we can assume those wires will always be less than 4 inches long.

I’m not an experienced FPGA programmer at this point but did study undergraduate “electronics” several decades ago. I think we even programmed some of the early Xilinx hardware for one class back then.

What project might you suggest as an example of something like this? What general suggestions?

J.

Jason,

Sounds like an awesome project! A snickerdoodle supercomputer…rolls right off the tongue.

After chatting with the team about this, the feedback was: The biggest driving factors are probably going to be the topology, the number of neighbors per node, how many I/O you’re willing to dedicate for interconnect, and the required throughput.

If you can live with DDR I/O speeds (max 32x 800 Mbps if you connect nodes in a 2D torus), you probably want to look at an AXI Chip2Chip bridge:

  • Independent clocking, 32 bit data width, 40 bit address width (extra 8 bits for node ID), DDR interface 4-1
  • 12 inputs (clock + 11 data)
  • 12 outputs (clock + 11 data)
  • Could fit on a single snickerdoodle connector (<25 I/O total)
This is going to be more limited in terms of throughput, as the Xilinx IP isn't the best at maximizing the bandwidth for a given number of pins. If you go this route, it might be worth writing your own IP to make the interface very compact.
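For what it’s worth, the pin and bandwidth arithmetic behind those bullets can be sketched out. This is just back-of-the-envelope math using the figures quoted above (11 data lines plus a clock per direction, 800 Mbps per pin with DDR signaling), not anything from the Xilinx docs:

```python
# Back-of-the-envelope pin/bandwidth budget for one AXI Chip2Chip link
# as described above. The per-pin rate and lane counts are taken from
# the bullet list, not from a datasheet.

DATA_LINES = 11          # data lines per direction (per the list above)
CLOCK_LINES = 1          # forwarded clock per direction
PIN_RATE_MBPS = 800      # assumed DDR toggle rate per data pin

pins_per_link = 2 * (DATA_LINES + CLOCK_LINES)     # both directions
raw_gbps_per_dir = DATA_LINES * PIN_RATE_MBPS / 1000

print(f"pins per link: {pins_per_link}")                    # 24 -> fits in <25 I/O
print(f"raw per-direction rate: {raw_gbps_per_dir} Gbps")   # 8.8 Gbps
```

That 8.8 Gbps is the raw pin rate per direction; the usable AXI throughput will be lower once the Chip2Chip protocol overhead is taken out, which is why writing your own compact IP could pay off.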

You could also consider LVDS, which can move 1.25 Gbps (in theory): 4 links on each node to adjacent nodes, with 2 pairs and a clock in each direction.
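A quick pair-budget sketch for that LVDS option (assuming 2 data pairs plus 1 clock pair per direction on each of the 4 links, as described, and taking the 1.25 Gbps theoretical rate at face value):

```python
# Rough LVDS pair budget for the 4-neighbor scheme described above.
LINKS = 4                # neighbor links per node
DATA_PAIRS = 2           # data pairs per direction per link
CLOCK_PAIRS = 1          # forwarded clock pair per direction per link
PAIR_RATE_GBPS = 1.25    # theoretical LVDS rate quoted above

pairs_total = LINKS * 2 * (DATA_PAIRS + CLOCK_PAIRS)   # both directions
gbps_per_link_dir = DATA_PAIRS * PAIR_RATE_GBPS

print(f"total pairs used: {pairs_total}")              # 24 pairs
print(f"per-link, per-direction: {gbps_per_link_dir} Gbps")  # 2.5 Gbps
```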

Optionally, since you have 60 pairs available, you could set up a synchronously clocked system where all nodes are fed from the same reference clock, which would allow you to drop the per-link clock pairs and instead use 8b/10b codes. At that point you could fully connect 31 nodes!
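My reading of the arithmetic behind that claim (assuming each neighbor then needs only 1 TX pair + 1 RX pair once the forwarded clocks are gone, and applying the standard 8b/10b 20% coding overhead):

```python
# How dropping per-link clocks enables full connectivity: with a shared
# reference clock and 8b/10b-encoded data, each neighbor costs just
# one transmit pair and one receive pair.
PAIRS_AVAILABLE = 60
PAIRS_PER_NEIGHBOR = 2         # 1 TX + 1 RX
PAIR_RATE_GBPS = 1.25          # theoretical LVDS rate from above

neighbors = PAIRS_AVAILABLE // PAIRS_PER_NEIGHBOR   # 30 neighbors
cluster_size = neighbors + 1                        # this node + 30 others
payload_gbps = PAIR_RATE_GBPS * 8 / 10              # 8b/10b: 8 data bits per 10 line bits

print(f"fully connected cluster: {cluster_size} nodes")   # 31
print(f"payload rate per pair: {payload_gbps} Gbps")      # 1.0 Gbps
```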

Those are some ideas anyway, let me know what you think.

-Cousins

Hey Ryan. Thanks for the thoughts.

Sorry for the delay responding. (I wasn’t notified of your response and apparently overlooked it when I polled the forum subsequently… until today. What a great surprise.)

It’s good to hear that this sort of clustering might be possible.

I’m thinking 4, 5, or 6 neighbors directly connected this way depending on the difficulty involved.

A 2D torus is probably the best option I see. Definitely nothing more complex than that, as it’s almost a “starter” project. :)

The “DDR” speeds you mention should be acceptable. I’m beginning to read up on Chip2Chip to understand more about that.

FWIW… I think I’m fine with message/packets passing, rather than shared memory, so I don’t think I’d need so many DDR address lines. At least not for addressing. I’ll read further to make sure that’s even a meaningful statement.

I have been deferring deciding how passing packets to non-neighbors would work.

The LVDS speeds are inviting. I’ll stay flexible between a shared clock or wave clock across the cluster vs. totally async clocking.

If you have more thoughts, sample projects, etc., please let me know. I’ll keep reading up in the meantime.

Please pass my thanks to the team.

J.