FPGARelated.com
Forums

Is N-body simulation in O(n) time possible?

Started by smjedison 6 months ago2 replieslatest reply 6 months ago76 views

Hi! My name is Mason. I've recently been messing around with FPGAs, and I think I found a way to do N-body simulation in O(n) time. First, some quick background. There are two things that make N-body simulation hard to do, namely:

  1. Every body needs to interact with every other body
  2. Every state is dependent on the previous state

Point 1 requires O(n^2) calculations, and point 2 removes the ability to pre-distribute data. This also creates a massive IO bottleneck, as each previous state needs to be distributed to all nodes performing the current calculation.

My theoretical solution to this problem is as follows:

  1. Every body is assigned to a dedicated ALU, registers, and register controller (I'll refer to this as a node)
  2. Every node is connected to a globally shared half-duplex bus
  3. A controller orchestrates all dataflow
  4. The controller sequentially commands each node to broadcast its state (i.e. position) on the bus. All other nodes perform a calculation on that bus value in parallel

Here's an example:

sequential-broadcasting.drawio (1)_84871.png

  1. Controller tells the sun to broadcast its position and mass
  2. Controller tells all other nodes to calculate and apply the appropriate force, (m1*m2)/d^2
  3. Controller saves the sun's position in a local buffer (to send to computer)
  4. Controller repeats above steps for all other planets
  5. Controller sends buffer of positions to computer

The nice thing about this design is it's trivial to scale—even across different chips. It's just a half-duplex bus!

Do you think this design is feasible and/or more scalable than current approaches? Or is it just inherently too slow?

[ - ]
Reply by ShparekhOctober 19, 2023

This is interesting. 

Generally, FPGAs are resource rich - both routing and gates, so you do not have to limit yourself at the outset to constraints such as half-duplex busses with a common controller that sequentially computes the force for each node.  Instead, it helps to think in terms of parallelizing the problem.

I am not familiar with the problem.  From your description, I didn't sense any dependencies in the inputs to each nodes.  So, ideally, you can extend your scheme to each pair of nodes to realize O(1) solution.  

To me it seems that the problem can be triaged down to solving the equation, (m1*m2)/d^2for each pair of nodes.  So, if m1 is the mass of remote node, and m2 is the mass of self.  Similarly, l1 can be location of remote node, l2 location of self.  

You can design a pipeline that takes in those 4 values and outputs force, f for that pair of node.  You can instantiate that pipeline 25 times to provide you with the result in O(1) time.

Subsequently, if you have resource constraints, you address the problem accordingly.

Best regards.

[ - ]
Reply by smjedisonOctober 19, 2023

You are correct that I could calculate this in O(1) time. The issue I'm trying to solve here, however, is scaling. I'm looking at potentially simulating 1,000,000 atoms. With your O(1) design, there would be (1,000,000*999,999) pair interactions, way too many to fit on a single chip. That's the issue--the N-body problem scales exponentially.

With my solution in this scenario, I would only have 1,000,000 steps. While the first node broadcasts, all other nodes in parallel apply the first node's position. Thus, expanding by one node doesn't add n steps to process, it only adds 1.

Thank you for providing your insights! I am by no means experienced with FPGAs and am still trying to understand :)