
MontaVista Linux and Virtex-II & 4

Started by Osnet February 18, 2006
Does anyone know if MontaVista Linux or other distributions support SMP
in Virtex-II Pro and Virtex-4? Thanks.

No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
coherent and so do not support SMP.  

Paul

> Osnet wrote:
>> Does anyone know if MontaVista Linux or other distributions support SMP
>> in Virtex-II Pro and Virtex-4? Thanks.

Paul Hartke wrote:
> No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
> coherent and so do not support SMP.

I don't really get the relation between the two facts ... The OS could
enforce coherency in software, by forcing a cache flush during task
switching I think ...

Sylvain
In article <43f849af$0$2132$ba620e4c@news.skynet.be>,
 Sylvain Munaut <com.246tNt@tnt> writes:
|> Paul Hartke wrote:
|> > No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
|> > coherent and so do not support SMP.  
|> 
|> I don't really get the relation between the two facts ... The OS could
|> enforce coherency in software, by forcing a cache flush during task
|> switching I think ...

The whole idea of having caches is that they are transparent, so that you
do *not* need any sort of special treatment by the programmer or operating
system.

Besides, flushing on task switches wouldn't help, as memory write accesses
occur independently of task switches. Your OS would need to keep track of
the memory accesses of all CPUs in your SMP system and block reads to
"dirty" addresses until they have been written back by the "dirtying" CPU,
i.e. the OS would have to establish a cache coherency protocol entirely in
software, without the typical hardware assist required by hardware cache
coherency protocols like MESI.
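
To make that concrete, consider two tasks that are simply pinned to the two
CPUs and never get switched at all. A contrived C sketch (the variables and
functions are invented for the example, nothing here is from a real system):

    #include <stdio.h>

    /* Two tasks pinned to two non-coherent CPUs.  No task switch ever
     * happens, so a flush-on-task-switch policy never runs, yet the
     * shared data still goes stale. */
    volatile int data  = 0;
    volatile int ready = 0;

    void producer(void)            /* imagine this running on CPU 0 */
    {
        data  = 42;                /* may stay in CPU 0's dirty cache line */
        ready = 1;                 /* likewise */
    }

    void consumer(void)            /* imagine this running on CPU 1 */
    {
        while (!ready)             /* CPU 1 can keep hitting its own stale */
            ;                      /* cached copy of 'ready' forever       */
        printf("%d\n", data);      /* and nothing forces it to see 42      */
    }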

Rainer

Rainer Buchty wrote:
> In article <43f849af$0$2132$ba620e4c@news.skynet.be>,
>  Sylvain Munaut <com.246tNt@tnt> writes:
> |> Paul Hartke wrote:
> |> > No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
> |> > coherent and so do not support SMP.
> |>
> |> I don't really get the relation between the two facts ... The OS could
> |> enforce coherency in software, by forcing a cache flush during task
> |> switching I think ...
>
> The whole idea of having caches is that they are transparent, so that you
> do *not* need any sort of special treatment by the programmer or operating
> system.
Well, there are quite a few CPUs where you need to enforce coherency by
hand when using DMA, for example (there is even a flag for
non-coherent-cache CPUs in the kernel) ...
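
For example, a driver on such a CPU normally goes through the DMA mapping
API, and the kernel does the required flush or invalidate behind the scenes
when it is built for a non-coherent-cache CPU. A rough sketch (dev, buf and
len are placeholders for the driver's own data, not anything specific):

    #include <linux/device.h>
    #include <linux/dma-mapping.h>

    /* Hand 'len' bytes at 'buf' to a DMA engine.  On a non-coherent-cache
     * CPU, dma_map_single() writes the buffer back out of the data cache
     * for DMA_TO_DEVICE so the device reads up-to-date memory; for
     * DMA_FROM_DEVICE it would invalidate instead. */
    static int start_dma_tx(struct device *dev, void *buf, size_t len)
    {
        dma_addr_t handle;

        handle = dma_map_single(dev, buf, len, DMA_TO_DEVICE);

        /* ... program the DMA engine with 'handle' and wait for it ... */

        dma_unmap_single(dev, handle, len, DMA_TO_DEVICE);
        return 0;
    }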
> Besides, flushing on task switches wouldn't help, as memory write accesses
> occur independently of task switches. Your OS would need to keep track of
> the memory accesses of all CPUs in your SMP system and block reads to
> "dirty" addresses until they have been written back by the "dirtying" CPU,
> i.e. the OS would have to establish a cache coherency protocol entirely in
> software, without the typical hardware assist required by hardware cache
> coherency protocols like MESI.
I don't get that, sorry ... (note that you may be right, I'm just trying
to understand here). I'm not that familiar with SMP, but here is how I
see it: processes have two kinds of writable memory zones, either private
to the process or shared between several processes.

 * For independent address space, since the task will run only on a CPU
at a time, a cache problem only occurs when a task is stopped on one CPU
and launched on the other. So just after stopping the task, the CPU
should flush its cache so that if the task is launched on the other CPU,
the other has access to up-to-date memory.

 * For a shared zone between processes, there is no problem either. While
the two processes are running simultaneously, no problem can occur
because the processes must handle the synchronisation themselves even on
a cache coherent system (by semaphore, or flag in memory, whatever ...).
And when only one is running, the situation is similar to the
independent zones.

Sylvain
In article <43f87b17$0$3812$ba620e4c@news.skynet.be>,
 Sylvain Munaut <com.246tNt@tnt> writes:
|> Well, there are quite a few CPUs where you need to enforce coherency by
|> hand when using DMA, for example (there is even a flag for
|> non-coherent-cache CPUs in the kernel) ...

Yes, and I could also come up with a system that e.g. requires non-cacheable
memory areas because one or more of the devices accessing the respective
memory area is not able to support a cache coherency protocol.

No doubt that it can be done otherwise, but that's not the point.

|>  * For independent address space, since the task will run only on a CPU
|> at a time, a cache problem only occurs when a task is stopped on one CPU
|> and launched on the other. So just after stopping the task, the CPU
|> should flush its cache so that if the task is launched on the other CPU,
|> the other has access to up-to-date memory.

And why would you specifically need shared memory in this respect?

If your application / system does not require shared memory access by design,
then of course you can come up with a light-weight solution like the above,
where a single task is the only one dealing with a specific set of data.
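
Concretely, on the PPC405 (32-byte data cache lines) such a light-weight
solution boils down to writing the migrating task's dirty lines back before
it is resumed on the other CPU. A rough sketch only: on_task_migrate() is a
made-up hook, not existing SMP support, and the flush helper is just
modelled on the kind of dcache flush routines the kernel already has:

    #define DCACHE_LINE_SIZE 32UL    /* PPC405 data cache line size */

    /* Write back and invalidate every data cache line covering
     * [start, stop): 'dcbf' flushes one line, 'sync' waits until the
     * write-backs have reached memory. */
    static void flush_dcache_range(unsigned long start, unsigned long stop)
    {
        unsigned long addr;

        for (addr = start & ~(DCACHE_LINE_SIZE - 1); addr < stop;
             addr += DCACHE_LINE_SIZE)
            __asm__ __volatile__("dcbf 0,%0" : : "r" (addr) : "memory");
        __asm__ __volatile__("sync" : : : "memory");
    }

    /* Hypothetical hook, called after the task has been stopped on the
     * old CPU and before it is allowed to run on the new one. */
    static void on_task_migrate(unsigned long data_start, unsigned long data_end)
    {
        flush_dcache_range(data_start, data_end);
    }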

|>  * For a shared zone between processes, there is no problem either. While
|> the two processes are running simultaneously, no problem can occur
|> because the processes must handle the synchronisation themselves even on
|> a cache coherent system (by semaphore, or flag in memory, whatever ...).

Ok, assume that the semaphore is placed in memory and we have a two-processor
system where each processor runs one of those two tasks.

You could of course switch off caching of the very memory area holding the
semaphore(s) and never have a problem. But then there is no cache for that
area, i.e. the accesses will be dog slow.

You could also, as I understand your example, trigger a flush so that whenever
one processor tries to read the semaphore, the other processor flushes the
cache line(s) holding your semaphore(s) while the reading processor waits for
that process to finish. That would work, but it induces unnecessary bus load,
waiting times (the semaphore might not have been changed since the last read),
and furthermore a more or less complex communication protocol which needs to
be triggered whenever any of the processors tries to access the shared memory
region.
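
Spelled out for a single shared word on the PPC405, that flush-on-access
scheme would look roughly like the following. This is only a sketch with
invented names (sem, publish_sem, read_sem); note that it already assumes
the writer has flushed before the reader invalidates and reads, and getting
that ordering right in general is exactly the communication protocol
mentioned above:

    /* 'sem' is just an illustrative shared word; dcbf/dcbi are the PPC405
     * data cache flush / invalidate instructions (dcbi is supervisor-only,
     * so kernel context is assumed). */
    volatile unsigned long sem;

    static void publish_sem(unsigned long val)      /* writer side */
    {
        sem = val;
        /* Push the dirty line out to memory so the other CPU can see it. */
        __asm__ __volatile__("dcbf 0,%0\n\tsync" : : "r" (&sem) : "memory");
    }

    static unsigned long read_sem(void)             /* reader side */
    {
        /* Throw away our possibly stale copy so the load goes to memory.
         * Every single read now costs a cache op plus a memory access,
         * and it still only works if the writer has already flushed. */
        __asm__ __volatile__("dcbi 0,%0\n\tsync" : : "r" (&sem) : "memory");
        return sem;
    }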

The idea behind having a cache coherency protocol is to get consistency and
coherency at no extra cost on the software side. The programmer (or the OS)
does not need to care about the entire process of access monitoring, stopping
a read access on a memory region which has been dirtied by another processor,
writing back that dirty information to memory, and restarting the reading
processor. The price for that is paid on the hardware side, i.e. you require
a common, snoopable bus, some additional communication signals (3 in the case
of MESI), logic to implement the light-weight protocol, and a slightly
altered cache to hold the actual MESI state.

The idea behind MESI (or cache coherency protocols in general) is to keep 
the additional bus traffic as low as possible, i.e. accesses to memory only 
when necessary, keeping as much traffic inside the cache as possible.
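
As a toy model, the MESI states and the main transitions can be written down
like this (purely illustrative, not how any real cache controller is built):

    /* The four MESI states of a cache line. */
    enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

    /* The local processor reads the line.  A miss (INVALID) goes out on
     * the bus; the other caches' snoop responses decide between
     * EXCLUSIVE and SHARED.  M, E and S hit without any bus traffic. */
    static enum mesi_state local_read(enum mesi_state s, int others_have_it)
    {
        if (s == INVALID)
            return others_have_it ? SHARED : EXCLUSIVE;
        return s;
    }

    /* The local processor writes the line.  From INVALID or SHARED an
     * invalidate must first be broadcast on the snooped bus; EXCLUSIVE
     * is upgraded silently; MODIFIED stays MODIFIED. */
    static enum mesi_state local_write(enum mesi_state s)
    {
        (void)s;
        return MODIFIED;
    }

    /* Another processor's access to the line is snooped on the bus.
     * A MODIFIED line is written back (or supplied directly) first. */
    static enum mesi_state snoop(enum mesi_state s, int remote_is_write)
    {
        if (remote_is_write)
            return INVALID;
        if (s == MODIFIED || s == EXCLUSIVE)
            return SHARED;
        return s;
    }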

Of course you could do all that on the software side as well, using
communication methods at the OS and application level. But at the price of
increased complexity, bus traffic, and access latencies.

Try scaling the 2-processor example up to 3, 4, or more processors.

Rainer
On 2006-02-19, Sylvain Munaut <com.246tNt@tnt> wrote:
>> Osnet wrote:
>>> Does anyone know if MontaVista Linux or other distributions support SMP
>>> in Virtex-II Pro and Virtex-4? Thanks.
>
> Paul Hartke wrote:
>> No, the PowerPC405 caches in the current Xilinx FPGAs are not cache
>> coherent and so do not support SMP.
>
> I don't really get the relation between the two facts ... The OS could
> enforce coherency in software, by forcing a cache flush during task
> switching I think ...
This cannot handle threaded applications running on multiple CPUs since they
will share the same memory. If you do not have any threaded applications it
might work.

The biggest problem for the original poster is however that the
multiprocessor support in the Linux kernel itself is designed on the
principle that the memory system is cache coherent. Rewriting all of that is
going to be non-trivial to say the least.

The best you can hope for in this case is to run two copies of the Linux
kernel, one on each processor.

/Andreas