Prob in Synthesizing and Simulating large Mux

Started by vssumesh September 30, 2005
Hi all,
  I am developing hardware that needs a large mux: a 240-to-1 byte
multiplexer. I tried to code it but ran into the following problems.
1. I tried the straightforward way, using AND and OR gates. This is
simple to write, as it only needs simple "generate" constructs in
Verilog. But I could neither simulate nor synthesize the design.
ModelSim (v6.0a) just stops responding when I try to load the design,
and Xilinx ISE does not work either. ISE first shows strong activity,
loads the processor, and takes up a lot of memory. After some time it
stalls: ISE still shows activity, but processor usage is almost 0%, and
after some 4 hours it showed only 60% progress. If I reduce the number
of inputs, it works fine and gives output in a few minutes.
2. Then I tried a case statement and wrote 240 cases. In this case too,
ISE does not work.
I am using Windows XP on an AMD machine, and the ISE version is 6.0. If
I reduce the number of cases to 120, it gives proper output.
I am confused by the low activity of the Xilinx tools. Why are they not
loading the processor? Is it because of a problem in the OS? I had
hoped that method 2 would work with the synthesizer.
Please advise me on this issue, and please let me know of any usual
ways to generate this kind of huge mux.

The Xilinx output is given below.

Started process "Synthesize".


=========================================================================
*                          HDL Compilation
*
=========================================================================
Compiling source file "../test/test.v"
Module <test_mux> compiled
No errors in compilation
Analysis of file <test_mux.prj> succeeded.


=========================================================================
*                            HDL Analysis
*
=========================================================================
Analyzing top module <test_mux>.
WARNING:Xst:905 - ../test/test.v line 23: The signals <in> are missing
in the sensitivity list of always block.
Module <test_mux> is correct for synthesis.

    Set property "resynthesize = true" for unit <test_mux>.

=========================================================================
*                           HDL Synthesis
*
=========================================================================

Synthesizing Unit <test_mux>.
    Related source file is ../test/test.v.
Unit <test_mux> synthesized.


=========================================================================
*                       Advanced HDL Synthesis
*
=========================================================================

Advanced RAM inference ...
Advanced multiplier inference ...
Advanced Registered AddSub inference ...
Dynamic shift register inference ...

=========================================================================
HDL Synthesis Report

Found no macro
=========================================================================

=========================================================================
*                         Low Level Synthesis
*
=========================================================================

Optimizing unit <test_mux> ...

***### Program stopped loading the processor here ###***

"vssumesh" <vssumesh_asic@yahoo.com> wrote in message
news:1128076170.901183.12350@z14g2000cwz.googlegroups.com...
> Hi all,
> I am developing a hardware in which I need large size MUX. I need a
> 240 to 1 byte multiplexer. I tried to code it but observed the
> following problems.

You had better think in terms of LUTs and code a hierarchical tree; that
should synthesize without any problem. We have definitely synthesized
much wider muxes.

Antti

OK, but is it easy to simulate? And if we code it as a hierarchical
tree, will it take more area than required? Please give a little more
detail on this.

"vssumesh" <vssumesh_asic@yahoo.com> wrote in message
news:1128080413.197839.108730@g47g2000cwa.googlegroups.com...
> Ok .. but is it easy to simulate? And if we code it in a hierarchial
> tree will it take more area than required. Please give little more
> details in this.

One slice (2 LUTs + MUX) can implement a 4:1 mux, so you mux down by 4,
then by 4 again, as many times as needed. For a 256-to-1 mux with 256
input signals:

  level 1 reduces 256 to 64  (64 slices)
  level 2 reduces 64 to 16   (16 slices)
  level 3 reduces 16 to 4    (4 slices)
  level 4 reduces 4 to 1     (1 slice)

That is 85 slices. This is the smallest LUT-based mux; whatever you
write in HDL, the same number of LUTs is required.

Antti

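The slice count above is easy to check. The following is a minimal Python sketch, assuming (as in the post) that one slice implements one 4:1 mux for a single bit. The function name `mux_tree_levels` is made up for this illustration, and the sketch only counts slices per level; it says nothing about routing or about how the selects are decoded.

```python
def mux_tree_levels(n_inputs, radix=4):
    """Return the number of radix:1 muxes needed at each tree level.

    Assumption taken from the post: one slice = one 4:1 mux (1 bit wide).
    A partially filled mux at the end of a level still costs a full slice.
    """
    levels = []
    n = n_inputs
    while n > 1:
        n = -(-n // radix)  # ceiling division: muxes needed at this level
        levels.append(n)
    return levels


if __name__ == "__main__":
    for width in (256, 240):
        levels = mux_tree_levels(width)
        print(width, levels, sum(levels))
```

Under the same assumption, a 240:1 tree comes to 60 + 15 + 4 + 1 = 80 slices per bit, or 640 slices for one byte-wide mux.
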
vssumesh wrote:
> Ok .. but is it easy to simulate? And if we code it in a hierarchial
> tree will it take more area than required. Please give little more
> details in this.

Also try to think about whether you really need a randomly accessible
mux in your case. For example, if you always need the inputs in the same
order, you can load all of them into a shift register and shift them
out.

Kolja Sulimma

Kolja Sulimma wrote:

> vssumesh wrote:
>> Ok .. but is it easy to simulate? And if we code it in a hierarchial
>> tree will it take more area than required. Please give little more
>> details in this.
>
> Also try to think about whether you really need a random accessible mux
> in your case. For example if you allways need the inputs in the same
> order you can load all of them into a shift register and shift them out.
>
> Kolja Sulimma

You can get better pipelined performance by decoding the selects before
the first level and then combining the first-level outputs in an OR
tree. With this method you can also use the carry chains or, if using
Virtex-II, the horizontal OR chains to help reduce the size of the
logic. This is for a random selection sequence. As Kolja said, a shift
register might be a better choice if you can constrain the selection
order. If the goal is to read back registers that you have written into
a design, you can use a block RAM as a shadow for the registers and read
back the block RAM. Finally, if you can afford the latency, you can get
better place-and-route results by going with a linear structure.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759

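The first suggestion (decode the selects, gate each input, then OR everything together) can be modeled behaviorally. This is a rough Python sketch, not HDL and not Ray's implementation; `decode_select` and `and_or_mux` are names invented here, and bytes are modeled as plain integers.

```python
from functools import reduce


def decode_select(sel, n_inputs):
    """Turn a binary select value into a one-hot list of enable bits."""
    return [1 if i == sel else 0 for i in range(n_inputs)]


def and_or_mux(din, one_hot):
    """Gate each input byte with its enable bit, then OR all results."""
    gated = [byte if en else 0 for byte, en in zip(din, one_hot)]
    return reduce(lambda a, b: a | b, gated, 0)


if __name__ == "__main__":
    din = [(3 * i + 1) & 0xFF for i in range(240)]  # arbitrary test bytes
    for sel in (0, 17, 239):
        # The AND-OR structure must agree with plain indexing
        assert and_or_mux(din, decode_select(sel, 240)) == din[sel]
    print("AND-OR mux matches direct indexing")
```

Because at most one enable is high, the OR stage can be grouped in any order, which is what allows it to be built as a tree, a chain, or a mix of both.
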
I don't fully understand what you are suggesting, but it seems to me
that you are advising a pipelined operation. That is not possible in
this design; it is a completely random mux.
The task is to take data from a 240-byte register and arrange it onto a
64-byte-wide data bus, all simultaneously (each output byte can take
data from any of the 240 registers). The selection bits go directly to
each mux, that is, 240 selection lines into each mux. I tried to
implement it with LUTs but got the same result. I am ready to wait for
days, but ISE simply gives up. If I reduce the number of outputs by 32,
it produces a result.
> "You can also use the carry chains, or if using virtexII the horizontal
> or chains with this method to help reduce the size of the logic."
Please give me a little more detail on this. I tried implementing normal
ANDing and then ORing all the bits.

Sumesh

So what you're really saying...

is that you want to make a 64 x 8 x (240:1) mux, or 122,880
combinations...
What was suggested isn't that hard to implement, and it isn't pipelined
either. Pipelining assumes a clock and one level per stage, so a 4-stage
pipeline creates a 4-clock delay, and there is no clock here.

So that's 240x8 + 64x8 + (8x6) pins / signals...
If you think about what you are asking for, you will see it's a mite
ridiculous! Even assuming you have just the 240 bytes, that's 1,920
signals all on its own!

Please look at your design and maybe come up with something a mite more
sensible.

Simon

P.S.

    library ieee;
    use ieee.std_logic_1164.all;

    package mux240_1_pkg is
      -- 240 source bytes, 64 output bytes, one select index per output
      type byte240_typ is array (0 to 239) of std_logic_vector(7 downto 0);
      type byte64_typ  is array (0 to 63)  of std_logic_vector(7 downto 0);
      type int240_typ  is array (0 to 63)  of integer range 0 to 239;
    end mux240_1_pkg;

    -- the entity uses the package types, so the use clause goes here
    library work;
    use work.mux240_1_pkg.all;

    entity mux240_1_byte is
      port (
        din  : in  byte240_typ;
        dout : out byte64_typ;
        sel  : in  int240_typ
      );
    end mux240_1_byte;

    architecture rtl of mux240_1_byte is
    begin
      -- one 240:1 byte mux per output byte
      gen : for i in dout'range generate
        dout(i) <= din(sel(i));
      end generate;
    end rtl;

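For checking a simulation, the behavior of the generate loop above can be written as a small reference model. This is a Python sketch under the sizes given in the thread (240 source bytes, 64 output bytes); `mux_array` is a name made up for this example, and the code describes behavior only, not how a synthesizer would build it.

```python
N_IN = 240   # source bytes (from the thread)
N_OUT = 64   # output bytes (from the thread)
WIDTH = 8    # bits per byte


def mux_array(din, sel):
    """Model of dout(i) <= din(sel(i)) for every output i."""
    assert len(din) == N_IN and len(sel) == N_OUT
    assert all(0 <= s < N_IN for s in sel)
    return [din[s] for s in sel]


if __name__ == "__main__":
    din = [i & 0xFF for i in range(N_IN)]
    sel = [(7 * i) % N_IN for i in range(N_OUT)]  # arbitrary selection pattern
    dout = mux_array(din, sel)
    print(dout[:4])
    # Sizes discussed in the thread
    print(N_OUT * WIDTH * N_IN)  # single-bit mux data inputs in total
    print(N_IN * WIDTH)          # source data signals
```

The two products are the 122,880 and 1,920 figures quoted earlier in the thread.
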
Hello Simon,
   Yes, I am trying to implement 64 muxes, each 8 bits wide (240:1).
There are 240 * 64 = 15360 selection bits in total (240 bits to each
mux), and 240 * 8 = 1920 data bits to the whole block of 64 muxes (the
same data goes to every mux). Thus the mux array block will have 17280
input lines and 64 * 8 output lines. Why are you saying that it is not
possible? All signals are generated internally from other parameters (I
don't know how much internal routing effort the FPGA needs). Please
advise.
  The mux code you suggested is a single 240-to-1 byte mux, but I want
64 copies of that. Is that possible? I know it is not possible to
implement it in a single design if the selection signals come from
external sources; is it because of this constraint that ISE stops
working? I am able to get output if I reduce any of the parameters by
half (number of outputs, number of registers, etc.).
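The line counts in this post can be tallied in a few lines. The sketch below is Python and also shows, purely as an assumption for comparison, how many select lines would be needed if each mux took a binary-encoded index instead of 240 one-hot lines; the thread does not say whether the selects can be encoded that way. `one_hot_to_index` is a hypothetical helper.

```python
N_IN, N_OUT, WIDTH = 240, 64, 8  # sizes from the thread

one_hot_select_bits = N_IN * N_OUT   # 240 select lines to each of 64 muxes
data_bits = N_IN * WIDTH             # shared by all muxes
total_inputs = one_hot_select_bits + data_bits
output_bits = N_OUT * WIDTH

# Assumption for comparison only: one binary-encoded index per mux
index_bits = (N_IN - 1).bit_length()  # bits needed to address 0..239
encoded_select_bits = index_bits * N_OUT


def one_hot_to_index(bits):
    """Convert a one-hot list of 0/1 values to the index of the set bit."""
    if sum(bits) != 1:
        raise ValueError("select is not one-hot")
    return bits.index(1)


if __name__ == "__main__":
    print(one_hot_select_bits, data_bits, total_inputs, output_bits)
    print(index_bits, encoded_select_bits)
```
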

That's where my snippet is different: the "for generate" will repeat
that mux 64 times for you :-)
Nice and simple, isn't it?

The problem is that you have to think of the resources. I don't know
exactly, but the number of loads on any CLB is finite, and I doubt it is
64, so the whole thing gets repeated multiple times. As you are talking
about 8x240 outputs, you will chew up resources horribly fast.

The next problem is that Xilinx parts, as with all FPGAs, are a
compromise. The 1M-gate figure is based on designs which are
synchronous, and yours isn't. That makes a huge mux very inefficient and
not what the tools are designed to cope with.

The best bet would be to rethink: possibly use the idea of shifting the
data into a dual-port RAM and using the second port of the RAM as the
output of the mux. It does mean your design ends up pipelined, but you
will struggle to do it any other way.

The other solution is to put down 4 FPGAs.

Simon

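The dual-port RAM idea can be pictured with a small behavioral model. This is only a Python sketch under assumptions not stated in the thread: one byte is written per cycle on port A and one byte is read per cycle on port B, so the 64 output bytes come out over 64 cycles instead of all at once. The class `DualPortRam` and its method names are invented for illustration.

```python
class DualPortRam:
    """Toy model: port A writes one byte per cycle, port B reads one."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def write_a(self, addr, byte):
        self.mem[addr] = byte & 0xFF

    def read_b(self, addr):
        return self.mem[addr]


def sequential_mux(din, sel):
    """Load din through port A, then read the selected bytes on port B.

    Returns the output bytes and the cycle count, assuming one access
    per port per cycle and no overlap between loading and reading.
    """
    ram = DualPortRam(len(din))
    cycles = 0
    for addr, byte in enumerate(din):
        ram.write_a(addr, byte)
        cycles += 1
    dout = []
    for addr in sel:
        dout.append(ram.read_b(addr))
        cycles += 1
    return dout, cycles


if __name__ == "__main__":
    din = [i & 0xFF for i in range(240)]
    sel = [(11 * i) % 240 for i in range(64)]  # arbitrary selection pattern
    dout, cycles = sequential_mux(din, sel)
    print(len(dout), cycles)
```

The model trades the wide combinational mux for time: the same selection is produced, but spread over many clock cycles.
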