comp.arch.fpga | 16->5 "Sort"

I'm trying to design a circuit (Virtex-7) which you might call either a pri=
ority encoder or a sorter.  This is what it should do:

<i>Given a 16-bit vector with 5 bits set, create a list of 5 4-bit encoded =
values of each bit set.  These needn't be in order.</i>

This turns out to be a lot harder than I thought.  Writing the behavioral R=
TL isn't hard, but Vivado synthesizes it to 16 levels of logic, and when I =
draw out an optimized version, I still get at least 5 levels (using 6-input=
 LUTs).  I'd like to do it with minimal latency, but I can only do about 3 =
levels of logic at my clock speed.  I've tried thinking about how I can use=
 the carry chain muxes but they don't seem to be helpful.  I can do a leadi=
ng-ones detector and encode the leading 1, but I'd probably have to pipelin=
e each stage so that would take 5 cycles.

Reply by rickman ●May 11, 20152015-05-11

On 5/11/2015 9:27 PM, Kevin Neilson wrote:
> I'm trying to design a circuit (Virtex-7) which you might call either
> a priority encoder or a sorter.  This is what it should do:
>
> <i>Given a 16-bit vector with 5 bits set, create a list of 5 4-bit
> encoded values of each bit set.  These needn't be in order.</i>
>
> This turns out to be a lot harder than I thought.  Writing the
> behavioral RTL isn't hard, but Vivado synthesizes it to 16 levels of
> logic, and when I draw out an optimized version, I still get at least
> 5 levels (using 6-input LUTs).  I'd like to do it with minimal
> latency, but I can only do about 3 levels of logic at my clock speed.
> I've tried thinking about how I can use the carry chain muxes but
> they don't seem to be helpful.  I can do a leading-ones detector and
> encode the leading 1, but I'd probably have to pipeline each stage so
> that would take 5 cycles.

I'm not sure what you mean by "list".  Logic circuits don't normally use 
lists.  You want to know which five lines are set.   It is easy to 
encode that into four bit values.  But once you have the values what 
exactly do you want to do with them to provide them in a "list"?

If you want them  accessible on the same four data lines in sequence, 
what determines the timing of the sequence?  What exactly is your 
interface?

-- 

Rick

Reply by Kevin Neilson ●May 11, 20152015-05-11

The list is a 20-bit vector comprising the 5 4-bit values.  This is put into a 20-bit wide FIFO; each of the 5 values will be processed simultaneously when read from the FIFO.  So another way you could describe this is a 16-bit/5-hot -> 20-bit encoder.

Reply by Kevin Neilson ●May 11, 20152015-05-11

To be clearer, here's an example.

Input (bit 15 on left):

1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1 

20-bit Output List:
1111 1110 0010 0001 0000

Reply by rickman ●May 12, 20152015-05-12

On 5/11/2015 10:26 PM, Kevin Neilson wrote:
> To be clearer, here's an example.
>
> Input (bit 15 on left):
>
> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1
>
> 20-bit Output List:
> 1111 1110 0010 0001 0000

I am old school so I have to picture these things....

I see five priority encoders.  The first priority encoder output is used 
to disable that input to the second priority encoder, etc.

How did you code it?  I can see how this would be a lot more than three 
levels of logic.  The equations get quite long.  When you talk about 
levels of logic I assume you mean layers of LUTs?  I can see maybe each 
16 input priority encoder being no more than 3 levels of LUTs, but all 
five layers with the inhibit logic... I don't think so.  I expect the 
tools did a pretty good job of optimizing it and I don't easily see any 
way of using carry chains.

-- 

Rick

Reply by glen herrmannsfeldt ●May 12, 20152015-05-12

rickman <gnuarm@gmail.com> wrote:

(snip)
>> Input (bit 15 on left):

>> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1

>> 20-bit Output List:
>> 1111 1110 0010 0001 0000

> I am old school so I have to picture these things....

> I see five priority encoders.  The first priority encoder output is used 
> to disable that input to the second priority encoder, etc.

> How did you code it?  I can see how this would be a lot more than three 
> levels of logic.  The equations get quite long.  When you talk about 
> levels of logic I assume you mean layers of LUTs?  I can see maybe each 
> 16 input priority encoder being no more than 3 levels of LUTs, but all 
> five layers with the inhibit logic... I don't think so.  I expect the 
> tools did a pretty good job of optimizing it and I don't easily see any 
> way of using carry chains.

It is slightly easier than that. 

(Sometime ago I did this with 36 and 3.)

Consider the case of only two bits high. Use one priority encoder
the usual way, and one upside down. (That is, the highest and lowest
set bit.) Now, as you said, subtract off the highest and lowest, and
two more priority encoders, and finally subtract one more and the
last one. 

But first I would add the logic to determine that only 
(or at least) five were set.

I suspect that, as the OP noted, it is combinatorial hard.
Consider the logic needed to, separately, compute each bit
of the result. Even just the first one.

It does, however, pipeline very well.  In the one I was working on,
it had a fairly fast clock (66MHz) but I could stand some levels
of latency.   The OP claims the need for low latency.

-- glen

Reply by Rob Doyle ●May 12, 20152015-05-12

On 5/11/2015 7:26 PM, Kevin Neilson wrote:
> To be clearer, here's an example.
>
> Input (bit 15 on left):
>
> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1
>
> 20-bit Output List:
> 1111 1110 0010 0001 0000
>

64K x 20 ROM?

How badly do you want one level of logic?

Rob.

Reply by GaborSzakacs ●May 12, 20152015-05-12

Rob Doyle wrote:
> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
>> To be clearer, here's an example.
>>
>> Input (bit 15 on left):
>>
>> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1
>>
>> 20-bit Output List:
>> 1111 1110 0010 0001 0000
>>
> 
> 64K x 20 ROM?
> 
> How badly do you want one level of logic?
> 
> Rob.

Two ROMs, 256 by 23 and 256 by 20.  The first encodes the
lower 8 bits into 0 to 5 4-bit numbers right-justified plus
a 3-bit indication of how many bits were set.  The second
just encodes 0 to 5 4-bit numbers, left justified (you assume
it has the remaining 1's not covered by the low 8 bits.  Then
one more level to select how many bits to take from each ROM
based on the number of ones in the lower 8 bits (each output
bit is only a 2:1 mux in this case - that's why the ROMs
are justified right and left as noted).

-- 
Gabor

Reply by glen herrmannsfeldt ●May 12, 20152015-05-12

GaborSzakacs <gabor@alacron.com> wrote:
> Rob Doyle wrote:
>> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
>>> To be clearer, here's an example.

>>> Input (bit 15 on left):

>>> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1

>>> 20-bit Output List:
>>> 1111 1110 0010 0001 0000

>> 64K x 20 ROM?

Hmm, that probably does work. Especially with the BRAM in many FPGAs.

When I did it some years ago, it was three out of 36 bits, and ROMs
weren't, and still aren't, that big.

>> How badly do you want one level of logic?
 
> Two ROMs, 256 by 23 and 256 by 20.  The first encodes the
> lower 8 bits into 0 to 5 4-bit numbers right-justified plus
> a 3-bit indication of how many bits were set.  The second
> just encodes 0 to 5 4-bit numbers, left justified (you assume
> it has the remaining 1's not covered by the low 8 bits.  Then
> one more level to select how many bits to take from each ROM
> based on the number of ones in the lower 8 bits (each output
> bit is only a 2:1 mux in this case - that's why the ROMs
> are justified right and left as noted).

If not in BRAM, that seems a good way. 

How many levels of logic is a 256 bit ROM in FPGA LUTs?

-- glen

Reply by rickman ●May 12, 20152015-05-12

On 5/12/2015 11:54 AM, glen herrmannsfeldt wrote:
> GaborSzakacs <gabor@alacron.com> wrote:
>> Rob Doyle wrote:
>>> On 5/11/2015 7:26 PM, Kevin Neilson wrote:
>>>> To be clearer, here's an example.
>
>>>> Input (bit 15 on left):
>
>>>> 1 1 0 0  0 0 0 0  0 0 0 0  0 1 1 1
>
>>>> 20-bit Output List:
>>>> 1111 1110 0010 0001 0000
>
>>> 64K x 20 ROM?
>
> Hmm, that probably does work. Especially with the BRAM in many FPGAs.
>
> When I did it some years ago, it was three out of 36 bits, and ROMs
> weren't, and still aren't, that big.
>
>>> How badly do you want one level of logic?
>
>> Two ROMs, 256 by 23 and 256 by 20.  The first encodes the
>> lower 8 bits into 0 to 5 4-bit numbers right-justified plus
>> a 3-bit indication of how many bits were set.  The second
>> just encodes 0 to 5 4-bit numbers, left justified (you assume
>> it has the remaining 1's not covered by the low 8 bits.  Then
>> one more level to select how many bits to take from each ROM
>> based on the number of ones in the lower 8 bits (each output
>> bit is only a 2:1 mux in this case - that's why the ROMs
>> are justified right and left as noted).
>
> If not in BRAM, that seems a good way.
>
> How many levels of logic is a 256 bit ROM in FPGA LUTs?

Like most things, "it depends".  The older devices have 16 bits per LUT, 
if any, and many of the newer devices have 64 bits per LUT.

Nearly all devices have BRAMs so it's a no brainer by the split table 
method.  Only trouble is BRAMs use a clock cycle...  I'm just sayin'...

-- 

Rick

Previous12 3 Next

16->5 "Sort"

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Quick Links

About FPGARelated.com

Social Networks

The Related Media Group