Reply by Hal Murray December 3, 2005
>>It's an asynchronous signal going into your state machine.
>>All the classic things can go wrong. The complicated one is
>>metastability. The simple one is that it meets setup for
>>some parts of your FSM but not for others.
>
>Ah ok, I forgot that the signal can be used at some different places
>with different setup times.
>Thanks, I hope everything is clear now.

Even if it only goes to one place, you still have to consider
metastability.

--
The mail server is located in California. So are all my other mailboxes.
Please do not send unsolicited bulk e-mail or unsolicited commercial
e-mail to my address or any of my other addresses.
These are my opinions, not necessarily my employer's. I hate spam.
Reply by Michael Dreschmann November 30, 2005

>It's an asynchronous signal going into your state machine.
>All the classic things can go wrong. The complicated one is
>metastability. The simple one is that it meets setup for
>some parts of your FSM but not for others.

Ah ok, I forgot that the signal can be used in several places
with different setup times.
Thanks, I hope everything is clear now.

Michael
Reply by Hal Murray November 30, 2005
>If I connect the writing system to this async falling edge signal
>directly what would be the problem? An "again available" fifo would
>possibly be detected one clock earlier but this shouldn't be a problem,
>because the effect (falling edge of FULL) is still behind the cause
>(read pointer was increased). Where is my error?

It's an asynchronous signal going into your state machine.
All the classic things can go wrong. The complicated one is
metastability. The simple one is that it meets setup for
some parts of your FSM but not for others.
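
The usual remedy for both problems can be sketched as a two-stage synchroniser. This is a minimal sketch, not code from the thread: the entity name `flag_sync` and its port names are hypothetical. The asynchronous flag feeds exactly one flip-flop, so it cannot meet setup for some parts of the FSM and miss it for others, and the second stage gives a metastable first stage one clock period to settle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-stage synchroniser for a single asynchronous flag.
entity flag_sync is
  port (
    clk      : in  std_logic;  -- clock of the receiving state machine
    async_in : in  std_logic;  -- flag driven from the other clock domain
    sync_out : out std_logic   -- copy that is safe to fan out to the FSM
  );
end entity flag_sync;

architecture rtl of flag_sync is
  signal stage1 : std_logic := '0';
  signal stage2 : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stage1 <= async_in;  -- may go metastable; drives only stage2
      stage2 <= stage1;    -- sampled one clock period later
    end if;
  end process;

  sync_out <= stage2;
end architecture rtl;
```

The price is latency: the FSM sees a change on `async_in` one to two clock periods late, which is harmless for a flag that is only late in the safe direction.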
Reply by Peter Alfke November 29, 2005
FULL is a control signal for the writing side (and EMPTY is a control
signal for the reading side).
As I said, the leading edges are naturally derived from the appropriate
clock, and are thus synchronous.
The falling edges are caused by the "wrong" clock, and can thus have any
weird phase relationship with the important clock.
You do not want the FULL flag to go away asynchronously, since
that might "confuse" the write logic about whether it can or cannot
write at this moment.
And the trailing edge of EMPTY should clearly communicate with the read
logic, in an unambiguous way.  These flags must be interpreted
correctly many millions of times; any ambiguity will bite you
sooner or later, usually in the worst way (Murphy's Law).
Peter Alfke

Reply by Michael Dreschmann November 29, 2005
Hello Peter,

I found your appnote about async fifos:
Very interesting and helpful, but one thing is not quite clear to me.
You write:
"FULL is of interest only to the write logic, where it must stop
further writes. FULL goes active as a result of the last write
operation, which makes the rising edge of FULL a synchronous signal,
[...] Therefore, we only need to synchronize the falling edge of
Then you describe how to make the falling edge synchronous to the
write clock. But why is this necessary?
If I connect the writing system to this async falling edge signal
directly, what would be the problem? An "again available" fifo would
possibly be detected one clock earlier, but this shouldn't be a problem,
because the effect (falling edge of FULL) is still behind the cause
(read pointer was increased). Where is my error?

Reply by Peter Alfke November 28, 2005
Some words of wisdom from an old FIFO designer:

In a FIFO, you can use any addressing scheme you want, binary, LFSR,
Gray, or whatever, as long as the write logic agrees with the read

If you want to compare two asynchronous values for identity, make sure
they advance in a Gray fashion (i.e. only one bit changes). Otherwise
you will see glitches at the output of your identity comparator.
If you Gray-code an incrementing or decrementing binary counter, only
one bit will change. But if you use that encoding for anything else
(e.g. for "jumpy" binary values), the output will most likely change
several bits per transition, and the Gray advantage is not there.

The best way to convert from binary to Gray is to XOR the D-inputs (not
the outputs) of adjacent binary counter bits, and register the XOR
output. That keeps the two representations always in synch.
And, as we know, "Gray" is the inventor's name, and is spelled with an
"a". (I misspelled it once, and never again!)
Peter Alfke
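
The conversion described above can be sketched as follows. This is a minimal sketch under assumptions: the entity name `gray_counter`, its ports and the `WIDTH` generic are hypothetical, and the counter wraps at a power of two. The variable `bin_next` holds the value on the D-inputs of the binary counter; XORing its adjacent bits and registering the result keeps the Gray register in step with the binary one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical binary counter with a registered Gray-coded copy.
entity gray_counter is
  generic (WIDTH : positive := 6);
  port (
    clk   : in  std_logic;
    reset : in  std_logic;  -- synchronous, active high
    inc   : in  std_logic;  -- advance by one when '1'
    bin   : out std_logic_vector(WIDTH - 1 downto 0);
    gray  : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity gray_counter;

architecture rtl of gray_counter is
  signal bin_reg  : unsigned(WIDTH - 1 downto 0) := (others => '0');
  signal gray_reg : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (clk)
    variable bin_next : unsigned(WIDTH - 1 downto 0);
  begin
    if rising_edge(clk) then
      -- bin_next is the value presented to the D-inputs of bin_reg
      if reset = '1' then
        bin_next := (others => '0');
      elsif inc = '1' then
        bin_next := bin_reg + 1;
      else
        bin_next := bin_reg;
      end if;
      bin_reg  <= bin_next;
      -- XOR adjacent D-input bits and register the result
      gray_reg <= std_logic_vector(bin_next xor shift_right(bin_next, 1));
    end if;
  end process;

  bin  <= std_logic_vector(bin_reg);
  gray <= gray_reg;
end architecture rtl;
```

The one-bit-per-step property holds here only because the counter steps by one and wraps at 2**WIDTH. A counter that wraps elsewhere, for example from 55 back to 0, changes several Gray bits at the wrap unless the sequence is adapted.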

Reply by C. G. November 28, 2005

I would synchronise the Gray output of each domain first, using two
register stages in series, and apply the Gray-to-binary conversion AFTER
the synchronisation. This gives glitch safety good enough for
any industrial environment. If the synchroniser does catch a transition,
the uncertainty is confined to the single bit that is changing, so the
receiving clock domain sees either the old or the new count. (If you
synchronise the converted binary value, the receiving clock domain could
observe multiple simultaneous changes.)

Something like the VHDL pseudo-code below:

signal count_clka..
signal count_clkb..
signal counta_sync1_clkb, counta_sync2_clkb..
signal countb_sync1_clka, countb_sync2_clka..

     -- synchronising clka to the clkb domain
process (clkb, rst_n)
   variable v_counta_sync_binary...
   variable v_countb_clkb_binary...
begin
   if (clkb'event and (clkb = '1')) then
         -- convert the local counter
     v_countb_clkb_binary := f_gray_to_bin(count_clkb);
         -- convert the synchronised remote counter
     v_counta_sync_binary := f_gray_to_bin(counta_sync2_clkb);

       -- do the comparison stuff between v_countb_clkb_binary and
       -- v_counta_sync_binary
       -- e.g. setting up the new values for full/empty etc.
     . . . .

       -- update the synchronising stages
     counta_sync1_clkb <= count_clka;
     counta_sync2_clkb <= counta_sync1_clkb;
     . . . .
   end if;
end process;

-- and of course a similar process for the other direction
Reply by Michael Dreschmann November 28, 2005

Thanks for your code. I think I've found a solution:
The read and write pointers will be implemented in Gray code. Then
I'll decode them to binary and multiply by 18. The multiplication
reduces to a simple addition, so it should not cost many resources
(*18 = *16 + *2).
But I have one last question:
If I compare the two Gray-coded pointers, can any glitch still appear
on the full or empty signals? Or do I have to consider something else?
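
The *18 = *16 + *2 step can be sketched as two shifts and one adder. This is a minimal sketch under assumptions: the package name `block_addr_pkg`, the function name `block_base` and the 6-bit block index (enough for the 56 blocks, 0 to 55, implied by the original code) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package block_addr_pkg is
  -- Returns idx * 18 as an 11-bit value (63 * 18 = 1134 needs 11 bits).
  function block_base(idx : unsigned(5 downto 0)) return unsigned;
end package block_addr_pkg;

package body block_addr_pkg is
  function block_base(idx : unsigned(5 downto 0)) return unsigned is
    variable wide : unsigned(10 downto 0);
  begin
    wide := resize(idx, 11);  -- zero-extend before shifting
    -- idx * 18 = idx * 16 + idx * 2
    return shift_left(wide, 4) + shift_left(wide, 1);
  end function block_base;
end package body block_addr_pkg;
```

With a block index of at most 55 the result never exceeds 990, so the top bit is always zero and the lower 10 bits can be used directly as the block base address.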

Reply by Charles, NG November 28, 2005

1) A Gray-to-binary function that I picked up on the web years ago is

     function f_gray_to_bin(vval : std_logic_vector) return std_logic_vector is
         -- both work vectors use a (length - 1 downto 0) range so that
         -- the indexing below works for any range of vval
         variable v_accumulate   : std_logic_vector(vval'length - 1 downto 0);
         variable v_par1         : std_logic_vector(vval'length - 1 downto 0);
     begin
         v_par1 := vval;
         -- the MSB is the same in Gray and binary
         v_accumulate(v_par1'left) := v_par1(v_par1'left);
         -- each lower binary bit = Gray bit xor next higher binary bit
         for i in v_par1'left - 1 downto 0 loop
             v_accumulate(i) := v_par1(i) xor v_accumulate(i + 1);
         end loop;
         return v_accumulate;
     end f_gray_to_bin;

I can't find the document I derived it from on my laptop, I still should 
have it on a CD somewhere. If I get a chance to look I'll send it to you.
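
For completeness, the opposite direction is simpler, because each Gray bit depends only on two adjacent binary bits. The following is a minimal sketch; the package name `gray_pkg` and the function name `f_bin_to_gray` are hypothetical and simply follow the naming style of the function above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package gray_pkg is
  function f_bin_to_gray(vval : std_logic_vector) return std_logic_vector;
end package gray_pkg;

package body gray_pkg is
  function f_bin_to_gray(vval : std_logic_vector) return std_logic_vector is
    variable v_bin  : std_logic_vector(vval'length - 1 downto 0);
    variable v_gray : std_logic_vector(vval'length - 1 downto 0);
  begin
    v_bin := vval;
    -- the MSB is the same in binary and Gray
    v_gray(v_bin'left) := v_bin(v_bin'left);
    -- each lower Gray bit = binary bit xor next higher binary bit
    for i in v_bin'left - 1 downto 0 loop
      v_gray(i) := v_bin(i) xor v_bin(i + 1);
    end loop;
    return v_gray;
  end function f_bin_to_gray;
end package body gray_pkg;
```

Unlike the Gray-to-binary direction there is no ripple through all bits here: every output bit is a single two-input XOR.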

2) Seems a difficult one at first glance.
Would it be feasible
a) just to store the base addresses in the FIFO (since the packet or
whatever always seems to be 18 bytes), or
b) to use semaphores to indicate whether memory blocks are empty/valid/read?
Reply by Michael Dreschmann November 27, 2005

I'm designing an "On Chip Network" system consisting of one network
master and several network interfaces. Every interface is connected to
a processor (picoblaze) which can transmit data onto and receive data
from the network.
Now I'd like to have two completely independent clocks on the network
side and on the processor bus side. In the network interface I use two
fifos (rx and tx) implemented in a single bram to exchange data between
the network and processor sides. So I think an asynchronous fifo should
solve the problem.
But I'm not sure exactly how to implement such a fifo. I found several
sites recommending Gray-coded read and write pointers, but I couldn't
find information on how to implement a Gray-to-binary converter to
address the block ram, or on how to increment the Gray-coded pointers.

Also my fifo is a little bit special:

I've implemented two fifos in a single bram, but I don't think that is relevant
here because each fifo has its own read and write pointers. The final
bram address is calculated for each side (network/processor) from the read
or write pointer, depending on which fifo is addressed (i.e. whether the side is reading or writing).

The second point is that the read and write pointers point to a block base
address. Each block is 18 bytes long. Each side can randomly access the
18 bytes in the active block via an address input:
ram_address = read/writepointer * 18 + local_address
(local_address < 18)
Because I want to avoid the *18 multiplication, I increase the read/write
pointers by 18 with every block_release or block_store command. The release
command is synchronous to the reading side, the store command is synchronous to
the writing side. Both are active for one clock period.

The fifo empty and full signals are generated by comparing the write and read
pointers:
empty: write = read
full: (write+1) = read
I know that I waste one fifo entry in this way, but that is acceptable.

My questions are:

1. How do I implement Gray-coded pointers, and how do I convert them to a
binary-coded pointer that can act as the base for the local_address addition?

2. Is there any way to add 18 instead of 1 to a Gray-coded pointer without
losing its "Hamming distance 1" property, so that I can avoid the *18 multiplication?

I've added the address logic from my fifo here; perhaps it helps.



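function inc_block(arg: std_logic_vector) return std_logic_vector is
begin
	if arg < "1111011110" then -- below the last block base (990 = 55 * 18): advance by one block of 18 bytes
		return arg + "0000010010";
	else -- last block reached: wrap to block 0
		return "0000000000";
	end if;
end inc_block;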

-- calculate fifo a/b, in/out addresses:
addr_ram_in_a <= block_base_addr_in_a + addr_in_a;
addr_ram_out_a <= block_base_addr_out_a + addr_out_a;
addr_ram_in_b <= block_base_addr_in_b + addr_in_b;
addr_ram_out_b <= block_base_addr_out_b + addr_out_b;

-- use "wr" to generate absolute port addresses:
addr_pa(10) <= wr_b;
addr_pa(9 downto 0) <= addr_ram_in_b when (wr_b = '1') else addr_ram_out_a;
addr_pb(10) <= not wr_a;
addr_pb(9 downto 0) <= addr_ram_in_a when (wr_a = '1') else addr_ram_out_b;

-- fifo states
buffer_empty_a_int <= '1' when block_base_addr_in_a = block_base_addr_out_a else '0';
buffer_full_a_int <= '1' when inc_block(block_base_addr_in_a) = block_base_addr_out_a else '0';
buffer_empty_b_int <= '1' when block_base_addr_in_b = block_base_addr_out_b else '0';
buffer_full_b_int <= '1' when inc_block(block_base_addr_in_b) = block_base_addr_out_b else '0';

-- fifo counters and controllogic
process (clk)
begin
	if (clk'event and clk = '1') then
		if reset = '1' then
			block_base_addr_in_a <= "0000000000";
			block_base_addr_out_a <= "0000000000";
			block_base_addr_in_b <= "0000000000";
			block_base_addr_out_b <= "0000000000";
		else
			-- fifo a control
			if (clear_buffer_a = '1') then						-- reset writepointer and readpointer if clear_buffer is active
				block_base_addr_in_a <= "0000000000";
				block_base_addr_out_a <= "0000000000";
			else
				if (save_block_a = '1') and (buffer_full_a_int = '0') then		-- increase writepointer if save_block is active and buffer is not full
					block_base_addr_in_a <= inc_block(block_base_addr_in_a);
				end if;
				if (release_block_a = '1') and (buffer_empty_a_int = '0') then	-- increase readpointer if release_block is active and buffer is not empty
					block_base_addr_out_a <= inc_block(block_base_addr_out_a);
				end if;
			end if;
			-- fifo b control
			if (clear_buffer_b = '1') then						-- reset writepointer and readpointer if clear_buffer is active
				block_base_addr_in_b <= "0000000000";
				block_base_addr_out_b <= "0000000000";
			else
				if (save_block_b = '1') and (buffer_full_b_int = '0') then		-- increase writepointer if save_block is active and buffer is not full
					block_base_addr_in_b <= inc_block(block_base_addr_in_b);
				end if;
				if (release_block_b = '1') and (buffer_empty_b_int = '0') then	-- increase readpointer if release_block is active and buffer is not empty
					block_base_addr_out_b <= inc_block(block_base_addr_out_b);
				end if;
			end if;
		end if;	-- reset state
	end if;	-- clk'event
end process;