FPGARelated.com

Computing Fixed-Point Square Roots and Their Reciprocals Using Goldschmidt Algorithm

Michael Morris | June 14, 2020 | 10 comments

Michael Morris presents a practical, FPGA-friendly fixed-point implementation of the Goldschmidt algorithm to compute sqrt and 1/sqrt. The post shows how an msb-indexed Y_est table and an N_adj scaling factor produce a reliable initial inverse-square-root estimate for an FP32B16 format, enabling five-iteration convergence. It also covers fixed-point normalization, multiplier/shift tradeoffs, and why this fits a real-time motion-controller use case.
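To make the iteration concrete, here is a minimal Python sketch of a Goldschmidt square-root loop on integers. It is not the post's implementation: the Y_est table and N_adj factor are not reproduced. Instead, the sketch assumes a Q16.16 reading of the FP32B16 format, normalizes the input to [0.5, 2), and starts from the crude estimate y0 = 1. The names `FRAC`, `WORK`, `shift`, and `goldschmidt_sqrt` are illustrative only.

```python
FRAC = 16   # fractional bits of the external format (assumed Q16.16)
WORK = 30   # fractional bits used internally (an assumption, for guard bits)
HALF = 1 << (WORK - 1)


def shift(value, amount):
    """Shift left for positive amounts, right for negative ones."""
    return value << amount if amount >= 0 else value >> -amount


def goldschmidt_sqrt(s_fx, iterations=5):
    """Return (sqrt(S), 1/sqrt(S)) as Q16.16 integers for a Q16.16 input."""
    if s_fx <= 0:
        raise ValueError("input must be positive")
    # Normalize S = m * 4**k so that m lies in [0.5, 2).
    e = s_fx.bit_length() - 1 - FRAC      # floor(log2(S))
    k = (e + 1) // 2
    m = shift(s_fx, WORK - FRAC - 2 * k)  # m with WORK fractional bits
    # With m near 1, y0 = 1 is a usable (if crude) estimate of 1/sqrt(m).
    x = m          # x0 = m * y0, converges to sqrt(m)
    h = HALF       # h0 = y0 / 2, converges to 1 / (2 * sqrt(m))
    for _ in range(iterations):
        r = HALF - ((x * h) >> WORK)      # residual 0.5 - x*h
        x += (x * r) >> WORK
        h += (h * r) >> WORK
    # Undo the normalization: sqrt(S) = sqrt(m) * 2**k, 1/sqrt(S) = 2h * 2**-k.
    root = shift(x, k - (WORK - FRAC))
    inv_root = shift(h, 1 - k - (WORK - FRAC))
    return root, inv_root
```

For example, `goldschmidt_sqrt(4 << 16)` returns `(131072, 32768)`, which is 2.0 and 0.5 in Q16.16. A better initial estimate, such as the table lookup the post describes, would reach a given accuracy from a starting point closer to the answer than y0 = 1.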


Use DPLL to Lock Digital Oscillator to 1PPS Signal

Michael Morris | July 24, 2016 | 8 comments

Michael Morris demonstrates a practical DPLL that locks a Direct Digital Synthesizer to a GPS 1PPS signal, achieving sub-microsecond alignment and removing reference-oscillator frequency error. The design uses a Phase-Frequency Detector for 0 degree phase lock, a multiplier-free α-filter, and a limiter to prevent saturation, and includes coast and re-lock logic plus a synthesizable Verilog reference core.
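As a rough illustration of the multiplier-free idea, the Python sketch below models a first-order α-filter whose coefficient is a power of two, so the multiply becomes a right shift, with a clamp in front of it. The function names, the shift of 3, the limit values, and the placement of the limiter ahead of the filter are all assumptions for illustration; the post's Verilog core may be organized differently.

```python
def limit(value, lo, hi):
    """Clamp value to [lo, hi] so downstream accumulators cannot saturate."""
    return max(lo, min(hi, value))


def alpha_filter_step(state, sample, shift):
    """One update of y += alpha * (x - y) with alpha = 2**-shift.

    The multiply is replaced by an arithmetic right shift.
    """
    return state + ((sample - state) >> shift)


def run_loop_filter(phase_errors, shift=3, lo=-2048, hi=2047):
    """Limit each phase error, then smooth it; returns the filter outputs."""
    state = 0
    outputs = []
    for err in phase_errors:
        state = alpha_filter_step(state, limit(err, lo, hi), shift)
        outputs.append(state)
    return outputs
```

Because the shift truncates, the output of this sketch settles slightly below a constant positive input (within `2**shift` counts of it), a detail a real design would need to account for.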


Inside the Spartan-6: Using LUTs to optimize circuits

Victor Yurkovsky | June 24, 2015 | 3 comments

Victor Yurkovsky hit poor synthesis packing while building a J1 CPU on Spartan-6 and traced the problem to an 18-bit logic ALU that mapped to many slices. He demonstrates a practical fix: instantiate LUT6 primitives with carefully chosen INIT values, then use RLOC placement to stack the per-bit LUTs and collapse the design down to five slices. This is a hands-on guide to Xilinx-specific optimization when synthesis falls short.
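Choosing INIT values by hand is error-prone, so a small helper can derive them from a truth function. The Python sketch below relies on the LUT primitive convention that bit n of INIT is the output when the inputs I5..I0, read as a binary number, equal n. The `alu_bit` function is a hypothetical one-bit logic slice invented for illustration, not the ALU from the post.

```python
def lut6_init(func):
    """Return the 64-bit INIT value for a LUT6 implementing func(i0..i5).

    Bit n of INIT is the LUT output when the inputs, read as the binary
    number I5 I4 I3 I2 I1 I0, equal n.
    """
    init = 0
    for addr in range(64):
        bits = [(addr >> i) & 1 for i in range(6)]   # bits[0] is I0
        if func(*bits):
            init |= 1 << addr
    return init


# Hypothetical one-bit logic slice: I0 and I1 are the operands,
# I2 and I3 select the operation, I4 and I5 are unused.
def alu_bit(a, b, s0, s1, _i4, _i5):
    op = (s1 << 1) | s0
    return (a & b, a | b, a ^ b, 1 - a)[op]


if __name__ == "__main__":
    # Print the value in Verilog literal form for pasting into an instantiation.
    print(f"64'h{lut6_init(alu_bit):016X}")
```

As a sanity check, a six-input AND yields `64'h8000000000000000`, and an XOR of I0 and I1 that ignores the other inputs yields `64'h6666666666666666`.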


Makefiles for Xilinx Tools

Victor Yurkovsky | May 12, 2015 | 5 comments

Building a bitstream from HDL is messy, and Victor Yurkovsky lays out a minimal, practical makefile workflow for Xilinx ISE and XST. He shows a simple project layout, techniques to tame ISE's generated logs and temps, and a ready-to-clone repo; an LED blinker example builds to bitstream in under 20 seconds on his machine. Use it as a pragmatic starting point for command-line FPGA builds.


Fit Sixteen (or more) Asynchronous Serial Receivers into the Area of a Standard UART Receiver

Michael Morris | March 29, 2015 | 5 comments

Michael Morris shows how to pack many asynchronous serial receivers into the area of a single UART by treating FPGA LUTs as writable storage and sharing logic. Using a 4-bit channel counter, a microprogrammed state machine, and time-multiplexed baud/sample resources, he fits 16 receive channels (12 used for Caller ID) into a Spartan-II XC2S30 and explains input synchronization, filtering, and the multi-channel FIFO approach.


I don’t often convert VHDL to Verilog but when I do ...

Christopher Felton | December 24, 2014 | 2 comments

Converting VHDL to Verilog is tedious, and Christopher Felton lays out a pragmatic, repeatable workflow using vhd2vl to do most of the heavy lifting. He walks through the iterate-run-comment-fix cycle, highlights frequent failure points like arrays, records and packages, and explains why many open-source projects favor Verilog for better FOSS simulator support.


MyHDL Interface Example

Christopher Felton | January 18, 2014 | 2 comments

Christopher Felton shows how MyHDL 0.9 interfaces bundle Signals into a single bus object to cut connector clutter and simplify module connections. The post walks through a pedagogical example where button presses drive a memory-mapped BareBoneBus read-modify-write that inverts LEDs, with a TDD-style testbench and notes on converting to Verilog/VHDL and loading the example on supported boards.


BGA and QFP at Home 1 - A Practical Guide.

Victor Yurkovsky | October 13, 2013 | 4 comments

It's a myth that BGAs and fine-pitch QFPs can't be soldered at home. Victor Yurkovsky lays out a practical, no-frills approach for hobbyists to design and assemble FPGA boards using 2-layer PCBs, breakout modules, and low-cost reflow methods like toaster ovens or hotplates. The article focuses on manufacturable PCB choices, netlist-driven workflows, and power/decoupling tactics that make high-density parts approachable for amateurs.


Shared-multiplier polyphase FIR filter

Markus Nentwig | July 31, 2013 | 7 comments

One multiplier and a dual-port RAM can implement an arbitrary m/n polyphase FIR resampler on an FPGA, Markus Nentwig demonstrates. The post focuses on practical implementation details, including a parametrized Verilog design, pipelined MAC control, and a Matlab testbench for verification. It shows how bank indexing and pipeline delay compensation let you multiplex many coefficient banks efficiently for resource-constrained FPGA designs.
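The bank-indexing idea can be modeled in a few lines of Python. The sketch below is a behavioral model only, not the post's Verilog: it computes each output sample with one multiply-accumulate at a time, selecting a coefficient bank from the output's position on the upsampled grid, and compares against a textbook zero-stuff, filter, decimate reference. The function names and the `up`/`down` parameters (standing in for m and n) are illustrative, and pipeline delay compensation is not modeled.

```python
def resample_polyphase(x, h, up, down):
    """Resample x by up/down using FIR taps h, one multiply-accumulate at a time.

    Coefficient bank `phase` holds taps h[phase], h[phase + up], ...
    """
    n_out = (len(x) * up + down - 1) // down
    y = []
    for j in range(n_out):
        t = j * down                 # position on the upsampled time grid
        phase, base = t % up, t // up
        acc = 0
        for k, coeff in enumerate(h[phase::up]):   # walk one coefficient bank
            idx = base - k
            if 0 <= idx < len(x):
                acc += coeff * x[idx]              # the single shared MAC
        y.append(acc)
    return y


def resample_reference(x, h, up, down):
    """Textbook version: zero-stuff, filter, then decimate."""
    stuffed = [0] * (len(x) * up)
    stuffed[::up] = x
    filtered = [
        sum(h[i] * stuffed[t - i] for i in range(len(h)) if 0 <= t - i < len(stuffed))
        for t in range(len(stuffed))
    ]
    return filtered[::down]
```

The polyphase version skips every multiplication by a stuffed zero, which is what lets a single multiplier keep up when it is shared across all banks.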


VGA Output in 7 Slices. Really.

Victor Yurkovsky | September 25, 2012 | 2 comments

Victor Yurkovsky shows how to generate VGA timing on a Xilinx Spartan3 using clever SRL16 tricks to squeeze the generator into just a few slices. By using 32-bit SRLs for line pulses, two mutually prime SRL lengths as a divide-by-99 timebase, and tapped SRLs to combine HSYNC and HBLANK, the approach achieves accurate-enough horizontal and vertical timing with minimal LUT usage.
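The divide-by-99 trick can be checked with a short behavioral model. The Python sketch below assumes the two mutually prime lengths are 9 and 11 (the only nontrivial coprime pair whose product is 99) and models each SRL as a recirculating shift register holding a single 1; the outputs coincide once every 99 clocks. The names `ring` and `coincidence_ticks` are illustrative and do not come from the post.

```python
def ring(length):
    """Model a recirculating shift register holding a single 1 bit."""
    state = [1] + [0] * (length - 1)
    while True:
        yield state[-1]                    # tap at the last stage
        state = [state[-1]] + state[:-1]   # rotate by one position per clock


def coincidence_ticks(len_a, len_b, clocks):
    """Return the clock indices where both ring outputs are 1."""
    a, b = ring(len_a), ring(len_b)
    return [n for n in range(clocks) if next(a) & next(b)]


if __name__ == "__main__":
    print(coincidence_ticks(9, 11, 300))   # → [98, 197, 296]
```

Two short registers thus stand in for a counter that would otherwise need seven flip-flops plus decode logic, which is why the approach is attractive when slices are scarce.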


An Editor for HDLs

Dave Vandenbout | July 17, 2012 | 11 comments

If you prefer Notepad++ over Emacs, Dave Vandenbout shows how to turn it into a capable HDL editor using templates, a Perl package generator, and Emacs run in batch mode for beautification. He covers FingerText snippets for VHDL skeletons, binding a Perl script to auto-create/update package component declarations, and invoking Emacs from a hotkey to format files with one keystroke.


Developing FPGA-DSP IP with Python

Christopher Felton | March 16, 2010 | 1 comment

Designing FPGA-DSP IP entirely in Python is practical and productive, as Christopher Felton demonstrates using MyHDL. He shows how numpy and scipy handle the signal design while a SIIR class generates RTL, enables side-by-side floating-point and HDL simulation, and converts to Verilog for synthesis. The post includes Xilinx XC3S500E resource results and a link to the SIIR source on BitBucket, making it easy to try the workflow.

