Computing Fixed-Point Square Roots and Their Reciprocals Using Goldschmidt Algorithm
Michael Morris presents a practical, FPGA-friendly fixed-point implementation of the Goldschmidt algorithm to compute sqrt and 1/sqrt. The post shows how an msb-indexed Y_est table and an N_adj scaling factor produce a reliable initial inverse-square-root estimate for an FP32B16 format, enabling five-iteration convergence. It also covers fixed-point normalization, multiplier/shift tradeoffs, and why this fits a real-time motion-controller use case.
Use DPLL to Lock Digital Oscillator to 1PPS Signal
Michael Morris demonstrates a practical DPLL that locks a Direct Digital Synthesizer to a GPS 1PPS signal, achieving sub-microsecond alignment and removing reference-oscillator frequency error. The design uses a Phase-Frequency Detector for 0-degree phase lock, a multiplier-free α-filter, and a limiter to prevent saturation. It also includes coast and re-lock logic, plus a synthesizable Verilog reference core.
Inside the Spartan-6: Using LUTs to optimize circuits
Victor Yurkovsky hit poor synthesis packing while building a J1 CPU on Spartan-6 and traced the problem to an 18-bit logic ALU that mapped to many slices. He demonstrates a practical fix: instantiate LUT6 primitives with carefully chosen INIT values, then use RLOC placement to stack the per-bit LUTs and collapse the design down to five slices. This is a hands-on guide to Xilinx-specific optimization when synthesis falls short.
Makefiles for Xilinx Tools
Building a bitstream from HDL is messy, and Victor Yurkovsky lays out a minimal, practical makefile workflow for Xilinx ISE and XST. He shows a simple project layout, techniques to tame ISE's generated logs and temps, and a ready-to-clone repo; an LED blinker example builds to bitstream in under 20 seconds on his machine. Use it as a pragmatic starting point for command-line FPGA builds.
Fit Sixteen (or more) Asynchronous Serial Receivers into the Area of a Standard UART Receiver
Michael Morris shows how to pack many asynchronous serial receivers into the area of a single UART by treating FPGA LUTs as writable storage and sharing logic. Using a 4-bit channel counter, microprogrammed state machine, and time-multiplexed baud/sample resources, he fits 16 receive channels (12 used for Caller ID) into a Spartan II XC2S30 and explains input synchronization, filtering, and the multi-channel FIFO approach.
I don’t often convert VHDL to Verilog but when I do ...
Converting VHDL to Verilog is tedious, and Christopher Felton lays out a pragmatic, repeatable workflow using vhd2vl to do most of the heavy lifting. He walks through the iterate-run-comment-fix cycle, highlights frequent failure points like arrays, records and packages, and explains why many open-source projects favor Verilog for better FOSS simulator support.
MyHDL Interface Example
Christopher Felton shows how MyHDL 0.9 interfaces bundle Signals into a single bus object to cut connector clutter and simplify module connections. The post walks through a pedagogical example where button presses drive a memory-mapped BareBoneBus read-modify-write that inverts LEDs, with a TDD-style testbench and notes on converting to Verilog/VHDL and loading the example on supported boards.
BGA and QFP at Home 1 - A Practical Guide.
It's a myth that BGAs and fine-pitch QFPs can't be soldered at home. Victor Yurkovsky lays out a practical, no-frills approach for hobbyists to design and assemble FPGA boards using 2-layer PCBs, breakout modules, and low-cost reflow methods like toaster ovens or hotplates. The article focuses on manufacturable PCB choices, netlist-driven workflows, and power/decoupling tactics that make high-density parts approachable for amateurs.
Shared-multiplier polyphase FIR filter
One multiplier and a dual-port RAM can implement an arbitrary m/n polyphase FIR resampler on an FPGA, Markus Nentwig demonstrates. The post focuses on practical implementation details, including a parametrized Verilog design, pipelined MAC control, and a Matlab testbench for verification. It shows how bank indexing and pipeline delay compensation let you multiplex many coefficient banks efficiently for resource-constrained FPGA designs.
VGA Output in 7 Slices. Really.
Victor Yurkovsky shows how to generate VGA timing on a Xilinx Spartan-3 using clever SRL16 tricks to squeeze the generator into just a few slices. By using 32-bit SRLs for line pulses, two mutually prime SRL lengths as a divide-by-99 timebase, and tapped SRLs to combine HSYNC and HBLANK, the approach achieves accurate-enough horizontal and vertical timing with minimal LUT usage.
An Editor for HDLs
If you prefer Notepad++ over Emacs, Dave Vandenbout shows how to turn it into a capable HDL editor using templates, a Perl package generator, and Emacs run in batch mode for beautification. He covers FingerText snippets for VHDL skeletons, binding a Perl script to auto-create/update package component declarations, and invoking Emacs from a hotkey to format files with one keystroke.
Developing FPGA-DSP IP with Python
Designing FPGA-DSP IP entirely in Python is practical and productive, as Christopher Felton demonstrates using MyHDL. He shows how numpy and scipy handle the signal design while a SIIR class generates RTL, enables side-by-side floating-point and HDL simulation, and converts to Verilog for synthesis. The post includes Xilinx XC3S500E resource results and a link to the SIIR source on BitBucket, making it easy to try the workflow.