FPGARelated.com
Forums

Optimizations, How Much and When?

Started by Rick C January 4, 2020
On Sunday, January 5, 2020 at 6:33:05 PM UTC, Rick C wrote:
> Interesting effort. I'm surprised the result is so small. I'm also surprised the Cyclone V result is smaller than the Artix 7 result. Any idea why the register count varies? Usually the register count is fixed by the code. Did the tools use register splitting for speed?
A state machine is a significant part of the design, so I expect the main reason for the differences in register count is how the implementation tools decide to encode the state machine. The number of LUTs and registers also varies with the tool optimisation settings. The register and LUT counts on the s430 page are for default tool settings.
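To illustrate why the state encoding alone can move the register count, here is a rough Python sketch. The 12-state figure is a made-up example, not the actual s430 state count, and `state_register_count` is a hypothetical helper. It only counts the state flip-flops; real tools may also replicate or retime registers.

```python
def state_register_count(num_states: int, encoding: str) -> int:
    """Number of flip-flops needed to hold the FSM state for a given encoding."""
    if num_states < 1:
        raise ValueError("num_states must be at least 1")
    if encoding in ("binary", "gray"):
        # Dense encodings need ceil(log2(N)) bits, with a minimum of one bit.
        return max(1, (num_states - 1).bit_length())
    if encoding == "one-hot":
        # One flip-flop per state.
        return num_states
    raise ValueError(f"unknown encoding: {encoding}")


if __name__ == "__main__":
    # Hypothetical 12-state machine, chosen only for illustration.
    for enc in ("binary", "gray", "one-hot"):
        print(enc, state_register_count(12, enc))
```

So the same source code can differ by several registers depending only on whether a tool defaults to a dense or a one-hot encoding.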
> Does it take a lot of cycles to run code?
Yes it does; see the cycle counts for a selection of instructions on the s430 page. The design is not pipelined, which saves logic of course but hurts performance. When I designed it I very roughly aimed at using less than 50% of the resources in the ~1200 LUT/FF MachXO3 devices, for low-power processing tasks with gcc.

Part of the reason it's small is that it doesn't have a 16-bit ALU, but then two clocks are required instead of one for an ALU operation. I seem to recall reading somewhere that some of the earlier Z80s had 4-bit ALUs rather than 8-bit ones to save space, so I think that's been done for a while now :).

Interestingly, on GitHub there is a NEO430 project that someone else has designed, and the end results there are not too dissimilar to what I've got, noting the optimisations I used.
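For anyone wondering how a narrower ALU handles a 16-bit operation in two clocks, here is a minimal Python sketch of the idea. It assumes an 8-bit ALU (the width is my assumption for illustration) and models only addition; the function names are hypothetical and this is not the s430 source.

```python
from typing import Tuple


def alu8_add(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """One pass through an 8-bit adder: returns (8-bit sum, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16_two_pass(x: int, y: int) -> Tuple[int, int]:
    """16-bit add done as two 8-bit passes, one per clock in hardware."""
    # Clock 1: low bytes, no carry in.
    lo, carry = alu8_add(x & 0xFF, y & 0xFF, 0)
    # Clock 2: high bytes, using the carry saved from clock 1.
    hi, carry_out = alu8_add((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)
    return (hi << 8) | lo, carry_out


if __name__ == "__main__":
    result, carry = add16_two_pass(0x00FF, 0x0001)
    print(hex(result), carry)
```

The trade-off is the one described above: the adder and its muxing are roughly half the width, at the cost of a second cycle and a register to hold the intermediate carry.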
On Monday, January 6, 2020 at 1:34:47 PM UTC-5, pau...@googlemail.com wrote:
> On Sunday, January 5, 2020 at 6:33:05 PM UTC, Rick C wrote:
> > Interesting effort. I'm surprised the result is so small. I'm also surprised the Cyclone V result is smaller than the Artix 7 result. Any idea why the register count varies? Usually the register count is fixed by the code. Did the tools use register splitting for speed?
>
> A state machine is a significant part of the design, so I expect that to be the main reason for differences in register count, depending on how the implementation tools decide to encode the state machine. The number of LUTs and registers also vary depending on tool optimisation settings. The register and LUT counts on the s430 page are for default tool settings.
>
> > Does it take a lot of cycles to run code?
>
> Yes it does, see the cycle count for a selection of instructions on the s430 page. The design is not a pipelined processor design, which saves logic of course but hurts performance. When I designed it I very roughly aimed at less than 50% resources in the ~1200 LUT/FF machxo3 devices for low power processing tasks with gcc.
>
> Part of the reason it's small is that it doesn't have the 16-bit ALU, but then two clocks are required instead of one for an ALU operation. I seem to recall reading somewhere that some of the earlier Z80s had 4-bit ALU's rather than 8 to save on space, so I think that's been done for a while now :).
>
> Interestingly on Github there is a NEO430 project that someone else has designed, and the end results there are not too dissimilar to what I've got, noting the optimisations I used.
Thanks, it's always interesting to see not just the results, but also the goals and motivations behind CPU projects.

I think you might be referring to the Z8 rather than the Z80? I guess there were some clones, but I don't think the original Z80 had a 4-bit, double-pumped ALU. That sounds more like something done in a Chinese 4-bit processor built to run Z80 code. The Z8, on the other hand, was all about low selling price, so they may well have minimized the ALU and other logic this way.

Does anyone know if 4-bit MCUs still dominate the low end of the CPU market, or have the cost differences with 8-bit devices faded away?

--
Rick C.

+- Get 1,000 miles of free Supercharging
+- Tesla referral code - https://ts.la/richard11209