Hi Geoffrey. I’m glad you brought up parallel prefix sum, and more generally scans, which I wrote about here recently. As I understand them, parallel prefix (aka “prefix scan”) circuits add many numbers and produce sums of all prefixes (and more generally, associative operations, as you mentioned). The tricky bit is efficiently calculating *all* of the prefix sums, maximizing parallelism (to reduce time) and minimizing redundant computation (to reduce work). If one wants just a single result, then a balanced tree fold is much simpler and faster.
In contrast, the problem I’m considering here is to add just a pair of numbers operating on the *multiple bits* in parallel, producing a single, multi-bit sum. I don’t know whether there are any interesting connections between pair addition and either scan or fold.

]]>