Do not generate code for stage 1 sumfact kernels that don't get used
During the dry run, save in the cache every stage 1 sum factorization kernel that is used in an accumulation expression. Then discard all inactive sum factorization kernels in decide_vectorization_strategy, so that no code is generated for them.
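
The idea can be illustrated with a minimal, self-contained sketch. It is not the actual implementation: `Kernel`, `DryRunCache`, `mark_used` and `discard_inactive` are hypothetical names chosen for illustration, and the real kernel objects and cache in the code base will look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """Hypothetical stand-in for a stage 1 sum factorization kernel."""
    name: str


class DryRunCache:
    """Hypothetical cache recording the kernels referenced during the dry run."""

    def __init__(self):
        self._active = set()

    def mark_used(self, kernel):
        # Called whenever an accumulation expression references a kernel.
        self._active.add(kernel)

    def is_active(self, kernel):
        return kernel in self._active


def discard_inactive(kernels, cache):
    """Keep only the kernels recorded as used, preserving the input order."""
    return [k for k in kernels if cache.is_active(k)]


all_kernels = [Kernel("u"), Kernel("grad_u"), Kernel("v")]
cache = DryRunCache()

# Dry run: only two of the three kernels appear in an accumulation expression.
cache.mark_used(Kernel("u"))
cache.mark_used(Kernel("grad_u"))

# Only the active kernels are passed on, so no code is generated for "v".
active_kernels = discard_inactive(all_kernels, cache)
print([k.name for k in active_kernels])
```

In this sketch the filtering happens in one place, after the dry run has finished, which mirrors discarding the inactive kernels at the point where the vectorization strategy is decided.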
This fixes #100.