Consider vectorization strategies that use gather instructions

Currently we limit vectorization strategies to one or two different input tensors. I think my Maxwell code would benefit from strategies that use four tensors. The limit should become a configurable parameter rather than simply being raised, because allowing more tensors introduces substantial complexity in the vectorization strategy generator.
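To give a rough feel for why the limit should be configurable, here is a minimal sketch in Python. It is not the actual strategy generator: it assumes, purely for illustration, that the generator considers every non-empty combination of input tensors up to the limit. The function `candidate_tensor_sets`, the parameter `max_input_tensors`, and the field names are all hypothetical.

```python
from itertools import combinations


def candidate_tensor_sets(tensors, max_input_tensors=2):
    """Enumerate every non-empty combination of input tensors up to the limit.

    Hypothetical stand-in for the search space of a strategy generator;
    the real generator may prune or order candidates differently.
    """
    candidates = []
    for size in range(1, max_input_tensors + 1):
        candidates.extend(combinations(tensors, size))
    return candidates


# Hypothetical field components of a Maxwell solver.
fields = ["Ex", "Ey", "Ez", "Bx", "By", "Bz"]

for limit in (2, 4):
    count = len(candidate_tensor_sets(fields, max_input_tensors=limit))
    print(f"max_input_tensors={limit}: {count} candidate tensor sets")
```

Under this assumption, raising the limit from two to four for six tensors grows the candidate space from 21 to 56 sets, so keeping the default at two and letting users opt in to four seems like the safer choice.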