Implement SSE vectorization
There is at least one interesting use case for SSE-only vectorization: In Stokes DG, the pressure evaluation is often implemented as scalar code, because a quadrature point increase is not feasible. On skeletons, this means, we calculate two scalar kernels instead of an SSE vectorized one.