There are a few operations in our sum factorization code that need to translate to good assembly for the overall code to perform well. I have set up godbolt to study these for us:
* `horizontal_add`: https://godbolt.org/g/ZGmSYN
* [`horizontal_add`](https://godbolt.org/#z:OYLghAFBqRAWIDGB7AJgUwKKoJYBdkAnAGhxAgCsQBGYgZ2QFdDF0QByAUgCYBmHAHaIANowydeAYRwBbGYLyFBAOjgTMnAAwBBLdtRMARsPQBqOERwAvZALwBDYQH17qVE4AsTgG4iIKATo8UycnGW4AVgA2VB4o03sASlNOAHYAIT1TbJCwyJjTPGoU3gARELl8pzhXdwAHVAh7YiSJTJ0c3JlqbgAOVELuEvKw8OindAAPRXtEPAAzHt6nBogi4mpEtqyc0O6%2BgbxeYYqZFzcnOkbRqsR7IIaqhqW1zeI8bi3edu1OwnQ8MwBKcnIhvHgrk55lEPGteF8fmlSno9AZGMYzBYlDY7I5zu4vHV7HhEHB0O5fMJ/LYgl18rFuPEkikMjtsgFaXslgNvI5GHU6uhCCcbuMpjM5os%2BitGs1TJtth0chzglyDqZecJhMgAO5CkWVcZ3B6oJ6oF6tb5s0wqrrcjWOTUGs61GUQTX8wUkB1a3VChHW21q/qmf50Z34t2jFZCmSMPDoN2azUbRLEH2agNK7L/QGEYHRsEQ9zQ2FhrO6VLInQovg4eamDDzQTkiC5bQANQAGhEeqFTMk0pIh430M2BOTTG3Qp2ez0AGL9rY1mt1huhADiADkAKqSfvqUzxIcj0IASS3ABVMAAZJySADyAFkAApnm%2BYABKKR0aIx5ksHEHGcV1lgUJRAhwRBqUCVUwl7bgGXiSZB1ZbNQwBIFTgQpx/lQRhWEjVYUMVStq10Ph0AEXB5hRX8jBMADsVsYDI2WSkYM5eCeiQ0wUJZH4/kw/MmOsFi8VdLwONGHDjTwBocMeaIIBQwduHSH9fk6bTRKAiSLikvwZL7cVCFmBYYUmQlGkmVMKyROj9AYzFAPEkCLmWIkSTJCk/CDbjELiPjUMEnJcywrExNxdyCRWYlSXJHwjLkWT7nk1BFNNZTVJSdTNJ07TIr0mLPDi7zEuklKTOmMzJUs6yVLs0iHJXCjeComjTAAei60xZwQ9g02EDgInYYgBA4TQxuQDhh3UngNIYZhWFy3hqDGvBJsG4h4CQZAZDqHATBIMhqQOo6hRAYBUm4Yhm2EBNCDochDC24hDEEexCAATw4dbiBQOQqLwB8BGEX72Cm4gMBkewBGAEw3twf45hwbx0GeyGxqmdBEHjNh2H%2BhR0GGrH3iUGQtqGnBDGeyA02QOo8BwGkOAAWlhlg1DKABrMHTDZh9eDGpaWDYN5SdG8a3pm9hJl6KI2ZhUxgEQRBTFSZQhggbB8CIVbaFMSR9sO46DcSDaqZ2hBAdNoVSHIW2LsIEAoMQahUjuo7HueiBXrJj6BC%2BiH/sBmRgdB8GkfQWH4cRsnkdx5n0cxqGcbxhM/rG4nSahxRZCt4QabpiAGaZlnAnZyYnzPSQ2f6noBaFkWmDFmghpGsaJrJ2X5cV5W3flVIp11ghhR4Nb02N86zYnz5LaxtNdqd46HbOu2XZEOHgAiTRNC9h6hV9/2ocD4Os4B/bw7sSOIahmHt/j%2B%2BcBR5OMbe9P8YvnO3vzynF%2BIEXWm4BS7EEZszVm7AOZfVJBIUofNhBN2FvQVurB26AM7tLHuHA%2B5Kw8DaYQ29TARGUJoUhI9cBjwNlPE2ztVrwgXlNRIaYeY0D3hg9gHgu4yw4CLEA%2B9NqLzTOjJ6FcQAeCAA)
* `transpose`
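For readers who want something small to paste into godbolt, here is a minimal, portable sketch of what the two operations listed above compute. It is not the code behind the links: the `Vec4` alias, the function signatures, and the fixed width of four doubles are all assumptions made for illustration, and the real code presumably works on its own SIMD vector class.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector of four doubles (assumption:
// the actual code uses a dedicated vector class, not std::array).
using Vec4 = std::array<double, 4>;

// Sum of all lanes, written as a pairwise (tree) reduction so that the
// two inner additions are independent of each other.
double horizontal_add(const Vec4 &v)
{
  const double lo = v[0] + v[1];
  const double hi = v[2] + v[3];
  return lo + hi;
}

// Transpose a 4x4 block stored as four row vectors: out[j][i] = in[i][j].
std::array<Vec4, 4> transpose(const std::array<Vec4, 4> &in)
{
  std::array<Vec4, 4> out{};
  for (std::size_t i = 0; i < 4; ++i)
    for (std::size_t j = 0; j < 4; ++j)
      out[j][i] = in[i][j];
  return out;
}
```

Compiling this with optimization enabled and comparing the output against the hand-written variants should show whether the compiler finds shuffle-based code on its own; whether it does will depend on the compiler and target flags.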