Blog:

This blog discusses ideas, drafts and papers about the whole spectrum of systems theory and anything else that concerns the WissTec R&D Services UG.

30.9.2020 - Numerical Methods I:

Efficient Matrix Multiplication:

(by Eike Scholz)

This is a short entry, since we are working intensively on a systems design for compound standard currency systems that can be implemented with the transaction settling protocol stack we designed. Compound standard currency systems are a generalization of the historical gold standard system. However, due to zero funding, development of the transaction protocol stack is currently stalled. We hope to be able to acquire further capital once that paper/book is finished.

This short entry shows how to implement sparse matrix multiplication efficiently on modern single instruction multiple data (SIMD) architectures, that is, on all modern PC CPUs. A variation of this has been used for efficient sparse linear algebra subroutines in our in-house finite element solvers - a then new, written-from-scratch solution that could not be developed at a particular German university due to intervention from a particular professor. But in the free market, outside of the German universities, I found the freedom to do research and implement my ideas. They work well.

For this post we use C-style index sets. For \( n\in\mathbb{N} \) an index set is defined as \[ \underline{n} := \{0,1,2,\dots,n-1\} \text{ .} \] A vector \(v\in\mathbb{R}^n\) is thus a family \( \left( v_j\right)_{j\in\underline n} \). Further, a family \[ (A_j)_{j\in \underline m} \in \left(\mathbb{R}^n\right)^m \] of vectors with the index abbreviation scheme \[ A_{ij} := (A_j)_{i} \] is a matrix composed of column vectors, and the matrix multiplication \( C = A B \) written in scalar components is \[ C_{ij} = \sum_{k\in\underline m}A_{ik} \cdot B_{kj} \text{ .} \] Written in column vectors, this gives the linear combinations \begin{equation} C_{j} = \sum_{k\in\underline m}A_{k} \cdot B_{kj} \label{colAB}\text{ ,} \end{equation} since \begin{align*} (C_j)_i &= C_{ij} \\ &= \sum_{k\in\underline m}A_{ik} \cdot B_{kj} \\ &= \sum_{k\in\underline m} (A_{k})_i \cdot B_{kj} \\ &= \sum_{k\in\underline m} (A_{k} \cdot B_{kj} )_i \\ &= \left( \sum_{k\in\underline m} A_{k} \cdot B_{kj}\right)_i \end{align*} for each row \(i\). Note that the sums run over \(\underline m\), the number of columns of \(A\). On modern SIMD CPUs there are multiply-add vector instructions that allow an efficient implementation of this equation.
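To make the column-vector form concrete, here is a minimal dense sketch in C. It assumes column-major storage (an array of column vectors, so \(A_{ik}\) sits at A[k*n + i]); the function name and layout are illustrative, not our actual solver code. The inner loop is exactly the multiply-add \(C_j \mathrel{+}= A_k \cdot B_{kj}\) over a whole column, which compilers map onto SIMD multiply-add instructions.

    #include <stddef.h>

    /* Illustrative sketch: C = A * B for dense column-major matrices.
     * A is n x m, B is m x p, C is n x p, each stored as an array of
     * column vectors, so A[k*n + i] = A_ik.  The inner loop over i is
     * a multiply-add over a whole column, which compilers turn into
     * SIMD multiply-add instructions. */
    void matmul_columns(size_t n, size_t m, size_t p,
                        const double *A, const double *B, double *C)
    {
        for (size_t j = 0; j < p; ++j) {           /* for each column C_j */
            double *Cj = &C[j * n];
            for (size_t i = 0; i < n; ++i)
                Cj[i] = 0.0;
            for (size_t k = 0; k < m; ++k) {       /* C_j += A_k * B_kj   */
                const double *Ak = &A[k * n];
                const double b  = B[j * m + k];
                for (size_t i = 0; i < n; ++i)     /* vectorizable loop   */
                    Cj[i] += Ak[i] * b;
            }
        }
    }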

An efficient implementation of matrix multiplication is thus not built from the row-column-wise scalar products that are often used as a mnemonic bridge for remembering how matrix multiplication is defined in undergraduate mathematics. However, for dense matrices that argument might not matter much, since there are dedicated SIMD scalar-product instructions as well.
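For contrast, here is the scalar-product form in the same illustrative column-major layout as above. The strided access to A in the inner loop is what makes this form harder to vectorize than the column-combination form.

    /* For contrast: the row-column scalar-product form, same layout as
     * above.  The inner loop is a dot product, but with column-major A
     * it reads A with stride n, which hinders SIMD vectorization. */
    void matmul_dots(size_t n, size_t m, size_t p,
                     const double *A, const double *B, double *C)
    {
        for (size_t j = 0; j < p; ++j)
            for (size_t i = 0; i < n; ++i) {
                double s = 0.0;
                for (size_t k = 0; k < m; ++k)
                    s += A[k * n + i] * B[j * m + k];   /* strided in A */
                C[j * n + i] = s;
            }
    }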

The more interesting application of this is sparse matrix multiplication. The scalar multiplications can be done in parallel as in the dense case. However, the addition of the linear combination needs different handling. In our in-house FEM solver we use the associativity of the addition for a simple recursive divide-and-conquer algorithm. It is implemented efficiently using two stacks in alternating roles for memory management, in the way a copying garbage collector works. That yielded very good performance results, and the metacore programming language's four-stack memory layout originates from this experience, as well as from ideas from Forth. However, at the moment I do not have time to continue the development of the metacore programming language.
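To illustrate just the divide-and-conquer idea, here is a minimal C sketch; the sparse vector layout and all names are illustrative assumptions, not our solver code. Each term is a scaled sparse column \(A_k \cdot B_{kj}\); associativity lets us sum the terms by recursive halving instead of a left-to-right fold. The sketch manages intermediates with plain malloc/free, which is exactly the job the two alternating stacks do in the in-house solver.

    #include <stdlib.h>
    #include <stddef.h>

    /* Illustrative sparse vector: nnz entries with sorted row indices. */
    typedef struct {
        size_t  nnz;
        size_t *idx;   /* row indices, ascending */
        double *val;   /* entry values           */
    } svec;

    /* Merge two sparse vectors by row index (like the merge step of
     * merge sort); entries with equal index are added. */
    static svec svec_add(svec a, svec b)
    {
        svec c;
        c.idx = malloc((a.nnz + b.nnz) * sizeof *c.idx);
        c.val = malloc((a.nnz + b.nnz) * sizeof *c.val);
        size_t i = 0, j = 0, k = 0;
        while (i < a.nnz || j < b.nnz) {
            if (j == b.nnz || (i < a.nnz && a.idx[i] < b.idx[j])) {
                c.idx[k] = a.idx[i]; c.val[k++] = a.val[i++];
            } else if (i == a.nnz || b.idx[j] < a.idx[i]) {
                c.idx[k] = b.idx[j]; c.val[k++] = b.val[j++];
            } else {                        /* equal row index: add */
                c.idx[k] = a.idx[i];
                c.val[k++] = a.val[i++] + b.val[j++];
            }
        }
        c.nnz = k;
        return c;
    }

    /* Sum count >= 1 sparse terms (e.g. the scaled columns A_k * B_kj)
     * by recursive halving, using the associativity of the addition.
     * Takes ownership of the terms' heap arrays and frees intermediates;
     * the in-house solver does this bookkeeping with the two alternating
     * stacks instead of malloc/free. */
    static svec svec_sum(svec *terms, size_t count)
    {
        if (count == 1)
            return terms[0];
        size_t half = count / 2;
        svec l = svec_sum(terms, half);
        svec r = svec_sum(terms + half, count - half);
        svec c = svec_add(l, r);
        free(l.idx); free(l.val);
        free(r.idx); free(r.val);
        return c;
    }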

Update: Fixed a typo. I meant to write column vectors in some places. That is now fixed. The formulas were correct, however.

Update: Fixed an index permutation error in the derivation and added a minimal proof.
