Introduction

Three different dense approximation and low-rank truncation algorithms are implemented in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 based on

Singular Value Decomposition (SVD),
Randomized SVD (Rand-SVD) and
Rank Revealing QR (RRQR).

SVD always computes a best approximation with respect to the predefined accuracy or rank. However, it has complexity \(\mathcal{O}(n^3)\) for dense approximation and \(\mathcal{O}(k^2 n + k^3 )\) complexity for truncation with \(n\) being the matrix size and \(k\) the rank.

The complexity for Rand-SVD is \(\mathcal{O}(n k^2 + k^3)\) for dense approximation where \(k\) is the rank of the approximant. Truncation has the same complexity but saves a QR factorization compared to SVD. However, as this is a randomized procedure, approximation is not guaranteed.

RRQR uses a QR factorization with column pivoting instead of a SVD, which has complexity \(\mathcal{O}(n^2 k)\) with destination rank \(k\) for dense approximation. The truncation procedure again saves a QR factorization compared to SVD based truncation and slightly reduces complexity from \(\mathcal{O}(k^3)\) to \(\mathcal{O}(k'^2 k)\) with \(k'\) being the rank of the input matrix. Furthermore, RRQR has the same error control as SVD only usually with a slightly larger rank.

Leaving error control aside, the advantage of each of these methods depends on the resulting rank of the approximation, which furthermore determines the complexity of the H-arithmetic. Hence, in different applications different combinations of approximation and truncation methods may provide the best performance.

Activation

The approximation algorithm is controlled in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 with the parameter CFG::BLAS::approx_method which is of type BLAS::approx_t:

enum approx_t : int
{
    use_svd  = 0,
    use_rrqr = 1,
    use_rand = 2
};

The same type is used for the truncation method parameter CFG::BLAS::trunc_method.

Both parameter may then either be set directly

HLIB::CFG::BLAS::approx_method = HLIB::BLAS::use_svd;

HLIB::CFG::BLAS::trunc_method = HLIB::BLAS::use_rrqr;

or via environment variables

export HLIB_BLAS_approx_method=0

export HLIB_BLAS_trunc_method=1

or in the configuration file .hlibpro.conf

[BLAS]
approx_method=0
trunc_method=1

(see also Parameters).

Benchmarks

We will run H-LU benchmarks (without using symmetry) with three different H-matrices with all combinations of approximation and truncation algorithms.

Results are presented for runtime, memory consumption and approximation error.

All benchmarks are performed on a Intel i5-4670 CPU using a single CPU core.

Laplace

This benchmark uses the Laplace SLP operator with a spherical with 32768 triangles and piecewise linear ansatz (see Boundary Element Methods).

Runtime
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	32.69s	29.80s	29.20s
RRQR	24.81s	22.04s	21.54s
Rand-SVD	33.22s	30.46s	29.89s

Memory
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	385 MB	385 MB	385 MB
RRQR	387 MB	388 MB	387 MB
Rand-SVD	385 MB	385 MB	385 MB

Error
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	1.47₁₀-3	1.47₁₀-3	1.47₁₀-3
RRQR	1.45₁₀-3	1.45₁₀-3	1.45₁₀-3
Rand-SVD	1.56₁₀-3	1.74₁₀-3	1.62₁₀-3

The best runtime and also the best approximation is obtained with RRQR truncation. The dense approximation technique has a smaller effect on the total runtime, albeit the Rand-SVD method show a slight advantage here without sacrificing accuracy.

However, using Rand-SVD for low-rank truncation clearly results in higher runtime and reduced accuracy.

All methods show very similar behaviour in terms of memory consumption. However, although having the smallest numbers for runtime, memory consumption is slightly increased for RRQR based truncation, indicating increased ranks after truncation compared to SVD or Rand-SVD.

Helmholtz

This benchmark uses the Helmholtz SLP operator with a spherical with 32768 triangles and piecewise constant ansatz (see Boundary Element Methods).

Runtime
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	69.91s	64.98s	62.28s
RRQR	58.24s	53.68s	51.00s
Rand-SVD	85.70s	81.01s	77.64s

Memory
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	861 MB	862 MB	861 MB
RRQR	877 MB	878 MB	877 MB
Rand-SVD	861 MB	862 MB	861 MB

Error
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	6.73₁₀-4	6.76₁₀-4	6.72₁₀-4
RRQR	6.56₁₀-4	6.59₁₀-4	6.56₁₀-4
Rand-SVD	6.87₁₀-4	6.93₁₀-4	6.93₁₀-4

The overall picture is similar to the Laplace example with RRQR producing the fastest runtime, again in combination with Rand-SVD for dense approximation.

The accuracy is also best with RRQR based truncation but also shows a slight disadvantage for RRQR approximation.

Also for memory, RRQR shows the same behaviour as for Laplace with higher ranks in the low-rank matrices.

Convection-Diffusion

The third benchmark uses a sparse matrix from a 3D convection diffusion equation, partitioned using nested dissection (see Solving a Sparse Equation System). The number of unknowns is 132651.

Runtime
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	40.10s	30.72s	29.44s
RRQR	34.98s	24.83s	23.33s
Rand-SVD	41.08s	31.06s	29.54s

Memory
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	1310 MB	1320 MB	1310 MB
RRQR	1320 MB	1330 MB	1320 MB
Rand-SVD	1320 MB	1320 MB	1320 MB

Error
	approx_method
trunc_method	SVD	RRQR	RandSVD
SVD	1.74₁₀-3	2.11₁₀-3	5.55₁₀-0
RRQR	2.04₁₀-3	2.46₁₀-3	5.03₁₀-0
Rand-SVD	1.19₁₀-0	2.75₁₀-0	5.92₁₀-0

In terms of runtime and memory, the behaviour stays the same as in the previous examples.

However, obviously Rand-SVD has an approximation problem with data data involved in these examples. The difference here is, that the matrix blocks are no longer densely populated since the original matrix is sparse. This seems to lead to problems with the randomized algorithm.

Conclusion

The SVD algorithm for dense approximation and low-rank truncation is still the default option in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 due to best approximation. The runtime compared to the other options is also within bounds.

However, for optimal performance all options should be tested for a given application as significant savings can be obtained, especially with RRQR, which also guarantees error control.

See also Accumulator Arithmetic for other parameters influencing arithmetic.

Table of Contents