HLIBpro
2.7
|
Three different dense approximation and low-rank truncation algorithms are implemented in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 based on
SVD always computes a best approximation with respect to the predefined accuracy or rank. However, it has complexity \(\mathcal{O}(n^3)\) for dense approximation and \(\mathcal{O}(k^2 n + k^3 )\) complexity for truncation with \(n\) being the matrix size and \(k\) the rank.
The complexity for Rand-SVD is \(\mathcal{O}(n k^2 + k^3)\) for dense approximation where \(k\) is the rank of the approximant. Truncation has the same complexity but saves a QR factorization compared to SVD. However, as this is a randomized procedure, approximation is not guaranteed.
RRQR uses a QR factorization with column pivoting instead of a SVD, which has complexity \(\mathcal{O}(n^2 k)\) with destination rank \(k\) for dense approximation. The truncation procedure again saves a QR factorization compared to SVD based truncation and slightly reduces complexity from \(\mathcal{O}(k^3)\) to \(\mathcal{O}(k'^2 k)\) with \(k'\) being the rank of the input matrix. Furthermore, RRQR has the same error control as SVD only usually with a slightly larger rank.
Leaving error control aside, the advantage of each of these methods depends on the resulting rank of the approximation, which furthermore determines the complexity of the H-arithmetic. Hence, in different applications different combinations of approximation and truncation methods may provide the best performance.
The approximation algorithm is controlled in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 with the parameter CFG::BLAS::approx_method
which is of type BLAS::approx_t:
The same type is used for the truncation method parameter CFG::BLAS::trunc_method
.
Both parameter may then either be set directly
or via environment variables
or in the configuration file .hlibpro.conf
(see also Parameters).
We will run H-LU benchmarks (without using symmetry) with three different H-matrices with all combinations of approximation and truncation algorithms.
Results are presented for runtime, memory consumption and approximation error.
All benchmarks are performed on a Intel i5-4670 CPU using a single CPU core.
This benchmark uses the Laplace SLP operator with a spherical with 32768 triangles and piecewise linear ansatz (see Boundary Element Methods).
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 32.69s | 29.80s | 29.20s |
RRQR | 24.81s | 22.04s | 21.54s |
Rand-SVD | 33.22s | 30.46s | 29.89s |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 385 MB | 385 MB | 385 MB |
RRQR | 387 MB | 388 MB | 387 MB |
Rand-SVD | 385 MB | 385 MB | 385 MB |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 1.47₁₀-3 | 1.47₁₀-3 | 1.47₁₀-3 |
RRQR | 1.45₁₀-3 | 1.45₁₀-3 | 1.45₁₀-3 |
Rand-SVD | 1.56₁₀-3 | 1.74₁₀-3 | 1.62₁₀-3 |
The best runtime and also the best approximation is obtained with RRQR truncation. The dense approximation technique has a smaller effect on the total runtime, albeit the Rand-SVD method show a slight advantage here without sacrificing accuracy.
However, using Rand-SVD for low-rank truncation clearly results in higher runtime and reduced accuracy.
All methods show very similar behaviour in terms of memory consumption. However, although having the smallest numbers for runtime, memory consumption is slightly increased for RRQR based truncation, indicating increased ranks after truncation compared to SVD or Rand-SVD.
This benchmark uses the Helmholtz SLP operator with a spherical with 32768 triangles and piecewise constant ansatz (see Boundary Element Methods).
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 69.91s | 64.98s | 62.28s |
RRQR | 58.24s | 53.68s | 51.00s |
Rand-SVD | 85.70s | 81.01s | 77.64s |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 861 MB | 862 MB | 861 MB |
RRQR | 877 MB | 878 MB | 877 MB |
Rand-SVD | 861 MB | 862 MB | 861 MB |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 6.73₁₀-4 | 6.76₁₀-4 | 6.72₁₀-4 |
RRQR | 6.56₁₀-4 | 6.59₁₀-4 | 6.56₁₀-4 |
Rand-SVD | 6.87₁₀-4 | 6.93₁₀-4 | 6.93₁₀-4 |
The overall picture is similar to the Laplace example with RRQR producing the fastest runtime, again in combination with Rand-SVD for dense approximation.
The accuracy is also best with RRQR based truncation but also shows a slight disadvantage for RRQR approximation.
Also for memory, RRQR shows the same behaviour as for Laplace with higher ranks in the low-rank matrices.
The third benchmark uses a sparse matrix from a 3D convection diffusion equation, partitioned using nested dissection (see Solving a Sparse Equation System). The number of unknowns is 132651.
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 40.10s | 30.72s | 29.44s |
RRQR | 34.98s | 24.83s | 23.33s |
Rand-SVD | 41.08s | 31.06s | 29.54s |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 1310 MB | 1320 MB | 1310 MB |
RRQR | 1320 MB | 1330 MB | 1320 MB |
Rand-SVD | 1320 MB | 1320 MB | 1320 MB |
approx_method | |||
---|---|---|---|
trunc_method | SVD | RRQR | RandSVD |
SVD | 1.74₁₀-3 | 2.11₁₀-3 | 5.55₁₀-0 |
RRQR | 2.04₁₀-3 | 2.46₁₀-3 | 5.03₁₀-0 |
Rand-SVD | 1.19₁₀-0 | 2.75₁₀-0 | 5.92₁₀-0 |
In terms of runtime and memory, the behaviour stays the same as in the previous examples.
However, obviously Rand-SVD has an approximation problem with data data involved in these examples. The difference here is, that the matrix blocks are no longer densely populated since the original matrix is sparse. This seems to lead to problems with the randomized algorithm.
The SVD algorithm for dense approximation and low-rank truncation is still the default option in 𝖧𝖫𝖨𝖡𝗉𝗋𝗈 due to best approximation. The runtime compared to the other options is also within bounds.
However, for optimal performance all options should be tested for a given application as significant savings can be obtained, especially with RRQR, which also guarantees error control.
See also Accumulator Arithmetic for other parameters influencing arithmetic.