HLIBpro
2.8.1
|
Matrix construction in ๐ง๐ซ๐จ๐ก๐๐๐ is the processor of using a given block cluster tree \( T \) and constructing matrices for all leaves in \( T \) and for all inner nodes of \( T \). The leaf matrices define the actual content of the matrix and may be of various formats, e.g. dense (TDenseMatrix) and low rank (TRkMatrix) matrices . Inner nodes on the other hand define the hierarchy of the H-matrix and are important for the matrix algebra. Such block matrices are usually of type TBlockMatrix.
A special version of a block matrix is THMatrix, which also stores the permutations supplied by block cluster trees to map between internal and external ordering of the unknowns during matrix vector multiplication. Standard block matrices, e.g. of type TBlockMatrix, will only know about the interal ordering of unknowns.
All matrix construction techniques are derived from the corresponding base class TMatBuilder, which defines the basic build interface, which mainly consists of the function build:
Both functions will accept a progress meter to indicate the current state of the process.
The difference between both functions comes from the different type of cluster
. In the first case a TBlockClusterTree object is given and thereby also the mappings between internal an external ordering of the unknowns stored within this object. This allows the construction of a THMatrix object. In the second case, this information is not available and therefore (usually) a TBlockMatrix object will be returned.
The approximation accuracy of low-rank matrices during construction or truncation is controlled by TTruncAcc
objects.
It provides different limits for the rank:
All low-rank matrices will have a fixed rank \(k\). During truncation only the largest \(k\) singular values are kept.
Fixed rank accuracy is defined by the function fixed_rank( k )
, e.g.,
The rank is defined adaptively, e.g., for a matrix \(M\) ensure that for the low-rank approximation \(\tilde M\) holds \(|M - \tilde M| \le \varepsilon |M|\), where \(\varepsilon\) defines the precision. For truncation this means that all singular values \(s_i\) with \( s_i / s_0 \ge \epsilon\) are kept.
Fixed precision accuracy is defined by the function fixed_prec( eps )
, e.g.,
Both may be coupled with
The rank is defined adaptively, e.g., for a matrix \(M\) ensure that for the low-rank approximation \(\tilde M\) holds \(|M - \tilde M| \le \varepsilon \), where \(\varepsilon\) defines the precision. For truncation this means that all singular values \(s_i\) with \( s_i \ge \epsilon\) are kept.
The absolute precision is an optional second argument to fixed_rank
and fixed_prec
, e.g.
By default, an the limit for absolute precision is 0.
Furthermore, in case of fixed precision a maximal rank may be set with TTruncAcc::set_max_rank( k )
, e.g., the rank may grow adaptively based on the predefined \(\varepsilon\) up to \(k\).
With TBlockTruncAcc
it is possible to specify different accuracies for different subblocks of a matrix. The subblocks are hereby addressed with their row and column index sets.
In this example, a \(3 \times 2\) partitioning of the global indexset is defined by the row indexsets row_idxs and the column indexsets col_idxs
. For each subblock of this partitioning a separate accuracy object is used (specified in column major format).
For a given matrix block of an H-matrux, the accuracy object is determined by testing in which subblock of the above partitioning the matrix block is in. Due to this, the index partitioning of TBlockTruncAcc
should follow the index partitioning of the H-matrix or otherwise errors may occur with matrix blocks belonging to several accuracy objects (intersecting more than one accuracy subblock).
In principle, this may be applied recursively and thereby a specific accuracy may be defined for each subblock of the H-matrix.
In a dense matrix \( A \), all leaf nodes of the block cluster tree will usually be represented by a non-zero matrix block. By default, a non-admissible leaf node will become a dense matrix, e.g. TDenseMatrix, and an admissible node will become a low rank matrix, e.g. TRkMatrix. This construction is implemented in TDenseMBuilder, which needs two additional object to perform the H-matrix construction:
The coefficient function has to be of type TCoeffFn or, to be precise of a derived class (as TCoeffFn is abstract). The main function in TCoeffFn is:
Here, rowis
and colis
hold the row and column indices of the sub matrix rowis
ร colis
to evaluate and matrix
should be used to store the result in column wise order (as usual in ๐ง๐ซ๐จ๐ก๐๐๐), e.g., the coefficient corresponding to the i'th row and j'th column index should be stored at position (i,j) in matrix
.
TCoeffFn
also implements another evaluation function:
which is based on TIndexSet
. This function can be used, if the index set is contiguous, which is the case with the internal ordering of H-matrices, e.g., clusters.
By default, the TIndexSet
version of eval
relies on the above eval
function.
Furthermore, TPermCoeffFn
implements the mapping from internal to external orderings and adds this permutation to the second eval
function, calling the first eval function with index sets in external ordering.
Another property of the final H-matrix has to be defined by the coefficient function, namely the matrix format, e.g. whether it is unsymmetric, symmetric or hermitian. This is done by overloading the function
A full example for a user defined cofficient function is described in Solving an Integral Equation.
๐ง๐ซ๐จ๐ก๐๐๐ provides several techniques for computing low rank approximations for dense matrices:
Although singular value decomposition will compute the best approximation, it is also the most time consuming with a computational complexity of \( \mathcal{O}(n^3) \), where \( n \) is the number of rows/columns.
Randomized SVD is usually faster but needs the full dense block for computations, i.e., complexity is at least \( \mathcal{O}(n^2) \). The same holds for rank revealing QR.
ACA is a heuristical approximation technique based on consecutive elimination of rank-1 matrices. In contrast to the HCA will always return an approximation of a given accuracy. The advantage of ACA compared to HCA is, that it only needs access to the matrix coefficients, whereas HCA needs further functionality, e.g. evaluation of collocation integrals. Both, ACA and HCA have a runtime of \( \mathcal{O}(n) \). ACA-Full is a special variant of ACA where the full matrix is searched for the best rank-1 matrix to eliminate. This also guarantees the wanted accuracy albeit with a higher complexity ( \( \mathcal{O}(n^2) \)).
Solving an Integral Equation contains a complete example for H-matrix construction based on dense matrices.
Representing sparse matrices using H-matrices will usually result in a much higher memory consumption since H-matrices have move overhead in storing data. Hence, it should only be performed if additional arithmetic is wanted for these matrices. The standard case is the computation of an H-LU factorisation of a sparse matrix.
The construction of the leaf matrix blocks, and hence, the construction of the H-matrix out of a sparse matrix is done by TSparseMBuilder. Beside the actual sparse matrix, it expects also mappings for the row and the column index sets between internal and external ordering since the sparse matrix is often given in external ordering. If not, the mappings may be NULL
.
A complete example for H-matrix construction with sparse matrices can be found in Solving a Sparse Equation System.
A special mode can be activated within TSparseMBuilder where instead of using TDenseMatrix and TRkMatrix, all leaf matrices will be constructed as sparse matrices itself. Although this significantly reduces the memory consumption of the final H-matrix, no arithmetic is supported for such a matrix beside matrix-vector multiplication. This mode can be enabled/disable by calling set_sparse_mode: