HLIBpro 2.8.1
In this example, an integral equation is solved by representing the discretised operator with an H-matrix, which is constructed using a coefficient function for the entries of the equation system. Furthermore, a preconditioner for the iterative solver is computed using H-LU factorisation.
Given the integral equation
\[ \int_0^1 \log|x-y| \mathbf{u}(y) dy = \mathbf{f}(x), \quad x \in [0,1] \]
with \(\mathbf{u} : [0,1] \to \mathbf{R}\) being sought for a given right hand side \(\mathbf{f} : [0,1] \to \mathbf{R}\), the Galerkin discretisation with constant ansatz functions \(\phi_i, 0 \le i < n\)
\[ \phi_i(x) = \left\{ \begin{array}{ll} 1, & x \in \left[\frac{i}{n},\frac{i+1}{n} \right] \\ 0, & \mathrm{otherwise} \end{array} \right. \]
leads to a linear equation system \(A u = f\) where \(u\) contains the coefficients of the discretised \(\mathbf{u}\) and \(A\) is defined by
\begin{eqnarray*} a_{ij} & = & \int_0^1 \int_0^1 \phi_i(x) \log|x-y| \phi_j(y) dy dx \\ & = & \int_{\frac{i}{n}}^{\frac{i+1}{n}} \int_{\frac{j}{n}}^{\frac{j+1}{n}} \log|x-y| dy dx \nonumber . \end{eqnarray*}
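The double integral can be evaluated in closed form. As a sketch of the derivation, let \(h = 1/n\) and let \(G\) be a second antiderivative of the kernel, i.e. \(G''(t) = \log|t|\) (the symbols \(h\) and \(G\) are introduced here only for illustration):
\[ G(t) = \frac{t^2}{2} \log|t| - \frac{3}{4} t^2, \quad G(0) = 0. \]
Integrating first over \(y\) and then over \(x\) gives
\[ a_{ij} = G((i-j+1)h) - 2 G((i-j)h) + G((i-j-1)h), \]
and in particular \(a_{ii} = 2 G(h) = h^2 \log h - \frac{3}{2} h^2\) on the diagonal.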
The right hand side \(f\) is given by
\[ f_i = \int_0^1 \phi_i(x) \mathbf{f}(x) dx. \]
The code starts with the standard initialisation:
Together with the initialisation, the verbosity level of HLIBpro is increased to obtain additional output during execution. The corresponding function set_verbosity is available in the namespace HLIB::CFG.
real_t is defined to prevent ambiguity errors with HLIB::real and other types of that name.
For clustering the unknowns, coordinate information has to be defined. For this 1d example, the indices are placed in the middle of equally sized intervals.
Coordinates in HLIBpro are vectors of the corresponding dimension, e.g. one for this problem. The set of all coordinates is stored in objects of type TCoordinate.
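The placement of the indices can be sketched without any library types. The helper `interval_midpoints` below is hypothetical and only computes the positions; it does not create a TCoordinate object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Midpoints of n equally sized subintervals of [0,1].
// Hypothetical helper for illustration, not part of HLIBpro.
std::vector<double> interval_midpoints(std::size_t n)
{
    assert(n > 0);
    const double h = 1.0 / static_cast<double>(n);  // step width of the grid
    std::vector<double> pos(n);
    for (std::size_t i = 0; i < n; ++i)
        pos[i] = h * (static_cast<double>(i) + 0.5);  // centre of [i*h, (i+1)*h]
    return pos;
}
```

For `n = 4` this yields the positions 0.125, 0.375, 0.625 and 0.875.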
The usage of smart pointers, e.g. unique_ptr, is advised and is extensively used in HLIBpro, since it decreases the possibility of memory leaks by automatically deleting the objects upon destruction of the smart pointer variable, e.g. when leaving the local block.
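The effect can be observed with standard C++ alone. In the following sketch, `Tracked` is a hypothetical stand-in for a library object that counts its live instances; the object is deleted as soon as the inner block is left.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a library object; counts live instances.
struct Tracked
{
    static int alive;
    Tracked()  { ++alive; }
    ~Tracked() { --alive; }
};

int Tracked::alive = 0;

// Returns the number of live objects observed inside the inner block.
int count_inside_block()
{
    int inside = 0;
    {
        std::unique_ptr<Tracked> obj = std::make_unique<Tracked>();
        assert(obj != nullptr);
        inside = Tracked::alive;  // the object exists here
    }                             // obj leaves scope, the object is deleted
    return inside;
}
```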
Having coordinates for each index, the cluster tree and block cluster tree can be constructed.
For the cluster tree, the partitioning strategy for the indices is determined automatically by TAutoBSPPartStrat. Furthermore, a \(n_{\min}\) of size 20 is used during construction.
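To illustrate the role of \(n_{\min}\), the following sketch builds the leaves of a 1d cluster tree by sorting the indices by coordinate and halving index ranges until at most `nmin` indices remain. This is a simplified bisection written for this example; it is not the strategy chosen by TAutoBSPPartStrat, and `ClusterTree1d`, `split` and `build_cluster_tree` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Leaves are half-open ranges [first, last) in the internal numbering;
// i2e maps an internal position to the external index stored there.
struct ClusterTree1d
{
    std::vector<std::size_t> i2e;
    std::vector<std::pair<std::size_t, std::size_t>> leaves;
};

// Recursively halve the range [lb, ub) until at most nmin indices remain.
static void split(std::size_t lb, std::size_t ub, std::size_t nmin,
                  std::vector<std::pair<std::size_t, std::size_t>>& leaves)
{
    if (ub - lb <= nmin) {
        leaves.emplace_back(lb, ub);
        return;
    }
    const std::size_t mid = lb + (ub - lb) / 2;
    split(lb, mid, nmin, leaves);
    split(mid, ub, nmin, leaves);
}

ClusterTree1d build_cluster_tree(const std::vector<double>& pos, std::size_t nmin)
{
    assert(nmin > 0);
    ClusterTree1d ct;
    ct.i2e.resize(pos.size());
    std::iota(ct.i2e.begin(), ct.i2e.end(), std::size_t{0});
    // Sort indices by coordinate so that contiguous ranges are geometric clusters.
    std::sort(ct.i2e.begin(), ct.i2e.end(),
              [&pos](std::size_t a, std::size_t b) { return pos[a] < pos[b]; });
    if (!pos.empty())
        split(0, pos.size(), nmin, ct.leaves);
    return ct;
}
```

The sorting step is also where the difference between external and internal numbering comes from: `i2e` records which external index ends up at which internal position.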
For the block cluster tree, the standard admissibility condition implemented in TStdGeomAdmCond is used with \(\eta = 2\).
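Assuming the usual form of the standard admissibility condition, \(\min(\mathrm{diam}(\tau), \mathrm{diam}(\sigma)) \le \eta \, \mathrm{dist}(\tau, \sigma)\), the test for two clusters in 1d reduces to a comparison of interval lengths and gaps. The types and functions below are hypothetical and independent of TStdGeomAdmCond.

```cpp
#include <algorithm>
#include <cassert>

// Bounding interval of a 1d cluster.
struct Interval
{
    double lo, hi;
};

double diameter(const Interval& t) { return t.hi - t.lo; }

// Gap between two intervals; zero if they touch or overlap.
double interval_distance(const Interval& t, const Interval& s)
{
    return std::max({0.0, s.lo - t.hi, t.lo - s.hi});
}

// Assumed condition: min(diam(t), diam(s)) <= eta * dist(t, s).
bool is_admissible(const Interval& t, const Interval& s, double eta)
{
    assert(eta > 0.0);
    return std::min(diameter(t), diameter(s)) <= eta * interval_distance(t, s);
}
```

With \(\eta = 2\), the intervals \([0, 0.25]\) and \([0.5, 0.75]\) are admissible, whereas the neighbouring intervals \([0, 0.25]\) and \([0.25, 0.5]\) are not, since their distance is zero.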
Using the block cluster tree, the H-matrix can finally be constructed. For this, adaptive cross approximation (ACA, see Low Rank Approximation) is applied to compute low rank approximations of admissible matrix blocks. In HLIBpro, an improved version of ACA is implemented in TACAPlus.
ACA only needs access to the matrix coefficients of the dense matrix. Those are provided by TLogCoeffFn, which will be discussed below. Since the application normally uses a different ordering compared to the internal structures of the H-matrix, one also needs a permutation for all indices, e.g., from internal numbering of H-matrices to the external numbering of the application. This is performed by TPermCoeffFn. Finally, the block wise accuracy of the H-matrix approximation has to be defined using TTruncAcc objects. In this case, an accuracy of \(\epsilon = 10^{-4}\) is used.
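The idea behind cross approximation can be sketched on a small dense block. The routine below uses full pivoting, i.e. it searches the whole residual for the largest entry, and therefore needs all entries of the block. ACA and the ACA+ variant in TACAPlus avoid exactly that by sampling only a few rows and columns, so this is a simplified relative of those methods and not their implementation. `LowRank` and `cross_approx` are hypothetical names, and the stopping rule (largest residual entry relative to the largest entry of the block) is an assumption made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rank-k factorisation M ~ sum_l u[l] * v[l]^T of an m x n block.
struct LowRank
{
    std::vector<std::vector<double>> u;  // k vectors of length m
    std::vector<std::vector<double>> v;  // k vectors of length n
};

// Cross approximation with full pivoting on a column-major block R.
// Stops once the largest residual entry is at most eps times the
// largest entry of the original block.
LowRank cross_approx(std::vector<double> R, std::size_t m, std::size_t n, double eps)
{
    assert(R.size() == m * n);
    LowRank lr;
    double first_pivot = 0.0;
    const std::size_t max_rank = (m < n) ? m : n;
    for (std::size_t k = 0; k < max_rank; ++k) {
        // Find the residual entry of largest modulus.
        std::size_t pi = 0, pj = 0;
        double pivot = 0.0;
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i)
                if (std::abs(R[j * m + i]) > std::abs(pivot)) {
                    pivot = R[j * m + i];
                    pi = i;
                    pj = j;
                }
        if (k == 0) first_pivot = std::abs(pivot);
        if (std::abs(pivot) <= eps * first_pivot) break;
        // New cross: scaled pivot column and pivot row of the residual.
        std::vector<double> u(m), v(n);
        for (std::size_t i = 0; i < m; ++i) u[i] = R[pj * m + i] / pivot;
        for (std::size_t j = 0; j < n; ++j) v[j] = R[j * m + pi];
        // Subtract the rank-1 term from the residual.
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t i = 0; i < m; ++i)
                R[j * m + i] -= u[i] * v[j];
        lr.u.push_back(u);
        lr.v.push_back(v);
    }
    return lr;
}
```

For a block of \(\log|x-y|\) values belonging to two well separated intervals, the loop is expected to stop after a few steps, far below the block dimension.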
CFG::set_nthreads().
The coefficient function used during matrix construction is implemented in class TLogCoeffFn.
It is derived from the base class for coefficient functions TCoeffFn, which provides the basic interface. TLogCoeffFn stores the step width for the grid as an internal variable for kernel evaluation, which is the only argument for the constructor:
The actual interface for the evaluation of the matrix coefficients is implemented by the function eval, which is defined in TCoeffFn and has to be overridden:
This function gets a set of indices for rows (rowidxs) and columns (colidxs) and a pointer to a memory block, where all the coefficients should be stored. The indices in both sets are already in the external numbering, hence the indices can directly be used for kernel evaluation. The mapping was performed by TPermCoeffFn. The memory layout of matrix is column wise, which holds for almost all data in HLIBpro. This is due to the memory layout of BLAS/LAPACK, originally implemented in Fortran, which uses column wise storage.
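A library-independent sketch of such an evaluation is given below. It assumes that the entries are computed from the closed form \(a_{ij} = G((i-j+1)h) - 2G((i-j)h) + G((i-j-1)h)\) with \(G(t) = \frac{t^2}{2}\log|t| - \frac{3}{4}t^2\), which follows from integrating the kernel twice. The functions `antideriv2`, `log_entry` and `eval_block` are hypothetical and do not reproduce the TCoeffFn interface; they only show the column-wise indexing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Second antiderivative of log|t|, continued by 0 at t = 0.
double antideriv2(double t)
{
    if (t == 0.0) return 0.0;
    return 0.5 * t * t * std::log(std::abs(t)) - 0.75 * t * t;
}

// Galerkin entry a_ij for piecewise constant functions with step width h.
double log_entry(std::size_t i, std::size_t j, double h)
{
    const double d = (static_cast<double>(i) - static_cast<double>(j)) * h;
    return antideriv2(d + h) - 2.0 * antideriv2(d) + antideriv2(d - h);
}

// Fill a dense block in column-major order: entry (r, c) of the block
// is written to matrix[c * nrows + r].
void eval_block(const std::vector<std::size_t>& rowidxs,
                const std::vector<std::size_t>& colidxs,
                double h, double* matrix)
{
    assert(matrix != nullptr);
    const std::size_t nrows = rowidxs.size();
    for (std::size_t c = 0; c < colidxs.size(); ++c)
        for (std::size_t r = 0; r < nrows; ++r)
            matrix[c * nrows + r] = log_entry(rowidxs[r], colidxs[c], h);
}
```

A simple consistency check is that all entries of the full matrix sum to \(\int_0^1 \int_0^1 \log|x-y| \, dy \, dx = -\frac{3}{2}\), independent of \(n\).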
TCoeffFn also provides an evaluation function for standard index sets, e.g., TIndexSet. To avoid compiler warnings about "hidden functions", this version is brought into local scope by a using declaration.
By default, the format of the final matrix defined by the coefficients will be unsymmetric, i.e. both the lower and the upper half of the matrix will be built. Please note that this does not affect the actual algebraic matrix, which may still be symmetric or Hermitian. Only the storage and subsequent algorithms are affected in terms of computational costs.
To change the matrix format, the function matrix_format has to be overridden. In this case, as the matrix is symmetric, this looks like
Finally, TLogCoeffFn has to signal the value type, i.e. HLIB::real or HLIB::complex valued, to the matrix construction object with the method is_complex.
It should be noted that for a complex coefficient function, the evaluation method is not called eval but ceval.
Having constructed the H-matrix, the corresponding equation system shall be solved. In most cases, a standard iterative solver will not be sufficient, i.e. either the convergence rate is too low or there is no convergence at all. Therefore, a (good) preconditioner is needed, with the inverse of \(A\) being the best possible. Since iterative schemes only need matrix-vector multiplication, the corresponding functionality for the inverse is sufficient. This is provided by an (H-) LU factorisation of \(A\) or, for the symmetric case, an LDL factorisation.
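That only operator applications are needed can be seen in the simplest preconditioned scheme, the Richardson iteration \(x \leftarrow x + B(b - Ax)\) with \(B \approx A^{-1}\). The sketch below is generic C++ and not the solver used by HLIBpro; `Vector`, `Operator`, `norm2` and `richardson` are hypothetical names, and the operators are passed as plain function objects.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vector   = std::vector<double>;
using Operator = std::function<Vector(const Vector&)>;

// Euclidean norm of a vector.
double norm2(const Vector& v)
{
    double s = 0.0;
    for (double vi : v) s += vi * vi;
    return std::sqrt(s);
}

// Preconditioned Richardson iteration x <- x + B (b - A x), where B
// approximates the inverse of A. Only applications of A and B to
// vectors are required. Returns the number of steps performed.
std::size_t richardson(const Operator& A, const Operator& B, const Vector& b,
                       Vector& x, double tol, std::size_t max_steps)
{
    assert(x.size() == b.size());
    const double bnorm = norm2(b);
    for (std::size_t step = 0; step < max_steps; ++step) {
        Vector r = A(x);
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i];  // residual
        if (norm2(r) <= tol * bnorm) return step;
        const Vector c = B(r);                                          // correction
        for (std::size_t i = 0; i < x.size(); ++i) x[i] += c[i];
    }
    return max_steps;
}
```

The better \(B\) approximates \(A^{-1}\), the fewer steps are needed; with the exact inverse a single correction step solves the system.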
For this, either the factorisation classes may be used directly or, instead, the function factorise_inv may be employed, which uses an LU factorisation for unsymmetric matrices and an LDL factorisation otherwise. The return value is an object providing the functionality of a linear operator, e.g. application to vectors. In this case, it corresponds to an operator for the evaluation of the inverse.
The copy of A is necessary, since the matrix is modified during the factorisation. The accuracy is chosen to be the same as during matrix construction.
Still missing is the right hand side of the equation system. In this example, it is chosen such that \(\mathbf{u} \equiv \mathbf{1}\):
with rhs defined by
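A minimal sketch of how such a right hand side entry can be computed in closed form is given below. It assumes \(\mathbf{u} \equiv \mathbf{1}\), so that \(f_i = \int_{i/n}^{(i+1)/n} \int_0^1 \log|x-y| \, dy \, dx\), and evaluates the integral with the second antiderivative \(G(t) = \frac{t^2}{2}\log|t| - \frac{3}{4}t^2\) of the kernel. The names `antideriv2` and `rhs_entry` are hypothetical and need not match the function used in the example code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Second antiderivative of log|t|, continued by 0 at t = 0.
double antideriv2(double t)
{
    if (t == 0.0) return 0.0;
    return 0.5 * t * t * std::log(std::abs(t)) - 0.75 * t * t;
}

// f_i for u == 1: integral of log|x-y| over [a,b] x [0,1]
// with a = i/n and b = (i+1)/n.
double rhs_entry(std::size_t i, std::size_t n)
{
    assert(i < n);
    const double a = static_cast<double>(i) / static_cast<double>(n);
    const double b = static_cast<double>(i + 1) / static_cast<double>(n);
    return antideriv2(b) - antideriv2(a) - antideriv2(b - 1.0) + antideriv2(a - 1.0);
}
```

Since \(\mathbf{u} \equiv \mathbf{1}\) corresponds to all coefficients being one, \(f_i\) equals the sum of the \(i\)-th row of \(A\), and all \(f_i\) together sum to \(-\frac{3}{2}\).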
The RHS was built using the original ordering of the unknowns. To be used with H-arithmetic, it has to be reordered according to the internal numbering of the H-matrix as defined by the clustering process. The object for the cluster tree stores both kinds of permutations, i.e., from external to internal (H) numbering and from internal to external numbering. To reorder the RHS, the external to internal (e2i) permutation is used:
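What such a reordering does to the vector entries can be sketched with plain index arrays. The convention assumed here is that `e2i[e]` holds the internal position of the external index `e`; the helpers `permute` and `invert` are hypothetical and do not use the permutation objects of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Apply a permutation: the entry at position k moves to position perm[k].
// With perm = e2i this maps external to internal ordering.
std::vector<double> permute(const std::vector<double>& x,
                            const std::vector<std::size_t>& perm)
{
    assert(x.size() == perm.size());
    std::vector<double> y(x.size());
    for (std::size_t k = 0; k < x.size(); ++k)
        y[perm[k]] = x[k];
    return y;
}

// Inverse permutation, e.g. i2e from e2i: inv[perm[k]] = k.
std::vector<std::size_t> invert(const std::vector<std::size_t>& perm)
{
    std::vector<std::size_t> inv(perm.size());
    for (std::size_t k = 0; k < perm.size(); ++k)
        inv[perm[k]] = k;
    return inv;
}
```

Applying `permute` with `e2i` and then with `invert(e2i)` restores the original vector.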
For the iterative solver, TAutoSolver is used, which automatically chooses a suitable solver for the given combination of matrix and preconditioner. Furthermore, statistics about the solution process, e.g. convergence rate and number of steps, are stored in an object of type TSolverInfo.
To check the accuracy of the computed solution, we compare it with the known exact solution. However, the exact solution uses external ordering, while the computed solution is still based on the internal ordering of the H-matrix. To compare both, the ordering has to be equal:
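The comparison itself can be sketched as a relative error in the Euclidean norm, evaluated entry by entry through the internal to external mapping. The convention that `i2e[k]` is the external index stored at internal position `k`, as well as the name `relative_error`, are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative error ||x - x_exact||_2 / ||x_exact||_2, where x is given in
// internal ordering and x_exact in external ordering.
double relative_error(const std::vector<double>& x_int,
                      const std::vector<std::size_t>& i2e,
                      const std::vector<double>& exact_ext)
{
    assert(x_int.size() == i2e.size());
    assert(x_int.size() == exact_ext.size());
    double diff2 = 0.0;
    double norm2 = 0.0;
    for (std::size_t k = 0; k < x_int.size(); ++k) {
        const double d = x_int[k] - exact_ext[i2e[k]];  // same unknown in both orderings
        diff2 += d * d;
    }
    for (double v : exact_ext) norm2 += v * v;
    return std::sqrt(diff2 / norm2);
}
```

For the exact solution \(\mathbf{u} \equiv \mathbf{1}\) all entries coincide, so the ordering would not change the result here; for a general solution the mapping is essential.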
The standard finalisation and catch block finish the example: