News
- added missing hpro_matrix_conjugate/transpose in C bindings
- simplified eigen functions in BLAS to prevent instantiation problems
- re-added aca_dense_fallback parameter
- fixed bug in Cholesky factorization
- clustering based on space filling curves (using CGAL library) in TSFCCTBuilder
- lowrank truncation support based on Frobenius norm
- support for adaptive quadrature order in BEM kernels
- added restriction to real/imaginary part of matrices (restrict_re/im)
- added make_symmetric/hermitian to enforce symmetry status for given matrices
- using generic value types for all major types (matrices, vectors and linear operators)
- officially renamed namespace and header files to Hpro and hpro (old names still valid for compatibility)
- C bindings split into separate functions per value type; also with hpro prefix now (old function names and functionality are still supported)
- support for mixed precision computations for matrix vector multiplication and in solver classes
- added apply_add with BLAS::Matrix as argument to TLinearOperator classes
- added absolute_prec to define TTruncAcc
- added support for NEON instruction set (Apple M1)
- added support for Mongoose graph partitioning library (TMongooseAlgPartStrat)
- enhanced TAlgAdmCond to define maximal number of allowed connecting edges
- fixes:
- issue in BLAS::qrp for matrices with nrows < ncols
- coefficient tests in TPermCoeffFn (was missing before)
- issue in TMatrixProduct with single factor
- issue in TMaternCovCoeffFn with different row/column coordinates
- replaced HLIB::complex by std::complex
- removed old DAG interface
- modified pivot strategy of standard ACA (better, more robust convergence)
- using geqr2 instead of geqrf for QR factorization (slightly faster)
- fixes:
- replaced deprecated features of TBB
- TGeomGroupCTBuilder: fixed handling of offsets
- missing instantiation of BLAS::random for BLAS::Vector added
- wrong solving flags in TLDUInvMatrix
- inefficient dependency handling in DAG construction for TLR/Tile-H
- memory leak in recursive DAG construction fixed
- fixed weak admissibility (was actually standard admissibility)
- fixed MBLR clustering (ordering only in one dimension led to extremely rectangular clusters)
- fixed bug in lowrank approximation (wrong type conversion)
- fixed non-SIMD implementation of TExpBF (still had additional factor)
- fixed TDenseCoeffFn in case given dense matrix had non-zero row/column offsets
- replaced tbb::mutex and tbb::atomic by their std versions, since both are marked obsolete in recent TBB versions
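For user code that relied on the same TBB types, the equivalent standard library pattern might look like the following sketch. It is a generic illustration and not HLIBpro code: TResult and sum_parallel are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// hypothetical result type for this example (not part of HLIBpro)
struct TResult
{
    long  total;   // sum accumulated under the mutex
    int   count;   // number of atomic increments
};

// sum the integers 0 .. nthreads*chunk-1 using one thread per chunk
TResult sum_parallel ( int  nthreads, int  chunk )
{
    assert( nthreads > 0 && chunk > 0 );

    std::atomic< int >  counter{ 0 };   // takes the role of tbb::atomic< int >
    std::mutex          mtx;            // takes the role of tbb::mutex
    long                total = 0;

    auto  worker = [&] ( int  first, int  last )
    {
        long  local = 0;

        for ( int  i = first; i < last; ++i )
        {
            local += i;
            ++counter;                  // atomic increment, no lock needed
        }

        // std::lock_guard takes the role of tbb::mutex::scoped_lock
        std::lock_guard< std::mutex >  lock( mtx );

        total += local;
    };

    std::vector< std::thread >  threads;

    for ( int  t = 0; t < nthreads; ++t )
        threads.emplace_back( worker, t * chunk, ( t + 1 ) * chunk );

    for ( auto &  th : threads )
        th.join();

    return TResult{ total, counter.load() };
}
```

Calling sum_parallel( 4, 100 ) adds the integers 0 to 399 with four threads; the mutex protects only the short final update, while the counter is incremented without any lock.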
- improved new DAG generation system (better speed and parallel scalability) and made it the default system (old version still available with CFG::dag_version=1)
- moved non-generic include files into hpro sub-directory for better separation from other libraries
- added various approximation routines for sums of matrices, operators, e.g., SVD, pairwise SVD, Rand-SVD, Rand-LR and ACA for operator sums
- expanded lazy accumulator arithmetic to move all updates to leaves only (evaluating all updates simultaneously; see CFG::Arith::lazy_eval and CFG::Arith::sum_approx)
- added TZeroMatBuilder and build_zero_mat to construct empty matrix for given block cluster tree, e.g., as pre-initialized result of other H-matrix operation
- parallelization of various routines during clustering, e.g., sorting, etc. (may result in slightly different clustering with different number of CPU cores)
- bug fixes:
- permutation of dense matrix in TMatBuilder removed (inconsistent behaviour compared to the case where an H-matrix is built)
- fixed update of aux. data in H-matrix in copy_nearfield
- fixed reading of old HLIB files (wrong processor sets)
- fixed return value of Mem::usage
- fixed various issues when using single precision
- fixed various compiler issues with MS Visual C++
- added build method for coefficient functions to return dense matrix for given index set
- C bindings:
- added hlib_matrix_to_dense/rank
- added hlib_matrix_approx_rank to compute low-rank approximation of given matrix with different methods
- fixed issue in TBSPNDCTBuilder when no interface is present
- fixed issue in HLIB::Mem::usage
- fixed issue with HDF5 library but removed support from binary distribution due to linking problems with newer version of libHDF5
- added missing functions to sequential NET interface
- added functions to directly set LR matrices in TRkMatrix
- using internal grid generation also in example code (laplace/helmholtz)
- additional spherical grid (different start grid for “inbetween” steps)
- improved coordinate visualization in PostScript format (better minimal distance estimate)
- new DAG generation based on recursive algorithms with automatic deduction of dependencies between nodes (default: previous DAG; see CFG::dag_version)
- new coefficient function for Matern kernel (TMaternCovCoeffFn) and exponential bilinear form (TExpBF)
- functions for computing low-rank approximations for sum of matrices directly (using pair-wise SVD approx_sum_svd, randomized SVD approx_sum_randsvd or randomized low-rank approximation approx_sum_randlr)
- ACA:
- modified stop criterion for ACA (user controllable maximal rank CFG::Build::aca_max_ratio)
- added dense fallback for ACA if not converging, computing only those coefficients not yet computed
- added MBLR cluster tree construction (TMBLRCTBuilder)
- modified handling of matrix coefficient functions, especially TPermCoeffFn
- (limited) grid generation and refinement in BEM library
- using libmvec for sin, cos and exp if available (glibc v2.22 and up) with significant speedups in complex valued computations
- new academic license without any user/host or date limitation
- Accumulator based H-arithmetic reducing number of truncations with support for lazy and eager evaluation
- added randomized SVD and implemented dense approximation and lowrank truncation for all types (SVD, RRQR and Rand-SVD)
- also added lowrank approximation algorithms for RRQR and Rand-SVD for H-construction (TRRQRLRApx, TRandSVDLRApx)
- support for special flat H-hierarchy with optimised arithmetic functions, e.g., in-place inversion.
- support for block refinement during matrix construction, e.g., if admissibility gives false positives
- added infinity matrix norm (TInfinityNorm and norm_inf())
- implemented TOffDiagAdmCond with all off-diagonal blocks being admissible
- massive code restructuring and cleanup
- initial support for HDF5 matrix IO (dense and lowrank)
- support for VSX instruction set (POWER CPUs)
- special handling for all BLAS functions in case of parallel Intel MKL
- added parameter configuration with config files
- added some functions to simplify solver stopping criterion
- changed behaviour/incompatibilities with previous versions:
- removed all permutation handling from THMatrix and TNearfieldMulVec
- no recompression in ACA/HCA (now only in matrix builders)
- ptrcast() now consistent with cptrcast(), i.e., no * needed
- some parameter reorganization
- previous TMatrix::copy_struct is renamed to TMatrix::copy_struct_from (TMatrix::copy_struct will now return a matrix copy without data)
- added operator for matrix sum (TMatrixSum) in addition to matrix products
- missing bindings in C interface (matrix product/sum, apply, apply_add)
) - fixed issue in clustering with predefined partition with many groups
- support for more matrix formats and more robust IO (if files do not follow standard) for Harwell-Boeing/Matrix-Market format
- re-enabled parallel block cluster tree construction on shared memory
- bug fixes:
- in solve_diag_left_block
- matrix solves in TLLInvMatrix
- implemented rank revealing QR based low-rank truncation
- Solvers:
- added CGS and TFQMR solvers
- added support for matrix solves in linear iteration (also ℋ-matrices!)
- optional computation of exact residual during iteration
- simplified handling of stop criterion parameters
- using status field in TSolverInfo instead of exception if solver fails (e.g., breakdown)
- some code restructuring
- support for block-wise Jacobi and Gauss-Seidel operators
- support for AVX512 instruction set
- many new user controllable parameters
- misc.:
- handling of diagonal in factorisation (inverse or normal) now a runtime option (default: inverse)
- added optional distance for TWeakAlgAdmCond to support distances other than one
- support for fixed rank 0
- correct progress bar support for WAZ factorisation and inversion
- some reorganization of source/header files
- C bindings:
- added hlib_admcond_geom_hilo for THiLoFreqGeomAdmCond
- additional parameter for blockdiag functions (blocksize)
- fixed serious issue with Intel TBB and with current Intel TBB-based Intel MKL
- various bug fixes
- Added factorization of inverse matrix WAZ = I, enabling vector solves using matrix vector mult. instead of forward/backward solves with much better parallel speedup.
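Why such a factorisation turns a solve into two matrix-vector products can be seen from the identity itself: if W·A·Z = I, then A⁻¹ = Z·W, so x = Z·(W·b) solves A·x = b. The sketch below checks this on a dense 2×2 example. It assumes W and Z are simply the inverses of the LU factors of A, and all names (vec2, mat2, mul_vec, solve_waz, example_solve) are hypothetical and unrelated to the HLIBpro interface.

```cpp
#include <array>
#include <cmath>

using vec2 = std::array< double, 2 >;
using mat2 = std::array< std::array< double, 2 >, 2 >;

// dense 2x2 matrix-vector product y = M x
vec2 mul_vec ( const mat2 &  M, const vec2 &  v )
{
    vec2  y = { M[0][0] * v[0] + M[0][1] * v[1],
                M[1][0] * v[0] + M[1][1] * v[1] };

    return y;
}

// with W A Z = I we have A^{-1} = Z W, hence x = Z ( W b ):
// two matrix-vector products, no forward/backward substitution
vec2 solve_waz ( const mat2 &  W, const mat2 &  Z, const vec2 &  b )
{
    return mul_vec( Z, mul_vec( W, b ) );
}

// example: A = [ 4 2 ; 2 3 ] = L U with L = [ 1 0 ; 0.5 1 ], U = [ 4 2 ; 0 2 ]
// (assumption for this sketch: W = L^{-1} and Z = U^{-1})
vec2 example_solve ()
{
    const mat2  W = {{ {  1.0,  0.0  }, { -0.5, 1.0 } }};   // L^{-1}
    const mat2  Z = {{ {  0.25, -0.25 }, {  0.0, 0.5 } }};  // U^{-1}
    const vec2  b = { 10.0, 8.0 };

    return solve_waz( W, Z, b );
}
```

Both products are independent row-wise operations, which is what makes them easier to parallelise than the sequential dependency chain of triangular solves.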
- Significant improvements in parallel performance of matrix inversion.
- Improved performance of LU, matrix-vector mult., forward/backward solves.
- Added function nearfield_sparse to extract H-matrix nearfield as sparse matrix.
- Switched to adaptive_split_axis as default for clustering.
- Minor Changes:
- Additional options for matrix visualization (colormap, etc.)
- Basic VTK output of block clusters.
- fixed various bugs and race conditions
- extended ctors of various matrix classes to accept optional value type field
- added example on how to assemble block matrices
- Fixed two bugs in point-wise LU.
- Solver changes:
- Refactored solver classes (no interface changes); added TRichardson to replace TSolver in the future.
- Fixed inconsistent computation of residual norm in solvers. Now Richardson, CG and BiCG will compute standard residual norm, while MINRES and GMRES compute preconditioned residual norm.
- Made initialisation of start vector in solver classes optional (function initialise_start_value)
- Added function diagonal to extract diagonal of a matrix.
- Added example spectrum to compute spectrum of graph Laplacian (see also documentation).
- Fixed issues when solving dense matrices (used in new example for many RHSs).
- Modified THiLoFreqGeomAdmCond: now maximal number of wavelengths per cluster is tested.
- Refactored geometrical clustering classes and partitioning strategies, thereby fixing several issues.
- C++11 changes:
- most object creating functions now return std::unique_ptr,
- replaced typedef by using,
- added iterators for TIndexSet, TNodeSet, TGraph, TProcSet (for range based for).
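The three C++11 idioms named above (factory functions returning std::unique_ptr, using aliases, and iterators for range based for) can be sketched on a small stand-in class. IndexRange, make_range and sum_indices are hypothetical names for this illustration only; they do not reproduce the interface of TIndexSet or any other HLIBpro class.

```cpp
#include <cstddef>
#include <memory>

// hypothetical stand-in for an index set { first, ..., last }
class IndexRange
{
public:
    using value_t = std::size_t;        // alias declaration instead of typedef

    // minimal forward iterator, sufficient for range based for
    class iterator
    {
    public:
        explicit iterator ( value_t  pos ) : _pos( pos ) {}

        value_t     operator *  () const { return _pos; }
        iterator &  operator ++ ()       { ++_pos; return *this; }
        bool        operator != ( const iterator &  other ) const
        {
            return _pos != other._pos;
        }

    private:
        value_t  _pos;
    };

    IndexRange ( value_t  first, value_t  last )
            : _first( first ), _last( last )
    {}

    iterator  begin () const { return iterator( _first ); }
    iterator  end   () const { return iterator( _last + 1 ); }  // one past last

private:
    value_t  _first, _last;
};

// factory handing ownership to the caller
// (plain new, because std::make_unique is only available from C++14 on)
std::unique_ptr< IndexRange >
make_range ( IndexRange::value_t  first, IndexRange::value_t  last )
{
    return std::unique_ptr< IndexRange >( new IndexRange( first, last ) );
}

// range based for over all indices in the set
IndexRange::value_t sum_indices ( const IndexRange &  range )
{
    IndexRange::value_t  sum = 0;

    for ( auto  i : range )
        sum += i;

    return sum;
}
```

A caller would write auto r = make_range( 3, 6 ); and pass *r to sum_indices; the object is released automatically when r goes out of scope, so no explicit delete is needed.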
- Added parameter to algebraic clustering in C bindings to define partitioning algorithm (BFS, multi level, METIS or Scotch).
- Fixed issues with progress bar during factorisation (wrong block count).
- Removed BSP style communication functions (MPI only now).
- Finished conversion to new packed_t SIMD type. Using SSE3 instead of SSE2.
- Added lock to TScotchAlgPartStrat because Scotch is not multi thread safe.
- Removing implicit reordering of unknowns during matrix-vector multiplication to fix inconsistent behaviour. Please use permutations from cluster trees or ℋ-matrices to reorder vectors or TPermMatrix to represent permuted matrices instead.
- Speedup improvements for matrix inversion. Triangular inversion and matrix multiplication available in standard user interface.
- Import/export from/to CCS/CRS matrices simplified.
- Simplified (and faster) mutex wrapper.
- Several C++11 changes.
- Removing reference counters in BLAS interface due to major performance issue on multi-core (-socket) systems. See documentation on how to use the modified interface (and avoid errors).
- New, scalable matrix-vector multiplication implemented.
- Using generic datatype for SIMD instructions, thereby enabling generic SIMD algorithms, e.g. for BEM kernels, and fast adoption of new SIMD instructions, e.g. AVX2.
- Removed TVirtualVector (replaced by TScalarVector).
- and, as usual: several bugs fixed
- fixed race condition in C bindings
- fixed issue with initialisation of static variables
- fixed some bugs
- Major Changes
- Switched from OpenMP to Threading Building Blocks as interface to shared memory parallelism, thereby also changing most algorithms to task-based parallelism.
- Reducing dependency on external libraries by using C++11 features. Also replacing some classes by default C++ versions (finally removing old code).
- Alternative, non-recursive, level-wise ℋ-LU factorisation based on explicit block dependencies, which provides far better speedup on many-core systems, e.g. Intel MIC architecture.
- New H-LU factorisation algorithm also applicable in distributed environments, yielding better load-balancing (albeit with limited speedup).
- Added support for multiple CPUs to many algorithms, e.g. in clustering, norm computations, matrix-vector multiplication and solves, H²-conversion.
- Minor Changes
- Optimised BEM kernels for Intel MIC architecture.
- Introduced TLinearOperator for operators not supporting TMatrix functionality, e.g. factorised matrices.
- HLIBpro file format changed due to internal changes and due to some bugs in the format. However, backward read compatibility for most files written with earlier versions is kept.
- Added Support for Cairo library, thereby providing PDF output.
- And of course: many smaller feature upgrades and bug fixes.
- Matrix Construction:
- Switched to template based coefficient functions (TCoeffFn and derived) and all dependent classes, e.g. TDenseMBuilder, SVD and ACA low rank approximation.
- Rewrote HCA:
- Simpler interface containing all necessary functionality in single class.
- Using template for value type.
- Added base classes for permuted indices and for BEM applications using quadrature.
- Added implementation for Laplace and Helmholtz also for linear ansatz spaces and with support for SSE2 and AVX.
- Cleaned up ACA implementation.
- Changed handling of recompression: should now be handled by default for low rank approximation algorithm and not by matrix construction class (to avoid recompression of optimal results).
- Clustering Changes:
- Added TNDBSPPartStrat to be used in connection with nested dissection (trying various clusterings and choosing best for ND).
- Modified TNDBSPCTBuilder to more resemble algebraic version, e.g. average depth for interface clusters instead of maximal.
- Fixed bug in PCA based clustering and added version for cardinality based clustering.
- Added various flags to modify clustering, e.g. synchronisation of interface depth, enforcing block clusters with same depth of corresponding clusters, using symmetrised weights in algebraic clustering.
- Input/Output and visualisation:
- Fixed bug in reading dense matrices.
- Changed order of dimension for coordinate IO using Matlab format: now ncoord × dimension (e.g. as also used by Sparse Matrix Collection).
- Added VTK visualisation for coordinates (with various options, e.g. marking clusters or index connectivity) and BEM grids.
- Added Output of Grids in HLIB format.
- Added coordinate IO in MatrixMarket format.
- Changes in LAPACK wrapper:
- added LAPACK workspace queries for optimal workspace size instead of using predefined block size
- using xGESDD for large matrices
- various bug fixes.
- Deactivated default coarsening during matrix construction.
- Added special H² matrix builder with predefined cluster bases.
- changes in BEM code:
- Added support for AVX.
- Performance speedups in SSE2 implementation of Helmholtz and Maxwell kernels.
- Runtime detection of SSE2/AVX availability and automatic choice of optimal kernel.
- Added matrix_format function to matrix coefficient functions to define whether unsymmetric, symmetric or hermitian (default: unsymmetric). Default build function in matrix builders now without format argument.
- Added support for ILP64 BLAS/LAPACK implementations (64bit integers).
- Added support for AMD-LibM (integrated in binary Linux distributions).
- Added vector IO in MatrixMarket format.
- Cleaned up C++ examples (thereby also removing Boost link dependency).
- several bug fixes
- OpenMP exception handling changed: now all threads will stop as soon as possible in case of an error
- fixed several, previously undetected, non-critical compiler warnings (MS Visual C++)
- bug fixes
HLIBpro v1.0 is a major rewrite/reorganisation of many of the H-matrix algorithms. The following list of changes covers only the main topics and is far from complete.
- added distributed computing via MPI for matrix construction and factorisation
- added H²-matrices
- added internal multi-level graph partitioning for blackbox clustering
- added support for piecewise linear basis functions and Maxwell EFIE/MFIE
- rewrote interface to BLAS/LAPACK
- rewrote C interface with better mapping of internal C++ and C types
- increased robustness of matrix factorisation in case of ill-conditioned matrices
- increased speedup of matrix factorisation in multi-threaded computations
- many performance improvements and bug fixes
- added optional diagonal scaling of H-matrices during LU factorisation
- added blockwise accuracy, e.g. accuracy depending on current matrix block
- rewrote accuracy handling in C bindings
- simplified BSP partitioning methods and added regular cardinality based and principal component based clustering
- added optional balancing of tree depth in cluster tree construction with predefined partitioning
- implemented optional double precision computation of matrix inversion and low-rank truncation in single precision mode
- fixed bug in calling single precision norm functions of LAPACK
- fixed bug in PostScript output and modified H-matrix output in PostScript format
- added support for Jacobi based SVD (sgejsv and dgejsv) in LAPACK v3.2
- removed ID based cluster tree computations in matrices
- always computing SCC in algebraic clustering, also in nested dissection clustering
- reordering clusters depending on size ratio (large first)
- fixed bug with filenames without directories
- fixed non-exception safe OpenMP usage
- added matrix reduction to nearfield part
- added dense low-rank multiplication if result is large dense matrix
- fixed solve functions in TLU, TLDL (checking for NULL blocks)
- fixed OpenMP call with zero threads in TLU, TLDL and TMatrixInv
- fixed operator = in autoptr (wrong const)
- removed unnecessary checks in TArray::copy
- fixed recursive call in restrict_blockdiag
- replaced fixed constants by type dependent constants in lapack.cc
- fixed TMatrixInv::multiply_diag when only D is dense
- fixed several warnings from Visual C++ and Intel C++ compilers
- moved all global variables and functions into HLIB namespace (except xerbla override)
- enabled user defined prefix for functions and types in C interface and added override for namespace name
- reactivated cardinality check when using HLIB_BSP_AUTO
- replaced threads and mutexes by OpenMP (thread start only, no scheduling)
- included log file support in addition to stdout
- added parallel LDLT factorisation (DD and blockdiag only)
- added parallel blockdiag LU factorisation
- added zero approximation during matrix construction (for nearfield only)
- fixed bug in algebraic nested dissection clustering (wrong path length in interface)
- reduced memory consumption/fragmentation in ACA generated matrices with large rank
- added Fiduccia/Mattheyses bisection optimisation for BFS clustering
- added FFT for vectors by implementing support for FFTW3 (optional)
- fixed bug in TBSPPartCTBuilder when using more than two partitions
- fixed potential issues in sorting algorithms
- fixed type issues with *_bytesize functions in C interface
- fixed bug in PostScript visualisation of matrices if matrix norm is zero
- fixed issues with GCC-4.3
- fixed bug in command line parsing of configuration system
- minor modifications to SCons system to increase user-friendliness
- general Algorithmic Changes
- support for single precision arithmetic; has to be decided before compiling HLIBpro
- made complete C++ functions and classes visible from outside instead of just C interface functions
- rewrote complex arithmetic to distinguish between symmetric and hermitian matrices; added LDLH and LLH factorisations
- inversion now based on LU, thereby reducing memory consumption (roughly halved)
- added computation of the diagonal of the inverse without computing the inverse
- added evaluation of LU, LDLT factorisations (instead of just solving)
- removed point-wise LU and LDLT factorisation (only blocked) to improve robustness with zeroes on diagonal
- added (optional) check and fix for singular sub matrices during inversion and factorisation
- added complex valued HCA
- new version of ACA+
- multiplication C = ADB with diagonal D implemented
- implemented bilinear forms for Helmholtz single and double layer potential
- implemented bilinear form for acoustic scattering
- rewrote algebraic clustering for sparse matrices; added support for Scotch and CHACO
- added support for periodic coordinates in clustering
- added clustering with user defined index partition on first level in cluster tree
- added standard admissibility for algebraic clustering
- added maximal level in clustering to prevent infinite recursion
- modified solvers to handle complex valued data
- added permutation of dense matrices without temporary storage (needed in IO)
- parallel Arithmetic
- added thread parallel algorithms for matrix construction, matrix multiplication, inversion and LU factorisation
- redesigned thread pool, thereby fixing race conditions
- added support for Windows threads
- fixed several issues with thread safety
- Input and Output
- added general I/O functions with autodetection of file format
- added output of matrices in Harwell/Boeing format
- added MatrixMarket format
- added support for Ply and surface mesh format (NetGen) for Grid I/O
- fixed format errors in SAMG output
- conversion of arbitrary matrices to sparse format when writing in SAMG or Harwell/Boeing format
- fixed support for symmetric matrices in Harwell/Boeing format
- C interface
- prefixed all functions, types and constants with hlib_ (or HLIB_) to prevent collisions with other definitions (OS or libraries)
- added support for C99 complex types (if available)
- added hlib_set_coarsening to activate/deactivate coarsening during matrix construction (default: on) and matrix arithmetic (default: off)
- added hlib_matrix_inv_diag to return diagonal of inverse
- added hlib_matrix_is_complex to test for real or complex valued matrices
- added hlib_set_nthreads to set number of threads
- added hlib_coord_t as special type for coordinates
- separated stop criterion and solver in solver interface
- Miscellaneous
- updated CPUflags and Rmalloc
- fixed optimisation issues (leading to infinite loops) in enclosed CLAPACK
- Algorithmic Changes
- added (blocked) LDLT factorisation (now default for symmetric matrices)
- no longer need extra matrix in matrix inversion
- using ACAFull in HCA (instead of SVD)
- adaptively choosing quadrature and interpolation order in ACA and HCA
- rewrote matrix addition to support general cases, e.g. low-rank to blocked
- rewrote low-rank truncation handling
- support for METIS in algebraic clustering routines
- added basic support for “dense” sparse matrices, e.g. with highly coupled indices
- added SSE2 based HCA algorithm
- added infinity norm for vectors
- using norm of preconditioned residual for all solvers if preconditioner is present
- added MINRES iteration
- using ADM_AUTO as default admissibility
- finally removed all asserts and replaced by internal error checking
- Input and Output
- VRML97 support
- added Matlab compression (Matlab v7) and structs support
- support for Harwell-Boeing matrix format (read-only)
- modified PostScript output of block-wise SVD; now scaled w.r.t. 2-norm of matrix
- OS and Library support
- MS Windows support
- shared libraries for Linux and Windows
- changed configure system to better handle MS Windows environment
- added internal xerbla to handle LAPACK errors directly
- C interface
- automatic choice of matrix building in hlib_matrix_build_bem_grid
- introduced vector_t as type to vectors (no more C arrays)
- added Gauss and Sauter triangle quadrature rules
- added functions to access matrix and vector entries
- added copyto and copyto_eps functions
- added hlib_matrix_build_dense to build H-matrix from dense matrix
- changed solver management
- Miscellaneous
- several improvements and bug fixes
- cleaned up error codes
- updated CPUflags and Rmalloc
- Arithmetic
- added ACA-Full
- added HCA (hybrid cross approximation)
- complex valued ACA and SVD
- added copy with coarsening for H-matrices
- added computation of spectral norm for the inverse of a matrix
- support for permutations in matrix-vector multiplication of sparse matrices
- added support for Laplace SLP/DLP and 3D triangle surface grids
- fixed issues with degenerated bounding boxes in geometrical clustering
- Input/Output
- support for PLTMG matrix format
- Miscellaneous
- replaced error handling with exceptions
- added modified CLAPACK as default implementation of LAPACK to HLIBpro
- integrated CPUFlags into configure system
- added function for fast reciprocal square root
- Arithmetic
- initial support for complex arithmetic
- support for symmetric matrices in arithmetic
- implemented block LU factorisation
- implemented LDLT factorisation
- added Frobenius norm for sparse matrices
- support for CRS format in sparse matrices
- added Jacobi and SOR matrix types (for matrix-vector multiplication)
- implemented hierarchical domain decomposition with parallel arithmetics
- Parallel Algorithms
- thread-parallel Cholesky factorisation
- thread-parallel coarsening of H-matrices
- fixed thread-parallel LU and inversion
- fixed dead-locks in thread-pool
- added direct communication in BSP mode
- parallel addition of matrices and vectors via streams
- Input/Output
- support for Matlab and SAMG format
- Miscellaneous
- introduced C interface functions and types
- added configure system for Makefiles
- added progress meter support for arithmetic
- added internal RTTI system
- support for memory consumption query on HP-UX
- rewrote error handling
- first public version as PHI (Parallel H-matrix Implementation)
- merged BSP-parallel and thread-parallel versions of H-matrix library