This program was contributed by Timo Heister. Special thanks to Sander Rhebergen for the inspiration to finally write this tutorial.
This material is based upon work partially supported by National Science Foundation grant DMS1522191 and the Computational Infrastructure in Geodynamics initiative (CIG), through the National Science Foundation under Award No. EAR-0949446 and The University of California-Davis.
The authors would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Melt in the Mantle where work on this tutorial was undertaken. This work was supported by EPSRC grant no EP/K032208/1.
Note
As a prerequisite of this program, you need to have PETSc or Trilinos and the p4est library installed. The installation of deal.II together with these additional libraries is described in the README file.
Introduction
Building on step-40, this tutorial shows how to solve linear PDEs with several components in parallel using MPI with PETSc or Trilinos for the linear algebra. For this, we return to the Stokes equations as discussed in step-22. The motivation for writing this tutorial is to provide an intermediate step (pun intended) between step-40 (parallel Laplace) and step-32 (parallel coupled Stokes with Boussinesq for a time dependent problem).
The learning outcomes for this tutorial are:
You are able to solve PDEs with several variables in parallel and can apply this to different problems.
You understand the concept of optimal preconditioners and are able to check this for a particular problem.
You are able to construct manufactured solutions using the free computer algebra system SymPy (https://sympy.org).
You can implement various other tasks for parallel programs: error computation, writing graphical output, etc.
You can visualize vector fields, stream lines, and contours of vector quantities.
We are solving for a velocity $\textbf{u}$ and pressure $p$ that satisfy the Stokes equations, which read
\begin{align*}
  -\triangle \textbf{u} + \nabla p &= \textbf{f}, \\
  -\textrm{div}\; \textbf{u} &= 0.
\end{align*}
Optimal preconditioners
Make sure that you read (even better: try) what is described in "Block Schur complement preconditioner" in the "Possible Extensions" section in step-22. As described there, we are going to solve the block system using a Krylov method and a block preconditioner.
Our goal here is to construct a very simple (maybe the simplest?) optimal preconditioner for the linear system. A preconditioner is called "optimal" or "of optimal complexity" if the number of iterations of the preconditioned system is independent of the mesh size $h$. You can extend that definition to also require independence of the number of processors used (we will discuss that in the results section), the computational domain and the mesh quality, the test case itself, the polynomial degree of the finite element space, and more.
Why is a constant number of iterations considered to be "optimal"? Assume the discretized PDE gives a linear system with $N$ unknowns. Because the matrix coming from the FEM discretization is sparse, a matrix-vector product can be done in $\mathcal{O}(N)$ time. A preconditioner application can also only be $\mathcal{O}(N)$ at best (for example doable with multigrid methods). If the number of iterations required to solve the linear system is independent of $h$ (and therefore of $N$), the total cost of solving the system will be $\mathcal{O}(N)$. It is not possible to beat this complexity, because even looking at all the entries of the right-hand side already takes $\mathcal{O}(N)$ time. For more information see [elman2005], Chapter 2.5 (Multigrid).
The preconditioner described here is even simpler than the one described in step-22 and will typically require more iterations and consequently more time to solve. When considering preconditioners, optimality is not the only important metric. But an optimal and expensive preconditioner is typically more desirable than a cheaper, non-optimal one, because, as the mesh size becomes smaller and smaller and linear problems become bigger and bigger, the former will eventually beat the latter.
The solver and preconditioner
We precondition the linear system
\begin{align*}
  \left(\begin{array}{cc}
    A & B^T \\
    B & 0
  \end{array}\right)
  \left(\begin{array}{c}
    U \\ P
  \end{array}\right)
  =
  \left(\begin{array}{c}
    F \\ 0
  \end{array}\right)
\end{align*}
with the block diagonal preconditioner
\begin{align*}
  P^{-1}
  =
  \left(\begin{array}{cc}
    \widetilde{A} & 0 \\
    0 & \widetilde{S}
  \end{array}\right)^{-1},
  \qquad
  \widetilde{A}^{-1} \approx A^{-1},
  \quad
  \widetilde{S}^{-1} \approx S^{-1},
\end{align*}
where $S = -B A^{-1} B^T$ is the Schur complement.
With this choice of $P$, assuming that we handle $A^{-1}$ and $S^{-1}$ exactly (which is an "idealized" situation), the preconditioned linear system has three distinct eigenvalues independent of $h$ and is therefore "optimal". See section 6.2.1 (especially p. 292) in [elman2005]. For comparison, using the ideal version of the upper block-triangular preconditioner in step-22 (also used in step-56) would make all eigenvalues equal to one.
We will use approximations of the inverse operations in $P^{-1}$ that are (nearly) independent of $h$. In this situation, one can again show that the eigenvalues are independent of $h$. For the Krylov method we choose MINRES, which is attractive for the analysis (the iteration count is proven to be independent of $h$, see the remainder of chapter 6.2.1 in [elman2005]), great from a computational standpoint (simpler and cheaper than GMRES, for example), and applicable (matrix and preconditioner are symmetric).
For the approximations we will use a CG solve with the mass matrix in the pressure space to approximate the action of $S^{-1}$. Note that the mass matrix is spectrally equivalent to $S$. We can expect the number of CG iterations to be independent of $h$, even with a simple preconditioner like ILU.
For the approximation of the velocity block $A$ we will perform a single AMG V-cycle. In practice this choice is not exactly independent of $h$, which can explain the slight increase in iteration numbers. A possible explanation is that the coarsest level is solved exactly and the number of levels and the size of the coarsest matrix are not predictable.
The testcase
We will construct a manufactured solution based on the classical Kovasznay problem, see [kovasznay1948laminar]. Here is an image of the solution colored by the x velocity including streamlines of the velocity:
We have to cheat here, though, because we are not solving the non-linear Navier-Stokes equations, but the linear Stokes system without the convective term. Therefore, to recreate the exact same solution, we use the method of manufactured solutions with the solution of the Kovasznay problem. This will effectively move the convective term into the right-hand side $\textbf{f}$.
The right-hand side is computed using the script "reference.py" and we use the exact solution for boundary conditions and error computation.
The commented program
#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/function.h>
#include <deal.II/base/timer.h>
The following chunk of code is identical to step-40 and allows switching between PETSc and Trilinos:
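In essence, it selects one of the two linear-algebra wrapper namespaces. A sketch along the lines of step-40 follows (the exact macros may differ between deal.II versions; FORCE_USE_OF_TRILINOS is a user-defined opt-in switch):

// From <deal.II/lac/generic_linear_algebra.h>: pick the PETSc or Trilinos wrappers.
namespace LA
{
#if defined(DEAL_II_WITH_PETSC) && !defined(DEAL_II_PETSC_WITH_COMPLEX) && \
  !(defined(DEAL_II_WITH_TRILINOS) && defined(FORCE_USE_OF_TRILINOS))
  using namespace dealii::LinearAlgebraPETSc;
#  define USE_PETSC_LA
#elif defined(DEAL_II_WITH_TRILINOS)
  using namespace dealii::LinearAlgebraTrilinos;
#else
#  error DEAL_II_WITH_PETSC or DEAL_II_WITH_TRILINOS required
#endif
} // namespace LA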
We need a few helper classes to represent our solver strategy described in the introduction.
namespace LinearSolvers
{
This class exposes the action of applying the inverse of a given matrix via the function InverseMatrix::vmult(). Internally, the inverse is not formed explicitly; instead, a linear solve with CG is performed. This class extends the InverseMatrix class in step-22 with an option to specify a preconditioner, and to allow for different vector types in the vmult function.
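A minimal sketch of such a class could look like this (the member names, the use of SolverCG, and the tolerance are assumptions consistent with the description above; it needs <deal.II/lac/solver_cg.h> and <deal.II/base/smartpointer.h>):

  template <class Matrix, class Preconditioner>
  class InverseMatrix : public Subscriptor
  {
  public:
    InverseMatrix(const Matrix &m, const Preconditioner &preconditioner)
      : matrix(&m)
      , preconditioner(&preconditioner)
    {}

    // Apply the action of the inverse: solve matrix * dst = src with CG.
    template <typename VectorType>
    void vmult(VectorType &dst, const VectorType &src) const
    {
      SolverControl        solver_control(src.size(), 1e-8 * src.l2_norm());
      SolverCG<VectorType> cg(solver_control);
      dst = 0.;
      cg.solve(*matrix, dst, src, *preconditioner);
    }

  private:
    const SmartPointer<const Matrix>         matrix;
    const SmartPointer<const Preconditioner> preconditioner;
  };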
The main class is very similar to step-40, except that matrices and vectors are now block versions, and we store a std::vector<IndexSet> for owned and relevant DoFs instead of a single IndexSet. We have exactly two IndexSets, one for all velocity unknowns and one for all pressure unknowns.
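For illustration, the relevant member variables could look like this (a sketch; the exact names are assumptions based on this description):

std::vector<IndexSet> owned_partitioning;    // locally owned DoFs, one IndexSet per block
std::vector<IndexSet> relevant_partitioning; // locally relevant DoFs, one IndexSet per block

LA::MPI::BlockSparseMatrix system_matrix;
LA::MPI::BlockSparseMatrix preconditioner_matrix;
LA::MPI::BlockVector       locally_relevant_solution;
LA::MPI::BlockVector       system_rhs;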
The construction of the block matrices and vectors is new compared to step-40 and is different compared to serial codes like step-22, because we need to supply the set of rows that belong to our processor.
Put all dim velocities into block 0 and the pressure into block 1, then reorder the unknowns by block. Finally count how many unknowns we have per block.
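In code, this could look roughly as follows (a sketch; exact function signatures vary slightly between deal.II versions, and the member names follow the conventions assumed above):

std::vector<unsigned int> stokes_sub_blocks(dim + 1, 0);
stokes_sub_blocks[dim] = 1; // velocities -> block 0, pressure -> block 1
DoFRenumbering::component_wise(dof_handler, stokes_sub_blocks);

const std::vector<types::global_dof_index> dofs_per_block =
  DoFTools::count_dofs_per_fe_block(dof_handler, stokes_sub_blocks);
const types::global_dof_index n_u = dofs_per_block[0];
const types::global_dof_index n_p = dofs_per_block[1];

// Split the owned and relevant index sets into the two blocks.
owned_partitioning.resize(2);
owned_partitioning[0] = dof_handler.locally_owned_dofs().get_view(0, n_u);
owned_partitioning[1] = dof_handler.locally_owned_dofs().get_view(n_u, n_u + n_p);

const IndexSet locally_relevant_dofs =
  DoFTools::extract_locally_relevant_dofs(dof_handler);
relevant_partitioning.resize(2);
relevant_partitioning[0] = locally_relevant_dofs.get_view(0, n_u);
relevant_partitioning[1] = locally_relevant_dofs.get_view(n_u, n_u + n_p);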
Setting up the constraints for boundary conditions and hanging nodes is identical to step-40. Even though we don't have any hanging nodes because we only perform global refinement, it is still a good idea to put this function call in, in case adaptive refinement gets introduced later.
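A sketch of what the constraint setup could look like (ExactSolution is the manufactured-solution class assumed for the boundary values; the boundary id 0 and the member names are assumptions):

constraints.reinit(locally_relevant_dofs);
DoFTools::make_hanging_node_constraints(dof_handler, constraints);

const FEValuesExtractors::Vector velocities(0);
VectorTools::interpolate_boundary_values(dof_handler,
                                         0,                    // boundary id
                                         ExactSolution<dim>(), // exact solution as BC
                                         constraints,
                                         fe.component_mask(velocities));
constraints.close();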
Now we create the system matrix based on a BlockDynamicSparsityPattern. We know that we won't have coupling between different velocity components (because we use the Laplace operator and not the deformation tensor) and no coupling between the pressure and its test functions, so we use a Table to communicate this coupling information to DoFTools::make_sparsity_pattern.
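A sketch of how the coupling Table and the distributed sparsity pattern could be set up (member names as above; the exact calls may differ slightly between deal.II versions):

system_matrix.clear();

Table<2, DoFTools::Coupling> coupling(dim + 1, dim + 1);
for (unsigned int c = 0; c < dim + 1; ++c)
  for (unsigned int d = 0; d < dim + 1; ++d)
    if (c == dim && d == dim)
      coupling[c][d] = DoFTools::none;   // no pressure-pressure coupling
    else if (c == dim || d == dim || c == d)
      coupling[c][d] = DoFTools::always; // velocity-pressure and diagonal velocity blocks
    else
      coupling[c][d] = DoFTools::none;   // no coupling between different velocity components

BlockDynamicSparsityPattern dsp(relevant_partitioning);
DoFTools::make_sparsity_pattern(dof_handler, coupling, dsp, constraints, false);
SparsityTools::distribute_sparsity_pattern(dsp,
                                           dof_handler.locally_owned_dofs(),
                                           mpi_communicator,
                                           locally_relevant_dofs);
system_matrix.reinit(owned_partitioning, dsp, mpi_communicator);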
The preconditioner matrix has a different coupling (we only fill in the (1,1) block with the mass matrix); otherwise this code is identical to the construction of the system_matrix above.
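A corresponding sketch for the preconditioner matrix, coupling only the pressure with itself:

Table<2, DoFTools::Coupling> preconditioner_coupling(dim + 1, dim + 1);
for (unsigned int c = 0; c < dim + 1; ++c)
  for (unsigned int d = 0; d < dim + 1; ++d)
    preconditioner_coupling[c][d] =
      (c == dim && d == dim) ? DoFTools::always : DoFTools::none;

BlockDynamicSparsityPattern preconditioner_dsp(relevant_partitioning);
DoFTools::make_sparsity_pattern(
  dof_handler, preconditioner_coupling, preconditioner_dsp, constraints, false);
SparsityTools::distribute_sparsity_pattern(preconditioner_dsp,
                                           dof_handler.locally_owned_dofs(),
                                           mpi_communicator,
                                           locally_relevant_dofs);
preconditioner_matrix.reinit(owned_partitioning, preconditioner_dsp, mpi_communicator);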
This function solves the linear system with MINRES with a block diagonal preconditioner and AMG for the two diagonal blocks as described in the introduction. The preconditioner applies a V-cycle to the (0,0) block and a CG solve with the mass matrix for the (1,1) block (the Schur complement).
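A sketch of what this could look like, using the InverseMatrix class from above. The BlockDiagonalPreconditioner helper is part of the sketch, not an existing deal.II class (it simply applies one preconditioner per block), and the solver tolerance is an arbitrary choice:

// A small block-diagonal preconditioner: apply one preconditioner per block.
template <class PreconditionerA, class PreconditionerS>
class BlockDiagonalPreconditioner : public Subscriptor
{
public:
  BlockDiagonalPreconditioner(const PreconditionerA &prec_A,
                              const PreconditionerS &prec_S)
    : prec_A(prec_A)
    , prec_S(prec_S)
  {}

  void vmult(LA::MPI::BlockVector &dst, const LA::MPI::BlockVector &src) const
  {
    prec_A.vmult(dst.block(0), src.block(0));
    prec_S.vmult(dst.block(1), src.block(1));
  }

private:
  const PreconditionerA &prec_A;
  const PreconditionerS &prec_S;
};

// Inside solve():
LA::MPI::PreconditionAMG prec_A;
prec_A.initialize(system_matrix.block(0, 0)); // AMG V-cycle for the velocity block

LA::MPI::PreconditionAMG prec_S;
prec_S.initialize(preconditioner_matrix.block(1, 1));

// CG with the pressure mass matrix approximates the action of S^{-1}.
using mp_inverse_t = LinearSolvers::InverseMatrix<LA::MPI::SparseMatrix,
                                                  LA::MPI::PreconditionAMG>;
const mp_inverse_t mp_inverse(preconditioner_matrix.block(1, 1), prec_S);

const BlockDiagonalPreconditioner<LA::MPI::PreconditionAMG, mp_inverse_t>
  preconditioner(prec_A, mp_inverse);

SolverControl solver_control(system_matrix.m(), 1e-10 * system_rhs.l2_norm());
SolverMinRes<LA::MPI::BlockVector> solver(solver_control);

LA::MPI::BlockVector distributed_solution(owned_partitioning, mpi_communicator);
constraints.set_zero(distributed_solution);
solver.solve(system_matrix, distributed_solution, system_rhs, preconditioner);
constraints.distribute(distributed_solution);
locally_relevant_solution = distributed_solution;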
As expected from the discussion above, the number of iterations is independent of the number of processors and only very slightly dependent on $h$:
PETSc

                     number of processors
cycle     dofs      1    2    4    8   16   32   64  128
  0          659   49   49   49   51   51   51   49   49
  1         2467   52   52   52   52   52   54   54   53
  2         9539   56   56   56   54   56   56   54   56
  3        37507   57   57   57   57   57   56   57   56
  4       148739   58   59   57   59   57   57   57   57
  5       592387   60   60   59   59   59   59   59   59
  6      2364419   62   62   61   61   61   61   61   61
Trilinos

                     number of processors
cycle     dofs      1    2    4    8   16   32   64  128
  0          659   37   37   37   37   37   37   37   37
  1         2467   92   89   89   82   86   81   78   78
  2         9539  102   99   96   95   95   88   83   95
  3        37507  107  105  104   99  100   96   96   90
  4       148739  112  112  111  111  127  126  115  117
  5       592387  116  115  114  112  118  120  131  130
  6      2364419  130  126  120  120  121  122  121  123
While the PETSc results show a constant number of iterations, the iterations increase when using Trilinos. This is likely because of the different settings used for the AMG preconditioner. For performance reasons we do not allow coarsening below a couple thousand unknowns. As the coarse solver is an exact solve (we are using LU by default), a change in the number of levels will influence the quality of a V-cycle. Therefore, a V-cycle is closer to an exact solver for smaller problem sizes.
Possibilities for extensions
Investigate Trilinos iterations
Play with the smoothers, smoothing steps, and other properties for the Trilinos AMG to achieve an optimal preconditioner.
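As a starting point, such settings could be passed through TrilinosWrappers::PreconditionAMG::AdditionalData roughly like this (the values shown are examples for experimentation, not tuned recommendations):

#ifndef USE_PETSC_LA
TrilinosWrappers::PreconditionAMG::AdditionalData data;
data.elliptic              = true;        // the velocity block is elliptic
data.smoother_sweeps       = 2;           // smoother applications per level
data.smoother_type         = "Chebyshev"; // try other ML smoother types as well
data.aggregation_threshold = 0.02;        // controls how aggressively ML coarsens

TrilinosWrappers::PreconditionAMG prec_A;
prec_A.initialize(system_matrix.block(0, 0), data);
#endif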
Solve the Oseen problem instead of the Stokes system
This change requires changing the outer solver to GMRES or BiCGStab, because the system is no longer symmetric.
You can prescribe the exact flow solution as $b$ in the convective term $(b \cdot \nabla) u$. This should give the same solution as the original problem if you set the right-hand side to zero.
Adaptive refinement
So far, this tutorial program refines the mesh globally in each step. Replacing the code in StokesProblem::refine_grid() by something like
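the following (a sketch assuming the member names used above; it estimates the error from the velocity only, and the refinement fractions are arbitrary choices; it needs <deal.II/numerics/error_estimator.h> and <deal.II/distributed/grid_refinement.h>):

Vector<float> estimated_error_per_cell(triangulation.n_active_cells());

const FEValuesExtractors::Vector velocities(0);
KellyErrorEstimator<dim>::estimate(dof_handler,
                                   QGauss<dim - 1>(fe.degree + 1),
                                   std::map<types::boundary_id, const Function<dim> *>(),
                                   locally_relevant_solution,
                                   estimated_error_per_cell,
                                   fe.component_mask(velocities));
parallel::distributed::GridRefinement::refine_and_coarsen_fixed_number(
  triangulation, estimated_error_per_cell, 0.3, 0.0);
triangulation.execute_coarsening_and_refinement();

switches the program from global to adaptive mesh refinement.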