This program was contributed by Ryan Grove and Timo Heister.
This material is based upon work partially supported by National Science Foundation grant DMS1522191 and the Computational Infrastructure in Geodynamics initiative (CIG), through the National Science Foundation under Award No. EAR-0949446 and The University of California-Davis.
The authors would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Melt in the Mantle where work on this tutorial was undertaken. This work was supported by EPSRC grant no EP/K032208/1.
Note
If you use this program as a basis for your own work, please consider citing it in your list of references. The initial version of this work was contributed to the deal.II project by the authors listed in the following citation:
Introduction
Stokes Problem
The purpose of this tutorial is to create an efficient linear solver for the Stokes equation and compare it to alternative approaches. Here, we will use FGMRES with geometric multigrid as a preconditioner for the velocity block, and we will show in the results section that this is a fundamentally better approach than the linear solvers used in step-22 (including the scheme described in "Possible Extensions"). Fundamentally, this is because only with multigrid it is possible to get $O(n)$ solve time, where $n$ is the number of unknowns of the linear system. Using the Timer class, we collect some statistics to compare setup times, solve times, and number of iterations. We also compute errors to make sure that what we have implemented is correct.
Let $u \in H_0^1 = \{ u \in H^1(\Omega), u|_{\partial \Omega} = 0 \}$ and $p \in L_*^2 = \{ p \in L^2(\Omega), \int_\Omega p = 0 \}$. The Stokes equations read as follows in non-dimensionalized form:

\begin{eqnarray*}
 - 2 \, \textrm{div} \, \frac{1}{2} \left[ (\nabla \textbf{u}) + (\nabla \textbf{u})^T \right] + \nabla p & = & f \\
 - \nabla \cdot \textbf{u} & = & 0
\end{eqnarray*}
Note that we are using the deformation tensor instead of $\Delta u$ (a detailed description of the difference between the two can be found in step-22, but in summary, the deformation tensor is more physical as well as more expensive).
Linear Solver and Preconditioning Issues
The weak form of the discrete equations naturally leads to the following linear system for the nodal values of the velocity and pressure fields:

\begin{eqnarray*}
\left(\begin{array}{cc} A & B^T \\ B & 0 \end{array}\right)
\left(\begin{array}{c} U \\ P \end{array}\right)
=
\left(\begin{array}{c} F \\ 0 \end{array}\right)
\end{eqnarray*}
Our goal is to compare several solution approaches. While step-22 solves the linear system using a "Schur complement approach" in two separate steps, we instead attack the block system at once using FGMRES with an efficient preconditioner, in the spirit of the approach outlined in the "Results" section of step-22. The idea is as follows: if we find a block preconditioner $P$ such that the matrix

\begin{eqnarray*}
\left(\begin{array}{cc} A & B^T \\ B & 0 \end{array}\right) P^{-1}
\end{eqnarray*}

is simple, then an iterative solver with that preconditioner will converge in a few iterations. Notice that we are doing right preconditioning here. Using the Schur complement $S=BA^{-1}B^T$, we find that

\begin{eqnarray*}
P^{-1} = \left(\begin{array}{cc} A & B^T \\ 0 & S \end{array}\right)^{-1}
\end{eqnarray*}

is a good choice. Let $\widetilde{A^{-1}}$ be an approximation of $A^{-1}$ and $\widetilde{S^{-1}}$ of $S^{-1}$; we see

\begin{eqnarray*}
P^{-1} =
\left(\begin{array}{cc} A^{-1} & 0 \\ 0 & I \end{array}\right)
\left(\begin{array}{cc} I & B^T \\ 0 & -I \end{array}\right)
\left(\begin{array}{cc} I & 0 \\ 0 & S^{-1} \end{array}\right)
\approx
\left(\begin{array}{cc} \widetilde{A^{-1}} & 0 \\ 0 & I \end{array}\right)
\left(\begin{array}{cc} I & B^T \\ 0 & -I \end{array}\right)
\left(\begin{array}{cc} I & 0 \\ 0 & \widetilde{S^{-1}} \end{array}\right).
\end{eqnarray*}

Since $P$ is aimed to be a preconditioner only, we shall use the approximations on the right in the equation above.
As discussed in step-22, $-M_p^{-1}=\widetilde{S^{-1}} \approx S^{-1}$, where $M_p$ is the pressure mass matrix and is solved approximately by using CG with ILU as a preconditioner, and $\widetilde{A^{-1}}$ is obtained by one of multiple methods: solving a linear system with CG and ILU as preconditioner, just using one application of an ILU, solving a linear system with CG and GMG (geometric multigrid as described in step-16) as a preconditioner, or just performing a single V-cycle of GMG.
For comparison, instead of FGMRES we also apply the direct solver UMFPACK to the whole system. If you want to use a direct solver (like UMFPACK), the system needs to be invertible. To avoid the one-dimensional null space given by the constant pressures, we fix the first pressure unknown to zero. This is not necessary for the iterative solvers.
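As a sketch, the direct solver path can be as short as the following, using deal.II's SparseDirectUMFPACK wrapper (the variable names are assumptions for illustration):

SparseDirectUMFPACK A_direct;
A_direct.initialize(system_matrix);   // compute the factorization
A_direct.vmult(solution, system_rhs); // apply the inverse to the right-hand side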
Reference Solution
The test problem is a "Manufactured Solution" (see step-7 for details), and we choose $u=(u_1,u_2,u_3)=(2\sin (\pi x), -\pi y \cos (\pi x), -\pi z \cos (\pi x))$ and $p = \sin (\pi x)\cos (\pi y)\sin (\pi z)$. We apply Dirichlet boundary conditions for the velocity on the whole boundary of the domain $\Omega=[0,1]\times[0,1]\times[0,1]$. To enforce the boundary conditions we can just use our reference solution.
If you look up in the deal.II manual what is needed to create a class derived from Function<dim>, you will find that this class has numerous virtual functions, including Function::value(), Function::vector_value(), Function::value_list(), etc., all of which can be overloaded. Different parts of deal.II will require different ones of these particular functions. This can be confusing at first, but luckily the only thing you actually have to implement is value(). The other virtual functions in the Function class have default implementations that will call your implementation of value().
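For example, a minimal sketch of the declaration of such a class for our dim+1 component (velocity plus pressure) problem could look as follows; we also override gradient() because we want H1 errors:

template <int dim>
class Solution : public Function<dim>
{
public:
  Solution()
    : Function<dim>(dim + 1) // dim velocity components + 1 pressure
  {}

  // The only function we must implement ourselves:
  virtual double value(const Point<dim>  &p,
                       const unsigned int component = 0) const override;

  // Also implemented here, since we want to compute H1 errors:
  virtual Tensor<1, dim>
  gradient(const Point<dim>  &p,
           const unsigned int component = 0) const override;
};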
Notice that our reference solution fulfills $\nabla \cdot u = 0$. In addition, the pressure is chosen to have a mean value of zero. For the "Method of Manufactured Solutions" of step-7, we need to find $\bf f$ such that:

\begin{eqnarray*}
{\bf f} = - 2 \, \textrm{div} \, \frac{1}{2} \left[ (\nabla \textbf{u}) + (\nabla \textbf{u})^T \right] + \nabla p.
\end{eqnarray*}
Using the reference solution above, we obtain:

\begin{eqnarray*}
{\bf f} &=& (2 \pi^2 \sin (\pi x), -\pi^3 y \cos (\pi x), -\pi^3 z \cos (\pi x)) \\
&& + (\pi \cos (\pi x) \cos (\pi y) \sin (\pi z), -\pi \sin (\pi y) \sin (\pi x) \sin (\pi z), \pi \cos (\pi z) \sin (\pi x) \cos (\pi y))
\end{eqnarray*}
Computing Errors
Because we do not enforce the mean pressure to be zero for our numerical solution in the linear system, we need to post process the solution after solving. To do this we use the VectorTools::compute_mean_value() function to compute the mean value of the pressure and subtract it from the pressure.
DoF Handlers
The geometric multigrid algorithm as we implement it here only acts on the velocity variables (i.e., the $A$ matrix described above) but not the pressure. One could implement this in different ways, including one in which one considers all coarse grid operations as acting on block systems where we only consider the top left block. Alternatively, we can implement things by really only considering a linear system on the velocity part of the overall finite element discretization. The latter is the approach we want to use here.
To implement this, one would need to be able to ask questions such as "May I have just part of a DoFHandler?". This was not possible at the time this program was written, so to satisfy this need we simply create a separate, second DoFHandler for just the velocities. We then build linear systems for the multigrid preconditioner based only on this second DoFHandler, and simply transfer the first block of (overall) vectors into corresponding vectors for the entire second DoFHandler. To make this work, we have to ensure that the order in which the (velocity) degrees of freedom are numbered in the two DoFHandler objects is the same. This is in fact the case: we first distribute degrees of freedom on both, and then apply the same sequence of DoFRenumbering operations to both.
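In outline, this amounts to the following sketch; the exact renumbering operations are a placeholder, the essential point being that both handlers are treated identically:

// One DoFHandler for the full Stokes element, a second one for the
// velocity-only element used by the multigrid preconditioner:
dof_handler.distribute_dofs(fe);
velocity_dof_handler.distribute_dofs(velocity_fe);
velocity_dof_handler.distribute_mg_dofs(); // level DoFs needed for GMG

// Transferring the velocity part of a solution vector to the second
// handler is then a plain copy, because the velocity DoFs come first
// and are ordered identically in both handlers:
Vector<double> velocity_solution = solution.block(0);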
Differences from the Step 22 tutorial
The main difference between step-56 and step-22 is that we use block solvers instead of the Schur complement approach used in step-22. Details of this approach can be found under the "Block Schur complement preconditioner" subsection of the "Possible Extensions" section of step-22. For the preconditioner of the velocity block, we borrow a class from ASPECT called BlockSchurPreconditioner that has the option to solve for the inverse of $A$ or just apply one preconditioner sweep for it instead, which provides us with an expensive and a cheap approach, respectively.
In order to make it easy to switch between the different solvers that are being used, we declare an enum that can be passed as an argument to the constructor of the main class.
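A sketch of what this enum can look like; the enumerator names mirror the three solver variants compared in this program:

// The three solver variants compared in this program:
namespace SolverType
{
  enum type
  {
    FGMRES_ILU, // FGMRES with ILU-preconditioned inner solves
    FGMRES_GMG, // FGMRES with geometric multigrid for the A block
    UMFPACK     // direct solver applied to the whole system
  };
}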
The class Solution is used to define the boundary conditions and to compute errors of the numerical solution. Note that we need to define the values and gradients in order to compute L2 and H1 errors. Here we decided to separate the implementations for 2d and 3d using template specialization.
Note that the first dim components are the velocity components and the last is the pressure.
In the following, we will implement a preconditioner that expands on the ideas discussed in the Results section of step-22. Specifically, we
1. use an upper block-triangular preconditioner because we want to use right preconditioning.
2. optionally allow using an inner solver for the velocity block instead of a single preconditioner application.
3. do not use InverseMatrix but explicitly call SolverCG. This approach is also used in the ASPECT code (see https://aspect.geodynamics.org) that solves the Stokes equations in the context of simulating convection in the Earth's mantle, and which has been used to solve problems on many thousands of processors.
The bool flag do_solve_A in the constructor allows us to either apply the preconditioner for the velocity block once or use an inner iterative solver for a more accurate approximation instead.
Notice how we keep track of the sum of the inner iterations (preconditioner applications).
template <class PreconditionerAType, class PreconditionerSType>
class BlockSchurPreconditioner : public Subscriptor
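The heart of this class is its vmult() function, which applies the block-triangular preconditioner derived in the introduction. The following is a condensed sketch of what it does; the member names (system_matrix, schur_complement_matrix, preconditioner_A, preconditioner_S, n_iterations_A, n_iterations_S) and the inner tolerances are assumptions for illustration:

template <class PreconditionerAType, class PreconditionerSType>
void BlockSchurPreconditioner<PreconditionerAType, PreconditionerSType>::
  vmult(BlockVector<double> &dst, const BlockVector<double> &src) const
{
  Vector<double> utmp(src.block(0));

  // First solve with the approximation for S (the pressure mass matrix):
  {
    SolverControl            solver_control(1000, 1e-6 * src.block(1).l2_norm());
    SolverCG<Vector<double>> cg(solver_control);

    dst.block(1) = 0.0;
    cg.solve(schur_complement_matrix, dst.block(1), src.block(1),
             preconditioner_S);
    n_iterations_S += solver_control.last_step();

    // Account for the minus sign in the block factorization:
    dst.block(1) *= -1.0;
  }

  // Then apply the top right block (B^T):
  {
    system_matrix.block(0, 1).vmult(utmp, dst.block(1));
    utmp *= -1.0;
    utmp += src.block(0);
  }

  // Finally, either solve with the top left block A, or apply its
  // preconditioner just once, depending on do_solve_A:
  if (do_solve_A == true)
    {
      SolverControl            solver_control(10000, 1e-4 * utmp.l2_norm());
      SolverCG<Vector<double>> cg(solver_control);

      dst.block(0) = 0.0;
      cg.solve(system_matrix.block(0, 0), dst.block(0), utmp,
               preconditioner_A);
      n_iterations_A += solver_control.last_step();
    }
  else
    {
      preconditioner_A.vmult(dst.block(0), utmp);
      n_iterations_A += 1;
    }
}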
The main DoFHandler only needs active DoFs, so we are not calling distribute_mg_dofs() here.
dof_handler.distribute_dofs(fe);
This block structure separates the dim velocity components from the pressure component (used for reordering). Note that we have 2 instead of dim+1 blocks like in step-22, because our FESystem is nested and the dim velocity components appear as one block.
This ensures that all velocity DoFs are enumerated before the pressure unknowns. This allows us to use blocks for vectors and matrices and allows us to get the same DoF numbering for dof_handler and velocity_dof_handler.
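A sketch of the corresponding renumbering and block-size bookkeeping; count_dofs_per_fe_block() is the name in recent deal.II versions, and n_u, n_p are the names used below:

// Renumber so the two blocks (velocities, pressure) are contiguous,
// velocities first:
DoFRenumbering::block_wise(dof_handler);

// Count DoFs per block; with the nested FESystem there are exactly two:
const std::vector<types::global_dof_index> dofs_per_block =
  DoFTools::count_dofs_per_fe_block(dof_handler);
const types::global_dof_index n_u = dofs_per_block[0]; // all velocity DoFs
const types::global_dof_index n_p = dofs_per_block[1]; // all pressure DoFs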
The following block of code initializes the MGConstrainedDofs (using the boundary conditions for the velocity), and the sparsity patterns and matrices for each level. The resize() function of MGLevelObject<T> will destroy all existing contained objects.
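In outline, this can look like the following sketch (that boundary id 0 covers the whole boundary is an assumption):

// Multigrid constraints: zero boundary values for the velocity on all levels:
mg_constrained_dofs.clear();
mg_constrained_dofs.initialize(velocity_dof_handler);
mg_constrained_dofs.make_zero_boundary_constraints(velocity_dof_handler, {0});

// One sparsity pattern and matrix per level; resize() destroys all
// existing contained objects:
const unsigned int n_levels = triangulation.n_levels();
mg_sparsity_patterns.resize(0, n_levels - 1);
mg_matrices.resize(0, n_levels - 1);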
The following makes use of a component mask for interpolation of the boundary values for the velocity only, which is further explained in the vector-valued deal.II step-20 tutorial.
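A sketch of that call (boundary id 0 is an assumption):

const FEValuesExtractors::Vector velocities(0);
VectorTools::interpolate_boundary_values(dof_handler,
                                         0, // boundary id
                                         Solution<dim>(),
                                         constraints,
                                         fe.component_mask(velocities));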
As discussed in the introduction, we need to fix one degree of freedom of the pressure variable to ensure solvability of the problem. We do this here by marking the first pressure DoF, which has index n_u, as a constrained DoF.
if (solver_type == SolverType::UMFPACK)
  constraints.add_line(n_u);

constraints.close();
}
std::cout << "\tNumber of active cells: " << triangulation.n_active_cells()
<< std::endl
<< "\tNumber of degrees of freedom: " << dof_handler.n_dofs()
In this function, the system matrix is assembled. We assemble the pressure mass matrix in the (1,1) block (if needed) and move it out of this location at the end of this function.
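The "move it out" step at the end can be sketched as follows (member names are assumptions):

// Keep a separate copy of the pressure mass matrix, then zero out the
// (1,1) block of the system matrix so the Stokes system is unmodified:
pressure_mass_matrix.reinit(sparsity_pattern.block(1, 1));
pressure_mass_matrix.copy_from(system_matrix.block(1, 1));
system_matrix.block(1, 1) = 0;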
This function sets up things differently depending on whether you want to use ILU or GMG as a preconditioner. Both methods share the same solver (FGMRES) but require a different preconditioner to be initialized. Here we time not only the entire solve function, but we separately time the setup of the preconditioner as well as the solve itself.
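Both branches end in the same outer solver call. A minimal sketch, assuming a relative tolerance of 1e-10 and a preconditioner object set up by either branch:

// Outer FGMRES solve, identical for the ILU and GMG variants; only the
// preconditioner object passed in differs:
SolverControl solver_control(system_matrix.m(),
                             1e-10 * system_rhs.l2_norm());
SolverFGMRES<BlockVector<double>> solver(solver_control);
solver.solve(system_matrix, solution, system_rhs, preconditioner);
constraints.distribute(solution);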
This is used to pass whether or not we want to solve for A inside the preconditioner. One could change this to false to see if there is still convergence and, if so, whether the program then runs faster or slower.
This function computes the L2 and H1 errors of the solution. For this, we need to make sure the pressure has mean zero.
template <int dim>
void StokesProblem<dim>::compute_errors()
{
Compute the mean pressure and then subtract it from each pressure coefficient. This will result in a pressure with mean value zero. Here we make use of the fact that the pressure is component dim and that the finite element space is nodal.
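A sketch of that post-processing step; the quadrature degree is an assumption, and the pressure lives in block 1 of the solution vector:

const double mean_pressure =
  VectorTools::compute_mean_value(dof_handler,
                                  QGauss<dim>(pressure_degree + 2),
                                  solution,
                                  dim);
solution.block(1).add(-mean_pressure);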
We first run the code and confirm that the finite element solution converges with the correct rates as predicted by the error analysis of mixed finite element problems. Given sufficiently smooth exact solutions $u$ and $p$, the errors of the Taylor-Hood element $Q_k \times Q_{k-1}$ should be

\begin{eqnarray*}
\| u - u_h \|_0 + h ( \| u - u_h \|_1 + \| p - p_h \|_0) \leq C h^{k+1} ( \| u \|_{k+1} + \| p \|_k )
\end{eqnarray*}
see for example Ern/Guermond "Theory and Practice of Finite Elements", Section 4.2.5 p195. This is indeed what we observe, using the $Q_2 \times Q_1$ element as an example (this is what is done in the code, but is easily changed in main()):
| | L2 Velocity | Reduction | L2 Pressure | Reduction | H1 Velocity | Reduction |
|---|---|---|---|---|---|---|
| 3D, 3 global refinements | 0.000670888 | - | 0.0036533 | - | 0.0414704 | - |
| 3D, 4 global refinements | 8.38E-005 | 8.0 | 0.00088494 | 4.1 | 0.0103781 | 4.0 |
| 3D, 5 global refinements | 1.05E-005 | 8.0 | 0.000220253 | 4.0 | 0.00259519 | 4.0 |
Timing Results
Let us compare the direct solver approach using UMFPACK to the two methods in which we choose $\widetilde{A^{-1}}=A^{-1}$ and $\widetilde{S^{-1}}=S^{-1}$ by solving linear systems with $A$, $S$ using CG. The preconditioner for CG is then either ILU or GMG. The following tables summarize solver iterations, timings, and virtual memory (VM) peak usage:
General:

| Cycle | DoFs | Setup | Assembly |
|---|---|---|---|
| 0 | 15468 | 0.1s | 0.3s |
| 1 | 112724 | 1.0s | 2.4s |
| 2 | 859812 | 9.0s | 20s |

GMG:

| Cycle | Setup | Solve | Outer iterations | Inner (A) iterations | Inner (S) iterations | VM Peak |
|---|---|---|---|---|---|---|
| 0 | 0.3s | 1.3s | 21 | 67 | 22 | 4805 |
| 1 | 2.6s | 14s | 21 | 67 | 22 | 5441 |
| 2 | 20s | 101s | 20 | 65 | 21 | 10641 |

ILU:

| Cycle | Setup | Solve | Outer iterations | Inner (A) iterations | Inner (S) iterations | VM Peak |
|---|---|---|---|---|---|---|
| 0 | 0.3s | 0.6s | 21 | 180 | 22 | 4783 |
| 1 | 2.8s | 15.8s | 21 | 320 | 22 | 5125 |
| 2 | 27s | 268s | 21 | 592 | 22 | 8307 |

UMFPACK:

| Cycle | Setup | Solve | VM Peak |
|---|---|---|---|
| 0 | 2.65s | 2.8s | 5054 |
| 1 | 236s | 237s | 11288 |
| 2 | - | - | - |
As can be seen from the table:
- UMFPACK uses large amounts of memory, especially in 3d. Also, UMFPACK timings do not scale favorably with problem size.
- Because we are using inner solvers for $A$ and $S$, ILU and GMG require the same number of outer iterations.
- The number of (inner) iterations for $A$ increases for ILU with refinement, leading to worse than linear scaling in solve time. In contrast, the number of inner iterations for $A$ stays constant with GMG, leading to nearly perfect scaling in solve time.
- GMG needs slightly more memory than ILU to store the level and interface matrices.
Possibilities for extensions
Check higher order discretizations
Experiment with higher order stable FE pairs and check that you observe the correct convergence rates.
Compare with cheap preconditioner
The introduction also outlined another option to precondition the overall system, namely one in which we do not choose $\widetilde{A^{-1}} \approx A^{-1}$ as in the table above, but in which $\widetilde{A^{-1}}$ is only a single preconditioner application with GMG or ILU, respectively.
This is in fact implemented in the code: Currently, the boolean use_expensive in solve() is set to true. The option mentioned above is obtained by setting it to false.
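In other words, to test the cheap variant it suffices to flip one flag in solve() (a sketch):

// Apply a single preconditioner sweep for A instead of an inner CG solve:
const bool use_expensive = false; // default in the program: true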
What you will find is that the number of FGMRES iterations stays constant under refinement if you use GMG this way. This means that the multigrid is optimal and independent of $h$.