This program was contributed by Martin Kronbichler and Scott Miller.
Introduction
This tutorial program presents the implementation of a hybridizable discontinuous Galkerin method for the convection-diffusion equation.
Hybridizable discontinuous Galerkin methods
One common argument against the use of discontinuous Galerkin elements is the large number of globally coupled degrees of freedom that one must solve in an implicit system. This is because, unlike continuous finite elements, in typical discontinuous elements there is one degree of freedom at each vertex for each of the adjacent elements, rather than just one, and similarly for edges and faces. As an example of how fast the number of unknowns grows, consider the FE_DGPMonomial basis: each scalar solution component is represented by polynomials of degree with degrees of freedom per element. Typically, all degrees of freedom in an element are coupled to all of the degrees of freedom in the adjacent elements. The resulting discrete equations yield very large linear systems very quickly, especially for systems of equations in 2 or 3 dimensions.
Reducing the size of the linear system
To alleviate the computational cost of solving such large linear systems, the hybridizable discontinuous Galerkin (HDG) methodology was introduced by Cockburn and co-workers (see the references in the recent HDG overview article by Nguyen and Peraire [Ngu2012]).
The HDG method achieves this goal by formulating the mathematical problem using Dirichlet-to-Neumann mappings. The partial differential equations are first written as a first order system, and each field is then discretized via a DG method. At this point, the single-valued "trace" values on the skeleton of the mesh, i.e., element faces, are taken to be independent unknown quantities. This yields unknowns in the discrete formulation that fall into two categories:
Face unknowns that only couple with the cell unknowns from both sides of the face;
Cell unknowns that only couple with the cell and face unknowns defined within the same cell. Crucially, no cell interior degree of freedom on one cell ever couples to any interior cell degree of freedom of a different cell.
The Dirichlet-to-Neumann map concept then permits the following solution procedure:
Use local element interior data to enforce a Neumann condition on the skeleton of the triangulation. The global problem is then to solve for the trace values, which are the only globally coupled unknowns.
Use the known skeleton values as Dirichlet data for solving local element-level solutions. This is known as the 'local solver', and is an embarrassingly parallel element-by-element solution process.
Relation with Static Condensation
The above procedure also has a linear algebra interpretation—referred to as static condensation—that was exploited to reduce the size of the global linear system by Guyan in the context of continuous Finite Elements [G65], and by Fraeijs de Veubeke for mixed methods [F65]. In the latter case (mixed formulation), the system reduction was achieved through the use of discontinuous fluxes combined with the introduction of an additional auxiliary hybrid variable that approximates the trace of the unknown at the boundary of every element. This procedure became known as hybridization and—by analogy—is the reason why the local discontinuous Galerkin method introduced by Cockburn, Gopalakrishnan, and Lazarov in 2009 [CGL2009], and subsequently developed by their collaborators, eventually came to be known as the hybridizable discontinuous Galerkin (HDG) method.
Let us write the complete linear system associated to the HDG problem as a block system with the discrete DG (cell interior) variables as first block and the skeleton (face) variables as the second block:
Our aim is now to eliminate the block with a Schur complement approach similar to step-20, which results in the following two steps:
The point is that the presence of is not a problem because is a block diagonal matrix where each block corresponds to one cell and is therefore easy enough to invert. The coupling to other cells is introduced by the matrices and over the skeleton variable. The block-diagonality of and the structure in and allow us to invert the matrix element by element (the local solution of the Dirichlet problem) and subtract from . The steps in the Dirichlet-to-Neumann map concept hence correspond to
constructing the Schur complement matrix and right hand side locally on each cell and inserting the contribution into the global trace matrix in the usual way,
solving the Schur complement system for , and
solving for using the second equation, given .
Solution quality and rates of convergence
Another criticism of traditional DG methods is that the approximate fluxes converge suboptimally. The local HDG solutions can be shown to converge as , i.e., at optimal order. Additionally, a super-convergence property can be used to post-process a new approximate solution that converges at the rate .
Alternative approaches
The hybridizable discontinuous Galerkin method is only one way in which the problems of the discontinuous Galerkin method can be addressed. Another idea is what is called the "weak Galerkin" method. It is explored in step-61.
HDG applied to the convection-diffusion problem
The HDG formulation used for this example is taken from N.C. Nguyen, J. Peraire, B. Cockburn: An implicit high-order hybridizable discontinuous Galerkin method for linear convection–diffusion equations, Journal of Computational Physics, 2009, 228:9, 3232-3254. [DOI]
We consider the convection-diffusion equation over the domain with Dirichlet boundary and Neumann boundary :
Introduce the auxiliary variable and rewrite the above equation as the first order system:
We multiply these equations by the weight functions and integrate by parts over every element to obtain:
The terms decorated with a hat denote the numerical traces (also commonly referred to as numerical fluxes). They are approximations to the interior values on the boundary of the element. To ensure conservation, these terms must be single-valued on any given element edge even though, with discontinuous shape functions, there may of course be multiple values coming from the cells adjacent to an interface. We eliminate the numerical trace by using traces of the form:
The variable is introduced as an additional independent variable and is the one for which we finally set up a globally coupled linear system. As mentioned above, it is defined on the element faces and discontinuous from one face to another wherever faces meet (at vertices in 2d, and at edges and vertices in 3d). Values for and appearing in the numerical trace function are taken to be the cell's interior solution restricted to the boundary .
The local stabilization parameter has effects on stability and accuracy of HDG solutions; see the literature for a further discussion. A stabilization parameter of unity is reported to be the choice which gives best results. A stabilization parameter that tends to infinity prohibits jumps in the solution over the element boundaries, making the HDG solution approach the approximation with continuous finite elements. In the program below, we choose the stabilization parameter as
where we set the diffusion and the diffusion length scale to .
The trace/skeleton variables in HDG methods are single-valued on element faces. As such, they must strongly represent the Dirichlet data on . This means that
where the equal sign actually means an projection of the boundary function onto the space of the face variables (e.g. linear functions on the faces). This constraint is then applied to the skeleton variable using inhomogeneous constraints by the method VectorTools::project_boundary_values.
Summing the elemental contributions across all elements in the triangulation, enforcing the normal component of the numerical flux, and integrating by parts on the equation weighted by , we arrive at the final form of the problem: Find such that
The unknowns are referred to as local variables; they are represented as standard DG variables. The unknown is the skeleton variable which has support on the codimension-1 surfaces (faces) of the mesh.
We use the notation to denote the sum of integrals over all cells and to denote integration over all faces of all cells, i.e., interior faces are visited twice, once from each side and with the corresponding normal vectors. When combining the contribution from both elements sharing a face, the above equation yields terms familiar from the DG method, with jumps of the solution over the cell boundaries.
In the equation above, the space for the scalar variable is defined as the space of functions that are tensor product polynomials of degree on each cell and discontinuous over the element boundaries , i.e., the space described by FE_DGQ<dim>(p). The space for the gradient or flux variable is a vector element space where each component is a locally polynomial and discontinuous . In the code below, we collect these two local parts together in one FESystem where the first dim components denote the gradient part and the last scalar component corresponds to the scalar variable. For the skeleton component , we define a space that consists of discontinuous tensor product polynomials that live on the element faces, which in deal.II is implemented by the class FE_FaceQ. This space is otherwise similar to FE_DGQ, i.e., the solution function is not continuous between two neighboring faces, see also the results section below for an illustration.
In the weak form given above, we can note the following coupling patterns:
The matrix consists of local-local coupling terms. These arise when the local weighting functions multiply the local solution terms . Because the elements are discontinuous, is block diagonal.
The matrix represents the local-face coupling. These are the terms with weighting functions multiplying the skeleton variable .
The matrix represents the face-local coupling, which involves the weighting function multiplying the local solutions .
The matrix is the face-face coupling; terms involve both and .
Post-processing and super-convergence
One special feature of the HDG methods is that they typically allow for constructing an enriched solution that gains accuracy. This post-processing takes the HDG solution in an element-by-element fashion and combines it such that one can get order of accuracy when using polynomials of degree . For this to happen, there are two necessary ingredients:
The computed solution gradient converges at optimal rate, i.e., .
The cell-wise average of the scalar part of the solution, , super-converges at rate .
We now introduce a new variable , which we find by minimizing the expression over the cell under the constraint . The constraint is necessary because the minimization functional does not determine the constant part of . This translates to the following system of equations:
Since we test by the whole set of basis functions in the space of tensor product polynomials of degree in the second set of equations, this is an overdetermined system with one more equation than unknowns. We fix this in the code below by omitting one of these equations (since the rows in the Laplacian are linearly dependent when representing a constant function). As we will see below, this form of the post-processing gives the desired super-convergence result with rate . It should be noted that there is some freedom in constructing and this minimization approach to extract the information from the gradient is not the only one. In particular, the post-processed solution defined here does not satisfy the convection-diffusion equation in any sense. As an alternative, the paper by Nguyen, Peraire and Cockburn cited above suggests another somewhat more involved formula for convection-diffusion that can also post-process the flux variable into an -conforming variant and better represents the local convection-diffusion operator when the diffusion is small. We leave the implementation of a more sophisticated post-processing as a possible extension to the interested reader.
Note that for vector-valued problems, the post-processing works similarly. One simply sets the constraint for the mean value of each vector component separately and uses the gradient as the main source of information.
Problem specific data
For this tutorial program, we consider almost the same test case as in step-7. The computational domain is and the exact solution corresponds to the one in step-7, except for a scaling. We use the following source centers for the exponentials
1D: ,
2D: ,
3D: .
With the exact solution given, we then choose the forcing on the right hand side and the Neumann boundary condition such that we obtain this solution (manufactured solution technique). In this example, we choose the diffusion equal to one and the convection as
Note that the convection is divergence-free, .
Implementation
Besides implementing the above equations, the implementation below provides the following features:
WorkStream to parallelize local solvers. Workstream has been presented in detail in step-9.
Reconstruct the local DG solution from the trace.
Post-processing the solution for superconvergence.
DataOutFaces for direct output of the global skeleton solution.
The commented program
Include files
Most of the deal.II include files have already been covered in previous examples and are not commented on.
However, we do have a few new includes for the example. The first one defines finite element spaces on the faces of the triangulation, which we refer to as the 'skeleton'. These finite elements do not have any support on the element interior, and they represent polynomials that have a single value on each codimension-1 surface, but admit discontinuities on codimension-2 surfaces.
#include <deal.II/fe/fe_face.h>
The second new file we include defines a new type of sparse matrix. The regular SparseMatrix type stores indices to all non-zero entries. The ChunkSparseMatrix takes advantage of the coupled nature of DG solutions. It stores an index to a matrix sub-block of a specified size. In the HDG context, this sub-block-size is actually the number of degrees of freedom per face defined by the skeleton solution field. This reduces the memory consumption of the matrix by up to one third and results in similar speedups when using the matrix in solvers.
#include <deal.II/lac/chunk_sparse_matrix.h>
The final new include for this example deals with data output. Since we have a finite element field defined on the skeleton of the mesh, we would like to visualize what that solution actually is. DataOutFaces does exactly this; the interface is the almost the same as the familiar DataOut, but the output only has codimension-1 data for the simulation.
#include <deal.II/numerics/data_out_faces.h>
#include <iostream>
We start by putting all of our classes into their own namespace.
The structure of the analytic solution is the same as in step-7. There are two exceptions. Firstly, we also create a solution for the 3d case, and secondly, we scale the solution so its norm is of order unity for all values of the solution width.
This class implements a function where the scalar solution and its negative gradient are collected together. This function is used when computing the error of the HDG approximation and its implementation is to simply call value and gradient function of the Solution class.
template <int dim>
class SolutionAndGradient : publicFunction<dim>, protected SolutionBase<dim>
Next comes the implementation of the convection velocity. As described in the introduction, we choose a velocity field that is in 2d and in 3d. This gives a divergence-free velocity field.
The last function we implement is the right hand side for the manufactured solution. It is very similar to step-7, with the exception that we now have a convection term instead of the reaction term. Since the velocity field is incompressible, i.e., , the advection term simply reads .
template <int dim>
class RightHandSide : publicFunction<dim>, protected SolutionBase<dim>
The HDG solution procedure follows closely that of step-7. The major difference is the use of three different sets of DoFHandler and FE objects, along with the ChunkSparseMatrix and the corresponding solutions vectors. We also use WorkStream to enable a multithreaded local solution process which exploits the embarrassingly parallel nature of the local solver. For WorkStream, we define the local operations on a cell and a copy function into the global matrix and vector. We do this both for the assembly (which is run twice, once when we generate the system matrix and once when we compute the element-interior solutions from the skeleton values) and for the postprocessing where we extract a solution that converges at higher order.
Data for the assembly and solution of the primal variables.
struct PerTaskData;
struct ScratchData;
Post-processing the solution to obtain is an element-by-element procedure; as such, we do not need to assemble any global data and do not declare any 'task data' for WorkStream to use.
struct PostProcessScratchData;
The following three functions are used by WorkStream to do the actual work of the program.
As stated in the introduction, HDG solutions can be post-processed to attain superconvergence rates of . The post-processed solution is a discontinuous finite element solution representing the primal variable on the interior of each cell. We define a FE type of degree to represent this post-processed solution, which we only use for output after constructing it.
The degrees of freedom corresponding to the skeleton strongly enforce Dirichlet boundary conditions, just as in a continuous Galerkin finite element method. We can enforce the boundary conditions in an analogous manner via an AffineConstraints object. In addition, hanging nodes are handled in the same way as for continuous finite elements: For the face elements which only define degrees of freedom on the face, this process sets the solution on the refined side to coincide with the representation on the coarse side.
Note that for HDG, the elimination of hanging nodes is not the only possibility — in terms of the HDG theory, one could also use the unknowns from the refined side and express the local solution on the coarse side through the trace values on the refined side. However, such a setup is not as easily implemented in terms of deal.II loops and not further analyzed.
The usage of the ChunkSparseMatrix class is similar to the usual sparse matrices: You need a sparsity pattern of type ChunkSparsityPattern and the actual matrix object. When creating the sparsity pattern, we just have to additionally pass the size of local blocks.
The constructor is similar to those in other examples, with the exception of handling multiple DoFHandler and FiniteElement objects. Note that we create a system of finite elements for the local DG part, including the gradient/flux part and the scalar part.
The system for an HDG solution is setup in an analogous manner to most of the other tutorial programs. We are careful to distribute dofs with all of our DoFHandler objects. The solution and system_matrix objects go with the global skeleton solution.
template <int dim>
void HDG<dim>::setup_system()
{
dof_handler_local.distribute_dofs(fe_local);
dof_handler.distribute_dofs(fe);
dof_handler_u_post.distribute_dofs(fe_u_post);
std::cout << " Number of degrees of freedom: " << dof_handler.n_dofs()
When creating the chunk sparsity pattern, we first create the usual dynamic sparsity pattern and then set the chunk size, which is equal to the number of dofs on a face, when copying this into the final sparsity pattern.
Next comes the definition of the local data structures for the parallel assembly. The first structure PerTaskData contains the local vector and matrix that are written into the global matrix, whereas the ScratchData contains all data that we need for the local assembly. There is one variable worth noting here, namely the boolean variable trace_reconstruct. As mentioned in the introduction, we solve the HDG system in two steps. First, we create a linear system for the skeleton system where we condense the local part into it via the Schur complement . Then, we solve for the local part using the skeleton solution. For these two steps, we need the same matrices on the elements twice, which we want to compute by two assembly steps. Since most of the code is similar, we do this with the same function but only switch between the two based on a flag that we set when starting the assembly. Since we need to pass this information on to the local worker routines, we store it once in the task data.
ScratchData contains persistent data for each thread within WorkStream. The FEValues, matrix, and vector objects should be familiar by now. There are two objects that need to be discussed: std::vector<std::vector<unsigned int> > fe_local_support_on_face and std::vector<std::vector<unsigned int> > fe_support_on_face. These are used to indicate whether or not the finite elements chosen have support (non-zero values) on a given face of the reference cell for the local part associated to fe_local and the skeleton part fe. We extract this information in the constructor and store it once for all cells that we work on. Had we not stored this information, we would be forced to assemble a large number of zero terms on each cell, which would significantly slow the program.
PostProcessScratchData contains the data used by WorkStream when post-processing the local solution . It is similar, but much simpler, than ScratchData.
The assemble_system function is similar to the one on step-32, where the quadrature formula and the update flags are set up, and then WorkStream is used to do the work in a multi-threaded manner. The trace_reconstruct input parameter is used to decide whether we are solving for the global skeleton solution (false) or the local solution (true).
One thing worth noting for the multi-threaded execution of assembly is the fact that the local computations in assemble_system_one_cell() call into BLAS and LAPACK functions if those are available in deal.II. Thus, the underlying BLAS/LAPACK library must support calls from multiple threads at the same time. Most implementations do support this, but some libraries need to be built in a specific way to avoid problems. For example, OpenBLAS compiled without multithreading inside the BLAS/LAPACK calls needs to built with a flag called USE_LOCKING set to true.
The real work of the HDG program is done by assemble_system_one_cell. Assembling the local matrices is done here, along with the local contributions of the global matrix .
We first compute the cell-interior contribution to ll_matrix matrix (referred to as matrix in the introduction) corresponding to local-local coupling, as well as the local right-hand-side vector. We store the values at each quadrature point for the basis functions, the right-hand-side value, and the convection velocity, in order to have quick access to these fields.
Face terms are assembled on all faces of all elements. This is in contrast to more traditional DG methods, where each face is only visited once in the assembly procedure.
Here we compute the stabilization parameter discussed in the introduction: since the diffusion is one and the diffusion length scale is set to 1/5, it simply results in a contribution of 5 for the diffusion part and the magnitude of convection through the element boundary in a centered scheme for the convection part.
When trace_reconstruct=false, we are preparing to assemble the system for the skeleton variable . If this is the case, we must assemble all local matrices associated with the problem: local-local, local-face, face-local, and face-face. The face-face matrix is stored as TaskData::cell_matrix, so that it can be assembled into the global system by copy_local_to_global.
i < scratch.fe_local_support_on_face[face_no].size();
++i)
for (unsignedint j = 0;
j < scratch.fe_support_on_face[face_no].size();
++j)
{
constunsignedint ii =
scratch.fe_local_support_on_face[face_no][i];
constunsignedint jj =
scratch.fe_support_on_face[face_no][j];
scratch.lf_matrix(ii, jj) +=
((scratch.q_phi[i] * normal +
(convection * normal - tau_stab) * scratch.u_phi[i]) *
scratch.tr_phi[j]) *
JxW;
Note the sign of the face_no-local matrix. We negate the sign during assembly here so that we can use the FullMatrix::mmult with addition when computing the Schur complement.
scratch.fl_matrix(jj, ii) -=
((scratch.q_phi[i] * normal +
tau_stab * scratch.u_phi[i]) *
scratch.tr_phi[j]) *
JxW;
}
for (unsignedint i = 0;
i < scratch.fe_support_on_face[face_no].size();
++i)
for (unsignedint j = 0;
j < scratch.fe_support_on_face[face_no].size();
++j)
{
constunsignedint ii =
scratch.fe_support_on_face[face_no][i];
constunsignedint jj =
scratch.fe_support_on_face[face_no][j];
task_data.cell_matrix(ii, jj) +=
((convection * normal - tau_stab) * scratch.tr_phi[i] *
When trace_reconstruct=true, we are solving for the local solutions on an element by element basis. The local right-hand-side is calculated by replacing the basis functions tr_phi in the lf_matrix computation by the computed values trace_values. Of course, the sign of the matrix is now minus since we have moved everything to the other side of the equation.
if (task_data.trace_reconstruct)
for (unsignedint i = 0;
i < scratch.fe_local_support_on_face[face_no].size();
++i)
{
constunsignedint ii =
scratch.fe_local_support_on_face[face_no][i];
scratch.l_rhs(ii) -=
(scratch.q_phi[i] * normal +
scratch.u_phi[i] * (convection * normal - tau_stab)) *
scratch.trace_values[q] * JxW;
}
}
}
Once assembly of all of the local contributions is complete, we must either: (1) assemble the global system, or (2) compute the local solution values and save them. In either case, the first step is to invert the local-local matrix.
scratch.ll_matrix.gauss_jordan();
For (1), we compute the Schur complement and add it to the cell_matrix, matrix in the introduction.
For (2), we are simply solving (ll_matrix).(solution_local) = (l_rhs). Hence, we multiply l_rhs by our already inverted local-local matrix and store the result using the set_dof_values function.
Once we have solved for the skeleton solution, we can solve for the local solutions in an element-by-element fashion. We do this by re-using the same assemble_system function but switching trace_reconstruct to true.
assemble_system(true);
}
HDG::postprocess
The postprocess method serves two purposes. First, we want to construct a post-processed scalar variables in the element space of degree that we hope will converge at order . This is again an element-by-element process and only involves the scalar solution as well as the gradient on the local cell. To do this, we introduce the already defined scratch data together with some update flags and run the work stream to do this in parallel.
Secondly, we want to compute discretization errors just as we did in step-7. The overall procedure is similar with calls to VectorTools::integrate_difference. The difference is in how we compute the errors for the scalar variable and the gradient variable. In step-7, we did this by computing L2_norm or H1_seminorm contributions. Here, we have a DoFHandler with these two contributions computed and sorted by their vector component, [0, dim) for the gradient and dim for the scalar. To compute their value, we hence use a ComponentSelectFunction with either of them, together with the SolutionAndGradient class introduced above that contains the analytic parts of either of them. Eventually, we also compute the L2-error of the post-processed solution and add the results into the convergence table.
This is the actual work done for the postprocessing. According to the discussion in the introduction, we need to set up a system that projects the gradient part of the DG solution onto the gradient of the post-processed variable. Moreover, we need to set the average of the new post-processed variable to equal the average of the scalar DG solution on the cell.
More technically speaking, the projection of the gradient is a system that would potentially fills our dofs_per_cell times dofs_per_cell matrix but is singular (the sum of all rows would be zero because the constant function has zero gradient). Therefore, we take one row away and use it for imposing the average of the scalar value. We pick the first row for the scalar part, even though we could pick any row for elements. However, had we used FE_DGP elements instead, the first row would correspond to the constant part already and deleting e.g. the last row would give us a singular system. This way, our program can also be used for those elements.
Having assembled all terms, we can again go on and solve the linear system. We invert the matrix and then multiply the inverse by the right hand side. An alternative (and more numerically stable) method would have been to only factorize the matrix and apply the factorization.
We have 3 sets of results that we would like to output: the local solution, the post-processed local solution, and the skeleton solution. The former 2 both 'live' on element volumes, whereas the latter lives on codimension-1 surfaces of the triangulation. Our output_results function writes all local solutions to the same vtk file, even though they correspond to different DoFHandler objects. The graphical output for the skeleton variable is done through use of the DataOutFaces class.
The DataOutFaces class works analogously to the DataOut class when we have a DoFHandler that defines the solution on the skeleton of the triangulation. We treat it as such here, and the code is similar to that above.
We implement two different refinement cases for HDG, just as in step-7: adaptive_refinement and global_refinement. The global_refinement option recreates the entire triangulation every time. This is because we want to use a finer sequence of meshes than what we would get with one refinement step, namely 2, 3, 4, 6, 8, 12, 16, ... elements per direction.
The adaptive_refinement mode uses the KellyErrorEstimator to give a decent indication of the non-regular regions in the scalar local solutions.
Just as in step-7, we set the boundary indicator of two of the faces to 1 where we want to specify Neumann boundary conditions instead of Dirichlet conditions. Since we re-create the triangulation every time for global refinement, the flags are set in every refinement step, not just at the beginning.
for (constauto &cell : triangulation.cell_iterators())
for (const auto &face : cell->face_iterators())
if (face->at_boundary())
if ((std::fabs(face->center()(0) - (-1)) < 1e-12) ||
The functionality here is basically the same as step-7. We loop over 10 cycles, refining the grid on each one. At the end, convergence tables are created.
There is one minor change for the convergence table compared to step-7: Since we did not refine our mesh by a factor two in each cycle (but rather used the sequence 2, 3, 4, 6, 8, 12, ...), we need to tell the convergence rate evaluation about this. We do this by setting the number of cells as a reference column and additionally specifying the dimension of the problem, which gives the necessary information for the relation between number of cells and mesh size.
We first have a look at the output generated by the program when run in 2D. In the four images below, we show the solution for polynomial degree and cycles 2, 3, 4, and 8 of the program. In the plots, we overlay the data generated from the internal data (DG part) with the skeleton part ( ) into the same plot. We had to generate two different data sets because cells and faces represent different geometric entities, the combination of which (in the same file) is not supported in the VTK output of deal.II.
The images show the distinctive features of HDG: The cell solution (colored surfaces) is discontinuous between the cells. The solution on the skeleton variable sits on the faces and ties together the local parts. The skeleton solution is not continuous on the vertices where the faces meet, even though its values are quite close along lines in the same coordinate direction. The skeleton solution can be interpreted as a rubber spring between the two sides that balances the jumps in the solution (or rather, the flux ). From the picture at the top left, it is clear that the bulk solution frequently over- and undershoots and that the skeleton variable in indeed a better approximation to the exact solution; this explains why we can get a better solution using a postprocessing step.
As the mesh is refined, the jumps between the cells get small (we represent a smooth solution), and the skeleton solution approaches the interior parts. For cycle 8, there is no visible difference in the two variables. We also see how boundary conditions are implemented weakly and that the interior variables do not exactly satisfy boundary conditions. On the lower and left boundaries, we set Neumann boundary conditions, whereas we set Dirichlet conditions on the right and top boundaries.
Next, we have a look at the post-processed solution, again at cycles 2, 3, 4, and 8. This is a discontinuous solution that is locally described by second order polynomials. While the solution does not look very good on the mesh of cycle two, it looks much better for cycles three and four. As shown by the convergence table below, we find that is also converges more quickly to the analytical solution.
Finally, we look at the solution for at cycle 2. Despite the coarse mesh with only 64 cells, the post-processed solution is similar in quality to the linear solution (not post-processed) at cycle 8 with 4,096 cells. This clearly shows the superiority of high order methods for smooth solutions.
Convergence tables
When the program is run, it also outputs information about the respective steps and convergence tables with errors in the various components in the end. In 2D, the convergence tables look the following:
One can see the error reduction upon grid refinement, and for the cases where global refinement was performed, also the convergence rates. The quadratic convergence rates of Q1 elements in the norm for both the scalar variable and the gradient variable is apparent, as is the cubic rate for the postprocessed scalar variable in the norm. Note this distinctive feature of an HDG solution. In typical continuous finite elements, the gradient of the solution of order converges at rate only, as opposed to for the actual solution. Even though superconvergence results for finite elements are also available (e.g. superconvergent patch recovery first introduced by Zienkiewicz and Zhu), these are typically limited to structured meshes and other special cases. For Q3 HDG variables, the scalar variable and gradient converge at fourth order and the postprocessed scalar variable at fifth order.
The convergence tables verify the expected convergence rates stated in the introduction. Now, we want to show a quick comparison of the computational efficiency of the HDG method compared to a usual finite element (continuous Galkerin) method on the problem of this tutorial. Of course, stability aspects of the HDG method compared to continuous finite elements for transport-dominated problems are also important in practice, which is an aspect not seen on a problem with smooth analytic solution. In the picture below, we compare the error as a function of the number of degrees of freedom (left) and of the computing time spent in the linear solver (right) for two space dimensions of continuous finite elements (CG) and the hybridized discontinuous Galerkin method presented in this tutorial. As opposed to the tutorial where we only use unpreconditioned BiCGStab, the times shown in the figures below use the Trilinos algebraic multigrid preconditioner in TrilinosWrappers::PreconditionAMG. For the HDG part, a wrapper around ChunkSparseMatrix for the trace variable has been used in order to utilize the block structure in the matrix on the finest level.
The results in the graphs show that the HDG method is slower than continuous finite elements at , about equally fast for cubic elements and faster for sixth order elements. However, we have seen above that the HDG method actually produces solutions which are more accurate than what is represented in the original variables. Therefore, in the next two plots below we instead display the error of the post-processed solution for HDG (denoted by for example). We now see a clear advantage of HDG for the same amount of work for both and , and about the same quality for .
Since the HDG method actually produces results converging as , we should compare it to a continuous Galerkin solution with the same asymptotic convergence behavior, i.e., FE_Q with degree . If we do this, we get the convergence curves below. We see that CG with second order polynomials is again clearly better than HDG with linears. However, the advantage of HDG for higher orders remains.
The results are in line with properties of DG methods in general: Best performance is typically not achieved for linear elements, but rather at somewhat higher order, usually around . This is because of a volume-to-surface effect for discontinuous solutions with too much of the solution living on the surfaces and hence duplicating work when the elements are linear. Put in other words, DG methods are often most efficient when used at relatively high order, despite their focus on a discontinuous (and hence, seemingly low accurate) representation of solutions.
Results for 3D
We now show the same figures in 3D: The first row shows the number of degrees of freedom and computing time versus the error in the scalar variable for CG and HDG at order , the second row shows the post-processed HDG solution instead of the original one, and the third row compares the post-processed HDG solution with CG at order . In 3D, the volume-to-surface effect makes the cost of HDG somewhat higher and the CG solution is clearly better than HDG for linears by any metric. For cubics, HDG and CG are of similar quality, whereas HDG is again more efficient for sixth order polynomials. One can alternatively also use the combination of FE_DGP and FE_FaceP instead of (FE_DGQ, FE_FaceQ), which do not use tensor product polynomials of degree but Legendre polynomials of complete degree . There are fewer degrees of freedom on the skeleton variable for FE_FaceP for a given mesh size, but the solution quality (error vs. number of DoFs) is very similar to the results for FE_FaceQ.
One final note on the efficiency comparison: We tried to use general-purpose sparse matrix structures and similar solvers (optimal AMG preconditioners for both without particular tuning of the AMG parameters on any of them) to give a fair picture of the cost versus accuracy of two methods, on a toy example. It should be noted however that geometric multigrid (GMG) for continuous finite elements is about a factor four to five faster for and . As of 2019, optimal-complexity iterative solvers for HDG are still under development in the research community. Also, there are other implementation aspects for CG available such as fast matrix-free approaches as shown in step-37 that make higher order continuous elements more competitive. Again, it is not clear to the authors of the tutorial whether similar improvements could be made for HDG. We refer to Kronbichler and Wall (2018) for a recent efficiency evaluation.
Possibilities for improvements
As already mentioned in the introduction, one possibility is to implement another post-processing technique as discussed in the literature.
A second item that is not done optimally relates to the performance of this program, which is of course an issue in practical applications (weighing in also the better solution quality of (H)DG methods for transport-dominated problems). Let us look at the computing time of the tutorial program and the share of the individual components:
Setup
Assemble
Solve
Trace reconstruct
Post-processing
Output
Total time
Relative share
2D, Q1, cycle 9, 37,248 dofs
5.34s
0.7%
1.2%
89.5%
0.9%
2.3%
5.4%
2D, Q3, cycle 9, 74,496 dofs
22.2s
0.4%
4.3%
84.1%
4.1%
3.5%
3.6%
3D, Q1, cycle 7, 172,800 dofs
9.06s
3.1%
8.9%
42.7%
7.0%
20.6%
17.7%
3D, Q3, cycle 7, 691,200 dofs
516s
0.6%
34.5%
13.4%
32.8%
17.1%
1.5%
As can be seen from the table, the solver and assembly calls dominate the runtime of the program. This also gives a clear indication of where improvements would make the most sense:
Better linear solvers: We use a BiCGStab iterative solver without preconditioner, where the number of iteration increases with increasing problem size (the number of iterations for Q1 elements and global refinements starts at 35 for the small sizes but increase up to 701 for the largest size). To do better, one could for example use an algebraic multigrid preconditioner from Trilinos, or some more advanced variants as the one discussed in Kronbichler and Wall (2018). For diffusion-dominated problems such as the problem at hand with finer meshes, such a solver can be designed that uses the matrix-vector products from the more efficient ChunkSparseMatrix on the finest level, as long as we are not working in parallel with MPI. For MPI-parallelized computations, a standard TrilinosWrappers::SparseMatrix can be used.
Speed up assembly by pre-assembling parts that do not change from one cell to another (those that do neither contain variable coefficients nor mapping-dependent terms).