
## v5p9 General

Author: Jean-Michel Hervouet, 23rd January 2009

#### Distributive schemes: a new monotonicity criterion

##### Principle

Full explanations on distributive schemes are given in reference [2], from page 183 onwards. We just recall here that all such schemes end up in a discretization of the advection equation in the form:

Where:

• is the integral of the test function of point ,
• is a distribution coefficient,
• the time step,
• the value of function at point after advection,
• the value of function at point before advection
• actually represents where is the advecting field.

The only thing to know here is that eventually takes the form:

where the coefficients are zero and all coefficients are positive or zero. The sum is done on all the neighbouring points of point , i.e. all the points that belong to an element containing . The final value of function at point is thus:

Classically, it is said then that the monotonicity criterion consists of ensuring that all the coefficients of and be positive, which yields, given the fact that all are positive or zero:

However, this condition is sufficient, but not necessary. As a matter of fact, the real monotonicity criterion reads:

If we now rewrite the equation above in the form and denoting we have:

Depending on the sign of , the new criterion yields:

• and

This criterion is better than the previous one, because (here example with ) is greater than , thus

##### Details on the implementation

We describe here the implementation of this new criterion in the SUBROUTINE MURD3D of the TELEMAC-3D module. In this subroutine the initial time step is left unchanged, but a reduction factor is introduced, so that the first time step is . After one iteration, the remaining time is then . A current time step is initialised with (here ), and then is done after every iteration. The problem is thus finding the reduction factor .
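The sub-iteration loop described above can be sketched as follows. This is a hedged illustration, not the actual MURD3D code: `sub_iterate` and `stability_factor` are hypothetical names, the latter standing in for the monotonicity analysis that determines the admissible fraction of the remaining time.

```python
# Illustrative sketch (not MURD3D): splitting the time step DT into
# stable sub-steps driven by a reduction factor.

def sub_iterate(dt, stability_factor):
    """Advance a full time step dt by sub-iterations.

    `stability_factor(remaining)` stands for the monotonicity analysis:
    it returns the largest admissible fraction r (0 < r <= 1) of the
    remaining time that can be advanced in one go.
    """
    remaining = dt
    n_sub = 0
    while remaining > 1e-12 * dt:
        r = stability_factor(remaining)
        ddt = r * remaining          # first sub-step is r * dt
        # ... advection update over ddt would go here ...
        remaining -= ddt             # time left after this sub-iteration
        n_sub += 1
    return n_sub
```

With the new, less restrictive criterion the returned fraction r is larger, so fewer sub-iterations are needed to exhaust the time step.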

What we have called so far is , integral of the test function at the end of the time step. In SUBROUTINE MURD3D we have a varying mesh, with varying test function integrals, namely:

and

At the first sub-iteration the end of the time step corresponds to an integral which is put in an array called . After every sub-iteration the final test-function integral is .

We take hereafter the example with . Note that in SUBROUTINE MURD3D is array , so that . In our case is positive.

i.e. now becomes:

Let us denote:

The old criterion corresponds to , with in SUBROUTINE MURD3D.

We have: which is also: , being positive, we eventually find that:

If , so is , and we get to the same result by choosing:

In the Fortran implementation is computed as , so we have here either:

if
if

This result can be compared with the old criterion which gave:

It happens then that the only thing to change with the new criterion is the value of . The possible divisions by zero contained in the presence of or in the denominator are easily handled because in this case is also 0. As a matter of fact, cannot be greater than zero if .

##### Results

As was said in previous sections, the distributive schemes are implemented with sub-iterations that ensure that the monotonicity criterion is always obeyed. With the new, less restrictive criterion we thus need fewer sub-iterations, which results in a less diffusive advection scheme. However, computing the local minimum and maximum of the function f is an extra cost that could offset the gain in computer time. Tests have shown that the computer time is in fact reduced. For example, running 10 time steps of a Berre lagoon computation, we get at the 10th time step the following numbers of sub-iterations for the 3 components of velocity U, V and W, and the tracer T (salinity):

| Variable | Old criterion | New criterion |
|----------|---------------|---------------|
| U        | 3             | 2             |
| V        | 3             | 2             |
| W        | 3             | 2             |
| T        | 4             | 2             |

The computer times on an HP C3700 Unix machine are 35 s (old criterion) and 32 s (new criterion), a reduction of about 10% in computing time. A reduction of about 25% in the number of sub-iterations is generally observed.

#### Parallelism: discovery of a new and necessary property

##### Principle

The use of the new monotonicity criterion for distributive schemes (see previous section) has highlighted an old problem in parallelism. It happened that an interface point, i.e. a point belonging to at least 2 sub-domains, could have a value of a given function (namely here ) that was positive in one sub-domain and negative in another, thus causing a bifurcation in the algorithm and eventually a crash due to a number of sub-iterations beyond the accepted threshold of 100. Incident reports showed that in such cases a restart of the program just before the crash would suppress the problem (it actually forced the values to be the same, by reading them from a file). We thus come to the conclusion that, to avoid problems in algorithms containing tests on real numbers:

An interface point must have exactly the same values in all the sub-domains it belongs to

As a matter of fact, it appeared in tests that values that should have been equal, e.g. depth or velocity, could be slightly “sub-domain-dependent”, due to truncation errors. A first analysis showed that the computation of normal vectors to boundaries could yield different values, because the coordinates of boundary points outside a sub-domain were stored in and retrieved from a file without a sufficient number of digits (this is a case of forced truncation error). This problem was solved, but some differences remained: where did they come from?

If we think of how the operations are done, it seems that we can prove that if we start from equal values, they should remain equal, because the same operations done on the same numbers should yield the same result. Let us for example look at the iterative solvers: at every iteration they do operations like: and . This is a part of the conjugate gradient algorithm, with 2 dot-products and 1 matrix-vector product . If we start from the same , , and and can show that is the same, it is obvious that is also the same (for an interface point, seen from two different sub-domains). So we have to show that the dot-product and the matrix-vector product give identical values at interfaces.

• Dot-product: this property is obvious because the dot product is a single value gathering contributions from all processors and then sent back to all processors with a special communication (FUNCTION P_DOTS, which calls FUNCTION P_DSUM, which calls the MPI SUBROUTINE MPI_AllReduce).
• Matrix-vector product: during a matrix-vector product, every interface point gathers information from its neighbouring elements. This information is then passed to its “brothers” in other sub-domains, to be added to their contributions. This is done by the BIEF SUBROUTINE PARCOM with option 2. With two processors, two “brothers” will first get respectively quantities and . Then the first one will receive and compute . The second will receive and compute . Both points will have exactly the same result in the end. With 3 processors and 3 “brothers” with quantities , and , we can have combinations like and , and this will give differences due to truncation errors. This problem was first pointed out at LNHE by Charles Moulinec.
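The three-processor case can be reproduced in a few lines. This is a generic illustration of floating-point non-associativity, not TELEMAC code:

```python
# Floating-point addition is not associative, so brother points that sum
# the same three contributions in a different order can end with slightly
# different values.

a, b, c = 0.1, 0.2, 0.3    # three per-processor contributions

s1 = (a + b) + c           # order seen by one brother
s2 = a + (b + c)           # order seen by another brother

print(s1 == s2)            # False
print(abs(s1 - s2))        # a difference of the order of 1e-16
```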

The conclusion is that iterative solvers, and more generally matrix-vector products, will yield differences between brother points. Tests confirm the analysis. A SUBROUTINE CHECK_DIGITS was built to look for differences at interface points. In this subroutine, calls to SUBROUTINE PARCOM with option 3, i.e. taking the maximum between brother points, are used. The maximum is then compared with the local value. Comparisons are done between double precision numbers with tests like IF(X.NE.Y) THEN… (exact comparisons are generally avoided in Fortran programming to allow for truncation errors, but here it is exactly what we want). In TELEMAC-3D with an option calling TELEMAC-2D, in the test-case “gouttedo” with 2 processors, there is no difference in depth and velocities. With 3 processors, there is a difference after the first time-step, at only one point: the single point that belongs to 3 sub-domains. This difference is of the order .

##### Forcing equality of arrays at interface points

We have shown that differences between sub-domains occur only when calling SUBROUTINE PARCOM with option 2, for adding contributions stemming from sub-domains. These differences are only truncation errors. The idea is then to do a second call to SUBROUTINE PARCOM, with option 3, to choose the maximum of all values. This can in fact be done within SUBROUTINE PARCOM itself, by calling SUBROUTINE PARACO twice. This is done only if option 2 is asked, and if there are more than 2 processors. Once this is achieved, no more differences are noticed between two sub-domains. This has been checked on TELEMAC-2D, TELEMAC-3D and ESTEL-3D. However, differences may still occur between a parallel and a scalar computation, because operations like sums are still not done in the same order as in scalar mode.
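A minimal sketch of the idea (illustrative Python, not the PARCOM implementation): each copy of an interface point first sums the contributions in its own order, then a second exchange replaces every copy by the maximum, forcing strict equality.

```python
# Sketch of the double-exchange fix: sum contributions (as option 2 does),
# then take the maximum over all brother copies (as option 3 does).

contributions = [0.1, 0.2, 0.3]

# Each "brother" sums the three contributions in its own order:
brother_sums = [
    (contributions[0] + contributions[1]) + contributions[2],
    contributions[0] + (contributions[1] + contributions[2]),
    (contributions[1] + contributions[0]) + contributions[2],
]
# The copies differ by truncation errors:
assert len(set(brother_sums)) > 1

# Second exchange: every brother replaces its value by the maximum copy.
forced = max(brother_sums)
brother_sums = [forced] * len(brother_sums)
assert len(set(brother_sums)) == 1   # all copies now strictly identical
```

Taking the maximum is an arbitrary but deterministic choice: any rule that picks the same representative on every processor would do.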

A series of tests is now necessary to evaluate the cost of this double call to SUBROUTINE PARACO. If it proved significant, it would be necessary to build a new data structure to find more easily the maximum values between interface points that belong to more than 2 sub-domains. For example, if there is only one triple point (a case with 3 sub-domains), a call to FUNCTION P_DMAX for this very point would be a much faster way to force a common value. Another idea would be to sort the quantities to be added with respect to their processor number.

#### Finite volume advection scheme: new options

For full details on this advection scheme, refer to [3]. This finite volume advection scheme is now available for suspension in SISYPHE and in TELEMAC-2D for the velocities. We detail hereafter how source terms have been dealt with, and how a variant has been designed to cope with advection fields that do not obey the continuity equation, as is the case in one option of SISYPHE.

##### Source terms

We now add an explicit and an implicit source term in the tracer equation, so that the equation reads:

is the explicit source term, the implicit source term. They are currently treated in TELEMAC (SUBROUTINE CVDFTR) in the non-conservative equation in the form:

In suspended sediment transport, a positive would correspond to erosion and a negative would correspond to deposition. Including also boundary terms and punctual source terms, equation is discretized in the form:

where is equal to if is positive, i.e. from to , and is equal to if is negative (upwind explicit scheme). Other terms account for boundary fluxes and punctual source terms. When all equations are summed, the terms will all sum to zero.
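The upwind choice and the pairwise cancellation of interface fluxes can be illustrated with a minimal 1-D sketch (hypothetical names, not the TELEMAC implementation):

```python
# Minimal 1-D explicit upwind finite-volume step: the value transported
# through an interface is taken from the upstream point. Each transfer
# leaves one cell and enters its neighbour, so interface fluxes cancel
# pairwise and the total mass is conserved exactly.

def upwind_step(f, flux, vol, dt):
    """One explicit upwind step. flux[k] > 0 means flow from cell k to k+1."""
    new = f[:]
    for k in range(len(flux)):
        upstream = f[k] if flux[k] > 0 else f[k + 1]
        transfer = dt * flux[k] * upstream
        new[k]     -= transfer / vol[k]      # leaves cell k
        new[k + 1] += transfer / vol[k + 1]  # enters cell k+1
    return new

f    = [1.0, 0.0, 0.0, 0.0]
vol  = [1.0, 1.0, 1.0, 1.0]
flux = [0.5, 0.5, 0.5]                       # uniform rightward flow
f1 = upwind_step(f, flux, vol, 0.5)
print(sum(a * v for a, v in zip(f1, vol)))   # total mass unchanged: 1.0
```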

Equation gives the following value of :

which can as well be written:

by using the fact that the continuity equation is written:

With an explicit upwind scheme, it gives:

This new formula would require a new monotonicity criterion, however explicit and implicit source terms do not necessarily obey the monotonicity criterion, so we choose to keep the previous criterion. If we call the previous obtained without our new explicit and implicit terms, we have:

and we get at the final result:

Great care must be taken that remains negative to keep the monotonicity. This is the case in Sisyphe, where the deposition has been intentionally kept fully implicit, with the very purpose of having positive concentrations.
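A one-line computation shows why keeping the (negative) deposition term implicit preserves positivity. This is a sketch under the assumption that, for a single point, the update reduces to dividing by (1 - dt*s_imp):

```python
# With an implicit source coefficient s_imp <= 0, the update divides by
# a number >= 1 and can never produce a negative concentration from
# non-negative c_old and s_exp, however large the deposition is.

def update(c_old, dt, s_exp, s_imp):
    assert s_imp <= 0.0, "the implicit (deposition) part must stay negative"
    return (c_old + dt * s_exp) / (1.0 - dt * s_imp)

c = update(2.0, 0.1, 1.0, -50.0)   # strong implicit deposition
print(c >= 0.0)                     # True
```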

##### A variant for Sisyphe

In Sisyphe there is an option “correction on convection velocity” that takes into account the fact that the suspended sediment has a maximum concentration near the bottom, and is thus not advected with the depth-averaged velocity. In this case the advecting field is the depth-averaged velocity multiplied by a correction factor, and the resulting velocity field does not obey the continuity equation 3.5. This should not be a problem for finite volumes, as we solve the conservative equation, but unfortunately the derivation of the scheme uses equation 3.5 to get the new depth, and this equation is no longer valid. In fact the use of this continuity equation consists of replacing the coefficient of , which is:

by 1. The denominator is the real depth; the numerator is the depth that would be obtained with the continuity equation if the non-conservative field were used. They are equal when a conservative velocity field is chosen. The variant consists of choosing the coefficient instead of 1 for . This new coefficient changes the monotonicity criterion, but we decide to keep the previous criterion. As a matter of fact, when continuity is not ensured by the velocity field, the monotonicity cannot be guaranteed. However, should remain a positive number, being a ratio of two depths, and the concentration should remain positive. The test-case exemplifying the effect of this new variant is called “16.conservation” in the French test-cases (test.fr) of the Sisyphe 5.9 package. It is in fact the Telemac-2D “gouttedo” test with mobile bed. There is bedload and suspension. With coefficient 1 the total loss of sediment is 0.1477758×10² m³. With the variant it becomes 0.1195841×10⁻¹⁶ m³, which is about the machine accuracy.

Note: in the implementation in subroutine TVF, the “depth that would be obtained with the continuity equation” was already computed and is kept as , while is now called . As a matter of fact, sub-iterations for monotonicity have to be taken into account, and the intermediate depths can be obtained by linear interpolation between and . This is valid because the fluxes are kept constant during the time step.

#### Coupling with Delwaq: now possible with tidal flats

The coupling with Delwaq relies on a common understanding (between finite elements and finite volumes) of the continuity equation. In the case of tidal flats, the treatment of negative values on the finite elements side is rather complex and so far could not be transmitted to finite volumes. As a matter of fact, the treatment of tidal flats cannot be translated into a modification of the velocity field in the continuity equation, as was done e.g. for the “compatibility of free surface gradient” (see section “suppressing wiggles” in reference [3]). A solution came from Leo Postma at Deltares, who made the remark that Delwaq only needs fluxes, regardless of the velocities that created them. The problem then only consists of finding the fluxes between points that result from the treatment of tidal flats, and this is indeed possible, as explained hereafter. We first recall the ideas implemented in the interface between Telemac and Delwaq. We assume that the 2D continuity equation:

has been discretized in the following matrix form:

where is the mass-matrix and the time-step. If we use mass lumping, this will give for every degree of freedom :

where is as well the area of the finite volume around point , and what we call the volume of basis . If there is no mass lumping, this equation remains valid if and are replaced with and , where:

which is in fact exactly what Equation 4.1 says. The quantity:

can be interpreted as the flux leaving point ; it is in fact generally (i.e. without source terms and boundary fluxes): These fluxes leaving points are then translated into fluxes between points, as explained in reference [3]. If we can now write the continuity equation in the form:

then the effect of the tidal flats treatment on depth can simply be added to the other fluxes and treated in the same way. The only condition is that it must be done at the element level, as for the real fluxes. Now let us detail the treatment of tidal flats. It is composed of a modification of the free surface gradient in the momentum equation (which spoils nothing in the continuity equation) and of a smoothing of negative depths (which is the problem here). The smoothing is done in the following way:

• Negative depths are extracted in a work array : for every point ,
• Negative depths are removed from depth:
• Negative depths are smoothed, i.e. is replaced by , where is the mass matrix and is the lumped mass matrix. This is done several times (currently a value of two is hardcoded)
• Smoothed negative depths are added back to the others:

It is important to understand that after this treatment depths may still be negative; they are just less negative. The difference is that without treatment the depth may become infinitely negative, whereas smoothing stabilizes the numerical scheme. The depth smoothing is mass-conservative in the sense that the integral of depth is left unchanged. The only problem is that the continuity equation is spoiled. The effect of one smoothing is the following, if we call h_old the depth before smoothing and h_new the depth after:
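The smoothing steps above can be sketched as follows. This is a 1-D stand-in: the true smoothing uses the mass matrices on the 2-D mesh, while here a conservative 3-point average plays that role, and all names are illustrative.

```python
# Sketch of the negative-depth smoothing: extract the negative part of
# the depth, smooth it conservatively, add it back. The integral of
# depth is unchanged; negative values are only spread out and reduced.

def smooth_negative_depths(h, n_smooth=2):
    h = h[:]
    neg = [min(x, 0.0) for x in h]           # extract negative depths
    h = [x - n for x, n in zip(h, neg)]      # remove them (h >= 0 now)
    for _ in range(n_smooth):                # conservative smoothing
        new = [0.0] * len(neg)
        for i, v in enumerate(neg):
            new[i] += 0.5 * v
            for j in (i - 1, i + 1):
                if 0 <= j < len(neg):
                    new[j] += 0.25 * v
                else:
                    new[i] += 0.25 * v       # fold back at boundaries
        neg = new
    return [x + n for x, n in zip(h, neg)]   # add smoothed depths back

h = [0.3, -0.2, 0.5, 0.4]
h2 = smooth_negative_depths(h)
print(abs(sum(h2) - sum(h)) < 1e-12)   # True: integral of depth unchanged
print(min(h2) > min(h))                # True: still negative, less so
```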

Such contributions should be added if there is more than one smoothing. We deduce from this that the continuity equation actually solved is:

At element level, this new contribution can be easily computed, by using bits of code from the Telemac subroutines computing and . It gives for ( would be used as the function F, SURFAC is the area of the element):

```fortran
      DO IELEM = 1 , NELEM
C
C       Values of function F at the 3 points of the triangle
C
        F1 = F(IKLE1(IELEM))
        F2 = F(IKLE2(IELEM))
        F3 = F(IKLE3(IELEM))
C
        COEF = SURFAC(IELEM)/12.D0
C
        W1(IELEM) = COEF * ( F2+F3-2*F1 )
        W2(IELEM) = COEF * ( F1+F3-2*F2 )
        W3(IELEM) = COEF * ( F1+F2-2*F3 )
C
      ENDDO
```

This solution, applied to the Malpasset test case, shows a full validation of the approach. The control of mass error done by the interface between Telemac and Delwaq (subroutine TEL4DEL) gives at every time step a maximum mass error of the order of 10⁰ without correction and of the order of 10⁻¹⁰ with correction.

#### Quadratic elements in TELEMAC-2D

The implementation of quadratic elements in TELEMAC-2D has been successfully performed by Algiane Froehly (MATMECA, Bordeaux). It has then been adapted for parallelism. We give hereafter some basic explanations; a full account can be found in her report (reference [1]).

A quadratic interpolation of the velocity field is a well-known solution to stability problems raised by the Ladyzhenskaya-Babuska-Brezzi condition in Navier-Stokes equations (also called the discrete inf-sup condition). The pressure (or the depth in Shallow Water equations) remains linear. For quadratic interpolation, we add 3 degrees of freedom, numbered 4, 5 and 6 on Figure 5.1. We recall here that the linear bases carried by points 1, 2 and 3 are:

where and are the coordinates in the reference triangle given in Figure 5.2: The coordinates of the 6 degrees of freedom are:

The quadratic interpolation polynomials , with = 1…6, are such that , where is the isoparametric transformation that gives the real triangle as a function of the reference triangle and are the basis functions in the reference triangle. In practice is never built and the computation of integrals is done in the reference triangle.

The 6 quadratic basis functions are chosen to ensure the following property:

Figure 5.1: The six points of a quadratic triangle

Moreover, every basis function must be equal to 1 at its own point and zero at the five others. This is verified if we take:

where and are the indices of points of the segment where is point . More precisely:

Figure 5.2: reference triangle
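The exact formulas were lost in the conversion of this document, but the property stated above (each basis function equal to 1 at its own point and 0 at the five others) is satisfied by the classical P2 basis on the reference triangle, sketched below. The mid-side node ordering used here is an assumption consistent with Figure 5.1.

```python
# Classical quadratic (P2) basis on the reference triangle. Assumed node
# order: vertices 1-3, then mid-side nodes 4 (between 1 and 2),
# 5 (between 2 and 3), 6 (between 3 and 1).

def p2_basis(x, y):
    l1, l2, l3 = 1.0 - x - y, x, y          # linear barycentric bases
    return [
        l1 * (2.0 * l1 - 1.0),              # vertex 1
        l2 * (2.0 * l2 - 1.0),              # vertex 2
        l3 * (2.0 * l3 - 1.0),              # vertex 3
        4.0 * l1 * l2,                      # mid-side 4
        4.0 * l2 * l3,                      # mid-side 5
        4.0 * l3 * l1,                      # mid-side 6
    ]

nodes = [(0, 0), (1, 0), (0, 1), (0.5, 0), (0.5, 0.5), (0, 0.5)]
for i, (x, y) in enumerate(nodes):
    vals = p2_basis(x, y)
    # Each basis function is 1 at its own node, 0 at the five others:
    assert all(abs(v - (1.0 if j == i else 0.0)) < 1e-12
               for j, v in enumerate(vals))
# The six functions also sum to 1 everywhere (partition of unity):
assert abs(sum(p2_basis(0.3, 0.2)) - 1.0) < 1e-12
```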

Remark: on boundaries a point number 3 is added in the middle of each segment, and the interpolation polynomials are:

As they have 6 points, these new quadratic elements have a similarity with prisms; e.g. element matrices have the same number of off-diagonal terms. They are thus stored in the same way, i.e.:

Rectangular matrices are stored in the following way, for 3 x 6 matrices:

and for 6 x 3 matrices:

Most subroutines written for prisms may as well be re-used for quadratic triangles. All the subroutines computing matrices or vectors have been built with Maple V.

The list of new or modified Maple-built subroutines in the BIEF library, and their functions, is given hereafter.

represents the test functions and the basis.

MT01CC: mass-matrix of size 6 x 6
FORMUL = 'MATMAS' : MT01CCij =

MT02CC : diffusion matrix of size 6 x 6
FORMUL = 'MATDIF' : MT02CCij =
Viscosity is left linear, with same cases (isotropic or not).

MT03CC : used for advection with SUPG technique, of size 6 x 6
FORMUL = 'MASUPG' : MT03CCij =
Function F is piece-wise constant and U is linear or quadratic.

MT04AA : SUPG matrix, of size 3 x 3
FORMUL = 'MAUGUG' : MT04AAij =

MT04CC : SUPG matrix of size 6 x 6
FORMUL = 'MAUGUG' : MT04CCij =
U and V may be linear or quadratic.

MT05AA : linear advection matrix, of size 3 x 3, with variants for distributive schemes
FORMUL = 'MATVGR' : MT05AAij =
The velocity may now be quadratic, but the N scheme only uses linear points.

MT05CC : quadratic advection matrix, of size 6 x 6, with variants for distributive schemes
FORMUL = 'MATVGR' : MT05CCij =
The velocity may be piece-wise constant, linear or quadratic, but the N scheme only uses linear points.

MT06OC : size 3 x 3 (boundary matrix).
FORMUL = 'FMATMA' : MT06OCij =
F may be piece-wise constant, linear or quadratic. The connectivity table giving the access to the boundary points has been extended.

MT06CC : size 6 x 6
FORMUL = 'FMATMA' : MT06CCij =
F may be piece-wise constant, linear or quadratic.

MT07CC : partly lumped mass-matrix, size 6 x 6.
FORMUL = 'MSLUMP' : MT07CCij =
F is piece-wise constant.

MT08AC : size 3 x 6
FORMUL = 'MATFGR X' : MT08ACij =
or 'MATFGR Y' : MT08ACij =

MT11AC : size 3 x 6.
FORMUL = 'MATGRF X' : MT11ACij =
or 'MATGRF Y' : MT11ACij =

MT12AC : matrix used with the SUPG technique, size 3 x 6
FORMUL = 'MATUGH X' : MT12ACij =
or 'MATUGH Y' : MT12ACij =
Here is linear and is quadratic. The velocity field is piece-wise constant or quadratic.

MT13CA : gradient matrix, size 6 x 3
FORMUL = 'MATGRA X' : MT13CAij =
or 'MATGRA Y' : MT13CAij =
Here is quadratic and is linear.

MT13CC : gradient matrix, size 6 x 6.
FORMUL = 'MATGRA X' : MT13CCij =
or 'MATGRA Y' : MT13CCij =

VC00CC : “volume” of basis functions, 6-component vector on every element.
FORMUL = 'MASBAS' : VC00CCi =

VC10OO : flux through boundaries, 2-component vector.
FORMUL = 'FLUBDF' : VC10OOi =
Here is quadratic. F is linear or quadratic, the velocity is linear or quadratic, given for all the mesh or only on the boundary.

VC13CC : gradient components, 6-component vector on every element.
FORMUL = 'GRADFX' : VC13CCi =
is quadratic, F linear or quadratic, and may be discontinuous between elements.

VC14AA : for turbulence in the K-Epsilon model, 3-component vector.
FORMUL = 'PRODFX' : VC14AAi =