Merge lp:~jobh/dolfin/symmetric-assemble into lp:~fenics-core/dolfin/trunk

Proposed by Joachim Haga
Status: Merged
Merged at revision: 6625
Proposed branch: lp:~jobh/dolfin/symmetric-assemble
Merge into: lp:~fenics-core/dolfin/trunk
Diff against target: 1311 lines (+1080/-16)
12 files modified
dolfin/common/utils.h (+20/-1)
dolfin/fem/DirichletBC.cpp (+10/-1)
dolfin/fem/DirichletBC.h (+5/-1)
dolfin/fem/SymmetricAssembler.cpp (+588/-0)
dolfin/fem/SymmetricAssembler.h (+82/-0)
dolfin/fem/SystemAssembler.cpp (+1/-1)
dolfin/fem/assemble.cpp (+36/-1)
dolfin/fem/assemble.h (+33/-1)
dolfin/fem/dolfin_fem.h (+1/-0)
site-packages/dolfin/fem/assembling.py (+122/-9)
test/unit/fem/python/SymmetricAssembler.py (+181/-0)
test/unit/test.py (+1/-1)
To merge this branch: bzr merge lp:~jobh/dolfin/symmetric-assemble
Reviewer: Garth Wells (Approve)
Review via email: mp+91107@code.launchpad.net

Description of the change

Implement new symmetric assembler, for use with multiple RHS vectors.

This implementation returns two matrices: the symmetric part and the non-symmetric
part of A (the latter arising from the application of Dirichlet BCs).
Then A = S + N, and since the only non-zero columns of N are those of the BC dofs,
Ax = b implies Sx = b - Nb.

Simple example (python):
    A, An = symmetric_assemble(a, bcs=bc)
    b = assemble(L, bcs=bc, symmetric_mod=An)
    solve(A, x, b)
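As a sanity check of that identity, here is an illustrative NumPy sketch (not DOLFIN code; the 3x3 matrix K, the BC value g and the choice of dof 0 as the Dirichlet dof are all invented for the example):

```python
import numpy as np

# Invented symmetric stiffness matrix and RHS; Dirichlet BC x[0] = g.
K = np.array([[4., 1., 0.],
              [1., 4., 1.],
              [0., 1., 4.]])
f = np.array([1., 2., 3.])
g = 2.0

# Plain assembly with row-only BC application: BC row replaced by the
# identity row, BC value inserted into b.
A = K.copy()
A[0, :] = [1., 0., 0.]
b = f.copy()
b[0] = g

# Symmetric part S: BC row AND column zeroed, 1.0 on the diagonal.
S = K.copy()
S[0, :] = 0.0
S[:, 0] = 0.0
S[0, 0] = 1.0

# Non-symmetric part N; its only non-zero column is the BC column.
N = A - S

x_unsym = np.linalg.solve(A, b)        # solve with unsymmetric A
x_sym = np.linalg.solve(S, b - N @ b)  # solve with symmetric S
assert np.allclose(x_unsym, x_sym)     # A = S + N, Ax = b  =>  Sx = b - Nb
```

The simple identity A = S + N is what makes the result easy to verify in tests.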

I created several prototypes for this assembler before finally deciding on this interface. Since the user may (in an optimised solver) wish to create several partial coefficient matrices and combine them into the final LHS, it is advantageous to let N be a full-featured GenericMatrix rather than a specialised data structure. The simple identity A=S+N is also helpful in testing.

At the moment, the sparsity pattern of N is the same as for S. If this becomes a problem, it's possible to make N much sparser without changing the interface.

I'll propose this change for backport into the 1.0 branch later, since it should not impact current users. In trunk, it should be possible to remove some interfaces which this assembler can replace. At least DirichletBC::zero_column (I think cbc.block is the only user).

Revision history for this message
Garth Wells (garth-wells) wrote :

I'm not convinced that this is the best approach. I'm inclined towards supporting unassembled matrices, and using this to store the cell tensors.

With the code, I'd like to see more verbose variable naming. It's too terse for me.

I wouldn't backport this. It's a feature, not a bug. I would prefer to make new releases to get new functionality out.

Revision history for this message
Johan Hake (johan-hake) wrote :

On Wednesday February 1 2012 16:48:24 Garth Wells wrote:
> I'm not convinced that this is the best approach. I'm inclined towards
> supporting unassembled matrices, and using this to store the cell tensors.

What exactly do you mean by unassembled matrices, and how have you
envisioned the interface?

I think this is a useful approach. As Joachim says, it gives the user some
nice options. How is the performance compared to the present system assembler?

> With the code, I'd like to see more verbose variable naming. It's too terse
> for me.

To me it doesn't look much more terse than the existing code.

> I wouldn't backport this. It's a feature, not a bug. I would prefer
> to make new releases to get new functionality out.

I agree.

Johan

Revision history for this message
Kent-Andre Mardal (kent-and) wrote :

On 1 February 2012 16:48, Garth Wells <email address hidden> wrote:

> I'm not convinced that this is the best approach. I'm inclined towards
> supporting unassembled matrices, and using this to store the cell tensors.
>

I think the current design/code is nice in the sense that it is
user-friendly with std matrices for representing the boundary condition.
It can be used when combining different matrices for efficiency (like we
do in our flow computations). It is exactly what we need.

>
> With the code, I'd like to see more verbose variable naming. It's too
> terse for me.
>
> I wouldn't backport this. It's a feature, not a bug. I would prefer
> to make new releases to get new functionality out.
>

ok

> --
> https://code.launchpad.net/~jobh/dolfin/symmetric-assemble/+merge/91107
> Your team DOLFIN Core Team is requested to review the proposed merge of
> lp:~jobh/dolfin/symmetric-assemble into lp:dolfin.
>

Revision history for this message
Anders Logg (logg) wrote :

On a side note: unassembled matrices (for use with PETSc CUSP) are
implemented in Fredrik's GPU branch.

--
Anders

On Wed, Feb 01, 2012 at 03:48:24PM -0000, Garth Wells wrote:
> I'm not convinced that this is the best approach. I'm inclined towards supporting unassembled matrices, and using this to store the cell tensors.
>
> With the code, I'd like to see more verbose variable naming. It's too terse for me.
>
> I wouldn't backport this. It's a feature, not a bug. I would prefer to make new releases to get new functionality out.

Revision history for this message
Anders Logg (logg) wrote :

On Wed, Feb 01, 2012 at 03:48:24PM -0000, Garth Wells wrote:
> I'm not convinced that this is the best approach. I'm inclined towards supporting unassembled matrices, and using this to store the cell tensors.
>
> With the code, I'd like to see more verbose variable naming. It's too terse for me.
>
> I wouldn't backport this. It's a feature, not a bug. I would prefer to make new releases to get new functionality out.

I haven't looked at the patch yet so I can't comment on it.

--
Anders

Revision history for this message
Joachim Haga (jobh) wrote :

I don't know what advantages unassembled matrices will bring, so I can't
comment on that directly. The new interface is quite consistent with the
current ones though, and brings needed functionality.

With regards to variable naming, it can be changed once the interface is
decided on. However, it is now mostly duplicated from Assemble.cpp. It's
probably good to not diverge these two too much so that they're easy to
keep in sync.

-j.

Revision history for this message
Joachim Haga (jobh) wrote :

> How is the performance compared to the present system assembler?

Similar. The cost compared to regular assemble is slightly higher, owing to
the extra global tensor and the extra matvec product. At a guess, up to 15%
for simple forms without tensor reuse. I have earlier measured
assemble_system to be a little bit slower than regular assemble, so also
comparable.

-j.

Revision history for this message
Garth Wells (garth-wells) wrote :

On 1 February 2012 18:19, Joachim Haga <email address hidden> wrote:
> I don't know what advantages unassembled matrices will bring, so I can't
> comment on that directly. The new interface is quite consistent with the
> current ones though, and brings needed functionality.
>

Simplicity and efficiency. An assembled matrix is not required to
apply bcs. Only the matrix for a given cell is required to apply bcs
to the RHS. With an unassembled matrix, the code would be very simple
and would not require any extra parallel communication.

> With regards to variable naming, it can be changed once the interface is
> decided on. However, it is now mostly duplicated from Assemble.cpp. It's
> probably good to not diverge these two too much so that they're easy to
> keep in sync.
>

I don't like terse names like

  Impl, lrow_is_bc, t idx, lcol, n_entries (we generally use num_foo
in DOLFIN), etc

If the names are cleaned up and the code formatting is made consistent
with the DOLFIN style, I don't have any strong objection to merging,
but I would prefer to use unassembled matrices. A concern is that the
symmetric assembler code is in need of simplification, and we've made
a few limited attempts in this direction, but the patch is making it
more complex before it's been simplified as much as is reasonably
possible.

Garth


Revision history for this message
Joachim Haga (jobh) wrote :

On 1 Feb 2012 19:50, "Garth Wells" <email address hidden> wrote:
>
> I don't like terse names like
>
> Impl, lrow_is_bc, t idx, lcol, n_entries (we generally use num_foo
> in DOLFIN), etc
>
> If the names are cleaned up and the code formatting is made consistent
> with the DOLFIN style, I don't have any strong objection to merging,
> but I would prefer to use unassembled matrices. A concern is that the
> symmetric assembler code is in need of simplification, and we've made
> a few limited attempts in this direction, but the patch is making it
> more complex before it's been simplified as much as is reasonably
> possible.

Sounds good! I'll go through the parts that are new to improve naming. To
be clear, the old symmetric assemble code is not much used by the new code,
it's the standard assemble that's reused with the addition of a method to
set conditions on a cell matrix. So the old symmetric assembler can be
simplified or even removed if so desired.

It's possible to merge regular and new assembler too, with a flag for
symmetric assembly, to avoid code duplication, but it's not a completely
natural fit I think.

-j.

Revision history for this message
Anders Logg (logg) wrote :

On Wed, Feb 01, 2012 at 08:19:26PM -0000, Joachim Haga wrote:
> On 1 Feb 2012 19:50, "Garth Wells" <email address hidden> wrote:
> >
> > I don't like terse names like
> >
> > Impl, lrow_is_bc, t idx, lcol, n_entries (we generally use num_foo
> > in DOLFIN), etc
> >
> > If the names are cleaned up and the code formatting is made consistent
> > with the DOLFIN style, I don't have any strong objection to merging,
> > but I would prefer to use unassembled matrices. A concern is that the
> > symmetric assembler code is in need of simplification, and we've made
> > a few limited attempts in this direction, but the patch is making it
> > more complex before it's been simplified as much as is reasonably
> > possible.
>
> Sounds good! I'll go through the parts that are new to improve naming. To
> be clear, the old symmetric assemble code is not much used by the new code,
> it's the standard assemble that's reused with the addition of a method to
> set conditions on a cell matrix. So the old symmetric assembler can be
> simplified or even removed if so desired.
>
> It's possible to merge regular and new assembler too, with a flag for
> symmetric assembly, to avoid code duplication, but it's not a completely
> natural fit I think.

I think it would be very desirable to get rid of as much code
duplication as possible. Now we have 3 different assemblers: regular,
system and multicore. I'd like to have one single assembler. If for
some reason this becomes inefficient (bloated), we can look at ways to
handle that (templates, code generation).

--
Anders

Revision history for this message
Joachim Haga (jobh) wrote :

> > It's possible to merge regular and new assembler too, with a flag for
> > symmetric assembly, to avoid code duplication, but it's not a completely
> > natural fit I think.
>
> I think it would be very desirable to get rid of as much code
> duplication as possible. Now we have 3 different assemblers: regular,
> system and multicore. I'd like to have one single assembler. If for
> some reason this becomes inefficient (bloated), we can look at ways to
> handle that (templates, code generation).

Regular, system and new can become one without too much trouble, I think.
Not sure about multicore, but can have a look.

Mind if I merge the current one (after fixing style issues) first?

J.

Revision history for this message
Johan Hake (johan-hake) wrote :

On Wednesday February 1 2012 21:40:29 Joachim Haga wrote:
> > > It's possible to merge regular and new assembler too, with a flag for
> > > symmetric assembly, to avoid code duplication, but it's not a
> > > completely natural fit I think.
> >
> > I think it would be very desirable to get rid of as much code
> > duplication as possible. Now we have 3 different assemblers: regular,
> > system and multicore. I'd like to have one single assembler. If for
> > some reason this becomes inefficient (bloated), we can look at ways to
> > handle that (templates, code generation).

Agree!

> Regular, system and new can become one without too much trouble,

Good!

> I think.
> Not sure about multicore, but can have a look.

This is similar to the present AssembleSystem, in that it essentially iterates
over cells and then facets in each cell.

AFAIK, assembly over interior facets has not got as much love as the other
integrals. There are also some code duplications which can be removed (I
think). We have assemble_cells and assemble_cells_and_exterior_facets. One can
probably just have the latter.

> Mind if I merge the current one (after fixing style issues) first?

Not for me.

Johan

> J.

Revision history for this message
Joachim Haga (jobh) wrote :

>> Not sure about multicore, but can have a look.
>
> This is similar to the present AssembleSystem, in that it essentially iterates
> over cells and then facets in each cell.
>
> AFAIK, assemble over interior faces have not got as much love as the other
> integrals. There are also some code dublications which can be removed (I
> think). We have assemble_cells and assemble_cells_and_exterior_facets. One can
> probably just have the latter.

Ok. I assume there's a reason that multicore does it in this way,
meaning that if they are combined then it's the multicore version that
"wins". Other than that, I guess it's just a matter of specifying a
single thread and (if necessary) shorting out the mesh coloring in the
single-thread case.

But OpenMPAssembler hasn't replaced Assembler, so I guess there are
problems with this approach. Performance?

-j.

Revision history for this message
Garth Wells (garth-wells) wrote :

On 2 February 2012 10:01, Joachim Haga <email address hidden> wrote:
>>> Not sure about multicore, but can have a look.
>>
>> This is similar to the present AssembleSystem, in that it essentially iterates
>> over cells and then facets in each cell.
>>
>> AFAIK, assemble over interior faces have not got as much love as the other
>> integrals. There are also some code dublications which can be removed (I
>> think). We have assemble_cells and assemble_cells_and_exterior_facets. One can
>> probably just have the latter.
>
> Ok. I assume there's a reason that multicore does it in this way,
> meaning that if they are combined then it's the multicore version that
> "wins". Other than that, I guess it's just a matter of specifying a
> single thread and (if necessary) shorting out the mesh coloring in the
> single-thread case.
>
> But OpenMPAssembler hasn't replaced Assembler, so I guess there are
> problems with this approach. Performance?
>

I don't believe that it's possible to have one Assembler without
compromising on performance.

OpenMPAssembler is slower than Assembler for one thread because it
requires a somewhat different loop over cells. Also, at least when I
last worked on OpenMPAssembler, it didn't support as many cases as
Assembler. Johan H has probably bridged most/all of the gap in the
meantime.

OpenMPAssembler needs more testing before removing the 'experimental' tag.

I think that the performance focus should be on SystemAssembler (with
the possibility of just assembling the LHS or RHS). Assembler and
OpenMPAssembler could be merged for now. The assembler code would be
simpler if a number of the 'if' statements could be removed. Perhaps
the sub-domains code should be moved to the domain (i.e., the Mesh),
and the assemblers can just loop over sub-domains.

Garth


lp:~jobh/dolfin/symmetric-assemble updated
6538. By Joachim Haga

Style issues following merge request

Revision history for this message
Joachim Haga (jobh) wrote :

I've fixed (or at least improved) the style issues, as requested.

I realise that the private-implementation (pImpl) style is not used much in dolfin, but I think it saves a lot of clutter in this case. Besides, the semi-public nature of methods like e.g. Assembler::assemble_exterior_facets forces a particular implementation and loop ordering, which complicates matters if the different implementations are to be merged.

Revision history for this message
Joachim Haga (jobh) wrote :

> OpenMPAssembler is slower than Assembler for one thread because it
> requires a somewhat different loop over cells. Also, at least when I
> last worked on OpenMPAssembler, it didn't support as many cases as
> Assembler. Johan H has probably bridged most/all of the gap in the
> mean time.
>
> OpenMPAssembler needs more testing before removing the 'experimental' tag.
>
> I think that the performance focus should be on SystemAssembler (with
> the possibility of just assembling the LHS or RHS). Assembler and
> OpenMPAssembler could be merged for now. The assembler code  would be
> simpler if a number of the 'if' statements could be removed. Perhaps
> the sub-domains code should be moved to the domain (i.e., the Mesh),
> and the assemblers can just loop over sub-domains.

I'm confused. I thought you said earlier that SystemAssembler was
overly complex (and hence not desirable to extend), and Johan
mentioned that SystemAssembler/OpenMPAssembler uses similar
loop-orderings (hence, potentially, with similar single-thread
performance).

Anyway, I'll have a look at it later. As I mentioned, SystemAssembler
is a bit slower now in my tests, but I guess that would change if
there are many facet integrals since then the per-cell assemble starts
to pay off.

-j.

Revision history for this message
Garth Wells (garth-wells) wrote :

On 2 February 2012 13:07, Joachim Haga <email address hidden> wrote:
>> OpenMPAssembler is slower than Assembler for one thread because it
>> requires a somewhat different loop over cells. Also, at least when I
>> last worked on OpenMPAssembler, it didn't support as many cases as
>> Assembler. Johan H has probably bridged most/all of the gap in the
>> mean time.
>>
>> OpenMPAssembler needs more testing before removing the 'experimental' tag.
>>
>> I think that the performance focus should be on SystemAssembler (with
>> the possibility of just assembling the LHS or RHS). Assembler and
>> OpenMPAssembler could be merged for now. The assembler code  would be
>> simpler if a number of the 'if' statements could be removed. Perhaps
>> the sub-domains code should be moved to the domain (i.e., the Mesh),
>> and the assemblers can just loop over sub-domains.
>
> I'm confused. I thought you said earlier that SystemAssembler was
> overly complex (and hence not desireable to extend),

I would like to see it simplified before being extended.

> and Johan
> mentioned that SystemAssembler/OpenMPAssembler uses similar
> loop-orderings (hence, potentially, with similar single-thread
> performance).
>

OpenMPAssembler has an outer loop over colours, and an inner loop over
cells. The other assemblers have one loop over cells.

> Anyway, I'll have a look at it later. As I mentioned, SystemAssembler
> is a bit slower now in my tests,

The difference was always very small in my tests. The original
SystemAssembler was faster, but making it more general has led to a
performance drop (not a huge drop). It may be possible to improve the
performance of the bc searches in the assembler.

Another issue with SystemAssembler is that it is not robust in
parallel with the faster bc methods. The problem is that a partition
can have vertices on a Dirichlet boundary but no facets. The result
is that bcs are not applied when they should be. I would like to see
this worked out before extending SystemAssembler.

Garth


Revision history for this message
Joachim Haga (jobh) wrote :

>
> > I'm confused. I thought you said earlier that SystemAssembler was
> > overly complex (and hence not desireable to extend),
>
> I would like to see it simplified before being extended.

I see, ok!

> OpenMPAssembler has an outer loop over colours, and an inner loop over
> cells. The other assemblers have one loop over cells.

Oh, the outer loop won't be a problem, all cells can be set to the same
color in sequential runs.

> Another issue with SystemAssembler is that it is not robust in
> parallel with the faster bc methods.
>

Noted! The only thing I'm missing now is... why, with all these problems,
do you still recommend SystemAssembler as the path forward? ;)

-j.

Revision history for this message
Garth Wells (garth-wells) wrote :

On 2 February 2012 13:46, Joachim Haga <email address hidden> wrote:
>>
>> > I'm confused. I thought you said earlier that SystemAssembler was
>> > overly complex (and hence not desireable to extend),
>>
>> I would like to see it simplified before being extended.
>
>
> I see, ok!
>
>
>> OpenMPAssembler has an outer loop over colours, and an inner loop over
>> cells. The other assemblers have one loop over cells.
>
>
> Oh, the outer loop won't be a problem, all cells can be set to the same
> color in sequential runs.
>

Yes, but there are some subtle issues that need to be taken care of.
Assembler uses a Mesh iterator, but we can't use the iterators in
OpenMPAssembler, so we loop with an integer, get the cell index and
then create a cell. We might want to integrate colouring more deeply
in Mesh, which would make things easier.

>
>> Another issue with SystemAssembler is that it is not robust in
>> parallel with the faster bc methods.
>>
>
> Noted! The only thing I'm missing now is... why, with all these problems,
> do you still recommend SystemAssembler as the path forward? ;)
>

Symmetry!

Garth


Revision history for this message
Joachim Haga (jobh) wrote :

>
> > Noted! The only thing I'm missing now is... why, with all these problems,
> > do you still recommend SystemAssembler as the path forward? ;)
>
> Symmetry!

Aha! I think you need to accept my merge request, so that we get symmetry
without SystemAssembler!

:-)

-j.

Revision history for this message
Anders Logg (logg) wrote :

On Thu, Feb 02, 2012 at 10:20:13AM -0000, Garth Wells wrote:
> On 2 February 2012 10:01, Joachim Haga <email address hidden> wrote:
> >>> Not sure about multicore, but can have a look.
> >>
> >> This is similar to the present AssembleSystem, in that it essentially iterates
> >> over cells and then facets in each cell.
> >>
> >> AFAIK, assemble over interior faces have not got as much love as the other
> >> integrals. There are also some code dublications which can be removed (I
> >> think). We have assemble_cells and assemble_cells_and_exterior_facets. One can
> >> probably just have the latter.
> >
> > Ok. I assume there's a reason that multicore does it in this way,
> > meaning that if they are combined then it's the multicore version that
> > "wins". Other than that, I guess it's just a matter of specifying a
> > single thread and (if necessary) shorting out the mesh coloring in the
> > single-thread case.
> >
> > But OpenMPAssembler hasn't replaced Assembler, so I guess there are
> > problems with this approach. Performance?
> >
>
> I don't believe that it's possible to have one Assembler without
> compromising on performance.

Perhaps not, but it should be possible to share much more of the code.

> OpenMPAssembler is slower than Assembler for one thread because it
> requires a somewhat different loop over cells. Also, at least when I
> last worked on OpenMPAssembler, it didn't support as many cases as
> Assembler. Johan H has probably bridged most/all of the gap in the
> mean time.
>
> OpenMPAssembler needs more testing before removing the 'experimental' tag.

Agree.

> I think that the performance focus should be on SystemAssembler (with
> the possibility of just assembling the LHS or RHS). Assembler and
> OpenMPAssembler could be merged for now. The assembler code would be
> simpler if a number of the 'if' statements could be removed. Perhaps
> the sub-domains code should be moved to the domain (i.e., the Mesh),
> and the assemblers can just loop over sub-domains.

Yes, it should be possible to move that logic elsewhere. It probably
should not go into the mesh, since it is possible to assemble with the
subdomain specification separate from the mesh (which is the reason
for the somewhat complex logic). It might be possible to just move it
to a separate utility function, or a new data structure may be needed.

--
Anders

Revision history for this message
Anders Logg (logg) wrote :

On Thu, Feb 02, 2012 at 02:01:22PM -0000, Garth Wells wrote:
> On 2 February 2012 13:46, Joachim Haga <email address hidden> wrote:
> >>
> >> > I'm confused. I thought you said earlier that SystemAssembler was
> >> > overly complex (and hence not desireable to extend),
> >>
> >> I would like to see it simplified before being extended.
> >
> >
> > I see, ok!
> >
> >
> >> OpenMPAssembler has an outer loop over colours, and an inner loop over
> >> cells. The other assemblers have one loop over cells.
> >
> >
> > Oh, the outer loop won't be a problem, all cells can be set to the same
> > color in sequential runs.
> >
>
> Yes, but there are some subtle issues that need to be taken care off.
> Assembler uses a Mesh iterator, but we can't use the iterators in
> OpenMPAssembler, so we loop with an integer, get the cell index and
> then create a cell. We might want to integrate colouring more deeply
> in Mesh, which would make things easier.

We could either make a CellIterator that takes an optional color
argument, or we could loop over integers also in the regular
assembler.

--
Anders


lp:~jobh/dolfin/symmetric-assemble updated
6539. By Joachim Haga

Accept arbitrary iterables in addition to lists for bcs argument to assemble

Revision history for this message
Joachim Haga (jobh) wrote :

On 2 February 2012 15:01, Garth Wells <email address hidden> wrote:

> On 2 February 2012 13:46, Joachim Haga <email address hidden> wrote:
> >> Another issue with SystemAssembler is that it is not robust in
> >> parallel with the faster bc methods.
> >
> > Noted!

 I've been thinking a bit about this, and as far as I can understand this
can only happen if DirichletBC::get_boundary_values() does not return all
(local) Dirichlet vertices for a given partition, is that right? If so,
maybe the warning that is printed by SystemAssemble should be moved there?

Anyway, it was a useful exercise, because I also figured out two problems
with the new assembler under similar circumstances. The first is serious
(wrong diagonal value), but also easy to fix -- I'll do it first thing on
Monday, and push to this branch.

The second, if I understand correctly the SystemAssemble problem you
mention, is that the new assembler will fail to fully symmetricise the
matrix when the local boundary-vertex records are incomplete. This will
require fixing DirichletBC::get_boundary_values, which is harder. However,
the consequences are not as bad as they are for SystemAssembler, since the
matrix is complete (just not totally symmetric).

-j.

lp:~jobh/dolfin/symmetric-assemble updated
6540. By Joachim Haga

More style changes for SymmetricAssembler

6541. By Joachim Haga

Fix for parallel SymmetricAssembler.

It now uses GenericMatrix::ident() to set BC rows, to ensure that the diagonal is 1.0.
If non-robust methods (topological etc) are used to compute BCs, the assembled matrix
may not be completely symmetric.

Revision history for this message
Joachim Haga (jobh) wrote :

I've pushed a change which should fix SymmetricAssembler in parallel. It uses GenericMatrix::ident to set the diagonal values, which should always work (it does, however, require a finalized matrix -- so it's not possible to return an unfinalized one now).

But maybe it would be better to just enforce a more robust boundary search. Even pointwise search should be significantly cheaper than assemble. (But creating a DirichletBC from a subdomain is expensive -- something like 10x the time of assemble.)

Revision history for this message
Garth Wells (garth-wells) wrote :

On 6 February 2012 10:24, Joachim Haga <email address hidden> wrote:
> I've pushed a change which should fix SymmetricAssembler in parallel. It uses GenericMatrix::ident to set the diagonal values, which should always work (it does, however, require a finalized matrix -- so it's not possible to return an unfinalized one now).
>

How can this always work? It can destroy symmetry, right??

Garth


Revision history for this message
Joachim Haga (jobh) wrote :

On 6 February 2012 13:06, Garth Wells <email address hidden> wrote:

> On 6 February 2012 10:24, Joachim Haga <email address hidden> wrote:
> > I've pushed a change which should fix SymmetricAssembler in parallel. It
> uses GenericMatrix::ident to set the diagonal values, which should always
> work (it does, however, require a finalized matrix -- so it's not possible
> to return an unfinalized one now).
> >
>
> How can this always work? It can destroy symmetry, right??
>

Yes. Depends on what you mean by "works", I guess. The remaining problem,
as I said, is that the boundary column is NOT zeroed when the boundary dof
is missing from DirichletBC::get_boundary_values(), which, as I understand
it, can happen with topological search in parallel.

In that case, ident() will still set the boundary row, but the boundary
column will not be completely symmetricised. Hence, the matrix is correct
in the sense that Ax=b gives the right answer for x, but A has a small
unsymmetric component. This is less severe than for SystemAssembler, which
will be symmetric but wrong.

Now, topological search should be easy enough to fix, but it will make it
heavier (parallel comms + a search through the dofs). Maybe it's not worth
it, because the cost of pointwise search is (usually?) low compared to
other overhead.

 -j.
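The situation described above can be illustrated with plain NumPy (a hedged sketch, not the DOLFIN API): applying a Dirichlet BC by setting only the boundary row to an identity row (as GenericMatrix::ident does) still produces the correct solution, but the untouched boundary column leaves the matrix with an unsymmetric component.

```python
import numpy as np

# Model 3x3 stiffness matrix and RHS (illustrative values).
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
bc_dof, bc_val = 0, 0.5   # Dirichlet BC: x[0] = 0.5

# Row-only application: replace the BC row with an identity row.
A_row = A.copy()
A_row[bc_dof, :] = 0.0
A_row[bc_dof, bc_dof] = 1.0
b_row = b.copy()
b_row[bc_dof] = bc_val

x = np.linalg.solve(A_row, b_row)
assert np.isclose(x[bc_dof], bc_val)    # the BC is satisfied, Ax=b is correct
assert not np.allclose(A_row, A_row.T)  # but the boundary column breaks symmetry
```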

Revision history for this message
Garth Wells (garth-wells) wrote :

On 6 February 2012 12:45, Joachim Haga <email address hidden> wrote:
> On 6 February 2012 13:06, Garth Wells <email address hidden> wrote:
>
>> On 6 February 2012 10:24, Joachim Haga <email address hidden> wrote:
>> > I've pushed a change which should fix SymmetricAssembler in parallel. It
>> uses GenericMatrix::ident to set the diagonal values, which should always
>> work (it does, however, require a finalized matrix -- so it's not possible
>> to return an unfinalized one now).
>> >
>>
>> How can this always work? It can destroy symmetry, right??
>>
>
> Yes. Depends on what you mean by "works", I guess. The remaining problem,
> as I said, is that the boundary column is NOT zeroed if the boundary dof is
> sometimes missing from DirichletBC::get_boundary_values(), as I understand
> can happen with topology-search in parallel.
>
> In that case, ident() will still set the boundary row, but the boundary
> column will not be completely symmetricised. Hence, the matrix is correct
> in the sense that Ax=b gives the right answer for x, but A has a small
> unsymmetric component. This is less severe than for SystemAssemble, which
> will be symmetric but wrong.
>

It's just as bad. It will produce the wrong answer in common cases,
e.g. Cholesky factorisation.

> Now, topological search should be easy enough to fix, but it will make it
> heavier (parallel comms + a search through the dofs). Maybe it's not worth
> it, because the cost of pointwise search is (usually?) low compared to
> other overhead.
>

Yes, the solution is to look at the DirichletBC implementation.

Garth
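Garth's objection can be demonstrated with NumPy (a sketch on an assumed 3x3 model problem, not DOLFIN code): numpy.linalg.cholesky performs no symmetry check and reads only the lower triangle, so handing it a matrix with a small unsymmetric component silently factors the wrong matrix and the solve gives a wrong answer.

```python
import numpy as np

# Matrix after row-only BC application: BC row is an identity row,
# but the BC column was not zeroed, so A is unsymmetric.
A = np.array([[1.0, 0.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([0.5, 2.0, 3.0])

x_direct = np.linalg.solve(A, b)  # LU handles the asymmetry; correct answer

# Cholesky uses only the lower triangle of A, i.e. it factors the
# symmetric matrix implied by it, which differs from A itself.
L = np.linalg.cholesky(A)
y = np.linalg.solve(L, b)
x_chol = np.linalg.solve(L.T, y)

assert np.isclose(x_direct[0], 0.5)       # direct solve satisfies the BC
assert not np.allclose(x_direct, x_chol)  # Cholesky-based solve is wrong
```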

>  -j.
>

Revision history for this message
Joachim Haga (jobh) wrote :

>
> > unsymmetric component. This is less severe than for SystemAssemble, which
> > will be symmetric but wrong.
> >
>
> It's just as bad. It will produce the wrong answer in common cases,
> e.g. Cholesky factorisation.

Right you are, thanks! I'll put a warning in (like in SystemAssembler).

> > Now, topological search should be easy enough to fix, but it will make it
> > heavier (parallel comms + a search through the dofs). Maybe it's not
> worth
> > it, because the cost of pointwise search is (usually?) low compared to
> > other overhead.
> >
>
> Yes, the solution is to look at the DirichletBC implementation.
>

Not sure what you mean, "look at" as in "study" or as in "fix"? I'm leaning
towards thinking that the actual boundary search is not so important
compared to other overhead, so it's completely acceptable to just say "use
pointwise" in parallel.

-j.

Revision history for this message
Garth Wells (garth-wells) wrote :

On 6 February 2012 13:28, Joachim Haga <email address hidden> wrote:
>>
>> > unsymmetric component. This is less severe than for SystemAssemble, which
>> > will be symmetric but wrong.
>> >
>>
>> It's just as bad. It will produce the wrong answer in common cases,
>> e.g. Cholesky factorisation.
>
>
> Right you are, thanks! I'll put a warning in (like in SystemAssembler).
>
>
>> > Now, topological search should be easy enough to fix, but it will make it
>> > heavier (parallel comms + a search through the dofs). Maybe it's not
>> worth
>> > it, because the cost of pointwise search is (usually?) low compared to
>> > other overhead.
>> >
>>
>> Yes, the solution is to look at the DirichletBC implementation.
>>
>
> Not sure what you mean, "look at" as in "study" or as in "fix"?

Make faster (and improve re-use/caching of data, if possible) the
"pointwise" case.

We could also communicate boundary condition facets that have vertices
that are not wholly owned by a given process. There should be
relatively few.

Garth

> I'm leaning
> towards thinking that the actual boundary search is not so important
> compared to other overhead, so it's completely acceptable to just say "use
> pointwise" in parallel.
>
> -j.
>

lp:~jobh/dolfin/symmetric-assemble updated
6542. By Joachim Haga

Add timers to DirichletBC

6543. By Joachim Haga

SymmetricAssembler: Require pointwise DirichletBC in parallel, add back finalize_tensor to interface.

Revision history for this message
Joachim Haga (jobh) wrote :

OK, ready for review (again) now! It's back to not using ident() but rather setting the diagonal element-wise, as it did before. Pointwise boundary search is required in parallel.

Tested on a 256x256 unit-square Poisson problem (sequential): symmetric_assemble() is 1% slower than regular assemble(), while assemble_system() is 30% slower.

I've added some timers to DirichletBC as well (and removed one in AssemblerTools, which didn't measure anything).
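The S/N split from the proposal description (A = S + N, so Ax = b implies Sx = b - Nb) can be sketched in plain NumPy. This is illustrative only: the real assembler works cell-wise on GenericMatrix objects, and the variable names here are made up.

```python
import numpy as np

# Model problem with one Dirichlet dof (illustrative values).
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
bc_dof, bc_val = 0, 0.5

# Apply the Dirichlet row, then move the remaining BC column into N,
# so that S = A_bc - N is symmetric and A_bc = S + N.
A_bc = A.copy()
A_bc[bc_dof, :] = 0.0
A_bc[bc_dof, bc_dof] = 1.0
b[bc_dof] = bc_val

N = np.zeros_like(A_bc)
N[:, bc_dof] = A_bc[:, bc_dof]
N[bc_dof, bc_dof] = 0.0   # keep the unit diagonal entry in S
S = A_bc - N

assert np.allclose(S, S.T)       # S is symmetric
assert np.allclose(S + N, A_bc)  # A = S + N
x = np.linalg.solve(S, b - N @ b)            # Sx = b - Nb
assert np.allclose(x, np.linalg.solve(A_bc, b))
```

In the Python interface this corresponds to assembling S and N with symmetric_assemble and passing N as symmetric_mod when assembling the RHS, as in the example from the proposal description.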

Revision history for this message
Joachim Haga (jobh) wrote :

Are there any objections to merging this symmetric assembler? In particular, objections to the interface?

If not, I'll prepare a branch for 1.1 as well.

Anything else (merging the different assemblers, fixing DirichletBC pointwise performance / parallel correctness, etc) can be dealt with separately later.

lp:~jobh/dolfin/symmetric-assemble updated
6544. By Joachim Haga

Add copyright and modification information.

Revision history for this message
Anders Logg (logg) wrote :

On Tue, Feb 14, 2012 at 09:05:23AM -0000, Joachim Haga wrote:
> Are there any objections to merging this symmetric assembler? In particular, objections to the interface?

Not from me.

--
Anders

> If not, I'll prepare a branch for 1.1 as well.
>
> Anything else (merging the different assemblers, fixing DirichletBC pointwise performance / parallel correctness, etc) can be dealt with separately later.

Revision history for this message
Johan Hake (johan-hake) wrote :

On 02/16/2012 11:37 PM, Anders Logg wrote:
> On Tue, Feb 14, 2012 at 09:05:23AM -0000, Joachim Haga wrote:
>> Are there any objections to merging this symmetric assembler? In particular, objections to the interface?
> Not from me.

Neither from me.

>> If not, I'll prepare a branch for 1.1 as well.

Not sure what you mean by this. Are you going to include it in the 1.0.x
branch? Even if it is a feature which will not, for now, break any
code, it will most probably trigger some iterations on the interface
(read: merging of the assemblers), which a stable interface should be
spared from. But that is my opinion.

Johan

>> Anything else (merging the different assemblers, fixing DirichletBC pointwise performance / parallel correctness, etc) can be dealt with separately later.
>

Revision history for this message
Joachim Haga (jobh) wrote :

Den 17. feb. 2012 08.10 skrev "Johan Hake" <email address hidden>:
> >> If not, I'll prepare a branch for 1.1 as well.
>
> Not sure what you mean with this. Are you going to include it in 1.0.x
> branch?

No, I mixed up with the 1.1 series, I thought there was a branch as well.
Now I see there isn't.

-j.

Revision history for this message
Johan Hake (johan-hake) wrote :

On 02/17/2012 08:31 AM, Joachim Haga wrote:
> Den 17. feb. 2012 08.10 skrev "Johan Hake"<email address hidden>:
>>>> If not, I'll prepare a branch for 1.1 as well.
>> Not sure what you mean with this. Are you going to include it in 1.0.x
>> branch?
> No, I mixed up with the 1.1 series, I thought there was a branch as well.
> Now I see there isn't.

Ok!

Johan

>
> -j.
>

Revision history for this message
Anders Logg (logg) wrote :

On Fri, Feb 17, 2012 at 07:10:21AM -0000, Johan Hake wrote:
> On 02/16/2012 11:37 PM, Anders Logg wrote:
> > On Tue, Feb 14, 2012 at 09:05:23AM -0000, Joachim Haga wrote:
> >> Are there any objections to merging this symmetric assembler? In particular, objections to the interface?
> > Not from me.
>
> Neither from me.
>
> >> If not, I'll prepare a branch for 1.1 as well.
>
> Not sure what you mean with this. Are you going to include it in 1.0.x
> branch? Even if it is a a feature which will, not for now break any
> code, it will most probably trigger some iterations on the interface,
> read merging of Assemblers, which a stable interface should be spared
> from. But that is my opinion.

This should *not* be merged into 1.0. We should only do bug fixes to
the stable branch.

--
Anders

> Johan
>
> >> Anything else (merging the different assemblers, fixing DirichletBC pointwise performance / parallel correctness, etc) can be dealt with separately later.
> >
>
>

Revision history for this message
Joachim Haga (jobh) wrote :

On 17 February 2012 08:43, Anders Logg <email address hidden> wrote:
> This should *not* be merged into 1.0. We should only do bug fixes to
> the stable branch.

Don't worry, I just thought (erroneously) that there was a 1.1 branch
for the 1.1 series.

-j.

Revision history for this message
Anders Logg (logg) wrote :

On Fri, Feb 17, 2012 at 07:55:21AM -0000, Joachim Haga wrote:
> On 17 February 2012 08:43, Anders Logg <email address hidden> wrote:
> > This should *not* be merged into 1.0. We should only do bug fixes to
> > the stable branch.
>
> Don't worry, I just thought (erroneously) that there was a 1.1 branch
> for the 1.1 series.

ok. :-)

--
Anders

Revision history for this message
Joachim Haga (jobh) wrote :

I'm sorry if I nag, but since there are no objections, could this be merged? I'd like to finish this and move on.

Revision history for this message
Johan Hake (johan-hake) wrote :

On 02/24/2012 12:10 AM, Joachim Haga wrote:
> I'm sorry if I nag, but since there are no objections, could this be
> merged? I'd like to finish this and move on.

I am travelling from now on, so I won't be able to apply it right now.

Johan

Revision history for this message
Anders Logg (logg) wrote :

On Thu, Feb 23, 2012 at 11:31:17PM -0000, Johan Hake wrote:
> On 02/24/2012 12:10 AM, Joachim Haga wrote:
> > I'm sorry if I nag, but since there are no objections, could this be
> > merged? I'd like to finish this and move on.
>
> I am on a travel from now on so I wont be able to apply it right now.

I can merge it but Garth needs to comment first. He had objections
before.

--
Anders

Revision history for this message
Garth Wells (garth-wells) wrote :

If it passes all the tests and symmetry preservation is *guaranteed*, go ahead.

Bear in mind that it may be changed in the future once we support unassembled matrices.

review: Approve
Revision history for this message
Joachim Haga (jobh) wrote :

Thanks! Just hold on a few minutes while I check -Wall -Werror
-pedantic compilation.

-j.

On 27 February 2012 15:09, Garth Wells <email address hidden> wrote:
> Review: Approve
>
> If it passes all the tests and symmetry preservation is *guaranteed*, go ahead.
>
> Bear in mind that it may be changed in the future once we support unassembled matrices.

Revision history for this message
Joachim Haga (jobh) wrote :

Ok, compiles fine now.

lp:~jobh/dolfin/symmetric-assemble updated
6545. By Joachim Haga

Fix errors from -Wall -Werror -pedantic

Revision history for this message
Anders Logg (logg) wrote :

On Mon, Feb 27, 2012 at 03:16:23PM -0000, Joachim Haga wrote:
> Ok, compiles fine now.

Great. I'll merge it into my branch then push when my buildbot is
green. I have a couple of changesets lined up for merge.

--
Anders

lp:~jobh/dolfin/symmetric-assemble updated
6546. By Joachim Haga

Revert accidental change to GenericMatrix

6547. By Joachim Haga

Merge with trunk

6548. By Joachim Haga

Merge with trunk

6549. By Joachim Haga

Make symmetric assembler safe with topological dirichlet bcs in parallel

6550. By Joachim Haga

Enable SymmetricAssembler unit test

Preview Diff

1=== modified file 'dolfin/common/utils.h'
2--- dolfin/common/utils.h 2011-06-02 19:26:59 +0000
3+++ dolfin/common/utils.h 2012-02-29 22:48:20 +0000
4@@ -15,13 +15,18 @@
5 // You should have received a copy of the GNU Lesser General Public License
6 // along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
7 //
8+// Modified by Joachim B. Haga, 2012.
9+//
10 // First added: 2009-08-09
11-// Last changed: 2010-11-18
12+// Last changed: 2012-02-01
13
14 #ifndef __UTILS_H
15 #define __UTILS_H
16
17 #include <string>
18+#include <cstring>
19+#include <limits>
20+#include <vector>
21 #include "types.h"
22
23 namespace dolfin
24@@ -39,6 +44,20 @@
25 /// Return simple hash for given signature string
26 dolfin::uint hash(std::string signature);
27
28+ /// Fast zero-fill of numeric vectors / blocks.
29+ template <class T> inline void zerofill(T* arr, uint n)
30+ {
31+ if (std::numeric_limits<T>::is_integer || std::numeric_limits<T>::is_iec559)
32+ std::memset(arr, 0, n*sizeof(T));
33+ else
34+ // should never happen in practice
35+ std::fill_n(arr, n, T(0));
36+ }
37+
38+ template <class T> inline void zerofill(std::vector<T> &vec)
39+ {
40+ zerofill(&vec[0], vec.size());
41+ }
42 }
43
44 #endif
45
46=== modified file 'dolfin/fem/DirichletBC.cpp'
47--- dolfin/fem/DirichletBC.cpp 2012-02-29 13:22:33 +0000
48+++ dolfin/fem/DirichletBC.cpp 2012-02-29 22:48:20 +0000
49@@ -18,7 +18,7 @@
50 // Modified by Kristian Oelgaard, 2008
51 // Modified by Martin Sandve Alnes, 2008
52 // Modified by Johan Hake, 2009
53-// Modified by Joachim B Haga, 2009
54+// Modified by Joachim B. Haga, 2012
55 //
56 // First added: 2007-04-10
57 // Last changed: 2012-02-29
58@@ -28,6 +28,7 @@
59 #include <boost/assign/list_of.hpp>
60 #include <boost/serialization/utility.hpp>
61
62+#include <dolfin/common/Timer.h>
63 #include <dolfin/common/constants.h>
64 #include <dolfin/common/Array.h>
65 #include <dolfin/common/NoDeleter.h>
66@@ -205,6 +206,8 @@
67 //-----------------------------------------------------------------------------
68 void DirichletBC::gather(Map& boundary_values) const
69 {
70+ Timer timer("DirichletBC gather");
71+
72 typedef std::vector<std::pair<uint, double> > bv_vec_type;
73 typedef std::map<uint, bv_vec_type> map_type;
74
75@@ -480,6 +483,8 @@
76 GenericVector* b,
77 const GenericVector* x) const
78 {
79+ Timer timer("DirichletBC apply");
80+
81 // Check arguments
82 check_arguments(A, b, x);
83
84@@ -605,6 +610,8 @@
85 //-----------------------------------------------------------------------------
86 void DirichletBC::init_facets() const
87 {
88+ Timer timer("DirichletBC init facets");
89+
90 if (facets.size() > 0)
91 return;
92
93@@ -705,6 +712,8 @@
94 BoundaryCondition::LocalData& data,
95 std::string method) const
96 {
97+ Timer timer("DirichletBC compute bc");
98+
99+ // Set method if default
100 if (method == "default")
101 method = _method;
102
103=== modified file 'dolfin/fem/DirichletBC.h'
104--- dolfin/fem/DirichletBC.h 2012-02-29 13:22:33 +0000
105+++ dolfin/fem/DirichletBC.h 2012-02-29 22:48:20 +0000
106@@ -289,7 +289,11 @@
107 void apply(GenericMatrix& A, GenericVector& b,
108 const GenericVector& x) const;
109
110- /// Get Dirichlet dofs and values
111+ /// Get Dirichlet dofs and values. If a method other than 'pointwise' is
112+ /// used in parallel, the map may not be complete for local vertices since
113+ /// a vertex can have a bc applied, but the partition might not have a
114+ /// facet on the boundary. To ensure all local boundary dofs are marked,
115+ /// it is necessary to call gather() on the returned boundary values.
116 ///
117 /// *Arguments*
118 /// boundary_values (boost::unordered_map<uint, double>)
119
120=== added file 'dolfin/fem/SymmetricAssembler.cpp'
121--- dolfin/fem/SymmetricAssembler.cpp 1970-01-01 00:00:00 +0000
122+++ dolfin/fem/SymmetricAssembler.cpp 2012-02-29 22:48:20 +0000
123@@ -0,0 +1,588 @@
124+// Copyright (C) 2007-2011 Anders Logg
125+// Copyright (C) 2012 Joachim B. Haga
126+//
127+// This file is part of DOLFIN.
128+//
129+// DOLFIN is free software: you can redistribute it and/or modify
130+// it under the terms of the GNU Lesser General Public License as published by
131+// the Free Software Foundation, either version 3 of the License, or
132+// (at your option) any later version.
133+//
134+// DOLFIN is distributed in the hope that it will be useful,
135+// but WITHOUT ANY WARRANTY; without even the implied warranty of
136+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
137+// GNU Lesser General Public License for more details.
138+//
139+// You should have received a copy of the GNU Lesser General Public License
140+// along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
141+//
142+// First added: 2012-02-01 (modified from Assembler.cpp by jobh@simula.no)
143+
144+#include <boost/scoped_ptr.hpp>
145+
146+#include <dolfin/log/dolfin_log.h>
147+#include <dolfin/common/Timer.h>
148+#include <dolfin/common/utils.h>
149+#include <dolfin/parameter/GlobalParameters.h>
150+#include <dolfin/la/GenericMatrix.h>
151+#include <dolfin/mesh/Mesh.h>
152+#include <dolfin/mesh/Cell.h>
153+#include <dolfin/mesh/Facet.h>
154+#include <dolfin/mesh/MeshData.h>
155+#include <dolfin/mesh/MeshFunction.h>
156+#include <dolfin/mesh/SubDomain.h>
157+#include <dolfin/function/GenericFunction.h>
158+#include <dolfin/function/FunctionSpace.h>
159+#include "GenericDofMap.h"
160+#include "Form.h"
161+#include "UFC.h"
162+#include "FiniteElement.h"
163+#include "AssemblerTools.h"
164+#include "SymmetricAssembler.h"
165+
166+using namespace dolfin;
167+
168+/// The private implementation class. It holds all relevant parameters for a
169+/// single assemble, the implementation, and some scratch variables. Its
170+/// lifetime is never longer than the assemble itself, so it's safe to keep
171+/// references to parameters.
172+class SymmetricAssembler::PImpl
173+{
174+public:
175+ // User-provided parameters
176+ GenericMatrix& A;
177+ GenericMatrix& A_asymm;
178+ const Form& a;
179+ const std::vector<const DirichletBC*>& row_bcs;
180+ const std::vector<const DirichletBC*>& col_bcs;
181+ const MeshFunction<uint>* cell_domains;
182+ const MeshFunction<uint>* exterior_facet_domains;
183+ const MeshFunction<uint>* interior_facet_domains;
184+ bool reset_sparsity, add_values, finalize_tensor;
185+
186+ PImpl(GenericMatrix& _A, GenericMatrix& _A_asymm,
187+ const Form& _a,
188+ const std::vector<const DirichletBC*>& _row_bcs,
189+ const std::vector<const DirichletBC*>& _col_bcs,
190+ const MeshFunction<uint>* _cell_domains,
191+ const MeshFunction<uint>* _exterior_facet_domains,
192+ const MeshFunction<uint>* _interior_facet_domains,
193+ bool _reset_sparsity, bool _add_values, bool _finalize_tensor)
194+ : A(_A), A_asymm(_A_asymm), a(_a),
195+ row_bcs(_row_bcs), col_bcs(_col_bcs),
196+ cell_domains(_cell_domains),
197+ exterior_facet_domains(_exterior_facet_domains),
198+ interior_facet_domains(_interior_facet_domains),
199+ reset_sparsity(_reset_sparsity),
200+ add_values(_add_values),
201+ finalize_tensor(_finalize_tensor),
202+ mesh(_a.mesh()), ufc(_a), ufc_asymm(_a)
203+ {
204+ }
205+
206+ void assemble();
207+
208+private:
209+ void assemble_cells();
210+ void assemble_exterior_facets();
211+ void assemble_interior_facets();
212+
213+ // Adjust the columns of the local element tensor so that it becomes
214+ // symmetric once BCs have been applied to the rows. Returns true if any
215+ // columns have been moved to _asymm.
216+ bool make_bc_symmetric(std::vector<double>& elm_A, std::vector<double>& elm_A_asymm,
217+ const std::vector<const std::vector<uint>*>& dofs);
218+
219+ // These are derived from the variables above:
220+ const Mesh& mesh; // = Mesh(a)
221+ UFC ufc; // = UFC(a)
222+ UFC ufc_asymm; // = UFC(a), used for scratch local tensors
223+ bool matching_bcs; // true if row_bcs==col_bcs
224+ DirichletBC::Map row_bc_values; // derived from row_bcs
225+ DirichletBC::Map col_bc_values; // derived from col_bcs, but empty if matching_bcs
226+
227+ // These are used to keep track of which diagonals have been set:
228+ std::pair<uint,uint> processor_dof_range;
229+ std::set<uint> inserted_diagonals;
230+
231+ // Scratch variables
232+ std::vector<bool> local_row_is_bc;
233+};
234+//-----------------------------------------------------------------------------
235+void SymmetricAssembler::assemble(GenericMatrix& A,
236+ GenericMatrix& A_asymm,
237+ const Form& a,
238+ const std::vector<const DirichletBC*>& row_bcs,
239+ const std::vector<const DirichletBC*>& col_bcs,
240+ const MeshFunction<uint>* cell_domains,
241+ const MeshFunction<uint>* exterior_facet_domains,
242+ const MeshFunction<uint>* interior_facet_domains,
243+ bool reset_sparsity,
244+ bool add_values,
245+ bool finalize_tensor)
246+{
247+ PImpl pImpl(A, A_asymm, a, row_bcs, col_bcs,
248+ cell_domains, exterior_facet_domains, interior_facet_domains,
249+ reset_sparsity, add_values, finalize_tensor);
250+ pImpl.assemble();
251+}
252+//-----------------------------------------------------------------------------
253+void SymmetricAssembler::PImpl::assemble()
254+{
255+ // All assembler functions above end up calling this function, which
256+ // in turn calls the assembler functions below to assemble over
257+ // cells, exterior and interior facets.
258+ //
259+ // Important notes:
260+ //
261+ // 1. Note the importance of treating empty mesh functions as null
262+ // pointers for the PyDOLFIN interface.
263+ //
264+ // 2. Note that subdomains given as input to this function override
265+ // subdomains attached to forms, which in turn override subdomains
266+ // stored as part of the mesh.
267+
268+ // If the bcs match (which is the usual case), we are assembling a normal
269+ // square matrix which contains the diagonal (and the dofmaps should match,
270+ // too).
271+ matching_bcs = (row_bcs == col_bcs);
272+
273+ // Get Dirichlet dofs rows and values for local mesh
274+ for (uint i = 0; i < row_bcs.size(); ++i)
275+ {
276+ row_bcs[i]->get_boundary_values(row_bc_values);
277+ if (MPI::num_processes() > 1 && row_bcs[i]->method() != "pointwise")
278+ row_bcs[i]->gather(row_bc_values);
279+ }
280+ if (!matching_bcs)
281+ {
282+ // Get Dirichlet dofs columns and values for local mesh
283+ for (uint i = 0; i < col_bcs.size(); ++i)
284+ {
285+ col_bcs[i]->get_boundary_values(col_bc_values);
286+ if (MPI::num_processes() > 1 && col_bcs[i]->method() != "pointwise")
287+ col_bcs[i]->gather(col_bc_values);
288+ }
289+ }
290+
291+ dolfin_assert(a.rank() == 2);
292+
293+ // Get cell domains
294+ if (!cell_domains || cell_domains->size() == 0)
295+ {
296+ cell_domains = a.cell_domains_shared_ptr().get();
297+ if (!cell_domains)
298+ cell_domains = a.mesh().domains().cell_domains(a.mesh()).get();
299+ }
300+
301+ // Get exterior facet domains
302+ if (!exterior_facet_domains || exterior_facet_domains->size() == 0)
303+ {
304+ exterior_facet_domains = a.exterior_facet_domains_shared_ptr().get();
305+ if (!exterior_facet_domains)
306+ exterior_facet_domains = a.mesh().domains().facet_domains(a.mesh()).get();
307+ }
308+
309+ // Get interior facet domains
310+ if (!interior_facet_domains || interior_facet_domains->size() == 0)
311+ {
312+ interior_facet_domains = a.interior_facet_domains_shared_ptr().get();
313+ if (!interior_facet_domains)
314+ interior_facet_domains = a.mesh().domains().facet_domains(a.mesh()).get();
315+ }
316+
317+ // Check form
318+ AssemblerTools::check(a);
319+
320+ // Gather off-process coefficients
321+ const std::vector<boost::shared_ptr<const GenericFunction> >
322+ coefficients = a.coefficients();
323+ for (uint i = 0; i < coefficients.size(); ++i)
324+ coefficients[i]->gather();
325+
326+ // Initialize global tensors
327+ AssemblerTools::init_global_tensor(A, a, reset_sparsity, add_values);
328+ AssemblerTools::init_global_tensor(A_asymm, a, reset_sparsity, add_values);
329+
330+ // Get dofs that are local to this processor
331+ processor_dof_range = A.local_range(0);
332+
333+ // Assemble over cells
334+ assemble_cells();
335+
336+ // Assemble over exterior facets
337+ assemble_exterior_facets();
338+
339+ // Assemble over interior facets
340+ assemble_interior_facets();
341+
342+ // Finalize assembly of global tensor
343+ if (finalize_tensor)
344+ {
345+ A.apply("add");
346+ A_asymm.apply("add");
347+ }
348+}
349+//-----------------------------------------------------------------------------
350+void SymmetricAssembler::PImpl::assemble_cells()
351+{
352+ // Skip assembly if there are no cell integrals
353+ if (ufc.form.num_cell_domains() == 0)
354+ return;
355+
356+ // Set timer
357+ Timer timer("Assemble cells");
358+
359+ // Form rank
360+ const uint form_rank = ufc.form.rank();
361+
362+ // Collect pointers to dof maps
363+ std::vector<const GenericDofMap*> dofmaps;
364+ for (uint i = 0; i < form_rank; ++i)
365+ dofmaps.push_back(a.function_space(i)->dofmap().get());
366+
367+ // Vector to hold dof map for a cell
368+ std::vector<const std::vector<uint>* > dofs(form_rank);
369+
370+ // Cell integral
371+ dolfin_assert(ufc.cell_integrals.size() > 0);
372+ ufc::cell_integral* integral = ufc.cell_integrals[0].get();
373+
374+ // Assemble over cells
375+ Progress p(AssemblerTools::progress_message(A.rank(), "cells"), mesh.num_cells());
376+ for (CellIterator cell(mesh); !cell.end(); ++cell)
377+ {
378+ // Get integral for sub domain (if any)
379+ if (cell_domains && cell_domains->size() > 0)
380+ {
381+ const uint domain = (*cell_domains)[*cell];
382+ if (domain < ufc.form.num_cell_domains())
383+ integral = ufc.cell_integrals[domain].get();
384+ else
385+ continue;
386+ }
387+
388+ // Skip integral if zero
389+ if (!integral)
390+ continue;
391+
392+ // Update to current cell
393+ ufc.update(*cell);
394+
395+ // Get local-to-global dof maps for cell
396+ for (uint i = 0; i < form_rank; ++i)
397+ dofs[i] = &(dofmaps[i]->cell_dofs(cell->index()));
398+
399+ // Tabulate cell tensor
400+ integral->tabulate_tensor(&ufc.A[0], ufc.w(), ufc.cell);
401+
402+ // Apply boundary conditions
403+ const bool asymm_changed = make_bc_symmetric(ufc.A, ufc_asymm.A, dofs);
404+
405+ // Add entries to global tensor.
406+ A.add(&ufc.A[0], dofs);
407+ if (asymm_changed)
408+ A_asymm.add(&ufc_asymm.A[0], dofs);
409+
410+ p++;
411+ }
412+}
413+//-----------------------------------------------------------------------------
414+void SymmetricAssembler::PImpl::assemble_exterior_facets()
415+{
416+ // Skip assembly if there are no exterior facet integrals
417+ if (ufc.form.num_exterior_facet_domains() == 0)
418+ return;
419+ Timer timer("Assemble exterior facets");
420+
421+ // Extract mesh
422+ const Mesh& mesh = a.mesh();
423+
424+ // Form rank
425+ const uint form_rank = ufc.form.rank();
426+
427+ // Collect pointers to dof maps
428+ std::vector<const GenericDofMap*> dofmaps;
429+ for (uint i = 0; i < form_rank; ++i)
430+ dofmaps.push_back(a.function_space(i)->dofmap().get());
431+
432+ // Vector to hold dof map for a cell
433+ std::vector<const std::vector<uint>* > dofs(form_rank);
434+
435+ // Exterior facet integral
436+ dolfin_assert(ufc.exterior_facet_integrals.size() > 0);
437+ const ufc::exterior_facet_integral*
438+ integral = ufc.exterior_facet_integrals[0].get();
439+
440+ // Compute facets and facet - cell connectivity if not already computed
441+ const uint D = mesh.topology().dim();
442+ mesh.init(D - 1);
443+ mesh.init(D - 1, D);
444+ dolfin_assert(mesh.ordered());
445+
446+ // Assemble over exterior facets (the cells of the boundary)
447+ Progress p(AssemblerTools::progress_message(A.rank(), "exterior facets"),
448+ mesh.num_facets());
449+ for (FacetIterator facet(mesh); !facet.end(); ++facet)
450+ {
451+ // Only consider exterior facets
452+ if (!facet->exterior())
453+ {
454+ p++;
455+ continue;
456+ }
457+
458+ // Get integral for sub domain (if any)
459+ if (exterior_facet_domains && exterior_facet_domains->size() > 0)
460+ {
461+ const uint domain = (*exterior_facet_domains)[*facet];
462+ if (domain < ufc.form.num_exterior_facet_domains())
463+ integral = ufc.exterior_facet_integrals[domain].get();
464+ else
465+ continue;
466+ }
467+
468+ // Skip integral if zero
469+ if (!integral)
470+ continue;
471+
472+ // Get mesh cell to which mesh facet belongs (pick first, there is only one)
473+ dolfin_assert(facet->num_entities(D) == 1);
474+ Cell mesh_cell(mesh, facet->entities(D)[0]);
475+
476+ // Get local index of facet with respect to the cell
477+ const uint local_facet = mesh_cell.index(*facet);
478+
479+ // Update to current cell
480+ ufc.update(mesh_cell, local_facet);
481+
482+ // Get local-to-global dof maps for cell
483+ for (uint i = 0; i < form_rank; ++i)
484+ dofs[i] = &(dofmaps[i]->cell_dofs(mesh_cell.index()));
485+
486+ // Tabulate exterior facet tensor
487+ integral->tabulate_tensor(&ufc.A[0], ufc.w(), ufc.cell, local_facet);
488+
489+ // Apply boundary conditions
490+ const bool asymm_changed = make_bc_symmetric(ufc.A, ufc_asymm.A, dofs);
491+
492+ // Add entries to global tensor
493+ A.add(&ufc.A[0], dofs);
494+ if (asymm_changed)
495+ A_asymm.add(&ufc_asymm.A[0], dofs);
496+
497+ p++;
498+ }
499+}
500+//-----------------------------------------------------------------------------
501+void SymmetricAssembler::PImpl::assemble_interior_facets()
502+{
503+ // Skip assembly if there are no interior facet integrals
504+ if (ufc.form.num_interior_facet_domains() == 0)
505+ return;
506+
507+ not_working_in_parallel("Assembly over interior facets");
508+
509+ Timer timer("Assemble interior facets");
510+
511+ // Extract mesh and coefficients
512+ const Mesh& mesh = a.mesh();
513+
514+ // Form rank
515+ const uint form_rank = ufc.form.rank();
516+
517+ // Collect pointers to dof maps
518+ std::vector<const GenericDofMap*> dofmaps;
519+ for (uint i = 0; i < form_rank; ++i)
520+ dofmaps.push_back(a.function_space(i)->dofmap().get());
521+
522+ // Vector to hold dofs for cells
523+ std::vector<std::vector<uint> > macro_dofs(form_rank);
524+ std::vector<const std::vector<uint>*> macro_dof_ptrs(form_rank);
525+ for (uint i = 0; i < form_rank; i++)
526+ macro_dof_ptrs[i] = &macro_dofs[i];
527+
528+ // Interior facet integral
529+ dolfin_assert(ufc.interior_facet_integrals.size() > 0);
530+ const ufc::interior_facet_integral*
531+ integral = ufc.interior_facet_integrals[0].get();
532+
533+ // Compute facets and facet - cell connectivity if not already computed
534+ const uint D = mesh.topology().dim();
535+ mesh.init(D - 1);
536+ mesh.init(D - 1, D);
537+ dolfin_assert(mesh.ordered());
538+
539+ // Get interior facet directions (if any)
540+ boost::shared_ptr<MeshFunction<unsigned int> >
541+ facet_orientation = mesh.data().mesh_function("facet_orientation");
542+ if (facet_orientation && facet_orientation->dim() != D - 1)
543+ {
544+ dolfin_error("Assembler.cpp",
545+ "assemble form over interior facets",
546+ "Expecting facet orientation to be defined on facets (not dimension %d)",
547+ facet_orientation->dim());
548+ }
549+
550+ // Assemble over interior facets (the facets of the mesh)
551+ Progress p(AssemblerTools::progress_message(A.rank(), "interior facets"),
552+ mesh.num_facets());
553+ for (FacetIterator facet(mesh); !facet.end(); ++facet)
554+ {
555+ // Only consider interior facets
556+ if (facet->exterior())
557+ {
558+ p++;
559+ continue;
560+ }
561+
562+ // Get integral for sub domain (if any)
563+ if (interior_facet_domains && interior_facet_domains->size() > 0)
564+ {
565+ const uint domain = (*interior_facet_domains)[*facet];
566+ if (domain < ufc.form.num_interior_facet_domains())
567+ integral = ufc.interior_facet_integrals[domain].get();
568+ else
569+ continue;
570+ }
571+
572+ // Skip integral if zero
573+ if (!integral)
574+ continue;
575+
576+ // Get cells incident with facet
577+ std::pair<const Cell, const Cell>
578+ cells = facet->adjacent_cells(facet_orientation.get());
579+ const Cell& cell0 = cells.first;
580+ const Cell& cell1 = cells.second;
581+
582+ // Get local index of facet with respect to each cell
583+ uint local_facet0 = cell0.index(*facet);
584+ uint local_facet1 = cell1.index(*facet);
585+
586+ // Update to current pair of cells
587+ ufc.update(cell0, local_facet0, cell1, local_facet1);
588+
589+ // Tabulate dofs for each dimension on macro element
590+ for (uint i = 0; i < form_rank; ++i)
591+ {
592+ // Get dofs for each cell
593+ const std::vector<uint>& cell_dofs0 = dofmaps[i]->cell_dofs(cell0.index());
594+ const std::vector<uint>& cell_dofs1 = dofmaps[i]->cell_dofs(cell1.index());
595+
596+ // Create space in macro dof vector
597+ macro_dofs[i].resize(cell_dofs0.size() + cell_dofs1.size());
598+
599+ // Copy cell dofs into macro dof vector
600+ std::copy(cell_dofs0.begin(), cell_dofs0.end(),
601+ macro_dofs[i].begin());
602+ std::copy(cell_dofs1.begin(), cell_dofs1.end(),
603+ macro_dofs[i].begin() + cell_dofs0.size());
604+ }
605+
606+    // Tabulate interior facet tensor on macro element
607+ integral->tabulate_tensor(&ufc.macro_A[0], ufc.macro_w(), ufc.cell0, ufc.cell1,
608+ local_facet0, local_facet1);
609+
610+ // Apply boundary conditions
611+ const bool asymm_changed = make_bc_symmetric(ufc.macro_A, ufc_asymm.macro_A, macro_dof_ptrs);
612+
613+ // Add entries to global tensor
614+ A.add(&ufc.macro_A[0], macro_dofs);
615+ if (asymm_changed)
616+ A_asymm.add(&ufc_asymm.macro_A[0], macro_dofs);
617+
618+ p++;
619+ }
620+}
621+//-----------------------------------------------------------------------------
622+bool SymmetricAssembler::PImpl::make_bc_symmetric(std::vector<double>& local_A,
623+ std::vector<double>& local_A_asymm,
624+ const std::vector<const std::vector<uint>*>& dofs)
625+{
626+ // Get local dimensions
627+ const uint num_local_rows = dofs[0]->size();
628+ const uint num_local_cols = dofs[1]->size();
629+
630+ // Return value, true if columns have been moved to _asymm
631+ bool columns_moved = false;
632+
633+ // Convenience aliases
634+ const std::vector<uint>& row_dofs = *dofs[0];
635+ const std::vector<uint>& col_dofs = *dofs[1];
636+
637+ if (matching_bcs && row_dofs!=col_dofs)
638+ dolfin_error("SymmetricAssembler.cpp",
639+ "make_bc_symmetric",
640+ "Same BCs are used for rows and columns, but dofmaps don't match");
641+
642+ // Store the local boundary conditions, to avoid multiple searches in the
643+ // (common) case of matching_bcs
644+ local_row_is_bc.resize(num_local_rows);
645+ for (uint row = 0; row < num_local_rows; ++row)
646+ {
647+ DirichletBC::Map::const_iterator bc_item = row_bc_values.find(row_dofs[row]);
648+ local_row_is_bc[row] = (bc_item != row_bc_values.end());
649+ }
650+
651+ // Clear matrix rows belonging to BCs. These modifications destroy symmetry.
652+ for (uint row = 0; row < num_local_rows; ++row)
653+ {
654+ // Do nothing if row is not affected by BCs
655+ if (!local_row_is_bc[row])
656+ continue;
657+
658+ // Zero out the row
659+ zerofill(&local_A[row*num_local_cols], num_local_cols);
660+
661+ // Set the diagonal if we're in a diagonal block
662+ if (matching_bcs)
663+ {
664+ // ...but only set it on the owning processor
665+ const uint dof = row_dofs[row];
666+ if (dof >= processor_dof_range.first && dof < processor_dof_range.second)
667+ {
668+ // ...and only once.
669+ const bool already_inserted = !inserted_diagonals.insert(dof).second;
670+ if (!already_inserted)
671+ local_A[row + row*num_local_cols] = 1.0;
672+ }
673+ }
674+ }
675+
676+ // Modify matrix columns belonging to BCs. These modifications restore
677+ // symmetry, but the entries must be moved to the asymm matrix instead of
678+ // just cleared.
679+ for (uint col = 0; col < num_local_cols; ++col)
680+ {
681+ // Do nothing if column is not affected by BCs
682+ if (matching_bcs) {
683+ if (!local_row_is_bc[col])
684+ continue;
685+ }
686+ else
687+ {
688+ DirichletBC::Map::const_iterator bc_item = col_bc_values.find(col_dofs[col]);
689+ if (bc_item == col_bc_values.end())
690+ continue;
691+ }
692+
693+ // Zero the asymmetric part before use
694+ if (!columns_moved)
695+ {
696+ zerofill(local_A_asymm);
697+ columns_moved = true;
698+ }
699+
700+ // Move the column to A_asymm, zero it in A
701+ for (uint row = 0; row < num_local_rows; ++row)
702+ if (!local_row_is_bc[row])
703+ {
704+ const uint entry = col + row*num_local_cols;
705+ local_A_asymm[entry] = local_A[entry];
706+ local_A[entry] = 0.0;
707+ }
708+ }
709+
710+ return columns_moved;
711+}
712
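The column-moving logic in `make_bc_symmetric` above can be illustrated on a dense local element matrix. The following is a standalone NumPy model of the same steps (a sketch, not DOLFIN API): first zero the BC rows and set a unit diagonal, which destroys symmetry; then move the remaining BC-column entries into N, which restores it. It also checks the identity from the merge description: with BCs applied to b, (S + N) x = b implies S x = b - N b.

```python
import numpy as np

def make_bc_symmetric(A, bc):
    """NumPy model (not DOLFIN API) of SymmetricAssembler's local logic.

    A  : (n, n) local element matrix
    bc : boolean mask, True where a dof carries a Dirichlet BC
    Returns (S, N) with S symmetric (when A is) and S + N equal to A
    after the usual BC row-zeroing with unit diagonal.
    """
    S = A.astype(float).copy()
    N = np.zeros_like(S)
    # Clear BC rows and set the diagonal; this destroys symmetry
    S[bc, :] = 0.0
    S[bc, bc] = 1.0
    # Move BC columns (in non-BC rows) into N; this restores symmetry
    free = ~bc
    N[np.ix_(free, bc)] = S[np.ix_(free, bc)]
    S[np.ix_(free, bc)] = 0.0
    return S, N

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
bc = np.array([True, False, False])
S, N = make_bc_symmetric(A, bc)

# S is symmetric; S + N is A with the BC row replaced by an identity row
assert np.allclose(S, S.T)

# With the Dirichlet value already in b, solving (S + N) x = b
# gives the same x as solving S x = b - N b
b = np.array([2.0, 5.0, 3.0])      # b[0] is the Dirichlet value
x = np.linalg.solve(S + N, b)
assert np.allclose(S.dot(x), b - N.dot(b))
```

In the assembler the same operations run per element on the flat `local_A` array, with `inserted_diagonals` guarding against setting a diagonal entry more than once across elements.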
713=== added file 'dolfin/fem/SymmetricAssembler.h'
714--- dolfin/fem/SymmetricAssembler.h 1970-01-01 00:00:00 +0000
715+++ dolfin/fem/SymmetricAssembler.h 2012-02-29 22:48:20 +0000
716@@ -0,0 +1,82 @@
717+// Copyright (C) 2012 Joachim B. Haga
718+//
719+// This file is part of DOLFIN.
720+//
721+// DOLFIN is free software: you can redistribute it and/or modify
722+// it under the terms of the GNU Lesser General Public License as published by
723+// the Free Software Foundation, either version 3 of the License, or
724+// (at your option) any later version.
725+//
726+// DOLFIN is distributed in the hope that it will be useful,
727+// but WITHOUT ANY WARRANTY; without even the implied warranty of
728+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
729+// GNU Lesser General Public License for more details.
730+//
731+// You should have received a copy of the GNU Lesser General Public License
732+// along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
733+//
734+// First added: 2012-01-26 (jobh@simula.no)
735+
736+#ifndef __SYMMETRIC_ASSEMBLER_H
737+#define __SYMMETRIC_ASSEMBLER_H
738+
739+#include <map>
740+#include <vector>
741+#include <boost/scoped_ptr.hpp>
742+#include <dolfin/common/types.h>
743+#include "Form.h"
744+#include "DirichletBC.h"
745+
746+namespace dolfin
747+{
748+  /// This class implements an assembler for systems of the form Ax = b.
749+  /// Its assembly algorithm is similar to SystemAssembler's, but it saves
750+  /// the matrix modifications into a separate tensor so that it can later
751+  /// apply the symmetric modifications to a RHS vector.
752+
753+ /// The non-symmetric part is only nonzero in BC columns, and is zero in all BC
754+ /// rows, so that [(A_s+A_n) x = b] implies [A_s x = b - A_n b], IF b has
755+ /// boundary conditions applied. (If the final A is composed from a sum of
756+ /// A_s matrices, BCs must be reapplied to make the diagonal value for BC
757+ /// dofs 1.0. The matrix will remain symmetric since only the diagonal is
758+ /// changed.)
759+ ///
760+ /// *Example*
761+ ///
762+ /// .. code-block:: c++
763+ ///
764+  ///        std::vector<const DirichletBC*> bcs = {&bc};
765+ /// SymmetricAssembler::assemble(A, A_n, a, bcs, bcs);
766+ /// Assembler::assemble(b, L);
767+  ///        bc.apply(b);
768+ /// A_n.mult(b, b_mod);
769+ /// b -= b_mod;
770+
771+ class SymmetricAssembler
772+ {
773+ public:
774+
775+ /// Assemble A and apply Dirichlet boundary conditions. Returns two
776+ /// matrices, where the second contains the symmetric modifications
777+ /// suitable for modifying RHS vectors.
778+ ///
779+ /// Note: row_bcs and col_bcs will normally be the same, but are different
780+ /// for e.g. off-diagonal block matrices in a mixed PDE.
781+ static void assemble(GenericMatrix &A,
782+ GenericMatrix &A_nonsymm,
783+ const Form &a,
784+ const std::vector<const DirichletBC*> &row_bcs,
785+ const std::vector<const DirichletBC*> &col_bcs,
786+ const MeshFunction<uint> *cell_domains=NULL,
787+ const MeshFunction<uint> *exterior_facet_domains=NULL,
788+ const MeshFunction<uint> *interior_facet_domains=NULL,
789+ bool reset_sparsity=true,
790+ bool add_values=false,
791+ bool finalize_tensor=true);
792+
793+ private:
794+ class PImpl;
795+ };
796+}
797+
798+#endif
799
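The header notes that `row_bcs` and `col_bcs` differ for off-diagonal blocks of a mixed PDE. A standalone NumPy sketch (hypothetical names, not DOLFIN API) of what that means for a rectangular coupling block: rows hit by a row-BC are cleared outright (no unit diagonal is inserted, since the block is off the diagonal), and column-BC entries are moved into the non-symmetric part.

```python
import numpy as np

def split_coupling_block(B, row_bc, col_bc):
    """Sketch of symmetric BC handling for an off-diagonal (coupling)
    block, where row and column BCs belong to different spaces.
    No diagonal entry is inserted for an off-diagonal block."""
    S = B.astype(float).copy()
    N = np.zeros_like(S)
    S[row_bc, :] = 0.0                                  # clear row-BC rows
    free = ~row_bc
    N[np.ix_(free, col_bc)] = S[np.ix_(free, col_bc)]   # move col-BC columns
    S[np.ix_(free, col_bc)] = 0.0
    return S, N

B = np.arange(12.0).reshape(3, 4)
row_bc = np.array([True, False, False])
col_bc = np.array([False, True, False, False])
S, N = split_coupling_block(B, row_bc, col_bc)

# S has nothing left in BC rows or BC columns
assert not S[row_bc, :].any() and not S[:, col_bc].any()
# N is nonzero only in col-BC columns
assert not N[:, ~col_bc].any()
```

As in the square case, S + N reproduces the block after the usual row-zeroing, so several such blocks can be summed before the final BC reapplication mentioned in the class documentation.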
800=== modified file 'dolfin/fem/SystemAssembler.cpp'
801--- dolfin/fem/SystemAssembler.cpp 2012-02-29 13:22:33 +0000
802+++ dolfin/fem/SystemAssembler.cpp 2012-02-29 22:48:20 +0000
803@@ -673,7 +673,7 @@
804
805 // Resize dof vector
806 a_macro_dofs[0].resize(a0_dofs0.size() + a0_dofs1.size());
807- a_macro_dofs[1].resize(a0_dofs1.size() + a1_dofs1.size());
808+ a_macro_dofs[1].resize(a1_dofs0.size() + a1_dofs1.size());
809 L_macro_dofs[0].resize(L_dofs0.size() + L_dofs1.size());
810
811 // Tabulate dofs for each dimension on macro element
812
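The one-line SystemAssembler fix above matters because, on an interior facet, the macro element concatenates the dofs of the two adjacent cells separately for each tensor dimension; the column vector must therefore be sized from a1's dof lists, not a0's, since rows and columns may come from spaces of different size. A minimal sketch with hypothetical dof lists (not DOLFIN API):

```python
def macro_dofs(dofs_cell0, dofs_cell1):
    """Concatenate per-cell dofs into macro-element dofs.

    Each argument holds one dof list per tensor dimension
    (e.g. [rows, cols]); each dimension is sized from its own
    two cell dof lists, which may differ in length."""
    return [list(d0) + list(d1)
            for d0, d1 in zip(dofs_cell0, dofs_cell1)]

# Rows from one space (3 dofs/cell), columns from another (2 dofs/cell)
macro = macro_dofs([[0, 1, 2], [10, 11]], [[2, 3, 4], [11, 12]])
assert macro == [[0, 1, 2, 2, 3, 4], [10, 11, 11, 12]]
```

With the pre-fix sizing, the column macro vector would have been resized to len(a0 cell1 dofs) + len(a1 cell1 dofs), which is wrong whenever the two forms' column spaces differ.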
813=== modified file 'dolfin/fem/assemble.cpp'
814--- dolfin/fem/assemble.cpp 2011-11-14 18:20:22 +0000
815+++ dolfin/fem/assemble.cpp 2012-02-29 22:48:20 +0000
816@@ -17,14 +17,16 @@
817 //
818 // Modified by Garth N. Wells, 2008.
819 // Modified by Johan Hake, 2009.
820+// Modified by Joachim B. Haga, 2012.
821 //
822 // First added: 2007-01-17
823-// Last changed: 2011-11-13
824+// Last changed: 2012-02-01
825
826 #include <dolfin/la/Scalar.h>
827 #include "Form.h"
828 #include "Assembler.h"
829 #include "SystemAssembler.h"
830+#include "SymmetricAssembler.h"
831 #include "assemble.h"
832
833 using namespace dolfin;
834@@ -122,6 +124,39 @@
835 reset_sparsity, add_values, finalize_tensor);
836 }
837 //-----------------------------------------------------------------------------
838+void dolfin::symmetric_assemble(GenericMatrix& As,
839+ GenericMatrix& An,
840+ const Form& a,
841+ const std::vector<const DirichletBC*>& bcs,
842+ const MeshFunction<unsigned int>* cell_domains,
843+ const MeshFunction<unsigned int>* exterior_facet_domains,
844+ const MeshFunction<unsigned int>* interior_facet_domains,
845+ bool reset_sparsity,
846+ bool add_values,
847+ bool finalize_tensor)
848+{
849+ SymmetricAssembler::assemble(As, An, a, bcs, bcs,
850+ cell_domains, exterior_facet_domains, interior_facet_domains,
851+ reset_sparsity, add_values, finalize_tensor);
852+}
853+//-----------------------------------------------------------------------------
854+void dolfin::symmetric_assemble(GenericMatrix& As,
855+ GenericMatrix& An,
856+ const Form& a,
857+ const std::vector<const DirichletBC*>& row_bcs,
858+ const std::vector<const DirichletBC*>& col_bcs,
859+ const MeshFunction<unsigned int>* cell_domains,
860+ const MeshFunction<unsigned int>* exterior_facet_domains,
861+ const MeshFunction<unsigned int>* interior_facet_domains,
862+ bool reset_sparsity,
863+ bool add_values,
864+ bool finalize_tensor)
865+{
866+ SymmetricAssembler::assemble(As, An, a, row_bcs, col_bcs,
867+ cell_domains, exterior_facet_domains, interior_facet_domains,
868+ reset_sparsity, add_values, finalize_tensor);
869+}
870+//-----------------------------------------------------------------------------
871 double dolfin::assemble(const Form& a,
872 bool reset_sparsity,
873 bool add_values,
874
875=== modified file 'dolfin/fem/assemble.h'
876--- dolfin/fem/assemble.h 2011-10-03 13:19:12 +0000
877+++ dolfin/fem/assemble.h 2012-02-29 22:48:20 +0000
878@@ -17,9 +17,10 @@
879 //
880 // Modified by Garth N. Wells, 2008, 2009.
881 // Modified by Johan Hake, 2009.
882+// Modified by Joachim B. Haga, 2012.
883 //
884 // First added: 2007-01-17
885-// Last changed: 2011-09-29
886+// Last changed: 2012-02-01
887 //
888 // This file duplicates the Assembler::assemble* and SystemAssembler::assemble*
889 // functions in namespace dolfin, and adds special versions returning the value
890@@ -113,6 +114,37 @@
891 bool add_values=false,
892 bool finalize_tensor=true);
893
894+ /// Symmetric assembly of As, storing the modifications in An. To create
895+ /// matching RHS, assemble and apply bcs normally, then subtract An*b.
896+ /// In this variant of symmetric_assemble, rows and columns use the same BCs.
897+ void symmetric_assemble(GenericMatrix& As,
898+ GenericMatrix& An,
899+ const Form& a,
900+ const std::vector<const DirichletBC*>& bcs,
901+ const MeshFunction<unsigned int>* cell_domains=NULL,
902+ const MeshFunction<unsigned int>* exterior_facet_domains=NULL,
903+ const MeshFunction<unsigned int>* interior_facet_domains=NULL,
904+ bool reset_sparsity=true,
905+ bool add_values=false,
906+ bool finalize_tensor=true);
907+
908+ /// Symmetric assembly of As, storing the modifications in An. To create
909+ /// matching RHS, assemble and apply bcs normally, then subtract An*b.
910+ /// In this variant of symmetric_assemble, rows and columns use (potentially)
911+ /// different BCs. The BCs will be different for example in coupling
912+ /// (i.e., off-diagonal) blocks of a block matrix.
913+ void symmetric_assemble(GenericMatrix& As,
914+ GenericMatrix& An,
915+ const Form& a,
916+ const std::vector<const DirichletBC*>& row_bcs,
917+ const std::vector<const DirichletBC*>& col_bcs,
918+ const MeshFunction<unsigned int>* cell_domains=NULL,
919+ const MeshFunction<unsigned int>* exterior_facet_domains=NULL,
920+ const MeshFunction<unsigned int>* interior_facet_domains=NULL,
921+ bool reset_sparsity=true,
922+ bool add_values=false,
923+ bool finalize_tensor=true);
924+
925 //--- Specialized versions for scalars ---
926
927 /// Assemble scalar
928
929=== modified file 'dolfin/fem/dolfin_fem.h'
930--- dolfin/fem/dolfin_fem.h 2011-06-30 22:15:54 +0000
931+++ dolfin/fem/dolfin_fem.h 2012-02-29 22:48:20 +0000
932@@ -17,6 +17,7 @@
933 #include <dolfin/fem/Form.h>
934 #include <dolfin/fem/Assembler.h>
935 #include <dolfin/fem/SparsityPatternBuilder.h>
936+#include <dolfin/fem/SymmetricAssembler.h>
937 #include <dolfin/fem/SystemAssembler.h>
938 #include <dolfin/fem/LinearVariationalProblem.h>
939 #include <dolfin/fem/LinearVariationalSolver.h>
940
941=== modified file 'site-packages/dolfin/fem/assembling.py'
942--- site-packages/dolfin/fem/assembling.py 2011-11-14 21:54:12 +0000
943+++ site-packages/dolfin/fem/assembling.py 2012-02-29 22:48:20 +0000
944@@ -29,11 +29,12 @@
945 # Modified by Martin Sandve Alnaes, 2008.
946 # Modified by Johan Hake, 2008-2009.
947 # Modified by Garth N. Wells, 2008-2009.
948+# Modified by Joachim B. Haga, 2012.
949 #
950 # First added: 2007-08-15
951-# Last changed: 2010-11-04
952+# Last changed: 2012-02-01
953
954-__all__ = ["assemble", "assemble_system"]
955+__all__ = ["assemble", "assemble_system", "symmetric_assemble"]
956
957 import types
958
959@@ -59,7 +60,9 @@
960 add_values=False,
961 finalize_tensor=True,
962 backend=None,
963- form_compiler_parameters=None):
964+ form_compiler_parameters=None,
965+ bcs=None,
966+ symmetric_mod=None):
967 """
968 Assemble the given form and return the corresponding tensor.
969
970@@ -182,6 +185,14 @@
971 if dolfin_form.rank() == 0:
972 tensor = tensor.getval()
973
974+ # Apply (possibly list of) boundary conditions
975+ for bc in _wrap_in_list(bcs, 'bcs', cpp.DirichletBC):
976+ bc.apply(tensor)
977+
978+ # Apply symmetric modification
979+ if symmetric_mod:
980+ tensor -= symmetric_mod*tensor
981+
982 # Return value
983 return tensor
984
985@@ -261,12 +272,7 @@
986 interior_facet_domains)
987
988 # Check bcs
989- if not isinstance(bcs,(types.NoneType,list,cpp.DirichletBC)):
990- raise TypeError, "expected a 'list', or a 'DirichletBC' as bcs argument"
991- if bcs is None:
992- bcs = []
993- elif isinstance(bcs,cpp.DirichletBC):
994- bcs = [bcs]
995+ bcs = _wrap_in_list(bcs, 'bcs', cpp.DirichletBC)
996
997 # Call C++ assemble function
998 cpp.assemble_system(A_tensor,
999@@ -284,6 +290,113 @@
1000
1001 return A_tensor, b_tensor
1002
1003+# JIT symmetric assembler
1004+def symmetric_assemble(A_form,
1005+ bcs=None,
1006+ row_bcs=None,
1007+ col_bcs=None,
1008+ A_coefficients=None,
1009+ A_function_spaces=None,
1010+ cell_domains=None,
1011+ exterior_facet_domains=None,
1012+ interior_facet_domains=None,
1013+ reset_sparsity=True,
1014+ add_values=False,
1015+ finalize_tensor=True,
1016+ As_tensor=None,
1017+ An_tensor=None,
1018+ backend=None,
1019+ form_compiler_parameters=None,
1020+ bc_diagonal_value=1.0):
1021+ """
1022+ Assemble form(s) and apply any given boundary conditions in a
1023+ symmetric fashion and return tensor(s).
1024+
1025+ The standard application of boundary conditions does not
1026+ necessarily preserve the symmetry of the assembled matrix.
1027+
1028+ *Examples of usage*
1029+
1030+ For instance, the statements
1031+
1032+ .. code-block:: python
1033+
1034+ A = assemble(a)
1035+ b = assemble(L)
1036+ bc.apply(A, b)
1037+
1038+ can alternatively be carried out by
1039+
1040+ .. code-block:: python
1041+
1042+        A, A_n = symmetric_assemble(a, bcs=bc)
1043+        b = assemble(L, bcs=bc)
1044+        b -= A_n*b
1045+        # equivalently: b = assemble(L, bcs=bc, symmetric_mod=A_n)
1046+
1047+ The statement above is valid even if ``bc`` is a list of
1048+ :py:class:`DirichletBC <dolfin.fem.bcs.DirichletBC>`
1049+ instances. For more info and options, see :py:func:`assemble
1050+ <dolfin.fem.assembling.assemble>`.
1051+
1052+ """
1053+
1054+ # Extract subdomains
1055+ subdomains = { "cell": cell_domains,
1056+ "exterior_facet": exterior_facet_domains,
1057+ "interior_facet": interior_facet_domains}
1058+
1059+ # Wrap forms
1060+ A_dolfin_form = Form(A_form, A_function_spaces, A_coefficients,
1061+ subdomains, form_compiler_parameters)
1062+
1063+ # Create tensors
1064+ As_tensor = _create_tensor(A_form, A_dolfin_form.rank(), backend, As_tensor)
1065+ An_tensor = _create_tensor(A_form, A_dolfin_form.rank(), backend, An_tensor)
1066+
1067+ # Extract domains
1068+ cell_domains, exterior_facet_domains, interior_facet_domains = \
1069+ _extract_domains(A_dolfin_form.mesh(),
1070+ cell_domains,
1071+ exterior_facet_domains,
1072+ interior_facet_domains)
1073+
1074+ # Check bcs
1075+ if bcs is None:
1076+ row_bcs = _wrap_in_list(row_bcs, 'row_bcs', cpp.DirichletBC)
1077+ col_bcs = _wrap_in_list(col_bcs, 'col_bcs', cpp.DirichletBC)
1078+ else:
1079+ if row_bcs is not None or col_bcs is not None:
1080+ raise TypeError("supply 'bcs' or 'row_bcs'/'col_bcs', not both")
1081+ row_bcs = col_bcs = _wrap_in_list(bcs, 'bcs', cpp.DirichletBC)
1082+
1083+ # Call C++ assemble function
1084+ cpp.symmetric_assemble(As_tensor,
1085+ An_tensor,
1086+ A_dolfin_form,
1087+ row_bcs,
1088+ col_bcs,
1089+ cell_domains,
1090+ exterior_facet_domains,
1091+ interior_facet_domains,
1092+ reset_sparsity,
1093+ add_values,
1094+ finalize_tensor)
1095+
1096+ return As_tensor, An_tensor
1097+
1098+def _wrap_in_list(obj, name, types=type):
1099+ if obj is None:
1100+ lst = []
1101+ elif hasattr(obj, '__iter__'):
1102+ lst = list(obj)
1103+ else:
1104+ lst = [obj]
1105+ for obj in lst:
1106+ if not isinstance(obj, types):
1107+ raise TypeError("expected a (list of) %s as '%s' argument" % (str(types),name))
1108+ return lst
1109+
1110 def _create_tensor(form, rank, backend, tensor):
1111 "Create tensor for form"
1112
1113
1114=== added file 'test/unit/fem/python/SymmetricAssembler.py'
1115--- test/unit/fem/python/SymmetricAssembler.py 1970-01-01 00:00:00 +0000
1116+++ test/unit/fem/python/SymmetricAssembler.py 2012-02-29 22:48:20 +0000
1117@@ -0,0 +1,181 @@
1118+"""Unit tests for class SymmetricAssembler"""
1119+
1120+# Copyright (C) 2011 Garth N. Wells
1121+# Copyright (C) 2012 Joachim B. Haga
1122+#
1123+# This file is part of DOLFIN.
1124+#
1125+# DOLFIN is free software: you can redistribute it and/or modify
1126+# it under the terms of the GNU Lesser General Public License as published by
1127+# the Free Software Foundation, either version 3 of the License, or
1128+# (at your option) any later version.
1129+#
1130+# DOLFIN is distributed in the hope that it will be useful,
1131+# but WITHOUT ANY WARRANTY; without even the implied warranty of
1132+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
1133+# GNU Lesser General Public License for more details.
1134+#
1135+# You should have received a copy of the GNU Lesser General Public License
1136+# along with DOLFIN. If not, see <http://www.gnu.org/licenses/>.
1137+#
1138+# First added: 2012-01-01 (modified from SystemAssembler.py by jobh@simula.no)
1139+
1140+import unittest
1141+import numpy
1142+from dolfin import *
1143+
1144+class TestSymmetricAssembler(unittest.TestCase):
1145+
1146+ def _check_against_reference(self, a, L, bc):
1147+
1148+ # Assemble LHS using symmetric assembler
1149+ A, A_n = symmetric_assemble(a, bcs=bc)
1150+
1151+ # Assemble LHS using regular assembler
1152+ A_ref = assemble(a, bcs=bc)
1153+
1154+        # Check that the symmetric assembly matches the reference
1155+ N = A + A_n - A_ref
1156+ self.assertAlmostEqual(N.norm("frobenius"), 0.0, 10)
1157+
1158+ # Check that A is symmetric
1159+ X = assemble(L) # just to get the size
1160+ X.set_local(numpy.random.random(X.local_size()))
1161+ AT_X = Vector()
1162+ A.transpmult(X, AT_X)
1163+ N = A*X - AT_X
1164+ self.assertAlmostEqual(N.norm("l2"), 0.0, 10)
1165+
1166+ def test_cell_assembly(self):
1167+
1168+ mesh = UnitCube(4, 4, 4)
1169+ V = VectorFunctionSpace(mesh, "CG", 1)
1170+
1171+ v = TestFunction(V)
1172+ u = TrialFunction(V)
1173+ f = Constant((10, 20, 30))
1174+
1175+ def epsilon(v):
1176+ return 0.5*(grad(v) + grad(v).T)
1177+
1178+ a = inner(epsilon(v), epsilon(u))*dx
1179+ L = inner(v, f)*dx
1180+
1181+ # Define boundary condition
1182+ def boundary(x):
1183+ return near(x[0], 0.0) or near(x[0], 1.0)
1184+ u0 = Constant((1.0, 2.0, 3.0))
1185+ bc = DirichletBC(V, u0, boundary)
1186+
1187+ self._check_against_reference(a, L, bc)
1188+
1189+ def test_facet_assembly(self):
1190+
1191+ if MPI.num_processes() > 1:
1192+ print "FIXME: This unit test does not work in parallel, skipping"
1193+ return
1194+
1195+ mesh = UnitSquare(24, 24)
1196+ V = FunctionSpace(mesh, "CG", 1)
1197+
1198+ # Define test and trial functions
1199+ v = TestFunction(V)
1200+ u = TrialFunction(V)
1201+
1202+ # Define normal component, mesh size and right-hand side
1203+ n = V.cell().n
1204+ h = CellSize(mesh)
1205+ h_avg = (h('+')+h('-'))/2
1206+ f = Expression("500.0*exp(-(pow(x[0] - 0.5, 2) + pow(x[1] - 0.5, 2)) / 0.02)", degree=1)
1207+
1208+ # Define bilinear form
1209+ a = dot(grad(v), grad(u))*dx \
1210+ - dot(avg(grad(v)), jump(u, n))*dS \
1211+ - dot(jump(v, n), avg(grad(u)))*dS \
1212+ + 4.0/h_avg*dot(jump(v, n), jump(u, n))*dS \
1213+ - dot(grad(v), u*n)*ds \
1214+ - dot(v*n, grad(u))*ds \
1215+ + 8.0/h*v*u*ds
1216+
1217+ # Define linear form
1218+ L = v*f*dx
1219+
1220+ # Define boundary condition
1221+ def boundary(x):
1222+ return near(x[0], 0.0) or near(x[0], 1.0)
1223+ u0 = Constant(1.0)
1224+ bc = DirichletBC(V, u0, boundary, method="pointwise")
1225+
1226+ self._check_against_reference(a, L, bc)
1227+
1228+ def test_subdomain_assembly_meshdomains(self):
1229+ "Test assembly over subdomains with markers stored as part of mesh"
1230+
1231+ # Create a mesh of the unit cube
1232+ mesh = UnitCube(4, 4, 4)
1233+
1234+ # Define subdomains for 3 faces of the unit cube
1235+ class F0(SubDomain):
1236+            def inside(self, x, on_boundary):
1237+ return near(x[0], 0.0)
1238+ class F1(SubDomain):
1239+            def inside(self, x, on_boundary):
1240+ return near(x[1], 0.0)
1241+ class F2(SubDomain):
1242+            def inside(self, x, on_boundary):
1243+ return near(x[2], 0.0)
1244+
1245+ # Define subdomain for left of x = 0.5
1246+ class S0(SubDomain):
1247+            def inside(self, x, on_boundary):
1248+ return x[0] < 0.5 + DOLFIN_EPS
1249+
1250+ # Define subdomain for right of x = 0.5
1251+ class S1(SubDomain):
1252+            def inside(self, x, on_boundary):
1253+ return x[0] >= 0.5 + DOLFIN_EPS
1254+
1255+ # Mark mesh
1256+ f0 = F0()
1257+ f1 = F1()
1258+ f2 = F2()
1259+ s0 = S0()
1260+ s1 = S1()
1261+ f0.mark_facets(mesh, 0)
1262+ f1.mark_facets(mesh, 1)
1263+ f2.mark_facets(mesh, 2)
1264+ s0.mark_cells(mesh, 0)
1265+ s1.mark_cells(mesh, 1)
1266+
1267+ # Define test and trial functions
1268+ V = FunctionSpace(mesh, "CG", 1)
1269+ u = TrialFunction(V)
1270+ v = TestFunction(V)
1271+
1272+ # FIXME: If the Z terms are not present, PETSc will claim:
1273+ # Object is in wrong state!
1274+ # Matrix is missing diagonal entry in row 124!
1275+ Z = Constant(0.0)
1276+
1277+ # Define forms on marked subdomains
1278+ a0 = 1*u*v*dx(0) + 2*u*v*ds(0) + 3*u*v*ds(1) + 4*u*v*ds(2) + Z*u*v*dx(1)
1279+ L0 = 1*v*dx(0) + 2*v*ds(0) + 3*v*ds(1) + 4*v*ds(2)
1280+
1281+        # Define forms on unmarked subdomains (should be zero)
1282+ a1 = 1*u*v*dx(2) + 2*u*v*ds(3) + Z*u*v*dx(0) + Z*u*v*dx(1)
1283+ L1 = 1*v*dx(2) + 2*v*ds(3)
1284+
1285+ # Define boundary condition
1286+ def boundary(x):
1287+ return near(x[0], 0.0) or near(x[0], 1.0)
1288+ u0 = Constant(1.0)
1289+ bc = DirichletBC(V, u0, boundary, method="pointwise")
1290+
1291+ self._check_against_reference(a0, L0, bc)
1292+ self._check_against_reference(a1, L1, bc)
1293+
1294+if __name__ == "__main__":
1295+ print ""
1296+ print "Testing class SymmetricAssembler"
1297+    print "--------------------------------"
1298+ unittest.main()
1299
1300=== modified file 'test/unit/test.py'
1301--- test/unit/test.py 2012-02-22 12:31:32 +0000
1302+++ test/unit/test.py 2012-02-29 22:48:20 +0000
1303@@ -35,7 +35,7 @@
1304 "adaptivity": ["errorcontrol", "TimeSeries"],
1305 "book": ["chapter_1", "chapter_10"],
1306 "fem": ["solving", "Assembler", "DirichletBC", "DofMap",
1307- "FiniteElement", "SystemAssembler", "Form"],
1308+ "FiniteElement", "SystemAssembler", "Form", "SymmetricAssembler"],
1309 "function": ["Constant", "Expression", "Function", "FunctionSpace",
1310 "SpecialFunctions"],
1311 "io": ["vtk", "XMLMeshFunction", "XMLMesh",

Subscribers

People subscribed via source and target branches