DOLFIN

Code review comment for lp:~jobh/dolfin/fast-array

fast-array
Merge into trunk

Revision history for this message

Anders Logg (logg) wrote on 2012-02-28:

On Tue, Feb 28, 2012 at 05:08:11AM +0100, Johan Hake wrote:
> On 02/27/2012 09:16 PM, Anders Logg wrote:
> >On Mon, Feb 27, 2012 at 07:52:27PM -0000, Garth Wells wrote:
> >>On 27 February 2012 17:01, Anders Logg<email address hidden> wrote:
> >>>On Mon, Feb 27, 2012 at 03:32:04PM -0000, Johan Hake wrote:
> >>>>On 02/27/2012 03:01 PM, Garth Wells wrote:
> >>>>>On 27 February 2012 13:40, Anders Logg<email address hidden> wrote:
> >>>>>>On Fri, Feb 24, 2012 at 07:01:57PM -0000, Joachim Haga wrote:
> >>>>>>>I'd opine that the easiest would be to make array never own the data,
> >>>>>>>rather than always own the data. In that way, it can wrap anything.
> >>>>>>>
> >>>>>>>* TimeSeries returns (owned) Arrays in two places. These could
> >>>>>>>return references to its own data (const vector<double>&), that
> >>>>>>>would save a copy (they are currently copied twice, once into an
> >>>>>>>owned Array, then again in swig into a numpy array). But maybe the
> >>>>>>>amount of data returned is small, in which case a vector<double>
> >>>>>>>will be fine.
> >>>>>>
> >>>>>>The return of arrays from TimeSeries is typically not performance
> >>>>>>critical so a copy is fine (which it already is).
> >>>>>>
> >>>>>>>* Can't support resize() without ownership, so everywhere resize is
> >>>>>>>used on a reference argument should pass a vector<double>&
> >>>>>>>instead. I think this is a few places only.
> >>>>>>>
> >>>>>>>At that point, Array will be a plain wrapper of others' data.
> >>>>>>>
> >>>>>>>As for your (1), it might well be possible (keep a scratch array in
> >>>>>>>the GenericFunction class perhaps). But the thing is, the shared
> >>>>>>>version was no more safe. You can't keep a reference to an Array
> >>>>>>>because you don't know if it's shared or just wrapping data that
> >>>>>>>will be deleted from under you. It was "shared or borrowed" rather
> >>>>>>>than "owned or borrowed", only without the flag to tell the cases
> >>>>>>>apart.
> >>>>>>
> >>>>>>I'm not very happy with our current use of Array. We use std::vector
> >>>>>>in some places and Array in other places. I don't remember what the
> >>>>>>original intention was with Array. Is the purpose just to be able to
> >>>>>>pass shared data as arrays back and forth to Python?
> >>>>>
> >>>>>Yes.
> >>>>>
> >>>>>>Or should it be a
> >>>>>>general array class used throughout DOLFIN C++?
> >>>>>>
> >>>>>
> >>>>>No - there is no point in re-inventing std::vector. We should use
> >>>>>std::vector where possible. It's more flexible and can be used with
> >>>>>STL algorithms.
> >>>>
> >>>>Agree.
> >>>>
> >>>>>>If it is just for Python-wrapping, I think we should make it never own
> >>>>>>data as Joachim suggests.
> >>>>
> >>>>Agree. But we need to go over the interface and check that it makes
> >>>>sense everywhere. TimeSeries could probably just return a
> >>>>std::vector which is copied in the Python interface.
> >>>
> >>>Yes, definitely for TimeSeries. Perhaps we could settle for that
> >>>everytime we want to pass output data to Python?
> >>>
> >>>Then Array input would essentially be limited to a few particular
> >>>callbacks that need to be made efficient (GenericFunction::eval).
> >>>
> >>
> >>In the case of GenericFunction::eval we can use std::vector if, in
> >>the wrapping process, we can prevent resizing on the Python side.
> >
> >Don't know if that's possible. Johan will know but he's out traveling.
>
> The point me and Joachim is raising is that you cannot create a
> std::vector from a NumPy array without copying. So any use of
> GenericFunction::eval then would imply copying. This would only be a
> problem for evaluation of any GenericFunction within the Python
> interface.
>
> Now we can evaluate a GenericFunction by passing in parts of a
> larger numpy array and evaluate it in place. This cannot be done if
> we change the interface to std::vector.
>
> For the directory typemap we can wrap the outgoing std::vector to a
> NumPy array, prohibiting resizing all that.
>
> I am a bit hesitant scraping the Array from the interface as it work
> just fine. Especially if we make it not owe its data.

I'm not suggesting that. Just making it clear when and where we use it
in C++. We should not reinvent std::vector so we should isolate it to
as few places as possible, and in those few places it should work to
asssume that Array never owns the data.

--
Anders

> >>Maybe we can remove Array and support (a) std::vectors via copy and
> >>(b) references to a std::vector with resizing prevented.
>
> If a NumPy array does not own its own data it cannot be resized.
> This would be the case of an array originating from other data, let
> say a std::vector. This is illustrated by:
>
> UnitSquare(1,1).coordinates().resize((1,1))
> ValueError: cannot resize this array: it does not own its data
>
> Here coordinates returns a view of the coordinate data of the mesh,
> and resize is not allowed.
>
> >If that is possible, we should keep the Array class around, perhaps in
> >a new subdirectory 'deprecated' to provide backwards compatibility
> >with the book. We've changed the eval interface maybe 100 times
> >already so we should try not to break user code.
>
> Here most of the trouble would be at the C++ side as the Python code
> would be the same. It will however force regeneration of the
> compiled Expression code. That said I would like to have this
> possible interface change well thought through, as Anders points out
> we have changed this too many times already.
>
> Johan
>
>

On Tue, Feb 28, 2012 at 05:08:11AM +0100, Johan Hake wrote:
> On 02/27/2012 09:16 PM, Anders Logg wrote:
> >On Mon, Feb 27, 2012 at 07:52:27PM -0000, Garth Wells wrote:
> >>On 27 February 2012 17:01, Anders Logg<logg@simula.no>  wrote:
> >>>On Mon, Feb 27, 2012 at 03:32:04PM -0000, Johan Hake wrote:
> >>>>On 02/27/2012 03:01 PM, Garth Wells wrote:
> >>>>>On 27 February 2012 13:40, Anders Logg<logg@simula.no>    wrote:
> >>>>>>On Fri, Feb 24, 2012 at 07:01:57PM -0000, Joachim Haga wrote:
> >>>>>>>I'd opine that the easiest would be to make array never own the data,
> >>>>>>>rather than always own the data. In that way, it can wrap anything.
> >>>>>>>
> >>>>>>>* TimeSeries returns (owned) Arrays in two places. These could
> >>>>>>>return references to its own data (const vector<double>&), that
> >>>>>>>would save a copy (they are currently copied twice, once into an
> >>>>>>>owned Array, then again in swig into a numpy array). But maybe the
> >>>>>>>amount of data returned is small, in which case a vector<double>
> >>>>>>>will be fine.
> >>>>>>
> >>>>>>The return of arrays from TimeSeries is typically not performance
> >>>>>>critical so a copy is fine (which it already is).
> >>>>>>
> >>>>>>>* Can't support resize() without ownership, so everywhere resize is
> >>>>>>>used on a reference argument should pass a vector<double>&
> >>>>>>>instead. I think this is a few places only.
> >>>>>>>
> >>>>>>>At that point, Array will be a plain wrapper of others' data.
> >>>>>>>
> >>>>>>>As for your (1), it might well be possible (keep a scratch array in
> >>>>>>>the GenericFunction class perhaps). But the thing is, the shared
> >>>>>>>version was no more safe. You can't keep a reference to an Array
> >>>>>>>because you don't know if it's shared or just wrapping data that
> >>>>>>>will be deleted from under you. It was "shared or borrowed" rather
> >>>>>>>than "owned or borrowed", only without the flag to tell the cases
> >>>>>>>apart.
> >>>>>>
> >>>>>>I'm not very happy with our current use of Array. We use std::vector
> >>>>>>in some places and Array in other places. I don't remember what the
> >>>>>>original intention was with Array. Is the purpose just to be able to
> >>>>>>pass shared data as arrays back and forth to Python?
> >>>>>
> >>>>>Yes.
> >>>>>
> >>>>>>Or should it be a
> >>>>>>general array class used throughout DOLFIN C++?
> >>>>>>
> >>>>>
> >>>>>No - there is no point in re-inventing std::vector. We should use
> >>>>>std::vector where possible. It's more flexible and can be used with
> >>>>>STL algorithms.
> >>>>
> >>>>Agree.
> >>>>
> >>>>>>If it is just for Python-wrapping, I think we should make it never own
> >>>>>>data as Joachim suggests.
> >>>>
> >>>>Agree. But we need to go over the interface and check that it makes
> >>>>sense everywhere. TimeSeries could probably just return a
> >>>>std::vector which is copied in the Python interface.
> >>>
> >>>Yes, definitely for TimeSeries. Perhaps we could settle for that
> >>>everytime we want to pass output data to Python?
> >>>
> >>>Then Array input would essentially be limited to a few particular
> >>>callbacks that need to be made efficient (GenericFunction::eval).
> >>>
> >>
> >>In the case of GenericFunction::eval we can use std::vector if, in
> >>the wrapping process, we can prevent resizing on the Python side.
> >
> >Don't know if that's possible. Johan will know but he's out traveling.
>
> The point me and Joachim is raising is that you cannot create a
> std::vector from a NumPy array without copying. So any use of
> GenericFunction::eval then would imply copying. This would only be a
> problem for evaluation of any GenericFunction within the Python
> interface.
>
> Now we can evaluate a GenericFunction by passing in parts of a
> larger numpy array and evaluate it in place. This cannot be done if
> we change the interface to std::vector.
>
> For the directory typemap we can wrap the outgoing std::vector to a
> NumPy array, prohibiting resizing all that.
>
> I am a bit hesitant scraping the Array from the interface as it work
> just fine. Especially if we make it not owe its data.

--
Anders

> >>Maybe we can remove Array and support (a) std::vectors via copy and
> >>(b) references to a std::vector with resizing prevented.
>
> If a NumPy array does not own its own data it cannot be resized.
> This would be the case of an array originating from other data, let
> say a std::vector. This is illustrated by:
>
>   UnitSquare(1,1).coordinates().resize((1,1))
>   ValueError: cannot resize this array: it does not own its data
>
> Here coordinates returns a view of the coordinate data of the mesh,
> and resize is not allowed.
>
> >If that is possible, we should keep the Array class around, perhaps in
> >a new subdirectory 'deprecated' to provide backwards compatibility
> >with the book. We've changed the eval interface maybe 100 times
> >already so we should try not to break user code.
>
> Here most of the trouble would be at the C++ side as the Python code
> would be the same. It will however force regeneration of the
> compiled Expression code. That said I would like to have this
> possible interface change well thought through, as Anders points out
> we have changed this too many times already.
>
> Johan
>
>

« Back to merge proposal