DOLFIN

Code review comment for lp:~jobh/dolfin/fast-array

fast-array
Merge into trunk

Revision history for this message

Johan Hake (johan-hake) wrote on 2012-02-28:

On 02/27/2012 09:16 PM, Anders Logg wrote:
> On Mon, Feb 27, 2012 at 07:52:27PM -0000, Garth Wells wrote:
>> On 27 February 2012 17:01, Anders Logg<email address hidden> wrote:
>>> On Mon, Feb 27, 2012 at 03:32:04PM -0000, Johan Hake wrote:
>>>> On 02/27/2012 03:01 PM, Garth Wells wrote:
>>>>> On 27 February 2012 13:40, Anders Logg<email address hidden> wrote:
>>>>>> On Fri, Feb 24, 2012 at 07:01:57PM -0000, Joachim Haga wrote:
>>>>>>> I'd opine that the easiest would be to make array never own the data,
>>>>>>> rather than always own the data. In that way, it can wrap anything.
>>>>>>>
>>>>>>> * TimeSeries returns (owned) Arrays in two places. These could
>>>>>>> return references to its own data (const vector<double>&), that
>>>>>>> would save a copy (they are currently copied twice, once into an
>>>>>>> owned Array, then again in swig into a numpy array). But maybe the
>>>>>>> amount of data returned is small, in which case a vector<double>
>>>>>>> will be fine.
>>>>>>
>>>>>> The return of arrays from TimeSeries is typically not performance
>>>>>> critical so a copy is fine (which it already is).
>>>>>>
>>>>>>> * Can't support resize() without ownership, so everywhere resize is
>>>>>>> used on a reference argument should pass a vector<double>&
>>>>>>> instead. I think this is a few places only.
>>>>>>>
>>>>>>> At that point, Array will be a plain wrapper of others' data.
>>>>>>>
>>>>>>> As for your (1), it might well be possible (keep a scratch array in
>>>>>>> the GenericFunction class perhaps). But the thing is, the shared
>>>>>>> version was no more safe. You can't keep a reference to an Array
>>>>>>> because you don't know if it's shared or just wrapping data that
>>>>>>> will be deleted from under you. It was "shared or borrowed" rather
>>>>>>> than "owned or borrowed", only without the flag to tell the cases
>>>>>>> apart.
>>>>>>
>>>>>> I'm not very happy with our current use of Array. We use std::vector
>>>>>> in some places and Array in other places. I don't remember what the
>>>>>> original intention was with Array. Is the purpose just to be able to
>>>>>> pass shared data as arrays back and forth to Python?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Or should it be a
>>>>>> general array class used throughout DOLFIN C++?
>>>>>>
>>>>>
>>>>> No - there is no point in re-inventing std::vector. We should use
>>>>> std::vector where possible. It's more flexible and can be used with
>>>>> STL algorithms.
>>>>
>>>> Agree.
>>>>
>>>>>> If it is just for Python-wrapping, I think we should make it never own
>>>>>> data as Joachim suggests.
>>>>
>>>> Agree. But we need to go over the interface and check that it makes
>>>> sense everywhere. TimeSeries could probably just return a
>>>> std::vector which is copied in the Python interface.
>>>
>>> Yes, definitely for TimeSeries. Perhaps we could settle for that
>>> everytime we want to pass output data to Python?
>>>
>>> Then Array input would essentially be limited to a few particular
>>> callbacks that need to be made efficient (GenericFunction::eval).
>>>
>>
>> In the case of GenericFunction::eval we can use std::vector if, in
>> the wrapping process, we can prevent resizing on the Python side.
>
> Don't know if that's possible. Johan will know but he's out traveling.

The point me and Joachim is raising is that you cannot create a
std::vector from a NumPy array without copying. So any use of
GenericFunction::eval then would imply copying. This would only be a
problem for evaluation of any GenericFunction within the Python interface.

Now we can evaluate a GenericFunction by passing in parts of a larger
numpy array and evaluate it in place. This cannot be done if we change
the interface to std::vector.

For the directory typemap we can wrap the outgoing std::vector to a
NumPy array, prohibiting resizing all that.

I am a bit hesitant scraping the Array from the interface as it work
just fine. Especially if we make it not owe its data.

>> Maybe we can remove Array and support (a) std::vectors via copy and
>> (b) references to a std::vector with resizing prevented.

If a NumPy array does not own its own data it cannot be resized. This
would be the case of an array originating from other data, let say a
std::vector. This is illustrated by:

UnitSquare(1,1).coordinates().resize((1,1))
ValueError: cannot resize this array: it does not own its data

Here coordinates returns a view of the coordinate data of the mesh, and
resize is not allowed.

> If that is possible, we should keep the Array class around, perhaps in
> a new subdirectory 'deprecated' to provide backwards compatibility
> with the book. We've changed the eval interface maybe 100 times
> already so we should try not to break user code.

Here most of the trouble would be at the C++ side as the Python code
would be the same. It will however force regeneration of the compiled
Expression code. That said I would like to have this possible interface
change well thought through, as Anders points out we have changed this
too many times already.

Johan

> --
> Anders
>
>
>
>> Garth
>>
>>
>>>
>>>
>>>>> Memory has to be created somewhere in order to be used, and it has
>>> to
>>>>> be cleaned up.
>>>>>
>>>>> There are two issues here:
>>>>>
>>>>> 1) Dynamic allocation in order to interface with UFC. Using plain
>>>>> pointers rather than smart pointers is likely faster, but I think that
>>>>> it's still not a solution. We should try to avoid these allocations in
>>>>> loops.
>>>>>
>>>>> 2) We need to re-visit the NumPy interface. If we can create NumPy
>>>>> arrays of fixed length, then we can just use std::vector, and let the
>>>>> NumPy array wrap the pointer&x[0]. Maybe Johan Hake can comment on
>>>>> this?
>>>>
>>>> The problem is to pass a NumPy array to the C++ interface when it wants
>>>> a std::vector&. Then we need to copy the values. A light weight Array
>>>> class fixes that issue.
>>>>
>>>> Wrapping a std::vector& into a Numpy array is possible without copying
>>>> by just passing the pointer. This would be similar to what we do today
>>>> with the Array.
>>>>
>>>> Johan
>>>>
>>>>> Garth
>>>>>
>>>>>>
>>>>>>
>>>>>>> -j.
>>>>>>>
>>>>>>> On 24 February 2012 19:37, Garth Wells<email address hidden> wrote:
>>>>>>>
>>>>>>>> I'm not keen on ownership flags - they were a real mess before and it's
>>>>>>>> been a lot better since we go rid of them.
>>>>>>>>
>>>>>>>> Would be it be possible to (1) re-use the Array so we avoid dynamic
>>>>>>>> allocation or (2) have an Array-like structure that always owns the data?
>>>>>>>> The class Array was introduced to better interface with Python, but I
>>>>>>>> recall that we later became aware that the Numpy interface has has a flag
>>>>>>>> to prevent resizing, which would make typemaps more robust.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

On 02/27/2012 09:16 PM, Anders Logg wrote:
> On Mon, Feb 27, 2012 at 07:52:27PM -0000, Garth Wells wrote:
>> On 27 February 2012 17:01, Anders Logg<logg@simula.no>  wrote:
>>> On Mon, Feb 27, 2012 at 03:32:04PM -0000, Johan Hake wrote:
>>>> On 02/27/2012 03:01 PM, Garth Wells wrote:
>>>>> On 27 February 2012 13:40, Anders Logg<logg@simula.no>    wrote:
>>>>>> On Fri, Feb 24, 2012 at 07:01:57PM -0000, Joachim Haga wrote:
>>>>>>> I'd opine that the easiest would be to make array never own the data,
>>>>>>> rather than always own the data. In that way, it can wrap anything.
>>>>>>>
>>>>>>> * TimeSeries returns (owned) Arrays in two places. These could
>>>>>>> return references to its own data (const vector<double>&), that
>>>>>>> would save a copy (they are currently copied twice, once into an
>>>>>>> owned Array, then again in swig into a numpy array). But maybe the
>>>>>>> amount of data returned is small, in which case a vector<double>
>>>>>>> will be fine.
>>>>>>
>>>>>> The return of arrays from TimeSeries is typically not performance
>>>>>> critical so a copy is fine (which it already is).
>>>>>>
>>>>>>> * Can't support resize() without ownership, so everywhere resize is
>>>>>>> used on a reference argument should pass a vector<double>&
>>>>>>> instead. I think this is a few places only.
>>>>>>>
>>>>>>> At that point, Array will be a plain wrapper of others' data.
>>>>>>>
>>>>>>> As for your (1), it might well be possible (keep a scratch array in
>>>>>>> the GenericFunction class perhaps). But the thing is, the shared
>>>>>>> version was no more safe. You can't keep a reference to an Array
>>>>>>> because you don't know if it's shared or just wrapping data that
>>>>>>> will be deleted from under you. It was "shared or borrowed" rather
>>>>>>> than "owned or borrowed", only without the flag to tell the cases
>>>>>>> apart.
>>>>>>
>>>>>> I'm not very happy with our current use of Array. We use std::vector
>>>>>> in some places and Array in other places. I don't remember what the
>>>>>> original intention was with Array. Is the purpose just to be able to
>>>>>> pass shared data as arrays back and forth to Python?
>>>>>
>>>>> Yes.
>>>>>
>>>>>> Or should it be a
>>>>>> general array class used throughout DOLFIN C++?
>>>>>>
>>>>>
>>>>> No - there is no point in re-inventing std::vector. We should use
>>>>> std::vector where possible. It's more flexible and can be used with
>>>>> STL algorithms.
>>>>
>>>> Agree.
>>>>
>>>>>> If it is just for Python-wrapping, I think we should make it never own
>>>>>> data as Joachim suggests.
>>>>
>>>> Agree. But we need to go over the interface and check that it makes
>>>> sense everywhere. TimeSeries could probably just return a
>>>> std::vector which is copied in the Python interface.
>>>
>>> Yes, definitely for TimeSeries. Perhaps we could settle for that
>>> everytime we want to pass output data to Python?
>>>
>>> Then Array input would essentially be limited to a few particular
>>> callbacks that need to be made efficient (GenericFunction::eval).
>>>
>>
>> In the case of GenericFunction::eval we can use std::vector if, in
>> the wrapping process, we can prevent resizing on the Python side.
>
> Don't know if that's possible. Johan will know but he's out traveling.

The point me and Joachim is raising is that you cannot create a 
std::vector from a NumPy array without copying. So any use of 
GenericFunction::eval then would imply copying. This would only be a 
problem for evaluation of any GenericFunction within the Python interface.

Now we can evaluate a GenericFunction by passing in parts of a larger 
numpy array and evaluate it in place. This cannot be done if we change 
the interface to std::vector.

For the directory typemap we can wrap the outgoing std::vector to a 
NumPy array, prohibiting resizing all that.

I am a bit hesitant scraping the Array from the interface as it work 
just fine. Especially if we make it not owe its data.

>> Maybe we can remove Array and support (a) std::vectors via copy and
>> (b) references to a std::vector with resizing prevented.

If a NumPy array does not own its own data it cannot be resized. This 
would be the case of an array originating from other data, let say a 
std::vector. This is illustrated by:

UnitSquare(1,1).coordinates().resize((1,1))
   ValueError: cannot resize this array: it does not own its data

Here coordinates returns a view of the coordinate data of the mesh, and 
resize is not allowed.

Here most of the trouble would be at the C++ side as the Python code 
would be the same. It will however force regeneration of the compiled 
Expression code. That said I would like to have this possible interface 
change well thought through, as Anders points out we have changed this 
too many times already.

Johan

> --
> Anders
>
>
>
>> Garth
>>
>>
>>>
>>>
>>>>> Memory has to be created somewhere in order to be used, and it has
>>> to
>>>>> be cleaned up.
>>>>>
>>>>> There are two issues here:
>>>>>
>>>>> 1) Dynamic allocation in order to interface with UFC. Using plain
>>>>> pointers rather than smart pointers is likely faster, but I think that
>>>>> it's still not a solution. We should try to avoid these allocations in
>>>>> loops.
>>>>>
>>>>> 2) We need to re-visit the NumPy interface. If we can create NumPy
>>>>> arrays of fixed length, then we can just use std::vector, and let the
>>>>> NumPy array wrap the pointer&x[0]. Maybe Johan Hake can comment on
>>>>> this?
>>>>
>>>> The problem is to pass a NumPy array to the C++ interface when it wants
>>>> a std::vector&. Then we need to copy the values. A light weight Array
>>>> class fixes that issue.
>>>>
>>>> Wrapping a std::vector&  into a Numpy array is possible without copying
>>>> by just passing the pointer. This would be similar to what we do today
>>>> with the Array.
>>>>
>>>> Johan
>>>>
>>>>> Garth
>>>>>
>>>>>>
>>>>>>
>>>>>>> -j.
>>>>>>>
>>>>>>> On 24 February 2012 19:37, Garth Wells<gnw20@cam.ac.uk>    wrote:
>>>>>>>
>>>>>>>> I'm not keen on ownership flags - they were a real mess before and it's
>>>>>>>> been a lot better since we go rid of them.
>>>>>>>>
>>>>>>>> Would be it be possible to (1) re-use the Array so we avoid dynamic
>>>>>>>> allocation or (2) have an Array-like structure that always owns the data?
>>>>>>>> The class Array was introduced to better interface with Python, but I
>>>>>>>> recall that we later became aware that the Numpy interface has has a flag
>>>>>>>> to prevent resizing, which would make typemaps more robust.
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

« Back to merge proposal