python-poppler doesn't close files

Bug #316722 reported by A.G. Nienhuis
This bug affects 1 person
Affects                   Status         Importance   Assigned to              Milestone
Poppler Python Bindings   In Progress    Medium       Gian Mario Tagliaretti
python-poppler (Ubuntu)   Fix Released   Undecided    Unassigned

Bug Description

This app crashes with "too many open files" after a while:

import os
import poppler

for i in range(1000000):
    uri = "file://" + os.path.abspath("test.pdf")
    doc = poppler.document_new_from_file(uri, None)

Is there a way to close a document?

Tags: patch
A.G. Nienhuis (a-g-nienhuis) wrote :

'del doc' or 'gc.collect()' don't help

Gian Mario Tagliaretti (gianmt) wrote :

I've been looking into this one for a long time with no real solution, but I feel the GC could be the one to blame. Did you investigate further?

A.G. Nienhuis (a-g-nienhuis) wrote : Re: [Bug 316722] Re: python-poppler doesn't close files

I did investigate further and did some testing:

Current release:

- all python objects get destroyed automatically
- files stay open
- memory leak per document_new_from_file(): 40 kB (for a 300 kB pdf file)
- gc.collect() does nothing

If you call g_object_unref(hash(doc)) after each call to
document_new_from_file() the problems go away:

- all python objects get destroyed automatically
- files are closed
- memory leak per document_new_from_file(): 70 bytes (for a 300 kB pdf file)
- gc.collect() does nothing
- doc is completely usable after g_object_unref(...)

BTW: hash(obj) returns the pointer to the GObject underlying a PyGObject.

Here is a simple test case:

=================================
import os
import poppler
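from ctypes import CDLL, c_void_p

glib = CDLL("libgobject-2.0.so")
# Declare the pointer argument so it is not truncated to a C int on 64-bit systems.
glib.g_object_unref.argtypes = [c_void_p]
glib.g_object_unref.restype = None

uri = "file://" + os.path.abspath("test.pdf")
for i in xrange(1000000):
    print i
    doc = poppler.document_new_from_file(uri, None)
    # hash(doc) is the address of the underlying GObject; drop the extra reference.
    glib.g_object_unref(hash(doc))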
====================================
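
A note on the ctypes call in this test case: when a raw address such as hash(doc) is handed to a foreign function that has no declared argtypes, ctypes passes Python integers as a C int, so a 64-bit pointer can lose its upper bits. The sketch below illustrates the difference with an arbitrary example address (not a real GObject) and standard ctypes types only. It does not call into GLib, and it assumes a platform with a 32-bit C int and 64-bit pointers.

```python
import ctypes

# Arbitrary 64-bit-looking address, used purely for illustration (an assumption,
# not a pointer obtained from poppler).
example_addr = 0x7FFF12345678

# A C int keeps only the low bits of the value.
as_int = ctypes.c_int(example_addr).value

# A void pointer keeps the full address where pointers are 64 bits wide.
as_pointer = ctypes.c_void_p(example_addr).value

print(hex(as_int))
print(hex(as_pointer))
```

Declaring argtypes = [ctypes.c_void_p] on the foreign function, as one would for g_object_unref, avoids the truncation.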


Gian Mario Tagliaretti (gianmt) wrote :

I have filed a bug with a patch that adds a poppler.Document.release() method that will solve this issue; hopefully it will get into poppler-0.12. Valgrind shows no complaints:

==6750== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 1248 from 9)
==6750== malloc/free: in use at exit: 3,470,220 bytes in 11,812 blocks.
==6750== malloc/free: 54,593 allocs, 42,781 frees, 9,141,246 bytes allocated.
==6750== For counts of detected errors, rerun with: -v
==6750== searching for pointers to 11,812 not-freed blocks.
==6750== checked 3,999,188 bytes.
==6750==
==6750== LEAK SUMMARY:
==6750== definitely lost: 452 bytes in 15 blocks.
==6750== possibly lost: 43,208 bytes in 71 blocks.
==6750== still reachable: 3,426,560 bytes in 11,726 blocks.
==6750== suppressed: 0 bytes in 0 blocks.

how did you measure the leaks?

A.G. Nienhuis (a-g-nienhuis) wrote :

> how did you measure the leaks?

Leave it running for a few minutes while looking at gnome-system-monitor

Gian Mario Tagliaretti (gianmt) wrote :

I'm still waiting for my patch to be pushed into poppler itself, let's keep our fingers crossed :)

Changed in poppler-python:
assignee: nobody → Gian Mario Tagliaretti (gianmt)
status: New → In Progress
Changed in poppler-python:
importance: Undecided → Medium
milestone: none → development
BenjaminBerg (benjamin-sipsolutions) wrote :

Ehm? Why is a patch to poppler needed in any way?
(this is also basically the same as #509408, but that is known ...)

As far as I can tell, the only issue is that poppler-python is missing the
  (caller-owns-return #t)
hint all over the place in the poppler.defs file. This means that while the Python object is destroyed, the C object stays alive and keeps the file open. If the hint is added to the affected functions, the C object is destroyed too, and the file is closed just fine.

I would say the following functions will at least need the hint:
 * poppler_document_new_from_file
 * poppler_document_new_from_data
 * poppler_document_get_page (should fix #509408)
 * poppler_document_get_page_by_label
 * poppler_document_find_dest
 * poppler_document_get_form_field
 * poppler_index_iter_copy
 * poppler_index_iter_get_action

I have not done any extensive testing, but with this change I was able to open several thousand PDF files and get their page counts.

And likely the thumbnail getter too. Not sure how exactly the GList* handling is done by the binding generator.
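
The ownership problem described in this comment can be pictured with a toy model. This is only a sketch of the reference counting involved, written in plain Python. ToyGObject, ToyWrapper and wrap_new_object are hypothetical names, and none of this is the actual binding generator code.

```python
import os


class ToyGObject:
    """Stand-in for a C object that owns an open file (hypothetical model)."""

    def __init__(self, path):
        self.refcount = 1  # the constructor hands one reference to its caller
        self.handle = open(path, "rb")

    def ref(self):
        self.refcount += 1

    def unref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.handle.close()  # the file is closed only when the last reference goes


class ToyWrapper:
    """Stand-in for the Python wrapper; it takes its own reference."""

    def __init__(self, gobj):
        self.gobj = gobj
        gobj.ref()

    def destroy(self):
        # Models the Python object being destroyed.
        self.gobj.unref()


def wrap_new_object(path, caller_owns_return):
    gobj = ToyGObject(path)     # refcount is 1, held by the binding code
    wrapper = ToyWrapper(gobj)  # refcount is 2
    if caller_owns_return:
        gobj.unref()            # give up the constructor's reference, back to 1
    return wrapper


leaky = wrap_new_object(os.devnull, caller_owns_return=False)
leaky.destroy()
print(leaky.gobj.refcount, leaky.gobj.handle.closed)  # 1 False

fixed = wrap_new_object(os.devnull, caller_owns_return=True)
fixed.destroy()
print(fixed.gobj.refcount, fixed.gobj.handle.closed)  # 0 True

leaky.gobj.handle.close()  # tidy up the deliberately leaked file
```

Without the hint, one reference is never released, so destroying the wrapper leaves the count at 1 and the file open. This also matches the earlier observation that a single extra g_object_unref() makes the files close.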

alexandervdm (alexandervdm) wrote :

I just compiled the latest python-poppler revision including the solution presented by BenjaminBerg (Thanks!) and can confirm it is working. Both my own application and the demo program in the bug description show no increased memory usage, and the /proc/{id}/fd folder shows the objects are successfully destroyed.
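
The /proc check mentioned here can also be scripted. The following is a minimal, Linux-specific sketch. count_open_fds is a hypothetical helper name, and os.devnull stands in for the PDF file.

```python
import os


def count_open_fds(pid="self"):
    """Number of entries in /proc/<pid>/fd (Linux-specific)."""
    return len(os.listdir("/proc/%s/fd" % pid))


before = count_open_fds()
handle = open(os.devnull, "rb")  # stand-in for opening a document
during = count_open_fds()
handle.close()
after = count_open_fds()

print(before, during, after)
```

Calling count_open_fds() before and after a loop over poppler.document_new_from_file() should show a steadily growing count on an affected build and a stable one on a fixed build.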

Hopefully this can be accepted into trunk and distro-repositories soon.

tags: added: patch
Changed in python-poppler (Ubuntu):
status: New → Triaged
logari81 (logari81) wrote :

Are there any objections to the solution proposed in comment #7? I can confirm that the patch from comment #8 solves all memleaks that I have encountered (see bug #509408). Who is responsible for accepting and applying this patch into the upstream project?

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package python-poppler - 0.12.1-8

---------------
python-poppler (0.12.1-8) unstable; urgency=low

  * uploading to unstable

 -- Andrea Gasparini <email address hidden> Sun, 29 Apr 2012 17:51:08 +0200

Changed in python-poppler (Ubuntu):
status: Triaged → Fix Released