Merge lp:~jderose/microfiber/views into lp:microfiber

Proposed by Jason Gerard DeRose
Status: Merged
Merged at revision: 117
Proposed branch: lp:~jderose/microfiber/views
Merge into: lp:microfiber
Diff against target: 281 lines (+95/-20)
2 files modified
microfiber.py (+12/-3)
test_microfiber.py (+83/-17)
To merge this branch: bzr merge lp:~jderose/microfiber/views
Reviewer              Status
Jason Gerard DeRose   Approve
Review via email: mp+113822@code.launchpad.net

Description of the change

Some misc API changes I grouped up here:

* Database.view() now passes reduce=False by default, which is really how the CouchDB API should have worked in the first place. Code that only uses the non-reduced rows of a view shouldn't have to know whether that view even has a reduce function. An explicit reduce=True is still passed through unchanged. James and I discussed this a while back, so I'm trying it first in Microfiber, and will add it to couch.js if we like it.

* Renamed the new "non-atomic" Database.bulksave() to Database.save_many(), because paired with Database.get_many() it makes for a clearer, more consistent API.

* Renamed the original "all-or-nothing" Database.bulksave2() back to Database.bulksave(). This way there is no API breakage, just in case anyone out there has been using it and wants the "all-or-nothing" behavior (and understands the implications). But I think we should deprecate this method soon. My idea with the convenience methods isn't to capture every scenario, but to capture a few important patterns that can be tricky to get right using post(), get(), etc.

* Adds an experimental id_slice_iter() function that gives you all the ids from the rows in a view result, chunked into groups of 25 (override with size=10 or whatever). This is for certain operations in Dmedia where we need to update a large number of docs, like the new Core.purge_store() method. The idea is that as you go through the view results, you use Database.get_many() to get 25 docs, update them all, and then save them back with Database.save_many(). This gives a huge performance improvement over calling get()/save() for each doc, easily 10x.
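To make the new view() default concrete, here is a minimal sketch of the query string it ends up producing. view_url() is a hypothetical helper, not microfiber's own URL-building code; it assumes view options are JSON-encoded into the query string, which is the form CouchDB expects for values like reduce=false.

```python
import json
from urllib.parse import urlencode


def view_url(design, view, **options):
    """
    Hypothetical helper: build the relative URL a view request would use,
    applying the same reduce default as the new Database.view().
    """
    # Default to reduce=False unless the caller explicitly passed reduce.
    options.setdefault('reduce', False)
    # CouchDB view query values are JSON, so True/False become true/false.
    # Sorting the pairs keeps the output deterministic.
    query = urlencode(sorted((k, json.dumps(v)) for (k, v) in options.items()))
    return '_design/{}/_view/{}?{}'.format(design, view, query)


print(view_url('doc', 'type'))
# _design/doc/_view/type?reduce=false
print(view_url('doc', 'type', reduce=True, include_docs=True))
# _design/doc/_view/type?include_docs=true&reduce=true
```

options.setdefault('reduce', False) behaves the same as the "if 'reduce' not in options" check in this branch: a caller's explicit value always wins.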
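With the non-atomic save_many(), each doc in a batch succeeds or fails on its own, and CouchDB's _bulk_docs response has one row per doc: successful rows carry a new "rev", rejected ones carry an "error" key such as "conflict". The sketch below uses a hypothetical apply_bulk_rows() helper (not the code in this branch) to show how those rows can be folded back into the docs. The branch's save_many() raises BulkConflict when any row conflicts; this sketch simply returns the conflicting ids instead.

```python
def apply_bulk_rows(docs, rows):
    """
    Hypothetical helper: merge a non-atomic _bulk_docs response into docs.

    Returns the ids of any docs CouchDB rejected.
    """
    conflicts = []
    for (doc, row) in zip(docs, rows):
        if 'error' in row:
            # Only this doc was rejected; the rest of the batch still saved.
            conflicts.append(row['id'])
        else:
            # Update _id and _rev in place so the doc can be saved again.
            doc['_id'] = row['id']
            doc['_rev'] = row['rev']
    return conflicts


docs = [{'_id': 'a', 'x': 1}, {'_id': 'b', 'x': 2}]
rows = [
    {'id': 'a', 'rev': '1-abc'},
    {'id': 'b', 'error': 'conflict', 'reason': 'Document update conflict.'},
]
print(apply_bulk_rows(docs, rows))  # ['b']
```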
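The difference between save_many() and the "all-or-nothing" bulksave() comes down to the request body POSTed to _bulk_docs. A minimal sketch, using a hypothetical bulk_docs_body() helper: my understanding is that with all_or_nothing set, CouchDB 1.x applies the whole batch without per-doc conflict checks, so a stale _rev can leave a conflicted doc behind instead of producing a rejected row. As far as I know, CouchDB 2.0 and later no longer honor this flag, so check your server version.

```python
import json


def bulk_docs_body(docs, all_or_nothing=False):
    """
    Hypothetical helper: build the JSON body POSTed to /db/_bulk_docs.
    """
    body = {'docs': docs}
    if all_or_nothing:
        # CouchDB 1.x mode; assumed unsupported on 2.0+ servers.
        body['all_or_nothing'] = True
    return json.dumps(body, sort_keys=True)


print(bulk_docs_body([{'_id': 'a'}]))
# {"docs": [{"_id": "a"}]}
print(bulk_docs_body([{'_id': 'a'}], all_or_nothing=True))
# {"all_or_nothing": true, "docs": [{"_id": "a"}]}
```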
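Here is a rough sketch of the batch-update pattern described in the id_slice_iter() bullet. id_slice_iter() is copied from this branch; FakeDatabase and update_in_batches() are hypothetical stand-ins (no CouchDB needed) that count simulated round trips, so you can see where the savings come from.

```python
import math


def id_slice_iter(rows, size=25):
    # Same logic as the function added in this branch.
    for i in range(math.ceil(len(rows) / size)):
        yield [row['id'] for row in rows[i*size : (i+1)*size]]


class FakeDatabase:
    """Hypothetical in-memory stand-in for microfiber.Database."""

    def __init__(self, docs):
        self.docs = dict((doc['_id'], dict(doc)) for doc in docs)
        self.requests = 0  # counts simulated HTTP round trips

    def get_many(self, doc_ids):
        self.requests += 1
        return [dict(self.docs[_id]) for _id in doc_ids]

    def save_many(self, docs):
        self.requests += 1
        for doc in docs:
            self.docs[doc['_id']] = dict(doc)


def update_in_batches(db, rows, update, size=25):
    """Hypothetical helper: fetch, modify, and save docs one batch at a time."""
    for ids in id_slice_iter(rows, size):
        docs = db.get_many(ids)  # one request to read the batch...
        for doc in docs:
            update(doc)
        db.save_many(docs)  # ...and one request to write it back


rows = [{'id': 'doc{}'.format(i)} for i in range(60)]
db = FakeDatabase({'_id': row['id'], 'stored': True} for row in rows)
update_in_batches(db, rows, lambda doc: doc.update(stored=False))
print(db.requests)  # 6: three batches (25, 25, 10), two requests each
```

With 60 docs the batched loop makes 6 requests, where a get()/save() per doc would make 120.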

Jason Gerard DeRose (jderose) wrote:

I'm gonna self-approve this as it's blocking other work.

review: Approve

Preview Diff

=== modified file 'microfiber.py'
--- microfiber.py 2012-07-06 06:59:53 +0000
+++ microfiber.py 2012-07-07 12:53:19 +0000
@@ -49,6 +49,7 @@
 from urllib.parse import urlparse, urlencode, quote_plus
 from http.client import HTTPConnection, HTTPSConnection, BadStatusLine
 import threading
+import math


 __all__ = (
@@ -234,6 +235,11 @@
     return replication_body(name, peer, **kw)


+def id_slice_iter(rows, size=25):
+    for i in range(math.ceil(len(rows) / size)):
+        yield [row['id'] for row in rows[i*size : (i+1)*size]]
+
+
 class HTTPError(Exception):
     """
     Base class for custom `microfiber` exceptions.
@@ -682,7 +688,8 @@
     * `Database.server()` - return a `Server` pointing at same URL
     * `Database.ensure()` - ensure the database exists
     * `Database.save(doc)` - save to CouchDB, update doc _id & _rev in place
-    * `Database.bulksave(docs)` - as above, but with a list of docs
+    * `Database.save_many(docs)` - as above, but with a list of docs
+    * `Database.get_many(doc_ids)` - retrieve many docs at once
     * `Datebase.view(design, view, **options)` - shortcut method, that's all
     """
     def __init__(self, name, env=SERVER):
@@ -749,7 +756,7 @@
         doc['_rev'] = r['rev']
         return r

-    def bulksave(self, docs):
+    def save_many(self, docs):
         """
         Bulk-save using non-atomic semantics, updates all _rev in-place.

@@ -778,7 +785,7 @@
             raise BulkConflict(conflicts, rows)
         return rows

-    def bulksave2(self, docs):
+    def bulksave(self, docs):
         """
         Bulk-save using all-or-nothing semantics, updates all _rev in-place.

@@ -819,6 +826,8 @@

         ``Database.get('_design', design, '_view', view, **options)``
         """
+        if 'reduce' not in options:
+            options['reduce'] = False
         return self.get('_design', design, '_view', view, **options)

     def dump(self, fp, attachments=True):

=== modified file 'test_microfiber.py'
--- test_microfiber.py 2012-07-06 06:59:53 +0000
+++ test_microfiber.py 2012-07-07 12:53:19 +0000
@@ -46,6 +46,7 @@
     usercouch = None

 import microfiber
+from microfiber import random_id
 from microfiber import NotFound, MethodNotAllowed, Conflict, PreconditionFailed


@@ -65,10 +66,6 @@
     )


-def random_id():
-    return b32encode(os.urandom(10)).decode('ascii')
-
-
 def random_oauth():
     return dict(
         (k, random_id())
@@ -92,7 +89,6 @@


 assert is_microfiber_id(microfiber.random_id())
-assert not is_microfiber_id(random_id())
 assert not is_microfiber_id(test_id())


@@ -560,6 +556,44 @@
             }
         )

+    def test_id_slice_iter(self):
+        ids = [random_id() for i in range(74)]
+        rows = [{'id': _id} for _id in ids]
+        chunks = list(microfiber.id_slice_iter(rows))
+        self.assertEqual(len(chunks), 3)
+        self.assertEqual(len(chunks[0]), 25)
+        self.assertEqual(len(chunks[1]), 25)
+        self.assertEqual(len(chunks[2]), 24)
+        accum = []
+        for chunk in chunks:
+            accum.extend(chunk)
+        self.assertEqual(accum, ids)
+
+        ids = [random_id() for i in range(75)]
+        rows = [{'id': _id} for _id in ids]
+        chunks = list(microfiber.id_slice_iter(rows))
+        self.assertEqual(len(chunks), 3)
+        self.assertEqual(len(chunks[0]), 25)
+        self.assertEqual(len(chunks[1]), 25)
+        self.assertEqual(len(chunks[2]), 25)
+        accum = []
+        for chunk in chunks:
+            accum.extend(chunk)
+        self.assertEqual(accum, ids)
+
+        ids = [random_id() for i in range(76)]
+        rows = [{'id': _id} for _id in ids]
+        chunks = list(microfiber.id_slice_iter(rows))
+        self.assertEqual(len(chunks), 4)
+        self.assertEqual(len(chunks[0]), 25)
+        self.assertEqual(len(chunks[1]), 25)
+        self.assertEqual(len(chunks[2]), 25)
+        self.assertEqual(len(chunks[3]), 1)
+        accum = []
+        for chunk in chunks:
+            accum.extend(chunk)
+        self.assertEqual(accum, ids)
+

 class TestErrors(TestCase):
     def test_errors(self):
@@ -825,6 +859,38 @@
         self.assertEqual(s._basic, 'foo')
         self.assertEqual(s._oauth, 'bar')

+    def test_view(self):
+        class Mock(microfiber.Database):
+            def get(self, *parts, **options):
+                self._parts = parts
+                self._options = options
+                assert not hasattr(self, '_return')
+                self._return = random_id()
+                return self._return
+
+        db = Mock('mydb')
+        self.assertEqual(db.view('foo', 'bar'), db._return)
+        self.assertEqual(db._parts, ('_design', 'foo', '_view', 'bar'))
+        self.assertEqual(db._options, {'reduce': False})
+
+        db = Mock('mydb')
+        self.assertEqual(db.view('foo', 'bar', reduce=True), db._return)
+        self.assertEqual(db._parts, ('_design', 'foo', '_view', 'bar'))
+        self.assertEqual(db._options, {'reduce': True})
+
+        db = Mock('mydb')
+        self.assertEqual(db.view('foo', 'bar', include_docs=True), db._return)
+        self.assertEqual(db._parts, ('_design', 'foo', '_view', 'bar'))
+        self.assertEqual(db._options, {'reduce': False, 'include_docs': True})
+
+        db = Mock('mydb')
+        self.assertEqual(
+            db.view('foo', 'bar', include_docs=True, reduce=True),
+            db._return
+        )
+        self.assertEqual(db._parts, ('_design', 'foo', '_view', 'bar'))
+        self.assertEqual(db._options, {'reduce': True, 'include_docs': True})
+

 class ReplicationTestCase(TestCase):
     def setUp(self):
@@ -1348,14 +1414,14 @@
             }
         )

-    def test_bulksave(self):
+    def test_save_many(self):
         db = microfiber.Database(self.db, self.env)
         self.assertTrue(db.ensure())

         # Test that doc['_id'] gets set automatically
         markers = tuple(test_id() for i in range(10))
         docs = [{'marker': m} for m in markers]
-        rows = db.bulksave(docs)
+        rows = db.save_many(docs)
         for (marker, doc, row) in zip(markers, docs, rows):
             self.assertEqual(doc['marker'], marker)
             self.assertEqual(doc['_id'], row['id'])
@@ -1366,7 +1432,7 @@
         # Test when doc['_id'] is already present
         ids = tuple(test_id() for i in range(10))
         docs = [{'_id': _id} for _id in ids]
-        rows = db.bulksave(docs)
+        rows = db.save_many(docs)
         for (_id, doc, row) in zip(ids, docs, rows):
             self.assertEqual(doc['_id'], _id)
             self.assertEqual(row['id'], _id)
@@ -1377,7 +1443,7 @@
         # Let's update all the docs
         for doc in docs:
             doc['x'] = 'foo'
-        rows = db.bulksave(docs)
+        rows = db.save_many(docs)
         for (_id, doc, row) in zip(ids, docs, rows):
             self.assertEqual(doc['_id'], _id)
             self.assertEqual(row['id'], _id)
@@ -1404,7 +1470,7 @@
                 good.append(doc)

         with self.assertRaises(microfiber.BulkConflict) as cm:
-            rows = db.bulksave(docs)
+            rows = db.save_many(docs)
         self.assertEqual(str(cm.exception), 'conflict on 5 docs')
         self.assertEqual(cm.exception.conflicts, bad)
         self.assertEqual(len(cm.exception.rows), 10)
@@ -1424,14 +1490,14 @@
             self.assertEqual(row['rev'], doc['_rev'])
             self.assertEqual(real, doc)

-    def test_bulksave2(self):
+    def test_bulksave(self):
         db = microfiber.Database(self.db, self.env)
         self.assertTrue(db.ensure())

         # Test that doc['_id'] gets set automatically
         markers = tuple(test_id() for i in range(10))
         docs = [{'marker': m} for m in markers]
-        rows = db.bulksave2(docs)
+        rows = db.bulksave(docs)
         for (marker, doc, row) in zip(markers, docs, rows):
             self.assertEqual(doc['marker'], marker)
             self.assertEqual(doc['_id'], row['id'])
@@ -1442,7 +1508,7 @@
         # Test when doc['_id'] is already present
         ids = tuple(test_id() for i in range(10))
         docs = [{'_id': _id} for _id in ids]
-        rows = db.bulksave2(docs)
+        rows = db.bulksave(docs)
         for (_id, doc, row) in zip(ids, docs, rows):
             self.assertEqual(doc['_id'], _id)
             self.assertEqual(row['id'], _id)
@@ -1453,7 +1519,7 @@
         # Let's update all the docs
         for doc in docs:
             doc['x'] = 'foo'
-        rows = db.bulksave2(docs)
+        rows = db.bulksave(docs)
         for (_id, doc, row) in zip(ids, docs, rows):
             self.assertEqual(doc['_id'], _id)
             self.assertEqual(row['id'], _id)
@@ -1472,7 +1538,7 @@
         # Now let's update all the docs, test all-or-nothing behavior
         for doc in docs:
             doc['x'] = 'bar'
-        rows = db.bulksave2(docs)
+        rows = db.bulksave(docs)
         for (_id, doc, row) in zip(ids, docs, rows):
             self.assertEqual(doc['_id'], _id)
             self.assertEqual(row['id'], _id)
@@ -1494,7 +1560,7 @@
         # Now update all the docs again, realize all-or-nothing is a bad idea:
         for doc in docs:
             doc['x'] = 'baz'
-        rows = db.bulksave2(docs)
+        rows = db.bulksave(docs)
         for (i, row) in enumerate(rows):
             _id = ids[i]
             doc = docs[i]
@@ -1516,7 +1582,7 @@

         ids = tuple(test_id() for i in range(50))
         docs = [{'_id': _id} for _id in ids]
-        db.bulksave(docs)
+        db.save_many(docs)

         # Test an empty doc_ids list
         self.assertEqual(db.get_many([]), [])
