Merge lp:~gholt/swift/largefiles into lp:~hudson-openstack/swift/trunk
Status: Rejected
Rejected by: gholt
Proposed branch: lp:~gholt/swift/largefiles
Merge into: lp:~hudson-openstack/swift/trunk
Diff against target: 1491 lines (+930/-231), 10 files modified
  swift/common/constraints.py (+8/-2)
  swift/obj/auditor.py (+1/-1)
  swift/obj/hashes.py (+179/-0)
  swift/obj/hashes.py.moved (+179/-0)
  swift/obj/replicator.py (+37/-177)
  swift/obj/server.py (+81/-30)
  swift/proxy/server.py (+387/-21)
  test/unit/__init__.py (+2/-0)
  test/unit/obj/test_hashes.py (+28/-0)
  test/unit/obj/test_hashes.py.moved (+28/-0)
To merge this branch: bzr merge lp:~gholt/swift/largefiles
Related bugs: none

Reviewer | Review Type | Date Requested | Status
---|---|---|---
gholt (community) | Disapprove | |

Review via email: mp+35493@code.launchpad.net
Commit message
Description of the change
Support for very large files by breaking them into segments and distributing the segments across the cluster.
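The placement scheme, as implemented in the preview diff below (`DiskFile` in swift/obj/server.py, `SegmentedIterable.load_next_segment` and `PUT_segmented_object` in swift/proxy/server.py): segment 0 keeps the object's own ring name, while each later segment is hashed under a name built from the object name, the PUT's X-Timestamp, and the segment number, so segments scatter across the cluster and a re-upload cannot collide with an older upload's segments. Here is a minimal sketch of that naming rule for the non-chunked case (Content-Length known up front); the helper `segment_ring_names`, the object name `myobj`, and the tiny 1024-byte `SEGMENT_SIZE` (the testing value from swift/common/constraints.py) are illustrative, not code from the branch:

```python
# Illustrative sketch only -- segment_ring_names is a hypothetical helper,
# not part of the branch. It mirrors the naming the diff uses in DiskFile
# and SegmentedIterable: '%s/%s/%s' % (obj, timestamp, segment) when
# segment > 0, and the plain object name for segment 0.
SEGMENT_SIZE = 1024  # deliberately small testing value from the diff


def segment_ring_names(obj, put_timestamp, content_length,
                       segment_size=SEGMENT_SIZE):
    """Yield the ring object name for each segment of a segmented PUT."""
    # Round up so a trailing partial segment still gets a name.
    segments = max(1, (content_length + segment_size - 1) // segment_size)
    for segment in xrange(segments):
        if segment:
            # Later segments hash differently, so they land on other
            # partitions; the timestamp in the name keeps them from
            # overwriting an earlier upload's segments.
            yield '%s/%s/%s' % (obj, put_timestamp, segment)
        else:
            # Segment 0 keeps the object's own name and partition.
            yield obj


# A 3000-byte PUT at X-Timestamp 1287513829.00000 yields three names:
#   myobj
#   myobj/1287513829.00000/1
#   myobj/1287513829.00000/2
for name in segment_ring_names('myobj', '1287513829.00000', 3000):
    print name
```

On GET, the proxy recognizes a manifest by its X-Object-Type header and, in `GETorHEAD_segmented`, reads the manifest's JSON body (content-length, x-segment-size, x-timestamp, etag, content-type) to stream the segments back in order through `SegmentedIterable`.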
- 72. By gholt: Fixed typo
- 73. By gholt: Merged changes from trunk
- 74. By gholt: very-large-files: now distributes segments across cluster and will not overwrite existing segments during the upload
- 75. By gholt: Updating object auditor to clean up old orphaned large file segments
- 76. By gholt: Merged changes from trunk
- 77. By gholt: Merged changes from trunk
- 78. By gholt: Merged from trunk
- 79. By gholt: Renovation to store object segments separate from whole objects
- 80. By gholt: Just a bit of DRY
- 81. By gholt: Support very large chunked transfer encoding uploads
- 82. By gholt: Update import and new test stub for moved obj hashes code
- 83. By gholt: Merge from trunk
- 84. By gholt: Merged from trunk
- 85. By gholt: Merged from trunk
- 86. By gholt: Merged with trunk
- 87. By gholt: Merged from trunk
- 88. By gholt: Refactor of container_updater calling; removed outdated imports
- 89. By gholt: Merged with refactorhashes
Unmerged revisions
- 91. By gholt: Merged from refactorobjasync
- 90. By gholt: Merge from trunk
- 89. By gholt: Merged with refactorhashes
- 88. By gholt: Refactor of container_updater calling; removed outdated imports
- 87. By gholt: Merged from trunk
- 86. By gholt: Merged with trunk
- 85. By gholt: Merged from trunk
- 84. By gholt: Merged from trunk
- 83. By gholt: Merge from trunk
- 82. By gholt: Update import and new test stub for moved obj hashes code
Preview Diff
1 | === modified file 'swift/common/constraints.py' |
2 | --- swift/common/constraints.py 2010-09-22 19:53:38 +0000 |
3 | +++ swift/common/constraints.py 2010-10-19 18:43:49 +0000 |
4 | @@ -19,8 +19,12 @@ |
5 | HTTPRequestEntityTooLarge |
6 | |
7 | |
8 | +# FIXME: This will get bumped way up once very large file support is added. |
9 | #: Max file size allowed for objects |
10 | MAX_FILE_SIZE = 5 * 1024 * 1024 * 1024 + 2 |
11 | +# FIXME: Yeah, I know this is a silly low value; just for testing right now. |
12 | +#: File size for segments of large objects |
13 | +SEGMENT_SIZE = 1024 |
14 | #: Max length of the name of a key for metadata |
15 | MAX_META_NAME_LENGTH = 128 |
16 | #: Max length of the value of a key for metadata |
17 | @@ -29,14 +33,16 @@ |
18 | MAX_META_COUNT = 90 |
19 | #: Max overall size of metadata |
20 | MAX_META_OVERALL_SIZE = 4096 |
21 | +#: Max account name length |
22 | +MAX_ACCOUNT_NAME_LENGTH = 256 |
23 | +#: Max container name length |
24 | +MAX_CONTAINER_NAME_LENGTH = 256 |
25 | #: Max object name length |
26 | MAX_OBJECT_NAME_LENGTH = 1024 |
27 | #: Max object list length of a get request for a container |
28 | CONTAINER_LISTING_LIMIT = 10000 |
29 | #: Max container list length of a get request for an account |
30 | ACCOUNT_LISTING_LIMIT = 10000 |
31 | -MAX_ACCOUNT_NAME_LENGTH = 256 |
32 | -MAX_CONTAINER_NAME_LENGTH = 256 |
33 | |
34 | |
35 | def check_metadata(req, target_type): |
36 | |
37 | === modified file 'swift/obj/auditor.py' |
38 | --- swift/obj/auditor.py 2010-10-18 22:30:26 +0000 |
39 | +++ swift/obj/auditor.py 2010-10-19 18:43:49 +0000 |
40 | @@ -19,7 +19,7 @@ |
41 | from random import random |
42 | |
43 | from swift.obj import server as object_server |
44 | -from swift.obj.replicator import invalidate_hash |
45 | +from swift.obj.hashes import invalidate_hash |
46 | from swift.common.utils import get_logger, renamer, audit_location_generator |
47 | from swift.common.exceptions import AuditException |
48 | from swift.common.daemon import Daemon |
49 | |
50 | === added file 'swift/obj/hashes.py' |
51 | --- swift/obj/hashes.py 1970-01-01 00:00:00 +0000 |
52 | +++ swift/obj/hashes.py 2010-10-19 18:43:49 +0000 |
53 | @@ -0,0 +1,179 @@ |
54 | +# Copyright (c) 2010 OpenStack, LLC. |
55 | +# |
56 | +# Licensed under the Apache License, Version 2.0 (the "License"); |
57 | +# you may not use this file except in compliance with the License. |
58 | +# You may obtain a copy of the License at |
59 | +# |
60 | +# http://www.apache.org/licenses/LICENSE-2.0 |
61 | +# |
62 | +# Unless required by applicable law or agreed to in writing, software |
63 | +# distributed under the License is distributed on an "AS IS" BASIS, |
64 | +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
65 | +# implied. |
66 | +# See the License for the specific language governing permissions and |
67 | +# limitations under the License. |
68 | + |
69 | +import cPickle as pickle |
70 | +import hashlib |
71 | +import logging, os, time |
72 | +from os.path import isdir, join |
73 | + |
74 | +from eventlet import tpool, sleep |
75 | + |
76 | +from swift.common.utils import lock_path, renamer |
77 | + |
78 | + |
79 | +PICKLE_PROTOCOL = 2 |
80 | +ONE_WEEK = 604800 |
81 | +HASH_FILE = 'hashes.pkl' |
82 | + |
83 | + |
84 | +def hash_suffix(path, reclaim_age): |
85 | + """ |
86 | + Performs reclamation and returns an md5 of all (remaining) files. |
87 | + |
88 | + :param reclaim_age: age in seconds at which to remove tombstones |
89 | + """ |
90 | + md5 = hashlib.md5() |
91 | + for hsh in sorted(os.listdir(path)): |
92 | + hsh_path = join(path, hsh) |
93 | + files = os.listdir(hsh_path) |
94 | + if len(files) == 1: |
95 | + if files[0].endswith('.ts'): |
96 | + # remove tombstones older than reclaim_age |
97 | + ts = files[0].rsplit('.', 1)[0] |
98 | + if (time.time() - float(ts)) > reclaim_age: |
99 | + os.unlink(join(hsh_path, files[0])) |
100 | + files.remove(files[0]) |
101 | + elif files: |
102 | + files.sort(reverse=True) |
103 | + meta = data = tomb = None |
104 | + for filename in files: |
105 | + if not meta and filename.endswith('.meta'): |
106 | + meta = filename |
107 | + if not data and filename.endswith('.data'): |
108 | + data = filename |
109 | + if not tomb and filename.endswith('.ts'): |
110 | + tomb = filename |
111 | + if (filename < tomb or # any file older than tomb |
112 | + filename < data or # any file older than data |
113 | + (filename.endswith('.meta') and |
114 | + filename < meta)): # old meta |
115 | + os.unlink(join(hsh_path, filename)) |
116 | + files.remove(filename) |
117 | + if not files: |
118 | + os.rmdir(hsh_path) |
119 | + for filename in files: |
120 | + md5.update(filename) |
121 | + try: |
122 | + os.rmdir(path) |
123 | + except OSError: |
124 | + pass |
125 | + return md5.hexdigest() |
126 | + |
127 | + |
128 | +def recalculate_hashes(partition_dir, suffixes, reclaim_age=ONE_WEEK): |
129 | + """ |
130 | + Recalculates hashes for the given suffixes in the partition and updates |
131 | + them in the partition's hashes file. |
132 | + |
133 | + :param partition_dir: directory of the partition in which to recalculate |
134 | + :param suffixes: list of suffixes to recalculate |
135 | + :param reclaim_age: age in seconds at which tombstones should be removed |
136 | + """ |
137 | + |
138 | + def tpool_listdir(partition_dir): |
139 | + return dict(((suff, None) for suff in os.listdir(partition_dir) |
140 | + if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
141 | + hashes_file = join(partition_dir, HASH_FILE) |
142 | + with lock_path(partition_dir): |
143 | + try: |
144 | + with open(hashes_file, 'rb') as fp: |
145 | + hashes = pickle.load(fp) |
146 | + except Exception: |
147 | + hashes = tpool.execute(tpool_listdir, partition_dir) |
148 | + for suffix in suffixes: |
149 | + suffix_dir = join(partition_dir, suffix) |
150 | + if os.path.exists(suffix_dir): |
151 | + hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
152 | + elif suffix in hashes: |
153 | + del hashes[suffix] |
154 | + with open(hashes_file + '.tmp', 'wb') as fp: |
155 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
156 | + renamer(hashes_file + '.tmp', hashes_file) |
157 | + |
158 | + |
159 | +def invalidate_hash(suffix_dir): |
160 | + """ |
161 | + Invalidates the hash for a suffix_dir in the partition's hashes file. |
162 | + |
163 | + :param suffix_dir: absolute path to suffix dir whose hash needs |
164 | + invalidating |
165 | + """ |
166 | + |
167 | + suffix = os.path.basename(suffix_dir) |
168 | + partition_dir = os.path.dirname(suffix_dir) |
169 | + hashes_file = join(partition_dir, HASH_FILE) |
170 | + with lock_path(partition_dir): |
171 | + try: |
172 | + with open(hashes_file, 'rb') as fp: |
173 | + hashes = pickle.load(fp) |
174 | + if suffix in hashes and not hashes[suffix]: |
175 | + return |
176 | + except Exception: |
177 | + return |
178 | + hashes[suffix] = None |
179 | + with open(hashes_file + '.tmp', 'wb') as fp: |
180 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
181 | + renamer(hashes_file + '.tmp', hashes_file) |
182 | + |
183 | + |
184 | +def get_hashes(partition_dir, do_listdir=True, reclaim_age=ONE_WEEK): |
185 | + """ |
186 | + Get a list of hashes for the suffix dir. do_listdir causes it to mistrust |
187 | + the hash cache for suffix existence at the (unexpectedly high) cost of a |
188 | + listdir. reclaim_age is just passed on to hash_suffix. |
189 | + |
190 | + :param partition_dir: absolute path of partition to get hashes for |
191 | + :param do_listdir: force existence check for all hashes in the partition |
192 | + :param reclaim_age: age at which to remove tombstones |
193 | + |
194 | + :returns: tuple of (number of suffix dirs hashed, dictionary of hashes) |
195 | + """ |
196 | + |
197 | + def tpool_listdir(hashes, partition_dir): |
198 | + return dict(((suff, hashes.get(suff, None)) |
199 | + for suff in os.listdir(partition_dir) |
200 | + if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
201 | + hashed = 0 |
202 | + hashes_file = join(partition_dir, HASH_FILE) |
203 | + with lock_path(partition_dir): |
204 | + modified = False |
205 | + hashes = {} |
206 | + try: |
207 | + with open(hashes_file, 'rb') as fp: |
208 | + hashes = pickle.load(fp) |
209 | + except Exception: |
210 | + do_listdir = True |
211 | + if do_listdir: |
212 | + hashes = tpool.execute(tpool_listdir, hashes, partition_dir) |
213 | + modified = True |
214 | + for suffix, hash_ in hashes.items(): |
215 | + if not hash_: |
216 | + suffix_dir = join(partition_dir, suffix) |
217 | + if os.path.exists(suffix_dir): |
218 | + try: |
219 | + hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
220 | + hashed += 1 |
221 | + except OSError: |
222 | + logging.exception('Error hashing suffix') |
223 | + hashes[suffix] = None |
224 | + else: |
225 | + del hashes[suffix] |
226 | + modified = True |
227 | + sleep() |
228 | + if modified: |
229 | + with open(hashes_file + '.tmp', 'wb') as fp: |
230 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
231 | + renamer(hashes_file + '.tmp', hashes_file) |
232 | + return hashed, hashes |
233 | |
234 | === added file 'swift/obj/hashes.py.moved' |
235 | --- swift/obj/hashes.py.moved 1970-01-01 00:00:00 +0000 |
236 | +++ swift/obj/hashes.py.moved 2010-10-19 18:43:49 +0000 |
237 | @@ -0,0 +1,179 @@ |
238 | +# Copyright (c) 2010 OpenStack, LLC. |
239 | +# |
240 | +# Licensed under the Apache License, Version 2.0 (the "License"); |
241 | +# you may not use this file except in compliance with the License. |
242 | +# You may obtain a copy of the License at |
243 | +# |
244 | +# http://www.apache.org/licenses/LICENSE-2.0 |
245 | +# |
246 | +# Unless required by applicable law or agreed to in writing, software |
247 | +# distributed under the License is distributed on an "AS IS" BASIS, |
248 | +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
249 | +# implied. |
250 | +# See the License for the specific language governing permissions and |
251 | +# limitations under the License. |
252 | + |
253 | +import cPickle as pickle |
254 | +import hashlib |
255 | +import logging, os, time |
256 | +from os.path import isdir, join |
257 | + |
258 | +from eventlet import tpool, sleep |
259 | + |
260 | +from swift.common.utils import lock_path, renamer |
261 | + |
262 | + |
263 | +PICKLE_PROTOCOL = 2 |
264 | +ONE_WEEK = 604800 |
265 | +HASH_FILE = 'hashes.pkl' |
266 | + |
267 | + |
268 | +def hash_suffix(path, reclaim_age): |
269 | + """ |
270 | + Performs reclamation and returns an md5 of all (remaining) files. |
271 | + |
272 | + :param reclaim_age: age in seconds at which to remove tombstones |
273 | + """ |
274 | + md5 = hashlib.md5() |
275 | + for hsh in sorted(os.listdir(path)): |
276 | + hsh_path = join(path, hsh) |
277 | + files = os.listdir(hsh_path) |
278 | + if len(files) == 1: |
279 | + if files[0].endswith('.ts'): |
280 | + # remove tombstones older than reclaim_age |
281 | + ts = files[0].rsplit('.', 1)[0] |
282 | + if (time.time() - float(ts)) > reclaim_age: |
283 | + os.unlink(join(hsh_path, files[0])) |
284 | + files.remove(files[0]) |
285 | + elif files: |
286 | + files.sort(reverse=True) |
287 | + meta = data = tomb = None |
288 | + for filename in files: |
289 | + if not meta and filename.endswith('.meta'): |
290 | + meta = filename |
291 | + if not data and filename.endswith('.data'): |
292 | + data = filename |
293 | + if not tomb and filename.endswith('.ts'): |
294 | + tomb = filename |
295 | + if (filename < tomb or # any file older than tomb |
296 | + filename < data or # any file older than data |
297 | + (filename.endswith('.meta') and |
298 | + filename < meta)): # old meta |
299 | + os.unlink(join(hsh_path, filename)) |
300 | + files.remove(filename) |
301 | + if not files: |
302 | + os.rmdir(hsh_path) |
303 | + for filename in files: |
304 | + md5.update(filename) |
305 | + try: |
306 | + os.rmdir(path) |
307 | + except OSError: |
308 | + pass |
309 | + return md5.hexdigest() |
310 | + |
311 | + |
312 | +def recalculate_hashes(partition_dir, suffixes, reclaim_age=ONE_WEEK): |
313 | + """ |
314 | + Recalculates hashes for the given suffixes in the partition and updates |
315 | + them in the partition's hashes file. |
316 | + |
317 | + :param partition_dir: directory of the partition in which to recalculate |
318 | + :param suffixes: list of suffixes to recalculate |
319 | + :param reclaim_age: age in seconds at which tombstones should be removed |
320 | + """ |
321 | + |
322 | + def tpool_listdir(partition_dir): |
323 | + return dict(((suff, None) for suff in os.listdir(partition_dir) |
324 | + if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
325 | + hashes_file = join(partition_dir, HASH_FILE) |
326 | + with lock_path(partition_dir): |
327 | + try: |
328 | + with open(hashes_file, 'rb') as fp: |
329 | + hashes = pickle.load(fp) |
330 | + except Exception: |
331 | + hashes = tpool.execute(tpool_listdir, partition_dir) |
332 | + for suffix in suffixes: |
333 | + suffix_dir = join(partition_dir, suffix) |
334 | + if os.path.exists(suffix_dir): |
335 | + hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
336 | + elif suffix in hashes: |
337 | + del hashes[suffix] |
338 | + with open(hashes_file + '.tmp', 'wb') as fp: |
339 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
340 | + renamer(hashes_file + '.tmp', hashes_file) |
341 | + |
342 | + |
343 | +def invalidate_hash(suffix_dir): |
344 | + """ |
345 | + Invalidates the hash for a suffix_dir in the partition's hashes file. |
346 | + |
347 | + :param suffix_dir: absolute path to suffix dir whose hash needs |
348 | + invalidating |
349 | + """ |
350 | + |
351 | + suffix = os.path.basename(suffix_dir) |
352 | + partition_dir = os.path.dirname(suffix_dir) |
353 | + hashes_file = join(partition_dir, HASH_FILE) |
354 | + with lock_path(partition_dir): |
355 | + try: |
356 | + with open(hashes_file, 'rb') as fp: |
357 | + hashes = pickle.load(fp) |
358 | + if suffix in hashes and not hashes[suffix]: |
359 | + return |
360 | + except Exception: |
361 | + return |
362 | + hashes[suffix] = None |
363 | + with open(hashes_file + '.tmp', 'wb') as fp: |
364 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
365 | + renamer(hashes_file + '.tmp', hashes_file) |
366 | + |
367 | + |
368 | +def get_hashes(partition_dir, do_listdir=True, reclaim_age=ONE_WEEK): |
369 | + """ |
370 | + Get a list of hashes for the suffix dir. do_listdir causes it to mistrust |
371 | + the hash cache for suffix existence at the (unexpectedly high) cost of a |
372 | + listdir. reclaim_age is just passed on to hash_suffix. |
373 | + |
374 | + :param partition_dir: absolute path of partition to get hashes for |
375 | + :param do_listdir: force existence check for all hashes in the partition |
376 | + :param reclaim_age: age at which to remove tombstones |
377 | + |
378 | + :returns: tuple of (number of suffix dirs hashed, dictionary of hashes) |
379 | + """ |
380 | + |
381 | + def tpool_listdir(hashes, partition_dir): |
382 | + return dict(((suff, hashes.get(suff, None)) |
383 | + for suff in os.listdir(partition_dir) |
384 | + if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
385 | + hashed = 0 |
386 | + hashes_file = join(partition_dir, HASH_FILE) |
387 | + with lock_path(partition_dir): |
388 | + modified = False |
389 | + hashes = {} |
390 | + try: |
391 | + with open(hashes_file, 'rb') as fp: |
392 | + hashes = pickle.load(fp) |
393 | + except Exception: |
394 | + do_listdir = True |
395 | + if do_listdir: |
396 | + hashes = tpool.execute(tpool_listdir, hashes, partition_dir) |
397 | + modified = True |
398 | + for suffix, hash_ in hashes.items(): |
399 | + if not hash_: |
400 | + suffix_dir = join(partition_dir, suffix) |
401 | + if os.path.exists(suffix_dir): |
402 | + try: |
403 | + hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
404 | + hashed += 1 |
405 | + except OSError: |
406 | + logging.exception('Error hashing suffix') |
407 | + hashes[suffix] = None |
408 | + else: |
409 | + del hashes[suffix] |
410 | + modified = True |
411 | + sleep() |
412 | + if modified: |
413 | + with open(hashes_file + '.tmp', 'wb') as fp: |
414 | + pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
415 | + renamer(hashes_file + '.tmp', hashes_file) |
416 | + return hashed, hashes |
417 | |
418 | === modified file 'swift/obj/replicator.py' |
419 | --- swift/obj/replicator.py 2010-10-19 01:05:54 +0000 |
420 | +++ swift/obj/replicator.py 2010-10-19 18:43:49 +0000 |
421 | @@ -19,7 +19,6 @@ |
422 | import shutil |
423 | import time |
424 | import logging |
425 | -import hashlib |
426 | import itertools |
427 | import cPickle as pickle |
428 | |
429 | @@ -29,168 +28,16 @@ |
430 | from eventlet.support.greenlets import GreenletExit |
431 | |
432 | from swift.common.ring import Ring |
433 | -from swift.common.utils import whataremyips, unlink_older_than, lock_path, \ |
434 | - renamer, compute_eta, get_logger |
435 | +from swift.common.utils import compute_eta, get_logger, unlink_older_than, \ |
436 | + whataremyips |
437 | from swift.common.bufferedhttp import http_connect |
438 | from swift.common.daemon import Daemon |
439 | +from swift.obj.hashes import get_hashes, recalculate_hashes |
440 | +from swift.obj.server import DATADIR, SEGMENTSDIR |
441 | + |
442 | |
443 | hubs.use_hub('poll') |
444 | |
445 | -PICKLE_PROTOCOL = 2 |
446 | -ONE_WEEK = 604800 |
447 | -HASH_FILE = 'hashes.pkl' |
448 | - |
449 | - |
450 | -def hash_suffix(path, reclaim_age): |
451 | - """ |
452 | - Performs reclamation and returns an md5 of all (remaining) files. |
453 | - |
454 | - :param reclaim_age: age in seconds at which to remove tombstones |
455 | - """ |
456 | - md5 = hashlib.md5() |
457 | - for hsh in sorted(os.listdir(path)): |
458 | - hsh_path = join(path, hsh) |
459 | - files = os.listdir(hsh_path) |
460 | - if len(files) == 1: |
461 | - if files[0].endswith('.ts'): |
462 | - # remove tombstones older than reclaim_age |
463 | - ts = files[0].rsplit('.', 1)[0] |
464 | - if (time.time() - float(ts)) > reclaim_age: |
465 | - os.unlink(join(hsh_path, files[0])) |
466 | - files.remove(files[0]) |
467 | - elif files: |
468 | - files.sort(reverse=True) |
469 | - meta = data = tomb = None |
470 | - for filename in files: |
471 | - if not meta and filename.endswith('.meta'): |
472 | - meta = filename |
473 | - if not data and filename.endswith('.data'): |
474 | - data = filename |
475 | - if not tomb and filename.endswith('.ts'): |
476 | - tomb = filename |
477 | - if (filename < tomb or # any file older than tomb |
478 | - filename < data or # any file older than data |
479 | - (filename.endswith('.meta') and |
480 | - filename < meta)): # old meta |
481 | - os.unlink(join(hsh_path, filename)) |
482 | - files.remove(filename) |
483 | - if not files: |
484 | - os.rmdir(hsh_path) |
485 | - for filename in files: |
486 | - md5.update(filename) |
487 | - try: |
488 | - os.rmdir(path) |
489 | - except OSError: |
490 | - pass |
491 | - return md5.hexdigest() |
492 | - |
493 | - |
494 | -def recalculate_hashes(partition_dir, suffixes, reclaim_age=ONE_WEEK): |
495 | - """ |
496 | - Recalculates hashes for the given suffixes in the partition and updates |
497 | - them in the partition's hashes file. |
498 | - |
499 | - :param partition_dir: directory of the partition in which to recalculate |
500 | - :param suffixes: list of suffixes to recalculate |
501 | - :param reclaim_age: age in seconds at which tombstones should be removed |
502 | - """ |
503 | - |
504 | - def tpool_listdir(partition_dir): |
505 | - return dict(((suff, None) for suff in os.listdir(partition_dir) |
506 | - if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
507 | - hashes_file = join(partition_dir, HASH_FILE) |
508 | - with lock_path(partition_dir): |
509 | - try: |
510 | - with open(hashes_file, 'rb') as fp: |
511 | - hashes = pickle.load(fp) |
512 | - except Exception: |
513 | - hashes = tpool.execute(tpool_listdir, partition_dir) |
514 | - for suffix in suffixes: |
515 | - suffix_dir = join(partition_dir, suffix) |
516 | - if os.path.exists(suffix_dir): |
517 | - hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
518 | - elif suffix in hashes: |
519 | - del hashes[suffix] |
520 | - with open(hashes_file + '.tmp', 'wb') as fp: |
521 | - pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
522 | - renamer(hashes_file + '.tmp', hashes_file) |
523 | - |
524 | - |
525 | -def invalidate_hash(suffix_dir): |
526 | - """ |
527 | - Invalidates the hash for a suffix_dir in the partition's hashes file. |
528 | - |
529 | - :param suffix_dir: absolute path to suffix dir whose hash needs |
530 | - invalidating |
531 | - """ |
532 | - |
533 | - suffix = os.path.basename(suffix_dir) |
534 | - partition_dir = os.path.dirname(suffix_dir) |
535 | - hashes_file = join(partition_dir, HASH_FILE) |
536 | - with lock_path(partition_dir): |
537 | - try: |
538 | - with open(hashes_file, 'rb') as fp: |
539 | - hashes = pickle.load(fp) |
540 | - if suffix in hashes and not hashes[suffix]: |
541 | - return |
542 | - except Exception: |
543 | - return |
544 | - hashes[suffix] = None |
545 | - with open(hashes_file + '.tmp', 'wb') as fp: |
546 | - pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
547 | - renamer(hashes_file + '.tmp', hashes_file) |
548 | - |
549 | - |
550 | -def get_hashes(partition_dir, do_listdir=True, reclaim_age=ONE_WEEK): |
551 | - """ |
552 | - Get a list of hashes for the suffix dir. do_listdir causes it to mistrust |
553 | - the hash cache for suffix existence at the (unexpectedly high) cost of a |
554 | - listdir. reclaim_age is just passed on to hash_suffix. |
555 | - |
556 | - :param partition_dir: absolute path of partition to get hashes for |
557 | - :param do_listdir: force existence check for all hashes in the partition |
558 | - :param reclaim_age: age at which to remove tombstones |
559 | - |
560 | - :returns: tuple of (number of suffix dirs hashed, dictionary of hashes) |
561 | - """ |
562 | - |
563 | - def tpool_listdir(hashes, partition_dir): |
564 | - return dict(((suff, hashes.get(suff, None)) |
565 | - for suff in os.listdir(partition_dir) |
566 | - if len(suff) == 3 and isdir(join(partition_dir, suff)))) |
567 | - hashed = 0 |
568 | - hashes_file = join(partition_dir, HASH_FILE) |
569 | - with lock_path(partition_dir): |
570 | - modified = False |
571 | - hashes = {} |
572 | - try: |
573 | - with open(hashes_file, 'rb') as fp: |
574 | - hashes = pickle.load(fp) |
575 | - except Exception: |
576 | - do_listdir = True |
577 | - if do_listdir: |
578 | - hashes = tpool.execute(tpool_listdir, hashes, partition_dir) |
579 | - modified = True |
580 | - for suffix, hash_ in hashes.items(): |
581 | - if not hash_: |
582 | - suffix_dir = join(partition_dir, suffix) |
583 | - if os.path.exists(suffix_dir): |
584 | - try: |
585 | - hashes[suffix] = hash_suffix(suffix_dir, reclaim_age) |
586 | - hashed += 1 |
587 | - except OSError: |
588 | - logging.exception('Error hashing suffix') |
589 | - hashes[suffix] = None |
590 | - else: |
591 | - del hashes[suffix] |
592 | - modified = True |
593 | - sleep() |
594 | - if modified: |
595 | - with open(hashes_file + '.tmp', 'wb') as fp: |
596 | - pickle.dump(hashes, fp, PICKLE_PROTOCOL) |
597 | - renamer(hashes_file + '.tmp', hashes_file) |
598 | - return hashed, hashes |
599 | - |
600 | |
601 | class ObjectReplicator(Daemon): |
602 | """ |
603 | @@ -301,8 +148,9 @@ |
604 | had_any = True |
605 | if not had_any: |
606 | return False |
607 | - args.append(join(rsync_module, node['device'], |
608 | - 'objects', job['partition'])) |
609 | + objdir = job.get('segments') and SEGMENTSDIR or DATADIR |
610 | + args.append(join(rsync_module, node['device'], objdir, |
611 | + job['partition'])) |
612 | return self._rsync(args) == 0 |
613 | |
614 | def check_ring(self): |
615 | @@ -337,12 +185,15 @@ |
616 | for node in job['nodes']: |
617 | success = self.rsync(node, job, suffixes) |
618 | if success: |
619 | + headers = {'Content-Length': '0'} |
620 | + if job.get('segments'): |
621 | + headers['X-Object-Type'] = 'segment' |
622 | with Timeout(self.http_timeout): |
623 | http_connect(node['ip'], |
624 | node['port'], |
625 | node['device'], job['partition'], 'REPLICATE', |
626 | '/' + '-'.join(suffixes), |
627 | - headers={'Content-Length': '0'}).getresponse().read() |
628 | + headers=headers).getresponse().read() |
629 | responses.append(success) |
630 | if not suffixes or (len(responses) == \ |
631 | self.object_ring.replica_count and all(responses)): |
632 | @@ -374,10 +225,13 @@ |
633 | node = next(nodes) |
634 | attempts_left -= 1 |
635 | try: |
636 | + headers = {'Content-Length': '0'} |
637 | + if job.get('segments'): |
638 | + headers['X-Object-Type'] = 'segment' |
639 | with Timeout(self.http_timeout): |
640 | resp = http_connect(node['ip'], node['port'], |
641 | node['device'], job['partition'], 'REPLICATE', |
642 | - '', headers={'Content-Length': '0'}).getresponse() |
643 | + '', headers=headers).getresponse() |
644 | if resp.status == 507: |
645 | self.logger.error('%s/%s responded as unmounted' % |
646 | (node['ip'], node['device'])) |
647 | @@ -397,11 +251,14 @@ |
648 | self.rsync(node, job, suffixes) |
649 | recalculate_hashes(job['path'], suffixes, |
650 | reclaim_age=self.reclaim_age) |
651 | + headers = {'Content-Length': '0'} |
652 | + if job.get('segments'): |
653 | + headers['X-Object-Type'] = 'segment' |
654 | with Timeout(self.http_timeout): |
655 | conn = http_connect(node['ip'], node['port'], |
656 | node['device'], job['partition'], 'REPLICATE', |
657 | '/' + '-'.join(suffixes), |
658 | - headers={'Content-Length': '0'}) |
659 | + headers=headers) |
660 | conn.getresponse().read() |
661 | self.suffix_sync += len(suffixes) |
662 | except (Exception, Timeout): |
663 | @@ -489,24 +346,27 @@ |
664 | dev for dev in self.object_ring.devs |
665 | if dev and dev['ip'] in ips and dev['port'] == self.port]: |
666 | dev_path = join(self.devices_dir, local_dev['device']) |
667 | - obj_path = join(dev_path, 'objects') |
668 | - tmp_path = join(dev_path, 'tmp') |
669 | if self.mount_check and not os.path.ismount(dev_path): |
670 | self.logger.warn('%s is not mounted' % local_dev['device']) |
671 | continue |
672 | + tmp_path = join(dev_path, 'tmp') |
673 | unlink_older_than(tmp_path, time.time() - self.reclaim_age) |
674 | - if not os.path.exists(obj_path): |
675 | - continue |
676 | - for partition in os.listdir(obj_path): |
677 | - try: |
678 | - nodes = [node for node in |
679 | - self.object_ring.get_part_nodes(int(partition)) |
680 | - if node['id'] != local_dev['id']] |
681 | - jobs.append(dict(path=join(obj_path, partition), |
682 | - nodes=nodes, delete=len(nodes) > 2, |
683 | - partition=partition)) |
684 | - except ValueError: |
685 | - continue |
686 | + for objdir in (DATADIR, SEGMENTSDIR): |
687 | + obj_path = join(dev_path, objdir) |
688 | + if os.path.exists(obj_path): |
689 | + for partition in os.listdir(obj_path): |
690 | + try: |
691 | + nodes = [node for node in |
692 | + self.object_ring.get_part_nodes( |
693 | + int(partition)) |
694 | + if node['id'] != local_dev['id']] |
695 | + jobs.append(dict( |
696 | + path=join(obj_path, partition), |
697 | + nodes=nodes, delete=len(nodes) > 2, |
698 | + partition=partition, |
699 | + segments=(objdir == SEGMENTSDIR))) |
700 | + except ValueError: |
701 | + continue |
702 | random.shuffle(jobs) |
703 | # Partitions that need to be deleted take priority |
704 | jobs.sort(key=lambda job: not job['delete']) |
705 | |
706 | === modified file 'swift/obj/server.py' |
707 | --- swift/obj/server.py 2010-10-19 01:05:54 +0000 |
708 | +++ swift/obj/server.py 2010-10-19 18:43:49 +0000 |
709 | @@ -42,12 +42,12 @@ |
710 | from swift.common.constraints import check_object_creation, check_mount, \ |
711 | check_float, check_utf8 |
712 | from swift.common.exceptions import ConnectionTimeout |
713 | -from swift.obj.replicator import get_hashes, invalidate_hash, \ |
714 | - recalculate_hashes |
715 | +from swift.obj.hashes import get_hashes, invalidate_hash, recalculate_hashes |
716 | |
717 | |
718 | DATADIR = 'objects' |
719 | -ASYNCDIR = 'async_pending' |
720 | +SEGMENTSDIR = 'object_segments' |
721 | +ASYNCDIR = 'object_async' |
722 | PICKLE_PROTOCOL = 2 |
723 | METADATA_KEY = 'user.swift.metadata' |
724 | MAX_OBJECT_NAME_LENGTH = 1024 |
725 | @@ -84,15 +84,31 @@ |
726 | :param obj: object name for the object |
727 | :param keep_data_fp: if True, don't close the fp, otherwise close it |
728 | :param disk_chunk_size: size of chunks on file reads |
729 | + :param segment: if not None, indicates which segment of an object |
730 | + this file represents |
731 | + :param segment_timestamp: X-Timestamp of the object's segments (set on the |
732 | + PUT, not changed on POSTs); required if segment |
733 | + is not None |
734 | """ |
735 | |
736 | def __init__(self, path, device, partition, account, container, obj, |
737 | - keep_data_fp=False, disk_chunk_size=65536): |
738 | + keep_data_fp=False, disk_chunk_size=65536, segment=None, |
739 | + segment_timestamp=None): |
740 | self.disk_chunk_size = disk_chunk_size |
741 | self.name = '/' + '/'.join((account, container, obj)) |
742 | - name_hash = hash_path(account, container, obj) |
743 | - self.datadir = os.path.join(path, device, |
744 | - storage_directory(DATADIR, partition, name_hash)) |
745 | + if segment and int(segment): |
746 | + ring_obj = '%s/%s/%s' % (obj, segment_timestamp, segment) |
747 | + else: |
748 | + ring_obj = obj |
749 | + name_hash = hash_path(account, container, ring_obj) |
750 | + if segment is not None: |
751 | + self.datadir = os.path.join(path, device, |
752 | + storage_directory(SEGMENTSDIR, partition, name_hash)) |
753 | + self.no_longer_segment_datadir = os.path.join(path, device, |
754 | + storage_directory(DATADIR, partition, name_hash)) |
755 | + else: |
756 | + self.datadir = os.path.join(path, device, |
757 | + storage_directory(DATADIR, partition, name_hash)) |
758 | self.tmpdir = os.path.join(path, device, 'tmp') |
759 | self.metadata = {} |
760 | self.meta_file = None |
761 | @@ -195,7 +211,8 @@ |
762 | except OSError: |
763 | pass |
764 | |
765 | - def put(self, fd, tmppath, metadata, extension='.data'): |
766 | + def put(self, fd, tmppath, metadata, extension='.data', |
767 | + no_longer_segment=False): |
768 | """ |
769 | Finalize writing the file on disk, and renames it from the temp file to |
770 | the real location. This should be called after the data has been |
771 | @@ -204,7 +221,10 @@ |
772 | :params fd: file descriptor of the temp file |
773 | :param tmppath: path to the temporary file being used |
774 | :param metadata: dictionary of metada to be written |
775 | - :param extention: extension to be used when making the file |
776 | + :param extension: extension to be used when making the file |
777 | + :param no_longer_segment: Set to True if this was originally an object |
778 | + segment but no longer is (case with chunked transfer encoding when |
779 | + the object ends up less than the segment size) |
780 | """ |
781 | metadata['name'] = self.name |
782 | timestamp = normalize_timestamp(metadata['X-Timestamp']) |
783 | @@ -217,6 +237,8 @@ |
784 | if 'Content-Length' in metadata: |
785 | drop_buffer_cache(fd, 0, int(metadata['Content-Length'])) |
786 | os.fsync(fd) |
787 | + if no_longer_segment: |
788 | + self.datadir = self.no_longer_segment_datadir |
789 | invalidate_hash(os.path.dirname(self.datadir)) |
790 | renamer(tmppath, os.path.join(self.datadir, timestamp + extension)) |
791 | self.metadata = metadata |
792 | @@ -355,7 +377,9 @@ |
793 | if error_response: |
794 | return error_response |
795 | file = DiskFile(self.devices, device, partition, account, container, |
796 | - obj, disk_chunk_size=self.disk_chunk_size) |
797 | + obj, disk_chunk_size=self.disk_chunk_size, |
798 | + segment=request.headers.get('x-object-segment'), |
799 | + segment_timestamp=request.headers['x-timestamp']) |
800 | upload_expiration = time.time() + self.max_upload_time |
801 | etag = md5() |
802 | upload_size = 0 |
803 | @@ -397,17 +421,32 @@ |
804 | if 'content-encoding' in request.headers: |
805 | metadata['Content-Encoding'] = \ |
806 | request.headers['Content-Encoding'] |
807 | - file.put(fd, tmppath, metadata) |
808 | + if 'x-object-type' in request.headers: |
809 | + metadata['X-Object-Type'] = request.headers['x-object-type'] |
810 | + if 'x-object-segment' in request.headers: |
811 | + metadata['X-Object-Segment'] = \ |
812 | + request.headers['x-object-segment'] |
813 | + no_longer_segment = False |
814 | + if 'x-object-segment-if-length' in request.headers and \ |
815 | + int(request.headers['x-object-segment-if-length']) != \ |
816 | + os.fstat(fd).st_size: |
817 | + del metadata['X-Object-Type'] |
818 | + del metadata['X-Object-Segment'] |
819 | + no_longer_segment = True |
820 | + file.put(fd, tmppath, metadata, |
821 | + no_longer_segment=no_longer_segment) |
822 | file.unlinkold(metadata['X-Timestamp']) |
823 | - self.container_update('PUT', account, container, obj, request.headers, |
824 | - {'x-size': file.metadata['Content-Length'], |
825 | - 'x-content-type': file.metadata['Content-Type'], |
826 | - 'x-timestamp': file.metadata['X-Timestamp'], |
827 | - 'x-etag': file.metadata['ETag'], |
828 | - 'x-cf-trans-id': request.headers.get('x-cf-trans-id', '-')}, |
829 | - device) |
830 | - resp = HTTPCreated(request=request, etag=etag) |
831 | - return resp |
832 | + if 'X-Object-Segment' not in file.metadata: |
833 | + self.container_update('PUT', account, container, obj, |
834 | + request.headers, |
835 | + {'x-size': request.headers.get('x-object-length', |
836 | + file.metadata['Content-Length']), |
837 | + 'x-content-type': file.metadata['Content-Type'], |
838 | + 'x-timestamp': file.metadata['X-Timestamp'], |
839 | + 'x-etag': file.metadata['ETag'], |
840 | + 'x-cf-trans-id': request.headers.get('x-cf-trans-id', '-')}, |
841 | + device) |
842 | + return HTTPCreated(request=request, etag=etag) |
843 | |
844 | def GET(self, request): |
845 | """Handle HTTP GET requests for the Swift Object Server.""" |
846 | @@ -420,7 +459,9 @@ |
847 | if self.mount_check and not check_mount(self.devices, device): |
848 | return Response(status='507 %s is not mounted' % device) |
849 | file = DiskFile(self.devices, device, partition, account, container, |
850 | - obj, keep_data_fp=True, disk_chunk_size=self.disk_chunk_size) |
851 | + obj, keep_data_fp=True, disk_chunk_size=self.disk_chunk_size, |
852 | + segment=request.headers.get('x-object-segment'), |
853 | + segment_timestamp=request.headers.get('x-object-segment-timestamp')) |
854 | if file.is_deleted(): |
855 | if request.headers.get('if-match') == '*': |
856 | return HTTPPreconditionFailed(request=request) |
857 | @@ -460,7 +501,8 @@ |
858 | 'application/octet-stream'), app_iter=file, |
859 | request=request, conditional_response=True) |
860 | for key, value in file.metadata.iteritems(): |
861 | - if key.lower().startswith('x-object-meta-'): |
862 | + if key.lower().startswith('x-object-meta-') or \ |
863 | + key.lower() in ('x-object-type', 'x-object-segment'): |
864 | response.headers[key] = value |
865 | response.etag = file.metadata['ETag'] |
866 | response.last_modified = float(file.metadata['X-Timestamp']) |
867 | @@ -482,13 +524,16 @@ |
868 | if self.mount_check and not check_mount(self.devices, device): |
869 | return Response(status='507 %s is not mounted' % device) |
870 | file = DiskFile(self.devices, device, partition, account, container, |
871 | - obj, disk_chunk_size=self.disk_chunk_size) |
872 | + obj, disk_chunk_size=self.disk_chunk_size, |
873 | + segment=request.headers.get('x-object-segment'), |
874 | + segment_timestamp=request.headers.get('x-object-segment-timestamp')) |
875 | if file.is_deleted(): |
876 | return HTTPNotFound(request=request) |
877 | response = Response(content_type=file.metadata['Content-Type'], |
878 | request=request, conditional_response=True) |
879 | for key, value in file.metadata.iteritems(): |
880 | - if key.lower().startswith('x-object-meta-'): |
881 | + if key.lower().startswith('x-object-meta-') or \ |
882 | + key.lower() in ('x-object-type', 'x-object-segment'): |
883 | response.headers[key] = value |
884 | response.etag = file.metadata['ETag'] |
885 | response.last_modified = float(file.metadata['X-Timestamp']) |
886 | @@ -513,7 +558,9 @@ |
887 | return Response(status='507 %s is not mounted' % device) |
888 | response_class = HTTPNoContent |
889 | file = DiskFile(self.devices, device, partition, account, container, |
890 | - obj, disk_chunk_size=self.disk_chunk_size) |
891 | + obj, disk_chunk_size=self.disk_chunk_size, |
892 | + segment=request.headers.get('x-object-segment'), |
893 | + segment_timestamp=request.headers.get('x-object-segment-timestamp')) |
894 | if file.is_deleted(): |
895 | response_class = HTTPNotFound |
896 | metadata = { |
897 | @@ -522,10 +569,11 @@ |
898 | with file.mkstemp() as (fd, tmppath): |
899 | file.put(fd, tmppath, metadata, extension='.ts') |
900 | file.unlinkold(metadata['X-Timestamp']) |
901 | - self.container_update('DELETE', account, container, obj, |
902 | - request.headers, {'x-timestamp': metadata['X-Timestamp'], |
903 | - 'x-cf-trans-id': request.headers.get('x-cf-trans-id', '-')}, |
904 | - device) |
905 | + if 'x-object-segment' not in request.headers: |
906 | + self.container_update('DELETE', account, container, obj, |
907 | + request.headers, {'x-timestamp': metadata['X-Timestamp'], |
908 | + 'x-cf-trans-id': request.headers.get('x-cf-trans-id', '-')}, |
909 | + device) |
910 | resp = response_class(request=request) |
911 | return resp |
912 | |
913 | @@ -538,7 +586,10 @@ |
914 | unquote(request.path), 2, 3, True) |
915 | if self.mount_check and not check_mount(self.devices, device): |
916 | return Response(status='507 %s is not mounted' % device) |
917 | - path = os.path.join(self.devices, device, DATADIR, partition) |
918 | + if request.headers.get('x-object-type') == 'segment': |
919 | + path = os.path.join(self.devices, device, SEGMENTSDIR, partition) |
920 | + else: |
921 | + path = os.path.join(self.devices, device, DATADIR, partition) |
922 | if not os.path.exists(path): |
923 | mkdirs(path) |
924 | if suffix: |
925 | |
926 | === modified file 'swift/proxy/server.py' |
927 | --- swift/proxy/server.py 2010-10-15 15:07:19 +0000 |
928 | +++ swift/proxy/server.py 2010-10-19 18:43:49 +0000 |
929 | @@ -14,21 +14,25 @@ |
930 | # limitations under the License. |
931 | |
932 | from __future__ import with_statement |
933 | +try: |
934 | + import simplejson as json |
935 | +except ImportError: |
936 | + import json |
937 | import mimetypes |
938 | import os |
939 | import time |
940 | import traceback |
941 | from ConfigParser import ConfigParser |
942 | +from hashlib import md5 |
943 | from urllib import unquote, quote |
944 | import uuid |
945 | import functools |
946 | |
947 | from eventlet.timeout import Timeout |
948 | -from webob.exc import HTTPBadRequest, HTTPMethodNotAllowed, \ |
949 | - HTTPNotFound, HTTPPreconditionFailed, \ |
950 | - HTTPRequestTimeout, HTTPServiceUnavailable, \ |
951 | - HTTPUnprocessableEntity, HTTPRequestEntityTooLarge, HTTPServerError, \ |
952 | - status_map |
953 | +from webob.exc import HTTPBadRequest, HTTPCreated, HTTPInternalServerError, \ |
954 | + HTTPMethodNotAllowed, HTTPNotFound, HTTPPreconditionFailed, \ |
955 | + HTTPRequestEntityTooLarge, HTTPRequestTimeout, HTTPServerError, \ |
956 | + HTTPServiceUnavailable, HTTPUnprocessableEntity, status_map |
957 | from webob import Request, Response |
958 | |
959 | from swift.common.ring import Ring |
960 | @@ -37,7 +41,7 @@ |
961 | from swift.common.bufferedhttp import http_connect |
962 | from swift.common.constraints import check_metadata, check_object_creation, \ |
963 | check_utf8, MAX_ACCOUNT_NAME_LENGTH, MAX_CONTAINER_NAME_LENGTH, \ |
964 | - MAX_FILE_SIZE |
965 | + MAX_FILE_SIZE, SEGMENT_SIZE |
966 | from swift.common.exceptions import ChunkReadTimeout, \ |
967 | ChunkWriteTimeout, ConnectionTimeout |
968 | |
969 | @@ -89,6 +93,135 @@ |
970 | return wrapped |
971 | |
972 | |
973 | +class SegmentedIterable(object): |
974 | + """ |
975 | + Iterable that returns the object contents for a segmented object in Swift. |
976 | + |
977 | + In addition to these params, you can also set the `response` attr just |
978 | + after creating the SegmentedIterable and it will update the response's |
979 | + `bytes_transferred` value. Be sure to set the `bytes_transferred` value to |
980 | + 0 beforehand. |
981 | + |
982 | + :param controller: The ObjectController instance to work with. |
983 | + :param content_length: The total length of the object. |
984 | + :param segment_size: The length of each segment (except perhaps the last) |
985 | + of the object. |
986 | + :param timestamp: The X-Timestamp of the object's segments (set on the PUT, |
987 | + not changed on the POSTs). |
988 | + """ |
989 | + |
990 | + def __init__(self, controller, content_length, segment_size, timestamp): |
991 | + self.controller = controller |
992 | + self.content_length = content_length |
993 | + self.segment_size = segment_size |
994 | + self.timestamp = timestamp |
995 | + self.position = 0 |
996 | + self.segment = -1 |
997 | + self.segment_iter = None |
998 | + self.response = None |
999 | + |
1000 | + def load_next_segment(self): |
1001 | + """ Loads the self.segment_iter with the next segment's contents. """ |
1002 | + self.segment += 1 |
1003 | + if self.segment: |
1004 | + ring_object_name = '%s/%s/%s' % (self.controller.object_name, |
1005 | + self.timestamp, self.segment) |
1006 | + else: |
1007 | + ring_object_name = self.controller.object_name |
1008 | + partition, nodes = self.controller.app.object_ring.get_nodes( |
1009 | + self.controller.account_name, self.controller.container_name, |
1010 | + ring_object_name) |
1011 | + path = '/%s/%s/%s' % (self.controller.account_name, |
1012 | + self.controller.container_name, self.controller.object_name) |
1013 | + req = Request.blank(path, headers={'X-Object-Segment': self.segment, |
1014 | + 'X-Object-Segment-Timestamp': self.timestamp}) |
1015 | + resp = self.controller.GETorHEAD_base(req, 'Object', |
1016 | + partition, self.controller.iter_nodes(partition, nodes, |
1017 | + self.controller.app.object_ring), path, |
1018 | + self.controller.app.object_ring.replica_count) |
1019 | + if resp.status_int // 100 != 2: |
1020 | + raise Exception( |
1021 | + 'Could not load segment %s of %s' % (self.segment, path)) |
1022 | + self.segment_iter = resp.app_iter |
1023 | + |
1024 | + def __iter__(self): |
1025 | + """ Standard iterator function that returns the object's contents. """ |
1026 | + while self.position < self.content_length: |
1027 | + if not self.segment_iter: |
1028 | + self.load_next_segment() |
1029 | + while True: |
1030 | + with ChunkReadTimeout(self.controller.app.node_timeout): |
1031 | + try: |
1032 | + chunk = self.segment_iter.next() |
1033 | + break |
1034 | + except StopIteration: |
1035 | + self.load_next_segment() |
1036 | + if self.position + len(chunk) > self.content_length: |
1037 | + chunk = chunk[:self.content_length - self.position] |
1038 | + self.position += len(chunk) |
1039 | + if self.response: |
1040 | + self.response.bytes_transferred += len(chunk) |
1041 | + yield chunk |
1042 | + |
1043 | + def app_iter_range(self, start, stop): |
1044 | + """ |
1045 | + Non-standard iterator function for use with Webob in serving Range |
1046 | + requests more quickly. |
1047 | + |
1048 | + TODO: |
1049 | + |
1050 | + This currently helps on speed by jumping to the proper segment to start |
1051 | + with (and ending without reading the trailing segments, but that |
1052 | + already happened technically with __iter__). |
1053 | + |
1054 | + But, what it does not do yet is issue a Range request with the first |
1055 | + segment to allow the object server to seek to the segment start point. |
1056 | + |
1057 | + Instead, it just reads and throws away all leading segment data. Since |
1058 | + segments are 5G by default, it'll have to transfer the whole 5G from |
1059 | + the object server to the proxy server even if it only needs the last |
1060 | + byte. In practice, this should happen fairly quickly relative to how |
1061 | + long requests take for these very large files; but it's still wasteful. |
1062 | + |
1063 | + Anyway, it shouldn't be too hard to implement, I just have other things |
1064 | + to work out first. |
1065 | + |
1066 | + :param start: The first byte (zero-based) to return. |
1067 | + :param stop: The last byte (zero-based) to return. |
1068 | + """ |
1069 | + if start: |
1070 | + self.segment = (start / self.segment_size) - 1 |
1071 | + self.load_next_segment() |
1072 | + self.position = self.segment * self.segment_size |
1073 | + segment_start = start - (self.segment * self.segment_size) |
1074 | + while segment_start: |
1075 | + with ChunkReadTimeout(self.controller.app.node_timeout): |
1076 | + chunk = self.segment_iter.next() |
1077 | + self.position += len(chunk) |
1078 | + if len(chunk) > segment_start: |
1079 | + chunk = chunk[segment_start:] |
1080 | + if self.response: |
1081 | + self.response.bytes_transferred += len(chunk) |
1082 | + yield chunk |
1083 | + segment_start = 0 |
1084 | + else: |
1085 | + segment_start -= len(chunk) |
1086 | + if stop is not None: |
1087 | + length = stop - start |
1088 | + else: |
1089 | + length = None |
1090 | + for chunk in self: |
1091 | + if length is not None: |
1092 | + length -= len(chunk) |
1093 | + if length < 0: |
1094 | + # Chop off the extra: |
1095 | + if self.response: |
1096 | + self.response.bytes_transferred += length |
1097 | + yield chunk[:length] |
1098 | + break |
1099 | + yield chunk |
1100 | + |
1101 | + |
1102 | def get_container_memcache_key(account, container): |
1103 | path = '/%s/%s' % (account, container) |
1104 | return 'container%s' % path |
1105 | @@ -518,11 +651,56 @@ |
1106 | aresp = req.environ['swift.authorize'](req) |
1107 | if aresp: |
1108 | return aresp |
1109 | + # This is a bit confusing, so an explanation: |
1110 | + # * First we attempt the GET/HEAD normally, as this is the usual case. |
1111 | + # * If the request was a Range request and gave us a 416 Unsatisfiable |
1112 | + # response, we might be trying to do an invalid Range on a manifest |
1113 | + # object, so we try again with no Range. |
1114 | + # * If it turns out we have a manifest object, and we had a Range |
1115 | + # request originally that actually succeeded or we had a HEAD |
1116 | + # request, we have to do the request again as a full GET because |
1117 | + # we'll need the whole manifest. |
1118 | + # * Finally, if we had a manifest object, we pass it and the request |
1119 | + # off to GETorHEAD_segmented; otherwise we just return the response. |
1120 | partition, nodes = self.app.object_ring.get_nodes( |
1121 | self.account_name, self.container_name, self.object_name) |
1122 | - return self.GETorHEAD_base(req, 'Object', partition, |
1123 | + resp = mresp = self.GETorHEAD_base(req, 'Object', partition, |
1124 | + self.iter_nodes(partition, nodes, self.app.object_ring), |
1125 | + req.path_info, self.app.object_ring.replica_count) |
1126 | + range_value = None |
1127 | + if mresp.status_int == 416: |
1128 | + range_value = req.range |
1129 | + req.range = None |
1130 | + mresp = self.GETorHEAD_base(req, 'Object', partition, |
1131 | self.iter_nodes(partition, nodes, self.app.object_ring), |
1132 | req.path_info, self.app.object_ring.replica_count) |
1133 | + if mresp.status_int // 100 != 2: |
1134 | + return resp |
1135 | + if 'x-object-type' in mresp.headers: |
1136 | + if mresp.headers['x-object-type'] == 'manifest': |
1137 | + if req.method == 'HEAD': |
1138 | + req.method = 'GET' |
1139 | + mresp = self.GETorHEAD_base(req, 'Object', partition, |
1140 | + self.iter_nodes(partition, nodes, |
1141 | + self.app.object_ring), req.path_info, |
1142 | + self.app.object_ring.replica_count) |
1143 | + if mresp.status_int // 100 != 2: |
1144 | + return mresp |
1145 | + req.method = 'HEAD' |
1146 | + elif req.range: |
1147 | + range_value = req.range |
1148 | + req.range = None |
1149 | + mresp = self.GETorHEAD_base(req, 'Object', partition, |
1150 | + self.iter_nodes(partition, nodes, |
1151 | + self.app.object_ring), req.path_info, |
1152 | + self.app.object_ring.replica_count) |
1153 | + if mresp.status_int // 100 != 2: |
1154 | + return mresp |
1155 | + if range_value: |
1156 | + req.range = range_value |
1157 | + return self.GETorHEAD_segmented(req, mresp) |
1158 | + return HTTPNotFound(request=req) |
1159 | + return resp |
1160 | |
1161 | @public |
1162 | @delay_denial |
1163 | @@ -536,6 +714,48 @@ |
1164 | """Handler for HTTP HEAD requests.""" |
1165 | return self.GETorHEAD(req) |
1166 | |
1167 | + def GETorHEAD_segmented(self, req, mresp): |
1168 | + """ |
1169 | + Performs a GET for a segmented object. |
1170 | + |
1171 | + :param req: The webob.Request to process. |
1172 | + :param mresp: The webob.Response for the original manifest request. |
1173 | + :returns: webob.Response object. |
1174 | + """ |
1175 | + manifest = json.loads(''.join(mresp.app_iter)) |
1176 | + # Ah, the fun of JSONing strs and getting unicodes back. We |
1177 | + # reencode to UTF8 to ensure crap doesn't blow up everywhere |
1178 | + # else. |
1179 | + keys_to_encode = [] |
1180 | + for k, v in manifest.iteritems(): |
1181 | + if isinstance(k, unicode): |
1182 | + keys_to_encode.append(k) |
1183 | + if isinstance(v, unicode): |
1184 | + manifest[k] = v.encode('utf8') |
1185 | + for k in keys_to_encode: |
1186 | + v = manifest[k] |
1187 | + del manifest[k] |
1188 | + manifest[k.encode('utf8')] = v |
1189 | + content_length = int(manifest['content-length']) |
1190 | + segment_size = int(manifest['x-segment-size']) |
1191 | + headers = dict(mresp.headers) |
1192 | + headers.update(manifest) |
1193 | + del headers['x-segment-size'] |
1194 | + resp = Response(app_iter=SegmentedIterable(self, content_length, |
1195 | + segment_size, manifest['x-timestamp']), headers=headers, |
1196 | + request=req, conditional_response=True) |
1197 | + resp.headers['etag'] = manifest['etag'].strip('"') |
1198 | + resp.last_modified = mresp.last_modified |
1199 | + resp.content_length = int(manifest['content-length']) |
1200 | + resp.content_type = manifest['content-type'] |
1201 | + if 'content-encoding' in manifest: |
1202 | + resp.content_encoding = manifest['content-encoding'] |
1203 | + cresp = req.get_response(resp) |
1204 | + # Needed for SegmentedIterable to update bytes_transferred |
1205 | + cresp.bytes_transferred = 0 |
1206 | + resp.app_iter.response = cresp |
1207 | + return cresp |
1208 | + |
1209 | @public |
1210 | @delay_denial |
1211 | def POST(self, req): |
1212 | @@ -652,11 +872,47 @@ |
1213 | if k.lower().startswith('x-object-meta-'): |
1214 | new_req.headers[k] = v |
1215 | req = new_req |
1216 | + if req.headers.get('transfer-encoding') == 'chunked' or \ |
1217 | + req.content_length > SEGMENT_SIZE: |
1218 | + resp = self.PUT_segmented_object(req, data_source, partition, |
1219 | + nodes, container_partition, containers) |
1220 | + else: |
1221 | + resp = self.PUT_whole_object(req, data_source, partition, nodes, |
1222 | + container_partition, containers) |
1223 | + if 'x-copy-from' in req.headers: |
1224 | + resp.headers['X-Copied-From'] = req.headers['x-copy-from'] |
1225 | + for k, v in req.headers.items(): |
1226 | + if k.lower().startswith('x-object-meta-'): |
1227 | + resp.headers[k] = v |
1228 | + resp.last_modified = float(req.headers['X-Timestamp']) |
1229 | + return resp |
1230 | + |
1231 | + def PUT_whole_object(self, req, data_source, partition, nodes, |
1232 | + container_partition=None, containers=None): |
1233 | + """ |
1234 | + Performs a PUT for a whole object (one with a content-length <= |
1235 | + SEGMENT_SIZE). |
1236 | + |
1237 | + :param req: The webob.Request to process. |
1238 | + :param data_source: An iterator providing the data to store. |
1239 | + :param partition: The object ring partition the object falls on. |
1240 | + :param nodes: The object ring nodes the object falls on. |
1241 | + :param container_partition: The container ring partition the container |
1242 | + for the object falls on, None if the |
1243 | + container is not to be updated. |
1244 | + :param containers: The container ring nodes the container for the |
1245 | + object falls on, None if the container is not to be |
1246 | + updated. |
1247 | + :returns: webob.Response object. |
1248 | + """ |
1249 | + conns = [] |
1250 | + update_containers = containers is not None |
1251 | for node in self.iter_nodes(partition, nodes, self.app.object_ring): |
1252 | - container = containers.pop() |
1253 | - req.headers['X-Container-Host'] = '%(ip)s:%(port)s' % container |
1254 | - req.headers['X-Container-Partition'] = container_partition |
1255 | - req.headers['X-Container-Device'] = container['device'] |
1256 | + if update_containers: |
1257 | + container = containers.pop() |
1258 | + req.headers['X-Container-Host'] = '%(ip)s:%(port)s' % container |
1259 | + req.headers['X-Container-Partition'] = container_partition |
1260 | + req.headers['X-Container-Device'] = container['device'] |
1261 | req.headers['Expect'] = '100-continue' |
1262 | resp = conn = None |
1263 | if not self.error_limited(node): |
1264 | @@ -674,12 +930,14 @@ |
1265 | if conn and resp: |
1266 | if resp.status == 100: |
1267 | conns.append(conn) |
1268 | - if not containers: |
1269 | + if (update_containers and not containers) or \ |
1270 | + len(conns) == len(nodes): |
1271 | break |
1272 | continue |
1273 | elif resp.status == 507: |
1274 | self.error_limit(node) |
1275 | - containers.insert(0, container) |
1276 | + if update_containers: |
1277 | + containers.insert(0, container) |
1278 | if len(conns) <= len(nodes) / 2: |
1279 | self.app.logger.error( |
1280 | 'Object PUT returning 503, %s/%s required connections, ' |
1281 | @@ -765,15 +1023,123 @@ |
1282 | statuses.append(503) |
1283 | reasons.append('') |
1284 | bodies.append('') |
1285 | - resp = self.best_response(req, statuses, reasons, bodies, 'Object PUT', |
1286 | + return self.best_response(req, statuses, reasons, bodies, 'Object PUT', |
1287 | etag=etag) |
1288 | - if 'x-copy-from' in req.headers: |
1289 | - resp.headers['X-Copied-From'] = req.headers['x-copy-from'] |
1290 | - for k, v in req.headers.items(): |
1291 | - if k.lower().startswith('x-object-meta-'): |
1292 | - resp.headers[k] = v |
1293 | - resp.last_modified = float(req.headers['X-Timestamp']) |
1294 | - return resp |
1295 | + |
1296 | + def PUT_segmented_object(self, req, data_source, partition, nodes, |
1297 | + container_partition, containers): |
1298 | + """ |
1299 | + Performs a PUT for a segmented object (one with a content-length > |
1300 | + SEGMENT_SIZE). |
1301 | + |
1302 | + :param req: The webob.Request to process. |
1303 | + :param data_source: An iterator providing the data to store. |
1304 | + :param partition: The object ring partition the object falls on. |
1305 | + :param nodes: The object ring nodes the object falls on. |
1306 | + :param container_partition: The container ring partition the container |
1307 | + for the object falls on. |
1308 | + :param containers: The container ring nodes the container for the |
1309 | + object falls on. |
1310 | + :returns: webob.Response object. |
1311 | + """ |
1312 | + req.bytes_transferred = 0 |
1313 | + leftover_chunk = [None] |
1314 | + etag = md5() |
1315 | + def segment_iter(): |
1316 | + amount_given = 0 |
1317 | + while amount_given < SEGMENT_SIZE: |
1318 | + if leftover_chunk[0]: |
1319 | + chunk = leftover_chunk[0] |
1320 | + leftover_chunk[0] = None |
1321 | + else: |
1322 | + with ChunkReadTimeout(self.app.client_timeout): |
1323 | + chunk = data_source.next() |
1324 | + req.bytes_transferred += len(chunk) |
1325 | + etag.update(chunk) |
1326 | + if amount_given + len(chunk) > SEGMENT_SIZE: |
1327 | + yield chunk[:SEGMENT_SIZE - amount_given] |
1328 | + leftover_chunk[0] = chunk[SEGMENT_SIZE - amount_given:] |
1329 | + amount_given = SEGMENT_SIZE |
1330 | + else: |
1331 | + yield chunk |
1332 | + amount_given += len(chunk) |
1333 | + def segment_iter_iter(): |
1334 | + while True: |
1335 | + if not leftover_chunk[0]: |
1336 | + with ChunkReadTimeout(self.app.client_timeout): |
1337 | + leftover_chunk[0] = data_source.next() |
1338 | + req.bytes_transferred += len(leftover_chunk[0]) |
1339 | + etag.update(leftover_chunk[0]) |
1340 | + yield segment_iter() |
1341 | + segment_number = 0 |
1342 | + chunked = req.headers.get('transfer-encoding') == 'chunked' |
1343 | + if not chunked: |
1344 | + amount_left = req.content_length |
1345 | + headers = {'X-Timestamp': req.headers['X-Timestamp'], |
1346 | + 'Content-Type': req.headers['content-type'], |
1347 | + 'X-Object-Type': 'segment'} |
1348 | + for segment_source in segment_iter_iter(): |
1349 | + if chunked: |
1350 | + headers['Transfer-Encoding'] = 'chunked' |
1351 | + if segment_number == 0: |
1352 | + headers['X-Object-Segment-If-Length'] = SEGMENT_SIZE |
1353 | + elif amount_left > SEGMENT_SIZE: |
1354 | + headers['Content-Length'] = SEGMENT_SIZE |
1355 | + else: |
1356 | + headers['Content-Length'] = amount_left |
1357 | + headers['X-Object-Segment'] = segment_number |
1358 | + segment_req = Request.blank(req.path_info, |
1359 | + environ={'REQUEST_METHOD': 'PUT'}, headers=headers) |
1360 | + if 'X-Object-Segment-If-Length' in headers: |
1361 | + del headers['X-Object-Segment-If-Length'] |
1362 | + if segment_number: |
1363 | + ring_object_name = '%s/%s/%s' % (self.object_name, |
1364 | + req.headers['x-timestamp'], segment_number) |
1365 | + else: |
1366 | + ring_object_name = self.object_name |
1367 | + segment_partition, segment_nodes = self.app.object_ring.get_nodes( |
1368 | + self.account_name, self.container_name, ring_object_name) |
1369 | + resp = self.PUT_whole_object(segment_req, segment_source, |
1370 | + segment_partition, segment_nodes) |
1371 | + if resp.status_int // 100 == 4: |
1372 | + return resp |
1373 | + elif resp.status_int // 100 != 2: |
1374 | + return HTTPServiceUnavailable(request=req, |
1375 | + body='Unable to complete very large file operation.') |
1376 | + if segment_number == 0 and req.bytes_transferred < SEGMENT_SIZE: |
1377 | + return HTTPCreated(request=req, etag=etag.hexdigest()) |
1378 | + if not chunked: |
1379 | + amount_left -= SEGMENT_SIZE |
1380 | + segment_number += 1 |
1381 | + etag = etag.hexdigest() |
1382 | + if 'etag' in req.headers and req.headers['etag'].lower() != etag: |
1383 | + return HTTPUnprocessableEntity(request=req) |
1384 | + manifest = {'x-timestamp': req.headers['x-timestamp'], |
1385 | + 'content-length': req.bytes_transferred, |
1386 | + 'content-type': req.headers['content-type'], |
1387 | + 'x-segment-size': SEGMENT_SIZE, |
1388 | + 'etag': etag} |
1389 | + if 'content-encoding' in req.headers: |
1390 | + manifest['content-encoding'] = req.headers['content-encoding'] |
1391 | + manifest = json.dumps(manifest) |
1392 | + headers = {'X-Timestamp': req.headers['X-Timestamp'], |
1393 | + 'Content-Type': req.headers['content-type'], |
1394 | + 'Content-Length': len(manifest), |
1395 | + 'X-Object-Type': 'manifest', |
1396 | + 'X-Object-Length': req.bytes_transferred} |
1397 | + headers.update(i for i in req.headers.iteritems() |
1398 | + if i[0].lower().startswith('x-object-meta-') and len(i[0]) > 14) |
1399 | + manifest_req = Request.blank(req.path_info, |
1400 | + environ={'REQUEST_METHOD': 'PUT'}, body=manifest, headers=headers) |
1401 | + manifest_source = iter(lambda: |
1402 | + manifest_req.body_file.read(self.app.client_chunk_size), '') |
1403 | + resp = self.PUT_whole_object(manifest_req, manifest_source, partition, |
1404 | + nodes, container_partition=container_partition, |
1405 | + containers=containers) |
1406 | + if resp.status_int // 100 != 2: |
1407 | + return HTTPServiceUnavailable(request=req, |
1408 | + body='Unable to complete very large file operation.') |
1409 | + return HTTPCreated(request=req, etag=etag) |
1410 | |
1411 | @public |
1412 | @delay_denial |
1413 | |
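The slicing scheme in PUT_segmented_object above is easiest to see in isolation: an outer generator yields one inner generator per segment, and each inner generator yields at most SEGMENT_SIZE bytes, carrying any excess from the last chunk of one segment over into the next. Below is a minimal standalone sketch of that scheme — the names, the tiny segment size, and the demo stream are illustrative only, and the real code additionally tracks req.bytes_transferred, the running MD5 etag, and read timeouts:

    # Minimal sketch of the per-segment slicing used by
    # PUT_segmented_object (illustrative, not the branch's API).
    def iter_segments(chunks, segment_size):
        """Yield one iterator per segment over a stream of arbitrary chunks.

        Each inner iterator must be fully consumed before advancing the
        outer one (the proxy does this, since it streams each segment
        PUT to completion before starting the next).
        """
        leftover = [None]  # one-element list so the closure can rebind it

        def one_segment():
            given = 0
            while given < segment_size:
                if leftover[0]:
                    chunk, leftover[0] = leftover[0], None
                else:
                    try:
                        chunk = next(chunks)
                    except StopIteration:
                        return  # stream exhausted mid-segment
                if given + len(chunk) > segment_size:
                    # Chunk straddles the boundary: emit what fits and
                    # carry the remainder into the next segment.
                    yield chunk[:segment_size - given]
                    leftover[0] = chunk[segment_size - given:]
                    given = segment_size
                else:
                    yield chunk
                    given += len(chunk)

        while True:
            if not leftover[0]:
                try:
                    leftover[0] = next(chunks)
                except StopIteration:
                    return  # no more data, so no further segments
            yield one_segment()

    if __name__ == '__main__':
        # With a 4-byte segment size, a 10-byte stream yields the
        # segments b'abcd', b'efgh', b'ij'.
        stream = iter([b'abc', b'defg', b'hij'])
        for segment in iter_segments(stream, segment_size=4):
            print(b''.join(segment))

Note how chunk boundaries and segment boundaries are independent: a client chunk that straddles a segment boundary is split, with the tail becoming the first data of the next segment.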
1414 | === modified file 'test/unit/__init__.py' |
1415 | --- test/unit/__init__.py 2010-07-29 20:06:01 +0000 |
1416 | +++ test/unit/__init__.py 2010-10-19 18:43:49 +0000 |
1417 | @@ -9,6 +9,8 @@ |
1418 | crlfs = 0 |
1419 | while crlfs < 2: |
1420 | c = fd.read(1) |
1421 | + if not len(c): |
1422 | + raise Exception('Never read 2crlfs; got %s' % repr(rv)) |
1423 | rv = rv + c |
1424 | if c == '\r' and lc != '\n': |
1425 | crlfs = 0 |
1426 | |
1427 | === added file 'test/unit/obj/test_hashes.py' |
1428 | --- test/unit/obj/test_hashes.py 1970-01-01 00:00:00 +0000 |
1429 | +++ test/unit/obj/test_hashes.py 2010-10-19 18:43:49 +0000 |
1430 | @@ -0,0 +1,28 @@ |
1431 | +# Copyright (c) 2010 OpenStack, LLC. |
1432 | +# |
1433 | +# Licensed under the Apache License, Version 2.0 (the "License"); |
1434 | +# you may not use this file except in compliance with the License. |
1435 | +# You may obtain a copy of the License at |
1436 | +# |
1437 | +# http://www.apache.org/licenses/LICENSE-2.0 |
1438 | +# |
1439 | +# Unless required by applicable law or agreed to in writing, software |
1440 | +# distributed under the License is distributed on an "AS IS" BASIS, |
1441 | +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
1442 | +# implied. |
1443 | +# See the License for the specific language governing permissions and |
1444 | +# limitations under the License. |
1445 | + |
1446 | +# TODO: Tests |
1447 | + |
1448 | +import unittest |
1449 | +from swift.obj import hashes |
1450 | + |
1451 | +class TestHashes(unittest.TestCase): |
1452 | + |
1453 | + def test_placeholder(self): |
1454 | + pass |
1455 | + |
1456 | + |
1457 | +if __name__ == '__main__': |
1458 | + unittest.main() |
1459 | |
1460 | === added file 'test/unit/obj/test_hashes.py.moved' |
1461 | --- test/unit/obj/test_hashes.py.moved 1970-01-01 00:00:00 +0000 |
1462 | +++ test/unit/obj/test_hashes.py.moved 2010-10-19 18:43:49 +0000 |
1463 | @@ -0,0 +1,28 @@ |
1464 | +# Copyright (c) 2010 OpenStack, LLC. |
1465 | +# |
1466 | +# Licensed under the Apache License, Version 2.0 (the "License"); |
1467 | +# you may not use this file except in compliance with the License. |
1468 | +# You may obtain a copy of the License at |
1469 | +# |
1470 | +# http://www.apache.org/licenses/LICENSE-2.0 |
1471 | +# |
1472 | +# Unless required by applicable law or agreed to in writing, software |
1473 | +# distributed under the License is distributed on an "AS IS" BASIS, |
1474 | +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
1475 | +# implied. |
1476 | +# See the License for the specific language governing permissions and |
1477 | +# limitations under the License. |
1478 | + |
1479 | +# TODO: Tests |
1480 | + |
1481 | +import unittest |
1482 | +from swift.obj import hashes |
1483 | + |
1484 | +class TestHashes(unittest.TestCase): |
1485 | + |
1486 | + def test_placeholder(self): |
1487 | + pass |
1488 | + |
1489 | + |
1490 | +if __name__ == '__main__': |
1491 | + unittest.main() |
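For reference, once every segment PUT has succeeded, the proxy stores a small JSON manifest as the object itself (see the proxy/server.py hunk above), plus a content-encoding key when the client sent one. A hedged reconstruction of the manifest shape and of the per-segment ring naming that spreads segments across the cluster — all values below are made up; only the key names and the '%s/%s/%s' naming scheme come from the diff:

    import json

    SEGMENT_SIZE = 1024  # the diff's deliberately small test value

    def ring_object_name(object_name, timestamp, segment_number):
        # Segment 0 is stored under the object's own name, so placement
        # of the first segment matches a whole object; every later
        # segment hashes to its own ring partition, which is what
        # distributes a very large upload across the cluster.
        if segment_number:
            return '%s/%s/%s' % (object_name, timestamp, segment_number)
        return object_name

    # The manifest written at the object's own name (illustrative values).
    manifest = json.dumps({
        'x-timestamp': '1287513829.04',
        'content-length': 3000,          # total bytes uploaded
        'content-type': 'application/octet-stream',
        'x-segment-size': SEGMENT_SIZE,  # needed to find segments later
        'etag': 'd41d8cd98f00b204e9800998ecf8427e',  # MD5 of whole body
    })

Presumably a GET reverses this: read the manifest, recompute each segment's ring name from the stored timestamp and x-segment-size, and stream the segments back in order — though the GET path is not part of the hunks shown here.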
I'm going to have to redo this puppy. It got too unwieldy, so I'm splitting it into separate branches.