~thopiekar/linux/+git/linux-stable:linux-3.17.y

Last commit made on 2015-01-08
Get this branch:
git clone -b linux-3.17.y https://git.launchpad.net/~thopiekar/linux/+git/linux-stable

Branch merges

Branch information

Name:
linux-3.17.y
Repository:
lp:~thopiekar/linux/+git/linux-stable

Recent commits

bc15d46... by Greg Kroah-Hartman <email address hidden>

Linux 3.17.8

eece4e2... by Filipe Manana <email address hidden>

Btrfs: fix fs corruption on transaction abort if device supports discard

commit 678886bdc6378c1cbd5072da2c5a3035000214e3 upstream.

When we abort a transaction we iterate over all the ranges marked as dirty
in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
from those trees, add them back (unpin) to the free space caches and, if
the fs was mounted with "-o discard", perform a discard on those regions.
Also, after adding the regions to the free space caches, a fitrim ioctl call
can see those ranges in a block group's free space cache and perform a discard
on the ranges, so the same issue can happen without "-o discard" as well.

This causes corruption, affecting one or multiple btree nodes (in the worst
case leaving the fs unmountable) because some of those ranges (the ones in
the fs_info->pinned_extents tree) correspond to btree nodes/leafs that are
referred by the last committed super block - breaking the rule that anything
that was committed by a transaction is untouched until the next transaction
commits successfully.

I ran into this while running in a loop (for several hours) the fstest that
I recently submitted:

  [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim

The corruption always happened when a transaction aborted and then fsck complained
like this:

   _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
   *** fsck.btrfs output ***
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   Check tree block failed, want=94945280, have=0
   read block failed check_tree_block
   Couldn't open file system

In this case 94945280 corresponded to the root of a tree.
Using frace what I observed was the following sequence of steps happened:

   1) transaction N started, fs_info->pinned_extents pointed to
      fs_info->freed_extents[0];

   2) node/eb 94945280 is created;

   3) eb is persisted to disk;

   4) transaction N commit starts, fs_info->pinned_extents now points to
      fs_info->freed_extents[1], and transaction N completes;

   5) transaction N + 1 starts;

   6) eb is COWed, and btrfs_free_tree_block() called for this eb;

   7) eb range (94945280 to 94945280 + 16Kb) is added to
      fs_info->pinned_extents (fs_info->freed_extents[1]);

   8) Something goes wrong in transaction N + 1, like hitting ENOSPC
      for example, and the transaction is aborted, turning the fs into
      readonly mode. The stack trace I got for example:

      [112065.253935] [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
      [112065.254271] [<ffffffff81042984>] warn_slowpath_common+0x7f/0x98
      [112065.254567] [<ffffffffa0325990>] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.261674] [<ffffffff810429e5>] warn_slowpath_fmt+0x48/0x50
      [112065.261922] [<ffffffffa032949e>] ? btrfs_free_path+0x26/0x29 [btrfs]
      [112065.262211] [<ffffffffa0325990>] __btrfs_abort_transaction+0x50/0x10b [btrfs]
      [112065.262545] [<ffffffffa036b1d6>] btrfs_remove_chunk+0x537/0x58b [btrfs]
      [112065.262771] [<ffffffffa033840f>] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
      [112065.263105] [<ffffffffa0343106>] cleaner_kthread+0x100/0x12f [btrfs]
      (...)
      [112065.264493] ---[ end trace dd7903a975a31a08 ]---
      [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
      [112065.264997] BTRFS info (device sdc): forced readonly

   9) The clear kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
      fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
      turn calls btrfs_destroy_pinned_extent();

   10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
       marked as dirty in fs_info->freed_extents[], and for each one
       it calls discard, if the fs was mounted with "-o discard", and
       adds the range to the free space cache of the respective block
       group;

   11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
       sees the free space entries and performs a discard;

   12) After an umount and mount (or fsck), our eb's location on disk was full
       of zeroes, and it should have been untouched, because it was marked as
       dirty in the fs_info->pinned_extents tree, and therefore used by the
       trees that the last committed superblock points to.

Fix this by not performing a discard and not adding the ranges to the free space
caches - it's useless from this point since the fs is now in readonly mode and
we won't write free space caches to disk anymore (otherwise we would leak space)
nor any new superblock. By not adding the ranges to the free space caches, it
prevents other code paths from allocating that space and write to it as well,
therefore being safer and simpler.

This isn't a new problem, as it's been present since 2011 (git commit
acce952b0263825da32cf10489413dec78053347).

Signed-off-by: Filipe Manana <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

f9841b9... by Liu Bo <email address hidden>

Btrfs: fix loop writing of async reclaim

commit 25ce459c1af138f95a3fd318461193397ebb825b upstream.

One of my tests shows that when we really don't have space to reclaim via
flush_space and also run out of space, this async reclaim work loops on adding
itself into the workqueue and keeps writing something to disk according to
iostat's results, and these writes mainly comes from commit_transaction which
writes super_block. This's unacceptable as it can be bad to disks, especially
memeory storages.

This adds a check to avoid the above situation.

Signed-off-by: Liu Bo <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Cc: "Alexander E. Patrakov" <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

94c2606... by Josef Bacik <email address hidden>

Btrfs: make sure logged extents complete in the current transaction V3

commit 50d9aa99bd35c77200e0e3dd7a72274f8304701f upstream.

Liu Bo pointed out that my previous fix would lose the generation update in the
scenario I described. It is actually much worse than that, we could lose the
entire extent if we lose power right after the transaction commits. Consider
the following

write extent 0-4k
log extent in log tree
commit transaction
 < power fail happens here
ordered extent completes

We would lose the 0-4k extent because it hasn't updated the actual fs tree, and
the transaction commit will reset the log so it isn't replayed. If we lose
power before the transaction commit we are save, otherwise we are not.

Fix this by keeping track of all extents we logged in this transaction. Then
when we go to commit the transaction make sure we wait for all of those ordered
extents to complete before proceeding. This will make sure that if we lose
power after the transaction commit we still have our data. This also fixes the
problem of the improperly updated extent generation. Thanks,

Signed-off-by: Josef Bacik <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

6b20b24... by Josef Bacik <email address hidden>

Btrfs: do not move em to modified list when unpinning

commit a28046956c71985046474283fa3bcd256915fb72 upstream.

We use the modified list to keep track of which extents have been modified so we
know which ones are candidates for logging at fsync() time. Newly modified
extents are added to the list at modification time, around the same time the
ordered extent is created. We do this so that we don't have to wait for ordered
extents to complete before we know what we need to log. The problem is when
something like this happens

log extent 0-4k on inode 1
copy csum for 0-4k from ordered extent into log
sync log
commit transaction
log some other extent on inode 1
ordered extent for 0-4k completes and adds itself onto modified list again
log changed extents
see ordered extent for 0-4k has already been logged
 at this point we assume the csum has been copied
sync log
crash

On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
which is the same one that we are replaying which also drops the csum, and then
we won't find the csum in the log for that bytenr. This of course causes us to
have errors about not having csums for certain ranges of our inode. So remove
the modified list manipulation in unpin_extent_cache, any modified extents
should have been added well before now, and we don't want them re-logged. This
fixes my test that I could reliably reproduce this problem with. Thanks,

Signed-off-by: Josef Bacik <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

c44bd16... by David Sterba <email address hidden>

btrfs: fix wrong accounting of raid1 data profile in statfs

commit 0d95c1bec906dd1ad951c9c001e798ca52baeb0f upstream.

The sizes that are obtained from space infos are in raw units and have
to be adjusted according to the raid factor. This was missing for
f_bavail and df reported doubled size for raid1.

Reported-by: Martin Steigerwald <email address hidden>
Fixes: ba7b6e62f420 ("btrfs: adjust statfs calculations according to raid profiles")
Signed-off-by: David Sterba <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

f9c9bec... by Josef Bacik <email address hidden>

Btrfs: make sure we wait on logged extents when fsycning two subvols

commit 9dba8cf128ef98257ca719722280c9634e7e9dc7 upstream.

If we have two fsync()'s race on different subvols one will do all of its work
to get into the log_tree, wait on it's outstanding IO, and then allow the
log_tree to finish it's commit. The problem is we were just free'ing that
subvols logged extents instead of waiting on them, so whoever lost the race
wouldn't really have their data on disk. Fix this by waiting properly instead
of freeing the logged extents. Thanks,

Signed-off-by: Josef Bacik <email address hidden>
Signed-off-by: Chris Mason <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

6f07df9... by Michael Halcrow <email address hidden>

eCryptfs: Remove buggy and unnecessary write in file name decode routine

commit 942080643bce061c3dd9d5718d3b745dcb39a8bc upstream.

Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
end of the allocated buffer during encrypted filename decoding. This
fix corrects the issue by getting rid of the unnecessary 0 write when
the current bit offset is 2.

Signed-off-by: Michael Halcrow <email address hidden>
Reported-by: Dmitry Chernenkov <email address hidden>
Suggested-by: Kees Cook <email address hidden>
Signed-off-by: Tyler Hicks <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

f02dd21... by Tyler Hicks

eCryptfs: Force RO mount when encrypted view is enabled

commit 332b122d39c9cbff8b799007a825d94b2e7c12f2 upstream.

The ecryptfs_encrypted_view mount option greatly changes the
functionality of an eCryptfs mount. Instead of encrypting and decrypting
lower files, it provides a unified view of the encrypted files in the
lower filesystem. The presence of the ecryptfs_encrypted_view mount
option is intended to force a read-only mount and modifying files is not
supported when the feature is in use. See the following commit for more
information:

  e77a56d [PATCH] eCryptfs: Encrypted passthrough

This patch forces the mount to be read-only when the
ecryptfs_encrypted_view mount option is specified by setting the
MS_RDONLY flag on the superblock. Additionally, this patch removes some
broken logic in ecryptfs_open() that attempted to prevent modifications
of files when the encrypted view feature was in use. The check in
ecryptfs_open() was not sufficient to prevent file modifications using
system calls that do not operate on a file descriptor.

Signed-off-by: Tyler Hicks <email address hidden>
Reported-by: Priya Bansal <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>

9bf2101... by Jan Kara <email address hidden>

udf: Check component length before reading it

commit e237ec37ec154564f8690c5bd1795339955eeef9 upstream.

Check that length specified in a component of a symlink fits in the
input buffer we are reading. Also properly ignore component length for
component types that do not use it. Otherwise we read memory after end
of buffer for corrupted udf image.

Reported-by: Carl Henrik Lunde <email address hidden>
Signed-off-by: Jan Kara <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>