elevator=cfq (default) cause starvation

Bug #381300 reported by Andrea Bravetti
This bug affects 17 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image

I have had many performance problems using a USB pendrive:
writing large files causes a system slowdown (starvation, maybe),
sometimes an unacceptable one.

I performed several tests and in the end the culprit was found to
be the CFQ scheduler which is used by default since 2.6.18...

CFQ may be good on large systems (but I'm not sure), but on a desktop
with one (or two) CPUs and only one disk and some very slow (but useful)
devices it is unacceptable.

If you are unsure try something like "dd if=/dev/zero of=/dev/sdx" where
/dev/sdx is a 4gb pendrive, then try opening a web page... panic!

No, not "kernel panic" but "user panic" is guaranteed!

Any user who tries Ubuntu and needs to use a pendrive will
think that it works like a floppy disk on Win95...

For a long time I believed it was a driver problem, but now
I'm sure it depends on the scheduler; in fact, I resolved it just
by adding "elevator=noop".

If you care about the desktop experience, please consider the possibility
of moving to elevator=noop, at least for the "desktop" version
if not for the "server"...

I have this problem with every Ubuntu version with kernel >= 2.6.18,
and currently I'm on Jaunty with Karmic's 2.6.30...

A nice read:
http://www.redhat.com/magazine/008jun05/features/schedulers/

description: updated
Andrea Bravetti (andreabravetti) wrote :

I tested it in a lot of different scenarios and now I think it is
not related to "writing a lot of data on slow devices" but
just to "moving a lot of data"...

I simply think CFQ does not work well and must be replaced.

summary: - elevator=cfq (default) cause starvation with very slow devices
+ elevator=cfq (default) cause starvation
Andrea Bravetti (andreabravetti) wrote :

I'm not the only one who prefers noop:

http://lkml.org/lkml/2008/8/26/116

Nor the only one who has problems:

http://lkml.org/lkml/2006/8/14/198

description: updated
Andrea Bravetti (andreabravetti) wrote :

If you can't change the default, please consider the
possibility of doing (automatically) something like this
for any pendrive or SSD:

echo noop > /sys/block/sdX/queue/scheduler
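
A minimal sketch of how that could be done automatically, assuming that flash devices can be recognised in sysfs by `removable` being 1 or `queue/rotational` being 0 (USB pendrives usually show up as removable; the rotational flag is not present on every kernel). The function name and the optional root-directory argument are hypothetical; the argument exists only so the logic can be tried against a fake directory tree:

```shell
#!/bin/sh
# Sketch only: switch flash-like sd* devices to the noop scheduler.
# $1 is an optional sysfs block root (hypothetical knob for dry runs);
# it defaults to the real /sys/block.
set_noop_on_flash() {
    root="${1:-/sys/block}"
    for dev in "$root"/sd*; do
        [ -d "$dev" ] || continue
        sched="$dev/queue/scheduler"
        [ -w "$sched" ] || continue
        # Missing attributes are treated as "fixed, rotating disk".
        removable=$(cat "$dev/removable" 2>/dev/null || echo 0)
        rotational=$(cat "$dev/queue/rotational" 2>/dev/null || echo 1)
        if [ "$removable" = "1" ] || [ "$rotational" = "0" ]; then
            echo noop > "$sched"
            echo "${dev##*/}: noop"
        fi
    done
}
```

Called as root with no argument (for example from a boot script) it walks /sys/block; a udev rule would probably be the more usual home for this kind of policy.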

Does no one else here have this problem?

Andy Whitcroft (apw) wrote :

Have you tried any of the other schedulers other than noop and cfq? It would be good to have results for all of them as noop is unlikely to be the correct choice for rotating media.

Andy Whitcroft (apw) wrote :

This is a real kernel bug; moving to the linux package.

affects: linux-meta (Ubuntu) → linux (Ubuntu)
Andrea Bravetti (andreabravetti) wrote :

> Have you tried any of the other schedulers other than noop....

elevator=as works quite well in my case; I don't know about elevator=deadline, but I can test it...

> ...noop is unlikely to be the correct choice for rotating media.

Maybe it's not the fastest choice, but it works and doesn't cause a system slowdown.
By the way, when SSDs become popular it will be the default.

Andrea Bravetti (andreabravetti) wrote :

Andy, yesterday at home, after your message,
I tried to test every scheduler while also recording a lot of stats
on memory and CPU usage...

Quite a waste of time, but it should be useful...

Well, not only did I have no problem with CFQ, but it also performed
better than the others in almost every case, with the exception
that writing a 4 GB pendrive took 6% more elapsed time
than with all the others...

Now: what is the difference between my home desktop, the
notebook I use at work and some other PCs where I had that
problem? I don't know... Some of those PCs are quite different
from my home one, but my own notebook is fairly similar:
same amount of RAM, same CPU, same filesystem (ext3),
same OS and the same pendrive...

I'm going to test it again on my notebook.

Andrea Bravetti (andreabravetti) wrote :

On my notebook (Fujitsu-Siemens Amilo Pro V3505):

writing a large file to a pendrive:

------------------ noop ------------------

elapsed: 19' 31"

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.77    0.00   14.39   49.16    0.00   35.82

no slowdown, all good...

------------------ anticipatory ------------------

elapsed: 19' 24"

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.04    0.00   12.87   53.54    0.00   31.16

minor slowdown moving windows or writing to the internal disk.

------------------ deadline ------------------

elapsed: 19' 44"

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.76    0.00   10.38   46.15    0.00   37.59

minor slowdown moving windows or writing to the internal disk.

------------------ cfq ------------------

elapsed: 27' 35"

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.59    0.00   10.12   88.76    0.00    1.29

everything is slow, and sometimes locked up...

-------------------------------------------

What I can say is that at home, yesterday, I never
saw the iowait over 40%.

The fact that it does not happen on every PC explains why
cfq is the default...
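
For scale, the elapsed times above can be turned into a relative slowdown. A small shell sketch (the function names are made up for illustration; minutes and seconds must be passed without leading zeros, since the shell would otherwise read them as octal):

```shell
#!/bin/sh
# Convert "minutes seconds" to seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

# Integer percentage by which $2 exceeds $1 (rounded down).
slowdown_pct() { echo $(( ($2 - $1) * 100 / $1 )); }

noop=$(to_seconds 19 31)   # 1171 s
cfq=$(to_seconds 27 35)    # 1655 s
echo "cfq took $(slowdown_pct "$noop" "$cfq")% longer than noop"
# prints: cfq took 41% longer than noop
```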

Andrea Bravetti (andreabravetti) wrote :

This is another test: this time I wrote a 6GB file on top of an ext3 fs on top of an 8GB pendrive...

This 8GB pendrive is slightly faster than the other one...

------------------ noop ------------------
elapsed: 10' 38", no slowdown, all good...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.75    0.01    9.10   51.23    0.00   26.33
------------------ anticipatory ------------------
elapsed: 09' 46", no slowdown, all good...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.22    0.01   13.04   43.32    0.00   24.88
------------------ deadline ------------------
elapsed: 09' 45", no slowdown, all good...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.90    0.02   13.20   46.88    0.00   21.40
------------------ cfq ------------------
elapsed: 09' 39", no slowdown, all good...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.56    0.01   12.58   43.52    0.00   25.65
-------------------------------------------

Now, I'm a bit disappointed...
Sometimes it (cfq) works very well, sometimes it's a disaster!

Andrea Bravetti (andreabravetti) wrote :

Again: this time I simultaneously wrote two 3GB files on top of an ext3 fs on top of an 8GB pendrive...

------------------ noop ------------------
elapsed: 11' 52", high iowait but no system slowdown, it's quite good.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.01    0.00    6.61   83.31    0.00    2.65
------------------ anticipatory ------------------
elapsed: 11' 56", high iowait but no system slowdown, it's quite good.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.51    0.00    6.01   82.96    0.00    5.16
------------------ deadline ------------------
elapsed: 11' 58", high iowait but no system slowdown, it's quite good.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.87    0.02    7.58   79.64    0.00    6.54
------------------ cfq ------------------
elapsed: 19' 50", visible system slowdown with many apps, but it's still usable.
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           5.44    0.00    6.39   82.24    0.00    6.30
-------------------------------------------

This time I can't see any significant difference between the cases in iostat,
but the system becomes slow and unresponsive with cfq...

Now, I'm much more disappointed...

Andrea Bravetti (andreabravetti) wrote :

Looking around I found this bug:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/343371

It talks about reading big files, so I ran some tests and found out that I have this problem too, even with noop:
this time (just reading a big file from a SATA disk, not using a pendrive) the system becomes really unusable!

On the home PC, where I have never seen a problem writing large files with CFQ, I don't have this problem either.

So maybe it's not a scheduler problem but something else, and changing the scheduler just alleviates the problem...

Is it possible?

How can I find out if it's a latency problem as described in 343371?

Andrea Bravetti (andreabravetti) wrote :

What is described in 343371 was another problem, and I had that one too...

I'm not too lucky...

I resolved it by adding my HD (WDC WD3200BEVT-22ZCT0) to the NCQ blacklist
in drivers/ata/libata-core.c and now I can read very big files from the disk
without any problem...

However, the visible slowdown using cfq on some devices remains, with
high I/O wait and a long elapsed time (from the start to the end of the write).

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Andrea Bravetti (andreabravetti) wrote :

Yesterday I was copying (or rather, trying to copy)
78 GB of data from one disk to another.

Source:
Caviar WD2000BB-22GUA0, a 200GB PATA disk.
USB2.0 to IDE adapter: EBL35U2
ext2 partition...

Dest:
my internal WDC WD3200BEVT-22ZCT0.
ext3 partition...

The system became slow as soon as the copy started,
as usual, since I was not using my patched
kernel but the "stock" 2.6.30-9-generic.

But near the end something happened:

[ 810.659068] BUG: unable to handle kernel paging request at 018ca000
[ 810.659079] IP: [<c02f07f5>] __percpu_counter_add+0x25/0xb0
[ 810.659095] *pde = 00000000
[ 810.659101] Oops: 0000 [#1] SMP
[ 810.659108] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.1/0000:04:00.0/rfkill/rfkill0/state
[ 810.659114] Modules linked in: usb_storage binfmt_misc bridge stp bnep vmnet ppdev parport_pc vmblock vmci vmmon lp parport snd_hda_codec_si3054 snd_hda_codec_realtek joydev snd_hda_intel snd_hda_codec arc4 ecb snd_pcm_oss snd_mixer_oss mmc_block snd_pcm iwl3945 iwlcore snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer mac80211 pcmcia snd_seq_device iTCO_wdt iTCO_vendor_support psmouse snd soundcore snd_page_alloc sdhci_pci sdhci btusb cfg80211 serio_raw pcspkr yenta_socket rsrc_nonstatic pcmcia_core led_class usbhid sky2 raid10 raid456 raid6_pq async_xor async_memcpy async_tx xor raid1 raid0 multipath linear i915 drm i2c_algo_bit video output intel_agp agpgart fbcon tileblit font bitblit softcursor
[ 810.659248]
[ 810.659255] Pid: 6126, comm: umount Not tainted (2.6.30-9-generic #10-Ubuntu) AMILO Pro Edition V3505
[ 810.659261] EIP: 0060:[<c02f07f5>] EFLAGS: 00010006 CPU: 0
[ 810.659268] EIP is at __percpu_counter_add+0x25/0xb0
[ 810.659273] EAX: 00000000 EBX: f6974b7c ECX: 00000000 EDX: 00000001
[ 810.659278] ESI: 00000000 EDI: 018ca000 EBP: f3c0bea0 ESP: f3c0be80
[ 810.659283] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 810.659289] Process umount (pid: 6126, ti=f3c0a000 task=f52a98e0 task.ti=f3c0a000)
[ 810.659294] Stack:
[ 810.659297] f673eff0 c07cbee0 f3c0be90 f6e6893c f6974b7c f6e6893c 00000000 f6e6893c
[ 810.659312] f3c0beb0 c01a218a 00000010 c1716200 f3c0bec8 c01ee2e0 f6e6894c c1716200
[ 810.659327] f4010000 f53b8c14 f3c0bed4 c01ee3a4 f53b8c00 f3c0bef4 c026f84d c0150770
[ 810.659343] Call Trace:
[ 810.659348] [<c01a218a>] ? account_page_dirtied+0x4a/0x80
[ 810.659358] [<c01ee2e0>] ? __set_page_dirty+0x40/0xb0
[ 810.659368] [<c01ee3a4>] ? mark_buffer_dirty+0x54/0x90
[ 810.659377] [<c026f84d>] ? journal_update_superblock+0x6d/0xd0
[ 810.659387] [<c0150770>] ? autoremove_wake_function+0x0/0x50
[ 810.659397] [<c026fcfb>] ? journal_destroy+0xeb/0x110
[ 810.659406] [<c022fa24>] ? ext3_put_super+0x24/0x230
[ 810.659414] [<c01dff86>] ? invalidate_inodes+0xf6/0x120
[ 810.659422] [<c0546b4d>] ? lock_kernel+0x2d/0x50
[ 810.659433] [<c01ceb8a>] ? generic_shutdown_super+0x6a/0x110
[ 810.659441] [<c01cec55>] ? kill_block_super+0x25/0x40
[ 810.659448] [<c02093f0>] ? vfs_quota_off+0x0/0x20
[ 810.659456] [<c01cf1cf>] ? deactivate_super+0x5f/0x80
[ 810.659464] [<c01e373c>] ? mntput_no_expire+0xec/0x130
[ 810.659472] [<c01e3ce4>] ? sys_umount+0x4...

Andrea Bravetti (andreabravetti) wrote :

I asked for comment 13 to be removed since it
is completely wrong, but it's still there.

When I saw "BUG: unable to handle kernel paging..." I started
thinking "I can't understand what happened, so I'll send
it and someone will read it"...

Now, it was caused by umount and I can't understand why;
it was not an intentional umount, I am sure. Maybe a hardware
failure disconnected a device, though not the device I
was reading from.

So, comment 13 has nothing to do with this bug
and is here just to create confusion.

If you can, please remove it.

Jim Lieb (lieb) wrote :

@Andrea, your report is closely related to some other reports including #343371 and #131094. See my comment and request at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094/comments/235. In your case, there are (potentially) two issues. First, there is the latency under I/O load and, second, there may be still an issue with cfq with the pen drive. We would like you to do the testing described in the comment link first and if there is still a pen drive problem, we can address that next.

The I/O schedulers have been focused primarily on HDDs. The typical drive has a shared filesystem on it and, being mechanical, has rotational and seek latencies that must be compensated for. The CFQ scheduler does a pretty good job of fair and sustained throughput in that environment. On the other hand, SSDs and pen drives, actually any flash based device, have a different set of constraints. There is no seek/rotational latency at all but write performance can be pretty bad, especially for small, random writes. This is because the controller in the drive must read a whole block (usually 64k+), copy the small write into it, erase the block, and then, finally, write the block back, most often in a different location to "wear level" the device. SSDs are still evolving but current parts do not perform well with short writes. They are getting better but they are not there yet. This is a work in progress. Pen drives do not get even this much attention given that they are a single user/task, offline storage device.

Using NOOP scheduling does seem to work better than CFQ for these devices but from your comment, the dd is only doing 512 byte "records". Try using a larger block size (> 1MB). That seems to work well enough to make scheduling a non-issue.
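
A sketch of such a test, assuming GNU dd (for the `M` size suffix and `conv=fsync`). The `write_test` helper is a made-up name, and the target is whatever file or device you choose to overwrite:

```shell
#!/bin/sh
# write_test TARGET BS COUNT
# Writes COUNT blocks of BS zero bytes to TARGET. conv=fsync makes dd flush
# the data to the device before it reports its elapsed time and throughput.
write_test() {
    target=$1
    bs=$2
    count=$3
    dd if=/dev/zero of="$target" bs="$bs" count="$count" conv=fsync
}
```

For example, `write_test /dev/sdX 4M 1024` would write 4 GiB in 4 MiB records. Like the original dd command, this destroys whatever is on /dev/sdX.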

I have changed this to incomplete pending your test results. I may mark it as a duplicate based on those results.

Thanks

Changed in linux (Ubuntu):
assignee: nobody → Jim Lieb (lieb)
status: Triaged → Incomplete
Andrea Bravetti (andreabravetti) wrote :

Jim,
thanks for your reply!

I already use dd with up to bs=1M, but never more than 1M...

I'm going to test it with 2.6.31 as soon as possible, but I can't do it
now because I need vmware at work (but this is another issue...).

Tim Gardner (timg-tpi) wrote :

It seems the CFQ scheduler has some issues even with SSDs, so I'm gonna experiment with changing the default to DEADLINE in order to facilitate the boot process (where speed is king). You can always change your I/O scheduler setting by writing to /sys/block/*/queue/scheduler.
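
Reading that file shows the available schedulers with the active one in square brackets, for example `noop anticipatory deadline [cfq]`. A small sketch for listing the active scheduler of each device (the function names and the optional root argument are hypothetical):

```shell
#!/bin/sh
# Print the bracketed (active) entry of a scheduler line,
# e.g. "cfq" for "noop anticipatory deadline [cfq]".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# List "device: active scheduler" for every block device under $1
# (defaults to /sys/block). This only reads, so it needs no root.
list_schedulers() {
    root="${1:-/sys/block}"
    for f in "$root"/*/queue/scheduler; do
        [ -r "$f" ] || continue
        dev=${f%/queue/scheduler}
        echo "${dev##*/}: $(active_scheduler "$(cat "$f")")"
    done
}
```

Switching is then a matter of writing one of the listed names back, as root, e.g. `echo deadline > /sys/block/sdX/queue/scheduler`.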

Changed in linux (Ubuntu):
assignee: Jim Lieb (lieb) → Tim Gardner (timg-tpi)
importance: Medium → High
status: Incomplete → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.31-10.35

---------------
linux (2.6.31-10.35) karmic; urgency=low

  [ Amit Kucheria ]

  * Disable CONFIG_UEVENT_HELPER_PATH

  [ Andy Whitcroft ]

  * [Config] Enable CONFIG_USB_GADGET_DUMMY_HCD
  * remove the tlsup driver
  * remove lmpcm logitech driver support

  [ Bryan Wu ]

  * Add 3 missing files to prerm remove file list
    - LP: #345623, #415832

  [ Chris Wilson ]

  * [Upstream] drm/i915: Check that the relocation points to within the
    target
    - LP: #429241

  [ Luke Yelavich ]

  * [Config] Set CONFIG_EXT4_FS=y on ports architectures

  [ Manoj Iyer ]

  * SAUCE: Added quirk to recognize GE0301 3G modem as an interface.
    - LP: #348861

  [ Tim Gardner ]

  * Revert "[Upstream] ACPI: Add Thinkpad W500, W700, & W700ds to OSI(Linux) white-list"
  * Revert "[Upstream] ACPI: Add Thinkpad R400 & Thinkpad R500 to OSI(Linux) white-list"
  * Revert "[Upstream] ACPI: Add Thinkpad X300 & Thinkpad X301 to OSI(Linux) white-list"
  * Revert "[Upstream] ACPI: Add Thinkpad X200, X200s, X200t to OSI(Linux) white-list"
  * Revert "[Upstream] ACPI: Add Thinkpad T400 & Thinkpad T500 to OSI(Linux) white-list"
    Upstream suggests that this is not the right approach.

  * [Config] Set default I/O scheduler to DEADLINE
    CFQ seems to have some load related problems which are often exacerbated by sreadahead.
    - LP: #381300

  [ <email address hidden> ]

  * SAUCE: ipw2200: Enable LED by default
    - LP: #21367

  [ Upstream Kernel Changes ]

  * ALSA: hda - Add support for new AMD HD audio devices
    - LP: #430564

 -- Andy Whitcroft <email address hidden> Wed, 16 Sep 2009 15:37:49 +0100

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Rocko (rockorequin) wrote :

After seeing a discussion at http://lkml.org/lkml/2009/9/19/288, I tried the patch to disable NEW_FAIR_SLEEPERS on 2.6.31-1 [ie change kernel/sched_features.h to include SCHED_FEAT(NEW_FAIR_SLEEPERS, 0) instead of SCHED_FEAT(NEW_FAIR_SLEEPERS, 1)].

It made a huge difference to desktop responsiveness under heavy I/O using CFQ. I even loaded up the system with VMs so it was using all 4GB of RAM and 800MB or so of swap and the desktop remained usable - previously under those conditions, the system would invariably become unresponsive, even freezing entirely for ten minutes or more at a time.

The solution isn't perfect, though - it does reduce my test game frame rates by between 10% and 40%.

Johan Kiviniemi (ion) wrote :

Indeed, there’s bug #436342 about NEW_FAIR_SLEEPERS.

JLR (artirj) wrote :

Why is cfq default again?

Daniel Hahler (blueyed) wrote :

This happened in:
  linux (2.6.31-12.39) karmic; urgency=low
and references this bug:
  [ Tim Gardner ]
  [...]
  * [Config] Set default I/O scheduler back to CFQ for desktop flavours
    - LP: #381300

Tim, what's the reason?

ktp (kari-petersen) wrote :

My system runs on karmic.
uname -a:
Linux kplaptop 2.6.31-17-generic #54-Ubuntu SMP Thu Dec 10 17:01:44 UTC 2009 x86_64 GNU/Linux

Copying 155GB of data to an external USB drive with NTFS and the cfq scheduler starts at around 30MB/s, drops to 9MB/s after transferring 10 to 15 GB, and slows down to 2MB/s if I wait a little longer. Using ext3 on the same disk with the cfq scheduler, the data rate is around 22MB/s for the first 90 to 100GB and then drops to 2MB/s. While copying, the mount process was at the top of CPU usage with around 4% and my desktop applications reacted a little slowly. Using the noop scheduler I get constant transfer rates of 21MB/s and no desktop starvation.

ktp (kari-petersen)
Changed in linux (Ubuntu):
status: Fix Released → Confirmed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu):
assignee: Tim Gardner (timg-tpi) → nobody
DLHDavidLH (dlhdavidlh-yahoo) wrote :

I think that the default I/O scheduler
 should be changed from CFQ to deadline

-----------------------------------------------------

I have tested the CFQ and deadline I/O schedulers

on a DELL Latitude 120L with 1GB of RAM and a 40GB hard drive

with an EXT4 filesystem and GPT, with both I/O schedulers,

and found that the deadline I/O scheduler is the fastest

--------------------------------------------------------------------

I also tested both I/O schedulers on two other computers (a total of 3 computers)

the deadline I/O scheduler performed the best/fastest

-----------------------------------------------------------------

also, when the deadline I/O scheduler is set in GRUB ( /etc/default/grub OR grub.cfg )
the boot speed is increased, as well as the desktop speed

OR when the deadline I/O scheduler is set in rc.local ( /etc/rc.local )
the desktop speed is increased only
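
As a sketch of the GRUB variant, assuming GRUB 2 and that the existing value of the variable is "quiet splash" (yours may differ; keep whatever is already there and append the elevator option):

```shell
# /etc/default/grub (fragment): select the deadline scheduler at boot.
# "quiet splash" is an assumed existing value; only elevator=deadline is new.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash elevator=deadline"
```

After editing, `sudo update-grub` regenerates grub.cfg. The rc.local variant would instead be a line such as `echo deadline > /sys/block/sdX/queue/scheduler`, which only takes effect once boot has nearly finished.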

--------------------------------------------------------------------

http://en.wikipedia.org/wiki/Deadline_scheduler
http://en.wikipedia.org/wiki/GUID_Partition_Table
http://en.wikipedia.org/wiki/Ext4
http://itezer.com/blog/ubuntu-linux/125-Four_Tweaks_for_Using_Ubuntu_with_SSD.html

http://en.wikipedia.org/wiki/CFQ

DLHDavidLH (dlhdavidlh-yahoo) wrote :

deadline I/O scheduler is good with both

 SSD (solid-state-drives) and HDD (Hard disk drive)

-- from my research (SSD) and testing (HDD)

DLHDavidLH (dlhdavidlh-yahoo) wrote :

- - idea - -

ubuntu could make simple applet to change i/o scheduler.

And later on users depending on their workloads can choose whatever they want.

------------------------------

* CFQ I/O scheduler

* Noop I/O scheduler

* Deadline I/O scheduler

* Anticipatory I/O scheduler

------------------------------

- - idea - -

DLHDavidLH (dlhdavidlh-yahoo) wrote :

applet / app / GUI to change I/O scheduler

DLHDavidLH (dlhdavidlh-yahoo) wrote :

Noop I/O scheduler is good with both

 SSD (solid-state-drives) and HDD (Hard disk drive)

-- from my research (SSD) and testing (HDD)

DLHDavidLH (dlhdavidlh-yahoo) wrote :

default I/O scheduler should be changed from CFQ to deadline OR Noop

https://bugs.launchpad.net/ubuntu/+bug/631871

------------------------------------------------------------------

Make a GUI to change the Default I/O scheduler

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/632562

Jeremy Foshee (jeremyfoshee) wrote :

resetting this bug to Fix Released. Please file a new bug if you feel you are experiencing a similar issue.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: Confirmed → Fix Released