trunk : Code : staden-io-lib-trunk

Get this branch:: bzr branch lp:staden-io-lib-trunk

Branch merges

This import branch has no branches proposed for merge into it.

No branches dependent on this one.


Branch information

Owner:: James Bonfield

Project:: staden-io-lib-trunk

Status:: Development

Import details

Import Status: Failed

This branch is an import of the Subversion branch from https://svn.code.sf.net/p/staden/code/io_lib/trunk.

The import has been suspended because it failed 5 or more times in succession.

Last successful import was on 2018-02-15.

Import started on 2018-02-19 on pear and finished on 2018-02-19, taking 15 seconds.

Import started on 2018-02-17 on russkaya and finished on 2018-02-17, taking 15 seconds.

Import started on 2018-02-16 on pear and finished on 2018-02-16, taking 15 seconds.

Recent revisions

590. By jkbonfield on 2016-12-12

Fixed a rare renormalisation bug in the rANS codec.

The symbol frequencies need to sum to TOTFREQ (4096 currently) and are
rounded up/down accordingly. The accumulated integer rounding
means the renormalised frequencies don't always total 4096 exactly, so
the remainder is added to / subtracted from the most frequent symbol.
In one particular data set this remainder was larger than the
frequency of the most frequent symbol, causing it to become negative.

We now just do another round of renormalisation with slightly lower
products until we get it right. It's not the fastest solution, but
this is a very rare event.
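
The retry idea can be sketched in C roughly as follows. This is not io_lib's code: the function name normalise_freqs, the NSYM constant, and the choice to lower the scaling target by one on each retry are assumptions made for illustration. Only the TOTFREQ value of 4096 and the "fold the remainder into the most frequent symbol" step come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define TOTFREQ 4096 /* required sum of frequencies (from the commit message) */
#define NSYM    256  /* assumed alphabet size: one symbol per byte value */

/*
 * Hypothetical sketch: scale raw counts so that freqs[] sums to TOTFREQ.
 * Symbols that occur at all are given a frequency of at least 1.
 * Returns 0 on success, -1 if counts[] is all zero.
 */
int normalise_freqs(const uint32_t counts[NSYM], uint32_t freqs[NSYM])
{
    uint64_t total = 0;
    int i;

    assert(counts && freqs);
    for (i = 0; i < NSYM; i++)
        total += counts[i];
    if (total == 0)
        return -1;

    /* Retry with a slightly smaller scaling target until the remainder
     * can be absorbed by the most frequent symbol. */
    for (uint32_t target = TOTFREQ; target > 0; target--) {
        uint32_t sum = 0, max_f = 0;
        int max_sym = 0;

        for (i = 0; i < NSYM; i++) {
            uint32_t f = 0;
            if (counts[i]) {
                f = (uint32_t)(((uint64_t)counts[i] * target) / total);
                if (f == 0)
                    f = 1; /* rounding up: a used symbol cannot have freq 0 */
            }
            freqs[i] = f;
            sum += f;
            if (f > max_f) {
                max_f = f;
                max_sym = i;
            }
        }

        int32_t remainder = (int32_t)TOTFREQ - (int32_t)sum;
        /* The failure case described above: a negative remainder at least
         * as large as max_f would drive that frequency to zero or below. */
        if (remainder >= 0 || (uint32_t)(-remainder) < max_f) {
            freqs[max_sym] = (uint32_t)((int32_t)max_f + remainder);
            return 0;
        }
    }
    return -1; /* not expected to be reached: small targets always succeed */
}
```

For example, 56 symbols with count 1000 plus 200 symbols with count 1 overshoot 4096 by more than the largest frequency on the first pass, so this sketch needs several retries before it returns.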

589. By jkbonfield on 2016-11-28

Fix BAM bin value for placed but unmapped reads. (Reported by German
Tischler.)

This corresponds to a SAM spec change from 8th April 2014 where
unmapped data was explicitly stated to have length 1. Io_lib's
implementation assumed unmapped data to be zero length.
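
To see why the assumed length matters, here is a small C sketch. The reg2bin function follows the bin calculation given in the SAM specification (0-based start, 0-based exclusive end). The wrapper read_bin and its parameters are hypothetical and are not io_lib's API.

```c
#include <assert.h>

/* Bin calculation as given in the SAM specification:
 * beg is 0-based, end is 0-based and exclusive. */
int reg2bin(int beg, int end)
{
    --end;
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;
}

/*
 * Hypothetical helper: bin for a read placed at 0-based position pos0.
 * ref_len is the number of reference bases covered by the alignment;
 * a placed but unmapped read (ref_len 0) is treated as having length 1.
 */
int read_bin(int pos0, int ref_len)
{
    assert(pos0 >= 0 && ref_len >= 0);
    if (ref_len == 0)
        ref_len = 1;
    return reg2bin(pos0, pos0 + ref_len);
}
```

With a zero-length interval at position 16384 (the start of a 16 kb window), reg2bin(16384, 16384) gives bin 585, one level up the bin hierarchy, whereas read_bin(16384, 0) gives the smallest-level bin 4682.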

588. By jkbonfield on 2016-11-01

Fixed a CRAM encoder crash when no @SQ lines are present but the
sequences have reference names in use.

587. By jkbonfield on 2016-11-01

Removed a CRAM encoding crash.

When an @SQ line is present but no SN: entry exists, the name field
was NULL but dereferenced.

586. By jkbonfield on 2016-11-01

Fixed a compression inefficiency when switching to unsorted mode.

We switch from sorted to unsorted mode only after a couple of tiny
containers have been created. (Ideally we'd detect this upfront.)

We also compute compression metrics on the first few containers and
then keep those stats for the next 100 or so. The combination of
these meant we computed compression metrics based on data that was not
of comparable size to the rest of the containers. In one test set this
meant Z_RLE was optimal on the 1-read slices but was then applied to
10,000-read slices, where Z_FILTERED is preferable (due to lots of
duplicate entries).

585. By jkbonfield on 2016-09-28

Removed an uninitialised memory access, although I'm a little unsure
why this is even there! (Bad memory.)

It's in code that is executed when the cram codec fails to initialise,
so I believe this change is a no-op on valid files.

584. By jkbonfield on 2016-09-27

Merged in the cram_filter branch.

This tool should still be considered as experimental.

583. By jkbonfield on 2016-09-14

Improved multi-threaded CRAM decoding.

When given a thread pool, we now migrate the cram_to_bam calls from
within the cram_get_bam_seq function (called in the main thread) to
the cram_decode_slice function (called inside a worker thread).

This significantly improves parallelisation opportunities.

Better still would be to change the API so that the bam object
returned has an associated free function pointer to deallocate it, e.g.

    get_seq(fd, &s);
    // do stuff
    s->free(s);

instead of just the "free(s)" we have now. Currently we have to
memcpy our cached bam structures to a new malloced location instead of
returning the address of the precomputed bam structs. Making this
change would remove another 40% or so CPU from the main thread of cram
decoding (not done, but see cram_get_bam_seq for comments).
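
A minimal C sketch of that proposed API shape is given here. It illustrates the idea only and is not code from io_lib: seq_rec, reader, get_seq, seq_new_heap and both deallocators are hypothetical stand-ins. The point is that the caller always calls s->free(s), so the library can hand out a pointer into its own cache (no memcpy) or a malloced copy without the caller needing to know which.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a bam record that carries its own deallocator. */
typedef struct seq_rec seq_rec;
struct seq_rec {
    char name[32];
    void (*free)(seq_rec *s);
};

/* Record was malloced individually: really release it. */
static void free_heap(seq_rec *s) { free(s); }

/* Record lives inside the reader's cache: nothing to release here. */
static void free_cached(seq_rec *s) { (void)s; }

/* Hypothetical reader holding a few precomputed records. */
typedef struct {
    seq_rec cache[4];
    int ncached;
    int next;
} reader;

/* Returns 0 and sets *s to the next cached record, or -1 when exhausted.
 * The address of the cached struct is returned directly, with no copy. */
int get_seq(reader *fd, seq_rec **s)
{
    assert(fd && s);
    if (fd->next >= fd->ncached)
        return -1;
    *s = &fd->cache[fd->next++];
    (*s)->free = free_cached;
    return 0;
}

/* A heap-allocated record is released through the same s->free(s) call. */
seq_rec *seq_new_heap(const char *name)
{
    seq_rec *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    strncpy(s->name, name, sizeof s->name - 1);
    s->name[sizeof s->name - 1] = '\0';
    s->free = free_heap;
    return s;
}
```

One caveat with a design like this: a cached record stays valid only while the cache that owns it does, so a real implementation would have to define that lifetime.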

582. By jkbonfield on 2016-09-13

Moved the block CRC32 checking from within block I/O to the block
uncompression code.

This has two outcomes:

1) We don't incur the cost of integrity checking unless we use the data
(both good and bad).

2) When multi-threading, the CRC computation is spread between cores.

This means CRAM reading is around 10% faster real-time when using -t16.
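
The deferred check might look something like the C sketch below. All names (block_t, block_read, block_uncompress, crc32_simple) are hypothetical, the CRC is computed over the raw block bytes only, and the decompression step is left out. The bitwise CRC-32 uses the same polynomial as zlib's crc32 and is included only to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib). */
uint32_t crc32_simple(const unsigned char *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;
    while (n--) {
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return c ^ 0xFFFFFFFFu;
}

/* Hypothetical block: raw bytes plus the CRC stored alongside them. */
typedef struct {
    const unsigned char *data;
    size_t len;
    uint32_t stored_crc;
    int crc_checked;
} block_t;

/* I/O step: record the bytes and the stored CRC, but do not verify yet. */
block_t block_read(const unsigned char *data, size_t len, uint32_t crc)
{
    block_t b;
    assert(data != NULL);
    b.data = data;
    b.len = len;
    b.stored_crc = crc;
    b.crc_checked = 0;
    return b;
}

/* Uncompression step: verify the CRC on first use. With a thread pool
 * this runs in whichever worker handles the block.
 * Returns 0 on success, -1 on a CRC mismatch. */
int block_uncompress(block_t *b)
{
    assert(b != NULL);
    if (!b->crc_checked) {
        if (crc32_simple(b->data, b->len) != b->stored_crc)
            return -1;
        b->crc_checked = 1;
    }
    /* ... actual decompression would go here ... */
    return 0;
}
```

A block that is read but never uncompressed is never checksummed in this sketch, which matches outcome 1 above: the cost is avoided, but so is the detection of corruption in unused blocks.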

581. By jkbonfield on 2016-08-30

Add io_lib/bgzip.h to pkginclude_HEADERS

(Thanks to German Tischler)


Branch metadata

Branch format:: Branch format 7

Repository format:: Bazaar repository format 2a (needs bzr 1.16 or later)
