lp:~bioinformatics/eva/trunk
- Get this branch:
- bzr branch lp:~bioinformatics/eva/trunk
Branch information
Recent revisions
- 119. By Xavier
Changed the function that runs Count reads (from snpEff) so that its stderr is sent to a separate log file. This keeps the standard log file from growing too large (nearly 1 GB of data instead of just a few hundred KB).
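A minimal sketch of the redirection idea in Python (the pipeline itself does this from R, so this is not its actual code): the child process writes stdout and stderr to two different files, so noisy stderr output cannot inflate the main log. The helper name `run_logged` and the log file names are hypothetical.

```python
import subprocess


def run_logged(cmd, stdout_path, stderr_path):
    """Run cmd, writing its stdout and stderr to two separate log files.

    Returns the process exit code. Hypothetical helper for illustration only.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        # Each stream gets its own file handle, so a very chatty stderr
        # ends up in its own file instead of the main (stdout) log.
        result = subprocess.run(cmd, stdout=out, stderr=err)
    return result.returncode
```

The command list would be whatever Count reads invocation the pipeline already uses; only the two-file redirection pattern is the point here.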
- 118. By Xavier
Filtering by quality was still failing for QUAL values with decimal points, so an extra step has been added that correctly filters out variants whose quality is below the threshold set in the run parameters:
fun.variant.filter.fix.qual
- 117. By Xavier
* The bam file is now indexed twice: right after sam2bam (before Picard marks duplicates), and again after duplicate removal and all other edits to the bam files.
* 'Count reads'-related functions moved to the end of the file, to match the new point in the workflow at which Count reads is run.
- 116. By Xavier
* Added a few comments about bam.metrics.txt, and help text for the whole eva pipeline.
* Fixed rmdup by adding the TMP_DIR parameter, so that long runs can complete successfully by using a different temporary directory with much more free space.
- 115. By Xavier
Improvements and refactoring, mainly to the functions related to count reads.
eva_analysis_functions.R:
-------------------------
* Tasks related to count reads split into two functions: one runs snpEff count reads, the other post-processes the data in R.
* cr.txt is now read with mmap.csv() from the mmap package:
# mmap.csv() is meant to be the analogue of read.csv in R, with the primary difference being
# that data is read, by column, into memory-mapped structs on disk.
** The whole function was adapted to work with objects from the mmap package.
* Show and log some information about memory consumption during this count reads R function.
Adapted the other R files so that the previous changes work.
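mmap.csv() is an R function; as a rough Python analogue (standard library only, not the mmap package's implementation), the sketch below writes one numeric CSV column to a binary file of doubles and then memory-maps it, so values are paged in from disk on demand rather than all held in RAM. The function names `column_to_binary` and `map_column` are hypothetical, and the input is assumed to be a CSV with a header row.

```python
import csv
import mmap
import struct


def column_to_binary(csv_path, column, bin_path):
    """Write one numeric CSV column as native-endian float64 values.

    Returns the number of values written.
    """
    count = 0
    with open(csv_path, newline="") as src, open(bin_path, "wb") as dst:
        for row in csv.DictReader(src):
            dst.write(struct.pack("d", float(row[column])))
            count += 1
    return count


def map_column(bin_path):
    """Memory-map a binary column; returns (mmap object, float64 view).

    The file must be non-empty: mmap cannot map a zero-length file.
    """
    with open(bin_path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping stays valid after the file object is closed.
    return mm, memoryview(mm).cast("d")
```

Call `view.release()` before `mm.close()`: an mmap object cannot be closed while a memoryview still exports its buffer.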
- 114. By Xavier
eva_analysis_functions.R:
-------------------------
* Fixed count reads for cases where the filename to process was absolute (prefixed with the full path).
* Tried adding the parameter -i file.bed to count reads so that only the regions of the target genes are processed, but disabled it temporarily because of an issue with it (a potential bug, already reported to the snpEff author).
* Added a (commented-out) alternative method to aggregate the results from count reads, plus progress reporting at each of the 3 steps where the process once stopped, presumably due to lack of RAM, while several processes were running on the server at the same time.
* The fixcols functions for annovar output are now called only after the csv files have been grepped for the target genes, so the results are much smaller (KB instead of MB) and processing time dropped from many hours to less than a minute. The researcher is only interested in examining the results for the target genes in detail anyway, so this is an easy workaround.
Similar changes to the other files to support these changes (mainly to annotation.s.fixcols).
- 113. By Xavier
fun.variant.annotation.summary.call.fixcolumns made optional, to prevent issues with real-world data where big files take too long to complete. Converting it to apply functions is not trivial, since many particularities in the data frame require fine-grained treatment.
- 112. By Xavier
New option, p_bychr, added to split the variant calling work by chromosome.
# Split the processing by chromosomes (variant calling only, as of April 23, 2013).
# Watch out: some QUAL values might be recorded as lower than the value indicated in the bam file
# for some mysterious reason. See comments inside function var.calling.
- 111. By Xavier
Important changes:
FIX: duplicate removal works, and variant calling is working again. The culprit was the "fixmate" function in samtools, which in fact seems unnecessary (it was doing nothing meaningful) and breaks the variant calling process, producing zero variants in the vcf files.
MOD: each step that processes .bam files after the bam has been created now creates (or updates) the hard link sam.sorted.edited.bam, pointing it to the edited bam file produced by that step.
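A minimal Python sketch of the "create or update the hard link" step, assuming a filesystem that supports hard links (the pipeline's own code is in R and is not reproduced here). The new link is created under a temporary name and then renamed over the old one, so `sam.sorted.edited.bam` never goes missing mid-update. The helper name `refresh_hardlink` and the `.tmp` suffix are hypothetical.

```python
import os


def refresh_hardlink(source, link_name):
    """Make link_name a hard link to source, replacing any existing file."""
    # Already linked to this file: nothing to do. (Renaming one hard link
    # over another link to the same inode is a silent no-op on POSIX.)
    if os.path.exists(link_name) and os.path.samefile(source, link_name):
        return
    tmp_name = link_name + ".tmp"
    # Clear a leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_name):
        os.remove(tmp_name)
    os.link(source, tmp_name)
    # os.replace overwrites link_name if it already exists.
    os.replace(tmp_name, link_name)
```

Each processing step would call this with its own output file as `source`, so downstream steps can always read the latest edited bam under the same name.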
Branch metadata
- Branch format:
- Branch format 7
- Repository format:
- Bazaar repository format 2a (needs bzr 1.16 or later)