EVA

lp:~bioinformatics/eva/trunk

Created by Xavier and last modified
Get this branch:
bzr branch lp:~bioinformatics/eva/trunk
Members of Bioinformatics can upload to this branch.

Branch information

Owner:
Bioinformatics
Project:
EVA
Status:
Development

Recent revisions

119. By Xavier

Changed the function that runs Count reads (from snpEff) so that stderr is sent to a separate log file. This keeps the standard log file from growing to nearly 1 GB instead of just a few hundred KB.
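
The idea of keeping a noisy tool's stderr out of the main log can be sketched as follows. This is a Python illustration only, not EVA's R code; the function name run_with_split_logs and the log file names are hypothetical, and a small child process stands in for the real snpEff call.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_split_logs(cmd, stdout_log, stderr_log):
    """Run cmd, appending its stdout and stderr to two separate log files."""
    with open(stdout_log, "a") as out, open(stderr_log, "a") as err:
        completed = subprocess.run(cmd, stdout=out, stderr=err, check=False)
    return completed.returncode


def demo_split_logs():
    # Stand-in for the real tool: prints one result line and one noisy stderr line.
    child = [
        sys.executable,
        "-c",
        "import sys; print('result'); print('progress noise', file=sys.stderr)",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        main_log = Path(tmp) / "run.log"                 # hypothetical name
        err_log = Path(tmp) / "count_reads.stderr.log"   # hypothetical name
        code = run_with_split_logs(child, main_log, err_log)
        return code, main_log.read_text(), err_log.read_text()
```

With this layout the main log only receives what the tool writes to stdout, while progress chatter accumulates in its own file and can be rotated or deleted independently.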

118. By Xavier

Filtering by quality was still failing for quality values with decimal points, so an extra step has been added to filter out variants whose quality is below the threshold set in the run parameters:

fun.variant.filter.fix.qual
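
A minimal sketch of this kind of fix, written in Python for illustration: QUAL is compared as a floating-point number rather than as text or as an integer. It assumes the standard VCF layout, where QUAL is the sixth tab-separated column. The function filter_by_qual is hypothetical and is not the body of fun.variant.filter.fix.qual; dropping records with a missing QUAL (".") is also an assumption.

```python
def filter_by_qual(vcf_lines, min_qual):
    """Keep header lines and records whose QUAL is >= min_qual."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)          # header lines pass through untouched
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]               # QUAL is the 6th VCF column
        if qual == ".":                # assumption: missing QUAL never passes
            continue
        if float(qual) >= min_qual:    # numeric comparison handles "49.9" and "222"
            kept.append(line)
    return kept
```

A plain string comparison would get cases like "9.5" versus "50" wrong, because "9" sorts after "5"; converting to float first avoids that.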

117. By Xavier

* The bam file is now indexed twice: right after sam2bam (before picard marks duplicates), and again after duplicate removal and all other edits to the bam files.
* 'Count reads'-related functions moved to the end of the file, to match the new point in the workflow at which Count reads is run.

116. By Xavier

* Added a few comments for bam.metrics.txt and help for the whole eva pipeline.
* Fixed rmdup by adding the TMP_DIR param, so that long runs can complete successfully by using another temp dir with much more space available.
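
As an illustration of the TMP_DIR change, the sketch below builds a duplicate-removal command that points temporary files at a larger disk. It assumes that rmdup is done with Picard MarkDuplicates and REMOVE_DUPLICATES=true, which this log does not state explicitly; the jar path, file names and helper names are hypothetical. It is written in Python, whereas EVA itself is in R.

```python
import shutil


def mark_duplicates_cmd(picard_jar, in_bam, out_bam, metrics_file, tmp_dir):
    """Build a Picard MarkDuplicates command line that removes duplicates.

    TMP_DIR tells Picard where to write its temporary files, so a
    roomier disk can be used for long runs.
    """
    return [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"METRICS_FILE={metrics_file}",
        "REMOVE_DUPLICATES=true",
        f"TMP_DIR={tmp_dir}",
    ]


def free_gib(path):
    """Return the free space at path in GiB, to sanity-check a temp dir."""
    return shutil.disk_usage(path).free / 2**30
```

Checking free_gib() on the candidate temp dir before launching a long run is a cheap way to catch an undersized scratch area early.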

115. By Xavier

Improvements and refactoring mainly to the functions related to count reads

eva_analysis_functions.R:
------------------------------------
* tasks related to count reads split into two functions: one for snpEff count reads, and the other for postprocessing of the data with R
* cr.txt is now read with mmap.csv() from the mmap package:
  # mmap.csv() is meant to be the analogue of read.csv in R, with the primary difference being
  # that data is read, by column, into memory-mapped structs on disk.
** the whole function was adapted to work with objects from the mmap package.
* show and log some information about the memory consumed during this count reads R function.
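
The following Python sketch is only an analogue of the approach above. Python's standard mmap module maps the raw bytes of a file rather than typed columns as the R mmap package does, so it illustrates the general idea of scanning a large table without loading it all into memory, plus a simple memory report. The two-column layout (key, count) is an assumption, not the real layout of cr.txt, and all names are hypothetical.

```python
import mmap
import tempfile
import tracemalloc
from collections import defaultdict
from pathlib import Path


def sum_counts_by_key(path, key_col=0, count_col=1, sep=b"\t"):
    """Sum an integer column per key, reading lines from a memory-mapped file."""
    totals = defaultdict(int)
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        mm.readline()                          # skip the header line
        for line in iter(mm.readline, b""):    # b"" marks end of file
            if not line.strip():
                continue                       # ignore blank lines
            fields = line.rstrip(b"\r\n").split(sep)
            totals[fields[key_col].decode()] += int(fields[count_col])
    return dict(totals)


def demo_mmap():
    with tempfile.TemporaryDirectory() as tmp:
        table = Path(tmp) / "cr_example.txt"   # hypothetical stand-in for cr.txt
        table.write_bytes(b"gene\tcount\ngeneA\t10\ngeneB\t7\ngeneA\t5\n")
        tracemalloc.start()
        totals = sum_counts_by_key(table)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return totals, peak                    # peak is in bytes
```

Logging the peak value after each stage gives a rough picture of where memory is spent, similar in spirit to the memory information logged by the count reads function.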

Adapted the other R files to allow the previous changes to work.

114. By Xavier

eva_analysis_functions.R:
-----------------------------------
* fix for count reads in cases where the filename to process was absolute (with the full path as prefix).
* tried adding the param -i file.bed to count reads, to process only the regions of the target genes, but disabled it temporarily because of an issue with it (potential bug, already reported to the snpEff author).
* added a (commented out) alternative method to aggregate results from count reads, and to report the position at each of the 3 steps that once stopped the process, (presumably) due to lack of RAM when running several processes on the server at the same time.
* functions to fixcols of annovar output are now called only after the csv files have been grepped for the target genes (so results are much smaller, KB instead of MB, and processing time went down from many hours to less than a minute). The researcher is only interested in looking in detail at the results for the target genes anyway, so this is an easy workaround.

The other files were changed accordingly (mainly annotation.s.fixcols).
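
The reordering described above (filter for the target genes first, fix the columns afterwards) can be sketched like this. It is a Python illustration with hypothetical names: the column names Gene and Func, and the fix_columns stand-in, are assumptions rather than the real annovar layout or EVA's fixcols logic. Unlike grep, which matches anywhere in a line, this sketch matches the gene column exactly.

```python
import csv
import io


def rows_for_target_genes(csv_text, target_genes, gene_col="Gene"):
    """Return only the rows whose gene column is one of the target genes."""
    targets = set(target_genes)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[gene_col] in targets]


def fix_columns(row):
    """Stand-in for the costly per-row column cleanup (here: strip whitespace)."""
    return {key.strip(): value.strip() for key, value in row.items()}


def annotate_targets(csv_text, target_genes):
    """Filter first, then run the expensive fix on the much smaller result."""
    return [fix_columns(row) for row in rows_for_target_genes(csv_text, target_genes)]
```

Because the expensive step now sees only the target-gene rows, its cost scales with the size of the gene panel rather than with the full annotation output.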

113. By Xavier

fun.variant.annotation.summary.call.fixcolumns is now optional, to prevent big files in real-world data from taking too long to complete. Converting it to apply functions is not trivial, since the data frame has many particularities that require fine-grained treatment.

112. By Xavier

New option added to split the variant calling work by chromosome: p_bychr
# Split the processing by chromosomes (variant calling only, as of April 23, 2013).
# Watch out: some QUAL values might be recorded as lower than the value indicated at the bam file
# for some mysterious reason. See the comments inside function var.calling
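
Splitting the work by chromosome follows a simple group, process, merge pattern, sketched below in Python. The record format (tuples starting with the chromosome name) and the caller callback are hypothetical; in EVA the per-chromosome step is the actual variant calling, which this sketch does not reproduce.

```python
from collections import defaultdict


def split_by_chromosome(records):
    """Group records by chromosome, keeping first-seen chromosome order."""
    by_chr = defaultdict(list)
    for rec in records:
        by_chr[rec[0]].append(rec)     # rec[0] is the chromosome name
    return dict(by_chr)


def call_by_chromosome(records, caller):
    """Run caller(chrom, chunk) once per chromosome and merge the results."""
    results = []
    for chrom, chunk in split_by_chromosome(records).items():
        results.extend(caller(chrom, chunk))
    return results
```

Each chunk is independent, so the per-chromosome calls could also be run in parallel or restarted individually if one of them fails.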

111. By Xavier

Important changes:

FIX: duplicate removal works, and variant calling is working again. The culprit was the "fixmates" function in samtools, which in fact seems unnecessary (it was doing nothing meaningful) and broke the variant calling process, producing zero variants in the vcf files.

MOD: each step that processes .bam files after the bam creation now creates (or updates) the hardlink to sam.sorted.edited.bam from the source edited bam file at that step.
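
Creating or updating a hard link so that one fixed name always points at the latest edited bam file might look like the sketch below. It is a Python illustration under stated assumptions: the helper update_hardlink and the example file names are hypothetical, and both paths must live on the same filesystem, as hard links require.

```python
import os
import tempfile
from pathlib import Path


def update_hardlink(source, link_name):
    """Make link_name a hard link to source, replacing any previous link."""
    tmp = f"{link_name}.tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)                 # clear a leftover from an earlier run
    os.link(source, tmp)               # new hard link under a temporary name
    os.replace(tmp, link_name)         # swap it into place
    if os.path.lexists(tmp):
        os.remove(tmp)                 # rename is a no-op if both were already the same file


def demo_hardlink():
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        step1 = tmp / "sample.rmdup.bam"          # hypothetical step outputs
        step2 = tmp / "sample.realigned.bam"
        step1.write_bytes(b"after step 1")
        step2.write_bytes(b"after step 2")
        link = tmp / "sample.sam.sorted.edited.bam"
        update_hardlink(step1, link)
        first = link.read_bytes()
        update_hardlink(step2, link)
        return first, link.read_bytes(), link.samefile(step2)
```

Later steps can then always open the same file name without knowing which step produced the current content.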

110. By Xavier

typo: pìcard -> picard

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)
This branch contains Public information 
Everyone can see this information.
