My Way of Life: parameters in pace

Dynamic Programming scores and their default values:
--------------------------------------------------------

(1) match                                   2    (match)
(2) mismatch                               -5    (mismatch)
(3) gap continuation                       -1    (gap)
(4) gap opening                            -6    (hgap)
(5) Score for alignment with base 'N'      -5    (AlignmentWithN)
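
As a rough illustration of how the five scores combine, the sketch below scores a pair of strings that have already been aligned. The function name score_alignment is hypothetical, and two conventions are assumptions made for this example only: a gap of length k costs hgap + k * gap, and any column containing an 'N' (and no gap) receives the AlignmentWithN score. PaCE's own conventions may differ.

```python
# Default scores from the list above.
MATCH = 2
MISMATCH = -5
GAP = -1                # gap continuation
HGAP = -6               # gap opening
ALIGNMENT_WITH_N = -5


def score_alignment(a: str, b: str) -> int:
    """Score two equal-length aligned strings, where '-' marks a gap.

    Assumption: a gap run of length k costs HGAP + k * GAP.
    Assumption: a column with 'N' and no gap scores ALIGNMENT_WITH_N.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_in = None  # which string the current gap run is in, if any
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if gap_in != side:
                score += HGAP   # a new gap run starts here
            score += GAP        # every gap column pays the continuation cost
            gap_in = side
            continue
        gap_in = None
        if x == "N" or y == "N":
            score += ALIGNMENT_WITH_N
        elif x == y:
            score += MATCH
        else:
            score += MISMATCH
    return score
```

Under these assumptions, four matching bases score 8, and replacing one of them with a single-base gap brings the score down to -1.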

Load balancing and Work related parameters:
-----------------------------------------------

(6) Fixed window size for bucketing 11 (window)
If the data size is <= 10,000 ESTs, a window size of 10 is recommended.
Constraint: window <= MinLen and window <= 11

Clustering parameters (Quality control):
----------------------------------------

(7)
(7.1)
MinLen (default 30)
Signifies the minimum length cutoff of a maximal match between any pair of
sequences to be considered for alignment computation. This is a necessary but not
sufficient condition for a pair of sequences to cause merging of their two clusters.

(7.2)
MaxStringsInABucket (default 100000)
Ignores exact matches of length "window" which occur in >= MaxStringsInABucket
distinct input sequences.
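
The window constraint and the bucket cutoff can be checked before a run. The sketch below is only an illustration: the helper names check_window, suggested_window and bucket_is_ignored are hypothetical and not part of PaCE, and falling back to the default of 11 for larger inputs is an assumption.

```python
MAX_WINDOW = 11  # upper bound stated in the constraint above


def check_window(window: int, min_len: int = 30) -> None:
    """Raise ValueError if window violates window <= MinLen and window <= 11."""
    if window > min_len:
        raise ValueError(f"window ({window}) must be <= MinLen ({min_len})")
    if window > MAX_WINDOW:
        raise ValueError(f"window ({window}) must be <= {MAX_WINDOW}")


def suggested_window(num_ests: int) -> int:
    """Return 10 for inputs of <= 10,000 ESTs, else the default 11 (assumed)."""
    return 10 if num_ests <= 10_000 else 11


def bucket_is_ignored(distinct_sequences: int,
                      max_strings_in_a_bucket: int = 100_000) -> bool:
    """True if an exact match of length `window` is too common to be used."""
    return distinct_sequences >= max_strings_in_a_bucket
```

For example, check_window(11, 30) passes silently, while check_window(12, 30) raises ValueError.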

(8)
Flag for Gene Homology and Transcript Homology (TranscriptsTogether):
1 means Gene Homology
0 means Transcript Homology
(PS: No other values are valid.)

(9)
Clustering criteria for accepting dynamic programming alignment results:
-------------------------------------------------------------------------

(9.1) Parameters computed:

(a)
EndToEndScoreRatioThreshold (default 15%)

Global alignment Score Ratio
    |Global alignment Obtained Score - Global alignment Ideal Score|
 = ---------------------------------------------------------------- X 100
                  Global alignment Ideal Score

(9.2)
EndtoEndAlignLenThreshold (default 100 bp)

Global alignment length = length of aligning region
(w.r.t the minimum of the number of
bases participating from both sequences in the alignment)

(9.3)
MaxScoreRatioThreshold (default 5%)

Local alignment Score Ratio
|Local alignment Obtained Score - Local alignment Ideal Score|
= ---------------------------------------------------------------- X 100
Local alignment Ideal Score

(9.4)
TranscriptCoverageThreshold (default 40%)

Local alignment length Coverage
= (Local alignment length / minimum of the lengths of the two sequences) X 100
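
The score ratio and coverage formulas can be written out directly. This is a minimal sketch with hypothetical function names; the obtained and ideal scores are taken as inputs, since how PaCE derives the ideal score is not described here.

```python
def score_ratio(obtained: float, ideal: float) -> float:
    """Percentage deviation of the obtained score from the ideal score.

    The same formula is used for the global and the local alignment.
    """
    if ideal == 0:
        raise ValueError("ideal score must be non-zero")
    return 100 * abs(obtained - ideal) / ideal


def local_coverage(local_align_len: int, len_a: int, len_b: int) -> float:
    """Local alignment length as a percentage of the shorter sequence."""
    return 100 * local_align_len / min(len_a, len_b)
```

For instance, an obtained score of 170 against an ideal score of 200 gives a ratio of 15.0, and a local alignment of 200 bases between sequences of lengths 500 and 800 gives a coverage of 40.0.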

Condition for merging two clusters based on evidence from an aligned pair of sequences:

Condition#1 := ( (Global alignment score ratio <= EndToEndScoreRatioThreshold) AND (Global alignment length >= EndtoEndAlignLenThreshold) )
Condition#2 := ( (Local alignment score ratio <= MaxScoreRatioThreshold) AND (Local alignment length Coverage >= TranscriptCoverageThreshold) )

Gene Homology:
A pair of ESTs will be put in one cluster if:
Condition#1 OR Condition#2 (or both) is satisfied.

Transcript Homology:
A pair of ESTs will be put in one cluster if:
(Condition#1) is satisfied
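
The merge rule can be sketched as a small predicate. The names Thresholds and should_merge are hypothetical, and the defaults are the values listed for each threshold parameter (15%, 100 bp, 5%, 40%); this is an illustration of the rule as described, not PaCE's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    end_to_end_score_ratio: float = 15.0   # EndToEndScoreRatioThreshold (%)
    end_to_end_align_len: int = 100        # EndtoEndAlignLenThreshold (bp)
    max_score_ratio: float = 5.0           # MaxScoreRatioThreshold (%)
    transcript_coverage: float = 40.0      # TranscriptCoverageThreshold (%)


def should_merge(global_ratio: float, global_len: int,
                 local_ratio: float, local_cov: float,
                 transcripts_together: int,
                 t: Thresholds = Thresholds()) -> bool:
    """Decide whether an aligned pair puts its two ESTs in one cluster."""
    if transcripts_together not in (0, 1):
        raise ValueError("TranscriptsTogether must be 0 or 1")
    cond1 = (global_ratio <= t.end_to_end_score_ratio
             and global_len >= t.end_to_end_align_len)
    cond2 = (local_ratio <= t.max_score_ratio
             and local_cov >= t.transcript_coverage)
    if transcripts_together == 1:   # Gene Homology
        return cond1 or cond2
    return cond1                    # Transcript Homology
```

A pair that satisfies only Condition#2 is therefore merged under Gene Homology but not under Transcript Homology.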

(10)
ClonePairsFile (default None)
Clone Mates/Pairs Information:
------------------------------

Clone Mates or Clone Pairs information can be specified in a file and can be used to improve quality of clustering (esp., cases where ESTs do not show complete over their corresponding transcript). Give the name of the file containing Clone Mate/Pair information agains$

(11)
Reporting features:
------------------

(11.1)
ReportSplicedCandidates (default 0)
If 1:
Reports all pairs of sequences generated that pass the local alignment test (Condition#2)
but FAIL the global alignment test (Condition#1). This can be used as a
set of potential pairs of sequences that flag an alternative splicing or unspliced
intron event.

(11.2)
ReportMaximalPairs (default 0)
If 1:
Reports all pairs of sequences that were generated by PaCE. The pairs are the ones
which have at least one maximal common substring of length >= MinLen.
Warning: The number of pairs reported can be quadratic in the number of input sequences, so use this only for analysis purposes.

(11.3)
ReportMaximalSubstrings (default 0)
If 1:
Reports all maximal common substrings (length >= MinLen) generated by PaCE.
Warning: The output size is linear, but for large inputs it can still be quite large, so use this only for analysis purposes.

(11.4)
ReportAcceptedPairs (default 0)
If 1:
Reports all pairs of sequences that led to merging of clusters. The number of such pairs
is linear in the number of sequences.

(11.5)
OutputLargeMerges (default 0)

LargeClusterThreshold (default 500)

If 1:
Reports a pair of sequences leading to a cluster merge, if the individual
sizes of the two clusters of these two sequences (at the time of merge)
are both >= LargeClusterThreshold. The reports go into a file called:
large_merges.*
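
The reporting condition amounts to a simple size check on both clusters at merge time. A minimal sketch, with a hypothetical function name:

```python
def is_large_merge(size_a: int, size_b: int,
                   large_cluster_threshold: int = 500) -> bool:
    """True if both clusters being merged are at least the threshold size."""
    return (size_a >= large_cluster_threshold
            and size_b >= large_cluster_threshold)
```

With the default threshold, merging clusters of sizes 500 and 1200 would be reported, while merging clusters of sizes 499 and 1200 would not.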

(11.6)
ReportGeneratedPairs (default 0)

If 1:
Report all promising pairs generated. This may take up a lot of disk space
because the number of such pairs in the worst case can be quadratic
in the input number of sequences.

(11.7)
ReportPairsCountUnit (default 1000)

The basic unit of display in the final report on the number of promising pairs
generated, aligned, and accepted.

(11.8)
DumpClustersMidway (default 0)

If 1:
Output intermediate sets of clusters during the course of execution.
Handy if the run is expected to be a long one.

(12)
OutputFolder (default .)
All PaCE output files will be written into this folder.

(13)
Miscellaneous:
--------------

(13.1)
MPI_Block_Sends (default 1)

If 1:
Uses MPI_Ssend to communicate from slaves to master during the
alignment phase. This is expected to be about 3-4 times slower than
setting MPI_Block_Sends to 0 (i.e., just MPI_Isend and MPI_Wait).
The feature was incorporated to ensure that no message is lost when a
large number of processors is used. Turning the flag on is recommended
if >= 512 processors are used.

(13.2)
Keep_Mbuf_Full (default 0)

Deprecated.


Wednesday, October 27, 2010
