My Way of Life: March 2011

ESPRIT :: Installation :: Cluster
Obtain the source code

On *nix, Steps to install ESPRIT
$ unzip ESPRIT_distribution.zip
$ cd ESPRIT_distribution

Read, esprit_user_guide.pdf and README.txt
$ cd source
$ vim Makefile
Choose the platform by uncomment/comment

To make the package
$ make esprit_cc

Its always better to
$ make clean
$ make esprit_cc

Precaution:
- make sure that the fasta file has header in one and the sequence in one line
- if the sequence is in multiple lines, convert the file to contain just one line of sequence

Pseudocode is followed here using shell scripting and clusterjobmanager
Copy the sequence file here, If you have more than one than group them.
$ cp /path/to/sequence.fas .

To run preproc
$ /path/to/ESPRIT_distribution/source/preproc -f sequence.fas
160794 Seqs Match Primer
160794 Seqs Valid Len

31072 Seqs After Process
1.63 secs in Purging Strings.

flag:
-f this prevents the program from trimming.

Files created:
sequence_Clean.fas
sequence_Clean.frq

To check
$ awk -F' ' '{ s+=$2 } END { print s }' sequence_Clean.frq
160794
$ grep -c '>' sequence.fas
160794
- Make sure that these numbers are same.

To run kmerdist_par
$ cat submit_kmer_jobs.sh
for i in $(seq 1 10)
do
for j in $(seq $i 10)
do
job="/path/to/ESPRIT_distribution/source/kmerdist_par sequence_Clean.fas 10 $i $j\n ";
RANDOM=10
num=$RANDOM
echo -e $job > kmer_job_$num.clusterJob
clusterJobSubmission < kmer_job_$num.clusterJob done done - where clusterJobSubmission is your cluster job submission manager - the extension .clusterJob can be replaced with the extension required - variable job can include other details if required. $ cat jobs.clusterJob ## .. sh submit_kmer_jobs.sh $ jobsubmit < jobs. clusterJob - this will submit the job. Output: sequence_Clean_[*]_[*].dist - make sure that numbers are 1_[1-10], 2_[2-10], 3_[3-10], 4_[4-10], 5_[5-10], 6_[6-10], 7_[7-10], 8_[8-10], 9_[9-10], 10_10 Merge all the .dist files $ cat sequence_Clean_*.dist >> kmer.dist

Split the kmer files into 100 files
$ /path/to/ESPRIT_distribution/source/splitdist -s 100 kmer.dist
Counting Total Records....
71249223 Records Found, Splitting...

Output:
kmer.dist_[0-99]

Submit parallel jobs for needle_dist
$ cat submit_needle_job.sh
for i in $(seq 0 99)
do
job="/path/to/ESPRIT_distribution/source/needledist sequence_Clean.fas kmer.dist\_$i needle.dist\_$i\n ";
RANDOM=10
num=$RANDOM
echo -e $job > needle_job_$num.clusterJob
clusterJobSubmission < needle_job_$num.clusterJob done Output needle.dist_[0-99] Group all the needle.dist files $ cat needle.dist_* >> sequence.ndist

To run hcluster
$ /path/to/ESPRIT_distribution/source/hcluster -t 15000 sequence.ndist sequence_Clean.frq
- flag -t is used to increase the size of the linked table, default is 10000

Output
sequence.ndist_sort
sequence.OTU
sequence.Outliers
sequence.Rarefaction
sequence.Cluster
sequence.Cluster_List
sequence.ACE
sequence.CHAO1

My Way of Life

Friday, March 25, 2011

mummer

Saturday, March 19, 2011

esprit