Sunday, December 19, 2010

useful links

http://ycombinator.com/
http://www.reddit.com/
http://www.catb.org/~esr/faqs/hacker-howto.html
http://slashdot.org/
http://www.crunchgear.com/
http://www.wired.com/
http://www.engadget.com/
http://lifehacker.com/
http://techcrunch.com/
http://download.oracle.com/javase/6/docs/api/index.html
http://www.javabeginner.com/

all-time favorite: TechRepublic

Saturday, December 18, 2010

Wednesday, December 1, 2010

running batch jobs on the Talon cluster ----> RepeatMasker

#!/bin/bash
# LSF only reads #BSUB directives from the top of the script (before the first
# command), so they have to come first. Submit with: bsub < this_script.sh
#BSUB -J "jobArray[1-560]"
#BSUB -n 1
#BSUB -M 2
#BSUB -e mothur.%J.err
# -e: error file; %J = job number; $LSB_JOBINDEX = index of this array element

DIR=/users/prm0080/RepeatMasker/RepeatMasker/HMP_Bacteria_ContigCombine2

# Build the list of input files; start at 1 so the indices match LSB_JOBINDEX (1-560)
J=1
for i in $DIR/*
do
  files[$J]=$i
  let J=$J+1
done

# Run the application on the file picked out by this job-array index
/users/prm0080/RepeatMasker/RepeatMasker/RepeatMasker ${files[$LSB_JOBINDEX]} -dir $DIR/suresh/

Monday, November 22, 2010

java stuff...

java blocking queue...
producer and consumer..

java threads...java concurrent
java.util.concurrent.atomic

java cubbyhole
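A minimal sketch of the producer/consumer pattern those notes point at, using java.util.concurrent.BlockingQueue (the class name, queue size, and poison-pill value here are just illustrative):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {

    public static void main(String[] args) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(10);

        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 100; i++) {
                        queue.put(i);      // blocks while the queue is full
                    }
                    queue.put(-1);         // poison pill to stop the consumer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        int item = queue.take();   // blocks while the queue is empty
                        if (item == -1) {
                            break;
                        }
                        System.out.println("consumed " + item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        producer.start();
        consumer.start();
    }
}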

java collections
java pattern matching
java map
hash map

java hotspot

scala programming

Wednesday, November 17, 2010

Monday, November 15, 2010

nlp project

Polarity Detection of movie reviews
Introduction :-
Today, very large amounts of information are available in on-line documents. As part of the effort to better organize this information for users, researchers have been actively investigating the problem of automatic text categorization. In polarity detection of movie reviews, the basic problem is to classify a movie review as positive or negative. A labeled training data set is provided, and we have to develop the best technique to classify reviews into their respective classes. This would be very useful for finding the defects in a product: if a customer has decided to purchase a product and wants to know only its defects, rather than the parts which praise it, this classification process would be very
helpful. Similarly, there are many other scenarios, such as ad generation, which depend on polarity detection.

Proposed approach :-
Data Set :- The data set used for this project is 1400 movie reviews classified by their polarity.
Preprocessing :- The given data may contain irrelevant and noisy content, which can be eliminated by applying a few techniques. Applying these techniques should increase the accuracy.
Part-of-Speech Tagging: Tag each word in the data set with its part of speech, using a standard such as the Penn Treebank tag set or some other method. From the different combinations of part-of-speech tags, we keep those more related to describing sentiment for further processing; as a general observation, adjectives are good features for polarity detection.
Stop words: Also known as noise words, these are words of little significance in the context of categorization and sentiment analysis, e.g. articles, conjunctions, etc. We need to remove all the stop words from our data set and produce a new data set.

Stemming: Stemming is the process of reducing a word to its root or base form by removing inflectional endings from English words (e.g. the Porter stemmer algorithm). There are also a few other preprocessing techniques, such as quote weighting, HTML-tag filtering and non-English review filtering, which can be applied depending on the data present. Previous experimental evaluation found that using the frequency of a particular feature rather than just its presence did not have much effect for polarity detection, so, although I have not decided yet, I am thinking of considering only the presence of a particular feature.
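As a rough illustration of the stop-word step (not the actual preprocessing code for this project; the whitespace tokenization and the tiny stop-word list are assumptions):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StopWordFilter {

    // Illustrative stop-word list only; a real run would use a fuller list.
    private static final Set<String> STOP_WORDS = new HashSet<String>(
            Arrays.asList("a", "an", "the", "and", "or", "of", "to", "is", "in"));

    public static String filter(String text) {
        StringBuilder kept = new StringBuilder();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!STOP_WORDS.contains(token)) {
                kept.append(token).append(' ');
            }
        }
        return kept.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(filter("The movie is a masterpiece and the acting is great"));
    }
}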

I am thinking of using the following lexical features to decide the polarity:
Combination of a word and its part-of-speech tag
Bigram words

Unigram words

considering the words before an adjective as a collocation... because some words are preceded by a negation, e.g. "agree" vs. "not agree"....

part-of-speech tag
I am still working on using some other lexical features. I will report them as the project progresses.
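A minimal sketch of extracting presence-based unigram and bigram features, assuming simple whitespace tokenization; handling negation (e.g. marking the tokens that follow "not") could be layered on top of this:

import java.util.HashSet;
import java.util.Set;

public class LexicalFeatures {

    // Return the set of unigram and bigram presence features for one review.
    public static Set<String> extract(String review) {
        String[] tokens = review.toLowerCase().split("\\s+");
        Set<String> features = new HashSet<String>();
        for (int i = 0; i < tokens.length; i++) {
            features.add("uni=" + tokens[i]);                          // unigram presence
            if (i + 1 < tokens.length) {
                features.add("bi=" + tokens[i] + "_" + tokens[i + 1]); // bigram presence
            }
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(extract("not a good movie"));
    }
}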

Naive bayes :-
One approach to text classification is to assign to a given document
d the class c* = argmax_c P(c | d). We derive the Naive Bayes (NB) classifier by first
observing that, by Bayes' rule,
P(c | d) = P(c) P(d | c) / P(d)
where P(d) plays no role in selecting c*. To estimate the term P(d | c), Naive Bayes
decomposes it by assuming the features f_i are conditionally independent given the class.
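Spelling that decomposition out (in LaTeX notation; here n_i(d) is the count of feature f_i in document d, or a 0/1 presence indicator if only feature presence is used):

P(c \mid d) \;\propto\; P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}
\qquad\Longrightarrow\qquad
c^{*} = \arg\max_{c}\; P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}

The parameters P(c) and P(f_i | c) are then estimated from the training reviews (with add-one smoothing, for example).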

pl project...

server sockets in java ---> ssl sockets....
file handling
threads
runtime java
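A minimal sketch of an SSL server socket in Java, since that is where the pl project notes seem to be headed; it assumes a keystore is supplied via the standard javax.net.ssl.keyStore / javax.net.ssl.keyStorePassword system properties and simply logs one line per connection:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import javax.net.ssl.SSLServerSocket;
import javax.net.ssl.SSLServerSocketFactory;
import javax.net.ssl.SSLSocket;

public class SslEchoServer {

    public static void main(String[] args) throws Exception {
        SSLServerSocketFactory factory =
                (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
        SSLServerSocket server = (SSLServerSocket) factory.createServerSocket(8443);
        System.out.println("listening on 8443");
        while (true) {
            SSLSocket client = (SSLSocket) server.accept();   // TLS handshake happens here
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            System.out.println("received: " + in.readLine()); // read one line and log it
            client.close();
        }
    }
}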

cloud apps

live mesh
drop box

Saturday, November 13, 2010

learn programming by taking on challenges:

Ruby Quiz

Project Euler

The Pragmatic Programmer --> book

duby-talk mailing list

PerlMonks

Hadoop file system..

Hadoop is a platform for performing distributed computing...


Hadoop is currently aimed at “big data” problems (say, processing Census Bureau data). The nice thing about it is that a Hadoop cluster scales out easily, and there are a number of providers who will let you add and remove instances from a Hadoop cluster as your needs change to save you money. It is the kind of system that lends itself perfectly to cloud computing, although you could definitely have a Hadoop cluster in-house.

While the focus is on number crunching, I think that Hadoop can easily be used in any situation where a massively parallelized architecture is needed.
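To make the programming model concrete, here is the standard word-count example sketched against the Hadoop MapReduce Java API (the Job constructor follows the 0.20-era API that was current at the time; the input and output paths are taken from the command line, and this is a sketch rather than a tested build):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emit (word, 1) for every token in the input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");   // 0.20-style Job constructor
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}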

Wednesday, November 10, 2010

error in creating session..

org.hibernate.HibernateException: Could not parse configuration: /hibernate.cfg.xml
at org.hibernate.cfg.Configuration.doConfigure(Configuration.java:1494)
at org.hibernate.cfg.Configuration.configure(Configuration.java:1428)
at org.hibernate.cfg.Configuration.configure(Configuration.java:1414)
at com.mycompany.app.db.HibernateSessionFactory.(HibernateSessionFactory.java:30)
at com.mycompany.entity.Programs.getProgramsWaitingForSubmission2(Programs.java:33)
at com.mycompany.app.job.JobManager.runJobSubmissions(JobManager.java:744)
at com.mycompany.app.clients.JobClient2.runJobs(JobClient2.java:102)
at com.mycompany.app.clients.JobClient2.access$000(JobClient2.java:18)
at com.mycompany.app.clients.JobClient2$4.run(JobClient2.java:87)
Caused by: org.dom4j.DocumentException: Error on line 2 of document : The processing instruction target matching "[xX][mM][lL]" is not allowed. Nested exception: The processing instruction target matching "[xX][mM][lL]" is not allowed.
at org.dom4j.io.SAXReader.read(SAXReader.java:482)
at org.hibernate.cfg.Configuration.doConfigure(Configuration.java:1484)
... 8 more
%%%% Error Creating SessionFactory %%%%
org.hibernate.HibernateException: /var/www/site31/swarmapp/src/main/resources/hibernate.cfg.xml not found
at org.hibernate.util.ConfigHelper.getResourceAsStream(ConfigHelper.java:147)
at org.hibernate.cfg.Configuration.getConfigurationInputStream(Configuration.java:1405)
at org.hibernate.cfg.Configuration.configure(Configuration.java:1427)
at com.mycompany.app.db.HibernateSessionFactory.rebuildSessionFactory(HibernateSessionFactory.java:94)
at com.mycompany.app.db.HibernateSessionFactory.getSession(HibernateSessionFactory.java:72)
at com.mycompany.entity.Programs.getProgramsWaitingForSubmission2(Programs.java:33)
at com.mycompany.app.job.JobManager.runJobSubmissions(JobManager.java:744)
at com.mycompany.app.clients.JobClient2.runJobs(JobClient2.java:102)
at com.mycompany.app.clients.JobClient2.access$000(JobClient2.java:18)
at com.mycompany.app.clients.JobClient2$4.run(JobClient2.java:87)
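My reading of these two stack traces (an assumption on my part, not something from the Hibernate docs): the dom4j complaint about a processing instruction matching "[xX][mM][lL]" on line 2 usually means something (a blank line, stray whitespace, or a byte-order mark) sits before the <?xml ...?> declaration in hibernate.cfg.xml, so the declaration is no longer the very first thing in the file; the second trace suggests configure() was handed an absolute filesystem path, while it expects a classpath resource (e.g. a file under src/main/resources in a Maven build). A minimal sketch of building the SessionFactory from a classpath resource (the class and resource names just mirror the ones in the trace):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateSessionFactory {

    private static SessionFactory sessionFactory;

    // Load /hibernate.cfg.xml as a classpath resource and build the factory once.
    public static synchronized SessionFactory getSessionFactory() {
        if (sessionFactory == null) {
            Configuration cfg = new Configuration();
            cfg.configure("/hibernate.cfg.xml");   // resolved against the classpath
            sessionFactory = cfg.buildSessionFactory();
        }
        return sessionFactory;
    }
}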

no path for sending files into TeraGrid..

condorjobmultisites.java --->swarmapp/src/main/java/com/myco/app/job

there is no path for either input or output files....

left with one dependency..

installed all the Java dependencies..

left with the Swarm jar file

3) Swarm:Swarm:jar:0.9

Try downloading the file manually from the project website.

Then, install it using the command:
mvn install:install-file -DgroupId=Swarm -DartifactId=Swarm -Dversion=0.9 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=Swarm -DartifactId=Swarm -Dversion=0.9 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) swarmapp:swarmapp:jar:0.0.1-SNAPSHOT
2) Swarm:Swarm:jar:0.9

Tuesday, November 9, 2010

got stuck installing BioJava..

I installed it using the README file for the swarmapp, but actually it needs to be installed using Maven...

At last the following command worked:

mvn install:install-file -DgroupId=biojava -DartifactId=biojava -Dversion=1.6.1 -Dpackaging=jar -Dfile=/path/to/file

Monday, November 8, 2010

Work it out!!!!!!

Missing:
----------
1) biojava:biojava:jar:1.6.1

Try downloading the file manually from the project website.

Then, install it using the command:
mvn install:install-file -DgroupId=biojava -DartifactId=biojava -Dversion=1.6.1 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=biojava -DartifactId=biojava -Dversion=1.6.1 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) swarmapp:swarmapp:jar:0.0.1-SNAPSHOT
2) biojava:biojava:jar:1.6.1

2) javax.sql:jdbc-stdext:jar:2.0

Try downloading the file manually from:
http://java.sun.com/products/jdbc/download.html

Then, install it using the command:
mvn install:install-file -DgroupId=javax.sql -DartifactId=jdbc-stdext -Dversion=2.0 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=javax.sql -DartifactId=jdbc-stdext -Dversion=2.0 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) swarmapp:swarmapp:jar:0.0.1-SNAPSHOT
2) javax.sql:jdbc-stdext:jar:2.0

3) Swarm:Swarm:jar:0.9

Try downloading the file manually from the project website.

Then, install it using the command:
mvn install:install-file -DgroupId=Swarm -DartifactId=Swarm -Dversion=0.9 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=Swarm -DartifactId=Swarm -Dversion=0.9 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) swarmapp:swarmapp:jar:0.0.1-SNAPSHOT
2) Swarm:Swarm:jar:0.9

4) javax.security:jaas:jar:1.0.01

Try downloading the file manually from:
http://java.sun.com/products/jaas/index-10.html

Then, install it using the command:
mvn install:install-file -DgroupId=javax.security -DartifactId=jaas -Dversion=1.0.01 -Dpackaging=jar -Dfile=/path/to/file

Alternatively, if you host your own repository you can deploy the file there:
mvn deploy:deploy-file -DgroupId=javax.security -DartifactId=jaas -Dversion=1.0.01 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

Path to dependency:
1) swarmapp:swarmapp:jar:0.0.1-SNAPSHOT
2) javax.security:jaas:jar:1.0.01

----------
4 required artifacts are missing.

for artifact:
swarmapp:swarmapp:jar:0.0.1-SNAPSHOT

from the specified remote repositories:
central (http://repo1.maven.org/maven2)



[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 5 seconds
[INFO] Finished at: Mon Nov 08 10:01:07 CST 2010
[INFO] Final Memory: 11M/26M
[INFO] ------------------------------------------------------------------------

Thursday, November 4, 2010

I had forgotten to install the swarmapp.. which I got as source code.....
this is done by using Maven 2:
>>mvn install   (run inside the swarmapp directory)

Monday, November 1, 2010

Friday, October 29, 2010

running PaCE, new method

Step 1. generate a data file with n sequences (take the first n*2 lines, assuming each FASTA record is two lines: header + sequence)

head -n $((n*2)) original.fsa > small_sample.fsa

Step 2. run preprocess

./preprocessPaCE 5k.fsa 1

Step 3. run GItitle.pl

./GItitle.pl 5k.fsa.PaCE > 5k.GI.fsa

Step 4. run PaCE

#!/bin/csh
#@ job_type = parallel
#@ class = LONG
#@ account_no = NONE
#@ node = 2
#@ tasks_per_node = 4
#@ checkpoint = no
#@ wall_clock_limit = 05:00:00
#@ error = $(Executable).$(Cluster).err
#@ output = $(Executable).$(Cluster).out
#@ environment = COPY_ALL
#@ queue

llmachinelist
mpirun -v -np 8 -machinefile /tmp/machinelist.$LOADL_STEP_ID ./PaCE_v9 \
    /N/gpfs/cap3/leesangm/HumanMRNA/split1mil/5k.GI.PaCE 5000 \
    ./Phase.cfg



llsubmit submitTest.sh


Step 5. split PaCE output

./PaCEclusterFasta.pl test.fsa estClust.5000.3.PaCE


#####
Notes


1. PaCE: How to run

mpirun -v -np 8 ./PaCE /N/gpfs/cap3/leesangm/HumanMRNA/example5K/5k.GI.fsa 5000 ./PaCE.cfg

Wednesday, October 27, 2010

installing PaCE

Run the "make" command from the shell.

(For system specific MPICH and C compilers modify the MakeFile appropriately.)
After successful build, copy the executables into the folder:

cp PreprocessPaCE.pl PaCE-pipeline/PaCE-clustering
cp PaCE PaCE-pipeline/PaCE-clustering

Checklist for a successful build:
- Executables created: PaCE
- No fatal errors or "serious" warnings flagged by the compiler (minor warnings are acceptable)
- The Makefile uses the O3 optimization level. If this high level is not
supported by the C compiler being used, change it to a lower level
like O2 or O1 or O0 as appropriate.

parameters in pace

Dynamic Programming scores and their default values:
--------------------------------------------------------

(1) match 2 (match)
(2) mismatch -5 (mismatch)
(3) gap continuation -1 (gap)
(4) gap opening -6 (hgap)
(5) Score for alignment
with base 'N' -5 (AlignmentWithN)

Load balancing and Work related parameters:
-----------------------------------------------

(6) Fixed window size for bucketing 11 (window)
If the data size <=10,000 ESTs then a window size of 10 is recommended.
Constraint: window <= MinLen and window <= 11

Clustering parameters (Quality control):
----------------------------------------

(7)
(7.1) MinLen (default 30)
Signifies the minimum length cutoff of a maximal match between any pair of
sequences to be considered for alignment computation. Necessary but not
sufficient condition for a pair of sequences to cause merging of their two clusters.

(7.2) MaxStringsInABucket (default 100000)
Ignores exact matches of length "window" which occur in >= MaxStringsInABucket
number of distinct input sequences.

(8)
Flag for Gene Homology and Transcript Homology (TranscriptsTogether):
1 means Gene Homology
0 means Transcript Homology
(PS: No other values are valid.)

(9)
Clustering criteria for accepting dynamic programming alignment results:
-------------------------------------------------------------------------

(9.1) Parameters computed:

(a)
EndToEndScoreRatioThreshold (default 15%):
|Global alignment Obtained Score - Global alignment Ideal Score|
---------------------------------------------------------------- X 100
Global alignment Ideal Score

(9.2)
EndtoEndAlignLenThreshold (default 100 bp)

Global alignment length = length of aligning region
(w.r.t the minimum of the number of
bases participating from both sequences in the alignment)


(9.3)
MaxScoreRatioThreshold (default 5%)

Local alignment Score Ratio
|Local alignment Obtained Score - Local alignment Ideal Score|
= ---------------------------------------------------------------- X 100
Local alignment Ideal Score

(9.4)
TranscriptCoverageThreshold (default 40%)

Local alignment length Coverage
= (Local alignment length / minimum of the lengths of the two sequences) X 100


Condition for merging two clusters based on evidence from an aligned pair of sequences:

Condition#1:= ( (Global alignment score ratio <=25%) AND (Global alignment length>=100) )
Condition#2:= ( (Local alignment score ratio <=5%) AND (Local alignment length Coverage >= 40%) )

Gene Homology:
A pair of ESTs will be put in one cluster if:
either (Condition#1 OR Condition#2 OR both) is/are satisfied.

Transcript Homology:
A pair of ESTs will be put in one cluster if:
(Condition#1) is satisfied


(10)
ClonePairsFile None
Clone Mates/Pairs Information:
------------------------------

Clone Mates or Clone Pairs information can be specified in a file and can be used to improve the quality of clustering (esp. cases where ESTs do not show complete coverage over their corresponding transcript). Give the name of the file containing Clone Mate/Pair information against this parameter.

(11)
Reporting features:
------------------

(11.1)
ReportSplicedCandidates (default 0)
If 1:
Reports all pairs of sequences generated that pass the local alignment test (Condition#2)
but FAIL the global alignment test (Condition#1). This can be used as a
set of potential pairs of sequences that flag an alternative splicing or unspliced
intron event.


(11.2)
ReportMaximalPairs (default 0)
If 1:
Reports all pairs of sequences that were generated by PaCE. The pairs are the ones
which have at least one maximal common substring of length >= MinLen.
Warning: The output is quadratic (#pairs) and so use it only for analysis purposes.

(11.3)
ReportMaximalSubstrings (default 0)
If 1:
Reports all maximal common substrings (length >= MinLen) generated by PaCE.
Warning: The output is linear but for large input size can be quite high. So use it only for analysis purposes.

(11.4)
ReportAcceptedPairs (default 0)
If 1:
Reports all pairs of sequences that led to merging of clusters. The number of such pairs
is linear in the number of sequences.

(11.5)
OutputLargeMerges (default 0)

LargeClusterThreshold (default 500)

If 1:
Reports a pair of sequences leading to a cluster merge, if the individual
sizes of the two clusters of these two sequences (at the time of merge)
are both >= LargeClusterThreshold. The reports go into a file called:
large_merges.*

(11.6)
ReportGeneratedPairs (default 0)

If 1:
Report all promising pairs generated. This may take up a lot of disk space
because the number of such pairs in the worst case can be quadratic
in the input number of sequences.

(11.7)
ReportPairsCountUnit (default 1000)

The basic unit of display in the final report on the number of promising pairs
generated, aligned, and accepted.

(11.8)
DumpClustersMidway (default 0)

If 1:
Output intermediate sets of clusters during the course of execution.
Handy, if the run is expected to be a long one.

(12)
OutputFolder (default .)
All PaCE output files will be written into this folder.


(13)
Miscellaneous:
--------------

(13.1)
MPI_Block_Sends (default 1)

If 1:
Uses MPI_Ssend to communicate from slaves to master during the
alignment phase. Is expected to be about 3-4 times slower than
setting MPI_Block_Sends to 0 (i.e., just MPI_Isend and MPI_Wait).
Feature incorporated to ensure no message is lost in the case of a
large number of processors. Recommended to turn the flag on
if >= 512 processors are used.


(13.2)
Keep_Mbuf_Full (default 0)

Deprecated.

Assembly of PaCE clusters using CAP3 :

This is a wrapper script for running CAP3 assembly on each of the PaCE cluster.
This package does NOT include the "cap3" executable.
If CAP3 is not available, it can be obtained by mailing
Dr.Xiaoqiu Huang (xqhuang@cs.iastate.edu).

For the scripts to work, the CAP3 executable ("cap3") should be
present in the directory PaCE-pipeline/CAP3-assembly/ or in the path of the system.

PS:
Although the scripts are for running CAP3 as the assembly tool,
they can be easily modified accordingly to enable usage of alternative assembly tools.
The only script to be modified for this purpose
is "PaCE-pipeline/CAP3-assembly/caploop".


Input:
-------
Must have generated the cluster file using PaCE.
For illustration,
- let the organism be denoted by "tEST",
- let the FASTA data file be "tEST.data"
(PS: this is the original data file - NOT the PaCE preprocessed version),
- let the PaCE cluster file be denoted by "estClust.500.3.PaCE".

Output:
--------
CAP3 assembly output for each of the PaCE cluster in estClust.500.3.PaCE.


Steps for Assembly:
-------------------

To start with, the following steps assume the PaCE cluster file (estClust.500.3.PaCE) is
in the directory PaCE-clustering.

- From PaCE-pipeline directory:
cd CAP3-assembly

- mkdir tEST

- cp estClust.500.3 tEST/tEST
ie., copy the PaCE cluster file (from its current location) to the file named "tEST" inside the newly created directory

- You can use the perl script:
perl extractCF.pl tEST/tEST tEST.data

This step will create one FASTA data file corresponding to each PaCE cluster for tEST.


- rm tEST/tEST
ie., Remove the cluster file from the tEST folder - so that now it has only
the data files corresponding to each PaCE cluster

- caploop tEST
This script runs CAP3 on each of the FASTA data files present in the tEST folder,
generating the CAP3 output for each in the same folder.
This is the only script that requires modification if the assembly
tool is NOT CAP3.

Friday, October 22, 2010

Running cap3

after installing CAP3

just take a sequence file and run the following command at the prompt

> cap3 filename [options]


you can also specify constraint and quality files

CAP3 takes as input a file of sequence reads in FASTA format. If the names of reads contain a dot ('.'), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot. CAP3 takes two optional files: a file of quality values
in FASTA format and a file of forward-reverse constraints.

The file of quality values must be named "xyz.qual", and the file of forward-reverse constraints must be named "xyz.con", where "xyz" is the name of the sequence file.

my work involved

input
-----
>cap3 seq

seq is a file consisting of some sequence data...

output
------
6 files

seq.cap.ace
seq.cap.contigs.links
seq.cap.info
seq.cap.contigs
seq.cap.contigs.qual
seq.cap.singlets


seq.cap.ace
-----------

AS 1 5

CO Contig1 1422 5 11 U
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAGTTTT
AGTTTTCCTCTGAAGCAAGCACACCTTCCCTTTCCCGTCTGTCTATCCATCCCTGACCCT
GTTGTCTGTCTATCCCTGACCCCGTAGTCTCCTAAGTCGCCCCAGATTTTGTGAACACCC
TCTGGAACTAGAATCTAGTGGGCGGATGGACCATTTACTAGACGGAGGTAGAGGTGGGTG
GATGCGAACGACAGGGTGCATAGTCAGCCCGGTTTTAAGGGCAGGTCACTTGGTAGGTCA
GCAGGCGGGTCAGTGGGCGGGTGCCTGCAGCATTTATGAACTTATTTGGCCCAGCAAACA
TTTTGAGTGTCAGGCCGTGCCTACCCAAGGTGAGGGTAAGGAGCAAAATCAGCCCAGCCC
AGAGCACTGGGTGGCTACACAGAGCCGACCTCTAATGTGCGCTCCGGGTCGGGATGGCAC
TCAGCTCGCCTTTAGGGAGTGATGATCTGGATGCCTGGCTTGGAGGTGACAGAGCCTGCC
CTTATGAGACAATTAAGAGACTGACTAAGCACCCGGCAGGAGGCCACGAGAATCCCCATG
TGAGAAAGAAGAGCATAAACAGGAAACACATTTAATAATTAAACAAAGATAACTCCCTCG
TGTGCGCGCACCGGGCCAGCCCCTATAGAAACATCTGAGGAGTCACTTCCTCCCATGACT
CTCGCCCGCCCGGCCGGCTGGAGTCGGCTCCTGGCAAGCTTCAGGCACCTCAGTTGTCCT
GAATACACACAGCACCCTTTCCTTACTGAAGCCCCTGAGAGCCTCCAGTTCTCCCTCCTT



BQ
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 1$
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2$
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2$
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2$
20 20 20 20 20 20 20 5 5 5 20 2


AF R3 U 1
AF R1 U 44
AF R2 U 1
AF R4 C 478
AF R6 C 571
BS 1 248 R2
BS 249 250 R3
BS 251 849 R2
BS 850 850 R3
BS 851 974 R2
BS 975 1155 R4
BS 1156 1157 R6
BS 1158 1159 R4
BS 1160 1168 R6
BS 1169 1242 R4
BS 1243 1422 R6

-------------------------------------------------------------------
seq.cap.contigs.qual
---------------------

>Contig1
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
10 10 10 10 10 10 10 10 10 10 10 10 10 10 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20






Clear documentation with example usage is available at the link below

http://icb.med.cornell.edu/crt/CAP3/example_usage.xml

Wednesday, October 20, 2010

Running PaCE

- cd PaCE-pipeline/PaCE-clustering

- Preprocessing:

This preprocessing step formats the input FASTA file and generates
the corresponding input datafile for PaCE.

Run the preprocessPaCE command to preprocess the EST input file (fasta format)
eg.,
$ PreprocessPaCE.pl {FASTA data file} {1: prune PolyA/T, 0 otherwise} > tEST.data.PaCE

If the second argument is:
0: Does not modify the input sequences. Just concatenates each sequence
such that each sequence appears in one line and also
converts all DNA characters to upper case.
1: In addition to the two effects produced by option '0', this option
trims off streaks of As and Ts occurring at either end
of the sequence. This option is not required if the input FASTA
sequences have already been stripped of poly As and Ts or if
they are required to be input as-is. If you use this option,
make sure you manually inspect some of the modifications to confirm
there will be no negative impacts on the clustering. This inspection
can be done by first running preprocessPaCE using flag 0 and then
using flag 1 (both on the original file) and taking a diff of both
the output files.

eg.,
$ PreprocessPaCE.pl ../datafiles/tEST.data 0 > ../datafiles/tEST.data.PaCE
This generates another file by name "tEST.data.PaCE" in ../datafiles/ .

- Find "n":
Find the number of sequences in the preprocessed output data file.
Let us call it "n".

This can be found by a simple unix command like:
eg., grep -c ">" ../datafiles/tEST.data.PaCE

ie., n=500 for ../datafiles/tEST.data.PaCE.

- Parameterization:
Parameters to PaCE are kept in the file PaCE.cfg.
Check if PaCE.cfg is there in the directory of the executable.

You typically do NOT need to change any parameters except "window".

If the data size <=10,000 ESTs then a window size of 7 is recommended. If the data size <=30,000 and >=10,000 ESTs then a window size of 8 is recommended.
Otherwise a window size of 9 is recommended.


- Run PaCE:
PaCE takes two parameters:
Usage: {MPIRunCommand} PaCE {preprocessed FASTA datafile} {number of ESTs}

(P.S: Number of processors should be AT LEAST 2)


eg.,
$ mpirun -np 4 PaCE ../datafiles/tEST.data.PaCE 500

where 4 is the number of processors to run on.

"{MPIRunCommand}" depends on the available MPI implementation and job scheduler
of the parallel cluster being used.
For batch mode parallel platforms, use the specified batch submission routines like "qsub" or "llsubmit".

- PaCE output:


The results are of two categories: Run-time and Cluster results.
A summary of these results are printed on the standard output. If the parallel platform
uses batch processing which outputs the standard output to a file, then the summary
will be present in that file.

(i) Run-time results:

The total run-time and the run-time in different components of the system
is shown for each SLAVE processor. This indicates the total run-time for PaCE clustering.

For eg.,

Time taken by slave 1 : Load<0> + Preprocessing Phase<3> + Clustering Phase<1>= 5 secs

Here processor rank 1 took a total of 5 seconds to complete with the run-times in phases
indicated separately. The numbers are truncated to integer values.
Almost all the slave processors take about the same total
run-time. The time to load the data file into memory,
indicated first by Load<>, might vary between systems; as it is the
time for initialization, subtracting it from the total run-time
gives the actual time taken by the software.


(ii) Clustering results:

The standard output will also have something like this:

Master: #Clusters Output:= 357 #Singletons=258
Master: #Contained ESTs:= 63

This means: The total number of clusters generated is 357, out of which 258 are singletons.
Also out of the n(=500) ESTs supplied, 63 ESTs are completely contained
(with 100% identity) in other ESTs.

The clusters themselves are located in file estClust.n.p.PaCE
where n is the number of ESTs and p is the number of slave processors.
eg., estClust.500.3.PaCE
PS: The number of slave processors is always one less than the total
number of processors used to run PaCE.

The size distribution of these clusters (in the number of ESTs) is present
in estClustSize.n.p.PaCE.
eg., estClustSize.500.3.PaCE
Each line is of the format "{Cluster#} {Number of Members in Cluster#}".

The set of EST sequences which are contained in (an)other EST sequence(s) are
indicated in ContainedESTs.n.PaCE.
eg., ContainedESTs.500.PaCE
Each line is of the format "{Contained EST sequence header} IN {Container EST sequence header}".
PS: One EST can be contained in multiple EST sequences, and only one container
sequence is indicated here.
Also the set of contained ESTs reported need not comprise all the ESTs
that are actually contained. More specifically, the contained ESTs
reported are only based on the set for which alignments were performed by PaCE.

The set of EST sequences which are not contained in any other EST sequence in the
provided data set is present in NonContainedESTs.n.PaCE.
eg., NonContainedESTs.500.PaCE


Adding clone mates information :

--------------------------------

PaCE is designed to take clone mates information as additional input.
This information should be provided in a file in
the following format:

>clone mate id #
FASTA header for sequences belonging to this clone mate (each in separate lines)
....

eg., Let the clone mates be in a file myCloneMates that looks like this:

>CloneId1
gi|19863109|
gi|19800130|
>CloneId2
gi|19863111|
gi|19800132|
...


Step 1:

Run the following command (for the above example) :
$perl formatCloneMates.pl myCloneMates tEST.data.PaCE > myCloneMates.PaCE

This will generate the PaCE formatted myCloneMates.PaCE.
(The numbers in each line indicates the sequence number that PaCE
provides for each input sequence.)

Step 2:

The PaCE formatted clone mate file should be in the folder where the
PaCE executable resides.
To specify the clone mate file as input, it has to be provided in the PaCE.cfg file as:

ClonePairsFile clonematefilename

eg.,
ClonePairsFile myCloneMates.PaCE

This has to be done before starting to execute PaCE. PaCE will put each
set of sequences linked by the same clone mate id together in one cluster
in its final output.

If you do not have any clone pairs information to provide then set:

ClonePairsFile None


PS: Both these steps should be performed for each input FASTA file (even
if you intend to run for subsets of the original FASTA data file).

Tuesday, October 19, 2010

what's the magic number?

started work on PaCE and got an error about some magic number... magic magic...!!!!!

tg-qdong@BigRed:/N/gpfs/tg-qdong/PaCE> /N/gpfs/tg-qdong/PaCE/PaCE_v9
Error: Need to obtain the job magic number in MXMPI_MAGIC !

my first job on TeraGrid (running RepeatMasker)...

RepeatMasker est.fasta


output


est.fasta.cat
est.fasta.log
est.fasta.masked > the output file with the unwanted/repetitive regions masked out
est.fasta.out > shows where the changes were made
est.fasta.tbl

est.fasta.out
--------




SW perc perc perc query position in query matching repeat position in repeat
score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID

180 25.0 2.1 0.0 seq1165 373 420 (263) + L2b LINE/L2 3234 3282 (93) 1
180 25.0 2.1 0.0 seq1545 549 596 (16) + L2b LINE/L2 3234 3282 (93) 2
180 25.0 2.1 0.0 seq3921 473 520 (48) + L2b LINE/L2 3234 3282 (93) 3
180 25.0 2.1 0.0 seq5025 181 228 (350) + L2b LINE/L2 3234 3282 (93) 4
180 25.0 2.1 0.0 seq5225 149 196 (531) + L2b LINE/L2 3234 3282 (93) 5
180 25.0 2.1 0.0 seq5514 431 478 (123) + L2b LINE/L2 3234 3282 (93) 6
180 25.0 2.1 0.0 seq5573 359 406 (177) + L2b LINE/L2 3234 3282 (93) 7
180 25.0 2.1 0.0 seq7881 289 336 (296) + L2b LINE/L2 3234 3282 (93) 8
180 25.0 2.1 0.0 seq9083 218 265 (302) + L2b LINE/L2 3234 3282 (93) 9



est.fasta.masked
--------------
>seq1
CAAAGATTAGCTCAACCCCACCCGTGCAGCAGTGGCCAGGAATCCACCAG
TCGCTTTAAATATGCACTACACGCGTTATCGTTGGCGGGGATCCCGGGGG
ATGATCCATTGGCCTTGATACGGGCCACAACCACTGACACGCCCCACAAG
CGAGTTTGGTTCACTGGGGCGAGCTGAAGGTTAGGTCTATCTATTGCGTA
GCCGAAAAAGTCATCACCAGTAACTATGCCGCTCTCCTTGCCCTTTGCCA
TGTGGTAGCTACGTCCTCCTCTATTGCACGTAACCCGGTTCAAAACCACG
AATCGTATCACTTTGCTTCGAAAAACCACTTTGATCCAGATACAAATTAA
ACTGCGATCCGAGCTTACCACCAAATCCGGGAGCATAAGCTGAAAGTGCA
GGAGAGGCAGAGCTGGCGGTGGCCCTTCCACCTCCTTTTATCGTAACACG
ATCGGCACGTCGGGCACTTTCGGGAAGGGCTGCCACACGAAAAGTCTTTT
ATAACTACGAGC
>seq2
CCACGCATTATGTAAAAATTACCGACGTGGAGCTTCGTGCCGCCGGATTG
TAAGAATAGCTCGTTGGAGATTACCTGAGTTGGTTGTTCTTTGTGTGTCA
ATCCATCTCGCCCTGACGACGGGGGCAAACATAAAAGTCCGATAACCCCG
AATTCACCTGGAATCATGGACAACCCGGACATTCCTACTTGGAAATCAGA
TCAGAGTTCGCTCGGATCAGCGACTGTTTGCCCGATAACGCGGGACAGGC
ATGGTTTAAAAAGTTCCGTTTTTTGTGCAAATTTCAAGACAATGCGTCTA
ACCGTAGCCTCGAAGGGCTACTTATCGGCTCGCCGGTTATGGGCGGTGAT
TTAAGTTAACGCGTTAGTTTGTAGAGCAGACACGTCCCTGTACCCATCGA
AGAGTTATCTGTGACTATACCCAATATAGTTCACTTAACTACTTATACGT
CCCCTGACAACATATGATCGAAGATAGAACGTTCGGCTATAGGCTATTAG
AGGCGAGTTGTCTCGAGCACACTTATTAATAAATGTCC
>seq3
TGCGCACGCGAACGCCCCAGCCTAACGTCGGAAATAAGGGGCCTCTAAAT
CCAAGTGCATTGTTCTCCGCTGGTGCGACACCGGACAAGGGTATGATGGA
GATATGACACTTATACTAACGTATGCGTTCCTAACATTCCACCCCAGGCA
ACGCGTACATGGAGGGCCCCGTTGCCTAAGTTGTATCCCCATGGGTGTGC
CAACCAAAAGACCCAGCGCAAGAGGCACTGCACATTCGAGTTATTGAGGA
CGTAGTTAACCGTAGACCCTTTCCATAAATATGCACCTGAGAAGAATCCT
TATGGTCGGCCACGTTCAGTTCCTCATCGTAAACCGGCAAGACCCTATCG
GGCTACAATTCAGATAACTGCGTAAATTGAATTCTAAGTCTGTTAATCCT
CCTGATAACCGGACGTAGTCAGGTCGCTCAATCTATGCGGGTTGAAGGAC
GCTCTACGGGCCCACCCCACGGAGATTGGGTCCTCGGATCGATCTTGGAT
GTCGATACCAAGAAGTCGGATCAAACACGTCTACTACTGCCGGGGTTTTG
CATCCCTAGCGAGCGGGCGGCCGTCGAACGCACTTCGCCCACCTTGATGA
CCCCCCAACCCTTCAGAGGACCCGCCATGGTGAGCACAAGCAACAGAAGT
TAGGGCGTGATTTGAAGAT
>seq4
GGTGACAAGCGGTCTAACGGAGCCTCTACGGGATGGGGATTTTCTGATCA
AACTCTGCCAAGGAGTAGTATAGCACATGAGCTCAGTCGTAGGGAAGCAA
AACAATCACGAGATCCCTCTATATTCTCACCTCCCGAAAGATCTGACCCC
GCCAAGACAGAATATCAGATCCTGAAGATTTATCTCATCCTTAACAGCCA
GGCCGTGGTTCACGTACGGATGACTTTTCGTTAGTGGCGGTATATCGTCC
CTATGTAGGGGAGACGAGACACAAGTATGAGCAGACTTAGCACTTACATC
GGGGACGTTATGGCAAGTGTGCCTGCCCAATGTTCGACTGGCTCCCATGG
GAACTTATAGTTCTTCGGCTACAGACACGGTATATGCCACTCCCTAAAAA
GCTTACAATGTTCAAGCGTAGTCTGTGCAAGGAGAAATGTGATATCATTC
ATGCGAGGCCTCAATTTTTTTTAACAGAGCGGCGGGGCCGCTCGCCGGCC
>seq5
AGCTTTGATGGTAGCTGCGAGCCCATCGCCGGTCTAAGGGACTTGATGGA
GGATCAACTCATCCCATTCGGCCTGGCTTGAGTCCTTCAGTCACTCATTC
TCTTGGCCGCGGATAGGATGCACTTATAAACGACCCAAAAGGGCAGGAGT
TTCAGCCCTAGGAGGGTTTCTGGGTTCACTGGCATTAGCACCGACTCTGG
GCTTAATCCAGGTCCCACGGCTGTGCCGAGAGTCCCTTAGCAGGAGTTGG

Monday, October 18, 2010

installing Java

I know many of us get stuck on the little things... the commands below are a few simple steps to install Java on your machine

Linux
-----
Add partner repository using the following command

> sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

Update the source list

> sudo apt-get update

Now install sun java packages using the following commands

> sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts


test by using the following command

java -version





Just installing new Java flavours does not change the default Java pointed to by /usr/bin/java. You must explicitly set this:

Open a Terminal window
Run sudo update-java-alternatives -l to see the current configuration and possibilities.
Run sudo update-java-alternatives -s XXXX to set the XXX java version as default. For Sun Java 6 this would be sudo update-java-alternatives -s java-6-sun
Run java -version to ensure that the correct version is being called.
You can also use the following command to interactively make the change;

Open a Terminal window
Run sudo update-alternatives --config java
Follow the onscreen prompt
You can also try IcedTea NPR Web Browser Plugin

Monday, October 11, 2010

Many things to learn

"ALL men of whatsoever quality they be, who have done anything of excellence, or which may properly resemble excellence, ought, if they are persons of truth and honesty, to describe their life with their own hand; but they ought not to attempt so fine an enterprise till they have passed the age of forty."

Marlon Pierce's blog

http://communitygrids.blogspot.com/

Friday, October 8, 2010

Infernal installation

Infernal

* Infernal ("INFERence of RNA ALignment") is for searching DNA sequence databases for RNA structure and sequence similarities.
* It is an implementation of a special case of profile stochastic context-free grammars called covariance models (CMs).
* A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases, it is more capable of identifying RNA homologs that conserve their secondary structure more than their primary sequence.
* URL: http://infernal.janelia.org/



$ wget ftp://selab.janelia.org/pub/software/infernal/infernal.tar.gz
$ gunzip infernal.tar.gz
$ tar xvf infernal.tar
$ cd infernal-1.0.2/
$ ./configure [--enable-mpi]
$ make
$ sudo make install

Testing

* Just running the 7 commands that are available with this package

$ cmalign
$ cmbuild
$ cmcalibrate
$ cmemit
$ cmscore
$ cmsearch
$ cmstat

* if it gives the standard usage message, then it is installed.

installing tRNAscan-SE

tRNAscan-SE

* tRNAscan-SE is a program for improved detection of transfer RNA genes in genomic sequence.
* URL: http://lowelab.ucsc.edu/tRNAscan-SE/

Installation

* Get the link location

$ wget http://lowelab.ucsc.edu/software/tRNAscan-SE.tar.gz
$ cd tRNAscan-SE-1.23/

* Edit Makefile to provide the following details

## where you want things installed
BINDIR = /usr/local/bin
LIBDIR = /usr/local/lib/tRNAscan-SE
MANDIR = /usr/local/man

If you don't change the above paths, it will install into your home directory ($HOME)

* Now make the package

$ make
..
..
sqio.c:238: error: conflicting types for ‘getline’
/usr/include/stdio.h:651: note: previous declaration of ‘getline’ was here
make: *** [sqio.o] Error 1
..

* The make did not complete because there were two conflicting getline declarations in two different files
* Solution:
o Checked whether getline is present in any of the *.c files in this directory
o Opened sqio.c and changed every getline to getLine

$ make

make ran with no error.

NOTE:

* There are some instructions at the end of make. They require us to run source setup.tRNAscan-SE; rehash for the current session
* Or include the line source /home/krevanna/Desktop/TOOL_TEST/tRNAscan-SE-1.23/setup.tRNAscan-SE in ~/.cshrc
* This won't work because we are in a bash shell and it expects us to be in a C shell

* I did not follow the above instructions and went ahead with make install

$ sudo make install
$ make testrun
$ make clean

installing blast

Steps to download and install BLAST

* Visit the URL: ftp://ftp.ncbi.nih.gov/blast/executables/release/.
* Click on 'LATEST' to get the latest version of BLAST.
* Right click on the version and 'Copy link address'.
* On the server, type wget and paste the url

$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/release/2.2.24/blast-2.2.24-x64-linux.tar.gz
$ tar zxvf blast-2.2.24-x64-linux.tar.gz
$ cd blast-2.2.24/bin
$ ls
bl2seq blastall blastclust blastpgp copymat fastacmd formatdb formatrpsdb impala makemat megablast rpsblast seedtop

installing glimmer

Installation

* On Ubuntu OS

$ sudo apt-cache search glimmer
tigr-glimmer - Gene detection in archea and bacteria
$ sudo apt-get install tigr-glimmer

Testing

* To check if the program has been installed

$ tigr-glimmer
Usage: /usr/bin/tigr-glimmer
Existing programs are:
anomaly build-icm entropy-score glimmer3 multi-extract start-codon-distrib uncovered
build-fixed entropy-profile extract long-orfs score-fixed test window-acgt

installing diya

* Download url: http://sourceforge.net/projects/diyg/

$ tar zxvf diya-1.0-rc4.tar.gz
$ cd diya-1.0-rc4/

Prerequisites

* Perl Modules (How to check Perl module)
o Bioperl
o Data::Merger
o Getopt::Long
o FileHandle
o XML::Simple
o File::Basename
* Software
o Perl (>= 5.8)
o MUMmer v3.20
o Glimmer v3.02
o BLAST
o tRNAscan-SE v1.23
o Infernal v0.81
o rfamscan.pl v0.1
* Database
o UniRef50 (refer to Others)
o Protein Clusters (refer to Others)

Installation

* Steps to install

$ perl Makefile.PL
$ make
$ sudo make install

installing mira

* download url: http://sourceforge.net/projects/mira-assembler/files/
* version: 3.2.0

$ wget http://sourceforge.net/projects/mira-assembler/files/MIRA/V3.2.0/mira_3.2.0_prod_linux-gnu_x86_64_static.tar.bz2/download
$ bunzip2 mira_3.2.0_prod_linux-gnu_x86_64_static.tar.bz2
$ tar xvf mira_3.2.0_prod_linux-gnu_x86_64_static.tar
$ cd mira_3.2.0_prod_linux-gnu_x86_64_static/

* All the executables are in bin directory
* Export the path to bin to bashrc

$ vim ~/.bashrc
export PATH=$PATH:/path/to/mira/bin
$ source ~/.bashrc
$ mira
...

Instructions

* Open the index.html file present in docs folder on firefox.

Usage

mira \
[-project=]
[--job=arguments]
[-fasta[=] | -fastq[=] | -caf[=] | -phd[=]] [-notraceinfo] [-noclipping[=...]] [-highlyrepetitive] [-lowqualitydata] [-highqualitydata] [-params=] [-GENERAL:arguments]
[-STRAIN/BACKBONE:arguments]
[-ASSEMBLY:arguments]
[-DATAPROCESSING:arguments]
[-CLIPPING:arguments]
[-SKIM:arguments]
[-ALIGN:arguments]
[-CONTIG:arguments]
[-EDIT:arguments]
[-MISC:arguments]
[-DIRECTORY:arguments]
[-FILENAME:arguments]
[-OUTPUT:arguments]
[COMMON_SETTINGS | SANGER_SETTINGS | 454_SETTINGS | SOLEXA_SETTINGS | SOLID_SETTINGS]


ESTs (simulated)

* Generate a genome of 1,000,000 length
* Generated ESTs of length 500-800 bp for the above genome
* Around 10,000 ESTs were generated and stored in the file est.fasta

$ mira --project=EST --job=denovo,genome,normal,454 -fasta=est.fasta -SK:mnr=yes:nrr=10 454_SETTINGS -LR:wqf=no -LR:mxti=no -AS:epoq=no >&log_assembly.txt
$ cd EST_assembly/
$ ls -R
.:
EST_d_chkpt EST_d_info EST_d_log EST_d_results

./EST_d_chkpt:
passInfo.txt readpool.maf

./EST_d_info:
EST_info_assembly.txt EST_info_consensustaglist.txt EST_info_contigstats.txt EST_info_readrepeats.lst
EST_info_callparameters.txt EST_info_contigreadlist.txt EST_info_debrislist.txt EST_info_readtaglist.txt

./EST_d_log:
EST_error_reads_invalid EST_info_reads_tooshort EST_out_pass.2.caf
EST_info_consensustaglist.1.txt EST_info_readtaglist.1.txt EST_out_pass.3.caf
EST_info_consensustaglist.2.txt EST_info_readtaglist.2.txt EST_readpoolinfo.lst
EST_info_consensustaglist.3.txt EST_info_readtaglist.3.txt hashstat.bin
EST_info_contigreadlist_pass.1.txt EST_int_clippings.0.txt miralog.ads_pass.4.adsfacts
EST_info_contigreadlist_pass.2.txt EST_int_normalisedskims_pass.4.bin miralog.ads_pass.4.adsfacts.pclusters
EST_info_contigreadlist_pass.3.txt EST_int_posmatchc_pass.4.lst miralog.ads_pass.4.complement
EST_info_contigstats_pass.1.txt EST_int_posmatchc_pass.4.lst.reduced miralog.ads_pass.4.forward
EST_info_contigstats_pass.2.txt EST_int_posmatchf_pass.4.lst miralog.ads_pass.4.reject
EST_info_contigstats_pass.3.txt EST_int_posmatchf_pass.4.lst.reduced miralog.noqualities
EST_info_debrislist_pass.1.txt EST_int_posmatch_megahubs_pass.4.lst miralog.usedids
EST_info_debrislist_pass.2.txt EST_int_posmatch_multicopystat_preassembly.0.txt
EST_info_debrislist_pass.3.txt EST_out_pass.1.caf

./EST_d_results:
EST_out.ace EST_out.maf EST_out.padded.fasta.qual EST_out.unpadded.fasta EST_out.wig
EST_out.caf EST_out.padded.fasta EST_out.tcs EST_out.unpadded.fasta.qual

Inference

* Snippet of some of the information in EST_d_info/EST_info_assembly.txt

..
..
Length assessment:
------------------
Number of contigs: 106
Total consensus: 944829
Largest contig: 36580
N50 contig size: 13625
N90 contig size: 5366
N95 contig size: 2945
..
..

* Number of contigs in EST_out.unpadded.fasta and EST_out.padded.fasta

$ grep -c '>' ./EST_d_results/EST_out.unpadded.fasta
107

Monday, September 27, 2010

installing repeatmasker

there are a few prerequisites for installing RepeatMasker..

RepeatMasker is the first step in the EST assembly pipeline

the prerequisites are

----perl
----sequence search engine
        you can install one of the three search engines available
         -cross-match
         -rmblast  (I am using this one)  http://www.repeatmasker.org/RMBlast.html
         -abblast/wublast (these require licensing)


Try 1. Download pre-compiled versions

$ wget http://www.repeatmasker.org/rmblast-1.2-ncbi-blast-2.2.23+-x64-linux.tar.gz
$ tar zxvf rmblast-1.2-ncbi-blast-2.2.23+-x64-linux.tar.gz
$ cd rmblast-1.2-ncbi-blast-2.2.23+
$ cd bin
$ ./rmblastn
./rmblastn: error while loading shared libraries: libpcre.so.0: cannot open shared object file:
No such file or directory

I was unable to install this library. Things I found out (I may be wrong):

* libpcre is part of the 32-bit libraries
* on Ubuntu I was unable to find the library, maybe because of the repository.

Try 2. Download RMBlast source

$ wget http://www.repeatmasker.org/rmblast-1.2-ncbi-blast-2.2.23+-src.tar.gz
$ tar zxvf rmblast-1.2-ncbi-blast-2.2.23+-src.tar.gz
$ cd rmblast-1.2-ncbi-blast-2.2.23+-src/c++
$ ./configure --with-mt --prefix=/usr/local/rmblast --without-debug
$ make
$ sudo make install

This installed fine without any problems. Modify ~/.bashrc to include the line below:

export PATH=$PATH:/usr/local/rmblast/bin

To check installation

$ source ~/.bashrc
$ rmblastn
BLAST query/options error: Either a BLAST database or subject sequence(s) must be specified

If you get the above message then RMBlast is installed

---- tandem repeat finder (trf) is needed

The TRF download will contain a single executable file. You will need to rename the file from whatever it is to just 'trf' (all lower case). Make TRF executable by typing chmod +x+u trf. You can then move this file wherever you want. I just put it in the /RepeatMasker directory.

the next step is installing the repeat database...... it takes 2 days to get registered, so temporarily this process is on hold.................

after 1 day..........

----Repeatmasker database

* url: http://www.repeatmasker.org/RMDownload.html
* download latest release version

$ cd /usr/local
$ wget http://www.repeatmasker.org/RepeatMasker-open-3-2-9.tar.gz
$ tar zxvf RepeatMasker-open-3-2-9.tar.gz

* It requires RepeatMasker Libraries
o url: http://www.girinst.org/
o Go to Downloads > RepeatMasker libraries > RepeatMasker edition: local:

$ tar zxvf repeatmaskerlibraries-20090604.tar.gz
$ rm -rf repeatmaskerlibraries-20090604.tar.gz
$ mv Libraries/RepeatMaskerLib.embl /usr/local/RepeatMasker/Libraries/.



download RepeatMasker
unpack it
install the RepeatMasker libraries.. copy them into the RepeatMasker main folder

now configure the RepeatMasker script by using
perl ./configure

while configuring, it will ask for several locations:
perl path
trf path

after finishing the installation,
check whether it is working by typing the command

RepeatMasker in a terminal..

Repeat masker

5.  Install RepeatMasker. Download from http://www.repeatmasker.org

  a.  The most current version of RepeatMasker requires a program called TRF.
      This can be downloaded from http://tandem.bu.edu/trf/trf.html
  b.  The TRF download will contain a single executable file.  You will need to
      rename the file from whatever it is to just 'trf' (all lower case).
  c.  Make TRF executable by typing chmod +x+u trf.  You can then move this file
      wherever you want.  I just put it in the /RepeatMasker directory.
  d.  Unpack RepeatMasker to the directory of your choice (i.e. /usr/local).
  e.  If you do not have WuBlast installed, you will need to install the program
      RMBlast.  We do not recommend using cross_match, as RepeatMasker
      performance will suffer.
  f.  Now in the RepeatMasker directory type perl ./configure in the command
      line. You will be asked to identify the location of perl, wublast, and
      trf.  The script expects the paths to the folders containing the
      executables (because you are pointing to a folder the path must end in a
      '/' character or the configuration script throws a fit).
  g.  Add the location where you installed RepeatMasker to your PATH variable in
      .bash_profile (i.e. export PATH="/usr/local/RepeatMasker:$PATH").
  h.  You must register at http://www.girinst.org and download the Repbase
      repeat database, Repeat Masker edition, for RepeatMasker to work.
  i.  Unpack the contents of the RepBase tarball into the RepeatMasker/Libraries
      directory.


Friday, September 24, 2010

Tuesday, September 21, 2010

Monday, September 20, 2010

est assembling

applications present in swarm..

what are they ???
where are they ???
how to run them ???


raw data -> repeat masker -> pace -> cap3 -> blast

RepeatMasker is used to mask out the junk (repetitive) regions of the ESTs

PaCE is used to cluster the ESTs (organizing the ESTs into groups)......learn about Talon

CAP3 is used to assemble a consensus sequence from each cluster.

BLAST is used to compare the sequence with the data present in the database and indicate what kind of sequence it is, which gene it is ..........

Friday, September 17, 2010

had problem in site with route.php

it gives an error -- Warning (2): headers already sent, output started at...

this occurs if some whitespace is left after the closing PHP tag

like ?>

after this there should be no spaces or blank lines

Thursday, September 16, 2010

started my work on bov and deploying some other applications
the problem with the hyperlinks not working was due to Apache..

configured mod_rewrite and now the site is back and working fine

Tuesday, September 14, 2010

the site is working but the hyperlinks to the other pages are giving a problem

I found some material regarding controllers in CakePHP

it's at app/controllers/page_controller.php

tried working on it.. everything is OK with page_controller.php but I don't know why it is still giving me the same problem

checking for something in
app/config/routes.php


this too is configured correctly...
still the problem continues..................... :)-

Friday, September 10, 2010

upload .sql files using
mysql> source /path/to/file.sql

or

using phpMyAdmin

got a problem with CakePHP.

deployed it in Apache2 but the script is not running; getting some errors....

Wednesday, September 8, 2010

installing Tomcat 6

http://www.ubuntugeek.com/how-to-install-tomcat-6-on-ubuntu-9-04-jaunty.html

Apache2 has to be installed to run CakePHP files..

now it has been successfully installed


Apache2 has the concept of sites, which are separate configuration files that Apache2 will read. These are available in /etc/apache2/sites-available. By default, there is one site available called default; this is what you will see when you browse to http://localhost or http://127.0.0.1. You can have many different site configurations available, and activate only those that you need.
As an example, we want the default site to be /home/user/public_html/. To do this, we must create a new site and then enable it in Apache2.
To create a new site:


  • Copy the default website as a starting point. sudo cp /etc/apache2/sites-available/default /etc/apache2/sites-available/mysite 


  • Edit the new configuration file in a text editor "sudo nano" on the command line or "gksudo gedit", for example: gksudo gedit /etc/apache2/sites-available/mysite


  • Change the DocumentRoot to point to the new location. For example, /home/user/public_html/


  • Change the Directory directive, replace <Directory /var/www/> to <Directory /home/user/public_html/>


  • You can also set separate logs for each site. To do this, change the ErrorLog and CustomLog directives. This is optional, but handy if you have many sites
  • Save the file
Now, we must deactivate the old site, and activate our new one. Ubuntu provides two small utilities that take care of this: a2ensite (apache2enable site) and a2dissite (apache2disable site).
sudo a2dissite default && sudo a2ensite mysite

Finally, we restart Apache2:
sudo /etc/init.d/apache2 restart

If you have not created /home/user/public_html/, you will receive a warning message
To test the new site, create a file in /home/user/public_html/:
echo '<b>Hello! It is working!</b>' > /home/user/public_html/index.html
Finally, browse to http://localhost/


 https://help.ubuntu.com/community/ApacheMySQLPHP

Monday, August 30, 2010

Cake php

I got some CakePHP code which I need to deploy on my server. It uses the MVC (model, view, controller) architecture..

an example to do this is as follows..


http://book.cakephp.org/view/1528/Blog


                               

Thursday, August 26, 2010

setting path for condor

nano ~/.bash_profile



if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
export MAVEN_HOME=/home/swarm/apache-maven-2.2.1
export PATH=/home/swarm/apache-maven-2.2.1/bin:$PATH:~/condor/bin:~/condor/sbin
export CATALINA_HOME=/home/swarm/ogce-axis-services/portal_deploy/apache-tomcat$
export CONDOR_CONFIG=/home/swarm/condor/etc/condor_config

setting the Java path

open the ~/.bashrc file (nano ~/.bashrc)


JAVA_HOME=/usr/java/jdk1.6.0_21
export JAVA_HOME
PATH=$PATH:/usr/java/jdk1.6.0_21/bin
export PATH
PATH=$PATH:/usr/share/maven2/bin
export PATH
ANT_HOME=/usr/share/ant
PATH=$PATH:${ANT_HOME}/bin
export PATH

Tuesday, August 24, 2010

hmmmmm.... unfortunately I found so many Java files installed on my server, again and again.......

it is like the whole server was filled with Java files...


I worked the whole night to remove all these files... then I installed a new version of Java, which works fine but is still creating some errors with other applications...

Monday, August 23, 2010

The swarm code can be downloaded from the svn repo using

 svn checkout https://ogce.svn.sourceforge.net/svnroot/ogce/ogce-services-incubator/swarm

The build instructions are at http://www.collab-ogce.org/ogce/index.php/Swarm
The userguide listed on the Swarm wiki page has configuration information, but I may need to tweak this for your specific install and will get to this on wednesday.


after installing Swarm we need to follow these steps...




You can start with :

1. Placing the cake-php scripts under your document root and changing the document root in apache to the "webroot" directory of cake-php.

2. Set up the database from the dump file I provided.

3. Test if you are able to insert data into the database from the front end. You will have to tweak the CakePHP configuration script for the database.

4. Then come to the Swarm part. Configure Hibernate so that it can access the database.

5. As I had mentioned in the README, issue mvn install and see if it works. Understand the flow and then set up a cronjob.




after 4 hours of hard work I got my way back to installing Apache Tomcat

an admin package needs to be installed, which is not installed by default

>>wget http://archive.apache.org/dist/tomcat/tomcat-5/v5.5.27/bin/apache-tomcat-5.5.27-admin.tar.gz

Thursday, August 19, 2010

learnt using vi editor

finding it hard to install Java with all of its apps....
unable to find the JDK on my server....... it's really making me think....

learning about vi editor....
  • Moving around with cursor:
    h key = LEFT, l key = RIGHT, k key = UP, j key = DOWN
  • Exiting vim editor without saving:
    press ESC to get into command mode, enter :q! to exit.
  • Deleting characters in vim command mode:
    delete with x key
  • Inserting / appending text:
    Press i or a in command mode and type
  • Saving changes and exit:
    in command mode :wq or SHIFT+zz 


  • Deleting words:
    delete word with d operator and w or e motion
  • Deleting to the end of the line:
    delete to the end of the line with d operator and $ motion
  • Using operators, motions and counts:
    beginning of the line 0, end of the line $, end of the 2nd word 2e beginning of the 4th word 4w
  • Deleting multiple words:
    to delete 3 words you would use d3w
  • Deleting lines:
    to delete single line dd, delete n lines ndd
  • Undo changes:
    undo changes with u
 http://www.linuxconfig.org/Vim_Tutorial

Wednesday, August 18, 2010

installed Condor 6.8 successfully

trying hard to install Swarm..
I got so many errors.. but every error is a challenge, and I am learning something new with each error I encounter

it was very hard at first to work on Linux (Ubuntu).. but now I am feeling comfortable.

all I can say today is.. research is something we need to enjoy, and if something goes wrong we get something new to learn.

but presently I am enjoying the fruits of failure..

if something goes wrong I just use Google.......

Monday, August 16, 2010

got a new zone to install Swarm..
but I didn't find Condor 6.8, which is an older version and no longer available.

Friday, August 13, 2010

I tried installing the newer version of Condor... it was 7.2

now I need to install the Condor 6.8 version to remove the dependency problem
for installing Condor I had to install so many other sub-packages...
these include..

autoconf-2.64
m4-1.4.13
flex-2.4.35
byacc-20100610


but it is still giving me another error......hmmm

Wednesday, August 11, 2010

---I started my actual work today..
   finished with the requirement specification part...

--- installing Condor on the local system as a test process.