Lee Gir Won: BLAT and BLAST Output Format Fields

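BLAST (-m 8) and BLAT results come out as tables. Sometimes the column headers are helpfully explained, but for bulk analysis I usually strip the header information before extracting results, and then I occasionally forget which column is which. At least I do.. :)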

So here they are, written down once more... ahem.. haha

NCBI Blast Tabular output format fields

(The BLAST header fields are simple enough that they need no further explanation. :))
QueryId
SubjectId
Identity percent
AlnLength
mismatchCount
gapOpenCount
QueryStart
QueryEnd
SubjectStart
SubjectEnd
Evalue
bitScore

The user linked above also shows simple parsing examples alongside the field list,
which are worth a look :)

-Python

for line in open("myfile.blast"):
    # strip the trailing newline so bitScore does not end in "\n"
    (queryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore) = line.rstrip("\n").split("\t")

-Perl

while (<>) {
    chomp;  # drop the trailing newline before splitting
    my ($queryId, $subjectId, $percIdentity, $alnLength, $mismatchCount, $gapOpenCount, $queryStart, $queryEnd, $subjectStart, $subjectEnd, $eVal, $bitScore) = split(/\t/);
}
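The one-liners above leave every field as a string, so numeric comparisons (say, on eVal) need a conversion first. Below is a minimal sketch of that step, assuming the 12 columns listed above in that order. The names BlastHit, parse_blast_line and read_blast are made up for this example, and skipping lines that start with "#" is an assumption meant to cover output that still carries comment headers.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    # Hypothetical record type; the fields follow the column order listed above.
    query_id: str
    subject_id: str
    perc_identity: float
    aln_length: int
    mismatch_count: int
    gap_open_count: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    e_value: float
    bit_score: float


def parse_blast_line(line):
    """Convert one 12-column tabular BLAST line into a typed BlastHit."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 12:
        raise ValueError("expected 12 tab-separated fields, got %d" % len(f))
    return BlastHit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]), int(f[5]),
                    int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                    float(f[10]), float(f[11]))


def read_blast(lines):
    """Yield BlastHit records, skipping blank lines and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_blast_line(line)
```

Usage would look something like `hits = [h for h in read_blast(open("myfile.blast")) if h.e_value < 1e-5]`, where the cutoff is only an example value.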

BLAT Spec

matches int unsigned , # Number of bases that match that aren't repeats
misMatches int unsigned , # Number of bases that don't match
repMatches int unsigned , # Number of bases that match but are part of repeats
nCount int unsigned , # Number of 'N' bases
qNumInsert int unsigned , # Number of inserts in query
qBaseInsert int unsigned , # Number of bases inserted in query
tNumInsert int unsigned , # Number of inserts in target
tBaseInsert int unsigned , # Number of bases inserted in target
strand char(2) , # + or - for query strand, optionally followed by + or - for target strand
qName varchar(255) , # Query sequence name
qSize int unsigned , # Query sequence size
qStart int unsigned , # Alignment start position in query
qEnd int unsigned , # Alignment end position in query
tName varchar(255) , # Target sequence name
tSize int unsigned , # Target sequence size
tStart int unsigned , # Alignment start position in target
tEnd int unsigned , # Alignment end position in target
blockCount int unsigned , # Number of blocks in alignment. A block contains no gaps.
blockSizes longblob , # Size of each block in a comma separated list
qStarts longblob , # Start of each block in query in a comma separated list
tStarts longblob , # Start of each block in target in a comma separated list
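
The last three columns (blockSizes, qStarts, tStarts) are comma-separated lists packed into a single field, so they need one more split after the tab split. A minimal sketch, assuming the lists may end with a trailing comma (as in "30,38,"); the helper names parse_int_list and blocks are invented for this example.

```python
def parse_int_list(field):
    """Turn a comma-separated string such as '10,25,7,' into [10, 25, 7]."""
    # Empty pieces (from a trailing comma) are dropped before conversion.
    return [int(x) for x in field.strip().split(",") if x]


def blocks(block_sizes, q_starts, t_starts):
    """Pair up each block as (size, query start, target start)."""
    sizes = parse_int_list(block_sizes)
    qs = parse_int_list(q_starts)
    ts = parse_int_list(t_starts)
    if not (len(sizes) == len(qs) == len(ts)):
        raise ValueError("block lists have different lengths")
    return list(zip(sizes, qs, ts))
```

The number of tuples returned by blocks() should match the blockCount column, which makes a handy sanity check.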

-Python

for line in open("myfile.blat"):
    # assumes the header lines were already removed from the file
    (matches, misMatches, repMatches, nCount, qNumInsert, qBaseInsert, tNumInsert, tBaseInsert, strand, qName, qSize, qStart, qEnd, tName, tSize, tStart, tEnd, blockCount, blockSizes, qStarts, tStarts) = line.rstrip("\n").split("\t")
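
If the header was not stripped, the unpacking above fails on the first non-data line. Here is a rough sketch that tolerates that case, assuming every data line starts with the integer matches column and header lines do not. PSL_FIELDS simply repeats the 21 names from the spec above; read_psl and parse_psl_line are hypothetical helper names.

```python
# Column names in the order given in the BLAT spec above.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
# Columns kept as text; everything else is "int unsigned" in the spec.
TEXT_FIELDS = {"strand", "qName", "tName", "blockSizes", "qStarts", "tStarts"}


def parse_psl_line(line):
    """Map one 21-column line to a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(PSL_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(PSL_FIELDS), len(values)))
    record = dict(zip(PSL_FIELDS, values))
    for name in PSL_FIELDS:
        if name not in TEXT_FIELDS:
            record[name] = int(record[name])
    return record


def read_psl(lines):
    """Yield records, skipping any line whose first column is not an integer."""
    for line in lines:
        first = line.split("\t", 1)[0]
        if not first.isdigit():
            continue  # header, separator, or blank line
        yield parse_psl_line(line)
```

With this, `for rec in read_psl(open("myfile.blat")): print(rec["qName"], rec["tName"], rec["matches"])` works whether or not the header is still in the file.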


Friday, July 13, 2012
