So, let me write this up once more... (cough) haha
NCBI Blast Tabular output format fields
(The BLAST header is simple enough that it needs no further explanation. :))
QueryId
SubjectId
Identity percent
AlnLength
mismatchCount
gapOpenCount
QueryStart
QueryEnd
SubjectStart
SubjectEnd
Evalue
bitScore
The user linked above also shows a simple parsing example for each format,
which is worth a look :)
-Python
(queryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount, queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore) = line.rstrip("\n").split("\t")
-Perl
while (<>) {
    chomp;
    ($queryId, $subjectId, $percIdentity, $alnLength, $mismatchCount, $gapOpenCount, $queryStart, $queryEnd, $subjectStart, $subjectEnd, $eVal, $bitScore) = split(/\t/);
}
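The one-line splits above leave every field as a string, so here is a rough Python sketch that also converts the 12 columns to numbers. The names `BlastHit` and `parse_blast_tabular` are made up for this post, and skipping lines that start with `#` is an assumption for output that carries comment lines; the sample line in the usage part is invented purely for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class BlastHit(NamedTuple):
    """One row of 12-column BLAST tabular output (hypothetical helper type)."""
    query_id: str
    subject_id: str
    perc_identity: float
    aln_length: int
    mismatch_count: int
    gap_open_count: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    e_value: float
    bit_score: float


# One converter per column, in the same order as the field list above.
_CONVERTERS = (str, str, float, int, int, int, int, int, int, int, float, float)


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[BlastHit]:
    """Yield a BlastHit for every data line; skip blank and '#' lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # assumption: comment lines start with '#'
        fields = line.split("\t")
        if len(fields) != len(_CONVERTERS):
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        yield BlastHit(*(conv(value) for conv, value in zip(_CONVERTERS, fields)))


if __name__ == "__main__":
    # Invented sample data, not real BLAST output.
    sample = [
        "# a comment line\n",
        "q1\ts1\t98.50\t200\t3\t0\t1\t200\t11\t210\t1e-100\t364\n",
    ]
    for hit in parse_blast_tabular(sample):
        print(hit.query_id, hit.subject_id, hit.perc_identity, hit.e_value)
```

With typed fields, filtering becomes a plain comparison, for example `hit.e_value < 1e-5`.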
BLAT Spec
matches int unsigned , # Number of bases that match that aren't repeats
misMatches int unsigned , # Number of bases that don't match
repMatches int unsigned , # Number of bases that match but are part of repeats
nCount int unsigned , # Number of 'N' bases
qNumInsert int unsigned , # Number of inserts in query
qBaseInsert int unsigned , # Number of bases inserted in query
tNumInsert int unsigned , # Number of inserts in target
tBaseInsert int unsigned , # Number of bases inserted in target
strand char(2) , # + or - for query strand, optionally followed by + or - for target strand
qName varchar(255) , # Query sequence name
qSize int unsigned , # Query sequence size
qStart int unsigned , # Alignment start position in query
qEnd int unsigned , # Alignment end position in query
tName varchar(255) , # Target sequence name
tSize int unsigned , # Target sequence size
tStart int unsigned , # Alignment start position in target
tEnd int unsigned , # Alignment end position in target
blockCount int unsigned , # Number of blocks in alignment. A block contains no gaps.
blockSizes longblob , # Size of each block in a comma separated list
qStarts longblob , # Start of each block in query in a comma separated list
tStarts longblob , # Start of each block in target in a comma separated list
-Python
# Assumes a header-less PSL file (e.g. produced with blat -noHead)
for line in open("myfile.blat"):
    (matches, misMatches, repMatches, nCount, qNumInsert, qBaseInsert, tNumInsert, tBaseInsert, strand, qName, qSize, qStart, qEnd, tName, tSize, tStart, tEnd, blockCount, blockSizes, qStarts, tStarts) = line.rstrip("\n").split("\t")
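For the BLAT side, a slightly fuller sketch might turn the 21 columns into a dict with integer fields and real lists for blockSizes, qStarts and tStarts. The helper names (`PSL_FIELDS`, `parse_psl_line`) are hypothetical, the input is assumed to be a header-less data line, and the sample line is invented for illustration. The block lists in PSL end with a trailing comma, so empty pieces are dropped when splitting.

```python
from typing import Dict, List, Union

# Column names in the order given in the BLAT spec above.
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]
_STRING_FIELDS = {"strand", "qName", "tName"}
_LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}

PslValue = Union[int, str, List[int]]


def _int_list(text: str) -> List[int]:
    """Convert '60,40,' into [60, 40], ignoring the trailing comma."""
    return [int(item) for item in text.split(",") if item]


def parse_psl_line(line: str) -> Dict[str, PslValue]:
    """Parse one header-less PSL data line into a dict keyed by field name."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(PSL_FIELDS):
        raise ValueError(f"expected 21 columns, got {len(values)}")
    record: Dict[str, PslValue] = {}
    for name, value in zip(PSL_FIELDS, values):
        if name in _LIST_FIELDS:
            record[name] = _int_list(value)
        elif name in _STRING_FIELDS:
            record[name] = value
        else:
            record[name] = int(value)
    return record


if __name__ == "__main__":
    # Invented sample line: two blocks separated by a 50-base gap in the target.
    sample = (
        "95\t5\t0\t0\t0\t0\t1\t50\t+\tread1\t100\t0\t100\t"
        "chr1\t1000\t200\t350\t2\t60,40,\t0,60,\t200,310,\n"
    )
    rec = parse_psl_line(sample)
    print(rec["qName"], rec["tName"], rec["blockSizes"], rec["tStarts"])
```

A quick sanity check on a parsed record is that `len(rec["blockSizes"])` equals `rec["blockCount"]`.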