# HG changeset patch
# User jpayne
# Date 1518039462 18000
# Node ID c36a89d3a35198ce2b1b19542beab481957a688a
planemo upload

diff -r 000000000000 -r c36a89d3a351 quast-select.xml
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/quast-select.xml	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,45 @@
+
+ assembly based on a combined QUAST table
+
+ python
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
\ No newline at end of file
diff -r 000000000000 -r c36a89d3a351 quast_select.py
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/quast_select.py	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,27 @@
+from __future__ import print_function
+
+import csv
+from operator import lt, gt
+import sys
+
+def pick(rows, key, reverse=False):
+    sorted_rows = sorted(rows, key=lambda r:r[key], reverse=reverse)
+    return sorted_rows[0]['Assembly']
+
+def int_or_str(token):
+    try:
+        return int(token)
+    except ValueError:
+        return str(token)
+
+if __name__ == '__main__':
+    path, compared = sys.argv[1:]
+    #QUAST tables have sample info as columns, so we need to transpose the table
+    rows = list(zip(*csv.reader(open(path, "rU"), delimiter='\t', dialect='excel')))
+    hed = rows.pop(0)
+    dict_rows = [{h : int_or_str(r[i]) for i, h in enumerate(hed)} for r in rows]
+    if "#" in compared:
+        reverse = False #if it's a count, we want the fewest
+    else:
+        reverse = True #otherwise it's a length and we want the longest
+    print(pick(dict_rows, compared, reverse))
\ No newline at end of file
diff -r 000000000000 -r c36a89d3a351 test-data/combined_table.tsv
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/combined_table.tsv	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,14 @@
+Assembly	sample1	sample2	sample3	sample4
+# N's per 100 kbp	0.00	0.00	0.00	0.00
+# contigs	15	26	25	18
+# contigs (>= 0 bp)	15	26	25	18
+# contigs (>= 1000 bp)	12	17	20	13
+GC (%)	49.67	50.22	49.81	49.62
+L50	4	8	8	5
+L75	7	14	15	9
+Largest contig	9036	4811	5055	5138
+N50	4026	1934	1668	3114
+N75	3428	1371	1217	1833
+Total length	42889	42188	41537	41859
+Total length (>= 0 bp)	42889	42188	41537	41859
+Total length (>= 1000 bp)	40450	35624	37562	37621
diff -r 000000000000 -r c36a89d3a351 test-data/sample1.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sample1.fasta	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,2 @@
+>sample1
+AAAA
\ No newline at end of file
diff -r 000000000000 -r c36a89d3a351 test-data/sample2.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sample2.fasta	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,2 @@
+>sample2
+TTTT
\ No newline at end of file
diff -r 000000000000 -r c36a89d3a351 test-data/sample3.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sample3.fasta	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,2 @@
+>sample3
+GGGG
\ No newline at end of file
diff -r 000000000000 -r c36a89d3a351 test-data/sample4.fasta
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/sample4.fasta	Wed Feb 07 16:37:42 2018 -0500
@@ -0,0 +1,2 @@
+>sample4
+CCCC
\ No newline at end of file
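
Reviewer note: the selection logic in quast_select.py (transpose the QUAST table, then take the smallest value for metrics whose name contains "#" and the largest otherwise) can be exercised on two rows of test-data/combined_table.tsv. The sketch below is a Python 3 illustration and not the code in this changeset. The names transpose_quast and to_number are hypothetical, and parsing cells as float instead of int is an assumption made here so that decimal metrics such as "GC (%)" compare numerically.

```python
import csv
import io


def transpose_quast(tsv_text):
    """Turn a metric-per-row QUAST table into one dict per assembly.

    Hypothetical helper: QUAST puts assemblies in columns, so the table
    is transposed before building the dicts.
    """
    columns = list(zip(*csv.reader(io.StringIO(tsv_text), delimiter="\t")))
    header = columns[0]  # metric names, beginning with "Assembly"
    return [dict(zip(header, column)) for column in columns[1:]]


def to_number(token):
    """Parse a cell as float when possible (assumption: the patch uses int)."""
    try:
        return float(token)
    except ValueError:
        return token


def pick(rows, metric):
    """Counts (names containing '#') favour the minimum, other metrics the maximum."""
    chooser = min if "#" in metric else max
    return chooser(rows, key=lambda row: to_number(row[metric]))["Assembly"]


# Two metric rows copied from test-data/combined_table.tsv
TABLE = (
    "Assembly\tsample1\tsample2\tsample3\tsample4\n"
    "# contigs\t15\t26\t25\t18\n"
    "N50\t4026\t1934\t1668\t3114\n"
)

rows = transpose_quast(TABLE)
print(pick(rows, "N50"))        # sample1 (largest N50, 4026)
print(pick(rows, "# contigs"))  # sample1 (fewest contigs, 15)
```

On ties, min and max return the first matching assembly in column order, which matches what a stable sort followed by taking element 0 would give.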