csp2: CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py annotate

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 17:55:14 -0400
parents
children

rev	line source
jpayne@69	1 # Copyright 2002 by Jeffrey Chang.
jpayne@69	2 # Copyright 2016, 2019, 2020 by Markus Piotrowski.
jpayne@69	3 # All rights reserved.
jpayne@69	4 #
jpayne@69	5 # This file is part of the Biopython distribution and governed by your
jpayne@69	6 # choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
jpayne@69	7 # Please see the LICENSE file that should have been included as part of this
jpayne@69	8 # package.
jpayne@69	9
jpayne@69	10 """Pairwise sequence alignment using a dynamic programming algorithm.
jpayne@69	11
jpayne@69	12 This provides functions to get global and local alignments between two
jpayne@69	13 sequences. A global alignment finds the best concordance between all
jpayne@69	14 characters in two sequences. A local alignment finds just the
jpayne@69	15 subsequences that align the best. Local alignments must have a positive
jpayne@69	16 score to be reported and they will not be extended for 'zero counting'
jpayne@69	17 matches. This means a local alignment will always start and end with
jpayne@69	18 a positive counting match.
jpayne@69	19
jpayne@69	20 When doing alignments, you can specify the match score and gap
jpayne@69	21 penalties. The match score indicates the compatibility between an
jpayne@69	22 alignment of two characters in the sequences. Highly compatible
jpayne@69	23 characters should be given positive scores, and incompatible ones
jpayne@69	24 should be given negative scores or 0. The gap penalties should be
jpayne@69	25 negative.
jpayne@69	26
jpayne@69	27 The names of the alignment functions in this module follow the
jpayne@69	28 convention
jpayne@69	29 <alignment type>XX
jpayne@69	30 where <alignment type> is either "global" or "local" and XX is a 2
jpayne@69	31 character code indicating the parameters it takes. The first
jpayne@69	32 character indicates the parameters for matches (and mismatches), and
jpayne@69	33 the second indicates the parameters for gap penalties.
jpayne@69	34
jpayne@69	35 The match parameters are::
jpayne@69	36
jpayne@69	37 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@69	38 x No parameters. Identical characters have score of 1, otherwise 0.
jpayne@69	39 m A match score is the score of identical chars, otherwise mismatch
jpayne@69	40 score. Keywords ``match``, ``mismatch``.
jpayne@69	41 d A dictionary returns the score of any pair of characters.
jpayne@69	42 Keyword ``match_dict``.
jpayne@69	43 c A callback function returns scores. Keyword ``match_fn``.
jpayne@69	44
jpayne@69	45 The gap penalty parameters are::
jpayne@69	46
jpayne@69	47 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@69	48 x No gap penalties.
jpayne@69	49 s Same open and extend gap penalties for both sequences.
jpayne@69	50 Keywords ``open``, ``extend``.
jpayne@69	51 d The sequences have different open and extend gap penalties.
jpayne@69	52 Keywords ``openA``, ``extendA``, ``openB``, ``extendB``.
jpayne@69	53 c A callback function returns the gap penalties.
jpayne@69	54 Keywords ``gap_A_fn``, ``gap_B_fn``.
jpayne@69	55
jpayne@69	56 All the different alignment functions are contained in an object
jpayne@69	57 ``align``. For example:
jpayne@69	58
jpayne@69	59 >>> from Bio import pairwise2
jpayne@69	60 >>> alignments = pairwise2.align.globalxx("ACCGT", "ACG")
jpayne@69	61
jpayne@69	62 For better readability, the required arguments can be used with optional keywords:
jpayne@69	63
jpayne@69	64 >>> alignments = pairwise2.align.globalxx(sequenceA="ACCGT", sequenceB="ACG")
jpayne@69	65
jpayne@69	66 The result is a list of the alignments between the two strings. Each alignment
jpayne@69	67 is a named tuple consisting of the two aligned sequences, the score and the
jpayne@69	68 start and end positions of the alignment:
jpayne@69	69
jpayne@69	70 >>> print(alignments)
jpayne@69	71 [Alignment(seqA='ACCGT', seqB='A-CG-', score=3.0, start=0, end=5), ...
jpayne@69	72
jpayne@69	73 You can access each element of an alignment by index or name:
jpayne@69	74
jpayne@69	75 >>> alignments[0][2]
jpayne@69	76 3.0
jpayne@69	77 >>> alignments[0].score
jpayne@69	78 3.0
jpayne@69	79
jpayne@69	80 For a nice printout of an alignment, use the ``format_alignment`` method of
jpayne@69	81 the module:
jpayne@69	82
jpayne@69	83 >>> from Bio.pairwise2 import format_alignment
jpayne@69	84 >>> print(format_alignment(*alignments[0]))
jpayne@69	85 ACCGT
jpayne@69	86 \| \|\|
jpayne@69	87 A-CG-
jpayne@69	88 Score=3
jpayne@69	89 <BLANKLINE>
jpayne@69	90
jpayne@69	91 All alignment functions have the following arguments:
jpayne@69	92
jpayne@69	93 - Two sequences: strings, Biopython sequence objects or lists.
jpayne@69	94 Lists are useful for supplying sequences which contain residues that are
jpayne@69	95 encoded by more than one letter.
jpayne@69	96
jpayne@69	97 - ``penalize_extend_when_opening``: boolean (default: False).
jpayne@69	98 Whether to count an extension penalty when opening a gap. If false, a gap of
jpayne@69	99 1 is only penalized an "open" penalty, otherwise it is penalized
jpayne@69	100 "open+extend".
jpayne@69	101
jpayne@69	102 - ``penalize_end_gaps``: boolean.
jpayne@69	103 Whether to count the gaps at the ends of an alignment. By default, they are
jpayne@69	104 counted for global alignments but not for local ones. Setting
jpayne@69	105 ``penalize_end_gaps`` to (boolean, boolean) allows you to specify for the
jpayne@69	106 two sequences separately whether gaps at the end of the alignment should be
jpayne@69	107 counted.
jpayne@69	108
jpayne@69	109 - ``gap_char``: string (default: ``'-'``).
jpayne@69	110 Which character to use as a gap character in the alignment returned. If your
jpayne@69	111 input sequences are lists, you must change this to ``['-']``.
jpayne@69	112
jpayne@69	113 - ``force_generic``: boolean (default: False).
jpayne@69	114 Always use the generic, non-cached, dynamic programming function (slow!).
jpayne@69	115 For debugging.
jpayne@69	116
jpayne@69	117 - ``score_only``: boolean (default: False).
jpayne@69	118 Only get the best score, don't recover any alignments. The return value of
jpayne@69	119 the function is the score. Faster and uses less memory.
jpayne@69	120
jpayne@69	121 - ``one_alignment_only``: boolean (default: False).
jpayne@69	122 Only recover one alignment.
jpayne@69	123
jpayne@69	124 The other parameters of the alignment function depend on the function called.
jpayne@69	125 Some examples:
jpayne@69	126
jpayne@69	127 - Find the best global alignment between the two sequences. Identical
jpayne@69	128 characters are given 1 point. No points are deducted for mismatches or gaps.
jpayne@69	129
jpayne@69	130 >>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
jpayne@69	131 ... print(format_alignment(*a))
jpayne@69	132 ACCGT
jpayne@69	133 \| \|\|
jpayne@69	134 A-CG-
jpayne@69	135 Score=3
jpayne@69	136 <BLANKLINE>
jpayne@69	137 ACCGT
jpayne@69	138 \|\| \|
jpayne@69	139 AC-G-
jpayne@69	140 Score=3
jpayne@69	141 <BLANKLINE>
jpayne@69	142
jpayne@69	143 - Same thing as before, but with a local alignment. Note that
jpayne@69	144 ``format_alignment`` will only show the aligned parts of the sequences,
jpayne@69	145 together with the starting positions.
jpayne@69	146
jpayne@69	147 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@69	148 ... print(format_alignment(*a))
jpayne@69	149 1 ACCG
jpayne@69	150 \| \|\|
jpayne@69	151 1 A-CG
jpayne@69	152 Score=3
jpayne@69	153 <BLANKLINE>
jpayne@69	154 1 ACCG
jpayne@69	155 \|\| \|
jpayne@69	156 1 AC-G
jpayne@69	157 Score=3
jpayne@69	158 <BLANKLINE>
jpayne@69	159
jpayne@69	160 To restore the 'historic' behaviour of ``format_alignemt``, i.e., showing
jpayne@69	161 also the un-aligned parts of both sequences, use the new keyword parameter
jpayne@69	162 ``full_sequences``:
jpayne@69	163
jpayne@69	164 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@69	165 ... print(format_alignment(*a, full_sequences=True))
jpayne@69	166 ACCGT
jpayne@69	167 \| \|\|
jpayne@69	168 A-CG-
jpayne@69	169 Score=3
jpayne@69	170 <BLANKLINE>
jpayne@69	171 ACCGT
jpayne@69	172 \|\| \|
jpayne@69	173 AC-G-
jpayne@69	174 Score=3
jpayne@69	175 <BLANKLINE>
jpayne@69	176
jpayne@69	177
jpayne@69	178 - Do a global alignment. Identical characters are given 2 points, 1 point is
jpayne@69	179 deducted for each non-identical character. Don't penalize gaps.
jpayne@69	180
jpayne@69	181 >>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
jpayne@69	182 ... print(format_alignment(*a))
jpayne@69	183 ACCGT
jpayne@69	184 \| \|\|
jpayne@69	185 A-CG-
jpayne@69	186 Score=6
jpayne@69	187 <BLANKLINE>
jpayne@69	188 ACCGT
jpayne@69	189 \|\| \|
jpayne@69	190 AC-G-
jpayne@69	191 Score=6
jpayne@69	192 <BLANKLINE>
jpayne@69	193
jpayne@69	194 - Same as above, except now 0.5 points are deducted when opening a gap, and
jpayne@69	195 0.1 points are deducted when extending it.
jpayne@69	196
jpayne@69	197 >>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
jpayne@69	198 ... print(format_alignment(*a))
jpayne@69	199 ACCGT
jpayne@69	200 \| \|\|
jpayne@69	201 A-CG-
jpayne@69	202 Score=5
jpayne@69	203 <BLANKLINE>
jpayne@69	204 ACCGT
jpayne@69	205 \|\| \|
jpayne@69	206 AC-G-
jpayne@69	207 Score=5
jpayne@69	208 <BLANKLINE>
jpayne@69	209
jpayne@69	210 - Note that you can use keywords to increase the readability, e.g.:
jpayne@69	211
jpayne@69	212 >>> a = pairwise2.align.globalms("ACGT", "ACG", match=2, mismatch=-1, open=-.5,
jpayne@69	213 ... extend=-.1)
jpayne@69	214
jpayne@69	215 - Depending on the penalties, a gap in one sequence may be followed by a gap in
jpayne@69	216 the other sequence.If you don't like this behaviour, increase the gap-open
jpayne@69	217 penalty:
jpayne@69	218
jpayne@69	219 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -1, -.1):
jpayne@69	220 ... print(format_alignment(*a))
jpayne@69	221 A-
jpayne@69	222 <BLANKLINE>
jpayne@69	223 -T
jpayne@69	224 Score=-2
jpayne@69	225 <BLANKLINE>
jpayne@69	226 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -3, -.1):
jpayne@69	227 ... print(format_alignment(*a))
jpayne@69	228 A
jpayne@69	229 .
jpayne@69	230 T
jpayne@69	231 Score=-4
jpayne@69	232 <BLANKLINE>
jpayne@69	233
jpayne@69	234 - The alignment function can also use known matrices already included in
jpayne@69	235 Biopython (in ``Bio.Align.substitution_matrices``):
jpayne@69	236
jpayne@69	237 >>> from Bio.Align import substitution_matrices
jpayne@69	238 >>> matrix = substitution_matrices.load("BLOSUM62")
jpayne@69	239 >>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
jpayne@69	240 ... print(format_alignment(*a))
jpayne@69	241 KEVLA
jpayne@69	242 \|\|\|
jpayne@69	243 -EVL-
jpayne@69	244 Score=13
jpayne@69	245 <BLANKLINE>
jpayne@69	246
jpayne@69	247 - With the parameter ``c`` you can define your own match- and gap functions.
jpayne@69	248 E.g. to define an affine logarithmic gap function and using it:
jpayne@69	249
jpayne@69	250 >>> from math import log
jpayne@69	251 >>> def gap_function(x, y): # x is gap position in seq, y is gap length
jpayne@69	252 ... if y == 0: # No gap
jpayne@69	253 ... return 0
jpayne@69	254 ... elif y == 1: # Gap open penalty
jpayne@69	255 ... return -2
jpayne@69	256 ... return - (2 + y/4.0 + log(y)/2.0)
jpayne@69	257 ...
jpayne@69	258 >>> alignment = pairwise2.align.globalmc("ACCCCCGT", "ACG", 5, -4,
jpayne@69	259 ... gap_function, gap_function)
jpayne@69	260
jpayne@69	261 You can define different gap functions for each sequence.
jpayne@69	262 Self-defined match functions must take the two residues to be compared and
jpayne@69	263 return a score.
jpayne@69	264
jpayne@69	265 To see a description of the parameters for a function, please look at
jpayne@69	266 the docstring for the function via the help function, e.g.
jpayne@69	267 type ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@69	268
jpayne@69	269 """ # noqa: W291
jpayne@69	270
jpayne@69	271 import warnings
jpayne@69	272 from collections import namedtuple
jpayne@69	273
jpayne@69	274 from Bio import BiopythonWarning
jpayne@69	275 from Bio import BiopythonDeprecationWarning
jpayne@69	276 from Bio.Align import substitution_matrices
jpayne@69	277
jpayne@69	278 warnings.warn(
jpayne@69	279 "Bio.pairwise2 has been deprecated, and we intend to remove it in a "
jpayne@69	280 "future release of Biopython. As an alternative, please consider using "
jpayne@69	281 "Bio.Align.PairwiseAligner as a replacement, and contact the "
jpayne@69	282 "Biopython developers if you still need the Bio.pairwise2 module.",
jpayne@69	283 BiopythonDeprecationWarning,
jpayne@69	284 )
jpayne@69	285
jpayne@69	286
jpayne@69	287 MAX_ALIGNMENTS = 1000 # maximum alignments recovered in traceback
jpayne@69	288
jpayne@69	289 Alignment = namedtuple("Alignment", ("seqA, seqB, score, start, end"))
jpayne@69	290
jpayne@69	291
jpayne@69	292 class align:
jpayne@69	293 """Provide functions that do alignments.
jpayne@69	294
jpayne@69	295 Alignment functions are called as:
jpayne@69	296
jpayne@69	297 pairwise2.align.globalXX
jpayne@69	298
jpayne@69	299 or
jpayne@69	300
jpayne@69	301 pairwise2.align.localXX
jpayne@69	302
jpayne@69	303 Where XX is a 2 character code indicating the match/mismatch parameters
jpayne@69	304 (first character, either x, m, d or c) and the gap penalty parameters
jpayne@69	305 (second character, either x, s, d, or c).
jpayne@69	306
jpayne@69	307 For a detailed description read the main module's docstring (e.g.,
jpayne@69	308 type ``help(pairwise2)``).
jpayne@69	309 To see a description of the parameters for a function, please
jpayne@69	310 look at the docstring for the function, e.g. type
jpayne@69	311 ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@69	312 """
jpayne@69	313
jpayne@69	314 class alignment_function:
jpayne@69	315 """Callable class which impersonates an alignment function.
jpayne@69	316
jpayne@69	317 The constructor takes the name of the function. This class
jpayne@69	318 will decode the name of the function to figure out how to
jpayne@69	319 interpret the parameters.
jpayne@69	320 """
jpayne@69	321
jpayne@69	322 # match code -> tuple of (parameters, docstring)
jpayne@69	323 match2args = {
jpayne@69	324 "x": ([], ""),
jpayne@69	325 "m": (
jpayne@69	326 ["match", "mismatch"],
jpayne@69	327 "match is the score to given to identical characters.\n"
jpayne@69	328 "mismatch is the score given to non-identical ones.",
jpayne@69	329 ),
jpayne@69	330 "d": (
jpayne@69	331 ["match_dict"],
jpayne@69	332 "match_dict is a dictionary where the keys are tuples\n"
jpayne@69	333 "of pairs of characters and the values are the scores,\n"
jpayne@69	334 "e.g. ('A', 'C') : 2.5.",
jpayne@69	335 ),
jpayne@69	336 "c": (
jpayne@69	337 ["match_fn"],
jpayne@69	338 "match_fn is a callback function that takes two "
jpayne@69	339 "characters and returns the score between them.",
jpayne@69	340 ),
jpayne@69	341 }
jpayne@69	342 # penalty code -> tuple of (parameters, docstring)
jpayne@69	343 penalty2args = {
jpayne@69	344 "x": ([], ""),
jpayne@69	345 "s": (
jpayne@69	346 ["open", "extend"],
jpayne@69	347 "open and extend are the gap penalties when a gap is\n"
jpayne@69	348 "opened and extended. They should be negative.",
jpayne@69	349 ),
jpayne@69	350 "d": (
jpayne@69	351 ["openA", "extendA", "openB", "extendB"],
jpayne@69	352 "openA and extendA are the gap penalties for sequenceA,\n"
jpayne@69	353 "and openB and extendB for sequenceB. The penalties\n"
jpayne@69	354 "should be negative.",
jpayne@69	355 ),
jpayne@69	356 "c": (
jpayne@69	357 ["gap_A_fn", "gap_B_fn"],
jpayne@69	358 "gap_A_fn and gap_B_fn are callback functions that takes\n"
jpayne@69	359 "(1) the index where the gap is opened, and (2) the length\n"
jpayne@69	360 "of the gap. They should return a gap penalty.",
jpayne@69	361 ),
jpayne@69	362 }
jpayne@69	363
jpayne@69	364 def __init__(self, name):
jpayne@69	365 """Check to make sure the name of the function is reasonable."""
jpayne@69	366 if name.startswith("global"):
jpayne@69	367 if len(name) != 8:
jpayne@69	368 raise AttributeError("function should be globalXX")
jpayne@69	369 elif name.startswith("local"):
jpayne@69	370 if len(name) != 7:
jpayne@69	371 raise AttributeError("function should be localXX")
jpayne@69	372 else:
jpayne@69	373 raise AttributeError(name)
jpayne@69	374 align_type, match_type, penalty_type = name[:-2], name[-2], name[-1]
jpayne@69	375 try:
jpayne@69	376 match_args, match_doc = self.match2args[match_type]
jpayne@69	377 except KeyError:
jpayne@69	378 raise AttributeError(f"unknown match type {match_type!r}")
jpayne@69	379 try:
jpayne@69	380 penalty_args, penalty_doc = self.penalty2args[penalty_type]
jpayne@69	381 except KeyError:
jpayne@69	382 raise AttributeError(f"unknown penalty type {penalty_type!r}")
jpayne@69	383
jpayne@69	384 # Now get the names of the parameters to this function.
jpayne@69	385 param_names = ["sequenceA", "sequenceB"]
jpayne@69	386 param_names.extend(match_args)
jpayne@69	387 param_names.extend(penalty_args)
jpayne@69	388 self.function_name = name
jpayne@69	389 self.align_type = align_type
jpayne@69	390 self.param_names = param_names
jpayne@69	391
jpayne@69	392 self.__name__ = self.function_name
jpayne@69	393 # Set the doc string.
jpayne@69	394 doc = f"{self.__name__}({', '.join(self.param_names)}) -> alignments\n"
jpayne@69	395 doc += """\
jpayne@69	396 \nThe following parameters can also be used with optional
jpayne@69	397 keywords of the same name.\n\n
jpayne@69	398 sequenceA and sequenceB must be of the same type, either
jpayne@69	399 strings, lists or Biopython sequence objects.\n
jpayne@69	400 """
jpayne@69	401 if match_doc:
jpayne@69	402 doc += f"\n{match_doc}\n"
jpayne@69	403 if penalty_doc:
jpayne@69	404 doc += f"\n{penalty_doc}\n"
jpayne@69	405 doc += """\
jpayne@69	406 \nalignments is a list of named tuples (seqA, seqB, score,
jpayne@69	407 begin, end). seqA and seqB are strings showing the alignment
jpayne@69	408 between the sequences. score is the score of the alignment.
jpayne@69	409 begin and end are indexes of seqA and seqB that indicate
jpayne@69	410 where the alignment occurs.
jpayne@69	411 """
jpayne@69	412 self.__doc__ = doc
jpayne@69	413
jpayne@69	414 def decode(self, args, *keywds):
jpayne@69	415 """Decode the arguments for the _align function.
jpayne@69	416
jpayne@69	417 keywds will get passed to it, so translate the arguments
jpayne@69	418 to this function into forms appropriate for _align.
jpayne@69	419 """
jpayne@69	420 keywds = keywds.copy()
jpayne@69	421
jpayne@69	422 # Replace possible "keywords" with arguments:
jpayne@69	423 args += (len(self.param_names) - len(args)) * (None,)
jpayne@69	424 for key in keywds.copy():
jpayne@69	425 if key in self.param_names:
jpayne@69	426 _index = self.param_names.index(key)
jpayne@69	427 args = args[:_index] + (keywds[key],) + args[_index:]
jpayne@69	428 del keywds[key]
jpayne@69	429 args = tuple(arg for arg in args if arg is not None)
jpayne@69	430
jpayne@69	431 if len(args) != len(self.param_names):
jpayne@69	432 raise TypeError(
jpayne@69	433 "%s takes exactly %d argument (%d given)"
jpayne@69	434 % (self.function_name, len(self.param_names), len(args))
jpayne@69	435 )
jpayne@69	436
jpayne@69	437 i = 0
jpayne@69	438 while i < len(self.param_names):
jpayne@69	439 if self.param_names[i] in [
jpayne@69	440 "sequenceA",
jpayne@69	441 "sequenceB",
jpayne@69	442 "gap_A_fn",
jpayne@69	443 "gap_B_fn",
jpayne@69	444 "match_fn",
jpayne@69	445 ]:
jpayne@69	446 keywds[self.param_names[i]] = args[i]
jpayne@69	447 i += 1
jpayne@69	448 elif self.param_names[i] == "match":
jpayne@69	449 assert self.param_names[i + 1] == "mismatch"
jpayne@69	450 match, mismatch = args[i], args[i + 1]
jpayne@69	451 keywds["match_fn"] = identity_match(match, mismatch)
jpayne@69	452 i += 2
jpayne@69	453 elif self.param_names[i] == "match_dict":
jpayne@69	454 keywds["match_fn"] = dictionary_match(args[i])
jpayne@69	455 i += 1
jpayne@69	456 elif self.param_names[i] == "open":
jpayne@69	457 assert self.param_names[i + 1] == "extend"
jpayne@69	458 open, extend = args[i], args[i + 1]
jpayne@69	459 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69	460 keywds["gap_A_fn"] = affine_penalty(open, extend, pe)
jpayne@69	461 keywds["gap_B_fn"] = affine_penalty(open, extend, pe)
jpayne@69	462 i += 2
jpayne@69	463 elif self.param_names[i] == "openA":
jpayne@69	464 assert self.param_names[i + 3] == "extendB"
jpayne@69	465 openA, extendA, openB, extendB = args[i : i + 4]
jpayne@69	466 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69	467 keywds["gap_A_fn"] = affine_penalty(openA, extendA, pe)
jpayne@69	468 keywds["gap_B_fn"] = affine_penalty(openB, extendB, pe)
jpayne@69	469 i += 4
jpayne@69	470 else:
jpayne@69	471 raise ValueError(f"unknown parameter {self.param_names[i]!r}")
jpayne@69	472
jpayne@69	473 # Here are the default parameters for _align. Assign
jpayne@69	474 # these to keywds, unless already specified.
jpayne@69	475 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69	476 default_params = [
jpayne@69	477 ("match_fn", identity_match(1, 0)),
jpayne@69	478 ("gap_A_fn", affine_penalty(0, 0, pe)),
jpayne@69	479 ("gap_B_fn", affine_penalty(0, 0, pe)),
jpayne@69	480 ("penalize_extend_when_opening", 0),
jpayne@69	481 ("penalize_end_gaps", self.align_type == "global"),
jpayne@69	482 ("align_globally", self.align_type == "global"),
jpayne@69	483 ("gap_char", "-"),
jpayne@69	484 ("force_generic", 0),
jpayne@69	485 ("score_only", 0),
jpayne@69	486 ("one_alignment_only", 0),
jpayne@69	487 ]
jpayne@69	488 for name, default in default_params:
jpayne@69	489 keywds[name] = keywds.get(name, default)
jpayne@69	490 value = keywds["penalize_end_gaps"]
jpayne@69	491 try:
jpayne@69	492 n = len(value)
jpayne@69	493 except TypeError:
jpayne@69	494 keywds["penalize_end_gaps"] = tuple([value] * 2)
jpayne@69	495 else:
jpayne@69	496 assert n == 2
jpayne@69	497 return keywds
jpayne@69	498
jpayne@69	499 def __call__(self, args, *keywds):
jpayne@69	500 """Call the alignment instance already created."""
jpayne@69	501 keywds = self.decode(args, *keywds)
jpayne@69	502 return _align(**keywds)
jpayne@69	503
jpayne@69	504 def __getattr__(self, attr):
jpayne@69	505 """Call alignment_function() to check and decode the attributes."""
jpayne@69	506 # The following 'magic' is needed to rewrite the class docstring
jpayne@69	507 # dynamically:
jpayne@69	508 wrapper = self.alignment_function(attr)
jpayne@69	509 wrapper_type = type(wrapper)
jpayne@69	510 wrapper_dict = wrapper_type.__dict__.copy()
jpayne@69	511 wrapper_dict["__doc__"] = wrapper.__doc__
jpayne@69	512 new_alignment_function = type("alignment_function", (object,), wrapper_dict)
jpayne@69	513
jpayne@69	514 return new_alignment_function(attr)
jpayne@69	515
jpayne@69	516
jpayne@69	517 align = align()
jpayne@69	518
jpayne@69	519
jpayne@69	520 def _align(
jpayne@69	521 sequenceA,
jpayne@69	522 sequenceB,
jpayne@69	523 match_fn,
jpayne@69	524 gap_A_fn,
jpayne@69	525 gap_B_fn,
jpayne@69	526 penalize_extend_when_opening,
jpayne@69	527 penalize_end_gaps,
jpayne@69	528 align_globally,
jpayne@69	529 gap_char,
jpayne@69	530 force_generic,
jpayne@69	531 score_only,
jpayne@69	532 one_alignment_only,
jpayne@69	533 ):
jpayne@69	534 """Return optimal alignments between two sequences (PRIVATE).
jpayne@69	535
jpayne@69	536 This method either returns a list of optimal alignments (with the same
jpayne@69	537 score) or just the optimal score.
jpayne@69	538 """
jpayne@69	539 if not sequenceA or not sequenceB:
jpayne@69	540 return []
jpayne@69	541 try:
jpayne@69	542 sequenceA + gap_char
jpayne@69	543 sequenceB + gap_char
jpayne@69	544 except TypeError:
jpayne@69	545 raise TypeError(
jpayne@69	546 "both sequences must be of the same type, either "
jpayne@69	547 "string/sequence object or list. Gap character must "
jpayne@69	548 "fit the sequence type (string or list)"
jpayne@69	549 )
jpayne@69	550
jpayne@69	551 if not isinstance(sequenceA, list):
jpayne@69	552 sequenceA = str(sequenceA)
jpayne@69	553 if not isinstance(sequenceB, list):
jpayne@69	554 sequenceB = str(sequenceB)
jpayne@69	555 if not align_globally and (penalize_end_gaps[0] or penalize_end_gaps[1]):
jpayne@69	556 warnings.warn(
jpayne@69	557 '"penalize_end_gaps" should not be used in local '
jpayne@69	558 "alignments. The resulting score may be wrong.",
jpayne@69	559 BiopythonWarning,
jpayne@69	560 )
jpayne@69	561
jpayne@69	562 if (
jpayne@69	563 (not force_generic)
jpayne@69	564 and isinstance(gap_A_fn, affine_penalty)
jpayne@69	565 and isinstance(gap_B_fn, affine_penalty)
jpayne@69	566 ):
jpayne@69	567 open_A, extend_A = gap_A_fn.open, gap_A_fn.extend
jpayne@69	568 open_B, extend_B = gap_B_fn.open, gap_B_fn.extend
jpayne@69	569 matrices = _make_score_matrix_fast(
jpayne@69	570 sequenceA,
jpayne@69	571 sequenceB,
jpayne@69	572 match_fn,
jpayne@69	573 open_A,
jpayne@69	574 extend_A,
jpayne@69	575 open_B,
jpayne@69	576 extend_B,
jpayne@69	577 penalize_extend_when_opening,
jpayne@69	578 penalize_end_gaps,
jpayne@69	579 align_globally,
jpayne@69	580 score_only,
jpayne@69	581 )
jpayne@69	582 else:
jpayne@69	583 matrices = _make_score_matrix_generic(
jpayne@69	584 sequenceA,
jpayne@69	585 sequenceB,
jpayne@69	586 match_fn,
jpayne@69	587 gap_A_fn,
jpayne@69	588 gap_B_fn,
jpayne@69	589 penalize_end_gaps,
jpayne@69	590 align_globally,
jpayne@69	591 score_only,
jpayne@69	592 )
jpayne@69	593
jpayne@69	594 score_matrix, trace_matrix, best_score = matrices
jpayne@69	595
jpayne@69	596 # print("SCORE %s" % print_matrix(score_matrix))
jpayne@69	597 # print("TRACEBACK %s" % print_matrix(trace_matrix))
jpayne@69	598
jpayne@69	599 # If they only want the score, then return it.
jpayne@69	600 if score_only:
jpayne@69	601 return best_score
jpayne@69	602
jpayne@69	603 starts = _find_start(score_matrix, best_score, align_globally)
jpayne@69	604
jpayne@69	605 # Recover the alignments and return them.
jpayne@69	606 alignments = _recover_alignments(
jpayne@69	607 sequenceA,
jpayne@69	608 sequenceB,
jpayne@69	609 starts,
jpayne@69	610 best_score,
jpayne@69	611 score_matrix,
jpayne@69	612 trace_matrix,
jpayne@69	613 align_globally,
jpayne@69	614 gap_char,
jpayne@69	615 one_alignment_only,
jpayne@69	616 gap_A_fn,
jpayne@69	617 gap_B_fn,
jpayne@69	618 )
jpayne@69	619 if not alignments:
jpayne@69	620 # This may happen, see recover_alignments for explanation
jpayne@69	621 score_matrix, trace_matrix = _reverse_matrices(score_matrix, trace_matrix)
jpayne@69	622 starts = [(z, (y, x)) for z, (x, y) in starts]
jpayne@69	623 alignments = _recover_alignments(
jpayne@69	624 sequenceB,
jpayne@69	625 sequenceA,
jpayne@69	626 starts,
jpayne@69	627 best_score,
jpayne@69	628 score_matrix,
jpayne@69	629 trace_matrix,
jpayne@69	630 align_globally,
jpayne@69	631 gap_char,
jpayne@69	632 one_alignment_only,
jpayne@69	633 gap_B_fn,
jpayne@69	634 gap_A_fn,
jpayne@69	635 reverse=True,
jpayne@69	636 )
jpayne@69	637 return alignments
jpayne@69	638
jpayne@69	639
jpayne@69	640 def _make_score_matrix_generic(
jpayne@69	641 sequenceA,
jpayne@69	642 sequenceB,
jpayne@69	643 match_fn,
jpayne@69	644 gap_A_fn,
jpayne@69	645 gap_B_fn,
jpayne@69	646 penalize_end_gaps,
jpayne@69	647 align_globally,
jpayne@69	648 score_only,
jpayne@69	649 ):
jpayne@69	650 """Generate a score and traceback matrix (PRIVATE).
jpayne@69	651
jpayne@69	652 This implementation according to Needleman-Wunsch allows the usage of
jpayne@69	653 general gap functions and is rather slow. It is automatically called if
jpayne@69	654 you define your own gap functions. You can force the usage of this method
jpayne@69	655 with ``force_generic=True``.
jpayne@69	656 """
jpayne@69	657 local_max_score = 0
jpayne@69	658 # Create the score and traceback matrices. These should be in the
jpayne@69	659 # shape:
jpayne@69	660 # sequenceA (down) x sequenceB (across)
jpayne@69	661 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69	662 score_matrix, trace_matrix = [], []
jpayne@69	663 for i in range(lenA + 1):
jpayne@69	664 score_matrix.append([None] * (lenB + 1))
jpayne@69	665 if not score_only:
jpayne@69	666 trace_matrix.append([None] * (lenB + 1))
jpayne@69	667
jpayne@69	668 # Initialize first row and column with gap scores. This is like opening up
jpayne@69	669 # i gaps at the beginning of sequence A or B.
jpayne@69	670 for i in range(lenA + 1):
jpayne@69	671 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@69	672 score = gap_B_fn(0, i)
jpayne@69	673 else:
jpayne@69	674 score = 0.0
jpayne@69	675 score_matrix[i][0] = score
jpayne@69	676 for i in range(lenB + 1):
jpayne@69	677 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@69	678 score = gap_A_fn(0, i)
jpayne@69	679 else:
jpayne@69	680 score = 0.0
jpayne@69	681 score_matrix[0][i] = score
jpayne@69	682
jpayne@69	683 # Fill in the score matrix. Each position in the matrix
jpayne@69	684 # represents an alignment between a character from sequence A to
jpayne@69	685 # one in sequence B. As I iterate through the matrix, find the
jpayne@69	686 # alignment by choose the best of:
jpayne@69	687 # 1) extending a previous alignment without gaps
jpayne@69	688 # 2) adding a gap in sequenceA
jpayne@69	689 # 3) adding a gap in sequenceB
jpayne@69	690 for row in range(1, lenA + 1):
jpayne@69	691 for col in range(1, lenB + 1):
jpayne@69	692 # First, calculate the score that would occur by extending
jpayne@69	693 # the alignment without gaps.
jpayne@69	694 # fmt: off
jpayne@69	695 nogap_score = (
jpayne@69	696 score_matrix[row - 1][col - 1]
jpayne@69	697 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@69	698 )
jpayne@69	699
jpayne@69	700 # fmt: on
jpayne@69	701 # Try to find a better score by opening gaps in sequenceA.
jpayne@69	702 # Do this by checking alignments from each column in the row.
jpayne@69	703 # Each column represents a different character to align from,
jpayne@69	704 # and thus a different length gap.
jpayne@69	705 # Although the gap function does not distinguish between opening
jpayne@69	706 # and extending a gap, we distinguish them for the backtrace.
jpayne@69	707 if not penalize_end_gaps[0] and row == lenA:
jpayne@69	708 row_open = score_matrix[row][col - 1]
jpayne@69	709 row_extend = max(score_matrix[row][x] for x in range(col))
jpayne@69	710 else:
jpayne@69	711 row_open = score_matrix[row][col - 1] + gap_A_fn(row, 1)
jpayne@69	712 row_extend = max(
jpayne@69	713 score_matrix[row][x] + gap_A_fn(row, col - x) for x in range(col)
jpayne@69	714 )
jpayne@69	715
jpayne@69	716 # Try to find a better score by opening gaps in sequenceB.
jpayne@69	717 if not penalize_end_gaps[1] and col == lenB:
jpayne@69	718 col_open = score_matrix[row - 1][col]
jpayne@69	719 col_extend = max(score_matrix[x][col] for x in range(row))
jpayne@69	720 else:
jpayne@69	721 col_open = score_matrix[row - 1][col] + gap_B_fn(col, 1)
jpayne@69	722 col_extend = max(
jpayne@69	723 score_matrix[x][col] + gap_B_fn(col, row - x) for x in range(row)
jpayne@69	724 )
jpayne@69	725
jpayne@69	726 best_score = max(nogap_score, row_open, row_extend, col_open, col_extend)
jpayne@69	727 local_max_score = max(local_max_score, best_score)
jpayne@69	728 if not align_globally and best_score < 0:
jpayne@69	729 score_matrix[row][col] = 0.0
jpayne@69	730 else:
jpayne@69	731 score_matrix[row][col] = best_score
jpayne@69	732
jpayne@69	733 # The backtrace is encoded binary. See _make_score_matrix_fast
jpayne@69	734 # for details.
jpayne@69	735 if not score_only:
jpayne@69	736 trace_score = 0
jpayne@69	737 if rint(nogap_score) == rint(best_score):
jpayne@69	738 trace_score += 2
jpayne@69	739 if rint(row_open) == rint(best_score):
jpayne@69	740 trace_score += 1
jpayne@69	741 if rint(row_extend) == rint(best_score):
jpayne@69	742 trace_score += 8
jpayne@69	743 if rint(col_open) == rint(best_score):
jpayne@69	744 trace_score += 4
jpayne@69	745 if rint(col_extend) == rint(best_score):
jpayne@69	746 trace_score += 16
jpayne@69	747 trace_matrix[row][col] = trace_score
jpayne@69	748
jpayne@69	749 if not align_globally:
jpayne@69	750 best_score = local_max_score
jpayne@69	751
jpayne@69	752 return score_matrix, trace_matrix, best_score
jpayne@69	753
jpayne@69	754
jpayne@69	755 def _make_score_matrix_fast(
jpayne@69	756 sequenceA,
jpayne@69	757 sequenceB,
jpayne@69	758 match_fn,
jpayne@69	759 open_A,
jpayne@69	760 extend_A,
jpayne@69	761 open_B,
jpayne@69	762 extend_B,
jpayne@69	763 penalize_extend_when_opening,
jpayne@69	764 penalize_end_gaps,
jpayne@69	765 align_globally,
jpayne@69	766 score_only,
jpayne@69	767 ):
jpayne@69	768 """Generate a score and traceback matrix according to Gotoh (PRIVATE).
jpayne@69	769
jpayne@69	770 This is an implementation of the Needleman-Wunsch dynamic programming
jpayne@69	771 algorithm as modified by Gotoh, implementing affine gap penalties.
jpayne@69	772 In short, we have three matrices, holding scores for alignments ending
jpayne@69	773 in (1) a match/mismatch, (2) a gap in sequence A, and (3) a gap in
jpayne@69	774 sequence B, respectively. However, we can combine them in one matrix,
jpayne@69	775 which holds the best scores, and store only those values from the
jpayne@69	776 other matrices that are actually used for the next step of calculation.
jpayne@69	777 The traceback matrix holds the positions for backtracing the alignment.
jpayne@69	778 """
jpayne@69	779 first_A_gap = calc_affine_penalty(1, open_A, extend_A, penalize_extend_when_opening)
jpayne@69	780 first_B_gap = calc_affine_penalty(1, open_B, extend_B, penalize_extend_when_opening)
jpayne@69	781 local_max_score = 0
jpayne@69	782
jpayne@69	783 # Create the score and traceback matrices. These should be in the
jpayne@69	784 # shape:
jpayne@69	785 # sequenceA (down) x sequenceB (across)
jpayne@69	786 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69	787 score_matrix, trace_matrix = [], []
jpayne@69	788 for i in range(lenA + 1):
jpayne@69	789 score_matrix.append([None] * (lenB + 1))
jpayne@69	790 if not score_only:
jpayne@69	791 trace_matrix.append([None] * (lenB + 1))
jpayne@69	792
jpayne@69	793 # Initialize first row and column with gap scores. This is like opening up
jpayne@69	794 # i gaps at the beginning of sequence A or B.
jpayne@69	795 for i in range(lenA + 1):
jpayne@69	796 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@69	797 score = calc_affine_penalty(
jpayne@69	798 i, open_B, extend_B, penalize_extend_when_opening
jpayne@69	799 )
jpayne@69	800 else:
jpayne@69	801 score = 0
jpayne@69	802 score_matrix[i][0] = score
jpayne@69	803 for i in range(lenB + 1):
jpayne@69	804 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@69	805 score = calc_affine_penalty(
jpayne@69	806 i, open_A, extend_A, penalize_extend_when_opening
jpayne@69	807 )
jpayne@69	808 else:
jpayne@69	809 score = 0
jpayne@69	810 score_matrix[0][i] = score
jpayne@69	811
jpayne@69	812 # Now initialize the col 'matrix'. Actually this is only a one dimensional
jpayne@69	813 # list, since we only need the col scores from the last row.
jpayne@69	814 col_score = [0] # Best score, if actual alignment ends with gap in seqB
jpayne@69	815 for i in range(1, lenB + 1):
jpayne@69	816 col_score.append(
jpayne@69	817 calc_affine_penalty(i, 2 * open_B, extend_B, penalize_extend_when_opening)
jpayne@69	818 )
jpayne@69	819
jpayne@69	820 # The row 'matrix' is calculated on the fly. Here we only need the actual
jpayne@69	821 # score.
jpayne@69	822 # Now, filling up the score and traceback matrices:
jpayne@69	823 for row in range(1, lenA + 1):
jpayne@69	824 row_score = calc_affine_penalty(
jpayne@69	825 row, 2 * open_A, extend_A, penalize_extend_when_opening
jpayne@69	826 )
jpayne@69	827 for col in range(1, lenB + 1):
jpayne@69	828 # Calculate the score that would occur by extending the
jpayne@69	829 # alignment without gaps.
jpayne@69	830 # fmt: off
jpayne@69	831 nogap_score = (
jpayne@69	832 score_matrix[row - 1][col - 1]
jpayne@69	833 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@69	834 )
jpayne@69	835 # fmt: on
jpayne@69	836 # Check the score that would occur if there were a gap in
jpayne@69	837 # sequence A. This could come from opening a new gap or
jpayne@69	838 # extending an existing one.
jpayne@69	839 # A gap in sequence A can also be opened if it follows a gap in
jpayne@69	840 # sequence B: A-
jpayne@69	841 # -B
jpayne@69	842 if not penalize_end_gaps[0] and row == lenA:
jpayne@69	843 row_open = score_matrix[row][col - 1]
jpayne@69	844 row_extend = row_score
jpayne@69	845 else:
jpayne@69	846 row_open = score_matrix[row][col - 1] + first_A_gap
jpayne@69	847 row_extend = row_score + extend_A
jpayne@69	848 row_score = max(row_open, row_extend)
jpayne@69	849
jpayne@69	850 # The same for sequence B:
jpayne@69	851 if not penalize_end_gaps[1] and col == lenB:
jpayne@69	852 col_open = score_matrix[row - 1][col]
jpayne@69	853 col_extend = col_score[col]
jpayne@69	854 else:
jpayne@69	855 col_open = score_matrix[row - 1][col] + first_B_gap
jpayne@69	856 col_extend = col_score[col] + extend_B
jpayne@69	857 col_score[col] = max(col_open, col_extend)
jpayne@69	858
jpayne@69	859 best_score = max(nogap_score, col_score[col], row_score)
jpayne@69	860 local_max_score = max(local_max_score, best_score)
jpayne@69	861 if not align_globally and best_score < 0:
jpayne@69	862 score_matrix[row][col] = 0
jpayne@69	863 else:
jpayne@69	864 score_matrix[row][col] = best_score
jpayne@69	865
jpayne@69	866 # Now the trace_matrix. The edges of the backtrace are encoded
jpayne@69	867 # binary: 1 = open gap in seqA, 2 = match/mismatch of seqA and
jpayne@69	868 # seqB, 4 = open gap in seqB, 8 = extend gap in seqA, and
jpayne@69	869 # 16 = extend gap in seqB. This values can be summed up.
jpayne@69	870 # Thus, the trace score 7 means that the best score can either
jpayne@69	871 # come from opening a gap in seqA (=1), pairing two characters
jpayne@69	872 # of seqA and seqB (+2=3) or opening a gap in seqB (+4=7).
jpayne@69	873 # However, if we only want the score we don't care about the trace.
jpayne@69	874 if not score_only:
jpayne@69	875 row_score_rint = rint(row_score)
jpayne@69	876 col_score_rint = rint(col_score[col])
jpayne@69	877 row_trace_score = 0
jpayne@69	878 col_trace_score = 0
jpayne@69	879 if rint(row_open) == row_score_rint:
jpayne@69	880 row_trace_score += 1 # Open gap in seqA
jpayne@69	881 if rint(row_extend) == row_score_rint:
jpayne@69	882 row_trace_score += 8 # Extend gap in seqA
jpayne@69	883 if rint(col_open) == col_score_rint:
jpayne@69	884 col_trace_score += 4 # Open gap in seqB
jpayne@69	885 if rint(col_extend) == col_score_rint:
jpayne@69	886 col_trace_score += 16 # Extend gap in seqB
jpayne@69	887
jpayne@69	888 trace_score = 0
jpayne@69	889 best_score_rint = rint(best_score)
jpayne@69	890 if rint(nogap_score) == best_score_rint:
jpayne@69	891 trace_score += 2 # Align seqA with seqB
jpayne@69	892 if row_score_rint == best_score_rint:
jpayne@69	893 trace_score += row_trace_score
jpayne@69	894 if col_score_rint == best_score_rint:
jpayne@69	895 trace_score += col_trace_score
jpayne@69	896 trace_matrix[row][col] = trace_score
jpayne@69	897
jpayne@69	898 if not align_globally:
jpayne@69	899 best_score = local_max_score
jpayne@69	900
jpayne@69	901 return score_matrix, trace_matrix, best_score
jpayne@69	902
jpayne@69	903
jpayne@69	904 def _recover_alignments(
jpayne@69	905 sequenceA,
jpayne@69	906 sequenceB,
jpayne@69	907 starts,
jpayne@69	908 best_score,
jpayne@69	909 score_matrix,
jpayne@69	910 trace_matrix,
jpayne@69	911 align_globally,
jpayne@69	912 gap_char,
jpayne@69	913 one_alignment_only,
jpayne@69	914 gap_A_fn,
jpayne@69	915 gap_B_fn,
jpayne@69	916 reverse=False,
jpayne@69	917 ):
jpayne@69	918 """Do the backtracing and return a list of alignments (PRIVATE).
jpayne@69	919
jpayne@69	920 Recover the alignments by following the traceback matrix. This
jpayne@69	921 is a recursive procedure, but it's implemented here iteratively
jpayne@69	922 with a stack.
jpayne@69	923
jpayne@69	924 sequenceA and sequenceB may be sequences, including strings,
jpayne@69	925 lists, or list-like objects. In order to preserve the type of
jpayne@69	926 the object, we need to use slices on the sequences instead of
jpayne@69	927 indexes. For example, sequenceA[row] may return a type that's
jpayne@69	928 not compatible with sequenceA, e.g. if sequenceA is a list and
jpayne@69	929 sequenceA[row] is a string. Thus, avoid using indexes and use
jpayne@69	930 slices, e.g. sequenceA[row:row+1]. Assume that client-defined
jpayne@69	931 sequence classes preserve these semantics.
jpayne@69	932 """
jpayne@69	933 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69	934 ali_seqA, ali_seqB = sequenceA[0:0], sequenceB[0:0]
jpayne@69	935 tracebacks = []
jpayne@69	936 in_process = []
jpayne@69	937
jpayne@69	938 for start in starts:
jpayne@69	939 score, (row, col) = start
jpayne@69	940 begin = 0
jpayne@69	941 if align_globally:
jpayne@69	942 end = None
jpayne@69	943 else:
jpayne@69	944 # If this start is a zero-extension: don't start here!
jpayne@69	945 if (score, (row - 1, col - 1)) in starts:
jpayne@69	946 continue
jpayne@69	947 # Local alignments should start with a positive score!
jpayne@69	948 if score <= 0:
jpayne@69	949 continue
jpayne@69	950 # Local alignments should not end with a gap!:
jpayne@69	951 trace = trace_matrix[row][col]
jpayne@69	952 if (trace - trace % 2) % 4 == 2: # Trace contains 'nogap', fine!
jpayne@69	953 trace_matrix[row][col] = 2
jpayne@69	954 # If not, don't start here!
jpayne@69	955 else:
jpayne@69	956 continue
jpayne@69	957 end = -max(lenA - row, lenB - col)
jpayne@69	958 if not end:
jpayne@69	959 end = None
jpayne@69	960 col_distance = lenB - col
jpayne@69	961 row_distance = lenA - row
jpayne@69	962
jpayne@69	963 # fmt: off
jpayne@69	964 ali_seqA = (
jpayne@69	965 (col_distance - row_distance) * gap_char
jpayne@69	966 + sequenceA[lenA - 1 : row - 1 : -1]
jpayne@69	967 )
jpayne@69	968 ali_seqB = (
jpayne@69	969 (row_distance - col_distance) * gap_char
jpayne@69	970 + sequenceB[lenB - 1 : col - 1 : -1]
jpayne@69	971 )
jpayne@69	972 # fmt: on
jpayne@69	973 in_process += [
jpayne@69	974 (ali_seqA, ali_seqB, end, row, col, False, trace_matrix[row][col])
jpayne@69	975 ]
jpayne@69	976 while in_process and len(tracebacks) < MAX_ALIGNMENTS:
jpayne@69	977 # Although we allow a gap in seqB to be followed by a gap in seqA,
jpayne@69	978 # we don't want to allow it the other way round, since this would
jpayne@69	979 # give redundant alignments of type: A- vs. -A
jpayne@69	980 # -B B-
jpayne@69	981 # Thus we need to keep track if a gap in seqA was opened (col_gap)
jpayne@69	982 # and stop the backtrace (dead_end) if a gap in seqB follows.
jpayne@69	983 #
jpayne@69	984 # Attention: This may fail, if the gap-penalties for both strands are
jpayne@69	985 # different. In this case the second alignment may be the only optimal
jpayne@69	986 # alignment. Thus it can happen that no alignment is returned. For
jpayne@69	987 # this case a workaround was implemented, which reverses the input and
jpayne@69	988 # the matrices (this happens in _reverse_matrices) and repeats the
jpayne@69	989 # backtrace. The variable 'reverse' keeps track of this.
jpayne@69	990 dead_end = False
jpayne@69	991 ali_seqA, ali_seqB, end, row, col, col_gap, trace = in_process.pop()
jpayne@69	992
jpayne@69	993 while (row > 0 or col > 0) and not dead_end:
jpayne@69	994 cache = (ali_seqA[:], ali_seqB[:], end, row, col, col_gap)
jpayne@69	995
jpayne@69	996 # If trace is empty we have reached at least one border of the
jpayne@69	997 # matrix or the end of a local alignment. Just add the rest of
jpayne@69	998 # the sequence(s) and fill with gaps if necessary.
jpayne@69	999 if not trace:
jpayne@69	1000 if col and col_gap:
jpayne@69	1001 dead_end = True
jpayne@69	1002 else:
jpayne@69	1003 ali_seqA, ali_seqB = _finish_backtrace(
jpayne@69	1004 sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char
jpayne@69	1005 )
jpayne@69	1006 break
jpayne@69	1007 elif trace % 2 == 1: # = row open = open gap in seqA
jpayne@69	1008 trace -= 1
jpayne@69	1009 if col_gap:
jpayne@69	1010 dead_end = True
jpayne@69	1011 else:
jpayne@69	1012 col -= 1
jpayne@69	1013 ali_seqA += gap_char
jpayne@69	1014 ali_seqB += sequenceB[col : col + 1]
jpayne@69	1015 col_gap = False
jpayne@69	1016 elif trace % 4 == 2: # = match/mismatch of seqA with seqB
jpayne@69	1017 trace -= 2
jpayne@69	1018 row -= 1
jpayne@69	1019 col -= 1
jpayne@69	1020 ali_seqA += sequenceA[row : row + 1]
jpayne@69	1021 ali_seqB += sequenceB[col : col + 1]
jpayne@69	1022 col_gap = False
jpayne@69	1023 elif trace % 8 == 4: # = col open = open gap in seqB
jpayne@69	1024 trace -= 4
jpayne@69	1025 row -= 1
jpayne@69	1026 ali_seqA += sequenceA[row : row + 1]
jpayne@69	1027 ali_seqB += gap_char
jpayne@69	1028 col_gap = True
jpayne@69	1029 elif trace in (8, 24): # = row extend = extend gap in seqA
jpayne@69	1030 trace -= 8
jpayne@69	1031 if col_gap:
jpayne@69	1032 dead_end = True
jpayne@69	1033 else:
jpayne@69	1034 col_gap = False
jpayne@69	1035 # We need to find the starting point of the extended gap
jpayne@69	1036 x = _find_gap_open(
jpayne@69	1037 sequenceA,
jpayne@69	1038 sequenceB,
jpayne@69	1039 ali_seqA,
jpayne@69	1040 ali_seqB,
jpayne@69	1041 end,
jpayne@69	1042 row,
jpayne@69	1043 col,
jpayne@69	1044 col_gap,
jpayne@69	1045 gap_char,
jpayne@69	1046 score_matrix,
jpayne@69	1047 trace_matrix,
jpayne@69	1048 in_process,
jpayne@69	1049 gap_A_fn,
jpayne@69	1050 col,
jpayne@69	1051 row,
jpayne@69	1052 "col",
jpayne@69	1053 best_score,
jpayne@69	1054 align_globally,
jpayne@69	1055 )
jpayne@69	1056 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@69	1057 elif trace == 16: # = col extend = extend gap in seqB
jpayne@69	1058 trace -= 16
jpayne@69	1059 col_gap = True
jpayne@69	1060 x = _find_gap_open(
jpayne@69	1061 sequenceA,
jpayne@69	1062 sequenceB,
jpayne@69	1063 ali_seqA,
jpayne@69	1064 ali_seqB,
jpayne@69	1065 end,
jpayne@69	1066 row,
jpayne@69	1067 col,
jpayne@69	1068 col_gap,
jpayne@69	1069 gap_char,
jpayne@69	1070 score_matrix,
jpayne@69	1071 trace_matrix,
jpayne@69	1072 in_process,
jpayne@69	1073 gap_B_fn,
jpayne@69	1074 row,
jpayne@69	1075 col,
jpayne@69	1076 "row",
jpayne@69	1077 best_score,
jpayne@69	1078 align_globally,
jpayne@69	1079 )
jpayne@69	1080 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@69	1081
jpayne@69	1082 if trace: # There is another path to follow...
jpayne@69	1083 cache += (trace,)
jpayne@69	1084 in_process.append(cache)
jpayne@69	1085 trace = trace_matrix[row][col]
jpayne@69	1086 if not align_globally:
jpayne@69	1087 if score_matrix[row][col] == best_score:
jpayne@69	1088 # We have gone through a 'zero-score' extension, discard it
jpayne@69	1089 dead_end = True
jpayne@69	1090 elif score_matrix[row][col] <= 0:
jpayne@69	1091 # We have reached the end of the backtrace
jpayne@69	1092 begin = max(row, col)
jpayne@69	1093 trace = 0
jpayne@69	1094 if not dead_end:
jpayne@69	1095 if not reverse:
jpayne@69	1096 tracebacks.append((ali_seqA[::-1], ali_seqB[::-1], score, begin, end))
jpayne@69	1097 else:
jpayne@69	1098 tracebacks.append((ali_seqB[::-1], ali_seqA[::-1], score, begin, end))
jpayne@69	1099 if one_alignment_only:
jpayne@69	1100 break
jpayne@69	1101 return _clean_alignments(tracebacks)
jpayne@69	1102
jpayne@69	1103
jpayne@69	1104 def _find_start(score_matrix, best_score, align_globally):
jpayne@69	1105 """Return a list of starting points (score, (row, col)) (PRIVATE).
jpayne@69	1106
jpayne@69	1107 Indicating every possible place to start the tracebacks.
jpayne@69	1108 """
jpayne@69	1109 nrows, ncols = len(score_matrix), len(score_matrix[0])
jpayne@69	1110 # In this implementation of the global algorithm, the start will always be
jpayne@69	1111 # the bottom right corner of the matrix.
jpayne@69	1112 if align_globally:
jpayne@69	1113 starts = [(best_score, (nrows - 1, ncols - 1))]
jpayne@69	1114 else:
jpayne@69	1115 # For local alignments, there may be many different start points.
jpayne@69	1116 starts = []
jpayne@69	1117 tolerance = 0 # XXX do anything with this?
jpayne@69	1118 # Now find all the positions within some tolerance of the best
jpayne@69	1119 # score.
jpayne@69	1120 for row in range(nrows):
jpayne@69	1121 for col in range(ncols):
jpayne@69	1122 score = score_matrix[row][col]
jpayne@69	1123 if rint(abs(score - best_score)) <= rint(tolerance):
jpayne@69	1124 starts.append((score, (row, col)))
jpayne@69	1125 return starts
jpayne@69	1126
jpayne@69	1127
jpayne@69	1128 def _reverse_matrices(score_matrix, trace_matrix):
jpayne@69	1129 """Reverse score and trace matrices (PRIVATE)."""
jpayne@69	1130 reverse_score_matrix = []
jpayne@69	1131 reverse_trace_matrix = []
jpayne@69	1132 # fmt: off
jpayne@69	1133 reverse_trace = {
jpayne@69	1134 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7, 8: 16, 9: 20, 10: 18, 11: 22, 12: 17,
jpayne@69	1135 13: 21, 14: 19, 15: 23, 16: 8, 17: 12, 18: 10, 19: 14, 20: 9, 21: 13, 22: 11,
jpayne@69	1136 23: 15, 24: 24, 25: 28, 26: 26, 27: 30, 28: 25, 29: 29, 30: 27, 31: 31,
jpayne@69	1137 None: None,
jpayne@69	1138 }
jpayne@69	1139 # fmt: on
jpayne@69	1140 for col in range(len(score_matrix[0])):
jpayne@69	1141 new_score_row = []
jpayne@69	1142 new_trace_row = []
jpayne@69	1143 for row in range(len(score_matrix)):
jpayne@69	1144 new_score_row.append(score_matrix[row][col])
jpayne@69	1145 new_trace_row.append(reverse_trace[trace_matrix[row][col]])
jpayne@69	1146 reverse_score_matrix.append(new_score_row)
jpayne@69	1147 reverse_trace_matrix.append(new_trace_row)
jpayne@69	1148 return reverse_score_matrix, reverse_trace_matrix
jpayne@69	1149
jpayne@69	1150
jpayne@69	1151 def _clean_alignments(alignments):
jpayne@69	1152 """Take a list of alignments and return a cleaned version (PRIVATE).
jpayne@69	1153
jpayne@69	1154 Remove duplicates, make sure begin and end are set correctly, remove
jpayne@69	1155 empty alignments.
jpayne@69	1156 """
jpayne@69	1157 unique_alignments = []
jpayne@69	1158 for align in alignments:
jpayne@69	1159 if align not in unique_alignments:
jpayne@69	1160 unique_alignments.append(align)
jpayne@69	1161 i = 0
jpayne@69	1162 while i < len(unique_alignments):
jpayne@69	1163 seqA, seqB, score, begin, end = unique_alignments[i]
jpayne@69	1164 # Make sure end is set reasonably.
jpayne@69	1165 if end is None: # global alignment
jpayne@69	1166 end = len(seqA)
jpayne@69	1167 elif end < 0:
jpayne@69	1168 end = end + len(seqA)
jpayne@69	1169 # If there's no alignment here, get rid of it.
jpayne@69	1170 if begin >= end:
jpayne@69	1171 del unique_alignments[i]
jpayne@69	1172 continue
jpayne@69	1173 unique_alignments[i] = Alignment(seqA, seqB, score, begin, end)
jpayne@69	1174 i += 1
jpayne@69	1175 return unique_alignments
jpayne@69	1176
jpayne@69	1177
jpayne@69	1178 def _finish_backtrace(sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char):
jpayne@69	1179 """Add remaining sequences and fill with gaps if necessary (PRIVATE)."""
jpayne@69	1180 if row:
jpayne@69	1181 ali_seqA += sequenceA[row - 1 :: -1]
jpayne@69	1182 if col:
jpayne@69	1183 ali_seqB += sequenceB[col - 1 :: -1]
jpayne@69	1184 if row > col:
jpayne@69	1185 ali_seqB += gap_char * (len(ali_seqA) - len(ali_seqB))
jpayne@69	1186 elif col > row:
jpayne@69	1187 ali_seqA += gap_char * (len(ali_seqB) - len(ali_seqA))
jpayne@69	1188 return ali_seqA, ali_seqB
jpayne@69	1189
jpayne@69	1190
jpayne@69	1191 def _find_gap_open(
jpayne@69	1192 sequenceA,
jpayne@69	1193 sequenceB,
jpayne@69	1194 ali_seqA,
jpayne@69	1195 ali_seqB,
jpayne@69	1196 end,
jpayne@69	1197 row,
jpayne@69	1198 col,
jpayne@69	1199 col_gap,
jpayne@69	1200 gap_char,
jpayne@69	1201 score_matrix,
jpayne@69	1202 trace_matrix,
jpayne@69	1203 in_process,
jpayne@69	1204 gap_fn,
jpayne@69	1205 target,
jpayne@69	1206 index,
jpayne@69	1207 direction,
jpayne@69	1208 best_score,
jpayne@69	1209 align_globally,
jpayne@69	1210 ):
jpayne@69	1211 """Find the starting point(s) of the extended gap (PRIVATE)."""
jpayne@69	1212 dead_end = False
jpayne@69	1213 target_score = score_matrix[row][col]
jpayne@69	1214 for n in range(target):
jpayne@69	1215 if direction == "col":
jpayne@69	1216 col -= 1
jpayne@69	1217 ali_seqA += gap_char
jpayne@69	1218 ali_seqB += sequenceB[col : col + 1]
jpayne@69	1219 else:
jpayne@69	1220 row -= 1
jpayne@69	1221 ali_seqA += sequenceA[row : row + 1]
jpayne@69	1222 ali_seqB += gap_char
jpayne@69	1223 actual_score = score_matrix[row][col] + gap_fn(index, n + 1)
jpayne@69	1224 if not align_globally and score_matrix[row][col] == best_score:
jpayne@69	1225 # We have run through a 'zero-score' extension and discard it
jpayne@69	1226 dead_end = True
jpayne@69	1227 break
jpayne@69	1228 if rint(actual_score) == rint(target_score) and n > 0:
jpayne@69	1229 if not trace_matrix[row][col]:
jpayne@69	1230 break
jpayne@69	1231 else:
jpayne@69	1232 in_process.append(
jpayne@69	1233 (
jpayne@69	1234 ali_seqA[:],
jpayne@69	1235 ali_seqB[:],
jpayne@69	1236 end,
jpayne@69	1237 row,
jpayne@69	1238 col,
jpayne@69	1239 col_gap,
jpayne@69	1240 trace_matrix[row][col],
jpayne@69	1241 )
jpayne@69	1242 )
jpayne@69	1243 if not trace_matrix[row][col]:
jpayne@69	1244 dead_end = True
jpayne@69	1245 return ali_seqA, ali_seqB, row, col, in_process, dead_end
jpayne@69	1246
jpayne@69	1247
jpayne@69	1248 _PRECISION = 1000
jpayne@69	1249
jpayne@69	1250
jpayne@69	1251 def rint(x, precision=_PRECISION):
jpayne@69	1252 """Print number with declared precision."""
jpayne@69	1253 return int(x * precision + 0.5)
jpayne@69	1254
jpayne@69	1255
jpayne@69	1256 class identity_match:
jpayne@69	1257 """Create a match function for use in an alignment.
jpayne@69	1258
jpayne@69	1259 match and mismatch are the scores to give when two residues are equal
jpayne@69	1260 or unequal. By default, match is 1 and mismatch is 0.
jpayne@69	1261 """
jpayne@69	1262
jpayne@69	1263 def __init__(self, match=1, mismatch=0):
jpayne@69	1264 """Initialize the class."""
jpayne@69	1265 self.match = match
jpayne@69	1266 self.mismatch = mismatch
jpayne@69	1267
jpayne@69	1268 def __call__(self, charA, charB):
jpayne@69	1269 """Call a match function instance already created."""
jpayne@69	1270 if charA == charB:
jpayne@69	1271 return self.match
jpayne@69	1272 return self.mismatch
jpayne@69	1273
jpayne@69	1274
jpayne@69	1275 class dictionary_match:
jpayne@69	1276 """Create a match function for use in an alignment.
jpayne@69	1277
jpayne@69	1278 Attributes:
jpayne@69	1279 - score_dict - A dictionary where the keys are tuples (residue 1,
jpayne@69	1280 residue 2) and the values are the match scores between those residues.
jpayne@69	1281 - symmetric - A flag that indicates whether the scores are symmetric.
jpayne@69	1282
jpayne@69	1283 """
jpayne@69	1284
jpayne@69	1285 def __init__(self, score_dict, symmetric=1):
jpayne@69	1286 """Initialize the class."""
jpayne@69	1287 if isinstance(score_dict, substitution_matrices.Array):
jpayne@69	1288 score_dict = dict(score_dict) # Access to dict is much faster
jpayne@69	1289 self.score_dict = score_dict
jpayne@69	1290 self.symmetric = symmetric
jpayne@69	1291
jpayne@69	1292 def __call__(self, charA, charB):
jpayne@69	1293 """Call a dictionary match instance already created."""
jpayne@69	1294 if self.symmetric and (charA, charB) not in self.score_dict:
jpayne@69	1295 # If the score dictionary is symmetric, then look up the
jpayne@69	1296 # score both ways.
jpayne@69	1297 charB, charA = charA, charB
jpayne@69	1298 return self.score_dict[(charA, charB)]
jpayne@69	1299
jpayne@69	1300
jpayne@69	1301 class affine_penalty:
jpayne@69	1302 """Create a gap function for use in an alignment."""
jpayne@69	1303
jpayne@69	1304 def __init__(self, open, extend, penalize_extend_when_opening=0):
jpayne@69	1305 """Initialize the class."""
jpayne@69	1306 if open > 0 or extend > 0:
jpayne@69	1307 raise ValueError("Gap penalties should be non-positive.")
jpayne@69	1308 if not penalize_extend_when_opening and (extend < open):
jpayne@69	1309 raise ValueError(
jpayne@69	1310 "Gap opening penalty should be higher than "
jpayne@69	1311 "gap extension penalty (or equal)"
jpayne@69	1312 )
jpayne@69	1313 self.open, self.extend = open, extend
jpayne@69	1314 self.penalize_extend_when_opening = penalize_extend_when_opening
jpayne@69	1315
jpayne@69	1316 def __call__(self, index, length):
jpayne@69	1317 """Call a gap function instance already created."""
jpayne@69	1318 return calc_affine_penalty(
jpayne@69	1319 length, self.open, self.extend, self.penalize_extend_when_opening
jpayne@69	1320 )
jpayne@69	1321
jpayne@69	1322
jpayne@69	1323 def calc_affine_penalty(length, open, extend, penalize_extend_when_opening):
jpayne@69	1324 """Calculate a penalty score for the gap function."""
jpayne@69	1325 if length <= 0:
jpayne@69	1326 return 0.0
jpayne@69	1327 penalty = open + extend * length
jpayne@69	1328 if not penalize_extend_when_opening:
jpayne@69	1329 penalty -= extend
jpayne@69	1330 return penalty
jpayne@69	1331
jpayne@69	1332
jpayne@69	1333 def print_matrix(matrix):
jpayne@69	1334 """Print out a matrix for debugging purposes."""
jpayne@69	1335 # Transpose the matrix and get the length of the values in each column.
jpayne@69	1336 matrixT = [[] for x in range(len(matrix[0]))]
jpayne@69	1337 for i in range(len(matrix)):
jpayne@69	1338 for j in range(len(matrix[i])):
jpayne@69	1339 matrixT[j].append(len(str(matrix[i][j])))
jpayne@69	1340 ndigits = [max(x) for x in matrixT]
jpayne@69	1341 for i in range(len(matrix)):
jpayne@69	1342 # Using string formatting trick to add leading spaces,
jpayne@69	1343 print(
jpayne@69	1344 " ".join("%*s " % (ndigits[j], matrix[i][j]) for j in range(len(matrix[i])))
jpayne@69	1345 )
jpayne@69	1346
jpayne@69	1347
jpayne@69	1348 def format_alignment(align1, align2, score, begin, end, full_sequences=False):
jpayne@69	1349 """Format the alignment prettily into a string.
jpayne@69	1350
jpayne@69	1351 IMPORTANT: Gap symbol must be "-" (or ['-'] for lists)!
jpayne@69	1352
jpayne@69	1353 Since Biopython 1.71 identical matches are shown with a pipe
jpayne@69	1354 character, mismatches as a dot, and gaps as a space.
jpayne@69	1355
jpayne@69	1356 Prior releases just used the pipe character to indicate the
jpayne@69	1357 aligned region (matches, mismatches and gaps).
jpayne@69	1358
jpayne@69	1359 Also, in local alignments, if the alignment does not include
jpayne@69	1360 the whole sequences, now only the aligned part is shown,
jpayne@69	1361 together with the start positions of the aligned subsequences.
jpayne@69	1362 The start positions are 1-based; so start position n is the
jpayne@69	1363 n-th base/amino acid in the un-aligned sequence.
jpayne@69	1364
jpayne@69	1365 NOTE: This is different to the alignment's begin/end values,
jpayne@69	1366 which give the Python indices (0-based) of the bases/amino acids
jpayne@69	1367 in the aligned sequences.
jpayne@69	1368
jpayne@69	1369 If you want to restore the 'historic' behaviour, that means
jpayne@69	1370 displaying the whole sequences (including the non-aligned parts),
jpayne@69	1371 use ``full_sequences=True``. In this case, the non-aligned leading
jpayne@69	1372 and trailing parts are also indicated by spaces in the match-line.
jpayne@69	1373 """
jpayne@69	1374 align_begin = begin
jpayne@69	1375 align_end = end
jpayne@69	1376 start1 = start2 = ""
jpayne@69	1377 start_m = begin # Begin of match line (how many spaces to include)
jpayne@69	1378 # For local alignments:
jpayne@69	1379 if not full_sequences and (begin != 0 or end != len(align1)):
jpayne@69	1380 # Calculate the actual start positions in the un-aligned sequences
jpayne@69	1381 # This will only work if the gap symbol is '-' or ['-']!
jpayne@69	1382 start1 = str(len(align1[:begin]) - align1[:begin].count("-") + 1) + " "
jpayne@69	1383 start2 = str(len(align2[:begin]) - align2[:begin].count("-") + 1) + " "
jpayne@69	1384 start_m = max(len(start1), len(start2))
jpayne@69	1385 elif full_sequences:
jpayne@69	1386 start_m = 0
jpayne@69	1387 begin = 0
jpayne@69	1388 end = len(align1)
jpayne@69	1389
jpayne@69	1390 if isinstance(align1, list):
jpayne@69	1391 # List elements will be separated by spaces, since they can be
jpayne@69	1392 # of different lengths
jpayne@69	1393 align1 = [a + " " for a in align1]
jpayne@69	1394 align2 = [a + " " for a in align2]
jpayne@69	1395
jpayne@69	1396 s1_line = ["{:>{width}}".format(start1, width=start_m)] # seq1 line
jpayne@69	1397 m_line = [" " * start_m] # match line
jpayne@69	1398 s2_line = ["{:>{width}}".format(start2, width=start_m)] # seq2 line
jpayne@69	1399
jpayne@69	1400 for n, (a, b) in enumerate(zip(align1[begin:end], align2[begin:end])):
jpayne@69	1401 # Since list elements can be of different length, we center them,
jpayne@69	1402 # using the maximum length of the two compared elements as width
jpayne@69	1403 m_len = max(len(a), len(b))
jpayne@69	1404 s1_line.append("{:^{width}}".format(a, width=m_len))
jpayne@69	1405 s2_line.append("{:^{width}}".format(b, width=m_len))
jpayne@69	1406 if full_sequences and (n < align_begin or n >= align_end):
jpayne@69	1407 m_line.append("{:^{width}}".format(" ", width=m_len)) # space
jpayne@69	1408 continue
jpayne@69	1409 if a == b:
jpayne@69	1410 m_line.append("{:^{width}}".format("\|", width=m_len)) # match
jpayne@69	1411 elif a.strip() == "-" or b.strip() == "-":
jpayne@69	1412 m_line.append("{:^{width}}".format(" ", width=m_len)) # gap
jpayne@69	1413 else:
jpayne@69	1414 m_line.append("{:^{width}}".format(".", width=m_len)) # mismatch
jpayne@69	1415
jpayne@69	1416 s2_line.append(f"\n Score={score:g}\n")
jpayne@69	1417 return "\n".join(["".join(s1_line), "".join(m_line), "".join(s2_line)])
jpayne@69	1418
jpayne@69	1419
jpayne@69	1420 # Try and load C implementations of functions. If I can't,
jpayne@69	1421 # then throw a warning and use the pure Python implementations.
jpayne@69	1422 # The redefinition is deliberate, thus the no quality assurance
jpayne@69	1423 # flag for when using flake8.
jpayne@69	1424 # Before, we secure access to the pure Python functions (for testing purposes):
jpayne@69	1425
jpayne@69	1426 _python_make_score_matrix_fast = _make_score_matrix_fast
jpayne@69	1427 _python_rint = rint
jpayne@69	1428
jpayne@69	1429 try:
jpayne@69	1430 from .cpairwise2 import rint, _make_score_matrix_fast # noqa
jpayne@69	1431 except ImportError:
jpayne@69	1432 warnings.warn(
jpayne@69	1433 "Import of C module failed. Falling back to pure Python "
jpayne@69	1434 "implementation. This may be slooow...",
jpayne@69	1435 BiopythonWarning,
jpayne@69	1436 )
jpayne@69	1437
jpayne@69	1438 if __name__ == "__main__":
jpayne@69	1439 from Bio._utils import run_doctest
jpayne@69	1440
jpayne@69	1441 run_doctest()

Mercurial > repos > rliterman > csp2

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 69:33d812a61356