csp2: CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py annotate

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 16:23:26 -0400
parents
children

rev	line source
jpayne@68	1 # Copyright 2002 by Jeffrey Chang.
jpayne@68	2 # Copyright 2016, 2019, 2020 by Markus Piotrowski.
jpayne@68	3 # All rights reserved.
jpayne@68	4 #
jpayne@68	5 # This file is part of the Biopython distribution and governed by your
jpayne@68	6 # choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
jpayne@68	7 # Please see the LICENSE file that should have been included as part of this
jpayne@68	8 # package.
jpayne@68	9
jpayne@68	10 """Pairwise sequence alignment using a dynamic programming algorithm.
jpayne@68	11
jpayne@68	12 This provides functions to get global and local alignments between two
jpayne@68	13 sequences. A global alignment finds the best concordance between all
jpayne@68	14 characters in two sequences. A local alignment finds just the
jpayne@68	15 subsequences that align the best. Local alignments must have a positive
jpayne@68	16 score to be reported and they will not be extended for 'zero counting'
jpayne@68	17 matches. This means a local alignment will always start and end with
jpayne@68	18 a positive counting match.
jpayne@68	19
jpayne@68	20 When doing alignments, you can specify the match score and gap
jpayne@68	21 penalties. The match score indicates the compatibility between an
jpayne@68	22 alignment of two characters in the sequences. Highly compatible
jpayne@68	23 characters should be given positive scores, and incompatible ones
jpayne@68	24 should be given negative scores or 0. The gap penalties should be
jpayne@68	25 negative.
jpayne@68	26
jpayne@68	27 The names of the alignment functions in this module follow the
jpayne@68	28 convention
jpayne@68	29 <alignment type>XX
jpayne@68	30 where <alignment type> is either "global" or "local" and XX is a 2
jpayne@68	31 character code indicating the parameters it takes. The first
jpayne@68	32 character indicates the parameters for matches (and mismatches), and
jpayne@68	33 the second indicates the parameters for gap penalties.
jpayne@68	34
jpayne@68	35 The match parameters are::
jpayne@68	36
jpayne@68	37 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@68	38 x No parameters. Identical characters have score of 1, otherwise 0.
jpayne@68	39 m A match score is the score of identical chars, otherwise mismatch
jpayne@68	40 score. Keywords ``match``, ``mismatch``.
jpayne@68	41 d A dictionary returns the score of any pair of characters.
jpayne@68	42 Keyword ``match_dict``.
jpayne@68	43 c A callback function returns scores. Keyword ``match_fn``.
jpayne@68	44
jpayne@68	45 The gap penalty parameters are::
jpayne@68	46
jpayne@68	47 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@68	48 x No gap penalties.
jpayne@68	49 s Same open and extend gap penalties for both sequences.
jpayne@68	50 Keywords ``open``, ``extend``.
jpayne@68	51 d The sequences have different open and extend gap penalties.
jpayne@68	52 Keywords ``openA``, ``extendA``, ``openB``, ``extendB``.
jpayne@68	53 c A callback function returns the gap penalties.
jpayne@68	54 Keywords ``gap_A_fn``, ``gap_B_fn``.
jpayne@68	55
jpayne@68	56 All the different alignment functions are contained in an object
jpayne@68	57 ``align``. For example:
jpayne@68	58
jpayne@68	59 >>> from Bio import pairwise2
jpayne@68	60 >>> alignments = pairwise2.align.globalxx("ACCGT", "ACG")
jpayne@68	61
jpayne@68	62 For better readability, the required arguments can be used with optional keywords:
jpayne@68	63
jpayne@68	64 >>> alignments = pairwise2.align.globalxx(sequenceA="ACCGT", sequenceB="ACG")
jpayne@68	65
jpayne@68	66 The result is a list of the alignments between the two strings. Each alignment
jpayne@68	67 is a named tuple consisting of the two aligned sequences, the score and the
jpayne@68	68 start and end positions of the alignment:
jpayne@68	69
jpayne@68	70 >>> print(alignments)
jpayne@68	71 [Alignment(seqA='ACCGT', seqB='A-CG-', score=3.0, start=0, end=5), ...
jpayne@68	72
jpayne@68	73 You can access each element of an alignment by index or name:
jpayne@68	74
jpayne@68	75 >>> alignments[0][2]
jpayne@68	76 3.0
jpayne@68	77 >>> alignments[0].score
jpayne@68	78 3.0
jpayne@68	79
jpayne@68	80 For a nice printout of an alignment, use the ``format_alignment`` method of
jpayne@68	81 the module:
jpayne@68	82
jpayne@68	83 >>> from Bio.pairwise2 import format_alignment
jpayne@68	84 >>> print(format_alignment(*alignments[0]))
jpayne@68	85 ACCGT
jpayne@68	86 \| \|\|
jpayne@68	87 A-CG-
jpayne@68	88 Score=3
jpayne@68	89 <BLANKLINE>
jpayne@68	90
jpayne@68	91 All alignment functions have the following arguments:
jpayne@68	92
jpayne@68	93 - Two sequences: strings, Biopython sequence objects or lists.
jpayne@68	94 Lists are useful for supplying sequences which contain residues that are
jpayne@68	95 encoded by more than one letter.
jpayne@68	96
jpayne@68	97 - ``penalize_extend_when_opening``: boolean (default: False).
jpayne@68	98 Whether to count an extension penalty when opening a gap. If false, a gap of
jpayne@68	99 1 is only penalized an "open" penalty, otherwise it is penalized
jpayne@68	100 "open+extend".
jpayne@68	101
jpayne@68	102 - ``penalize_end_gaps``: boolean.
jpayne@68	103 Whether to count the gaps at the ends of an alignment. By default, they are
jpayne@68	104 counted for global alignments but not for local ones. Setting
jpayne@68	105 ``penalize_end_gaps`` to (boolean, boolean) allows you to specify for the
jpayne@68	106 two sequences separately whether gaps at the end of the alignment should be
jpayne@68	107 counted.
jpayne@68	108
jpayne@68	109 - ``gap_char``: string (default: ``'-'``).
jpayne@68	110 Which character to use as a gap character in the alignment returned. If your
jpayne@68	111 input sequences are lists, you must change this to ``['-']``.
jpayne@68	112
jpayne@68	113 - ``force_generic``: boolean (default: False).
jpayne@68	114 Always use the generic, non-cached, dynamic programming function (slow!).
jpayne@68	115 For debugging.
jpayne@68	116
jpayne@68	117 - ``score_only``: boolean (default: False).
jpayne@68	118 Only get the best score, don't recover any alignments. The return value of
jpayne@68	119 the function is the score. Faster and uses less memory.
jpayne@68	120
jpayne@68	121 - ``one_alignment_only``: boolean (default: False).
jpayne@68	122 Only recover one alignment.
jpayne@68	123
jpayne@68	124 The other parameters of the alignment function depend on the function called.
jpayne@68	125 Some examples:
jpayne@68	126
jpayne@68	127 - Find the best global alignment between the two sequences. Identical
jpayne@68	128 characters are given 1 point. No points are deducted for mismatches or gaps.
jpayne@68	129
jpayne@68	130 >>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
jpayne@68	131 ... print(format_alignment(*a))
jpayne@68	132 ACCGT
jpayne@68	133 \| \|\|
jpayne@68	134 A-CG-
jpayne@68	135 Score=3
jpayne@68	136 <BLANKLINE>
jpayne@68	137 ACCGT
jpayne@68	138 \|\| \|
jpayne@68	139 AC-G-
jpayne@68	140 Score=3
jpayne@68	141 <BLANKLINE>
jpayne@68	142
jpayne@68	143 - Same thing as before, but with a local alignment. Note that
jpayne@68	144 ``format_alignment`` will only show the aligned parts of the sequences,
jpayne@68	145 together with the starting positions.
jpayne@68	146
jpayne@68	147 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@68	148 ... print(format_alignment(*a))
jpayne@68	149 1 ACCG
jpayne@68	150 \| \|\|
jpayne@68	151 1 A-CG
jpayne@68	152 Score=3
jpayne@68	153 <BLANKLINE>
jpayne@68	154 1 ACCG
jpayne@68	155 \|\| \|
jpayne@68	156 1 AC-G
jpayne@68	157 Score=3
jpayne@68	158 <BLANKLINE>
jpayne@68	159
jpayne@68	160 To restore the 'historic' behaviour of ``format_alignemt``, i.e., showing
jpayne@68	161 also the un-aligned parts of both sequences, use the new keyword parameter
jpayne@68	162 ``full_sequences``:
jpayne@68	163
jpayne@68	164 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@68	165 ... print(format_alignment(*a, full_sequences=True))
jpayne@68	166 ACCGT
jpayne@68	167 \| \|\|
jpayne@68	168 A-CG-
jpayne@68	169 Score=3
jpayne@68	170 <BLANKLINE>
jpayne@68	171 ACCGT
jpayne@68	172 \|\| \|
jpayne@68	173 AC-G-
jpayne@68	174 Score=3
jpayne@68	175 <BLANKLINE>
jpayne@68	176
jpayne@68	177
jpayne@68	178 - Do a global alignment. Identical characters are given 2 points, 1 point is
jpayne@68	179 deducted for each non-identical character. Don't penalize gaps.
jpayne@68	180
jpayne@68	181 >>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
jpayne@68	182 ... print(format_alignment(*a))
jpayne@68	183 ACCGT
jpayne@68	184 \| \|\|
jpayne@68	185 A-CG-
jpayne@68	186 Score=6
jpayne@68	187 <BLANKLINE>
jpayne@68	188 ACCGT
jpayne@68	189 \|\| \|
jpayne@68	190 AC-G-
jpayne@68	191 Score=6
jpayne@68	192 <BLANKLINE>
jpayne@68	193
jpayne@68	194 - Same as above, except now 0.5 points are deducted when opening a gap, and
jpayne@68	195 0.1 points are deducted when extending it.
jpayne@68	196
jpayne@68	197 >>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
jpayne@68	198 ... print(format_alignment(*a))
jpayne@68	199 ACCGT
jpayne@68	200 \| \|\|
jpayne@68	201 A-CG-
jpayne@68	202 Score=5
jpayne@68	203 <BLANKLINE>
jpayne@68	204 ACCGT
jpayne@68	205 \|\| \|
jpayne@68	206 AC-G-
jpayne@68	207 Score=5
jpayne@68	208 <BLANKLINE>
jpayne@68	209
jpayne@68	210 - Note that you can use keywords to increase the readability, e.g.:
jpayne@68	211
jpayne@68	212 >>> a = pairwise2.align.globalms("ACGT", "ACG", match=2, mismatch=-1, open=-.5,
jpayne@68	213 ... extend=-.1)
jpayne@68	214
jpayne@68	215 - Depending on the penalties, a gap in one sequence may be followed by a gap in
jpayne@68	216 the other sequence.If you don't like this behaviour, increase the gap-open
jpayne@68	217 penalty:
jpayne@68	218
jpayne@68	219 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -1, -.1):
jpayne@68	220 ... print(format_alignment(*a))
jpayne@68	221 A-
jpayne@68	222 <BLANKLINE>
jpayne@68	223 -T
jpayne@68	224 Score=-2
jpayne@68	225 <BLANKLINE>
jpayne@68	226 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -3, -.1):
jpayne@68	227 ... print(format_alignment(*a))
jpayne@68	228 A
jpayne@68	229 .
jpayne@68	230 T
jpayne@68	231 Score=-4
jpayne@68	232 <BLANKLINE>
jpayne@68	233
jpayne@68	234 - The alignment function can also use known matrices already included in
jpayne@68	235 Biopython (in ``Bio.Align.substitution_matrices``):
jpayne@68	236
jpayne@68	237 >>> from Bio.Align import substitution_matrices
jpayne@68	238 >>> matrix = substitution_matrices.load("BLOSUM62")
jpayne@68	239 >>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
jpayne@68	240 ... print(format_alignment(*a))
jpayne@68	241 KEVLA
jpayne@68	242 \|\|\|
jpayne@68	243 -EVL-
jpayne@68	244 Score=13
jpayne@68	245 <BLANKLINE>
jpayne@68	246
jpayne@68	247 - With the parameter ``c`` you can define your own match- and gap functions.
jpayne@68	248 E.g. to define an affine logarithmic gap function and using it:
jpayne@68	249
jpayne@68	250 >>> from math import log
jpayne@68	251 >>> def gap_function(x, y): # x is gap position in seq, y is gap length
jpayne@68	252 ... if y == 0: # No gap
jpayne@68	253 ... return 0
jpayne@68	254 ... elif y == 1: # Gap open penalty
jpayne@68	255 ... return -2
jpayne@68	256 ... return - (2 + y/4.0 + log(y)/2.0)
jpayne@68	257 ...
jpayne@68	258 >>> alignment = pairwise2.align.globalmc("ACCCCCGT", "ACG", 5, -4,
jpayne@68	259 ... gap_function, gap_function)
jpayne@68	260
jpayne@68	261 You can define different gap functions for each sequence.
jpayne@68	262 Self-defined match functions must take the two residues to be compared and
jpayne@68	263 return a score.
jpayne@68	264
jpayne@68	265 To see a description of the parameters for a function, please look at
jpayne@68	266 the docstring for the function via the help function, e.g.
jpayne@68	267 type ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@68	268
jpayne@68	269 """ # noqa: W291
jpayne@68	270
jpayne@68	271 import warnings
jpayne@68	272 from collections import namedtuple
jpayne@68	273
jpayne@68	274 from Bio import BiopythonWarning
jpayne@68	275 from Bio import BiopythonDeprecationWarning
jpayne@68	276 from Bio.Align import substitution_matrices
jpayne@68	277
jpayne@68	278 warnings.warn(
jpayne@68	279 "Bio.pairwise2 has been deprecated, and we intend to remove it in a "
jpayne@68	280 "future release of Biopython. As an alternative, please consider using "
jpayne@68	281 "Bio.Align.PairwiseAligner as a replacement, and contact the "
jpayne@68	282 "Biopython developers if you still need the Bio.pairwise2 module.",
jpayne@68	283 BiopythonDeprecationWarning,
jpayne@68	284 )
jpayne@68	285
jpayne@68	286
jpayne@68	287 MAX_ALIGNMENTS = 1000 # maximum alignments recovered in traceback
jpayne@68	288
jpayne@68	289 Alignment = namedtuple("Alignment", ("seqA, seqB, score, start, end"))
jpayne@68	290
jpayne@68	291
jpayne@68	292 class align:
jpayne@68	293 """Provide functions that do alignments.
jpayne@68	294
jpayne@68	295 Alignment functions are called as:
jpayne@68	296
jpayne@68	297 pairwise2.align.globalXX
jpayne@68	298
jpayne@68	299 or
jpayne@68	300
jpayne@68	301 pairwise2.align.localXX
jpayne@68	302
jpayne@68	303 Where XX is a 2 character code indicating the match/mismatch parameters
jpayne@68	304 (first character, either x, m, d or c) and the gap penalty parameters
jpayne@68	305 (second character, either x, s, d, or c).
jpayne@68	306
jpayne@68	307 For a detailed description read the main module's docstring (e.g.,
jpayne@68	308 type ``help(pairwise2)``).
jpayne@68	309 To see a description of the parameters for a function, please
jpayne@68	310 look at the docstring for the function, e.g. type
jpayne@68	311 ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@68	312 """
jpayne@68	313
jpayne@68	314 class alignment_function:
jpayne@68	315 """Callable class which impersonates an alignment function.
jpayne@68	316
jpayne@68	317 The constructor takes the name of the function. This class
jpayne@68	318 will decode the name of the function to figure out how to
jpayne@68	319 interpret the parameters.
jpayne@68	320 """
jpayne@68	321
jpayne@68	322 # match code -> tuple of (parameters, docstring)
jpayne@68	323 match2args = {
jpayne@68	324 "x": ([], ""),
jpayne@68	325 "m": (
jpayne@68	326 ["match", "mismatch"],
jpayne@68	327 "match is the score to given to identical characters.\n"
jpayne@68	328 "mismatch is the score given to non-identical ones.",
jpayne@68	329 ),
jpayne@68	330 "d": (
jpayne@68	331 ["match_dict"],
jpayne@68	332 "match_dict is a dictionary where the keys are tuples\n"
jpayne@68	333 "of pairs of characters and the values are the scores,\n"
jpayne@68	334 "e.g. ('A', 'C') : 2.5.",
jpayne@68	335 ),
jpayne@68	336 "c": (
jpayne@68	337 ["match_fn"],
jpayne@68	338 "match_fn is a callback function that takes two "
jpayne@68	339 "characters and returns the score between them.",
jpayne@68	340 ),
jpayne@68	341 }
jpayne@68	342 # penalty code -> tuple of (parameters, docstring)
jpayne@68	343 penalty2args = {
jpayne@68	344 "x": ([], ""),
jpayne@68	345 "s": (
jpayne@68	346 ["open", "extend"],
jpayne@68	347 "open and extend are the gap penalties when a gap is\n"
jpayne@68	348 "opened and extended. They should be negative.",
jpayne@68	349 ),
jpayne@68	350 "d": (
jpayne@68	351 ["openA", "extendA", "openB", "extendB"],
jpayne@68	352 "openA and extendA are the gap penalties for sequenceA,\n"
jpayne@68	353 "and openB and extendB for sequenceB. The penalties\n"
jpayne@68	354 "should be negative.",
jpayne@68	355 ),
jpayne@68	356 "c": (
jpayne@68	357 ["gap_A_fn", "gap_B_fn"],
jpayne@68	358 "gap_A_fn and gap_B_fn are callback functions that takes\n"
jpayne@68	359 "(1) the index where the gap is opened, and (2) the length\n"
jpayne@68	360 "of the gap. They should return a gap penalty.",
jpayne@68	361 ),
jpayne@68	362 }
jpayne@68	363
jpayne@68	364 def __init__(self, name):
jpayne@68	365 """Check to make sure the name of the function is reasonable."""
jpayne@68	366 if name.startswith("global"):
jpayne@68	367 if len(name) != 8:
jpayne@68	368 raise AttributeError("function should be globalXX")
jpayne@68	369 elif name.startswith("local"):
jpayne@68	370 if len(name) != 7:
jpayne@68	371 raise AttributeError("function should be localXX")
jpayne@68	372 else:
jpayne@68	373 raise AttributeError(name)
jpayne@68	374 align_type, match_type, penalty_type = name[:-2], name[-2], name[-1]
jpayne@68	375 try:
jpayne@68	376 match_args, match_doc = self.match2args[match_type]
jpayne@68	377 except KeyError:
jpayne@68	378 raise AttributeError(f"unknown match type {match_type!r}")
jpayne@68	379 try:
jpayne@68	380 penalty_args, penalty_doc = self.penalty2args[penalty_type]
jpayne@68	381 except KeyError:
jpayne@68	382 raise AttributeError(f"unknown penalty type {penalty_type!r}")
jpayne@68	383
jpayne@68	384 # Now get the names of the parameters to this function.
jpayne@68	385 param_names = ["sequenceA", "sequenceB"]
jpayne@68	386 param_names.extend(match_args)
jpayne@68	387 param_names.extend(penalty_args)
jpayne@68	388 self.function_name = name
jpayne@68	389 self.align_type = align_type
jpayne@68	390 self.param_names = param_names
jpayne@68	391
jpayne@68	392 self.__name__ = self.function_name
jpayne@68	393 # Set the doc string.
jpayne@68	394 doc = f"{self.__name__}({', '.join(self.param_names)}) -> alignments\n"
jpayne@68	395 doc += """\
jpayne@68	396 \nThe following parameters can also be used with optional
jpayne@68	397 keywords of the same name.\n\n
jpayne@68	398 sequenceA and sequenceB must be of the same type, either
jpayne@68	399 strings, lists or Biopython sequence objects.\n
jpayne@68	400 """
jpayne@68	401 if match_doc:
jpayne@68	402 doc += f"\n{match_doc}\n"
jpayne@68	403 if penalty_doc:
jpayne@68	404 doc += f"\n{penalty_doc}\n"
jpayne@68	405 doc += """\
jpayne@68	406 \nalignments is a list of named tuples (seqA, seqB, score,
jpayne@68	407 begin, end). seqA and seqB are strings showing the alignment
jpayne@68	408 between the sequences. score is the score of the alignment.
jpayne@68	409 begin and end are indexes of seqA and seqB that indicate
jpayne@68	410 where the alignment occurs.
jpayne@68	411 """
jpayne@68	412 self.__doc__ = doc
jpayne@68	413
jpayne@68	414 def decode(self, args, *keywds):
jpayne@68	415 """Decode the arguments for the _align function.
jpayne@68	416
jpayne@68	417 keywds will get passed to it, so translate the arguments
jpayne@68	418 to this function into forms appropriate for _align.
jpayne@68	419 """
jpayne@68	420 keywds = keywds.copy()
jpayne@68	421
jpayne@68	422 # Replace possible "keywords" with arguments:
jpayne@68	423 args += (len(self.param_names) - len(args)) * (None,)
jpayne@68	424 for key in keywds.copy():
jpayne@68	425 if key in self.param_names:
jpayne@68	426 _index = self.param_names.index(key)
jpayne@68	427 args = args[:_index] + (keywds[key],) + args[_index:]
jpayne@68	428 del keywds[key]
jpayne@68	429 args = tuple(arg for arg in args if arg is not None)
jpayne@68	430
jpayne@68	431 if len(args) != len(self.param_names):
jpayne@68	432 raise TypeError(
jpayne@68	433 "%s takes exactly %d argument (%d given)"
jpayne@68	434 % (self.function_name, len(self.param_names), len(args))
jpayne@68	435 )
jpayne@68	436
jpayne@68	437 i = 0
jpayne@68	438 while i < len(self.param_names):
jpayne@68	439 if self.param_names[i] in [
jpayne@68	440 "sequenceA",
jpayne@68	441 "sequenceB",
jpayne@68	442 "gap_A_fn",
jpayne@68	443 "gap_B_fn",
jpayne@68	444 "match_fn",
jpayne@68	445 ]:
jpayne@68	446 keywds[self.param_names[i]] = args[i]
jpayne@68	447 i += 1
jpayne@68	448 elif self.param_names[i] == "match":
jpayne@68	449 assert self.param_names[i + 1] == "mismatch"
jpayne@68	450 match, mismatch = args[i], args[i + 1]
jpayne@68	451 keywds["match_fn"] = identity_match(match, mismatch)
jpayne@68	452 i += 2
jpayne@68	453 elif self.param_names[i] == "match_dict":
jpayne@68	454 keywds["match_fn"] = dictionary_match(args[i])
jpayne@68	455 i += 1
jpayne@68	456 elif self.param_names[i] == "open":
jpayne@68	457 assert self.param_names[i + 1] == "extend"
jpayne@68	458 open, extend = args[i], args[i + 1]
jpayne@68	459 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68	460 keywds["gap_A_fn"] = affine_penalty(open, extend, pe)
jpayne@68	461 keywds["gap_B_fn"] = affine_penalty(open, extend, pe)
jpayne@68	462 i += 2
jpayne@68	463 elif self.param_names[i] == "openA":
jpayne@68	464 assert self.param_names[i + 3] == "extendB"
jpayne@68	465 openA, extendA, openB, extendB = args[i : i + 4]
jpayne@68	466 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68	467 keywds["gap_A_fn"] = affine_penalty(openA, extendA, pe)
jpayne@68	468 keywds["gap_B_fn"] = affine_penalty(openB, extendB, pe)
jpayne@68	469 i += 4
jpayne@68	470 else:
jpayne@68	471 raise ValueError(f"unknown parameter {self.param_names[i]!r}")
jpayne@68	472
jpayne@68	473 # Here are the default parameters for _align. Assign
jpayne@68	474 # these to keywds, unless already specified.
jpayne@68	475 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68	476 default_params = [
jpayne@68	477 ("match_fn", identity_match(1, 0)),
jpayne@68	478 ("gap_A_fn", affine_penalty(0, 0, pe)),
jpayne@68	479 ("gap_B_fn", affine_penalty(0, 0, pe)),
jpayne@68	480 ("penalize_extend_when_opening", 0),
jpayne@68	481 ("penalize_end_gaps", self.align_type == "global"),
jpayne@68	482 ("align_globally", self.align_type == "global"),
jpayne@68	483 ("gap_char", "-"),
jpayne@68	484 ("force_generic", 0),
jpayne@68	485 ("score_only", 0),
jpayne@68	486 ("one_alignment_only", 0),
jpayne@68	487 ]
jpayne@68	488 for name, default in default_params:
jpayne@68	489 keywds[name] = keywds.get(name, default)
jpayne@68	490 value = keywds["penalize_end_gaps"]
jpayne@68	491 try:
jpayne@68	492 n = len(value)
jpayne@68	493 except TypeError:
jpayne@68	494 keywds["penalize_end_gaps"] = tuple([value] * 2)
jpayne@68	495 else:
jpayne@68	496 assert n == 2
jpayne@68	497 return keywds
jpayne@68	498
jpayne@68	499 def __call__(self, args, *keywds):
jpayne@68	500 """Call the alignment instance already created."""
jpayne@68	501 keywds = self.decode(args, *keywds)
jpayne@68	502 return _align(**keywds)
jpayne@68	503
jpayne@68	504 def __getattr__(self, attr):
jpayne@68	505 """Call alignment_function() to check and decode the attributes."""
jpayne@68	506 # The following 'magic' is needed to rewrite the class docstring
jpayne@68	507 # dynamically:
jpayne@68	508 wrapper = self.alignment_function(attr)
jpayne@68	509 wrapper_type = type(wrapper)
jpayne@68	510 wrapper_dict = wrapper_type.__dict__.copy()
jpayne@68	511 wrapper_dict["__doc__"] = wrapper.__doc__
jpayne@68	512 new_alignment_function = type("alignment_function", (object,), wrapper_dict)
jpayne@68	513
jpayne@68	514 return new_alignment_function(attr)
jpayne@68	515
jpayne@68	516
jpayne@68	517 align = align()
jpayne@68	518
jpayne@68	519
jpayne@68	520 def _align(
jpayne@68	521 sequenceA,
jpayne@68	522 sequenceB,
jpayne@68	523 match_fn,
jpayne@68	524 gap_A_fn,
jpayne@68	525 gap_B_fn,
jpayne@68	526 penalize_extend_when_opening,
jpayne@68	527 penalize_end_gaps,
jpayne@68	528 align_globally,
jpayne@68	529 gap_char,
jpayne@68	530 force_generic,
jpayne@68	531 score_only,
jpayne@68	532 one_alignment_only,
jpayne@68	533 ):
jpayne@68	534 """Return optimal alignments between two sequences (PRIVATE).
jpayne@68	535
jpayne@68	536 This method either returns a list of optimal alignments (with the same
jpayne@68	537 score) or just the optimal score.
jpayne@68	538 """
jpayne@68	539 if not sequenceA or not sequenceB:
jpayne@68	540 return []
jpayne@68	541 try:
jpayne@68	542 sequenceA + gap_char
jpayne@68	543 sequenceB + gap_char
jpayne@68	544 except TypeError:
jpayne@68	545 raise TypeError(
jpayne@68	546 "both sequences must be of the same type, either "
jpayne@68	547 "string/sequence object or list. Gap character must "
jpayne@68	548 "fit the sequence type (string or list)"
jpayne@68	549 )
jpayne@68	550
jpayne@68	551 if not isinstance(sequenceA, list):
jpayne@68	552 sequenceA = str(sequenceA)
jpayne@68	553 if not isinstance(sequenceB, list):
jpayne@68	554 sequenceB = str(sequenceB)
jpayne@68	555 if not align_globally and (penalize_end_gaps[0] or penalize_end_gaps[1]):
jpayne@68	556 warnings.warn(
jpayne@68	557 '"penalize_end_gaps" should not be used in local '
jpayne@68	558 "alignments. The resulting score may be wrong.",
jpayne@68	559 BiopythonWarning,
jpayne@68	560 )
jpayne@68	561
jpayne@68	562 if (
jpayne@68	563 (not force_generic)
jpayne@68	564 and isinstance(gap_A_fn, affine_penalty)
jpayne@68	565 and isinstance(gap_B_fn, affine_penalty)
jpayne@68	566 ):
jpayne@68	567 open_A, extend_A = gap_A_fn.open, gap_A_fn.extend
jpayne@68	568 open_B, extend_B = gap_B_fn.open, gap_B_fn.extend
jpayne@68	569 matrices = _make_score_matrix_fast(
jpayne@68	570 sequenceA,
jpayne@68	571 sequenceB,
jpayne@68	572 match_fn,
jpayne@68	573 open_A,
jpayne@68	574 extend_A,
jpayne@68	575 open_B,
jpayne@68	576 extend_B,
jpayne@68	577 penalize_extend_when_opening,
jpayne@68	578 penalize_end_gaps,
jpayne@68	579 align_globally,
jpayne@68	580 score_only,
jpayne@68	581 )
jpayne@68	582 else:
jpayne@68	583 matrices = _make_score_matrix_generic(
jpayne@68	584 sequenceA,
jpayne@68	585 sequenceB,
jpayne@68	586 match_fn,
jpayne@68	587 gap_A_fn,
jpayne@68	588 gap_B_fn,
jpayne@68	589 penalize_end_gaps,
jpayne@68	590 align_globally,
jpayne@68	591 score_only,
jpayne@68	592 )
jpayne@68	593
jpayne@68	594 score_matrix, trace_matrix, best_score = matrices
jpayne@68	595
jpayne@68	596 # print("SCORE %s" % print_matrix(score_matrix))
jpayne@68	597 # print("TRACEBACK %s" % print_matrix(trace_matrix))
jpayne@68	598
jpayne@68	599 # If they only want the score, then return it.
jpayne@68	600 if score_only:
jpayne@68	601 return best_score
jpayne@68	602
jpayne@68	603 starts = _find_start(score_matrix, best_score, align_globally)
jpayne@68	604
jpayne@68	605 # Recover the alignments and return them.
jpayne@68	606 alignments = _recover_alignments(
jpayne@68	607 sequenceA,
jpayne@68	608 sequenceB,
jpayne@68	609 starts,
jpayne@68	610 best_score,
jpayne@68	611 score_matrix,
jpayne@68	612 trace_matrix,
jpayne@68	613 align_globally,
jpayne@68	614 gap_char,
jpayne@68	615 one_alignment_only,
jpayne@68	616 gap_A_fn,
jpayne@68	617 gap_B_fn,
jpayne@68	618 )
jpayne@68	619 if not alignments:
jpayne@68	620 # This may happen, see recover_alignments for explanation
jpayne@68	621 score_matrix, trace_matrix = _reverse_matrices(score_matrix, trace_matrix)
jpayne@68	622 starts = [(z, (y, x)) for z, (x, y) in starts]
jpayne@68	623 alignments = _recover_alignments(
jpayne@68	624 sequenceB,
jpayne@68	625 sequenceA,
jpayne@68	626 starts,
jpayne@68	627 best_score,
jpayne@68	628 score_matrix,
jpayne@68	629 trace_matrix,
jpayne@68	630 align_globally,
jpayne@68	631 gap_char,
jpayne@68	632 one_alignment_only,
jpayne@68	633 gap_B_fn,
jpayne@68	634 gap_A_fn,
jpayne@68	635 reverse=True,
jpayne@68	636 )
jpayne@68	637 return alignments
jpayne@68	638
jpayne@68	639
jpayne@68	640 def _make_score_matrix_generic(
jpayne@68	641 sequenceA,
jpayne@68	642 sequenceB,
jpayne@68	643 match_fn,
jpayne@68	644 gap_A_fn,
jpayne@68	645 gap_B_fn,
jpayne@68	646 penalize_end_gaps,
jpayne@68	647 align_globally,
jpayne@68	648 score_only,
jpayne@68	649 ):
jpayne@68	650 """Generate a score and traceback matrix (PRIVATE).
jpayne@68	651
jpayne@68	652 This implementation according to Needleman-Wunsch allows the usage of
jpayne@68	653 general gap functions and is rather slow. It is automatically called if
jpayne@68	654 you define your own gap functions. You can force the usage of this method
jpayne@68	655 with ``force_generic=True``.
jpayne@68	656 """
jpayne@68	657 local_max_score = 0
jpayne@68	658 # Create the score and traceback matrices. These should be in the
jpayne@68	659 # shape:
jpayne@68	660 # sequenceA (down) x sequenceB (across)
jpayne@68	661 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68	662 score_matrix, trace_matrix = [], []
jpayne@68	663 for i in range(lenA + 1):
jpayne@68	664 score_matrix.append([None] * (lenB + 1))
jpayne@68	665 if not score_only:
jpayne@68	666 trace_matrix.append([None] * (lenB + 1))
jpayne@68	667
jpayne@68	668 # Initialize first row and column with gap scores. This is like opening up
jpayne@68	669 # i gaps at the beginning of sequence A or B.
jpayne@68	670 for i in range(lenA + 1):
jpayne@68	671 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@68	672 score = gap_B_fn(0, i)
jpayne@68	673 else:
jpayne@68	674 score = 0.0
jpayne@68	675 score_matrix[i][0] = score
jpayne@68	676 for i in range(lenB + 1):
jpayne@68	677 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@68	678 score = gap_A_fn(0, i)
jpayne@68	679 else:
jpayne@68	680 score = 0.0
jpayne@68	681 score_matrix[0][i] = score
jpayne@68	682
jpayne@68	683 # Fill in the score matrix. Each position in the matrix
jpayne@68	684 # represents an alignment between a character from sequence A to
jpayne@68	685 # one in sequence B. As I iterate through the matrix, find the
jpayne@68	686 # alignment by choose the best of:
jpayne@68	687 # 1) extending a previous alignment without gaps
jpayne@68	688 # 2) adding a gap in sequenceA
jpayne@68	689 # 3) adding a gap in sequenceB
jpayne@68	690 for row in range(1, lenA + 1):
jpayne@68	691 for col in range(1, lenB + 1):
jpayne@68	692 # First, calculate the score that would occur by extending
jpayne@68	693 # the alignment without gaps.
jpayne@68	694 # fmt: off
jpayne@68	695 nogap_score = (
jpayne@68	696 score_matrix[row - 1][col - 1]
jpayne@68	697 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@68	698 )
jpayne@68	699
jpayne@68	700 # fmt: on
jpayne@68	701 # Try to find a better score by opening gaps in sequenceA.
jpayne@68	702 # Do this by checking alignments from each column in the row.
jpayne@68	703 # Each column represents a different character to align from,
jpayne@68	704 # and thus a different length gap.
jpayne@68	705 # Although the gap function does not distinguish between opening
jpayne@68	706 # and extending a gap, we distinguish them for the backtrace.
jpayne@68	707 if not penalize_end_gaps[0] and row == lenA:
jpayne@68	708 row_open = score_matrix[row][col - 1]
jpayne@68	709 row_extend = max(score_matrix[row][x] for x in range(col))
jpayne@68	710 else:
jpayne@68	711 row_open = score_matrix[row][col - 1] + gap_A_fn(row, 1)
jpayne@68	712 row_extend = max(
jpayne@68	713 score_matrix[row][x] + gap_A_fn(row, col - x) for x in range(col)
jpayne@68	714 )
jpayne@68	715
jpayne@68	716 # Try to find a better score by opening gaps in sequenceB.
jpayne@68	717 if not penalize_end_gaps[1] and col == lenB:
jpayne@68	718 col_open = score_matrix[row - 1][col]
jpayne@68	719 col_extend = max(score_matrix[x][col] for x in range(row))
jpayne@68	720 else:
jpayne@68	721 col_open = score_matrix[row - 1][col] + gap_B_fn(col, 1)
jpayne@68	722 col_extend = max(
jpayne@68	723 score_matrix[x][col] + gap_B_fn(col, row - x) for x in range(row)
jpayne@68	724 )
jpayne@68	725
jpayne@68	726 best_score = max(nogap_score, row_open, row_extend, col_open, col_extend)
jpayne@68	727 local_max_score = max(local_max_score, best_score)
jpayne@68	728 if not align_globally and best_score < 0:
jpayne@68	729 score_matrix[row][col] = 0.0
jpayne@68	730 else:
jpayne@68	731 score_matrix[row][col] = best_score
jpayne@68	732
jpayne@68	733 # The backtrace is encoded binary. See _make_score_matrix_fast
jpayne@68	734 # for details.
jpayne@68	735 if not score_only:
jpayne@68	736 trace_score = 0
jpayne@68	737 if rint(nogap_score) == rint(best_score):
jpayne@68	738 trace_score += 2
jpayne@68	739 if rint(row_open) == rint(best_score):
jpayne@68	740 trace_score += 1
jpayne@68	741 if rint(row_extend) == rint(best_score):
jpayne@68	742 trace_score += 8
jpayne@68	743 if rint(col_open) == rint(best_score):
jpayne@68	744 trace_score += 4
jpayne@68	745 if rint(col_extend) == rint(best_score):
jpayne@68	746 trace_score += 16
jpayne@68	747 trace_matrix[row][col] = trace_score
jpayne@68	748
jpayne@68	749 if not align_globally:
jpayne@68	750 best_score = local_max_score
jpayne@68	751
jpayne@68	752 return score_matrix, trace_matrix, best_score
jpayne@68	753
jpayne@68	754
jpayne@68	755 def _make_score_matrix_fast(
jpayne@68	756 sequenceA,
jpayne@68	757 sequenceB,
jpayne@68	758 match_fn,
jpayne@68	759 open_A,
jpayne@68	760 extend_A,
jpayne@68	761 open_B,
jpayne@68	762 extend_B,
jpayne@68	763 penalize_extend_when_opening,
jpayne@68	764 penalize_end_gaps,
jpayne@68	765 align_globally,
jpayne@68	766 score_only,
jpayne@68	767 ):
jpayne@68	768 """Generate a score and traceback matrix according to Gotoh (PRIVATE).
jpayne@68	769
jpayne@68	770 This is an implementation of the Needleman-Wunsch dynamic programming
jpayne@68	771 algorithm as modified by Gotoh, implementing affine gap penalties.
jpayne@68	772 In short, we have three matrices, holding scores for alignments ending
jpayne@68	773 in (1) a match/mismatch, (2) a gap in sequence A, and (3) a gap in
jpayne@68	774 sequence B, respectively. However, we can combine them in one matrix,
jpayne@68	775 which holds the best scores, and store only those values from the
jpayne@68	776 other matrices that are actually used for the next step of calculation.
jpayne@68	777 The traceback matrix holds the positions for backtracing the alignment.
jpayne@68	778 """
jpayne@68	779 first_A_gap = calc_affine_penalty(1, open_A, extend_A, penalize_extend_when_opening)
jpayne@68	780 first_B_gap = calc_affine_penalty(1, open_B, extend_B, penalize_extend_when_opening)
jpayne@68	781 local_max_score = 0
jpayne@68	782
jpayne@68	783 # Create the score and traceback matrices. These should be in the
jpayne@68	784 # shape:
jpayne@68	785 # sequenceA (down) x sequenceB (across)
jpayne@68	786 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68	787 score_matrix, trace_matrix = [], []
jpayne@68	788 for i in range(lenA + 1):
jpayne@68	789 score_matrix.append([None] * (lenB + 1))
jpayne@68	790 if not score_only:
jpayne@68	791 trace_matrix.append([None] * (lenB + 1))
jpayne@68	792
jpayne@68	793 # Initialize first row and column with gap scores. This is like opening up
jpayne@68	794 # i gaps at the beginning of sequence A or B.
jpayne@68	795 for i in range(lenA + 1):
jpayne@68	796 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@68	797 score = calc_affine_penalty(
jpayne@68	798 i, open_B, extend_B, penalize_extend_when_opening
jpayne@68	799 )
jpayne@68	800 else:
jpayne@68	801 score = 0
jpayne@68	802 score_matrix[i][0] = score
jpayne@68	803 for i in range(lenB + 1):
jpayne@68	804 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@68	805 score = calc_affine_penalty(
jpayne@68	806 i, open_A, extend_A, penalize_extend_when_opening
jpayne@68	807 )
jpayne@68	808 else:
jpayne@68	809 score = 0
jpayne@68	810 score_matrix[0][i] = score
jpayne@68	811
jpayne@68	812 # Now initialize the col 'matrix'. Actually this is only a one dimensional
jpayne@68	813 # list, since we only need the col scores from the last row.
jpayne@68	814 col_score = [0] # Best score, if actual alignment ends with gap in seqB
jpayne@68	815 for i in range(1, lenB + 1):
jpayne@68	816 col_score.append(
jpayne@68	817 calc_affine_penalty(i, 2 * open_B, extend_B, penalize_extend_when_opening)
jpayne@68	818 )
jpayne@68	819
jpayne@68	820 # The row 'matrix' is calculated on the fly. Here we only need the actual
jpayne@68	821 # score.
jpayne@68	822 # Now, filling up the score and traceback matrices:
jpayne@68	823 for row in range(1, lenA + 1):
jpayne@68	824 row_score = calc_affine_penalty(
jpayne@68	825 row, 2 * open_A, extend_A, penalize_extend_when_opening
jpayne@68	826 )
jpayne@68	827 for col in range(1, lenB + 1):
jpayne@68	828 # Calculate the score that would occur by extending the
jpayne@68	829 # alignment without gaps.
jpayne@68	830 # fmt: off
jpayne@68	831 nogap_score = (
jpayne@68	832 score_matrix[row - 1][col - 1]
jpayne@68	833 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@68	834 )
jpayne@68	835 # fmt: on
jpayne@68	836 # Check the score that would occur if there were a gap in
jpayne@68	837 # sequence A. This could come from opening a new gap or
jpayne@68	838 # extending an existing one.
jpayne@68	839 # A gap in sequence A can also be opened if it follows a gap in
jpayne@68	840 # sequence B: A-
jpayne@68	841 # -B
jpayne@68	842 if not penalize_end_gaps[0] and row == lenA:
jpayne@68	843 row_open = score_matrix[row][col - 1]
jpayne@68	844 row_extend = row_score
jpayne@68	845 else:
jpayne@68	846 row_open = score_matrix[row][col - 1] + first_A_gap
jpayne@68	847 row_extend = row_score + extend_A
jpayne@68	848 row_score = max(row_open, row_extend)
jpayne@68	849
jpayne@68	850 # The same for sequence B:
jpayne@68	851 if not penalize_end_gaps[1] and col == lenB:
jpayne@68	852 col_open = score_matrix[row - 1][col]
jpayne@68	853 col_extend = col_score[col]
jpayne@68	854 else:
jpayne@68	855 col_open = score_matrix[row - 1][col] + first_B_gap
jpayne@68	856 col_extend = col_score[col] + extend_B
jpayne@68	857 col_score[col] = max(col_open, col_extend)
jpayne@68	858
jpayne@68	859 best_score = max(nogap_score, col_score[col], row_score)
jpayne@68	860 local_max_score = max(local_max_score, best_score)
jpayne@68	861 if not align_globally and best_score < 0:
jpayne@68	862 score_matrix[row][col] = 0
jpayne@68	863 else:
jpayne@68	864 score_matrix[row][col] = best_score
jpayne@68	865
jpayne@68	866 # Now the trace_matrix. The edges of the backtrace are encoded
jpayne@68	867 # binary: 1 = open gap in seqA, 2 = match/mismatch of seqA and
jpayne@68	868 # seqB, 4 = open gap in seqB, 8 = extend gap in seqA, and
jpayne@68	869 # 16 = extend gap in seqB. This values can be summed up.
jpayne@68	870 # Thus, the trace score 7 means that the best score can either
jpayne@68	871 # come from opening a gap in seqA (=1), pairing two characters
jpayne@68	872 # of seqA and seqB (+2=3) or opening a gap in seqB (+4=7).
jpayne@68	873 # However, if we only want the score we don't care about the trace.
jpayne@68	874 if not score_only:
jpayne@68	875 row_score_rint = rint(row_score)
jpayne@68	876 col_score_rint = rint(col_score[col])
jpayne@68	877 row_trace_score = 0
jpayne@68	878 col_trace_score = 0
jpayne@68	879 if rint(row_open) == row_score_rint:
jpayne@68	880 row_trace_score += 1 # Open gap in seqA
jpayne@68	881 if rint(row_extend) == row_score_rint:
jpayne@68	882 row_trace_score += 8 # Extend gap in seqA
jpayne@68	883 if rint(col_open) == col_score_rint:
jpayne@68	884 col_trace_score += 4 # Open gap in seqB
jpayne@68	885 if rint(col_extend) == col_score_rint:
jpayne@68	886 col_trace_score += 16 # Extend gap in seqB
jpayne@68	887
jpayne@68	888 trace_score = 0
jpayne@68	889 best_score_rint = rint(best_score)
jpayne@68	890 if rint(nogap_score) == best_score_rint:
jpayne@68	891 trace_score += 2 # Align seqA with seqB
jpayne@68	892 if row_score_rint == best_score_rint:
jpayne@68	893 trace_score += row_trace_score
jpayne@68	894 if col_score_rint == best_score_rint:
jpayne@68	895 trace_score += col_trace_score
jpayne@68	896 trace_matrix[row][col] = trace_score
jpayne@68	897
jpayne@68	898 if not align_globally:
jpayne@68	899 best_score = local_max_score
jpayne@68	900
jpayne@68	901 return score_matrix, trace_matrix, best_score
jpayne@68	902
jpayne@68	903
jpayne@68	904 def _recover_alignments(
jpayne@68	905 sequenceA,
jpayne@68	906 sequenceB,
jpayne@68	907 starts,
jpayne@68	908 best_score,
jpayne@68	909 score_matrix,
jpayne@68	910 trace_matrix,
jpayne@68	911 align_globally,
jpayne@68	912 gap_char,
jpayne@68	913 one_alignment_only,
jpayne@68	914 gap_A_fn,
jpayne@68	915 gap_B_fn,
jpayne@68	916 reverse=False,
jpayne@68	917 ):
jpayne@68	918 """Do the backtracing and return a list of alignments (PRIVATE).
jpayne@68	919
jpayne@68	920 Recover the alignments by following the traceback matrix. This
jpayne@68	921 is a recursive procedure, but it's implemented here iteratively
jpayne@68	922 with a stack.
jpayne@68	923
jpayne@68	924 sequenceA and sequenceB may be sequences, including strings,
jpayne@68	925 lists, or list-like objects. In order to preserve the type of
jpayne@68	926 the object, we need to use slices on the sequences instead of
jpayne@68	927 indexes. For example, sequenceA[row] may return a type that's
jpayne@68	928 not compatible with sequenceA, e.g. if sequenceA is a list and
jpayne@68	929 sequenceA[row] is a string. Thus, avoid using indexes and use
jpayne@68	930 slices, e.g. sequenceA[row:row+1]. Assume that client-defined
jpayne@68	931 sequence classes preserve these semantics.
jpayne@68	932 """
jpayne@68	933 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68	934 ali_seqA, ali_seqB = sequenceA[0:0], sequenceB[0:0]
jpayne@68	935 tracebacks = []
jpayne@68	936 in_process = []
jpayne@68	937
jpayne@68	938 for start in starts:
jpayne@68	939 score, (row, col) = start
jpayne@68	940 begin = 0
jpayne@68	941 if align_globally:
jpayne@68	942 end = None
jpayne@68	943 else:
jpayne@68	944 # If this start is a zero-extension: don't start here!
jpayne@68	945 if (score, (row - 1, col - 1)) in starts:
jpayne@68	946 continue
jpayne@68	947 # Local alignments should start with a positive score!
jpayne@68	948 if score <= 0:
jpayne@68	949 continue
jpayne@68	950 # Local alignments should not end with a gap!:
jpayne@68	951 trace = trace_matrix[row][col]
jpayne@68	952 if (trace - trace % 2) % 4 == 2: # Trace contains 'nogap', fine!
jpayne@68	953 trace_matrix[row][col] = 2
jpayne@68	954 # If not, don't start here!
jpayne@68	955 else:
jpayne@68	956 continue
jpayne@68	957 end = -max(lenA - row, lenB - col)
jpayne@68	958 if not end:
jpayne@68	959 end = None
jpayne@68	960 col_distance = lenB - col
jpayne@68	961 row_distance = lenA - row
jpayne@68	962
jpayne@68	963 # fmt: off
jpayne@68	964 ali_seqA = (
jpayne@68	965 (col_distance - row_distance) * gap_char
jpayne@68	966 + sequenceA[lenA - 1 : row - 1 : -1]
jpayne@68	967 )
jpayne@68	968 ali_seqB = (
jpayne@68	969 (row_distance - col_distance) * gap_char
jpayne@68	970 + sequenceB[lenB - 1 : col - 1 : -1]
jpayne@68	971 )
jpayne@68	972 # fmt: on
jpayne@68	973 in_process += [
jpayne@68	974 (ali_seqA, ali_seqB, end, row, col, False, trace_matrix[row][col])
jpayne@68	975 ]
jpayne@68	976 while in_process and len(tracebacks) < MAX_ALIGNMENTS:
jpayne@68	977 # Although we allow a gap in seqB to be followed by a gap in seqA,
jpayne@68	978 # we don't want to allow it the other way round, since this would
jpayne@68	979 # give redundant alignments of type: A- vs. -A
jpayne@68	980 # -B B-
jpayne@68	981 # Thus we need to keep track if a gap in seqA was opened (col_gap)
jpayne@68	982 # and stop the backtrace (dead_end) if a gap in seqB follows.
jpayne@68	983 #
jpayne@68	984 # Attention: This may fail, if the gap-penalties for both strands are
jpayne@68	985 # different. In this case the second alignment may be the only optimal
jpayne@68	986 # alignment. Thus it can happen that no alignment is returned. For
jpayne@68	987 # this case a workaround was implemented, which reverses the input and
jpayne@68	988 # the matrices (this happens in _reverse_matrices) and repeats the
jpayne@68	989 # backtrace. The variable 'reverse' keeps track of this.
jpayne@68	990 dead_end = False
jpayne@68	991 ali_seqA, ali_seqB, end, row, col, col_gap, trace = in_process.pop()
jpayne@68	992
jpayne@68	993 while (row > 0 or col > 0) and not dead_end:
jpayne@68	994 cache = (ali_seqA[:], ali_seqB[:], end, row, col, col_gap)
jpayne@68	995
jpayne@68	996 # If trace is empty we have reached at least one border of the
jpayne@68	997 # matrix or the end of a local alignment. Just add the rest of
jpayne@68	998 # the sequence(s) and fill with gaps if necessary.
jpayne@68	999 if not trace:
jpayne@68	1000 if col and col_gap:
jpayne@68	1001 dead_end = True
jpayne@68	1002 else:
jpayne@68	1003 ali_seqA, ali_seqB = _finish_backtrace(
jpayne@68	1004 sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char
jpayne@68	1005 )
jpayne@68	1006 break
jpayne@68	1007 elif trace % 2 == 1: # = row open = open gap in seqA
jpayne@68	1008 trace -= 1
jpayne@68	1009 if col_gap:
jpayne@68	1010 dead_end = True
jpayne@68	1011 else:
jpayne@68	1012 col -= 1
jpayne@68	1013 ali_seqA += gap_char
jpayne@68	1014 ali_seqB += sequenceB[col : col + 1]
jpayne@68	1015 col_gap = False
jpayne@68	1016 elif trace % 4 == 2: # = match/mismatch of seqA with seqB
jpayne@68	1017 trace -= 2
jpayne@68	1018 row -= 1
jpayne@68	1019 col -= 1
jpayne@68	1020 ali_seqA += sequenceA[row : row + 1]
jpayne@68	1021 ali_seqB += sequenceB[col : col + 1]
jpayne@68	1022 col_gap = False
jpayne@68	1023 elif trace % 8 == 4: # = col open = open gap in seqB
jpayne@68	1024 trace -= 4
jpayne@68	1025 row -= 1
jpayne@68	1026 ali_seqA += sequenceA[row : row + 1]
jpayne@68	1027 ali_seqB += gap_char
jpayne@68	1028 col_gap = True
jpayne@68	1029 elif trace in (8, 24): # = row extend = extend gap in seqA
jpayne@68	1030 trace -= 8
jpayne@68	1031 if col_gap:
jpayne@68	1032 dead_end = True
jpayne@68	1033 else:
jpayne@68	1034 col_gap = False
jpayne@68	1035 # We need to find the starting point of the extended gap
jpayne@68	1036 x = _find_gap_open(
jpayne@68	1037 sequenceA,
jpayne@68	1038 sequenceB,
jpayne@68	1039 ali_seqA,
jpayne@68	1040 ali_seqB,
jpayne@68	1041 end,
jpayne@68	1042 row,
jpayne@68	1043 col,
jpayne@68	1044 col_gap,
jpayne@68	1045 gap_char,
jpayne@68	1046 score_matrix,
jpayne@68	1047 trace_matrix,
jpayne@68	1048 in_process,
jpayne@68	1049 gap_A_fn,
jpayne@68	1050 col,
jpayne@68	1051 row,
jpayne@68	1052 "col",
jpayne@68	1053 best_score,
jpayne@68	1054 align_globally,
jpayne@68	1055 )
jpayne@68	1056 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@68	1057 elif trace == 16: # = col extend = extend gap in seqB
jpayne@68	1058 trace -= 16
jpayne@68	1059 col_gap = True
jpayne@68	1060 x = _find_gap_open(
jpayne@68	1061 sequenceA,
jpayne@68	1062 sequenceB,
jpayne@68	1063 ali_seqA,
jpayne@68	1064 ali_seqB,
jpayne@68	1065 end,
jpayne@68	1066 row,
jpayne@68	1067 col,
jpayne@68	1068 col_gap,
jpayne@68	1069 gap_char,
jpayne@68	1070 score_matrix,
jpayne@68	1071 trace_matrix,
jpayne@68	1072 in_process,
jpayne@68	1073 gap_B_fn,
jpayne@68	1074 row,
jpayne@68	1075 col,
jpayne@68	1076 "row",
jpayne@68	1077 best_score,
jpayne@68	1078 align_globally,
jpayne@68	1079 )
jpayne@68	1080 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@68	1081
jpayne@68	1082 if trace: # There is another path to follow...
jpayne@68	1083 cache += (trace,)
jpayne@68	1084 in_process.append(cache)
jpayne@68	1085 trace = trace_matrix[row][col]
jpayne@68	1086 if not align_globally:
jpayne@68	1087 if score_matrix[row][col] == best_score:
jpayne@68	1088 # We have gone through a 'zero-score' extension, discard it
jpayne@68	1089 dead_end = True
jpayne@68	1090 elif score_matrix[row][col] <= 0:
jpayne@68	1091 # We have reached the end of the backtrace
jpayne@68	1092 begin = max(row, col)
jpayne@68	1093 trace = 0
jpayne@68	1094 if not dead_end:
jpayne@68	1095 if not reverse:
jpayne@68	1096 tracebacks.append((ali_seqA[::-1], ali_seqB[::-1], score, begin, end))
jpayne@68	1097 else:
jpayne@68	1098 tracebacks.append((ali_seqB[::-1], ali_seqA[::-1], score, begin, end))
jpayne@68	1099 if one_alignment_only:
jpayne@68	1100 break
jpayne@68	1101 return _clean_alignments(tracebacks)
jpayne@68	1102
jpayne@68	1103
jpayne@68	1104 def _find_start(score_matrix, best_score, align_globally):
jpayne@68	1105 """Return a list of starting points (score, (row, col)) (PRIVATE).
jpayne@68	1106
jpayne@68	1107 Indicating every possible place to start the tracebacks.
jpayne@68	1108 """
jpayne@68	1109 nrows, ncols = len(score_matrix), len(score_matrix[0])
jpayne@68	1110 # In this implementation of the global algorithm, the start will always be
jpayne@68	1111 # the bottom right corner of the matrix.
jpayne@68	1112 if align_globally:
jpayne@68	1113 starts = [(best_score, (nrows - 1, ncols - 1))]
jpayne@68	1114 else:
jpayne@68	1115 # For local alignments, there may be many different start points.
jpayne@68	1116 starts = []
jpayne@68	1117 tolerance = 0 # XXX do anything with this?
jpayne@68	1118 # Now find all the positions within some tolerance of the best
jpayne@68	1119 # score.
jpayne@68	1120 for row in range(nrows):
jpayne@68	1121 for col in range(ncols):
jpayne@68	1122 score = score_matrix[row][col]
jpayne@68	1123 if rint(abs(score - best_score)) <= rint(tolerance):
jpayne@68	1124 starts.append((score, (row, col)))
jpayne@68	1125 return starts
jpayne@68	1126
jpayne@68	1127
jpayne@68	1128 def _reverse_matrices(score_matrix, trace_matrix):
jpayne@68	1129 """Reverse score and trace matrices (PRIVATE)."""
jpayne@68	1130 reverse_score_matrix = []
jpayne@68	1131 reverse_trace_matrix = []
jpayne@68	1132 # fmt: off
jpayne@68	1133 reverse_trace = {
jpayne@68	1134 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7, 8: 16, 9: 20, 10: 18, 11: 22, 12: 17,
jpayne@68	1135 13: 21, 14: 19, 15: 23, 16: 8, 17: 12, 18: 10, 19: 14, 20: 9, 21: 13, 22: 11,
jpayne@68	1136 23: 15, 24: 24, 25: 28, 26: 26, 27: 30, 28: 25, 29: 29, 30: 27, 31: 31,
jpayne@68	1137 None: None,
jpayne@68	1138 }
jpayne@68	1139 # fmt: on
jpayne@68	1140 for col in range(len(score_matrix[0])):
jpayne@68	1141 new_score_row = []
jpayne@68	1142 new_trace_row = []
jpayne@68	1143 for row in range(len(score_matrix)):
jpayne@68	1144 new_score_row.append(score_matrix[row][col])
jpayne@68	1145 new_trace_row.append(reverse_trace[trace_matrix[row][col]])
jpayne@68	1146 reverse_score_matrix.append(new_score_row)
jpayne@68	1147 reverse_trace_matrix.append(new_trace_row)
jpayne@68	1148 return reverse_score_matrix, reverse_trace_matrix
jpayne@68	1149
jpayne@68	1150
jpayne@68	1151 def _clean_alignments(alignments):
jpayne@68	1152 """Take a list of alignments and return a cleaned version (PRIVATE).
jpayne@68	1153
jpayne@68	1154 Remove duplicates, make sure begin and end are set correctly, remove
jpayne@68	1155 empty alignments.
jpayne@68	1156 """
jpayne@68	1157 unique_alignments = []
jpayne@68	1158 for align in alignments:
jpayne@68	1159 if align not in unique_alignments:
jpayne@68	1160 unique_alignments.append(align)
jpayne@68	1161 i = 0
jpayne@68	1162 while i < len(unique_alignments):
jpayne@68	1163 seqA, seqB, score, begin, end = unique_alignments[i]
jpayne@68	1164 # Make sure end is set reasonably.
jpayne@68	1165 if end is None: # global alignment
jpayne@68	1166 end = len(seqA)
jpayne@68	1167 elif end < 0:
jpayne@68	1168 end = end + len(seqA)
jpayne@68	1169 # If there's no alignment here, get rid of it.
jpayne@68	1170 if begin >= end:
jpayne@68	1171 del unique_alignments[i]
jpayne@68	1172 continue
jpayne@68	1173 unique_alignments[i] = Alignment(seqA, seqB, score, begin, end)
jpayne@68	1174 i += 1
jpayne@68	1175 return unique_alignments
jpayne@68	1176
jpayne@68	1177
jpayne@68	1178 def _finish_backtrace(sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char):
jpayne@68	1179 """Add remaining sequences and fill with gaps if necessary (PRIVATE)."""
jpayne@68	1180 if row:
jpayne@68	1181 ali_seqA += sequenceA[row - 1 :: -1]
jpayne@68	1182 if col:
jpayne@68	1183 ali_seqB += sequenceB[col - 1 :: -1]
jpayne@68	1184 if row > col:
jpayne@68	1185 ali_seqB += gap_char * (len(ali_seqA) - len(ali_seqB))
jpayne@68	1186 elif col > row:
jpayne@68	1187 ali_seqA += gap_char * (len(ali_seqB) - len(ali_seqA))
jpayne@68	1188 return ali_seqA, ali_seqB
jpayne@68	1189
jpayne@68	1190
jpayne@68	1191 def _find_gap_open(
jpayne@68	1192 sequenceA,
jpayne@68	1193 sequenceB,
jpayne@68	1194 ali_seqA,
jpayne@68	1195 ali_seqB,
jpayne@68	1196 end,
jpayne@68	1197 row,
jpayne@68	1198 col,
jpayne@68	1199 col_gap,
jpayne@68	1200 gap_char,
jpayne@68	1201 score_matrix,
jpayne@68	1202 trace_matrix,
jpayne@68	1203 in_process,
jpayne@68	1204 gap_fn,
jpayne@68	1205 target,
jpayne@68	1206 index,
jpayne@68	1207 direction,
jpayne@68	1208 best_score,
jpayne@68	1209 align_globally,
jpayne@68	1210 ):
jpayne@68	1211 """Find the starting point(s) of the extended gap (PRIVATE)."""
jpayne@68	1212 dead_end = False
jpayne@68	1213 target_score = score_matrix[row][col]
jpayne@68	1214 for n in range(target):
jpayne@68	1215 if direction == "col":
jpayne@68	1216 col -= 1
jpayne@68	1217 ali_seqA += gap_char
jpayne@68	1218 ali_seqB += sequenceB[col : col + 1]
jpayne@68	1219 else:
jpayne@68	1220 row -= 1
jpayne@68	1221 ali_seqA += sequenceA[row : row + 1]
jpayne@68	1222 ali_seqB += gap_char
jpayne@68	1223 actual_score = score_matrix[row][col] + gap_fn(index, n + 1)
jpayne@68	1224 if not align_globally and score_matrix[row][col] == best_score:
jpayne@68	1225 # We have run through a 'zero-score' extension and discard it
jpayne@68	1226 dead_end = True
jpayne@68	1227 break
jpayne@68	1228 if rint(actual_score) == rint(target_score) and n > 0:
jpayne@68	1229 if not trace_matrix[row][col]:
jpayne@68	1230 break
jpayne@68	1231 else:
jpayne@68	1232 in_process.append(
jpayne@68	1233 (
jpayne@68	1234 ali_seqA[:],
jpayne@68	1235 ali_seqB[:],
jpayne@68	1236 end,
jpayne@68	1237 row,
jpayne@68	1238 col,
jpayne@68	1239 col_gap,
jpayne@68	1240 trace_matrix[row][col],
jpayne@68	1241 )
jpayne@68	1242 )
jpayne@68	1243 if not trace_matrix[row][col]:
jpayne@68	1244 dead_end = True
jpayne@68	1245 return ali_seqA, ali_seqB, row, col, in_process, dead_end
jpayne@68	1246
jpayne@68	1247
jpayne@68	1248 _PRECISION = 1000
jpayne@68	1249
jpayne@68	1250
jpayne@68	1251 def rint(x, precision=_PRECISION):
jpayne@68	1252 """Print number with declared precision."""
jpayne@68	1253 return int(x * precision + 0.5)
jpayne@68	1254
jpayne@68	1255
jpayne@68	1256 class identity_match:
jpayne@68	1257 """Create a match function for use in an alignment.
jpayne@68	1258
jpayne@68	1259 match and mismatch are the scores to give when two residues are equal
jpayne@68	1260 or unequal. By default, match is 1 and mismatch is 0.
jpayne@68	1261 """
jpayne@68	1262
jpayne@68	1263 def __init__(self, match=1, mismatch=0):
jpayne@68	1264 """Initialize the class."""
jpayne@68	1265 self.match = match
jpayne@68	1266 self.mismatch = mismatch
jpayne@68	1267
jpayne@68	1268 def __call__(self, charA, charB):
jpayne@68	1269 """Call a match function instance already created."""
jpayne@68	1270 if charA == charB:
jpayne@68	1271 return self.match
jpayne@68	1272 return self.mismatch
jpayne@68	1273
jpayne@68	1274
jpayne@68	1275 class dictionary_match:
jpayne@68	1276 """Create a match function for use in an alignment.
jpayne@68	1277
jpayne@68	1278 Attributes:
jpayne@68	1279 - score_dict - A dictionary where the keys are tuples (residue 1,
jpayne@68	1280 residue 2) and the values are the match scores between those residues.
jpayne@68	1281 - symmetric - A flag that indicates whether the scores are symmetric.
jpayne@68	1282
jpayne@68	1283 """
jpayne@68	1284
jpayne@68	1285 def __init__(self, score_dict, symmetric=1):
jpayne@68	1286 """Initialize the class."""
jpayne@68	1287 if isinstance(score_dict, substitution_matrices.Array):
jpayne@68	1288 score_dict = dict(score_dict) # Access to dict is much faster
jpayne@68	1289 self.score_dict = score_dict
jpayne@68	1290 self.symmetric = symmetric
jpayne@68	1291
jpayne@68	1292 def __call__(self, charA, charB):
jpayne@68	1293 """Call a dictionary match instance already created."""
jpayne@68	1294 if self.symmetric and (charA, charB) not in self.score_dict:
jpayne@68	1295 # If the score dictionary is symmetric, then look up the
jpayne@68	1296 # score both ways.
jpayne@68	1297 charB, charA = charA, charB
jpayne@68	1298 return self.score_dict[(charA, charB)]
jpayne@68	1299
jpayne@68	1300
jpayne@68	1301 class affine_penalty:
jpayne@68	1302 """Create a gap function for use in an alignment."""
jpayne@68	1303
jpayne@68	1304 def __init__(self, open, extend, penalize_extend_when_opening=0):
jpayne@68	1305 """Initialize the class."""
jpayne@68	1306 if open > 0 or extend > 0:
jpayne@68	1307 raise ValueError("Gap penalties should be non-positive.")
jpayne@68	1308 if not penalize_extend_when_opening and (extend < open):
jpayne@68	1309 raise ValueError(
jpayne@68	1310 "Gap opening penalty should be higher than "
jpayne@68	1311 "gap extension penalty (or equal)"
jpayne@68	1312 )
jpayne@68	1313 self.open, self.extend = open, extend
jpayne@68	1314 self.penalize_extend_when_opening = penalize_extend_when_opening
jpayne@68	1315
jpayne@68	1316 def __call__(self, index, length):
jpayne@68	1317 """Call a gap function instance already created."""
jpayne@68	1318 return calc_affine_penalty(
jpayne@68	1319 length, self.open, self.extend, self.penalize_extend_when_opening
jpayne@68	1320 )
jpayne@68	1321
jpayne@68	1322
jpayne@68	1323 def calc_affine_penalty(length, open, extend, penalize_extend_when_opening):
jpayne@68	1324 """Calculate a penalty score for the gap function."""
jpayne@68	1325 if length <= 0:
jpayne@68	1326 return 0.0
jpayne@68	1327 penalty = open + extend * length
jpayne@68	1328 if not penalize_extend_when_opening:
jpayne@68	1329 penalty -= extend
jpayne@68	1330 return penalty
jpayne@68	1331
jpayne@68	1332
jpayne@68	1333 def print_matrix(matrix):
jpayne@68	1334 """Print out a matrix for debugging purposes."""
jpayne@68	1335 # Transpose the matrix and get the length of the values in each column.
jpayne@68	1336 matrixT = [[] for x in range(len(matrix[0]))]
jpayne@68	1337 for i in range(len(matrix)):
jpayne@68	1338 for j in range(len(matrix[i])):
jpayne@68	1339 matrixT[j].append(len(str(matrix[i][j])))
jpayne@68	1340 ndigits = [max(x) for x in matrixT]
jpayne@68	1341 for i in range(len(matrix)):
jpayne@68	1342 # Using string formatting trick to add leading spaces,
jpayne@68	1343 print(
jpayne@68	1344 " ".join("%*s " % (ndigits[j], matrix[i][j]) for j in range(len(matrix[i])))
jpayne@68	1345 )
jpayne@68	1346
jpayne@68	1347
jpayne@68	1348 def format_alignment(align1, align2, score, begin, end, full_sequences=False):
jpayne@68	1349 """Format the alignment prettily into a string.
jpayne@68	1350
jpayne@68	1351 IMPORTANT: Gap symbol must be "-" (or ['-'] for lists)!
jpayne@68	1352
jpayne@68	1353 Since Biopython 1.71 identical matches are shown with a pipe
jpayne@68	1354 character, mismatches as a dot, and gaps as a space.
jpayne@68	1355
jpayne@68	1356 Prior releases just used the pipe character to indicate the
jpayne@68	1357 aligned region (matches, mismatches and gaps).
jpayne@68	1358
jpayne@68	1359 Also, in local alignments, if the alignment does not include
jpayne@68	1360 the whole sequences, now only the aligned part is shown,
jpayne@68	1361 together with the start positions of the aligned subsequences.
jpayne@68	1362 The start positions are 1-based; so start position n is the
jpayne@68	1363 n-th base/amino acid in the un-aligned sequence.
jpayne@68	1364
jpayne@68	1365 NOTE: This is different to the alignment's begin/end values,
jpayne@68	1366 which give the Python indices (0-based) of the bases/amino acids
jpayne@68	1367 in the aligned sequences.
jpayne@68	1368
jpayne@68	1369 If you want to restore the 'historic' behaviour, that means
jpayne@68	1370 displaying the whole sequences (including the non-aligned parts),
jpayne@68	1371 use ``full_sequences=True``. In this case, the non-aligned leading
jpayne@68	1372 and trailing parts are also indicated by spaces in the match-line.
jpayne@68	1373 """
jpayne@68	1374 align_begin = begin
jpayne@68	1375 align_end = end
jpayne@68	1376 start1 = start2 = ""
jpayne@68	1377 start_m = begin # Begin of match line (how many spaces to include)
jpayne@68	1378 # For local alignments:
jpayne@68	1379 if not full_sequences and (begin != 0 or end != len(align1)):
jpayne@68	1380 # Calculate the actual start positions in the un-aligned sequences
jpayne@68	1381 # This will only work if the gap symbol is '-' or ['-']!
jpayne@68	1382 start1 = str(len(align1[:begin]) - align1[:begin].count("-") + 1) + " "
jpayne@68	1383 start2 = str(len(align2[:begin]) - align2[:begin].count("-") + 1) + " "
jpayne@68	1384 start_m = max(len(start1), len(start2))
jpayne@68	1385 elif full_sequences:
jpayne@68	1386 start_m = 0
jpayne@68	1387 begin = 0
jpayne@68	1388 end = len(align1)
jpayne@68	1389
jpayne@68	1390 if isinstance(align1, list):
jpayne@68	1391 # List elements will be separated by spaces, since they can be
jpayne@68	1392 # of different lengths
jpayne@68	1393 align1 = [a + " " for a in align1]
jpayne@68	1394 align2 = [a + " " for a in align2]
jpayne@68	1395
jpayne@68	1396 s1_line = ["{:>{width}}".format(start1, width=start_m)] # seq1 line
jpayne@68	1397 m_line = [" " * start_m] # match line
jpayne@68	1398 s2_line = ["{:>{width}}".format(start2, width=start_m)] # seq2 line
jpayne@68	1399
jpayne@68	1400 for n, (a, b) in enumerate(zip(align1[begin:end], align2[begin:end])):
jpayne@68	1401 # Since list elements can be of different length, we center them,
jpayne@68	1402 # using the maximum length of the two compared elements as width
jpayne@68	1403 m_len = max(len(a), len(b))
jpayne@68	1404 s1_line.append("{:^{width}}".format(a, width=m_len))
jpayne@68	1405 s2_line.append("{:^{width}}".format(b, width=m_len))
jpayne@68	1406 if full_sequences and (n < align_begin or n >= align_end):
jpayne@68	1407 m_line.append("{:^{width}}".format(" ", width=m_len)) # space
jpayne@68	1408 continue
jpayne@68	1409 if a == b:
jpayne@68	1410 m_line.append("{:^{width}}".format("\|", width=m_len)) # match
jpayne@68	1411 elif a.strip() == "-" or b.strip() == "-":
jpayne@68	1412 m_line.append("{:^{width}}".format(" ", width=m_len)) # gap
jpayne@68	1413 else:
jpayne@68	1414 m_line.append("{:^{width}}".format(".", width=m_len)) # mismatch
jpayne@68	1415
jpayne@68	1416 s2_line.append(f"\n Score={score:g}\n")
jpayne@68	1417 return "\n".join(["".join(s1_line), "".join(m_line), "".join(s2_line)])
jpayne@68	1418
jpayne@68	1419
jpayne@68	1420 # Try and load C implementations of functions. If I can't,
jpayne@68	1421 # then throw a warning and use the pure Python implementations.
jpayne@68	1422 # The redefinition is deliberate, thus the no quality assurance
jpayne@68	1423 # flag for when using flake8.
jpayne@68	1424 # Before, we secure access to the pure Python functions (for testing purposes):
jpayne@68	1425
jpayne@68	1426 _python_make_score_matrix_fast = _make_score_matrix_fast
jpayne@68	1427 _python_rint = rint
jpayne@68	1428
jpayne@68	1429 try:
jpayne@68	1430 from .cpairwise2 import rint, _make_score_matrix_fast # noqa
jpayne@68	1431 except ImportError:
jpayne@68	1432 warnings.warn(
jpayne@68	1433 "Import of C module failed. Falling back to pure Python "
jpayne@68	1434 "implementation. This may be slooow...",
jpayne@68	1435 BiopythonWarning,
jpayne@68	1436 )
jpayne@68	1437
jpayne@68	1438 if __name__ == "__main__":
jpayne@68	1439 from Bio._utils import run_doctest
jpayne@68	1440
jpayne@68	1441 run_doctest()

Mercurial > repos > rliterman > csp2

annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 68:5028fdace37b