annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
parents
children
rev   line source
jpayne@68 1 # Copyright 2002 by Jeffrey Chang.
jpayne@68 2 # Copyright 2016, 2019, 2020 by Markus Piotrowski.
jpayne@68 3 # All rights reserved.
jpayne@68 4 #
jpayne@68 5 # This file is part of the Biopython distribution and governed by your
jpayne@68 6 # choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
jpayne@68 7 # Please see the LICENSE file that should have been included as part of this
jpayne@68 8 # package.
jpayne@68 9
jpayne@68 10 """Pairwise sequence alignment using a dynamic programming algorithm.
jpayne@68 11
jpayne@68 12 This provides functions to get global and local alignments between two
jpayne@68 13 sequences. A global alignment finds the best concordance between all
jpayne@68 14 characters in two sequences. A local alignment finds just the
jpayne@68 15 subsequences that align the best. Local alignments must have a positive
jpayne@68 16 score to be reported and they will not be extended for 'zero counting'
jpayne@68 17 matches. This means a local alignment will always start and end with
jpayne@68 18 a positive counting match.
jpayne@68 19
jpayne@68 20 When doing alignments, you can specify the match score and gap
jpayne@68 21 penalties. The match score indicates the compatibility between an
jpayne@68 22 alignment of two characters in the sequences. Highly compatible
jpayne@68 23 characters should be given positive scores, and incompatible ones
jpayne@68 24 should be given negative scores or 0. The gap penalties should be
jpayne@68 25 negative.
jpayne@68 26
jpayne@68 27 The names of the alignment functions in this module follow the
jpayne@68 28 convention
jpayne@68 29 <alignment type>XX
jpayne@68 30 where <alignment type> is either "global" or "local" and XX is a 2
jpayne@68 31 character code indicating the parameters it takes. The first
jpayne@68 32 character indicates the parameters for matches (and mismatches), and
jpayne@68 33 the second indicates the parameters for gap penalties.
jpayne@68 34
jpayne@68 35 The match parameters are::
jpayne@68 36
jpayne@68 37 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@68 38 x No parameters. Identical characters have score of 1, otherwise 0.
jpayne@68 39 m A match score is the score of identical chars, otherwise mismatch
jpayne@68 40 score. Keywords ``match``, ``mismatch``.
jpayne@68 41 d A dictionary returns the score of any pair of characters.
jpayne@68 42 Keyword ``match_dict``.
jpayne@68 43 c A callback function returns scores. Keyword ``match_fn``.
jpayne@68 44
jpayne@68 45 The gap penalty parameters are::
jpayne@68 46
jpayne@68 47 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@68 48 x No gap penalties.
jpayne@68 49 s Same open and extend gap penalties for both sequences.
jpayne@68 50 Keywords ``open``, ``extend``.
jpayne@68 51 d The sequences have different open and extend gap penalties.
jpayne@68 52 Keywords ``openA``, ``extendA``, ``openB``, ``extendB``.
jpayne@68 53 c A callback function returns the gap penalties.
jpayne@68 54 Keywords ``gap_A_fn``, ``gap_B_fn``.
jpayne@68 55
jpayne@68 56 All the different alignment functions are contained in an object
jpayne@68 57 ``align``. For example:
jpayne@68 58
jpayne@68 59 >>> from Bio import pairwise2
jpayne@68 60 >>> alignments = pairwise2.align.globalxx("ACCGT", "ACG")
jpayne@68 61
jpayne@68 62 For better readability, the required arguments can be used with optional keywords:
jpayne@68 63
jpayne@68 64 >>> alignments = pairwise2.align.globalxx(sequenceA="ACCGT", sequenceB="ACG")
jpayne@68 65
jpayne@68 66 The result is a list of the alignments between the two strings. Each alignment
jpayne@68 67 is a named tuple consisting of the two aligned sequences, the score and the
jpayne@68 68 start and end positions of the alignment:
jpayne@68 69
jpayne@68 70 >>> print(alignments)
jpayne@68 71 [Alignment(seqA='ACCGT', seqB='A-CG-', score=3.0, start=0, end=5), ...
jpayne@68 72
jpayne@68 73 You can access each element of an alignment by index or name:
jpayne@68 74
jpayne@68 75 >>> alignments[0][2]
jpayne@68 76 3.0
jpayne@68 77 >>> alignments[0].score
jpayne@68 78 3.0
jpayne@68 79
jpayne@68 80 For a nice printout of an alignment, use the ``format_alignment`` method of
jpayne@68 81 the module:
jpayne@68 82
jpayne@68 83 >>> from Bio.pairwise2 import format_alignment
jpayne@68 84 >>> print(format_alignment(*alignments[0]))
jpayne@68 85 ACCGT
jpayne@68 86 | ||
jpayne@68 87 A-CG-
jpayne@68 88 Score=3
jpayne@68 89 <BLANKLINE>
jpayne@68 90
jpayne@68 91 All alignment functions have the following arguments:
jpayne@68 92
jpayne@68 93 - Two sequences: strings, Biopython sequence objects or lists.
jpayne@68 94 Lists are useful for supplying sequences which contain residues that are
jpayne@68 95 encoded by more than one letter.
jpayne@68 96
jpayne@68 97 - ``penalize_extend_when_opening``: boolean (default: False).
jpayne@68 98 Whether to count an extension penalty when opening a gap. If false, a gap of
jpayne@68 99 1 is only penalized an "open" penalty, otherwise it is penalized
jpayne@68 100 "open+extend".
jpayne@68 101
jpayne@68 102 - ``penalize_end_gaps``: boolean.
jpayne@68 103 Whether to count the gaps at the ends of an alignment. By default, they are
jpayne@68 104 counted for global alignments but not for local ones. Setting
jpayne@68 105 ``penalize_end_gaps`` to (boolean, boolean) allows you to specify for the
jpayne@68 106 two sequences separately whether gaps at the end of the alignment should be
jpayne@68 107 counted.
jpayne@68 108
jpayne@68 109 - ``gap_char``: string (default: ``'-'``).
jpayne@68 110 Which character to use as a gap character in the alignment returned. If your
jpayne@68 111 input sequences are lists, you must change this to ``['-']``.
jpayne@68 112
jpayne@68 113 - ``force_generic``: boolean (default: False).
jpayne@68 114 Always use the generic, non-cached, dynamic programming function (slow!).
jpayne@68 115 For debugging.
jpayne@68 116
jpayne@68 117 - ``score_only``: boolean (default: False).
jpayne@68 118 Only get the best score, don't recover any alignments. The return value of
jpayne@68 119 the function is the score. Faster and uses less memory.
jpayne@68 120
jpayne@68 121 - ``one_alignment_only``: boolean (default: False).
jpayne@68 122 Only recover one alignment.
jpayne@68 123
jpayne@68 124 The other parameters of the alignment function depend on the function called.
jpayne@68 125 Some examples:
jpayne@68 126
jpayne@68 127 - Find the best global alignment between the two sequences. Identical
jpayne@68 128 characters are given 1 point. No points are deducted for mismatches or gaps.
jpayne@68 129
jpayne@68 130 >>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
jpayne@68 131 ... print(format_alignment(*a))
jpayne@68 132 ACCGT
jpayne@68 133 | ||
jpayne@68 134 A-CG-
jpayne@68 135 Score=3
jpayne@68 136 <BLANKLINE>
jpayne@68 137 ACCGT
jpayne@68 138 || |
jpayne@68 139 AC-G-
jpayne@68 140 Score=3
jpayne@68 141 <BLANKLINE>
jpayne@68 142
jpayne@68 143 - Same thing as before, but with a local alignment. Note that
jpayne@68 144 ``format_alignment`` will only show the aligned parts of the sequences,
jpayne@68 145 together with the starting positions.
jpayne@68 146
jpayne@68 147 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@68 148 ... print(format_alignment(*a))
jpayne@68 149 1 ACCG
jpayne@68 150 | ||
jpayne@68 151 1 A-CG
jpayne@68 152 Score=3
jpayne@68 153 <BLANKLINE>
jpayne@68 154 1 ACCG
jpayne@68 155 || |
jpayne@68 156 1 AC-G
jpayne@68 157 Score=3
jpayne@68 158 <BLANKLINE>
jpayne@68 159
jpayne@68 160 To restore the 'historic' behaviour of ``format_alignemt``, i.e., showing
jpayne@68 161 also the un-aligned parts of both sequences, use the new keyword parameter
jpayne@68 162 ``full_sequences``:
jpayne@68 163
jpayne@68 164 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@68 165 ... print(format_alignment(*a, full_sequences=True))
jpayne@68 166 ACCGT
jpayne@68 167 | ||
jpayne@68 168 A-CG-
jpayne@68 169 Score=3
jpayne@68 170 <BLANKLINE>
jpayne@68 171 ACCGT
jpayne@68 172 || |
jpayne@68 173 AC-G-
jpayne@68 174 Score=3
jpayne@68 175 <BLANKLINE>
jpayne@68 176
jpayne@68 177
jpayne@68 178 - Do a global alignment. Identical characters are given 2 points, 1 point is
jpayne@68 179 deducted for each non-identical character. Don't penalize gaps.
jpayne@68 180
jpayne@68 181 >>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
jpayne@68 182 ... print(format_alignment(*a))
jpayne@68 183 ACCGT
jpayne@68 184 | ||
jpayne@68 185 A-CG-
jpayne@68 186 Score=6
jpayne@68 187 <BLANKLINE>
jpayne@68 188 ACCGT
jpayne@68 189 || |
jpayne@68 190 AC-G-
jpayne@68 191 Score=6
jpayne@68 192 <BLANKLINE>
jpayne@68 193
jpayne@68 194 - Same as above, except now 0.5 points are deducted when opening a gap, and
jpayne@68 195 0.1 points are deducted when extending it.
jpayne@68 196
jpayne@68 197 >>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
jpayne@68 198 ... print(format_alignment(*a))
jpayne@68 199 ACCGT
jpayne@68 200 | ||
jpayne@68 201 A-CG-
jpayne@68 202 Score=5
jpayne@68 203 <BLANKLINE>
jpayne@68 204 ACCGT
jpayne@68 205 || |
jpayne@68 206 AC-G-
jpayne@68 207 Score=5
jpayne@68 208 <BLANKLINE>
jpayne@68 209
jpayne@68 210 - Note that you can use keywords to increase the readability, e.g.:
jpayne@68 211
jpayne@68 212 >>> a = pairwise2.align.globalms("ACGT", "ACG", match=2, mismatch=-1, open=-.5,
jpayne@68 213 ... extend=-.1)
jpayne@68 214
jpayne@68 215 - Depending on the penalties, a gap in one sequence may be followed by a gap in
jpayne@68 216 the other sequence.If you don't like this behaviour, increase the gap-open
jpayne@68 217 penalty:
jpayne@68 218
jpayne@68 219 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -1, -.1):
jpayne@68 220 ... print(format_alignment(*a))
jpayne@68 221 A-
jpayne@68 222 <BLANKLINE>
jpayne@68 223 -T
jpayne@68 224 Score=-2
jpayne@68 225 <BLANKLINE>
jpayne@68 226 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -3, -.1):
jpayne@68 227 ... print(format_alignment(*a))
jpayne@68 228 A
jpayne@68 229 .
jpayne@68 230 T
jpayne@68 231 Score=-4
jpayne@68 232 <BLANKLINE>
jpayne@68 233
jpayne@68 234 - The alignment function can also use known matrices already included in
jpayne@68 235 Biopython (in ``Bio.Align.substitution_matrices``):
jpayne@68 236
jpayne@68 237 >>> from Bio.Align import substitution_matrices
jpayne@68 238 >>> matrix = substitution_matrices.load("BLOSUM62")
jpayne@68 239 >>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
jpayne@68 240 ... print(format_alignment(*a))
jpayne@68 241 KEVLA
jpayne@68 242 |||
jpayne@68 243 -EVL-
jpayne@68 244 Score=13
jpayne@68 245 <BLANKLINE>
jpayne@68 246
jpayne@68 247 - With the parameter ``c`` you can define your own match- and gap functions.
jpayne@68 248 E.g. to define an affine logarithmic gap function and using it:
jpayne@68 249
jpayne@68 250 >>> from math import log
jpayne@68 251 >>> def gap_function(x, y): # x is gap position in seq, y is gap length
jpayne@68 252 ... if y == 0: # No gap
jpayne@68 253 ... return 0
jpayne@68 254 ... elif y == 1: # Gap open penalty
jpayne@68 255 ... return -2
jpayne@68 256 ... return - (2 + y/4.0 + log(y)/2.0)
jpayne@68 257 ...
jpayne@68 258 >>> alignment = pairwise2.align.globalmc("ACCCCCGT", "ACG", 5, -4,
jpayne@68 259 ... gap_function, gap_function)
jpayne@68 260
jpayne@68 261 You can define different gap functions for each sequence.
jpayne@68 262 Self-defined match functions must take the two residues to be compared and
jpayne@68 263 return a score.
jpayne@68 264
jpayne@68 265 To see a description of the parameters for a function, please look at
jpayne@68 266 the docstring for the function via the help function, e.g.
jpayne@68 267 type ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@68 268
jpayne@68 269 """ # noqa: W291
jpayne@68 270
jpayne@68 271 import warnings
jpayne@68 272 from collections import namedtuple
jpayne@68 273
jpayne@68 274 from Bio import BiopythonWarning
jpayne@68 275 from Bio import BiopythonDeprecationWarning
jpayne@68 276 from Bio.Align import substitution_matrices
jpayne@68 277
jpayne@68 278 warnings.warn(
jpayne@68 279 "Bio.pairwise2 has been deprecated, and we intend to remove it in a "
jpayne@68 280 "future release of Biopython. As an alternative, please consider using "
jpayne@68 281 "Bio.Align.PairwiseAligner as a replacement, and contact the "
jpayne@68 282 "Biopython developers if you still need the Bio.pairwise2 module.",
jpayne@68 283 BiopythonDeprecationWarning,
jpayne@68 284 )
jpayne@68 285
jpayne@68 286
jpayne@68 287 MAX_ALIGNMENTS = 1000 # maximum alignments recovered in traceback
jpayne@68 288
jpayne@68 289 Alignment = namedtuple("Alignment", ("seqA, seqB, score, start, end"))
jpayne@68 290
jpayne@68 291
jpayne@68 292 class align:
jpayne@68 293 """Provide functions that do alignments.
jpayne@68 294
jpayne@68 295 Alignment functions are called as:
jpayne@68 296
jpayne@68 297 pairwise2.align.globalXX
jpayne@68 298
jpayne@68 299 or
jpayne@68 300
jpayne@68 301 pairwise2.align.localXX
jpayne@68 302
jpayne@68 303 Where XX is a 2 character code indicating the match/mismatch parameters
jpayne@68 304 (first character, either x, m, d or c) and the gap penalty parameters
jpayne@68 305 (second character, either x, s, d, or c).
jpayne@68 306
jpayne@68 307 For a detailed description read the main module's docstring (e.g.,
jpayne@68 308 type ``help(pairwise2)``).
jpayne@68 309 To see a description of the parameters for a function, please
jpayne@68 310 look at the docstring for the function, e.g. type
jpayne@68 311 ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@68 312 """
jpayne@68 313
jpayne@68 314 class alignment_function:
jpayne@68 315 """Callable class which impersonates an alignment function.
jpayne@68 316
jpayne@68 317 The constructor takes the name of the function. This class
jpayne@68 318 will decode the name of the function to figure out how to
jpayne@68 319 interpret the parameters.
jpayne@68 320 """
jpayne@68 321
jpayne@68 322 # match code -> tuple of (parameters, docstring)
jpayne@68 323 match2args = {
jpayne@68 324 "x": ([], ""),
jpayne@68 325 "m": (
jpayne@68 326 ["match", "mismatch"],
jpayne@68 327 "match is the score to given to identical characters.\n"
jpayne@68 328 "mismatch is the score given to non-identical ones.",
jpayne@68 329 ),
jpayne@68 330 "d": (
jpayne@68 331 ["match_dict"],
jpayne@68 332 "match_dict is a dictionary where the keys are tuples\n"
jpayne@68 333 "of pairs of characters and the values are the scores,\n"
jpayne@68 334 "e.g. ('A', 'C') : 2.5.",
jpayne@68 335 ),
jpayne@68 336 "c": (
jpayne@68 337 ["match_fn"],
jpayne@68 338 "match_fn is a callback function that takes two "
jpayne@68 339 "characters and returns the score between them.",
jpayne@68 340 ),
jpayne@68 341 }
jpayne@68 342 # penalty code -> tuple of (parameters, docstring)
jpayne@68 343 penalty2args = {
jpayne@68 344 "x": ([], ""),
jpayne@68 345 "s": (
jpayne@68 346 ["open", "extend"],
jpayne@68 347 "open and extend are the gap penalties when a gap is\n"
jpayne@68 348 "opened and extended. They should be negative.",
jpayne@68 349 ),
jpayne@68 350 "d": (
jpayne@68 351 ["openA", "extendA", "openB", "extendB"],
jpayne@68 352 "openA and extendA are the gap penalties for sequenceA,\n"
jpayne@68 353 "and openB and extendB for sequenceB. The penalties\n"
jpayne@68 354 "should be negative.",
jpayne@68 355 ),
jpayne@68 356 "c": (
jpayne@68 357 ["gap_A_fn", "gap_B_fn"],
jpayne@68 358 "gap_A_fn and gap_B_fn are callback functions that takes\n"
jpayne@68 359 "(1) the index where the gap is opened, and (2) the length\n"
jpayne@68 360 "of the gap. They should return a gap penalty.",
jpayne@68 361 ),
jpayne@68 362 }
jpayne@68 363
jpayne@68 364 def __init__(self, name):
jpayne@68 365 """Check to make sure the name of the function is reasonable."""
jpayne@68 366 if name.startswith("global"):
jpayne@68 367 if len(name) != 8:
jpayne@68 368 raise AttributeError("function should be globalXX")
jpayne@68 369 elif name.startswith("local"):
jpayne@68 370 if len(name) != 7:
jpayne@68 371 raise AttributeError("function should be localXX")
jpayne@68 372 else:
jpayne@68 373 raise AttributeError(name)
jpayne@68 374 align_type, match_type, penalty_type = name[:-2], name[-2], name[-1]
jpayne@68 375 try:
jpayne@68 376 match_args, match_doc = self.match2args[match_type]
jpayne@68 377 except KeyError:
jpayne@68 378 raise AttributeError(f"unknown match type {match_type!r}")
jpayne@68 379 try:
jpayne@68 380 penalty_args, penalty_doc = self.penalty2args[penalty_type]
jpayne@68 381 except KeyError:
jpayne@68 382 raise AttributeError(f"unknown penalty type {penalty_type!r}")
jpayne@68 383
jpayne@68 384 # Now get the names of the parameters to this function.
jpayne@68 385 param_names = ["sequenceA", "sequenceB"]
jpayne@68 386 param_names.extend(match_args)
jpayne@68 387 param_names.extend(penalty_args)
jpayne@68 388 self.function_name = name
jpayne@68 389 self.align_type = align_type
jpayne@68 390 self.param_names = param_names
jpayne@68 391
jpayne@68 392 self.__name__ = self.function_name
jpayne@68 393 # Set the doc string.
jpayne@68 394 doc = f"{self.__name__}({', '.join(self.param_names)}) -> alignments\n"
jpayne@68 395 doc += """\
jpayne@68 396 \nThe following parameters can also be used with optional
jpayne@68 397 keywords of the same name.\n\n
jpayne@68 398 sequenceA and sequenceB must be of the same type, either
jpayne@68 399 strings, lists or Biopython sequence objects.\n
jpayne@68 400 """
jpayne@68 401 if match_doc:
jpayne@68 402 doc += f"\n{match_doc}\n"
jpayne@68 403 if penalty_doc:
jpayne@68 404 doc += f"\n{penalty_doc}\n"
jpayne@68 405 doc += """\
jpayne@68 406 \nalignments is a list of named tuples (seqA, seqB, score,
jpayne@68 407 begin, end). seqA and seqB are strings showing the alignment
jpayne@68 408 between the sequences. score is the score of the alignment.
jpayne@68 409 begin and end are indexes of seqA and seqB that indicate
jpayne@68 410 where the alignment occurs.
jpayne@68 411 """
jpayne@68 412 self.__doc__ = doc
jpayne@68 413
jpayne@68 414 def decode(self, *args, **keywds):
jpayne@68 415 """Decode the arguments for the _align function.
jpayne@68 416
jpayne@68 417 keywds will get passed to it, so translate the arguments
jpayne@68 418 to this function into forms appropriate for _align.
jpayne@68 419 """
jpayne@68 420 keywds = keywds.copy()
jpayne@68 421
jpayne@68 422 # Replace possible "keywords" with arguments:
jpayne@68 423 args += (len(self.param_names) - len(args)) * (None,)
jpayne@68 424 for key in keywds.copy():
jpayne@68 425 if key in self.param_names:
jpayne@68 426 _index = self.param_names.index(key)
jpayne@68 427 args = args[:_index] + (keywds[key],) + args[_index:]
jpayne@68 428 del keywds[key]
jpayne@68 429 args = tuple(arg for arg in args if arg is not None)
jpayne@68 430
jpayne@68 431 if len(args) != len(self.param_names):
jpayne@68 432 raise TypeError(
jpayne@68 433 "%s takes exactly %d argument (%d given)"
jpayne@68 434 % (self.function_name, len(self.param_names), len(args))
jpayne@68 435 )
jpayne@68 436
jpayne@68 437 i = 0
jpayne@68 438 while i < len(self.param_names):
jpayne@68 439 if self.param_names[i] in [
jpayne@68 440 "sequenceA",
jpayne@68 441 "sequenceB",
jpayne@68 442 "gap_A_fn",
jpayne@68 443 "gap_B_fn",
jpayne@68 444 "match_fn",
jpayne@68 445 ]:
jpayne@68 446 keywds[self.param_names[i]] = args[i]
jpayne@68 447 i += 1
jpayne@68 448 elif self.param_names[i] == "match":
jpayne@68 449 assert self.param_names[i + 1] == "mismatch"
jpayne@68 450 match, mismatch = args[i], args[i + 1]
jpayne@68 451 keywds["match_fn"] = identity_match(match, mismatch)
jpayne@68 452 i += 2
jpayne@68 453 elif self.param_names[i] == "match_dict":
jpayne@68 454 keywds["match_fn"] = dictionary_match(args[i])
jpayne@68 455 i += 1
jpayne@68 456 elif self.param_names[i] == "open":
jpayne@68 457 assert self.param_names[i + 1] == "extend"
jpayne@68 458 open, extend = args[i], args[i + 1]
jpayne@68 459 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68 460 keywds["gap_A_fn"] = affine_penalty(open, extend, pe)
jpayne@68 461 keywds["gap_B_fn"] = affine_penalty(open, extend, pe)
jpayne@68 462 i += 2
jpayne@68 463 elif self.param_names[i] == "openA":
jpayne@68 464 assert self.param_names[i + 3] == "extendB"
jpayne@68 465 openA, extendA, openB, extendB = args[i : i + 4]
jpayne@68 466 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68 467 keywds["gap_A_fn"] = affine_penalty(openA, extendA, pe)
jpayne@68 468 keywds["gap_B_fn"] = affine_penalty(openB, extendB, pe)
jpayne@68 469 i += 4
jpayne@68 470 else:
jpayne@68 471 raise ValueError(f"unknown parameter {self.param_names[i]!r}")
jpayne@68 472
jpayne@68 473 # Here are the default parameters for _align. Assign
jpayne@68 474 # these to keywds, unless already specified.
jpayne@68 475 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@68 476 default_params = [
jpayne@68 477 ("match_fn", identity_match(1, 0)),
jpayne@68 478 ("gap_A_fn", affine_penalty(0, 0, pe)),
jpayne@68 479 ("gap_B_fn", affine_penalty(0, 0, pe)),
jpayne@68 480 ("penalize_extend_when_opening", 0),
jpayne@68 481 ("penalize_end_gaps", self.align_type == "global"),
jpayne@68 482 ("align_globally", self.align_type == "global"),
jpayne@68 483 ("gap_char", "-"),
jpayne@68 484 ("force_generic", 0),
jpayne@68 485 ("score_only", 0),
jpayne@68 486 ("one_alignment_only", 0),
jpayne@68 487 ]
jpayne@68 488 for name, default in default_params:
jpayne@68 489 keywds[name] = keywds.get(name, default)
jpayne@68 490 value = keywds["penalize_end_gaps"]
jpayne@68 491 try:
jpayne@68 492 n = len(value)
jpayne@68 493 except TypeError:
jpayne@68 494 keywds["penalize_end_gaps"] = tuple([value] * 2)
jpayne@68 495 else:
jpayne@68 496 assert n == 2
jpayne@68 497 return keywds
jpayne@68 498
jpayne@68 499 def __call__(self, *args, **keywds):
jpayne@68 500 """Call the alignment instance already created."""
jpayne@68 501 keywds = self.decode(*args, **keywds)
jpayne@68 502 return _align(**keywds)
jpayne@68 503
jpayne@68 504 def __getattr__(self, attr):
jpayne@68 505 """Call alignment_function() to check and decode the attributes."""
jpayne@68 506 # The following 'magic' is needed to rewrite the class docstring
jpayne@68 507 # dynamically:
jpayne@68 508 wrapper = self.alignment_function(attr)
jpayne@68 509 wrapper_type = type(wrapper)
jpayne@68 510 wrapper_dict = wrapper_type.__dict__.copy()
jpayne@68 511 wrapper_dict["__doc__"] = wrapper.__doc__
jpayne@68 512 new_alignment_function = type("alignment_function", (object,), wrapper_dict)
jpayne@68 513
jpayne@68 514 return new_alignment_function(attr)
jpayne@68 515
jpayne@68 516
jpayne@68 517 align = align()
jpayne@68 518
jpayne@68 519
jpayne@68 520 def _align(
jpayne@68 521 sequenceA,
jpayne@68 522 sequenceB,
jpayne@68 523 match_fn,
jpayne@68 524 gap_A_fn,
jpayne@68 525 gap_B_fn,
jpayne@68 526 penalize_extend_when_opening,
jpayne@68 527 penalize_end_gaps,
jpayne@68 528 align_globally,
jpayne@68 529 gap_char,
jpayne@68 530 force_generic,
jpayne@68 531 score_only,
jpayne@68 532 one_alignment_only,
jpayne@68 533 ):
jpayne@68 534 """Return optimal alignments between two sequences (PRIVATE).
jpayne@68 535
jpayne@68 536 This method either returns a list of optimal alignments (with the same
jpayne@68 537 score) or just the optimal score.
jpayne@68 538 """
jpayne@68 539 if not sequenceA or not sequenceB:
jpayne@68 540 return []
jpayne@68 541 try:
jpayne@68 542 sequenceA + gap_char
jpayne@68 543 sequenceB + gap_char
jpayne@68 544 except TypeError:
jpayne@68 545 raise TypeError(
jpayne@68 546 "both sequences must be of the same type, either "
jpayne@68 547 "string/sequence object or list. Gap character must "
jpayne@68 548 "fit the sequence type (string or list)"
jpayne@68 549 )
jpayne@68 550
jpayne@68 551 if not isinstance(sequenceA, list):
jpayne@68 552 sequenceA = str(sequenceA)
jpayne@68 553 if not isinstance(sequenceB, list):
jpayne@68 554 sequenceB = str(sequenceB)
jpayne@68 555 if not align_globally and (penalize_end_gaps[0] or penalize_end_gaps[1]):
jpayne@68 556 warnings.warn(
jpayne@68 557 '"penalize_end_gaps" should not be used in local '
jpayne@68 558 "alignments. The resulting score may be wrong.",
jpayne@68 559 BiopythonWarning,
jpayne@68 560 )
jpayne@68 561
jpayne@68 562 if (
jpayne@68 563 (not force_generic)
jpayne@68 564 and isinstance(gap_A_fn, affine_penalty)
jpayne@68 565 and isinstance(gap_B_fn, affine_penalty)
jpayne@68 566 ):
jpayne@68 567 open_A, extend_A = gap_A_fn.open, gap_A_fn.extend
jpayne@68 568 open_B, extend_B = gap_B_fn.open, gap_B_fn.extend
jpayne@68 569 matrices = _make_score_matrix_fast(
jpayne@68 570 sequenceA,
jpayne@68 571 sequenceB,
jpayne@68 572 match_fn,
jpayne@68 573 open_A,
jpayne@68 574 extend_A,
jpayne@68 575 open_B,
jpayne@68 576 extend_B,
jpayne@68 577 penalize_extend_when_opening,
jpayne@68 578 penalize_end_gaps,
jpayne@68 579 align_globally,
jpayne@68 580 score_only,
jpayne@68 581 )
jpayne@68 582 else:
jpayne@68 583 matrices = _make_score_matrix_generic(
jpayne@68 584 sequenceA,
jpayne@68 585 sequenceB,
jpayne@68 586 match_fn,
jpayne@68 587 gap_A_fn,
jpayne@68 588 gap_B_fn,
jpayne@68 589 penalize_end_gaps,
jpayne@68 590 align_globally,
jpayne@68 591 score_only,
jpayne@68 592 )
jpayne@68 593
jpayne@68 594 score_matrix, trace_matrix, best_score = matrices
jpayne@68 595
jpayne@68 596 # print("SCORE %s" % print_matrix(score_matrix))
jpayne@68 597 # print("TRACEBACK %s" % print_matrix(trace_matrix))
jpayne@68 598
jpayne@68 599 # If they only want the score, then return it.
jpayne@68 600 if score_only:
jpayne@68 601 return best_score
jpayne@68 602
jpayne@68 603 starts = _find_start(score_matrix, best_score, align_globally)
jpayne@68 604
jpayne@68 605 # Recover the alignments and return them.
jpayne@68 606 alignments = _recover_alignments(
jpayne@68 607 sequenceA,
jpayne@68 608 sequenceB,
jpayne@68 609 starts,
jpayne@68 610 best_score,
jpayne@68 611 score_matrix,
jpayne@68 612 trace_matrix,
jpayne@68 613 align_globally,
jpayne@68 614 gap_char,
jpayne@68 615 one_alignment_only,
jpayne@68 616 gap_A_fn,
jpayne@68 617 gap_B_fn,
jpayne@68 618 )
jpayne@68 619 if not alignments:
jpayne@68 620 # This may happen, see recover_alignments for explanation
jpayne@68 621 score_matrix, trace_matrix = _reverse_matrices(score_matrix, trace_matrix)
jpayne@68 622 starts = [(z, (y, x)) for z, (x, y) in starts]
jpayne@68 623 alignments = _recover_alignments(
jpayne@68 624 sequenceB,
jpayne@68 625 sequenceA,
jpayne@68 626 starts,
jpayne@68 627 best_score,
jpayne@68 628 score_matrix,
jpayne@68 629 trace_matrix,
jpayne@68 630 align_globally,
jpayne@68 631 gap_char,
jpayne@68 632 one_alignment_only,
jpayne@68 633 gap_B_fn,
jpayne@68 634 gap_A_fn,
jpayne@68 635 reverse=True,
jpayne@68 636 )
jpayne@68 637 return alignments
jpayne@68 638
jpayne@68 639
jpayne@68 640 def _make_score_matrix_generic(
jpayne@68 641 sequenceA,
jpayne@68 642 sequenceB,
jpayne@68 643 match_fn,
jpayne@68 644 gap_A_fn,
jpayne@68 645 gap_B_fn,
jpayne@68 646 penalize_end_gaps,
jpayne@68 647 align_globally,
jpayne@68 648 score_only,
jpayne@68 649 ):
jpayne@68 650 """Generate a score and traceback matrix (PRIVATE).
jpayne@68 651
jpayne@68 652 This implementation according to Needleman-Wunsch allows the usage of
jpayne@68 653 general gap functions and is rather slow. It is automatically called if
jpayne@68 654 you define your own gap functions. You can force the usage of this method
jpayne@68 655 with ``force_generic=True``.
jpayne@68 656 """
jpayne@68 657 local_max_score = 0
jpayne@68 658 # Create the score and traceback matrices. These should be in the
jpayne@68 659 # shape:
jpayne@68 660 # sequenceA (down) x sequenceB (across)
jpayne@68 661 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68 662 score_matrix, trace_matrix = [], []
jpayne@68 663 for i in range(lenA + 1):
jpayne@68 664 score_matrix.append([None] * (lenB + 1))
jpayne@68 665 if not score_only:
jpayne@68 666 trace_matrix.append([None] * (lenB + 1))
jpayne@68 667
jpayne@68 668 # Initialize first row and column with gap scores. This is like opening up
jpayne@68 669 # i gaps at the beginning of sequence A or B.
jpayne@68 670 for i in range(lenA + 1):
jpayne@68 671 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@68 672 score = gap_B_fn(0, i)
jpayne@68 673 else:
jpayne@68 674 score = 0.0
jpayne@68 675 score_matrix[i][0] = score
jpayne@68 676 for i in range(lenB + 1):
jpayne@68 677 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@68 678 score = gap_A_fn(0, i)
jpayne@68 679 else:
jpayne@68 680 score = 0.0
jpayne@68 681 score_matrix[0][i] = score
jpayne@68 682
jpayne@68 683 # Fill in the score matrix. Each position in the matrix
jpayne@68 684 # represents an alignment between a character from sequence A to
jpayne@68 685 # one in sequence B. As I iterate through the matrix, find the
jpayne@68 686 # alignment by choose the best of:
jpayne@68 687 # 1) extending a previous alignment without gaps
jpayne@68 688 # 2) adding a gap in sequenceA
jpayne@68 689 # 3) adding a gap in sequenceB
jpayne@68 690 for row in range(1, lenA + 1):
jpayne@68 691 for col in range(1, lenB + 1):
jpayne@68 692 # First, calculate the score that would occur by extending
jpayne@68 693 # the alignment without gaps.
jpayne@68 694 # fmt: off
jpayne@68 695 nogap_score = (
jpayne@68 696 score_matrix[row - 1][col - 1]
jpayne@68 697 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@68 698 )
jpayne@68 699
jpayne@68 700 # fmt: on
jpayne@68 701 # Try to find a better score by opening gaps in sequenceA.
jpayne@68 702 # Do this by checking alignments from each column in the row.
jpayne@68 703 # Each column represents a different character to align from,
jpayne@68 704 # and thus a different length gap.
jpayne@68 705 # Although the gap function does not distinguish between opening
jpayne@68 706 # and extending a gap, we distinguish them for the backtrace.
jpayne@68 707 if not penalize_end_gaps[0] and row == lenA:
jpayne@68 708 row_open = score_matrix[row][col - 1]
jpayne@68 709 row_extend = max(score_matrix[row][x] for x in range(col))
jpayne@68 710 else:
jpayne@68 711 row_open = score_matrix[row][col - 1] + gap_A_fn(row, 1)
jpayne@68 712 row_extend = max(
jpayne@68 713 score_matrix[row][x] + gap_A_fn(row, col - x) for x in range(col)
jpayne@68 714 )
jpayne@68 715
jpayne@68 716 # Try to find a better score by opening gaps in sequenceB.
jpayne@68 717 if not penalize_end_gaps[1] and col == lenB:
jpayne@68 718 col_open = score_matrix[row - 1][col]
jpayne@68 719 col_extend = max(score_matrix[x][col] for x in range(row))
jpayne@68 720 else:
jpayne@68 721 col_open = score_matrix[row - 1][col] + gap_B_fn(col, 1)
jpayne@68 722 col_extend = max(
jpayne@68 723 score_matrix[x][col] + gap_B_fn(col, row - x) for x in range(row)
jpayne@68 724 )
jpayne@68 725
jpayne@68 726 best_score = max(nogap_score, row_open, row_extend, col_open, col_extend)
jpayne@68 727 local_max_score = max(local_max_score, best_score)
jpayne@68 728 if not align_globally and best_score < 0:
jpayne@68 729 score_matrix[row][col] = 0.0
jpayne@68 730 else:
jpayne@68 731 score_matrix[row][col] = best_score
jpayne@68 732
jpayne@68 733 # The backtrace is encoded binary. See _make_score_matrix_fast
jpayne@68 734 # for details.
jpayne@68 735 if not score_only:
jpayne@68 736 trace_score = 0
jpayne@68 737 if rint(nogap_score) == rint(best_score):
jpayne@68 738 trace_score += 2
jpayne@68 739 if rint(row_open) == rint(best_score):
jpayne@68 740 trace_score += 1
jpayne@68 741 if rint(row_extend) == rint(best_score):
jpayne@68 742 trace_score += 8
jpayne@68 743 if rint(col_open) == rint(best_score):
jpayne@68 744 trace_score += 4
jpayne@68 745 if rint(col_extend) == rint(best_score):
jpayne@68 746 trace_score += 16
jpayne@68 747 trace_matrix[row][col] = trace_score
jpayne@68 748
jpayne@68 749 if not align_globally:
jpayne@68 750 best_score = local_max_score
jpayne@68 751
jpayne@68 752 return score_matrix, trace_matrix, best_score
jpayne@68 753
jpayne@68 754
jpayne@68 755 def _make_score_matrix_fast(
jpayne@68 756 sequenceA,
jpayne@68 757 sequenceB,
jpayne@68 758 match_fn,
jpayne@68 759 open_A,
jpayne@68 760 extend_A,
jpayne@68 761 open_B,
jpayne@68 762 extend_B,
jpayne@68 763 penalize_extend_when_opening,
jpayne@68 764 penalize_end_gaps,
jpayne@68 765 align_globally,
jpayne@68 766 score_only,
jpayne@68 767 ):
jpayne@68 768 """Generate a score and traceback matrix according to Gotoh (PRIVATE).
jpayne@68 769
jpayne@68 770 This is an implementation of the Needleman-Wunsch dynamic programming
jpayne@68 771 algorithm as modified by Gotoh, implementing affine gap penalties.
jpayne@68 772 In short, we have three matrices, holding scores for alignments ending
jpayne@68 773 in (1) a match/mismatch, (2) a gap in sequence A, and (3) a gap in
jpayne@68 774 sequence B, respectively. However, we can combine them in one matrix,
jpayne@68 775 which holds the best scores, and store only those values from the
jpayne@68 776 other matrices that are actually used for the next step of calculation.
jpayne@68 777 The traceback matrix holds the positions for backtracing the alignment.
jpayne@68 778 """
jpayne@68 779 first_A_gap = calc_affine_penalty(1, open_A, extend_A, penalize_extend_when_opening)
jpayne@68 780 first_B_gap = calc_affine_penalty(1, open_B, extend_B, penalize_extend_when_opening)
jpayne@68 781 local_max_score = 0
jpayne@68 782
jpayne@68 783 # Create the score and traceback matrices. These should be in the
jpayne@68 784 # shape:
jpayne@68 785 # sequenceA (down) x sequenceB (across)
jpayne@68 786 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68 787 score_matrix, trace_matrix = [], []
jpayne@68 788 for i in range(lenA + 1):
jpayne@68 789 score_matrix.append([None] * (lenB + 1))
jpayne@68 790 if not score_only:
jpayne@68 791 trace_matrix.append([None] * (lenB + 1))
jpayne@68 792
jpayne@68 793 # Initialize first row and column with gap scores. This is like opening up
jpayne@68 794 # i gaps at the beginning of sequence A or B.
jpayne@68 795 for i in range(lenA + 1):
jpayne@68 796 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@68 797 score = calc_affine_penalty(
jpayne@68 798 i, open_B, extend_B, penalize_extend_when_opening
jpayne@68 799 )
jpayne@68 800 else:
jpayne@68 801 score = 0
jpayne@68 802 score_matrix[i][0] = score
jpayne@68 803 for i in range(lenB + 1):
jpayne@68 804 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@68 805 score = calc_affine_penalty(
jpayne@68 806 i, open_A, extend_A, penalize_extend_when_opening
jpayne@68 807 )
jpayne@68 808 else:
jpayne@68 809 score = 0
jpayne@68 810 score_matrix[0][i] = score
jpayne@68 811
jpayne@68 812 # Now initialize the col 'matrix'. Actually this is only a one dimensional
jpayne@68 813 # list, since we only need the col scores from the last row.
jpayne@68 814 col_score = [0] # Best score, if actual alignment ends with gap in seqB
jpayne@68 815 for i in range(1, lenB + 1):
jpayne@68 816 col_score.append(
jpayne@68 817 calc_affine_penalty(i, 2 * open_B, extend_B, penalize_extend_when_opening)
jpayne@68 818 )
jpayne@68 819
jpayne@68 820 # The row 'matrix' is calculated on the fly. Here we only need the actual
jpayne@68 821 # score.
jpayne@68 822 # Now, filling up the score and traceback matrices:
jpayne@68 823 for row in range(1, lenA + 1):
jpayne@68 824 row_score = calc_affine_penalty(
jpayne@68 825 row, 2 * open_A, extend_A, penalize_extend_when_opening
jpayne@68 826 )
jpayne@68 827 for col in range(1, lenB + 1):
jpayne@68 828 # Calculate the score that would occur by extending the
jpayne@68 829 # alignment without gaps.
jpayne@68 830 # fmt: off
jpayne@68 831 nogap_score = (
jpayne@68 832 score_matrix[row - 1][col - 1]
jpayne@68 833 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@68 834 )
jpayne@68 835 # fmt: on
jpayne@68 836 # Check the score that would occur if there were a gap in
jpayne@68 837 # sequence A. This could come from opening a new gap or
jpayne@68 838 # extending an existing one.
jpayne@68 839 # A gap in sequence A can also be opened if it follows a gap in
jpayne@68 840 # sequence B: A-
jpayne@68 841 # -B
jpayne@68 842 if not penalize_end_gaps[0] and row == lenA:
jpayne@68 843 row_open = score_matrix[row][col - 1]
jpayne@68 844 row_extend = row_score
jpayne@68 845 else:
jpayne@68 846 row_open = score_matrix[row][col - 1] + first_A_gap
jpayne@68 847 row_extend = row_score + extend_A
jpayne@68 848 row_score = max(row_open, row_extend)
jpayne@68 849
jpayne@68 850 # The same for sequence B:
jpayne@68 851 if not penalize_end_gaps[1] and col == lenB:
jpayne@68 852 col_open = score_matrix[row - 1][col]
jpayne@68 853 col_extend = col_score[col]
jpayne@68 854 else:
jpayne@68 855 col_open = score_matrix[row - 1][col] + first_B_gap
jpayne@68 856 col_extend = col_score[col] + extend_B
jpayne@68 857 col_score[col] = max(col_open, col_extend)
jpayne@68 858
jpayne@68 859 best_score = max(nogap_score, col_score[col], row_score)
jpayne@68 860 local_max_score = max(local_max_score, best_score)
jpayne@68 861 if not align_globally and best_score < 0:
jpayne@68 862 score_matrix[row][col] = 0
jpayne@68 863 else:
jpayne@68 864 score_matrix[row][col] = best_score
jpayne@68 865
jpayne@68 866 # Now the trace_matrix. The edges of the backtrace are encoded
jpayne@68 867 # binary: 1 = open gap in seqA, 2 = match/mismatch of seqA and
jpayne@68 868 # seqB, 4 = open gap in seqB, 8 = extend gap in seqA, and
jpayne@68 869 # 16 = extend gap in seqB. This values can be summed up.
jpayne@68 870 # Thus, the trace score 7 means that the best score can either
jpayne@68 871 # come from opening a gap in seqA (=1), pairing two characters
jpayne@68 872 # of seqA and seqB (+2=3) or opening a gap in seqB (+4=7).
jpayne@68 873 # However, if we only want the score we don't care about the trace.
jpayne@68 874 if not score_only:
jpayne@68 875 row_score_rint = rint(row_score)
jpayne@68 876 col_score_rint = rint(col_score[col])
jpayne@68 877 row_trace_score = 0
jpayne@68 878 col_trace_score = 0
jpayne@68 879 if rint(row_open) == row_score_rint:
jpayne@68 880 row_trace_score += 1 # Open gap in seqA
jpayne@68 881 if rint(row_extend) == row_score_rint:
jpayne@68 882 row_trace_score += 8 # Extend gap in seqA
jpayne@68 883 if rint(col_open) == col_score_rint:
jpayne@68 884 col_trace_score += 4 # Open gap in seqB
jpayne@68 885 if rint(col_extend) == col_score_rint:
jpayne@68 886 col_trace_score += 16 # Extend gap in seqB
jpayne@68 887
jpayne@68 888 trace_score = 0
jpayne@68 889 best_score_rint = rint(best_score)
jpayne@68 890 if rint(nogap_score) == best_score_rint:
jpayne@68 891 trace_score += 2 # Align seqA with seqB
jpayne@68 892 if row_score_rint == best_score_rint:
jpayne@68 893 trace_score += row_trace_score
jpayne@68 894 if col_score_rint == best_score_rint:
jpayne@68 895 trace_score += col_trace_score
jpayne@68 896 trace_matrix[row][col] = trace_score
jpayne@68 897
jpayne@68 898 if not align_globally:
jpayne@68 899 best_score = local_max_score
jpayne@68 900
jpayne@68 901 return score_matrix, trace_matrix, best_score
jpayne@68 902
jpayne@68 903
jpayne@68 904 def _recover_alignments(
jpayne@68 905 sequenceA,
jpayne@68 906 sequenceB,
jpayne@68 907 starts,
jpayne@68 908 best_score,
jpayne@68 909 score_matrix,
jpayne@68 910 trace_matrix,
jpayne@68 911 align_globally,
jpayne@68 912 gap_char,
jpayne@68 913 one_alignment_only,
jpayne@68 914 gap_A_fn,
jpayne@68 915 gap_B_fn,
jpayne@68 916 reverse=False,
jpayne@68 917 ):
jpayne@68 918 """Do the backtracing and return a list of alignments (PRIVATE).
jpayne@68 919
jpayne@68 920 Recover the alignments by following the traceback matrix. This
jpayne@68 921 is a recursive procedure, but it's implemented here iteratively
jpayne@68 922 with a stack.
jpayne@68 923
jpayne@68 924 sequenceA and sequenceB may be sequences, including strings,
jpayne@68 925 lists, or list-like objects. In order to preserve the type of
jpayne@68 926 the object, we need to use slices on the sequences instead of
jpayne@68 927 indexes. For example, sequenceA[row] may return a type that's
jpayne@68 928 not compatible with sequenceA, e.g. if sequenceA is a list and
jpayne@68 929 sequenceA[row] is a string. Thus, avoid using indexes and use
jpayne@68 930 slices, e.g. sequenceA[row:row+1]. Assume that client-defined
jpayne@68 931 sequence classes preserve these semantics.
jpayne@68 932 """
jpayne@68 933 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@68 934 ali_seqA, ali_seqB = sequenceA[0:0], sequenceB[0:0]
jpayne@68 935 tracebacks = []
jpayne@68 936 in_process = []
jpayne@68 937
jpayne@68 938 for start in starts:
jpayne@68 939 score, (row, col) = start
jpayne@68 940 begin = 0
jpayne@68 941 if align_globally:
jpayne@68 942 end = None
jpayne@68 943 else:
jpayne@68 944 # If this start is a zero-extension: don't start here!
jpayne@68 945 if (score, (row - 1, col - 1)) in starts:
jpayne@68 946 continue
jpayne@68 947 # Local alignments should start with a positive score!
jpayne@68 948 if score <= 0:
jpayne@68 949 continue
jpayne@68 950 # Local alignments should not end with a gap!:
jpayne@68 951 trace = trace_matrix[row][col]
jpayne@68 952 if (trace - trace % 2) % 4 == 2: # Trace contains 'nogap', fine!
jpayne@68 953 trace_matrix[row][col] = 2
jpayne@68 954 # If not, don't start here!
jpayne@68 955 else:
jpayne@68 956 continue
jpayne@68 957 end = -max(lenA - row, lenB - col)
jpayne@68 958 if not end:
jpayne@68 959 end = None
jpayne@68 960 col_distance = lenB - col
jpayne@68 961 row_distance = lenA - row
jpayne@68 962
jpayne@68 963 # fmt: off
jpayne@68 964 ali_seqA = (
jpayne@68 965 (col_distance - row_distance) * gap_char
jpayne@68 966 + sequenceA[lenA - 1 : row - 1 : -1]
jpayne@68 967 )
jpayne@68 968 ali_seqB = (
jpayne@68 969 (row_distance - col_distance) * gap_char
jpayne@68 970 + sequenceB[lenB - 1 : col - 1 : -1]
jpayne@68 971 )
jpayne@68 972 # fmt: on
jpayne@68 973 in_process += [
jpayne@68 974 (ali_seqA, ali_seqB, end, row, col, False, trace_matrix[row][col])
jpayne@68 975 ]
jpayne@68 976 while in_process and len(tracebacks) < MAX_ALIGNMENTS:
jpayne@68 977 # Although we allow a gap in seqB to be followed by a gap in seqA,
jpayne@68 978 # we don't want to allow it the other way round, since this would
jpayne@68 979 # give redundant alignments of type: A- vs. -A
jpayne@68 980 # -B B-
jpayne@68 981 # Thus we need to keep track if a gap in seqA was opened (col_gap)
jpayne@68 982 # and stop the backtrace (dead_end) if a gap in seqB follows.
jpayne@68 983 #
jpayne@68 984 # Attention: This may fail, if the gap-penalties for both strands are
jpayne@68 985 # different. In this case the second alignment may be the only optimal
jpayne@68 986 # alignment. Thus it can happen that no alignment is returned. For
jpayne@68 987 # this case a workaround was implemented, which reverses the input and
jpayne@68 988 # the matrices (this happens in _reverse_matrices) and repeats the
jpayne@68 989 # backtrace. The variable 'reverse' keeps track of this.
jpayne@68 990 dead_end = False
jpayne@68 991 ali_seqA, ali_seqB, end, row, col, col_gap, trace = in_process.pop()
jpayne@68 992
jpayne@68 993 while (row > 0 or col > 0) and not dead_end:
jpayne@68 994 cache = (ali_seqA[:], ali_seqB[:], end, row, col, col_gap)
jpayne@68 995
jpayne@68 996 # If trace is empty we have reached at least one border of the
jpayne@68 997 # matrix or the end of a local alignment. Just add the rest of
jpayne@68 998 # the sequence(s) and fill with gaps if necessary.
jpayne@68 999 if not trace:
jpayne@68 1000 if col and col_gap:
jpayne@68 1001 dead_end = True
jpayne@68 1002 else:
jpayne@68 1003 ali_seqA, ali_seqB = _finish_backtrace(
jpayne@68 1004 sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char
jpayne@68 1005 )
jpayne@68 1006 break
jpayne@68 1007 elif trace % 2 == 1: # = row open = open gap in seqA
jpayne@68 1008 trace -= 1
jpayne@68 1009 if col_gap:
jpayne@68 1010 dead_end = True
jpayne@68 1011 else:
jpayne@68 1012 col -= 1
jpayne@68 1013 ali_seqA += gap_char
jpayne@68 1014 ali_seqB += sequenceB[col : col + 1]
jpayne@68 1015 col_gap = False
jpayne@68 1016 elif trace % 4 == 2: # = match/mismatch of seqA with seqB
jpayne@68 1017 trace -= 2
jpayne@68 1018 row -= 1
jpayne@68 1019 col -= 1
jpayne@68 1020 ali_seqA += sequenceA[row : row + 1]
jpayne@68 1021 ali_seqB += sequenceB[col : col + 1]
jpayne@68 1022 col_gap = False
jpayne@68 1023 elif trace % 8 == 4: # = col open = open gap in seqB
jpayne@68 1024 trace -= 4
jpayne@68 1025 row -= 1
jpayne@68 1026 ali_seqA += sequenceA[row : row + 1]
jpayne@68 1027 ali_seqB += gap_char
jpayne@68 1028 col_gap = True
jpayne@68 1029 elif trace in (8, 24): # = row extend = extend gap in seqA
jpayne@68 1030 trace -= 8
jpayne@68 1031 if col_gap:
jpayne@68 1032 dead_end = True
jpayne@68 1033 else:
jpayne@68 1034 col_gap = False
jpayne@68 1035 # We need to find the starting point of the extended gap
jpayne@68 1036 x = _find_gap_open(
jpayne@68 1037 sequenceA,
jpayne@68 1038 sequenceB,
jpayne@68 1039 ali_seqA,
jpayne@68 1040 ali_seqB,
jpayne@68 1041 end,
jpayne@68 1042 row,
jpayne@68 1043 col,
jpayne@68 1044 col_gap,
jpayne@68 1045 gap_char,
jpayne@68 1046 score_matrix,
jpayne@68 1047 trace_matrix,
jpayne@68 1048 in_process,
jpayne@68 1049 gap_A_fn,
jpayne@68 1050 col,
jpayne@68 1051 row,
jpayne@68 1052 "col",
jpayne@68 1053 best_score,
jpayne@68 1054 align_globally,
jpayne@68 1055 )
jpayne@68 1056 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@68 1057 elif trace == 16: # = col extend = extend gap in seqB
jpayne@68 1058 trace -= 16
jpayne@68 1059 col_gap = True
jpayne@68 1060 x = _find_gap_open(
jpayne@68 1061 sequenceA,
jpayne@68 1062 sequenceB,
jpayne@68 1063 ali_seqA,
jpayne@68 1064 ali_seqB,
jpayne@68 1065 end,
jpayne@68 1066 row,
jpayne@68 1067 col,
jpayne@68 1068 col_gap,
jpayne@68 1069 gap_char,
jpayne@68 1070 score_matrix,
jpayne@68 1071 trace_matrix,
jpayne@68 1072 in_process,
jpayne@68 1073 gap_B_fn,
jpayne@68 1074 row,
jpayne@68 1075 col,
jpayne@68 1076 "row",
jpayne@68 1077 best_score,
jpayne@68 1078 align_globally,
jpayne@68 1079 )
jpayne@68 1080 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@68 1081
jpayne@68 1082 if trace: # There is another path to follow...
jpayne@68 1083 cache += (trace,)
jpayne@68 1084 in_process.append(cache)
jpayne@68 1085 trace = trace_matrix[row][col]
jpayne@68 1086 if not align_globally:
jpayne@68 1087 if score_matrix[row][col] == best_score:
jpayne@68 1088 # We have gone through a 'zero-score' extension, discard it
jpayne@68 1089 dead_end = True
jpayne@68 1090 elif score_matrix[row][col] <= 0:
jpayne@68 1091 # We have reached the end of the backtrace
jpayne@68 1092 begin = max(row, col)
jpayne@68 1093 trace = 0
jpayne@68 1094 if not dead_end:
jpayne@68 1095 if not reverse:
jpayne@68 1096 tracebacks.append((ali_seqA[::-1], ali_seqB[::-1], score, begin, end))
jpayne@68 1097 else:
jpayne@68 1098 tracebacks.append((ali_seqB[::-1], ali_seqA[::-1], score, begin, end))
jpayne@68 1099 if one_alignment_only:
jpayne@68 1100 break
jpayne@68 1101 return _clean_alignments(tracebacks)
jpayne@68 1102
jpayne@68 1103
jpayne@68 1104 def _find_start(score_matrix, best_score, align_globally):
jpayne@68 1105 """Return a list of starting points (score, (row, col)) (PRIVATE).
jpayne@68 1106
jpayne@68 1107 Indicating every possible place to start the tracebacks.
jpayne@68 1108 """
jpayne@68 1109 nrows, ncols = len(score_matrix), len(score_matrix[0])
jpayne@68 1110 # In this implementation of the global algorithm, the start will always be
jpayne@68 1111 # the bottom right corner of the matrix.
jpayne@68 1112 if align_globally:
jpayne@68 1113 starts = [(best_score, (nrows - 1, ncols - 1))]
jpayne@68 1114 else:
jpayne@68 1115 # For local alignments, there may be many different start points.
jpayne@68 1116 starts = []
jpayne@68 1117 tolerance = 0 # XXX do anything with this?
jpayne@68 1118 # Now find all the positions within some tolerance of the best
jpayne@68 1119 # score.
jpayne@68 1120 for row in range(nrows):
jpayne@68 1121 for col in range(ncols):
jpayne@68 1122 score = score_matrix[row][col]
jpayne@68 1123 if rint(abs(score - best_score)) <= rint(tolerance):
jpayne@68 1124 starts.append((score, (row, col)))
jpayne@68 1125 return starts
jpayne@68 1126
jpayne@68 1127
jpayne@68 1128 def _reverse_matrices(score_matrix, trace_matrix):
jpayne@68 1129 """Reverse score and trace matrices (PRIVATE)."""
jpayne@68 1130 reverse_score_matrix = []
jpayne@68 1131 reverse_trace_matrix = []
jpayne@68 1132 # fmt: off
jpayne@68 1133 reverse_trace = {
jpayne@68 1134 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7, 8: 16, 9: 20, 10: 18, 11: 22, 12: 17,
jpayne@68 1135 13: 21, 14: 19, 15: 23, 16: 8, 17: 12, 18: 10, 19: 14, 20: 9, 21: 13, 22: 11,
jpayne@68 1136 23: 15, 24: 24, 25: 28, 26: 26, 27: 30, 28: 25, 29: 29, 30: 27, 31: 31,
jpayne@68 1137 None: None,
jpayne@68 1138 }
jpayne@68 1139 # fmt: on
jpayne@68 1140 for col in range(len(score_matrix[0])):
jpayne@68 1141 new_score_row = []
jpayne@68 1142 new_trace_row = []
jpayne@68 1143 for row in range(len(score_matrix)):
jpayne@68 1144 new_score_row.append(score_matrix[row][col])
jpayne@68 1145 new_trace_row.append(reverse_trace[trace_matrix[row][col]])
jpayne@68 1146 reverse_score_matrix.append(new_score_row)
jpayne@68 1147 reverse_trace_matrix.append(new_trace_row)
jpayne@68 1148 return reverse_score_matrix, reverse_trace_matrix
jpayne@68 1149
jpayne@68 1150
jpayne@68 1151 def _clean_alignments(alignments):
jpayne@68 1152 """Take a list of alignments and return a cleaned version (PRIVATE).
jpayne@68 1153
jpayne@68 1154 Remove duplicates, make sure begin and end are set correctly, remove
jpayne@68 1155 empty alignments.
jpayne@68 1156 """
jpayne@68 1157 unique_alignments = []
jpayne@68 1158 for align in alignments:
jpayne@68 1159 if align not in unique_alignments:
jpayne@68 1160 unique_alignments.append(align)
jpayne@68 1161 i = 0
jpayne@68 1162 while i < len(unique_alignments):
jpayne@68 1163 seqA, seqB, score, begin, end = unique_alignments[i]
jpayne@68 1164 # Make sure end is set reasonably.
jpayne@68 1165 if end is None: # global alignment
jpayne@68 1166 end = len(seqA)
jpayne@68 1167 elif end < 0:
jpayne@68 1168 end = end + len(seqA)
jpayne@68 1169 # If there's no alignment here, get rid of it.
jpayne@68 1170 if begin >= end:
jpayne@68 1171 del unique_alignments[i]
jpayne@68 1172 continue
jpayne@68 1173 unique_alignments[i] = Alignment(seqA, seqB, score, begin, end)
jpayne@68 1174 i += 1
jpayne@68 1175 return unique_alignments
jpayne@68 1176
jpayne@68 1177
jpayne@68 1178 def _finish_backtrace(sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char):
jpayne@68 1179 """Add remaining sequences and fill with gaps if necessary (PRIVATE)."""
jpayne@68 1180 if row:
jpayne@68 1181 ali_seqA += sequenceA[row - 1 :: -1]
jpayne@68 1182 if col:
jpayne@68 1183 ali_seqB += sequenceB[col - 1 :: -1]
jpayne@68 1184 if row > col:
jpayne@68 1185 ali_seqB += gap_char * (len(ali_seqA) - len(ali_seqB))
jpayne@68 1186 elif col > row:
jpayne@68 1187 ali_seqA += gap_char * (len(ali_seqB) - len(ali_seqA))
jpayne@68 1188 return ali_seqA, ali_seqB
jpayne@68 1189
jpayne@68 1190
jpayne@68 1191 def _find_gap_open(
jpayne@68 1192 sequenceA,
jpayne@68 1193 sequenceB,
jpayne@68 1194 ali_seqA,
jpayne@68 1195 ali_seqB,
jpayne@68 1196 end,
jpayne@68 1197 row,
jpayne@68 1198 col,
jpayne@68 1199 col_gap,
jpayne@68 1200 gap_char,
jpayne@68 1201 score_matrix,
jpayne@68 1202 trace_matrix,
jpayne@68 1203 in_process,
jpayne@68 1204 gap_fn,
jpayne@68 1205 target,
jpayne@68 1206 index,
jpayne@68 1207 direction,
jpayne@68 1208 best_score,
jpayne@68 1209 align_globally,
jpayne@68 1210 ):
jpayne@68 1211 """Find the starting point(s) of the extended gap (PRIVATE)."""
jpayne@68 1212 dead_end = False
jpayne@68 1213 target_score = score_matrix[row][col]
jpayne@68 1214 for n in range(target):
jpayne@68 1215 if direction == "col":
jpayne@68 1216 col -= 1
jpayne@68 1217 ali_seqA += gap_char
jpayne@68 1218 ali_seqB += sequenceB[col : col + 1]
jpayne@68 1219 else:
jpayne@68 1220 row -= 1
jpayne@68 1221 ali_seqA += sequenceA[row : row + 1]
jpayne@68 1222 ali_seqB += gap_char
jpayne@68 1223 actual_score = score_matrix[row][col] + gap_fn(index, n + 1)
jpayne@68 1224 if not align_globally and score_matrix[row][col] == best_score:
jpayne@68 1225 # We have run through a 'zero-score' extension and discard it
jpayne@68 1226 dead_end = True
jpayne@68 1227 break
jpayne@68 1228 if rint(actual_score) == rint(target_score) and n > 0:
jpayne@68 1229 if not trace_matrix[row][col]:
jpayne@68 1230 break
jpayne@68 1231 else:
jpayne@68 1232 in_process.append(
jpayne@68 1233 (
jpayne@68 1234 ali_seqA[:],
jpayne@68 1235 ali_seqB[:],
jpayne@68 1236 end,
jpayne@68 1237 row,
jpayne@68 1238 col,
jpayne@68 1239 col_gap,
jpayne@68 1240 trace_matrix[row][col],
jpayne@68 1241 )
jpayne@68 1242 )
jpayne@68 1243 if not trace_matrix[row][col]:
jpayne@68 1244 dead_end = True
jpayne@68 1245 return ali_seqA, ali_seqB, row, col, in_process, dead_end
jpayne@68 1246
jpayne@68 1247
jpayne@68 1248 _PRECISION = 1000
jpayne@68 1249
jpayne@68 1250
jpayne@68 1251 def rint(x, precision=_PRECISION):
jpayne@68 1252 """Print number with declared precision."""
jpayne@68 1253 return int(x * precision + 0.5)
jpayne@68 1254
jpayne@68 1255
jpayne@68 1256 class identity_match:
jpayne@68 1257 """Create a match function for use in an alignment.
jpayne@68 1258
jpayne@68 1259 match and mismatch are the scores to give when two residues are equal
jpayne@68 1260 or unequal. By default, match is 1 and mismatch is 0.
jpayne@68 1261 """
jpayne@68 1262
jpayne@68 1263 def __init__(self, match=1, mismatch=0):
jpayne@68 1264 """Initialize the class."""
jpayne@68 1265 self.match = match
jpayne@68 1266 self.mismatch = mismatch
jpayne@68 1267
jpayne@68 1268 def __call__(self, charA, charB):
jpayne@68 1269 """Call a match function instance already created."""
jpayne@68 1270 if charA == charB:
jpayne@68 1271 return self.match
jpayne@68 1272 return self.mismatch
jpayne@68 1273
jpayne@68 1274
jpayne@68 1275 class dictionary_match:
jpayne@68 1276 """Create a match function for use in an alignment.
jpayne@68 1277
jpayne@68 1278 Attributes:
jpayne@68 1279 - score_dict - A dictionary where the keys are tuples (residue 1,
jpayne@68 1280 residue 2) and the values are the match scores between those residues.
jpayne@68 1281 - symmetric - A flag that indicates whether the scores are symmetric.
jpayne@68 1282
jpayne@68 1283 """
jpayne@68 1284
jpayne@68 1285 def __init__(self, score_dict, symmetric=1):
jpayne@68 1286 """Initialize the class."""
jpayne@68 1287 if isinstance(score_dict, substitution_matrices.Array):
jpayne@68 1288 score_dict = dict(score_dict) # Access to dict is much faster
jpayne@68 1289 self.score_dict = score_dict
jpayne@68 1290 self.symmetric = symmetric
jpayne@68 1291
jpayne@68 1292 def __call__(self, charA, charB):
jpayne@68 1293 """Call a dictionary match instance already created."""
jpayne@68 1294 if self.symmetric and (charA, charB) not in self.score_dict:
jpayne@68 1295 # If the score dictionary is symmetric, then look up the
jpayne@68 1296 # score both ways.
jpayne@68 1297 charB, charA = charA, charB
jpayne@68 1298 return self.score_dict[(charA, charB)]
jpayne@68 1299
jpayne@68 1300
jpayne@68 1301 class affine_penalty:
jpayne@68 1302 """Create a gap function for use in an alignment."""
jpayne@68 1303
jpayne@68 1304 def __init__(self, open, extend, penalize_extend_when_opening=0):
jpayne@68 1305 """Initialize the class."""
jpayne@68 1306 if open > 0 or extend > 0:
jpayne@68 1307 raise ValueError("Gap penalties should be non-positive.")
jpayne@68 1308 if not penalize_extend_when_opening and (extend < open):
jpayne@68 1309 raise ValueError(
jpayne@68 1310 "Gap opening penalty should be higher than "
jpayne@68 1311 "gap extension penalty (or equal)"
jpayne@68 1312 )
jpayne@68 1313 self.open, self.extend = open, extend
jpayne@68 1314 self.penalize_extend_when_opening = penalize_extend_when_opening
jpayne@68 1315
jpayne@68 1316 def __call__(self, index, length):
jpayne@68 1317 """Call a gap function instance already created."""
jpayne@68 1318 return calc_affine_penalty(
jpayne@68 1319 length, self.open, self.extend, self.penalize_extend_when_opening
jpayne@68 1320 )
jpayne@68 1321
jpayne@68 1322
jpayne@68 1323 def calc_affine_penalty(length, open, extend, penalize_extend_when_opening):
jpayne@68 1324 """Calculate a penalty score for the gap function."""
jpayne@68 1325 if length <= 0:
jpayne@68 1326 return 0.0
jpayne@68 1327 penalty = open + extend * length
jpayne@68 1328 if not penalize_extend_when_opening:
jpayne@68 1329 penalty -= extend
jpayne@68 1330 return penalty
jpayne@68 1331
jpayne@68 1332
jpayne@68 1333 def print_matrix(matrix):
jpayne@68 1334 """Print out a matrix for debugging purposes."""
jpayne@68 1335 # Transpose the matrix and get the length of the values in each column.
jpayne@68 1336 matrixT = [[] for x in range(len(matrix[0]))]
jpayne@68 1337 for i in range(len(matrix)):
jpayne@68 1338 for j in range(len(matrix[i])):
jpayne@68 1339 matrixT[j].append(len(str(matrix[i][j])))
jpayne@68 1340 ndigits = [max(x) for x in matrixT]
jpayne@68 1341 for i in range(len(matrix)):
jpayne@68 1342 # Using string formatting trick to add leading spaces,
jpayne@68 1343 print(
jpayne@68 1344 " ".join("%*s " % (ndigits[j], matrix[i][j]) for j in range(len(matrix[i])))
jpayne@68 1345 )
jpayne@68 1346
jpayne@68 1347
jpayne@68 1348 def format_alignment(align1, align2, score, begin, end, full_sequences=False):
jpayne@68 1349 """Format the alignment prettily into a string.
jpayne@68 1350
jpayne@68 1351 IMPORTANT: Gap symbol must be "-" (or ['-'] for lists)!
jpayne@68 1352
jpayne@68 1353 Since Biopython 1.71 identical matches are shown with a pipe
jpayne@68 1354 character, mismatches as a dot, and gaps as a space.
jpayne@68 1355
jpayne@68 1356 Prior releases just used the pipe character to indicate the
jpayne@68 1357 aligned region (matches, mismatches and gaps).
jpayne@68 1358
jpayne@68 1359 Also, in local alignments, if the alignment does not include
jpayne@68 1360 the whole sequences, now only the aligned part is shown,
jpayne@68 1361 together with the start positions of the aligned subsequences.
jpayne@68 1362 The start positions are 1-based; so start position n is the
jpayne@68 1363 n-th base/amino acid in the *un-aligned* sequence.
jpayne@68 1364
jpayne@68 1365 NOTE: This is different to the alignment's begin/end values,
jpayne@68 1366 which give the Python indices (0-based) of the bases/amino acids
jpayne@68 1367 in the *aligned* sequences.
jpayne@68 1368
jpayne@68 1369 If you want to restore the 'historic' behaviour, that means
jpayne@68 1370 displaying the whole sequences (including the non-aligned parts),
jpayne@68 1371 use ``full_sequences=True``. In this case, the non-aligned leading
jpayne@68 1372 and trailing parts are also indicated by spaces in the match-line.
jpayne@68 1373 """
jpayne@68 1374 align_begin = begin
jpayne@68 1375 align_end = end
jpayne@68 1376 start1 = start2 = ""
jpayne@68 1377 start_m = begin # Begin of match line (how many spaces to include)
jpayne@68 1378 # For local alignments:
jpayne@68 1379 if not full_sequences and (begin != 0 or end != len(align1)):
jpayne@68 1380 # Calculate the actual start positions in the un-aligned sequences
jpayne@68 1381 # This will only work if the gap symbol is '-' or ['-']!
jpayne@68 1382 start1 = str(len(align1[:begin]) - align1[:begin].count("-") + 1) + " "
jpayne@68 1383 start2 = str(len(align2[:begin]) - align2[:begin].count("-") + 1) + " "
jpayne@68 1384 start_m = max(len(start1), len(start2))
jpayne@68 1385 elif full_sequences:
jpayne@68 1386 start_m = 0
jpayne@68 1387 begin = 0
jpayne@68 1388 end = len(align1)
jpayne@68 1389
jpayne@68 1390 if isinstance(align1, list):
jpayne@68 1391 # List elements will be separated by spaces, since they can be
jpayne@68 1392 # of different lengths
jpayne@68 1393 align1 = [a + " " for a in align1]
jpayne@68 1394 align2 = [a + " " for a in align2]
jpayne@68 1395
jpayne@68 1396 s1_line = ["{:>{width}}".format(start1, width=start_m)] # seq1 line
jpayne@68 1397 m_line = [" " * start_m] # match line
jpayne@68 1398 s2_line = ["{:>{width}}".format(start2, width=start_m)] # seq2 line
jpayne@68 1399
jpayne@68 1400 for n, (a, b) in enumerate(zip(align1[begin:end], align2[begin:end])):
jpayne@68 1401 # Since list elements can be of different length, we center them,
jpayne@68 1402 # using the maximum length of the two compared elements as width
jpayne@68 1403 m_len = max(len(a), len(b))
jpayne@68 1404 s1_line.append("{:^{width}}".format(a, width=m_len))
jpayne@68 1405 s2_line.append("{:^{width}}".format(b, width=m_len))
jpayne@68 1406 if full_sequences and (n < align_begin or n >= align_end):
jpayne@68 1407 m_line.append("{:^{width}}".format(" ", width=m_len)) # space
jpayne@68 1408 continue
jpayne@68 1409 if a == b:
jpayne@68 1410 m_line.append("{:^{width}}".format("|", width=m_len)) # match
jpayne@68 1411 elif a.strip() == "-" or b.strip() == "-":
jpayne@68 1412 m_line.append("{:^{width}}".format(" ", width=m_len)) # gap
jpayne@68 1413 else:
jpayne@68 1414 m_line.append("{:^{width}}".format(".", width=m_len)) # mismatch
jpayne@68 1415
jpayne@68 1416 s2_line.append(f"\n Score={score:g}\n")
jpayne@68 1417 return "\n".join(["".join(s1_line), "".join(m_line), "".join(s2_line)])
jpayne@68 1418
jpayne@68 1419
jpayne@68 1420 # Try and load C implementations of functions. If I can't,
jpayne@68 1421 # then throw a warning and use the pure Python implementations.
jpayne@68 1422 # The redefinition is deliberate, thus the no quality assurance
jpayne@68 1423 # flag for when using flake8.
jpayne@68 1424 # Before, we secure access to the pure Python functions (for testing purposes):
jpayne@68 1425
jpayne@68 1426 _python_make_score_matrix_fast = _make_score_matrix_fast
jpayne@68 1427 _python_rint = rint
jpayne@68 1428
jpayne@68 1429 try:
jpayne@68 1430 from .cpairwise2 import rint, _make_score_matrix_fast # noqa
jpayne@68 1431 except ImportError:
jpayne@68 1432 warnings.warn(
jpayne@68 1433 "Import of C module failed. Falling back to pure Python "
jpayne@68 1434 "implementation. This may be slooow...",
jpayne@68 1435 BiopythonWarning,
jpayne@68 1436 )
jpayne@68 1437
jpayne@68 1438 if __name__ == "__main__":
jpayne@68 1439 from Bio._utils import run_doctest
jpayne@68 1440
jpayne@68 1441 run_doctest()