annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/site-packages/Bio/pairwise2.py @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 17:55:14 -0400
parents
children
rev   line source
jpayne@69 1 # Copyright 2002 by Jeffrey Chang.
jpayne@69 2 # Copyright 2016, 2019, 2020 by Markus Piotrowski.
jpayne@69 3 # All rights reserved.
jpayne@69 4 #
jpayne@69 5 # This file is part of the Biopython distribution and governed by your
jpayne@69 6 # choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
jpayne@69 7 # Please see the LICENSE file that should have been included as part of this
jpayne@69 8 # package.
jpayne@69 9
jpayne@69 10 """Pairwise sequence alignment using a dynamic programming algorithm.
jpayne@69 11
jpayne@69 12 This provides functions to get global and local alignments between two
jpayne@69 13 sequences. A global alignment finds the best concordance between all
jpayne@69 14 characters in two sequences. A local alignment finds just the
jpayne@69 15 subsequences that align the best. Local alignments must have a positive
jpayne@69 16 score to be reported and they will not be extended for 'zero counting'
jpayne@69 17 matches. This means a local alignment will always start and end with
jpayne@69 18 a positive counting match.
jpayne@69 19
jpayne@69 20 When doing alignments, you can specify the match score and gap
jpayne@69 21 penalties. The match score indicates the compatibility between an
jpayne@69 22 alignment of two characters in the sequences. Highly compatible
jpayne@69 23 characters should be given positive scores, and incompatible ones
jpayne@69 24 should be given negative scores or 0. The gap penalties should be
jpayne@69 25 negative.
jpayne@69 26
jpayne@69 27 The names of the alignment functions in this module follow the
jpayne@69 28 convention
jpayne@69 29 <alignment type>XX
jpayne@69 30 where <alignment type> is either "global" or "local" and XX is a 2
jpayne@69 31 character code indicating the parameters it takes. The first
jpayne@69 32 character indicates the parameters for matches (and mismatches), and
jpayne@69 33 the second indicates the parameters for gap penalties.
jpayne@69 34
jpayne@69 35 The match parameters are::
jpayne@69 36
jpayne@69 37 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@69 38 x No parameters. Identical characters have score of 1, otherwise 0.
jpayne@69 39 m A match score is the score of identical chars, otherwise mismatch
jpayne@69 40 score. Keywords ``match``, ``mismatch``.
jpayne@69 41 d A dictionary returns the score of any pair of characters.
jpayne@69 42 Keyword ``match_dict``.
jpayne@69 43 c A callback function returns scores. Keyword ``match_fn``.
jpayne@69 44
jpayne@69 45 The gap penalty parameters are::
jpayne@69 46
jpayne@69 47 CODE DESCRIPTION & OPTIONAL KEYWORDS
jpayne@69 48 x No gap penalties.
jpayne@69 49 s Same open and extend gap penalties for both sequences.
jpayne@69 50 Keywords ``open``, ``extend``.
jpayne@69 51 d The sequences have different open and extend gap penalties.
jpayne@69 52 Keywords ``openA``, ``extendA``, ``openB``, ``extendB``.
jpayne@69 53 c A callback function returns the gap penalties.
jpayne@69 54 Keywords ``gap_A_fn``, ``gap_B_fn``.
jpayne@69 55
jpayne@69 56 All the different alignment functions are contained in an object
jpayne@69 57 ``align``. For example:
jpayne@69 58
jpayne@69 59 >>> from Bio import pairwise2
jpayne@69 60 >>> alignments = pairwise2.align.globalxx("ACCGT", "ACG")
jpayne@69 61
jpayne@69 62 For better readability, the required arguments can be used with optional keywords:
jpayne@69 63
jpayne@69 64 >>> alignments = pairwise2.align.globalxx(sequenceA="ACCGT", sequenceB="ACG")
jpayne@69 65
jpayne@69 66 The result is a list of the alignments between the two strings. Each alignment
jpayne@69 67 is a named tuple consisting of the two aligned sequences, the score and the
jpayne@69 68 start and end positions of the alignment:
jpayne@69 69
jpayne@69 70 >>> print(alignments)
jpayne@69 71 [Alignment(seqA='ACCGT', seqB='A-CG-', score=3.0, start=0, end=5), ...
jpayne@69 72
jpayne@69 73 You can access each element of an alignment by index or name:
jpayne@69 74
jpayne@69 75 >>> alignments[0][2]
jpayne@69 76 3.0
jpayne@69 77 >>> alignments[0].score
jpayne@69 78 3.0
jpayne@69 79
jpayne@69 80 For a nice printout of an alignment, use the ``format_alignment`` method of
jpayne@69 81 the module:
jpayne@69 82
jpayne@69 83 >>> from Bio.pairwise2 import format_alignment
jpayne@69 84 >>> print(format_alignment(*alignments[0]))
jpayne@69 85 ACCGT
jpayne@69 86 | ||
jpayne@69 87 A-CG-
jpayne@69 88 Score=3
jpayne@69 89 <BLANKLINE>
jpayne@69 90
jpayne@69 91 All alignment functions have the following arguments:
jpayne@69 92
jpayne@69 93 - Two sequences: strings, Biopython sequence objects or lists.
jpayne@69 94 Lists are useful for supplying sequences which contain residues that are
jpayne@69 95 encoded by more than one letter.
jpayne@69 96
jpayne@69 97 - ``penalize_extend_when_opening``: boolean (default: False).
jpayne@69 98 Whether to count an extension penalty when opening a gap. If false, a gap of
jpayne@69 99 1 is only penalized an "open" penalty, otherwise it is penalized
jpayne@69 100 "open+extend".
jpayne@69 101
jpayne@69 102 - ``penalize_end_gaps``: boolean.
jpayne@69 103 Whether to count the gaps at the ends of an alignment. By default, they are
jpayne@69 104 counted for global alignments but not for local ones. Setting
jpayne@69 105 ``penalize_end_gaps`` to (boolean, boolean) allows you to specify for the
jpayne@69 106 two sequences separately whether gaps at the end of the alignment should be
jpayne@69 107 counted.
jpayne@69 108
jpayne@69 109 - ``gap_char``: string (default: ``'-'``).
jpayne@69 110 Which character to use as a gap character in the alignment returned. If your
jpayne@69 111 input sequences are lists, you must change this to ``['-']``.
jpayne@69 112
jpayne@69 113 - ``force_generic``: boolean (default: False).
jpayne@69 114 Always use the generic, non-cached, dynamic programming function (slow!).
jpayne@69 115 For debugging.
jpayne@69 116
jpayne@69 117 - ``score_only``: boolean (default: False).
jpayne@69 118 Only get the best score, don't recover any alignments. The return value of
jpayne@69 119 the function is the score. Faster and uses less memory.
jpayne@69 120
jpayne@69 121 - ``one_alignment_only``: boolean (default: False).
jpayne@69 122 Only recover one alignment.
jpayne@69 123
jpayne@69 124 The other parameters of the alignment function depend on the function called.
jpayne@69 125 Some examples:
jpayne@69 126
jpayne@69 127 - Find the best global alignment between the two sequences. Identical
jpayne@69 128 characters are given 1 point. No points are deducted for mismatches or gaps.
jpayne@69 129
jpayne@69 130 >>> for a in pairwise2.align.globalxx("ACCGT", "ACG"):
jpayne@69 131 ... print(format_alignment(*a))
jpayne@69 132 ACCGT
jpayne@69 133 | ||
jpayne@69 134 A-CG-
jpayne@69 135 Score=3
jpayne@69 136 <BLANKLINE>
jpayne@69 137 ACCGT
jpayne@69 138 || |
jpayne@69 139 AC-G-
jpayne@69 140 Score=3
jpayne@69 141 <BLANKLINE>
jpayne@69 142
jpayne@69 143 - Same thing as before, but with a local alignment. Note that
jpayne@69 144 ``format_alignment`` will only show the aligned parts of the sequences,
jpayne@69 145 together with the starting positions.
jpayne@69 146
jpayne@69 147 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@69 148 ... print(format_alignment(*a))
jpayne@69 149 1 ACCG
jpayne@69 150 | ||
jpayne@69 151 1 A-CG
jpayne@69 152 Score=3
jpayne@69 153 <BLANKLINE>
jpayne@69 154 1 ACCG
jpayne@69 155 || |
jpayne@69 156 1 AC-G
jpayne@69 157 Score=3
jpayne@69 158 <BLANKLINE>
jpayne@69 159
jpayne@69 160 To restore the 'historic' behaviour of ``format_alignemt``, i.e., showing
jpayne@69 161 also the un-aligned parts of both sequences, use the new keyword parameter
jpayne@69 162 ``full_sequences``:
jpayne@69 163
jpayne@69 164 >>> for a in pairwise2.align.localxx("ACCGT", "ACG"):
jpayne@69 165 ... print(format_alignment(*a, full_sequences=True))
jpayne@69 166 ACCGT
jpayne@69 167 | ||
jpayne@69 168 A-CG-
jpayne@69 169 Score=3
jpayne@69 170 <BLANKLINE>
jpayne@69 171 ACCGT
jpayne@69 172 || |
jpayne@69 173 AC-G-
jpayne@69 174 Score=3
jpayne@69 175 <BLANKLINE>
jpayne@69 176
jpayne@69 177
jpayne@69 178 - Do a global alignment. Identical characters are given 2 points, 1 point is
jpayne@69 179 deducted for each non-identical character. Don't penalize gaps.
jpayne@69 180
jpayne@69 181 >>> for a in pairwise2.align.globalmx("ACCGT", "ACG", 2, -1):
jpayne@69 182 ... print(format_alignment(*a))
jpayne@69 183 ACCGT
jpayne@69 184 | ||
jpayne@69 185 A-CG-
jpayne@69 186 Score=6
jpayne@69 187 <BLANKLINE>
jpayne@69 188 ACCGT
jpayne@69 189 || |
jpayne@69 190 AC-G-
jpayne@69 191 Score=6
jpayne@69 192 <BLANKLINE>
jpayne@69 193
jpayne@69 194 - Same as above, except now 0.5 points are deducted when opening a gap, and
jpayne@69 195 0.1 points are deducted when extending it.
jpayne@69 196
jpayne@69 197 >>> for a in pairwise2.align.globalms("ACCGT", "ACG", 2, -1, -.5, -.1):
jpayne@69 198 ... print(format_alignment(*a))
jpayne@69 199 ACCGT
jpayne@69 200 | ||
jpayne@69 201 A-CG-
jpayne@69 202 Score=5
jpayne@69 203 <BLANKLINE>
jpayne@69 204 ACCGT
jpayne@69 205 || |
jpayne@69 206 AC-G-
jpayne@69 207 Score=5
jpayne@69 208 <BLANKLINE>
jpayne@69 209
jpayne@69 210 - Note that you can use keywords to increase the readability, e.g.:
jpayne@69 211
jpayne@69 212 >>> a = pairwise2.align.globalms("ACGT", "ACG", match=2, mismatch=-1, open=-.5,
jpayne@69 213 ... extend=-.1)
jpayne@69 214
jpayne@69 215 - Depending on the penalties, a gap in one sequence may be followed by a gap in
jpayne@69 216 the other sequence.If you don't like this behaviour, increase the gap-open
jpayne@69 217 penalty:
jpayne@69 218
jpayne@69 219 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -1, -.1):
jpayne@69 220 ... print(format_alignment(*a))
jpayne@69 221 A-
jpayne@69 222 <BLANKLINE>
jpayne@69 223 -T
jpayne@69 224 Score=-2
jpayne@69 225 <BLANKLINE>
jpayne@69 226 >>> for a in pairwise2.align.globalms("A", "T", 5, -4, -3, -.1):
jpayne@69 227 ... print(format_alignment(*a))
jpayne@69 228 A
jpayne@69 229 .
jpayne@69 230 T
jpayne@69 231 Score=-4
jpayne@69 232 <BLANKLINE>
jpayne@69 233
jpayne@69 234 - The alignment function can also use known matrices already included in
jpayne@69 235 Biopython (in ``Bio.Align.substitution_matrices``):
jpayne@69 236
jpayne@69 237 >>> from Bio.Align import substitution_matrices
jpayne@69 238 >>> matrix = substitution_matrices.load("BLOSUM62")
jpayne@69 239 >>> for a in pairwise2.align.globaldx("KEVLA", "EVL", matrix):
jpayne@69 240 ... print(format_alignment(*a))
jpayne@69 241 KEVLA
jpayne@69 242 |||
jpayne@69 243 -EVL-
jpayne@69 244 Score=13
jpayne@69 245 <BLANKLINE>
jpayne@69 246
jpayne@69 247 - With the parameter ``c`` you can define your own match- and gap functions.
jpayne@69 248 E.g. to define an affine logarithmic gap function and using it:
jpayne@69 249
jpayne@69 250 >>> from math import log
jpayne@69 251 >>> def gap_function(x, y): # x is gap position in seq, y is gap length
jpayne@69 252 ... if y == 0: # No gap
jpayne@69 253 ... return 0
jpayne@69 254 ... elif y == 1: # Gap open penalty
jpayne@69 255 ... return -2
jpayne@69 256 ... return - (2 + y/4.0 + log(y)/2.0)
jpayne@69 257 ...
jpayne@69 258 >>> alignment = pairwise2.align.globalmc("ACCCCCGT", "ACG", 5, -4,
jpayne@69 259 ... gap_function, gap_function)
jpayne@69 260
jpayne@69 261 You can define different gap functions for each sequence.
jpayne@69 262 Self-defined match functions must take the two residues to be compared and
jpayne@69 263 return a score.
jpayne@69 264
jpayne@69 265 To see a description of the parameters for a function, please look at
jpayne@69 266 the docstring for the function via the help function, e.g.
jpayne@69 267 type ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@69 268
jpayne@69 269 """ # noqa: W291
jpayne@69 270
jpayne@69 271 import warnings
jpayne@69 272 from collections import namedtuple
jpayne@69 273
jpayne@69 274 from Bio import BiopythonWarning
jpayne@69 275 from Bio import BiopythonDeprecationWarning
jpayne@69 276 from Bio.Align import substitution_matrices
jpayne@69 277
jpayne@69 278 warnings.warn(
jpayne@69 279 "Bio.pairwise2 has been deprecated, and we intend to remove it in a "
jpayne@69 280 "future release of Biopython. As an alternative, please consider using "
jpayne@69 281 "Bio.Align.PairwiseAligner as a replacement, and contact the "
jpayne@69 282 "Biopython developers if you still need the Bio.pairwise2 module.",
jpayne@69 283 BiopythonDeprecationWarning,
jpayne@69 284 )
jpayne@69 285
jpayne@69 286
jpayne@69 287 MAX_ALIGNMENTS = 1000 # maximum alignments recovered in traceback
jpayne@69 288
jpayne@69 289 Alignment = namedtuple("Alignment", ("seqA, seqB, score, start, end"))
jpayne@69 290
jpayne@69 291
jpayne@69 292 class align:
jpayne@69 293 """Provide functions that do alignments.
jpayne@69 294
jpayne@69 295 Alignment functions are called as:
jpayne@69 296
jpayne@69 297 pairwise2.align.globalXX
jpayne@69 298
jpayne@69 299 or
jpayne@69 300
jpayne@69 301 pairwise2.align.localXX
jpayne@69 302
jpayne@69 303 Where XX is a 2 character code indicating the match/mismatch parameters
jpayne@69 304 (first character, either x, m, d or c) and the gap penalty parameters
jpayne@69 305 (second character, either x, s, d, or c).
jpayne@69 306
jpayne@69 307 For a detailed description read the main module's docstring (e.g.,
jpayne@69 308 type ``help(pairwise2)``).
jpayne@69 309 To see a description of the parameters for a function, please
jpayne@69 310 look at the docstring for the function, e.g. type
jpayne@69 311 ``help(pairwise2.align.localds)`` at the Python prompt.
jpayne@69 312 """
jpayne@69 313
jpayne@69 314 class alignment_function:
jpayne@69 315 """Callable class which impersonates an alignment function.
jpayne@69 316
jpayne@69 317 The constructor takes the name of the function. This class
jpayne@69 318 will decode the name of the function to figure out how to
jpayne@69 319 interpret the parameters.
jpayne@69 320 """
jpayne@69 321
jpayne@69 322 # match code -> tuple of (parameters, docstring)
jpayne@69 323 match2args = {
jpayne@69 324 "x": ([], ""),
jpayne@69 325 "m": (
jpayne@69 326 ["match", "mismatch"],
jpayne@69 327 "match is the score to given to identical characters.\n"
jpayne@69 328 "mismatch is the score given to non-identical ones.",
jpayne@69 329 ),
jpayne@69 330 "d": (
jpayne@69 331 ["match_dict"],
jpayne@69 332 "match_dict is a dictionary where the keys are tuples\n"
jpayne@69 333 "of pairs of characters and the values are the scores,\n"
jpayne@69 334 "e.g. ('A', 'C') : 2.5.",
jpayne@69 335 ),
jpayne@69 336 "c": (
jpayne@69 337 ["match_fn"],
jpayne@69 338 "match_fn is a callback function that takes two "
jpayne@69 339 "characters and returns the score between them.",
jpayne@69 340 ),
jpayne@69 341 }
jpayne@69 342 # penalty code -> tuple of (parameters, docstring)
jpayne@69 343 penalty2args = {
jpayne@69 344 "x": ([], ""),
jpayne@69 345 "s": (
jpayne@69 346 ["open", "extend"],
jpayne@69 347 "open and extend are the gap penalties when a gap is\n"
jpayne@69 348 "opened and extended. They should be negative.",
jpayne@69 349 ),
jpayne@69 350 "d": (
jpayne@69 351 ["openA", "extendA", "openB", "extendB"],
jpayne@69 352 "openA and extendA are the gap penalties for sequenceA,\n"
jpayne@69 353 "and openB and extendB for sequenceB. The penalties\n"
jpayne@69 354 "should be negative.",
jpayne@69 355 ),
jpayne@69 356 "c": (
jpayne@69 357 ["gap_A_fn", "gap_B_fn"],
jpayne@69 358 "gap_A_fn and gap_B_fn are callback functions that takes\n"
jpayne@69 359 "(1) the index where the gap is opened, and (2) the length\n"
jpayne@69 360 "of the gap. They should return a gap penalty.",
jpayne@69 361 ),
jpayne@69 362 }
jpayne@69 363
jpayne@69 364 def __init__(self, name):
jpayne@69 365 """Check to make sure the name of the function is reasonable."""
jpayne@69 366 if name.startswith("global"):
jpayne@69 367 if len(name) != 8:
jpayne@69 368 raise AttributeError("function should be globalXX")
jpayne@69 369 elif name.startswith("local"):
jpayne@69 370 if len(name) != 7:
jpayne@69 371 raise AttributeError("function should be localXX")
jpayne@69 372 else:
jpayne@69 373 raise AttributeError(name)
jpayne@69 374 align_type, match_type, penalty_type = name[:-2], name[-2], name[-1]
jpayne@69 375 try:
jpayne@69 376 match_args, match_doc = self.match2args[match_type]
jpayne@69 377 except KeyError:
jpayne@69 378 raise AttributeError(f"unknown match type {match_type!r}")
jpayne@69 379 try:
jpayne@69 380 penalty_args, penalty_doc = self.penalty2args[penalty_type]
jpayne@69 381 except KeyError:
jpayne@69 382 raise AttributeError(f"unknown penalty type {penalty_type!r}")
jpayne@69 383
jpayne@69 384 # Now get the names of the parameters to this function.
jpayne@69 385 param_names = ["sequenceA", "sequenceB"]
jpayne@69 386 param_names.extend(match_args)
jpayne@69 387 param_names.extend(penalty_args)
jpayne@69 388 self.function_name = name
jpayne@69 389 self.align_type = align_type
jpayne@69 390 self.param_names = param_names
jpayne@69 391
jpayne@69 392 self.__name__ = self.function_name
jpayne@69 393 # Set the doc string.
jpayne@69 394 doc = f"{self.__name__}({', '.join(self.param_names)}) -> alignments\n"
jpayne@69 395 doc += """\
jpayne@69 396 \nThe following parameters can also be used with optional
jpayne@69 397 keywords of the same name.\n\n
jpayne@69 398 sequenceA and sequenceB must be of the same type, either
jpayne@69 399 strings, lists or Biopython sequence objects.\n
jpayne@69 400 """
jpayne@69 401 if match_doc:
jpayne@69 402 doc += f"\n{match_doc}\n"
jpayne@69 403 if penalty_doc:
jpayne@69 404 doc += f"\n{penalty_doc}\n"
jpayne@69 405 doc += """\
jpayne@69 406 \nalignments is a list of named tuples (seqA, seqB, score,
jpayne@69 407 begin, end). seqA and seqB are strings showing the alignment
jpayne@69 408 between the sequences. score is the score of the alignment.
jpayne@69 409 begin and end are indexes of seqA and seqB that indicate
jpayne@69 410 where the alignment occurs.
jpayne@69 411 """
jpayne@69 412 self.__doc__ = doc
jpayne@69 413
jpayne@69 414 def decode(self, *args, **keywds):
jpayne@69 415 """Decode the arguments for the _align function.
jpayne@69 416
jpayne@69 417 keywds will get passed to it, so translate the arguments
jpayne@69 418 to this function into forms appropriate for _align.
jpayne@69 419 """
jpayne@69 420 keywds = keywds.copy()
jpayne@69 421
jpayne@69 422 # Replace possible "keywords" with arguments:
jpayne@69 423 args += (len(self.param_names) - len(args)) * (None,)
jpayne@69 424 for key in keywds.copy():
jpayne@69 425 if key in self.param_names:
jpayne@69 426 _index = self.param_names.index(key)
jpayne@69 427 args = args[:_index] + (keywds[key],) + args[_index:]
jpayne@69 428 del keywds[key]
jpayne@69 429 args = tuple(arg for arg in args if arg is not None)
jpayne@69 430
jpayne@69 431 if len(args) != len(self.param_names):
jpayne@69 432 raise TypeError(
jpayne@69 433 "%s takes exactly %d argument (%d given)"
jpayne@69 434 % (self.function_name, len(self.param_names), len(args))
jpayne@69 435 )
jpayne@69 436
jpayne@69 437 i = 0
jpayne@69 438 while i < len(self.param_names):
jpayne@69 439 if self.param_names[i] in [
jpayne@69 440 "sequenceA",
jpayne@69 441 "sequenceB",
jpayne@69 442 "gap_A_fn",
jpayne@69 443 "gap_B_fn",
jpayne@69 444 "match_fn",
jpayne@69 445 ]:
jpayne@69 446 keywds[self.param_names[i]] = args[i]
jpayne@69 447 i += 1
jpayne@69 448 elif self.param_names[i] == "match":
jpayne@69 449 assert self.param_names[i + 1] == "mismatch"
jpayne@69 450 match, mismatch = args[i], args[i + 1]
jpayne@69 451 keywds["match_fn"] = identity_match(match, mismatch)
jpayne@69 452 i += 2
jpayne@69 453 elif self.param_names[i] == "match_dict":
jpayne@69 454 keywds["match_fn"] = dictionary_match(args[i])
jpayne@69 455 i += 1
jpayne@69 456 elif self.param_names[i] == "open":
jpayne@69 457 assert self.param_names[i + 1] == "extend"
jpayne@69 458 open, extend = args[i], args[i + 1]
jpayne@69 459 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69 460 keywds["gap_A_fn"] = affine_penalty(open, extend, pe)
jpayne@69 461 keywds["gap_B_fn"] = affine_penalty(open, extend, pe)
jpayne@69 462 i += 2
jpayne@69 463 elif self.param_names[i] == "openA":
jpayne@69 464 assert self.param_names[i + 3] == "extendB"
jpayne@69 465 openA, extendA, openB, extendB = args[i : i + 4]
jpayne@69 466 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69 467 keywds["gap_A_fn"] = affine_penalty(openA, extendA, pe)
jpayne@69 468 keywds["gap_B_fn"] = affine_penalty(openB, extendB, pe)
jpayne@69 469 i += 4
jpayne@69 470 else:
jpayne@69 471 raise ValueError(f"unknown parameter {self.param_names[i]!r}")
jpayne@69 472
jpayne@69 473 # Here are the default parameters for _align. Assign
jpayne@69 474 # these to keywds, unless already specified.
jpayne@69 475 pe = keywds.get("penalize_extend_when_opening", 0)
jpayne@69 476 default_params = [
jpayne@69 477 ("match_fn", identity_match(1, 0)),
jpayne@69 478 ("gap_A_fn", affine_penalty(0, 0, pe)),
jpayne@69 479 ("gap_B_fn", affine_penalty(0, 0, pe)),
jpayne@69 480 ("penalize_extend_when_opening", 0),
jpayne@69 481 ("penalize_end_gaps", self.align_type == "global"),
jpayne@69 482 ("align_globally", self.align_type == "global"),
jpayne@69 483 ("gap_char", "-"),
jpayne@69 484 ("force_generic", 0),
jpayne@69 485 ("score_only", 0),
jpayne@69 486 ("one_alignment_only", 0),
jpayne@69 487 ]
jpayne@69 488 for name, default in default_params:
jpayne@69 489 keywds[name] = keywds.get(name, default)
jpayne@69 490 value = keywds["penalize_end_gaps"]
jpayne@69 491 try:
jpayne@69 492 n = len(value)
jpayne@69 493 except TypeError:
jpayne@69 494 keywds["penalize_end_gaps"] = tuple([value] * 2)
jpayne@69 495 else:
jpayne@69 496 assert n == 2
jpayne@69 497 return keywds
jpayne@69 498
jpayne@69 499 def __call__(self, *args, **keywds):
jpayne@69 500 """Call the alignment instance already created."""
jpayne@69 501 keywds = self.decode(*args, **keywds)
jpayne@69 502 return _align(**keywds)
jpayne@69 503
jpayne@69 504 def __getattr__(self, attr):
jpayne@69 505 """Call alignment_function() to check and decode the attributes."""
jpayne@69 506 # The following 'magic' is needed to rewrite the class docstring
jpayne@69 507 # dynamically:
jpayne@69 508 wrapper = self.alignment_function(attr)
jpayne@69 509 wrapper_type = type(wrapper)
jpayne@69 510 wrapper_dict = wrapper_type.__dict__.copy()
jpayne@69 511 wrapper_dict["__doc__"] = wrapper.__doc__
jpayne@69 512 new_alignment_function = type("alignment_function", (object,), wrapper_dict)
jpayne@69 513
jpayne@69 514 return new_alignment_function(attr)
jpayne@69 515
jpayne@69 516
jpayne@69 517 align = align()
jpayne@69 518
jpayne@69 519
jpayne@69 520 def _align(
jpayne@69 521 sequenceA,
jpayne@69 522 sequenceB,
jpayne@69 523 match_fn,
jpayne@69 524 gap_A_fn,
jpayne@69 525 gap_B_fn,
jpayne@69 526 penalize_extend_when_opening,
jpayne@69 527 penalize_end_gaps,
jpayne@69 528 align_globally,
jpayne@69 529 gap_char,
jpayne@69 530 force_generic,
jpayne@69 531 score_only,
jpayne@69 532 one_alignment_only,
jpayne@69 533 ):
jpayne@69 534 """Return optimal alignments between two sequences (PRIVATE).
jpayne@69 535
jpayne@69 536 This method either returns a list of optimal alignments (with the same
jpayne@69 537 score) or just the optimal score.
jpayne@69 538 """
jpayne@69 539 if not sequenceA or not sequenceB:
jpayne@69 540 return []
jpayne@69 541 try:
jpayne@69 542 sequenceA + gap_char
jpayne@69 543 sequenceB + gap_char
jpayne@69 544 except TypeError:
jpayne@69 545 raise TypeError(
jpayne@69 546 "both sequences must be of the same type, either "
jpayne@69 547 "string/sequence object or list. Gap character must "
jpayne@69 548 "fit the sequence type (string or list)"
jpayne@69 549 )
jpayne@69 550
jpayne@69 551 if not isinstance(sequenceA, list):
jpayne@69 552 sequenceA = str(sequenceA)
jpayne@69 553 if not isinstance(sequenceB, list):
jpayne@69 554 sequenceB = str(sequenceB)
jpayne@69 555 if not align_globally and (penalize_end_gaps[0] or penalize_end_gaps[1]):
jpayne@69 556 warnings.warn(
jpayne@69 557 '"penalize_end_gaps" should not be used in local '
jpayne@69 558 "alignments. The resulting score may be wrong.",
jpayne@69 559 BiopythonWarning,
jpayne@69 560 )
jpayne@69 561
jpayne@69 562 if (
jpayne@69 563 (not force_generic)
jpayne@69 564 and isinstance(gap_A_fn, affine_penalty)
jpayne@69 565 and isinstance(gap_B_fn, affine_penalty)
jpayne@69 566 ):
jpayne@69 567 open_A, extend_A = gap_A_fn.open, gap_A_fn.extend
jpayne@69 568 open_B, extend_B = gap_B_fn.open, gap_B_fn.extend
jpayne@69 569 matrices = _make_score_matrix_fast(
jpayne@69 570 sequenceA,
jpayne@69 571 sequenceB,
jpayne@69 572 match_fn,
jpayne@69 573 open_A,
jpayne@69 574 extend_A,
jpayne@69 575 open_B,
jpayne@69 576 extend_B,
jpayne@69 577 penalize_extend_when_opening,
jpayne@69 578 penalize_end_gaps,
jpayne@69 579 align_globally,
jpayne@69 580 score_only,
jpayne@69 581 )
jpayne@69 582 else:
jpayne@69 583 matrices = _make_score_matrix_generic(
jpayne@69 584 sequenceA,
jpayne@69 585 sequenceB,
jpayne@69 586 match_fn,
jpayne@69 587 gap_A_fn,
jpayne@69 588 gap_B_fn,
jpayne@69 589 penalize_end_gaps,
jpayne@69 590 align_globally,
jpayne@69 591 score_only,
jpayne@69 592 )
jpayne@69 593
jpayne@69 594 score_matrix, trace_matrix, best_score = matrices
jpayne@69 595
jpayne@69 596 # print("SCORE %s" % print_matrix(score_matrix))
jpayne@69 597 # print("TRACEBACK %s" % print_matrix(trace_matrix))
jpayne@69 598
jpayne@69 599 # If they only want the score, then return it.
jpayne@69 600 if score_only:
jpayne@69 601 return best_score
jpayne@69 602
jpayne@69 603 starts = _find_start(score_matrix, best_score, align_globally)
jpayne@69 604
jpayne@69 605 # Recover the alignments and return them.
jpayne@69 606 alignments = _recover_alignments(
jpayne@69 607 sequenceA,
jpayne@69 608 sequenceB,
jpayne@69 609 starts,
jpayne@69 610 best_score,
jpayne@69 611 score_matrix,
jpayne@69 612 trace_matrix,
jpayne@69 613 align_globally,
jpayne@69 614 gap_char,
jpayne@69 615 one_alignment_only,
jpayne@69 616 gap_A_fn,
jpayne@69 617 gap_B_fn,
jpayne@69 618 )
jpayne@69 619 if not alignments:
jpayne@69 620 # This may happen, see recover_alignments for explanation
jpayne@69 621 score_matrix, trace_matrix = _reverse_matrices(score_matrix, trace_matrix)
jpayne@69 622 starts = [(z, (y, x)) for z, (x, y) in starts]
jpayne@69 623 alignments = _recover_alignments(
jpayne@69 624 sequenceB,
jpayne@69 625 sequenceA,
jpayne@69 626 starts,
jpayne@69 627 best_score,
jpayne@69 628 score_matrix,
jpayne@69 629 trace_matrix,
jpayne@69 630 align_globally,
jpayne@69 631 gap_char,
jpayne@69 632 one_alignment_only,
jpayne@69 633 gap_B_fn,
jpayne@69 634 gap_A_fn,
jpayne@69 635 reverse=True,
jpayne@69 636 )
jpayne@69 637 return alignments
jpayne@69 638
jpayne@69 639
jpayne@69 640 def _make_score_matrix_generic(
jpayne@69 641 sequenceA,
jpayne@69 642 sequenceB,
jpayne@69 643 match_fn,
jpayne@69 644 gap_A_fn,
jpayne@69 645 gap_B_fn,
jpayne@69 646 penalize_end_gaps,
jpayne@69 647 align_globally,
jpayne@69 648 score_only,
jpayne@69 649 ):
jpayne@69 650 """Generate a score and traceback matrix (PRIVATE).
jpayne@69 651
jpayne@69 652 This implementation according to Needleman-Wunsch allows the usage of
jpayne@69 653 general gap functions and is rather slow. It is automatically called if
jpayne@69 654 you define your own gap functions. You can force the usage of this method
jpayne@69 655 with ``force_generic=True``.
jpayne@69 656 """
jpayne@69 657 local_max_score = 0
jpayne@69 658 # Create the score and traceback matrices. These should be in the
jpayne@69 659 # shape:
jpayne@69 660 # sequenceA (down) x sequenceB (across)
jpayne@69 661 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69 662 score_matrix, trace_matrix = [], []
jpayne@69 663 for i in range(lenA + 1):
jpayne@69 664 score_matrix.append([None] * (lenB + 1))
jpayne@69 665 if not score_only:
jpayne@69 666 trace_matrix.append([None] * (lenB + 1))
jpayne@69 667
jpayne@69 668 # Initialize first row and column with gap scores. This is like opening up
jpayne@69 669 # i gaps at the beginning of sequence A or B.
jpayne@69 670 for i in range(lenA + 1):
jpayne@69 671 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@69 672 score = gap_B_fn(0, i)
jpayne@69 673 else:
jpayne@69 674 score = 0.0
jpayne@69 675 score_matrix[i][0] = score
jpayne@69 676 for i in range(lenB + 1):
jpayne@69 677 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@69 678 score = gap_A_fn(0, i)
jpayne@69 679 else:
jpayne@69 680 score = 0.0
jpayne@69 681 score_matrix[0][i] = score
jpayne@69 682
jpayne@69 683 # Fill in the score matrix. Each position in the matrix
jpayne@69 684 # represents an alignment between a character from sequence A to
jpayne@69 685 # one in sequence B. As I iterate through the matrix, find the
jpayne@69 686 # alignment by choose the best of:
jpayne@69 687 # 1) extending a previous alignment without gaps
jpayne@69 688 # 2) adding a gap in sequenceA
jpayne@69 689 # 3) adding a gap in sequenceB
jpayne@69 690 for row in range(1, lenA + 1):
jpayne@69 691 for col in range(1, lenB + 1):
jpayne@69 692 # First, calculate the score that would occur by extending
jpayne@69 693 # the alignment without gaps.
jpayne@69 694 # fmt: off
jpayne@69 695 nogap_score = (
jpayne@69 696 score_matrix[row - 1][col - 1]
jpayne@69 697 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@69 698 )
jpayne@69 699
jpayne@69 700 # fmt: on
jpayne@69 701 # Try to find a better score by opening gaps in sequenceA.
jpayne@69 702 # Do this by checking alignments from each column in the row.
jpayne@69 703 # Each column represents a different character to align from,
jpayne@69 704 # and thus a different length gap.
jpayne@69 705 # Although the gap function does not distinguish between opening
jpayne@69 706 # and extending a gap, we distinguish them for the backtrace.
jpayne@69 707 if not penalize_end_gaps[0] and row == lenA:
jpayne@69 708 row_open = score_matrix[row][col - 1]
jpayne@69 709 row_extend = max(score_matrix[row][x] for x in range(col))
jpayne@69 710 else:
jpayne@69 711 row_open = score_matrix[row][col - 1] + gap_A_fn(row, 1)
jpayne@69 712 row_extend = max(
jpayne@69 713 score_matrix[row][x] + gap_A_fn(row, col - x) for x in range(col)
jpayne@69 714 )
jpayne@69 715
jpayne@69 716 # Try to find a better score by opening gaps in sequenceB.
jpayne@69 717 if not penalize_end_gaps[1] and col == lenB:
jpayne@69 718 col_open = score_matrix[row - 1][col]
jpayne@69 719 col_extend = max(score_matrix[x][col] for x in range(row))
jpayne@69 720 else:
jpayne@69 721 col_open = score_matrix[row - 1][col] + gap_B_fn(col, 1)
jpayne@69 722 col_extend = max(
jpayne@69 723 score_matrix[x][col] + gap_B_fn(col, row - x) for x in range(row)
jpayne@69 724 )
jpayne@69 725
jpayne@69 726 best_score = max(nogap_score, row_open, row_extend, col_open, col_extend)
jpayne@69 727 local_max_score = max(local_max_score, best_score)
jpayne@69 728 if not align_globally and best_score < 0:
jpayne@69 729 score_matrix[row][col] = 0.0
jpayne@69 730 else:
jpayne@69 731 score_matrix[row][col] = best_score
jpayne@69 732
jpayne@69 733 # The backtrace is encoded binary. See _make_score_matrix_fast
jpayne@69 734 # for details.
jpayne@69 735 if not score_only:
jpayne@69 736 trace_score = 0
jpayne@69 737 if rint(nogap_score) == rint(best_score):
jpayne@69 738 trace_score += 2
jpayne@69 739 if rint(row_open) == rint(best_score):
jpayne@69 740 trace_score += 1
jpayne@69 741 if rint(row_extend) == rint(best_score):
jpayne@69 742 trace_score += 8
jpayne@69 743 if rint(col_open) == rint(best_score):
jpayne@69 744 trace_score += 4
jpayne@69 745 if rint(col_extend) == rint(best_score):
jpayne@69 746 trace_score += 16
jpayne@69 747 trace_matrix[row][col] = trace_score
jpayne@69 748
jpayne@69 749 if not align_globally:
jpayne@69 750 best_score = local_max_score
jpayne@69 751
jpayne@69 752 return score_matrix, trace_matrix, best_score
jpayne@69 753
jpayne@69 754
jpayne@69 755 def _make_score_matrix_fast(
jpayne@69 756 sequenceA,
jpayne@69 757 sequenceB,
jpayne@69 758 match_fn,
jpayne@69 759 open_A,
jpayne@69 760 extend_A,
jpayne@69 761 open_B,
jpayne@69 762 extend_B,
jpayne@69 763 penalize_extend_when_opening,
jpayne@69 764 penalize_end_gaps,
jpayne@69 765 align_globally,
jpayne@69 766 score_only,
jpayne@69 767 ):
jpayne@69 768 """Generate a score and traceback matrix according to Gotoh (PRIVATE).
jpayne@69 769
jpayne@69 770 This is an implementation of the Needleman-Wunsch dynamic programming
jpayne@69 771 algorithm as modified by Gotoh, implementing affine gap penalties.
jpayne@69 772 In short, we have three matrices, holding scores for alignments ending
jpayne@69 773 in (1) a match/mismatch, (2) a gap in sequence A, and (3) a gap in
jpayne@69 774 sequence B, respectively. However, we can combine them in one matrix,
jpayne@69 775 which holds the best scores, and store only those values from the
jpayne@69 776 other matrices that are actually used for the next step of calculation.
jpayne@69 777 The traceback matrix holds the positions for backtracing the alignment.
jpayne@69 778 """
jpayne@69 779 first_A_gap = calc_affine_penalty(1, open_A, extend_A, penalize_extend_when_opening)
jpayne@69 780 first_B_gap = calc_affine_penalty(1, open_B, extend_B, penalize_extend_when_opening)
jpayne@69 781 local_max_score = 0
jpayne@69 782
jpayne@69 783 # Create the score and traceback matrices. These should be in the
jpayne@69 784 # shape:
jpayne@69 785 # sequenceA (down) x sequenceB (across)
jpayne@69 786 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69 787 score_matrix, trace_matrix = [], []
jpayne@69 788 for i in range(lenA + 1):
jpayne@69 789 score_matrix.append([None] * (lenB + 1))
jpayne@69 790 if not score_only:
jpayne@69 791 trace_matrix.append([None] * (lenB + 1))
jpayne@69 792
jpayne@69 793 # Initialize first row and column with gap scores. This is like opening up
jpayne@69 794 # i gaps at the beginning of sequence A or B.
jpayne@69 795 for i in range(lenA + 1):
jpayne@69 796 if penalize_end_gaps[1]: # [1]:gap in sequence B
jpayne@69 797 score = calc_affine_penalty(
jpayne@69 798 i, open_B, extend_B, penalize_extend_when_opening
jpayne@69 799 )
jpayne@69 800 else:
jpayne@69 801 score = 0
jpayne@69 802 score_matrix[i][0] = score
jpayne@69 803 for i in range(lenB + 1):
jpayne@69 804 if penalize_end_gaps[0]: # [0]:gap in sequence A
jpayne@69 805 score = calc_affine_penalty(
jpayne@69 806 i, open_A, extend_A, penalize_extend_when_opening
jpayne@69 807 )
jpayne@69 808 else:
jpayne@69 809 score = 0
jpayne@69 810 score_matrix[0][i] = score
jpayne@69 811
jpayne@69 812 # Now initialize the col 'matrix'. Actually this is only a one dimensional
jpayne@69 813 # list, since we only need the col scores from the last row.
jpayne@69 814 col_score = [0] # Best score, if actual alignment ends with gap in seqB
jpayne@69 815 for i in range(1, lenB + 1):
jpayne@69 816 col_score.append(
jpayne@69 817 calc_affine_penalty(i, 2 * open_B, extend_B, penalize_extend_when_opening)
jpayne@69 818 )
jpayne@69 819
jpayne@69 820 # The row 'matrix' is calculated on the fly. Here we only need the actual
jpayne@69 821 # score.
jpayne@69 822 # Now, filling up the score and traceback matrices:
jpayne@69 823 for row in range(1, lenA + 1):
jpayne@69 824 row_score = calc_affine_penalty(
jpayne@69 825 row, 2 * open_A, extend_A, penalize_extend_when_opening
jpayne@69 826 )
jpayne@69 827 for col in range(1, lenB + 1):
jpayne@69 828 # Calculate the score that would occur by extending the
jpayne@69 829 # alignment without gaps.
jpayne@69 830 # fmt: off
jpayne@69 831 nogap_score = (
jpayne@69 832 score_matrix[row - 1][col - 1]
jpayne@69 833 + match_fn(sequenceA[row - 1], sequenceB[col - 1])
jpayne@69 834 )
jpayne@69 835 # fmt: on
jpayne@69 836 # Check the score that would occur if there were a gap in
jpayne@69 837 # sequence A. This could come from opening a new gap or
jpayne@69 838 # extending an existing one.
jpayne@69 839 # A gap in sequence A can also be opened if it follows a gap in
jpayne@69 840 # sequence B: A-
jpayne@69 841 # -B
jpayne@69 842 if not penalize_end_gaps[0] and row == lenA:
jpayne@69 843 row_open = score_matrix[row][col - 1]
jpayne@69 844 row_extend = row_score
jpayne@69 845 else:
jpayne@69 846 row_open = score_matrix[row][col - 1] + first_A_gap
jpayne@69 847 row_extend = row_score + extend_A
jpayne@69 848 row_score = max(row_open, row_extend)
jpayne@69 849
jpayne@69 850 # The same for sequence B:
jpayne@69 851 if not penalize_end_gaps[1] and col == lenB:
jpayne@69 852 col_open = score_matrix[row - 1][col]
jpayne@69 853 col_extend = col_score[col]
jpayne@69 854 else:
jpayne@69 855 col_open = score_matrix[row - 1][col] + first_B_gap
jpayne@69 856 col_extend = col_score[col] + extend_B
jpayne@69 857 col_score[col] = max(col_open, col_extend)
jpayne@69 858
jpayne@69 859 best_score = max(nogap_score, col_score[col], row_score)
jpayne@69 860 local_max_score = max(local_max_score, best_score)
jpayne@69 861 if not align_globally and best_score < 0:
jpayne@69 862 score_matrix[row][col] = 0
jpayne@69 863 else:
jpayne@69 864 score_matrix[row][col] = best_score
jpayne@69 865
jpayne@69 866 # Now the trace_matrix. The edges of the backtrace are encoded
jpayne@69 867 # binary: 1 = open gap in seqA, 2 = match/mismatch of seqA and
jpayne@69 868 # seqB, 4 = open gap in seqB, 8 = extend gap in seqA, and
jpayne@69 869 # 16 = extend gap in seqB. This values can be summed up.
jpayne@69 870 # Thus, the trace score 7 means that the best score can either
jpayne@69 871 # come from opening a gap in seqA (=1), pairing two characters
jpayne@69 872 # of seqA and seqB (+2=3) or opening a gap in seqB (+4=7).
jpayne@69 873 # However, if we only want the score we don't care about the trace.
jpayne@69 874 if not score_only:
jpayne@69 875 row_score_rint = rint(row_score)
jpayne@69 876 col_score_rint = rint(col_score[col])
jpayne@69 877 row_trace_score = 0
jpayne@69 878 col_trace_score = 0
jpayne@69 879 if rint(row_open) == row_score_rint:
jpayne@69 880 row_trace_score += 1 # Open gap in seqA
jpayne@69 881 if rint(row_extend) == row_score_rint:
jpayne@69 882 row_trace_score += 8 # Extend gap in seqA
jpayne@69 883 if rint(col_open) == col_score_rint:
jpayne@69 884 col_trace_score += 4 # Open gap in seqB
jpayne@69 885 if rint(col_extend) == col_score_rint:
jpayne@69 886 col_trace_score += 16 # Extend gap in seqB
jpayne@69 887
jpayne@69 888 trace_score = 0
jpayne@69 889 best_score_rint = rint(best_score)
jpayne@69 890 if rint(nogap_score) == best_score_rint:
jpayne@69 891 trace_score += 2 # Align seqA with seqB
jpayne@69 892 if row_score_rint == best_score_rint:
jpayne@69 893 trace_score += row_trace_score
jpayne@69 894 if col_score_rint == best_score_rint:
jpayne@69 895 trace_score += col_trace_score
jpayne@69 896 trace_matrix[row][col] = trace_score
jpayne@69 897
jpayne@69 898 if not align_globally:
jpayne@69 899 best_score = local_max_score
jpayne@69 900
jpayne@69 901 return score_matrix, trace_matrix, best_score
jpayne@69 902
jpayne@69 903
jpayne@69 904 def _recover_alignments(
jpayne@69 905 sequenceA,
jpayne@69 906 sequenceB,
jpayne@69 907 starts,
jpayne@69 908 best_score,
jpayne@69 909 score_matrix,
jpayne@69 910 trace_matrix,
jpayne@69 911 align_globally,
jpayne@69 912 gap_char,
jpayne@69 913 one_alignment_only,
jpayne@69 914 gap_A_fn,
jpayne@69 915 gap_B_fn,
jpayne@69 916 reverse=False,
jpayne@69 917 ):
jpayne@69 918 """Do the backtracing and return a list of alignments (PRIVATE).
jpayne@69 919
jpayne@69 920 Recover the alignments by following the traceback matrix. This
jpayne@69 921 is a recursive procedure, but it's implemented here iteratively
jpayne@69 922 with a stack.
jpayne@69 923
jpayne@69 924 sequenceA and sequenceB may be sequences, including strings,
jpayne@69 925 lists, or list-like objects. In order to preserve the type of
jpayne@69 926 the object, we need to use slices on the sequences instead of
jpayne@69 927 indexes. For example, sequenceA[row] may return a type that's
jpayne@69 928 not compatible with sequenceA, e.g. if sequenceA is a list and
jpayne@69 929 sequenceA[row] is a string. Thus, avoid using indexes and use
jpayne@69 930 slices, e.g. sequenceA[row:row+1]. Assume that client-defined
jpayne@69 931 sequence classes preserve these semantics.
jpayne@69 932 """
jpayne@69 933 lenA, lenB = len(sequenceA), len(sequenceB)
jpayne@69 934 ali_seqA, ali_seqB = sequenceA[0:0], sequenceB[0:0]
jpayne@69 935 tracebacks = []
jpayne@69 936 in_process = []
jpayne@69 937
jpayne@69 938 for start in starts:
jpayne@69 939 score, (row, col) = start
jpayne@69 940 begin = 0
jpayne@69 941 if align_globally:
jpayne@69 942 end = None
jpayne@69 943 else:
jpayne@69 944 # If this start is a zero-extension: don't start here!
jpayne@69 945 if (score, (row - 1, col - 1)) in starts:
jpayne@69 946 continue
jpayne@69 947 # Local alignments should start with a positive score!
jpayne@69 948 if score <= 0:
jpayne@69 949 continue
jpayne@69 950 # Local alignments should not end with a gap!:
jpayne@69 951 trace = trace_matrix[row][col]
jpayne@69 952 if (trace - trace % 2) % 4 == 2: # Trace contains 'nogap', fine!
jpayne@69 953 trace_matrix[row][col] = 2
jpayne@69 954 # If not, don't start here!
jpayne@69 955 else:
jpayne@69 956 continue
jpayne@69 957 end = -max(lenA - row, lenB - col)
jpayne@69 958 if not end:
jpayne@69 959 end = None
jpayne@69 960 col_distance = lenB - col
jpayne@69 961 row_distance = lenA - row
jpayne@69 962
jpayne@69 963 # fmt: off
jpayne@69 964 ali_seqA = (
jpayne@69 965 (col_distance - row_distance) * gap_char
jpayne@69 966 + sequenceA[lenA - 1 : row - 1 : -1]
jpayne@69 967 )
jpayne@69 968 ali_seqB = (
jpayne@69 969 (row_distance - col_distance) * gap_char
jpayne@69 970 + sequenceB[lenB - 1 : col - 1 : -1]
jpayne@69 971 )
jpayne@69 972 # fmt: on
jpayne@69 973 in_process += [
jpayne@69 974 (ali_seqA, ali_seqB, end, row, col, False, trace_matrix[row][col])
jpayne@69 975 ]
jpayne@69 976 while in_process and len(tracebacks) < MAX_ALIGNMENTS:
jpayne@69 977 # Although we allow a gap in seqB to be followed by a gap in seqA,
jpayne@69 978 # we don't want to allow it the other way round, since this would
jpayne@69 979 # give redundant alignments of type: A- vs. -A
jpayne@69 980 # -B B-
jpayne@69 981 # Thus we need to keep track if a gap in seqA was opened (col_gap)
jpayne@69 982 # and stop the backtrace (dead_end) if a gap in seqB follows.
jpayne@69 983 #
jpayne@69 984 # Attention: This may fail, if the gap-penalties for both strands are
jpayne@69 985 # different. In this case the second alignment may be the only optimal
jpayne@69 986 # alignment. Thus it can happen that no alignment is returned. For
jpayne@69 987 # this case a workaround was implemented, which reverses the input and
jpayne@69 988 # the matrices (this happens in _reverse_matrices) and repeats the
jpayne@69 989 # backtrace. The variable 'reverse' keeps track of this.
jpayne@69 990 dead_end = False
jpayne@69 991 ali_seqA, ali_seqB, end, row, col, col_gap, trace = in_process.pop()
jpayne@69 992
jpayne@69 993 while (row > 0 or col > 0) and not dead_end:
jpayne@69 994 cache = (ali_seqA[:], ali_seqB[:], end, row, col, col_gap)
jpayne@69 995
jpayne@69 996 # If trace is empty we have reached at least one border of the
jpayne@69 997 # matrix or the end of a local alignment. Just add the rest of
jpayne@69 998 # the sequence(s) and fill with gaps if necessary.
jpayne@69 999 if not trace:
jpayne@69 1000 if col and col_gap:
jpayne@69 1001 dead_end = True
jpayne@69 1002 else:
jpayne@69 1003 ali_seqA, ali_seqB = _finish_backtrace(
jpayne@69 1004 sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char
jpayne@69 1005 )
jpayne@69 1006 break
jpayne@69 1007 elif trace % 2 == 1: # = row open = open gap in seqA
jpayne@69 1008 trace -= 1
jpayne@69 1009 if col_gap:
jpayne@69 1010 dead_end = True
jpayne@69 1011 else:
jpayne@69 1012 col -= 1
jpayne@69 1013 ali_seqA += gap_char
jpayne@69 1014 ali_seqB += sequenceB[col : col + 1]
jpayne@69 1015 col_gap = False
jpayne@69 1016 elif trace % 4 == 2: # = match/mismatch of seqA with seqB
jpayne@69 1017 trace -= 2
jpayne@69 1018 row -= 1
jpayne@69 1019 col -= 1
jpayne@69 1020 ali_seqA += sequenceA[row : row + 1]
jpayne@69 1021 ali_seqB += sequenceB[col : col + 1]
jpayne@69 1022 col_gap = False
jpayne@69 1023 elif trace % 8 == 4: # = col open = open gap in seqB
jpayne@69 1024 trace -= 4
jpayne@69 1025 row -= 1
jpayne@69 1026 ali_seqA += sequenceA[row : row + 1]
jpayne@69 1027 ali_seqB += gap_char
jpayne@69 1028 col_gap = True
jpayne@69 1029 elif trace in (8, 24): # = row extend = extend gap in seqA
jpayne@69 1030 trace -= 8
jpayne@69 1031 if col_gap:
jpayne@69 1032 dead_end = True
jpayne@69 1033 else:
jpayne@69 1034 col_gap = False
jpayne@69 1035 # We need to find the starting point of the extended gap
jpayne@69 1036 x = _find_gap_open(
jpayne@69 1037 sequenceA,
jpayne@69 1038 sequenceB,
jpayne@69 1039 ali_seqA,
jpayne@69 1040 ali_seqB,
jpayne@69 1041 end,
jpayne@69 1042 row,
jpayne@69 1043 col,
jpayne@69 1044 col_gap,
jpayne@69 1045 gap_char,
jpayne@69 1046 score_matrix,
jpayne@69 1047 trace_matrix,
jpayne@69 1048 in_process,
jpayne@69 1049 gap_A_fn,
jpayne@69 1050 col,
jpayne@69 1051 row,
jpayne@69 1052 "col",
jpayne@69 1053 best_score,
jpayne@69 1054 align_globally,
jpayne@69 1055 )
jpayne@69 1056 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@69 1057 elif trace == 16: # = col extend = extend gap in seqB
jpayne@69 1058 trace -= 16
jpayne@69 1059 col_gap = True
jpayne@69 1060 x = _find_gap_open(
jpayne@69 1061 sequenceA,
jpayne@69 1062 sequenceB,
jpayne@69 1063 ali_seqA,
jpayne@69 1064 ali_seqB,
jpayne@69 1065 end,
jpayne@69 1066 row,
jpayne@69 1067 col,
jpayne@69 1068 col_gap,
jpayne@69 1069 gap_char,
jpayne@69 1070 score_matrix,
jpayne@69 1071 trace_matrix,
jpayne@69 1072 in_process,
jpayne@69 1073 gap_B_fn,
jpayne@69 1074 row,
jpayne@69 1075 col,
jpayne@69 1076 "row",
jpayne@69 1077 best_score,
jpayne@69 1078 align_globally,
jpayne@69 1079 )
jpayne@69 1080 ali_seqA, ali_seqB, row, col, in_process, dead_end = x
jpayne@69 1081
jpayne@69 1082 if trace: # There is another path to follow...
jpayne@69 1083 cache += (trace,)
jpayne@69 1084 in_process.append(cache)
jpayne@69 1085 trace = trace_matrix[row][col]
jpayne@69 1086 if not align_globally:
jpayne@69 1087 if score_matrix[row][col] == best_score:
jpayne@69 1088 # We have gone through a 'zero-score' extension, discard it
jpayne@69 1089 dead_end = True
jpayne@69 1090 elif score_matrix[row][col] <= 0:
jpayne@69 1091 # We have reached the end of the backtrace
jpayne@69 1092 begin = max(row, col)
jpayne@69 1093 trace = 0
jpayne@69 1094 if not dead_end:
jpayne@69 1095 if not reverse:
jpayne@69 1096 tracebacks.append((ali_seqA[::-1], ali_seqB[::-1], score, begin, end))
jpayne@69 1097 else:
jpayne@69 1098 tracebacks.append((ali_seqB[::-1], ali_seqA[::-1], score, begin, end))
jpayne@69 1099 if one_alignment_only:
jpayne@69 1100 break
jpayne@69 1101 return _clean_alignments(tracebacks)
jpayne@69 1102
jpayne@69 1103
jpayne@69 1104 def _find_start(score_matrix, best_score, align_globally):
jpayne@69 1105 """Return a list of starting points (score, (row, col)) (PRIVATE).
jpayne@69 1106
jpayne@69 1107 Indicating every possible place to start the tracebacks.
jpayne@69 1108 """
jpayne@69 1109 nrows, ncols = len(score_matrix), len(score_matrix[0])
jpayne@69 1110 # In this implementation of the global algorithm, the start will always be
jpayne@69 1111 # the bottom right corner of the matrix.
jpayne@69 1112 if align_globally:
jpayne@69 1113 starts = [(best_score, (nrows - 1, ncols - 1))]
jpayne@69 1114 else:
jpayne@69 1115 # For local alignments, there may be many different start points.
jpayne@69 1116 starts = []
jpayne@69 1117 tolerance = 0 # XXX do anything with this?
jpayne@69 1118 # Now find all the positions within some tolerance of the best
jpayne@69 1119 # score.
jpayne@69 1120 for row in range(nrows):
jpayne@69 1121 for col in range(ncols):
jpayne@69 1122 score = score_matrix[row][col]
jpayne@69 1123 if rint(abs(score - best_score)) <= rint(tolerance):
jpayne@69 1124 starts.append((score, (row, col)))
jpayne@69 1125 return starts
jpayne@69 1126
jpayne@69 1127
jpayne@69 1128 def _reverse_matrices(score_matrix, trace_matrix):
jpayne@69 1129 """Reverse score and trace matrices (PRIVATE)."""
jpayne@69 1130 reverse_score_matrix = []
jpayne@69 1131 reverse_trace_matrix = []
jpayne@69 1132 # fmt: off
jpayne@69 1133 reverse_trace = {
jpayne@69 1134 1: 4, 2: 2, 3: 6, 4: 1, 5: 5, 6: 3, 7: 7, 8: 16, 9: 20, 10: 18, 11: 22, 12: 17,
jpayne@69 1135 13: 21, 14: 19, 15: 23, 16: 8, 17: 12, 18: 10, 19: 14, 20: 9, 21: 13, 22: 11,
jpayne@69 1136 23: 15, 24: 24, 25: 28, 26: 26, 27: 30, 28: 25, 29: 29, 30: 27, 31: 31,
jpayne@69 1137 None: None,
jpayne@69 1138 }
jpayne@69 1139 # fmt: on
jpayne@69 1140 for col in range(len(score_matrix[0])):
jpayne@69 1141 new_score_row = []
jpayne@69 1142 new_trace_row = []
jpayne@69 1143 for row in range(len(score_matrix)):
jpayne@69 1144 new_score_row.append(score_matrix[row][col])
jpayne@69 1145 new_trace_row.append(reverse_trace[trace_matrix[row][col]])
jpayne@69 1146 reverse_score_matrix.append(new_score_row)
jpayne@69 1147 reverse_trace_matrix.append(new_trace_row)
jpayne@69 1148 return reverse_score_matrix, reverse_trace_matrix
jpayne@69 1149
jpayne@69 1150
jpayne@69 1151 def _clean_alignments(alignments):
jpayne@69 1152 """Take a list of alignments and return a cleaned version (PRIVATE).
jpayne@69 1153
jpayne@69 1154 Remove duplicates, make sure begin and end are set correctly, remove
jpayne@69 1155 empty alignments.
jpayne@69 1156 """
jpayne@69 1157 unique_alignments = []
jpayne@69 1158 for align in alignments:
jpayne@69 1159 if align not in unique_alignments:
jpayne@69 1160 unique_alignments.append(align)
jpayne@69 1161 i = 0
jpayne@69 1162 while i < len(unique_alignments):
jpayne@69 1163 seqA, seqB, score, begin, end = unique_alignments[i]
jpayne@69 1164 # Make sure end is set reasonably.
jpayne@69 1165 if end is None: # global alignment
jpayne@69 1166 end = len(seqA)
jpayne@69 1167 elif end < 0:
jpayne@69 1168 end = end + len(seqA)
jpayne@69 1169 # If there's no alignment here, get rid of it.
jpayne@69 1170 if begin >= end:
jpayne@69 1171 del unique_alignments[i]
jpayne@69 1172 continue
jpayne@69 1173 unique_alignments[i] = Alignment(seqA, seqB, score, begin, end)
jpayne@69 1174 i += 1
jpayne@69 1175 return unique_alignments
jpayne@69 1176
jpayne@69 1177
jpayne@69 1178 def _finish_backtrace(sequenceA, sequenceB, ali_seqA, ali_seqB, row, col, gap_char):
jpayne@69 1179 """Add remaining sequences and fill with gaps if necessary (PRIVATE)."""
jpayne@69 1180 if row:
jpayne@69 1181 ali_seqA += sequenceA[row - 1 :: -1]
jpayne@69 1182 if col:
jpayne@69 1183 ali_seqB += sequenceB[col - 1 :: -1]
jpayne@69 1184 if row > col:
jpayne@69 1185 ali_seqB += gap_char * (len(ali_seqA) - len(ali_seqB))
jpayne@69 1186 elif col > row:
jpayne@69 1187 ali_seqA += gap_char * (len(ali_seqB) - len(ali_seqA))
jpayne@69 1188 return ali_seqA, ali_seqB
jpayne@69 1189
jpayne@69 1190
jpayne@69 1191 def _find_gap_open(
jpayne@69 1192 sequenceA,
jpayne@69 1193 sequenceB,
jpayne@69 1194 ali_seqA,
jpayne@69 1195 ali_seqB,
jpayne@69 1196 end,
jpayne@69 1197 row,
jpayne@69 1198 col,
jpayne@69 1199 col_gap,
jpayne@69 1200 gap_char,
jpayne@69 1201 score_matrix,
jpayne@69 1202 trace_matrix,
jpayne@69 1203 in_process,
jpayne@69 1204 gap_fn,
jpayne@69 1205 target,
jpayne@69 1206 index,
jpayne@69 1207 direction,
jpayne@69 1208 best_score,
jpayne@69 1209 align_globally,
jpayne@69 1210 ):
jpayne@69 1211 """Find the starting point(s) of the extended gap (PRIVATE)."""
jpayne@69 1212 dead_end = False
jpayne@69 1213 target_score = score_matrix[row][col]
jpayne@69 1214 for n in range(target):
jpayne@69 1215 if direction == "col":
jpayne@69 1216 col -= 1
jpayne@69 1217 ali_seqA += gap_char
jpayne@69 1218 ali_seqB += sequenceB[col : col + 1]
jpayne@69 1219 else:
jpayne@69 1220 row -= 1
jpayne@69 1221 ali_seqA += sequenceA[row : row + 1]
jpayne@69 1222 ali_seqB += gap_char
jpayne@69 1223 actual_score = score_matrix[row][col] + gap_fn(index, n + 1)
jpayne@69 1224 if not align_globally and score_matrix[row][col] == best_score:
jpayne@69 1225 # We have run through a 'zero-score' extension and discard it
jpayne@69 1226 dead_end = True
jpayne@69 1227 break
jpayne@69 1228 if rint(actual_score) == rint(target_score) and n > 0:
jpayne@69 1229 if not trace_matrix[row][col]:
jpayne@69 1230 break
jpayne@69 1231 else:
jpayne@69 1232 in_process.append(
jpayne@69 1233 (
jpayne@69 1234 ali_seqA[:],
jpayne@69 1235 ali_seqB[:],
jpayne@69 1236 end,
jpayne@69 1237 row,
jpayne@69 1238 col,
jpayne@69 1239 col_gap,
jpayne@69 1240 trace_matrix[row][col],
jpayne@69 1241 )
jpayne@69 1242 )
jpayne@69 1243 if not trace_matrix[row][col]:
jpayne@69 1244 dead_end = True
jpayne@69 1245 return ali_seqA, ali_seqB, row, col, in_process, dead_end
jpayne@69 1246
jpayne@69 1247
jpayne@69 1248 _PRECISION = 1000
jpayne@69 1249
jpayne@69 1250
jpayne@69 1251 def rint(x, precision=_PRECISION):
jpayne@69 1252 """Print number with declared precision."""
jpayne@69 1253 return int(x * precision + 0.5)
jpayne@69 1254
jpayne@69 1255
jpayne@69 1256 class identity_match:
jpayne@69 1257 """Create a match function for use in an alignment.
jpayne@69 1258
jpayne@69 1259 match and mismatch are the scores to give when two residues are equal
jpayne@69 1260 or unequal. By default, match is 1 and mismatch is 0.
jpayne@69 1261 """
jpayne@69 1262
jpayne@69 1263 def __init__(self, match=1, mismatch=0):
jpayne@69 1264 """Initialize the class."""
jpayne@69 1265 self.match = match
jpayne@69 1266 self.mismatch = mismatch
jpayne@69 1267
jpayne@69 1268 def __call__(self, charA, charB):
jpayne@69 1269 """Call a match function instance already created."""
jpayne@69 1270 if charA == charB:
jpayne@69 1271 return self.match
jpayne@69 1272 return self.mismatch
jpayne@69 1273
jpayne@69 1274
jpayne@69 1275 class dictionary_match:
jpayne@69 1276 """Create a match function for use in an alignment.
jpayne@69 1277
jpayne@69 1278 Attributes:
jpayne@69 1279 - score_dict - A dictionary where the keys are tuples (residue 1,
jpayne@69 1280 residue 2) and the values are the match scores between those residues.
jpayne@69 1281 - symmetric - A flag that indicates whether the scores are symmetric.
jpayne@69 1282
jpayne@69 1283 """
jpayne@69 1284
jpayne@69 1285 def __init__(self, score_dict, symmetric=1):
jpayne@69 1286 """Initialize the class."""
jpayne@69 1287 if isinstance(score_dict, substitution_matrices.Array):
jpayne@69 1288 score_dict = dict(score_dict) # Access to dict is much faster
jpayne@69 1289 self.score_dict = score_dict
jpayne@69 1290 self.symmetric = symmetric
jpayne@69 1291
jpayne@69 1292 def __call__(self, charA, charB):
jpayne@69 1293 """Call a dictionary match instance already created."""
jpayne@69 1294 if self.symmetric and (charA, charB) not in self.score_dict:
jpayne@69 1295 # If the score dictionary is symmetric, then look up the
jpayne@69 1296 # score both ways.
jpayne@69 1297 charB, charA = charA, charB
jpayne@69 1298 return self.score_dict[(charA, charB)]
jpayne@69 1299
jpayne@69 1300
jpayne@69 1301 class affine_penalty:
jpayne@69 1302 """Create a gap function for use in an alignment."""
jpayne@69 1303
jpayne@69 1304 def __init__(self, open, extend, penalize_extend_when_opening=0):
jpayne@69 1305 """Initialize the class."""
jpayne@69 1306 if open > 0 or extend > 0:
jpayne@69 1307 raise ValueError("Gap penalties should be non-positive.")
jpayne@69 1308 if not penalize_extend_when_opening and (extend < open):
jpayne@69 1309 raise ValueError(
jpayne@69 1310 "Gap opening penalty should be higher than "
jpayne@69 1311 "gap extension penalty (or equal)"
jpayne@69 1312 )
jpayne@69 1313 self.open, self.extend = open, extend
jpayne@69 1314 self.penalize_extend_when_opening = penalize_extend_when_opening
jpayne@69 1315
jpayne@69 1316 def __call__(self, index, length):
jpayne@69 1317 """Call a gap function instance already created."""
jpayne@69 1318 return calc_affine_penalty(
jpayne@69 1319 length, self.open, self.extend, self.penalize_extend_when_opening
jpayne@69 1320 )
jpayne@69 1321
jpayne@69 1322
jpayne@69 1323 def calc_affine_penalty(length, open, extend, penalize_extend_when_opening):
jpayne@69 1324 """Calculate a penalty score for the gap function."""
jpayne@69 1325 if length <= 0:
jpayne@69 1326 return 0.0
jpayne@69 1327 penalty = open + extend * length
jpayne@69 1328 if not penalize_extend_when_opening:
jpayne@69 1329 penalty -= extend
jpayne@69 1330 return penalty
jpayne@69 1331
jpayne@69 1332
jpayne@69 1333 def print_matrix(matrix):
jpayne@69 1334 """Print out a matrix for debugging purposes."""
jpayne@69 1335 # Transpose the matrix and get the length of the values in each column.
jpayne@69 1336 matrixT = [[] for x in range(len(matrix[0]))]
jpayne@69 1337 for i in range(len(matrix)):
jpayne@69 1338 for j in range(len(matrix[i])):
jpayne@69 1339 matrixT[j].append(len(str(matrix[i][j])))
jpayne@69 1340 ndigits = [max(x) for x in matrixT]
jpayne@69 1341 for i in range(len(matrix)):
jpayne@69 1342 # Using string formatting trick to add leading spaces,
jpayne@69 1343 print(
jpayne@69 1344 " ".join("%*s " % (ndigits[j], matrix[i][j]) for j in range(len(matrix[i])))
jpayne@69 1345 )
jpayne@69 1346
jpayne@69 1347
jpayne@69 1348 def format_alignment(align1, align2, score, begin, end, full_sequences=False):
jpayne@69 1349 """Format the alignment prettily into a string.
jpayne@69 1350
jpayne@69 1351 IMPORTANT: Gap symbol must be "-" (or ['-'] for lists)!
jpayne@69 1352
jpayne@69 1353 Since Biopython 1.71 identical matches are shown with a pipe
jpayne@69 1354 character, mismatches as a dot, and gaps as a space.
jpayne@69 1355
jpayne@69 1356 Prior releases just used the pipe character to indicate the
jpayne@69 1357 aligned region (matches, mismatches and gaps).
jpayne@69 1358
jpayne@69 1359 Also, in local alignments, if the alignment does not include
jpayne@69 1360 the whole sequences, now only the aligned part is shown,
jpayne@69 1361 together with the start positions of the aligned subsequences.
jpayne@69 1362 The start positions are 1-based; so start position n is the
jpayne@69 1363 n-th base/amino acid in the *un-aligned* sequence.
jpayne@69 1364
jpayne@69 1365 NOTE: This is different to the alignment's begin/end values,
jpayne@69 1366 which give the Python indices (0-based) of the bases/amino acids
jpayne@69 1367 in the *aligned* sequences.
jpayne@69 1368
jpayne@69 1369 If you want to restore the 'historic' behaviour, that means
jpayne@69 1370 displaying the whole sequences (including the non-aligned parts),
jpayne@69 1371 use ``full_sequences=True``. In this case, the non-aligned leading
jpayne@69 1372 and trailing parts are also indicated by spaces in the match-line.
jpayne@69 1373 """
jpayne@69 1374 align_begin = begin
jpayne@69 1375 align_end = end
jpayne@69 1376 start1 = start2 = ""
jpayne@69 1377 start_m = begin # Begin of match line (how many spaces to include)
jpayne@69 1378 # For local alignments:
jpayne@69 1379 if not full_sequences and (begin != 0 or end != len(align1)):
jpayne@69 1380 # Calculate the actual start positions in the un-aligned sequences
jpayne@69 1381 # This will only work if the gap symbol is '-' or ['-']!
jpayne@69 1382 start1 = str(len(align1[:begin]) - align1[:begin].count("-") + 1) + " "
jpayne@69 1383 start2 = str(len(align2[:begin]) - align2[:begin].count("-") + 1) + " "
jpayne@69 1384 start_m = max(len(start1), len(start2))
jpayne@69 1385 elif full_sequences:
jpayne@69 1386 start_m = 0
jpayne@69 1387 begin = 0
jpayne@69 1388 end = len(align1)
jpayne@69 1389
jpayne@69 1390 if isinstance(align1, list):
jpayne@69 1391 # List elements will be separated by spaces, since they can be
jpayne@69 1392 # of different lengths
jpayne@69 1393 align1 = [a + " " for a in align1]
jpayne@69 1394 align2 = [a + " " for a in align2]
jpayne@69 1395
jpayne@69 1396 s1_line = ["{:>{width}}".format(start1, width=start_m)] # seq1 line
jpayne@69 1397 m_line = [" " * start_m] # match line
jpayne@69 1398 s2_line = ["{:>{width}}".format(start2, width=start_m)] # seq2 line
jpayne@69 1399
jpayne@69 1400 for n, (a, b) in enumerate(zip(align1[begin:end], align2[begin:end])):
jpayne@69 1401 # Since list elements can be of different length, we center them,
jpayne@69 1402 # using the maximum length of the two compared elements as width
jpayne@69 1403 m_len = max(len(a), len(b))
jpayne@69 1404 s1_line.append("{:^{width}}".format(a, width=m_len))
jpayne@69 1405 s2_line.append("{:^{width}}".format(b, width=m_len))
jpayne@69 1406 if full_sequences and (n < align_begin or n >= align_end):
jpayne@69 1407 m_line.append("{:^{width}}".format(" ", width=m_len)) # space
jpayne@69 1408 continue
jpayne@69 1409 if a == b:
jpayne@69 1410 m_line.append("{:^{width}}".format("|", width=m_len)) # match
jpayne@69 1411 elif a.strip() == "-" or b.strip() == "-":
jpayne@69 1412 m_line.append("{:^{width}}".format(" ", width=m_len)) # gap
jpayne@69 1413 else:
jpayne@69 1414 m_line.append("{:^{width}}".format(".", width=m_len)) # mismatch
jpayne@69 1415
jpayne@69 1416 s2_line.append(f"\n Score={score:g}\n")
jpayne@69 1417 return "\n".join(["".join(s1_line), "".join(m_line), "".join(s2_line)])
jpayne@69 1418
jpayne@69 1419
jpayne@69 1420 # Try and load C implementations of functions. If I can't,
jpayne@69 1421 # then throw a warning and use the pure Python implementations.
jpayne@69 1422 # The redefinition is deliberate, thus the no quality assurance
jpayne@69 1423 # flag for when using flake8.
jpayne@69 1424 # Before, we secure access to the pure Python functions (for testing purposes):
jpayne@69 1425
jpayne@69 1426 _python_make_score_matrix_fast = _make_score_matrix_fast
jpayne@69 1427 _python_rint = rint
jpayne@69 1428
jpayne@69 1429 try:
jpayne@69 1430 from .cpairwise2 import rint, _make_score_matrix_fast # noqa
jpayne@69 1431 except ImportError:
jpayne@69 1432 warnings.warn(
jpayne@69 1433 "Import of C module failed. Falling back to pure Python "
jpayne@69 1434 "implementation. This may be slooow...",
jpayne@69 1435 BiopythonWarning,
jpayne@69 1436 )
jpayne@69 1437
jpayne@69 1438 if __name__ == "__main__":
jpayne@69 1439 from Bio._utils import run_doctest
jpayne@69 1440
jpayne@69 1441 run_doctest()