Package Bio :: Package SubsMat
[hide private]
[frames] | no frames]

Source Code for Package Bio.SubsMat

  1  # Copyright 2000-2009 by Iddo Friedberg.  All rights reserved. 
  2  # This code is part of the Biopython distribution and governed by its 
  3  # license.  Please see the LICENSE file that should have been included 
  4  # as part of this package. 
  5  # 
  6  # Iddo Friedberg idoerg@cc.huji.ac.il 
  7   
  8  """Substitution matrices, log odds matrices, and operations on them. 
  9   
 10  General: 
 11  ------- 
 12   
 13  This module provides a class and a few routines for generating 
 14  substitution matrices, similar ot BLOSUM or PAM matrices, but based on 
 15  user-provided data. 
 16  The class used for these matrices is SeqMat 
 17   
 18  Matrices are implemented as a dictionary. Each index contains a 2-tuple, 
 19  which are the two residue/nucleotide types replaced. The value differs 
 20  according to the matrix's purpose: e.g in a log-odds frequency matrix, the 
 21  value would be log(Pij/(Pi*Pj)) where: 
 22  Pij: frequency of substitution of letter (residue/nucletide) i by j  
 23  Pi, Pj: expected frequencies of i and j, respectively. 
 24   
 25  Usage: 
 26  ----- 
 27  The following section is layed out in the order by which most people wish 
 28  to generate a log-odds matrix. Of course, interim matrices can be 
 29  generated and investigated. Most people just want a log-odds matrix, 
 30  that's all. 
 31   
 32  Generating an Accepted Replacement Matrix: 
 33  ----------------------------------------- 
 34  Initially, you should generate an accepted replacement matrix (ARM) 
 35  from your data. The values in ARM are the _counted_ number of 
 36  replacements according to your data. The data could be a set of pairs 
 37  or multiple alignments. So for instance if Alanine was replaced by 
 38  Cysteine 10 times, and Cysteine by Alanine 12 times, the corresponding 
 39  ARM entries would be: 
 40  ['A','C']: 10, 
 41  ['C','A'] 12 
 42  As order doesn't matter, user can already provide only one entry: 
 43  ['A','C']: 22  
 44  A SeqMat instance may be initialized with either a full (first 
 45  method of counting: 10, 12) or half (the latter method, 22) matrices. A 
 46  Full protein alphabet matrix would be of the size 20x20 = 400. A Half 
 47  matrix of that alphabet would be 20x20/2 + 20/2 = 210. That is because 
 48  same-letter entries don't change. (The matrix diagonal). Given an 
 49  alphabet size of N: 
 50  Full matrix size:N*N 
 51  Half matrix size: N(N+1)/2 
 52    
 53  If you provide a full matrix, the constructore will create a half-matrix 
 54  automatically. 
 55  If you provide a half-matrix, make sure of a (low, high) sorted order in 
 56  the keys: there should only be  
 57  a ('A','C') not a ('C','A'). 
 58   
 59  Internal functions: 
 60   
 61  Generating the observed frequency matrix (OFM): 
 62  ---------------------------------------------- 
 63  Use: OFM = _build_obs_freq_mat(ARM) 
 64  The OFM is generated from the ARM, only instead of replacement counts, it 
 65  contains replacement frequencies. 
 66   
 67  Generating an expected frequency matrix (EFM): 
 68  --------------------------------------------- 
 69  Use: EFM = _build_exp_freq_mat(OFM,exp_freq_table) 
 70  exp_freq_table: should be a freqTableC instantiation. See freqTable.py for 
 71  detailed information. Briefly, the expected frequency table has the 
 72  frequencies of appearance for each member of the alphabet 
 73   
 74  Generating a substitution frequency matrix (SFM): 
 75  ------------------------------------------------ 
 76  Use: SFM = _build_subs_mat(OFM,EFM) 
 77  Accepts an OFM, EFM. Provides the division product of the corresponding 
 78  values.  
 79   
 80  Generating a log-odds matrix (LOM): 
 81  ---------------------------------- 
 82  Use: LOM=_build_log_odds_mat(SFM[,logbase=10,factor=10.0,roundit=1]) 
 83  Accepts an SFM. logbase: base of the logarithm used to generate the 
 84  log-odds values. factor: factor used to multiply the log-odds values. 
 85  roundit: default - true. Whether to round the values. 
 86  Each entry is generated by log(LOM[key])*factor 
 87  And rounded if required. 
 88   
 89  External: 
 90  --------- 
 91  In most cases, users will want to generate a log-odds matrix only, without 
 92  explicitly calling the OFM --> EFM --> SFM stages. The function 
 93  build_log_odds_matrix does that. User provides an ARM and an expected 
 94  frequency table. The function returns the log-odds matrix. 
 95   
 96  Methods for subtraction, addition and multiplication of matrices: 
 97  ----------------------------------------------------------------- 
 98  * Generation of an expected frequency table from an observed frequency 
 99    matrix. 
100  * Calculation of linear correlation coefficient between two matrices. 
101  * Calculation of relative entropy is now done using the 
102    _make_relative_entropy method and is stored in the member 
103    self.relative_entropy 
104  * Calculation of entropy is now done using the _make_entropy method and 
105    is stored in the member self.entropy. 
106  * Jensen-Shannon distance between the distributions from which the 
107    matrices are derived. This is a distance function based on the 
108    distribution's entropies. 
109  """ 
110   
111   
112  import re 
113  import sys 
114  import copy 
115  import math 
116  import warnings 
117   
118  # BioPython imports 
119  from Bio import Alphabet 
120  from Bio.SubsMat import FreqTable 
121   
122  log = math.log 
123  # Matrix types 
124  NOTYPE = 0 
125  ACCREP = 1 
126  OBSFREQ = 2 
127  SUBS = 3 
128  EXPFREQ = 4 
129  LO = 5 
130  EPSILON = 0.00000000000001 
131   
132   
133 -class SeqMat(dict):
134 """A Generic sequence matrix class 135 The key is a 2-tuple containing the letter indices of the matrix. Those 136 should be sorted in the tuple (low, high). Because each matrix is dealt 137 with as a half-matrix.""" 138
139 - def _alphabet_from_matrix(self):
140 ab_dict = {} 141 s = '' 142 for i in self.keys(): 143 ab_dict[i[0]] = 1 144 ab_dict[i[1]] = 1 145 letters_list = ab_dict.keys() 146 letters_list.sort() 147 for i in letters_list: 148 s = s + i 149 self.alphabet.letters = s
150
151 - def __init__(self,data=None, alphabet=None, 152 mat_type=None,mat_name='',build_later=0):
153 # User may supply: 154 # data: matrix itself 155 # mat_type: its type. See below (DEPRECATED) 156 # mat_name: its name. See below. 157 # alphabet: an instance of Bio.Alphabet, or a subclass. If not 158 # supplied, constructor builds its own from that matrix.""" 159 # build_later: skip the matrix size assertion. User will build the 160 # matrix after creating the instance. Constructor builds a half matrix 161 # filled with zeroes. 162 163 assert type(mat_name) == type('') 164 if mat_type==None: 165 mat_type==NOTYPE 166 else: 167 assert type(mat_type) == type(1) 168 warnings.warn("values for mat_type other than NOTYPE are deprecated; use the appropriate subclass of SeqMat instead", DeprecationWarning) 169 170 # "data" may be: 171 # 1) None --> then self.data is an empty dictionary 172 # 2) type({}) --> then self takes the items in data 173 # 3) An instance of SeqMat 174 # This whole creation-during-execution is done to avoid changing 175 # default values, the way Python does because default values are 176 # created when the function is defined, not when it is created. 177 if data: 178 try: 179 self.update(data) 180 except ValueError: 181 raise ValueError, "Failed to store data in a dictionary" 182 if alphabet == None: 183 alphabet = Alphabet.Alphabet() 184 assert Alphabet.generic_alphabet.contains(alphabet) 185 self.alphabet = alphabet 186 187 # If passed alphabet is empty, use the letters in the matrix itself 188 if not self.alphabet.letters: 189 self._alphabet_from_matrix() 190 # Assert matrix size: half or full 191 if not build_later: 192 N = len(self.alphabet.letters) 193 assert len(self) == N**2 or len(self) == N*(N+1)/2 194 self.ab_list = list(self.alphabet.letters) 195 self.ab_list.sort() 196 # type can be: ACCREP, OBSFREQ, SUBS, EXPFREQ, LO 197 self.mat_type = mat_type 198 # Names: a string like "BLOSUM62" or "PAM250" 199 self.mat_name = mat_name 200 if build_later: 201 self._init_zero() 202 else: 203 # Convert full to half if matrix is not already a log-odds matrix 204 if self.mat_type != LO: 205 self._full_to_half() 206 self._correct_matrix() 207 self.sum_letters = {} 208 self.relative_entropy = 0
209
210 - def _correct_matrix(self):
211 keylist = self.keys() 212 for key in keylist: 213 if key[0] > key[1]: 214 self[(key[1],key[0])] = self[key] 215 del self[key]
216
217 - def _full_to_half(self):
218 """ 219 Convert a full-matrix to a half-matrix 220 """ 221 # For instance: two entries ('A','C'):13 and ('C','A'):20 will be summed 222 # into ('A','C'): 33 and the index ('C','A') will be deleted 223 # alphabet.letters:('A','A') and ('C','C') will remain the same. 224 225 N = len(self.alphabet.letters) 226 # Do nothing if this is already a half-matrix 227 if len(self) == N*(N+1)/2: 228 return 229 for i in self.ab_list: 230 for j in self.ab_list[:self.ab_list.index(i)+1]: 231 if i != j: 232 self[j,i] = self[j,i] + self[i,j] 233 del self[i,j]
234
235 - def _init_zero(self):
236 for i in self.ab_list: 237 for j in self.ab_list[:self.ab_list.index(i)+1]: 238 self[j,i] = 0.
239
240 - def make_relative_entropy(self,obs_freq_mat):
241 """if this matrix is a log-odds matrix, return its entropy 242 Needs the observed frequency matrix for that""" 243 # This method should be removed once the usage of mat_type is removed. 244 ent = 0. 245 if self.mat_type == LO: 246 for i in self.keys(): 247 ent += obs_freq_mat[i]*self[i]/log(2) 248 elif self.mat_type == SUBS: 249 for i in self.keys(): 250 if self[i] > EPSILON: 251 ent += obs_freq_mat[i]*log(self[i])/log(2) 252 else: 253 raise TypeError("entropy: substitution or log-odds matrices only") 254 self.relative_entropy = ent
255 #
256 - def make_entropy(self):
257 self.entropy = 0 258 for i in self.keys(): 259 if self[i] > EPSILON: 260 self.entropy += self[i]*log(self[i])/log(2) 261 self.entropy = -self.entropy
262
263 - def letter_sum(self,letter):
264 warnings.warn("SeqMat.letter_sum is deprecated; please use SeqMat.sum instead", DeprecationWarning) 265 assert letter in self.alphabet.letters 266 sum = 0. 267 for i in self.keys(): 268 if letter in i: 269 if i[0] == i[1]: 270 sum += self[i] 271 else: 272 sum += (self[i] / 2.) 273 return sum
274
275 - def all_letters_sum(self):
276 import warnings 277 warnings.warn("SeqMat.all_letters_sum is deprecated; please use SeqMat.sum instead", DeprecationWarning) 278 for letter in self.alphabet.letters: 279 self.sum_letters[letter] = self.letter_sum(letter)
280
281 - def sum(self):
282 result = {} 283 for letter in self.alphabet.letters: 284 result[letter] = 0.0 285 for pair, value in self.iteritems(): 286 i1, i2 = pair 287 if i1==i2: 288 result[i1] += value 289 else: 290 result[i1] += value / 2 291 result[i2] += value / 2 292 return result
293
294 - def print_full_mat(self,f=None,format="%4d",topformat="%4s", 295 alphabet=None,factor=1,non_sym=None):
296 f = f or sys.stdout 297 # create a temporary dictionary, which holds the full matrix for 298 # printing 299 assert non_sym == None or type(non_sym) == type(1.) or \ 300 type(non_sym) == type(1) 301 full_mat = copy.copy(self) 302 for i in self: 303 if i[0] != i[1]: 304 full_mat[(i[1],i[0])] = full_mat[i] 305 if not alphabet: 306 alphabet = self.ab_list 307 topline = '' 308 for i in alphabet: 309 topline = topline + topformat % i 310 topline = topline + '\n' 311 f.write(topline) 312 for i in alphabet: 313 outline = i 314 for j in alphabet: 315 if alphabet.index(j) > alphabet.index(i) and non_sym is not None: 316 val = non_sym 317 else: 318 val = full_mat[i,j] 319 val *= factor 320 if val <= -999: 321 cur_str = ' ND' 322 else: 323 cur_str = format % val 324 325 outline = outline+cur_str 326 outline = outline+'\n' 327 f.write(outline)
328
329 - def print_mat(self,f=None,format="%4d",bottomformat="%4s", 330 alphabet=None,factor=1):
331 """Print a nice half-matrix. f=sys.stdout to see on the screen 332 User may pass own alphabet, which should contain all letters in the 333 alphabet of the matrix, but may be in a different order. This 334 order will be the order of the letters on the axes""" 335 336 f = f or sys.stdout 337 if not alphabet: 338 alphabet = self.ab_list 339 bottomline = '' 340 for i in alphabet: 341 bottomline = bottomline + bottomformat % i 342 bottomline = bottomline + '\n' 343 for i in alphabet: 344 outline = i 345 for j in alphabet[:alphabet.index(i)+1]: 346 try: 347 val = self[j,i] 348 except KeyError: 349 val = self[i,j] 350 val *= factor 351 if val == -999: 352 cur_str = ' ND' 353 else: 354 cur_str = format % val 355 356 outline = outline+cur_str 357 outline = outline+'\n' 358 f.write(outline) 359 f.write(bottomline)
360
361 - def __str__(self):
362 """Print a nice half-matrix.""" 363 output = "" 364 alphabet = self.ab_list 365 n = len(alphabet) 366 for i in range(n): 367 c1 = alphabet[i] 368 output += c1 369 for j in range(i+1): 370 c2 = alphabet[j] 371 try: 372 val = self[c2,c1] 373 except KeyError: 374 val = self[c1,c2] 375 if val == -999: 376 output += ' ND' 377 else: 378 output += "%4d" % val 379 output += '\n' 380 output += '%4s' * n % tuple(alphabet) + "\n" 381 return output
382
383 - def __sub__(self,other):
384 """ returns a number which is the subtraction product of the two matrices""" 385 mat_diff = 0 386 for i in self.keys(): 387 mat_diff += (self[i] - other[i]) 388 return mat_diff
389
390 - def __mul__(self,other):
391 """ returns a matrix for which each entry is the multiplication product of the 392 two matrices passed""" 393 new_mat = copy.copy(self) 394 for i in self.keys(): 395 new_mat[i] *= other[i] 396 return new_mat
397
398 - def __add__(self, other):
399 new_mat = copy.copy(self) 400 for i in self.keys(): 401 new_mat[i] += other[i] 402 return new_mat
403
404 -class AcceptedReplacementsMatrix(SeqMat):
405 """Accepted replacements matrix"""
406
407 -class ObservedFrequencyMatrix(SeqMat):
408 """Observed frequency matrix"""
409
410 -class ExpectedFrequencyMatrix(SeqMat):
411 """Expected frequency matrix"""
412 413
414 -class SubstitutionMatrix(SeqMat):
415 """Substitution matrix""" 416
417 - def calculate_relative_entropy(self, obs_freq_mat):
418 """Calculate and return the relative entropy with respect to an 419 observed frequency matrix""" 420 relative_entropy = 0. 421 for key, value in self.iteritems(): 422 if value > EPSILON: 423 relative_entropy += obs_freq_mat[key]*log(value) 424 relative_entropy /= log(2) 425 return relative_entropy
426 427
428 -class LogOddsMatrix(SeqMat):
429 """Log odds matrix""" 430
431 - def calculate_relative_entropy(self,obs_freq_mat):
432 """Calculate and return the relative entropy with respect to an 433 observed frequency matrix""" 434 relative_entropy = 0. 435 for key, value in self.iteritems(): 436 relative_entropy += obs_freq_mat[key]*value/log(2) 437 return relative_entropy
438 439
440 -def _build_obs_freq_mat(acc_rep_mat):
441 """ 442 build_obs_freq_mat(acc_rep_mat): 443 Build the observed frequency matrix, from an accepted replacements matrix 444 The acc_rep_mat matrix should be generated by the user. 445 """ 446 # Note: acc_rep_mat should already be a half_matrix!! 447 total = float(sum(acc_rep_mat.values())) 448 obs_freq_mat = ObservedFrequencyMatrix(alphabet=acc_rep_mat.alphabet, 449 build_later=1) 450 for i in acc_rep_mat: 451 obs_freq_mat[i] = acc_rep_mat[i]/total 452 return obs_freq_mat
453
454 -def _exp_freq_table_from_obs_freq(obs_freq_mat):
455 exp_freq_table = {} 456 for i in obs_freq_mat.alphabet.letters: 457 exp_freq_table[i] = 0. 458 for i in obs_freq_mat.keys(): 459 if i[0] == i[1]: 460 exp_freq_table[i[0]] += obs_freq_mat[i] 461 else: 462 exp_freq_table[i[0]] += obs_freq_mat[i] / 2. 463 exp_freq_table[i[1]] += obs_freq_mat[i] / 2. 464 return FreqTable.FreqTable(exp_freq_table,FreqTable.FREQ)
465
466 -def _build_exp_freq_mat(exp_freq_table):
467 """Build an expected frequency matrix 468 exp_freq_table: should be a FreqTable instance 469 """ 470 exp_freq_mat = ExpectedFrequencyMatrix(alphabet=exp_freq_table.alphabet, 471 build_later=1) 472 for i in exp_freq_mat: 473 if i[0] == i[1]: 474 exp_freq_mat[i] = exp_freq_table[i[0]]**2 475 else: 476 exp_freq_mat[i] = 2.0*exp_freq_table[i[0]]*exp_freq_table[i[1]] 477 return exp_freq_mat
478 # 479 # Build the substitution matrix 480 #
481 -def _build_subs_mat(obs_freq_mat,exp_freq_mat):
482 """ Build the substitution matrix """ 483 if obs_freq_mat.ab_list != exp_freq_mat.ab_list: 484 raise ValueError("Alphabet mismatch in passed matrices") 485 subs_mat = SubstitutionMatrix(obs_freq_mat) 486 for i in obs_freq_mat: 487 subs_mat[i] = obs_freq_mat[i]/exp_freq_mat[i] 488 return subs_mat
489 490 # 491 # Build a log-odds matrix 492 #
493 -def _build_log_odds_mat(subs_mat,logbase=2,factor=10.0,round_digit=0,keep_nd=0):
494 """_build_log_odds_mat(subs_mat,logbase=10,factor=10.0,round_digit=1): 495 Build a log-odds matrix 496 logbase=2: base of logarithm used to build (default 2) 497 factor=10.: a factor by which each matrix entry is multiplied 498 round_digit: roundoff place after decimal point 499 keep_nd: if true, keeps the -999 value for non-determined values (for which there 500 are no substitutions in the frequency substitutions matrix). If false, plants the 501 minimum log-odds value of the matrix in entries containing -999 502 """ 503 lo_mat = LogOddsMatrix(subs_mat) 504 for key, value in subs_mat.iteritems(): 505 if value < EPSILON: 506 lo_mat[key] = -999 507 else: 508 lo_mat[key] = round(factor*log(value)/log(logbase),round_digit) 509 mat_min = min(lo_mat.values()) 510 if not keep_nd: 511 for i in lo_mat.keys(): 512 if lo_mat[i] <= -999: 513 lo_mat[i] = mat_min 514 return lo_mat
515 516 # 517 # External function. User provides an accepted replacement matrix, and, 518 # optionally the following: expected frequency table, log base, mult. factor, 519 # and rounding factor. Generates a log-odds matrix, calling internal SubsMat 520 # functions. 521 #
522 -def make_log_odds_matrix(acc_rep_mat,exp_freq_table=None,logbase=2, 523 factor=1.,round_digit=9,keep_nd=0):
524 obs_freq_mat = _build_obs_freq_mat(acc_rep_mat) 525 if not exp_freq_table: 526 exp_freq_table = _exp_freq_table_from_obs_freq(obs_freq_mat) 527 exp_freq_mat = _build_exp_freq_mat(exp_freq_table) 528 subs_mat = _build_subs_mat(obs_freq_mat, exp_freq_mat) 529 lo_mat = _build_log_odds_mat(subs_mat,logbase,factor,round_digit,keep_nd) 530 return lo_mat
531
532 -def observed_frequency_to_substitution_matrix(obs_freq_mat):
533 exp_freq_table = _exp_freq_table_from_obs_freq(obs_freq_mat) 534 exp_freq_mat = _build_exp_freq_mat(exp_freq_table) 535 subs_mat = _build_subs_mat(obs_freq_mat, exp_freq_mat) 536 return subs_mat
537
538 -def read_text_matrix(data_file,mat_type=NOTYPE):
539 matrix = {} 540 tmp = data_file.read().split("\n") 541 table=[] 542 for i in tmp: 543 table.append(i.split()) 544 # remove records beginning with ``#'' 545 for rec in table[:]: 546 if (rec.count('#') > 0): 547 table.remove(rec) 548 549 # remove null lists 550 while (table.count([]) > 0): 551 table.remove([]) 552 # build a dictionary 553 alphabet = table[0] 554 j = 0 555 for rec in table[1:]: 556 # print j 557 row = alphabet[j] 558 # row = rec[0] 559 if re.compile('[A-z\*]').match(rec[0]): 560 first_col = 1 561 else: 562 first_col = 0 563 i = 0 564 for field in rec[first_col:]: 565 col = alphabet[i] 566 matrix[(row,col)] = float(field) 567 i += 1 568 j += 1 569 # delete entries with an asterisk 570 for i in matrix.keys(): 571 if '*' in i: del(matrix[i]) 572 ret_mat = SeqMat(matrix,mat_type=mat_type) 573 return ret_mat
574 575 diagNO = 1 576 diagONLY = 2 577 diagALL = 3 578
579 -def two_mat_relative_entropy(mat_1,mat_2,logbase=2,diag=diagALL):
580 rel_ent = 0. 581 key_list_1 = mat_1.keys(); key_list_2 = mat_2.keys() 582 key_list_1.sort(); key_list_2.sort() 583 key_list = [] 584 sum_ent_1 = 0.; sum_ent_2 = 0. 585 for i in key_list_1: 586 if i in key_list_2: 587 key_list.append(i) 588 if len(key_list_1) != len(key_list_2): 589 590 sys.stderr.write("Warning:first matrix has more entries than the second\n") 591 if key_list_1 != key_list_2: 592 sys.stderr.write("Warning: indices not the same between matrices\n") 593 for key in key_list: 594 if diag == diagNO and key[0] == key[1]: 595 continue 596 if diag == diagONLY and key[0] != key[1]: 597 continue 598 if mat_1[key] > EPSILON and mat_2[key] > EPSILON: 599 sum_ent_1 += mat_1[key] 600 sum_ent_2 += mat_2[key] 601 602 for key in key_list: 603 if diag == diagNO and key[0] == key[1]: 604 continue 605 if diag == diagONLY and key[0] != key[1]: 606 continue 607 if mat_1[key] > EPSILON and mat_2[key] > EPSILON: 608 val_1 = mat_1[key] / sum_ent_1 609 val_2 = mat_2[key] / sum_ent_2 610 # rel_ent += mat_1[key] * log(mat_1[key]/mat_2[key])/log(logbase) 611 rel_ent += val_1 * log(val_1/val_2)/log(logbase) 612 return rel_ent
613 614 ## Gives the linear correlation coefficient between two matrices
615 -def two_mat_correlation(mat_1, mat_2):
616 try: 617 import numpy 618 except ImportError: 619 raise ImportError, "Please install Numerical Python (numpy) if you want to use this function" 620 values = [] 621 assert mat_1.ab_list == mat_2.ab_list 622 for ab_pair in mat_1: 623 try: 624 values.append((mat_1[ab_pair], mat_2[ab_pair])) 625 except KeyError: 626 raise ValueError, "%s is not a common key" % ab_pair 627 correlation_matrix = numpy.corrcoef(values, rowvar=0) 628 correlation = correlation_matrix[0,1] 629 return correlation
630 631 # Jensen-Shannon Distance 632 # Need to input observed frequency matrices
633 -def two_mat_DJS(mat_1,mat_2,pi_1=0.5,pi_2=0.5):
634 assert mat_1.ab_list == mat_2.ab_list 635 assert pi_1 > 0 and pi_2 > 0 and pi_1< 1 and pi_2 <1 636 assert not (pi_1 + pi_2 - 1.0 > EPSILON) 637 sum_mat = SeqMat(build_later=1) 638 sum_mat.ab_list = mat_1.ab_list 639 for i in mat_1.keys(): 640 sum_mat[i] = pi_1 * mat_1[i] + pi_2 * mat_2[i] 641 sum_mat.make_entropy() 642 mat_1.make_entropy() 643 mat_2.make_entropy() 644 # print mat_1.entropy, mat_2.entropy 645 dJS = sum_mat.entropy - pi_1 * mat_1.entropy - pi_2 *mat_2.entropy 646 return dJS
647 648 """ 649 This isn't working yet. Boo hoo! 650 def two_mat_print(mat_1, mat_2, f=None,alphabet=None,factor_1=1, factor_2=1, 651 format="%4d",bottomformat="%4s",topformat="%4s", 652 topindent=7*" ", bottomindent=1*" "): 653 f = f or sys.stdout 654 if not alphabet: 655 assert mat_1.ab_list == mat_2.ab_list 656 alphabet = mat_1.ab_list 657 len_alphabet = len(alphabet) 658 print_mat = {} 659 topline = topindent 660 bottomline = bottomindent 661 for i in alphabet: 662 bottomline += bottomformat % i 663 topline += topformat % alphabet[len_alphabet-alphabet.index(i)-1] 664 topline += '\n' 665 bottomline += '\n' 666 f.write(topline) 667 for i in alphabet: 668 for j in alphabet: 669 print_mat[i,j] = -999 670 diag_1 = {}; diag_2 = {} 671 for i in alphabet: 672 for j in alphabet[:alphabet.index(i)+1]: 673 if i == j: 674 diag_1[i] = mat_1[(i,i)] 675 diag_2[i] = mat_2[(alphabet[len_alphabet-alphabet.index(i)-1], 676 alphabet[len_alphabet-alphabet.index(i)-1])] 677 else: 678 if i > j: 679 key = (j,i) 680 else: 681 key = (i,j) 682 mat_2_key = [alphabet[len_alphabet-alphabet.index(key[0])-1], 683 alphabet[len_alphabet-alphabet.index(key[1])-1]] 684 # print mat_2_key 685 mat_2_key.sort(); mat_2_key = tuple(mat_2_key) 686 # print key ,"||", mat_2_key 687 print_mat[key] = mat_2[mat_2_key] 688 print_mat[(key[1],key[0])] = mat_1[key] 689 for i in alphabet: 690 outline = i 691 for j in alphabet: 692 if i == j: 693 if diag_1[i] == -999: 694 val_1 = ' ND' 695 else: 696 val_1 = format % (diag_1[i]*factor_1) 697 if diag_2[i] == -999: 698 val_2 = ' ND' 699 else: 700 val_2 = format % (diag_2[i]*factor_2) 701 cur_str = val_1 + " " + val_2 702 else: 703 if print_mat[(i,j)] == -999: 704 val = ' ND' 705 elif alphabet.index(i) > alphabet.index(j): 706 val = format % (print_mat[(i,j)]*factor_1) 707 else: 708 val = format % (print_mat[(i,j)]*factor_2) 709 cur_str = val 710 outline += cur_str 711 outline += bottomformat % (alphabet[len_alphabet-alphabet.index(i)-1] + 712 '\n') 713 f.write(outline) 714 f.write(bottomline) 715 """ 716