Frequently, diverse protein families contain members that produce pairwise similarity scores with expectation values > 0.05. How can we tell whether sequences with high, but not statistically significant, similarity scores are related to the query sequence?
If the family is large, a relatively simple test is available: perform additional searches using the sequences with marginal similarity scores as queries. Thus, the search below with the bacterial DCMA_METSP sequence shows clearly that this protein belongs to the glutathione transferase superfamily. All of the sequences with statistically significant similarity scores belong to the family.
The best scores are:                                    s-w  Z-score  E(43470)
DCMA_METSP DICHLOROMETHANE DEHALOGENASE (EC 4.5.1.     1982   2721.7  0
GTTR_RAT   GLUTATHIONE S-TRANSFERASE YRS-YRS (EC 2.5    344    463.2  2.3e-19
GTT1_RAT   GLUTHATHIONE S-TRANSFERASE 5 (EC 2.5.1.18    330    444.1  2.7e-18
GT1_MUSDO  GLUTATHIONE S-TRANSFERASE 1 (EC 2.5.1.18     271    364.0  7.9e-14
GT1_DROME  GLUTATHIONE S-TRANSFERASE 1-1 (EC 2.5.1.     231    308.7  9.4e-11
GT32_MAIZE GLUTATHIONE S-TRANSFERASE III (EC 2.5.1      204    271.0  1.2e-08
In contrast, a search with the yeast isopentenyl transferase (MOD5_YEAST), the highest scoring unrelated sequence, shows high (but not statistically significant) similarity scores with a variety of unrelated sequences, as would be expected by chance.
The best scores are:                                    s-w  Z-score  E(43470)
MOD5_YEAST TRNA ISOPENTENYLTRANSFERASE (EC 2.5.1.8     2876   3535.1  0
MIAA_AGRTU TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE       436    527.2  6.4e-23
MIAA_ECOLI TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE       397    478.4  3.3e-20
MIAA_SALTY TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE       139    172.2  0.004
GTB3_MOUSE GLUTATHIONE S-TRANSFERASE GT9.3 (EC 2.5      118    137.9  0.31
G6PD_ECOLI GLUCOSE-6-PHOSPHATE 1-DEHYDROGENASE (EC      119    130.9  0.76
GTB1_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1.      110    128.0  1.1
CAMT_PETCR CAFFEOYL-COA O-METHYLTRANSFERASE (EC 2.      107    123.2  2.0
GTMU_MESAU GLUTATHIONE S-TRANSFERASE (EC 2.5.1.18)      104    120.6  2.8
ACCO_PERAE 1-AMINOCYCLOPROPANE-1-CARBOXYLATE OXIDA      107    120.4  2.9
YIK4_YEAST HYPOTHETICAL 59.2 KD PROTEIN IN PFK26-S      110    119.4  3.3
G6PD_ERWCH GLUCOSE-6-PHOSPHATE 1-DEHYDROGENASE (EC      109    118.5  3.7
RASX_XENLA TRANSFORMING PROTEIN P21/K-RAS.              101    118.4  3.8
GNTV_ECOLI GLUCONOKINASE (EC 2.7.1.12) (GLUCONATE       100    117.2  4.4
TFC1_YEAST TRANSCRIPTION FACTOR TAU 95 KD SUBUNIT       110    116.9  4.6
GTM4_HUMAN GLUTATHIONE S-TRANSFERASE MUSCLE (EC 2.      101    116.9  4.6
While there are several glutathione transferases on this list, none has a statistically significant similarity score. The presence of several family members is not very informative because they are all very closely related.
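The test described above can be sketched in a few lines of Python. This is only an illustration, not part of any search program: the names `Hit`, `parse_hits`, `split_by_significance`, and `in_family` are hypothetical, the 0.05 cutoff is the expectation-value threshold used in the text, and matching "S-TRANSFERASE" in the description is a crude, assumed stand-in for real family annotation. The hit lines are copied from the two listings above (the second listing is truncated for brevity).

```python
from typing import NamedTuple

class Hit(NamedTuple):
    name: str
    description: str
    evalue: float

def parse_hits(listing):
    """Parse 'best scores' lines: name, description, s-w, Z-score, E()."""
    hits = []
    for line in listing.strip().splitlines():
        # The last three whitespace-separated fields are the numeric columns.
        head, _sw, _zscore, evalue = line.strip().rsplit(None, 3)
        name, description = head.split(None, 1)
        hits.append(Hit(name, description, float(evalue)))
    return hits

def split_by_significance(hits, query, cutoff=0.05):
    """Separate library hits (query excluded) into significant and marginal."""
    others = [h for h in hits if h.name != query]
    significant = [h for h in others if h.evalue < cutoff]
    marginal = [h for h in others if h.evalue >= cutoff]
    return significant, marginal

def in_family(hit, keyword="S-TRANSFERASE"):
    # Assumption: a keyword match on the description line stands in
    # for a curated list of glutathione transferase family members.
    return keyword in hit.description

DCMA_LISTING = """
DCMA_METSP DICHLOROMETHANE DEHALOGENASE (EC 4.5.1. 1982 2721.7 0
GTTR_RAT GLUTATHIONE S-TRANSFERASE YRS-YRS (EC 2.5 344 463.2 2.3e-19
GTT1_RAT GLUTHATHIONE S-TRANSFERASE 5 (EC 2.5.1.18 330 444.1 2.7e-18
GT1_MUSDO GLUTATHIONE S-TRANSFERASE 1 (EC 2.5.1.18 271 364.0 7.9e-14
GT1_DROME GLUTATHIONE S-TRANSFERASE 1-1 (EC 2.5.1. 231 308.7 9.4e-11
GT32_MAIZE GLUTATHIONE S-TRANSFERASE III (EC 2.5.1 204 271.0 1.2e-08
"""

MOD5_LISTING = """
MOD5_YEAST TRNA ISOPENTENYLTRANSFERASE (EC 2.5.1.8 2876 3535.1 0
MIAA_AGRTU TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE 436 527.2 6.4e-23
MIAA_ECOLI TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE 397 478.4 3.3e-20
MIAA_SALTY TRNA DELTA(2)-ISOPENTENYLPYROPHOSPHATE 139 172.2 0.004
GTB3_MOUSE GLUTATHIONE S-TRANSFERASE GT9.3 (EC 2.5 118 137.9 0.31
G6PD_ECOLI GLUCOSE-6-PHOSPHATE 1-DEHYDROGENASE (EC 119 130.9 0.76
GTB1_CRILO GLUTATHIONE S-TRANSFERASE Y1 (EC 2.5.1. 110 128.0 1.1
CAMT_PETCR CAFFEOYL-COA O-METHYLTRANSFERASE (EC 2. 107 123.2 2.0
GTMU_MESAU GLUTATHIONE S-TRANSFERASE (EC 2.5.1.18) 104 120.6 2.8
"""

dcma_sig, dcma_marg = split_by_significance(parse_hits(DCMA_LISTING), "DCMA_METSP")
mod5_sig, mod5_marg = split_by_significance(parse_hits(MOD5_LISTING), "MOD5_YEAST")

# DCMA_METSP: every significant hit is a glutathione transferase.
print(all(in_family(h) for h in dcma_sig))            # True
# MOD5_YEAST: no glutathione transferase reaches significance ...
print([h.name for h in mod5_sig if in_family(h)])     # []
# ... they appear only among the marginal, chance-level scores.
print([h.name for h in mod5_marg if in_family(h)])    # ['GTB3_MOUSE', 'GTB1_CRILO', 'GTMU_MESAU']
```

The keyword match is the weakest part of the sketch; in practice family membership would come from database annotation or from the results of further searches.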