Multiple sequence alignment of Cas9 homologs

This script searches for proteins homologous to Cas9 from Streptococcus pyogenes via NCBI BLAST and then aligns the hit sequences in a multiple sequence alignment using MUSCLE.

[Figure: first 200 columns of the multiple sequence alignment, colored with the ClustalX scheme]

Out:

MSA results:
Q99ZW2     ---------------------MDKKYSIGLDIGTNSVGWAVITDEYKVPSKKFKVLGNTDRHSIKKNLIGALLFDSGETA-------EATRLKRTARRRYTRRKNRICYLQEIFSNEMAKVDDSFFHRLEESFLVEEDKKHERHPIFGNIVDEVAYHEKYP-TIYHLRKKLVDSTDKADLRLIYLALAHMIKFRGHFLIEGDLNPDNSDVDKLFIQLVQTYNQLFEENP------INASGVDAKAILSARLSKSRRLENLIAQLPGEKKNGLFGNLIALSLGLTPNFKSNFDLAEDAKLQLSKDTYDDDLDNLLAQIGDQYADLFLAAKNLSDAILLSDILRVNTEITKAPLSASMIKRYDEHHQDLTLLKALVRQQLPEKYKEIFFDQSKNGYAGYIDGGASQEEFYKFIKPILEKMDGTEELLVKLNREDLLRKQRTFDNGSIPHQIHLGELHAILRRQEDF-YPFLKDN-REKIEKILTFRIPYYVGPLARGNSRFAWMTRKSEETITPWNFEEVVDKGASAQSFIERMTNFDKNLPNEKVLPKHSLLYEYFTVYNELTKVKYVTEGMRKPAFLSGEQKKAIVDLLFKTNRKVTVKQLKEDYFKKIECFDSVEISG--VEDR---FNASLGTYHDLLKIIKDKDFLDN----EENEDILEDIVLTLTLFEDREMIEERLK-TYAH--LFDDKVMKQLKRRRYT------GWGRLSRKLINGIRDKQSGKTILDFLKSDGFANRNFMQLIHDDSLTFKEDIQKAQVSGQGD--SLHEHIANLAGSPAIKKGILQTVKVVDELVKVMGRHKPENIVIEMARENQTTQKGQKNSRERMKRIEEGIKELGSQILKEH------PVEN----TQLQNEKLYLYYLQNGRDMYVDQELDINRL----SDYDVDHIVPQSFLKDDSIDNKVLTRSDKNRGKSDNVPSEEVVKKMKNY-WRQLLNAKLITQRKFDNLTKAERGGLSEL--DKAG-------FIKRQLVETRQITKHVAQILDSRMNTKYDENDKLIREVKVITLKSKLVSDFRKDFQFYKVREINNYHHAHDAYLNAVVGTALIKKYPKLESEFV-YGDYKVYDVRKMIAKSEQEIGKATAKYFFYSNIMNFFKTEITLANGEIRKRPLIETNGE-TGEIVWDKGRDFATVRKVLSMPQVNIVKKTEVQTGGFSKE----------SILPKRNSDKLIARKKD---WDPKKYGGFDSPTVAYSVLVVAKV--EKGKSKKLKSVKELLGITIMERSSFEKNP------IDFLEAKGYKEVKKDLIIKLPKYSLFELENGRKRMLASA-------GELQKGNELALPSKYVNFLYLASHYEKL--------KGSPEDNEQKQLFVEQ--HKHYLDEIIEQISEFSKRVILADANLDKVLSAYNKHRDKPIREQAENII--------HLFTLTNLGAPAA--FKYFDTTID------------RKR-YTSTKEVLDATLIHQS--------------------ITGLYETRIDLSQLGGD---------------
Q8DTE3     ---------------------MKKPYSIGLDIGTNSVGWAVVTDDYKVPAKKMKVLGNTDKSHIEKNLLGALLFDSGNTA-------EDRRLKRTARRRYTRRRNRILYLQEIFSEEMGKVDDSFFHRLEDSFLVTEDKRGERHPIFGNLEEEVKYHENFP-TIYHLRQYLADNPEKVDLRLVYLALAHIIKFRGHFLIEGKFDTRNNDVQRLFQEFLAVYDNTFENSS------LQEQNVQVEEILTDKISKSAKKDRVLKLFPNEKSNGRFAEFLKLIVGNQADFKKHFELEEKAPLQFSKDTYEEELEVLLAQIGDNYAELFLSAKKLYDSILLSGILTVTDVGTKAPLSASMIQRYNEHQMDLAQLKQFIRQKLSDKYNEVFSDVSKDGYAGYIDGKTNQEAFYKYLKGLLNKIEGSGYFLDKIEREDFLRKQRTFDNGSIPHQIHLQEMRAIIRRQAEF-YPFLADN-QDRIEKLLTFRIPYYVGPLARGKSDFAWLSRKSADKITPWNFDEIVDKESSAEAFINRMTNYDLYLPNQKVLPKHSLLYEKFTVYNELTKVKYKTE-QGKTAFFDANMKQEIFDGVFKVYRKVTKDKLMDFLEKEFDEFRIVDLTG--LDKENKVFNASYGTYHDLCKIL-DKDFLDN----SKNEKILEDIVLTLTLFEDREMIRKRLE-NYSD--LLTKEQVKKLERRHYT------GWGRLSAELIHGIRNKESRKTILDYLIDDGNSNRNFMQLINDDALSFKEEIAKAQVIGETD--NLNQVVSDIAGSPAIKKGILQSLKIVDELVKIMG-HQPENIVVEMARENQFTNQGRRNSQQRLKGLTDSIKEFGSQILKEH------PVEN----SQLQNDRLFLYYLQNGRDMYTGEELDIDYL----SQYDIDHIIPQAFIKDNSIDNRVLTSSKENRGKSDDVPSKDVVRKMKSY-WSKLLSAKLITQRKFDNLTKAERGGLTDD--DKAG-------FIKRQLVETRQITKHVARILDERFNTETDENNKKIRQVKIVTLKSNLVSNFRKEFELYKVREINDYHHAHDAYLNAVIGKALLGVYPQLEPEFV-YGDYPHFHGHKE--------NKATAKKFFYSNIMNFFKKD-------------DVRTDK-NGEIIWKKDEHISNIKKVLSYPQVNIVKKVEEQTGGFSKE----------SILPKGNSDKLIPRKTKKFYWDTKKYGGFDSPIVAYSILVIADI--EKGKSKKLKTVKALVGVTIMEKMTFERDP------VAFLERKGYRNVQEENIIKLPKYSLFKLENGRKRLLASA-------RELQKGNEIVLPNHLGTLLYHAKNIHKV------------DEPKHLD-YVDK--HKDEFKELLDVVSNFSKKYTLAEGNLEKIKELYAQNNGEDLKELASSFI--------NLLTFTAIGAPAT--FKFFDKNID------------RKR-YTSTTEILNATLIHQS--------------------ITGLYETRIDLNKLGGD---------------
Q03JI6     ---------------------MTKPYSIGLDIGTNSVGWAVTTDNYKVPSKKMKVLGNTSKKYIKKNLLGVLLFDSGITA-------EGRRLKRTARRRYTRRRNRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDSKYPIFGNLVEEKAYHDEFP-TIYHLRKYLADSTKKADLRLVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLDTYNAIFESDL------SLENSKQLEEIVKDKISKLEKKDRILKLFPGEKNSGIFSEFLKLIVGNQADFRKCFNLDEKASLHFSKESYDEDLETLLGYIGDDYSDVFLKAKKLYDAILLSGFLTVTDNETEAPLSSAMIKRYNEHKEDLALLKEYIRNISLKTYNEVFKDDTKNGYAGYIDGKTNQEDFYVYLKKLLAEFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQEMRAILDKQAKF-YPFLAKN-KERIEKILTFRIPYYVGPLARGNSDFAWSIRKRNEKITPWNFEDVIDKESSAEAFINRMTSFDLYLPEEKVLPKHSLLYETFNVYNELTKVRFIAESMRDYQFLDSKQKKDIVRLYFKDKRKVTDKDIIE-YLHAIYGYDGIELKG--IEKQ---FNSSLSTYHDLLNIINDKEFLDD----SSNEAIIEEIIHTLTIFEDREMIKQRLS-KFEN--IFDKSVLKKLSRRHYT------GWGKLSAKLINGIRDEKSGNTILDYLIDDGISNRNFMQLIHDDALSFKKKIQKAQIIGDEDKGNIKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGGRKPESIVVEMARENQYTNQGKSNSQQRLKRLEKSLKELGSKILKENIPAKLSKIDN----NALQNDRLYLYYLQNGKDMYTGDDLDIDRL----SNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGKSDDVPSLEVVKKRKTF-WYQLLKSKLISQRKFDNLTKAERGGLSPE--DKAG-------FIQRQLVETRQITKHVARLLDEKFNNKKDENNRAVRTVKIITLKSTLVSQFRKDFELYKVREINDFHHAHDAYLNAVVASALLKKYPKLEPEFV-YGDYPKYNSFRE-------RKSATEKVYFYSNIMNIFKKSISLADGRVIERPLIEVNEE-TGESVWNKESDLATVRRVLSYPQVNVVKKVEEQNHGLDRGKPKGLFNANLSSKPKPNSNENLVGAKE--YLDPKKYGGYAGISNSFTVLVKGTI--EKGAKKKITNVLEFQGISILDRINYRKDK------LNFLLEKGYKDI--ELIIELPKYSLFELSDGSRRMLASILSTNNKRGEIHKGNQIFLSQKFVKLLYHAKRISNT------------INENHRK-YVEN--HKKEFEELFYYILEFNENYVGAKKNGKLLNSAFQSWQNHSIDELCSSFIGPTGSERKGLFELTSRGSAAD--FEFLGVKIP------------RYRDYTPSSLLKDATLIHQS--------------------VTGLYETRIDLAKLGEG---------------
G3ECR1     MLFNKCIIISINLDFSNKEKCMTKPYSIGLDIGTNSVGWAVITDNYKVPSKKMKVLGNTSKKYIKKNLLGVLLFDSGITA-------EGRRLKRTARRRYTRRRNRILYLQEIFSTEMATLDDAFFQRLDDSFLVPDDKRDSKYPIFGNLVEEKVYHDEFP-TIYHLRKYLADSTKKADLRLVYLALAHMIKYRGHFLIEGEFNSKNNDIQKNFQDFLDTYNAIFESDL------SLENSKQLEEIVKDKISKLEKKDRILKLFPGEKNSGIFSEFLKLIVGNQADFRKCFNLDEKASLHFSKESYDEDLETLLGYIGDDYSDVFLKAKKLYDAILLSGFLTVTDNETEAPLSSAMIKRYNEHKEDLALLKEYIRNISLKTYNEVFKDDTKNGYAGYIDGKTNQEDFYVYLKNLLAEFEGADYFLEKIDREDFLRKQRTFDNGSIPYQIHLQEMRAILDKQAKF-YPFLAKN-KERIEKILTFRIPYYVGPLARGNSDFAWSIRKRNEKITPWNFEDVIDKESSAEAFINRMTSFDLYLPEEKVLPKHSLLYETFNVYNELTKVRFIAESMRDYQFLDSKQKKDIVRLYFKDKRKVTDKDIIE-YLHAIYGYDGIELKG--IEKQ---FNSSLSTYHDLLNIINDKEFLDD----SSNEAIIEEIIHTLTIFEDREMIKQRLS-KFEN--IFDKSVLKKLSRRHYT------GWGKLSAKLINGIRDEKSGNTILDYLIDDGISNRNFMQLIHDDALSFKKKIQKAQIIGDEDKGNIKEVVKSLPGSPAIKKGILQSIKIVDELVKVMGGRKPESIVVEMARENQYTNQGKSNSQQRLKRLEKSLKELGSKILKENIPAKLSKIDN----NALQNDRLYLYYLQNGKDMYTGDDLDIDRL----SNYDIDHIIPQAFLKDNSIDNKVLVSSASNRGKSDDFPSLEVVKKRKTF-WYQLLKSKLISQRKFDNLTKAERGGLLPE--DKAG-------FIQRQLVETRQITKHVARLLDEKFNNKKDENNRAVRTVKIITLKSTLVSQFRKDFELYKVREINDFHHAHDAYLNAVIASALLKKYPKLEPEFV-YGDYPKYNSFRE-------RKSATEKVYFYSNIMNIFKKSISLADGRVIERPLIEVNEE-TGESVWNKESDLATVRRVLSYPQVNVVKKVEEQNHGLDRGKPKGLFNANLSSKPKPNSNENLVGAKE--YLDPKKYGGYAGISNSFAVLVKGTI--EKGAKKKITNVLEFQGISILDRINYRKDK------LNFLLEKGYKDI--ELIIELPKYSLFELSDGSRRMLASILSTNNKRGEIHKGNQIFLSQKFVKLLYHAKRISNT------------INENHRK-YVEN--HKKEFEELFYYILEFNENYVGAKKNGKLLNSAFQSWQNHSIDELCSSFIGPTGSERKGLFELTSRGSAAD--FEFLGVKIP------------RYRDYTPSSLLKDATLIHQS--------------------VTGLYETRIDLAKLGEG---------------
Q927P4     ---------------------MKKPYTIGLDIGTNSVGWAVLTDQYDLVKRKMKIAGDSEKKQIKKNFWGVRLFDEGQTA-------ADRRMARTARRRIERRRNRISYLQGIFAEEMSKTDANFFCRLSDSFYVDNEKRNSRHPFFATIEEEVEYHKNYP-TIYHLREELVNSSEKADLRLVYLALAHIIKYRGNFLIEGALDTQNTSVDGIYKQFIQTYNQVFASGIEDGSLKKLEDNKDVAKILVEKVTRKEKLERILKLYPGEKSAGMFAQFISLIVGSKGNFQKPFDLIEKSDIECAKDSYEEDLESLLALIGDEYAELFVAAKNAYSAVVLSSIITVAETETNAKLSASMIERFDTHEEDLGELKAFIKLHLPKHYEEIFSNTEKHGYAGYIDGKTKQADFYKYMKMTLENIEGADYFIAKIEKENFLRKQRTFDNGAIPHQLHLEELEAILHQQAKY-YPFLKEN-YDKIKSLVTFRIPYFVGPLANGQSEFAWLTRKADGEIRPWNIEEKVDFGKSAVDFIEKMTNKDTYLPKENVLPKHSLCYQKYLVYNELTKVRYIND-QGKTSYFSGQEKEQIFNDLFKQKRKVKKKDLEL-FLRNMSHVESPTIEG--LEDS---FNSSYSTYHDLLKVGIKQEILDN----PVNTEMLENIVKILTVFEDKRMIKEQLQ-QFSD--VLDGVVLKKLERRHYT------GWGRLSAKLLMGIRDKQSHLTILDYLMNDDGLNRNLMQLINDSNLSFKSIIEKEQVTTADK--DIQSIVADLAGSPAIKKGILQSLKIVDELVSVMG-YPPQTIVVEMARENQTTGKGKNNSRPRYKSLEKAIKEFGSQILKEH------PTDN----QELRNNRLYLYYLQNGKDMYTGQDLDIHNL----SNYDIDHIVPQSFITDNSIDNLVLTSSAGNREKGDDVPPLEIVRKRKVF-WEKLYQGNLMSKRKFDYLTKAERGGLTEA--DKAR-------FIHRQLVETRQITKNVANILHQRFNYEKDDHGNTMKQVRIVTLKSALVSQFRKQFQLYKVRDVNDYHHAHDAYLNGVVANTLLKVYPQLEPEFV-YGDYHQFDWFKA--------NKATAKKQFYTNIMLFFAQK-------------DRIIDE-NGEILWDK-KYLDTVKKVMSYRQMNIVKKTEIQKGEFSKA----------TIKPKGNSSKLIPRKTN---WDPMKYGGLDSPNMAYAVVIEY----AKGKNKLVFE-KKIIRVTIMERKAFEKDE------KAFLEEQGYRQP--KVLAKLPKYTLYECEEGRRRMLASA-------NEAQKGNQQVLPNHLVTLLHHAANCE--------------VSDGKSLDYIES--NREMFAELLAHVSEFAKRYTLAEANLNKINQLFEQNKEGDIKAIAQSFV--------DLMAFNAMGAPAS--FKFFETTIE------------RKR-YNNLKELLNSTIIYQS--------------------ITGLYESRKRLDD-------------------
J7RUA5     ---------------------MKRNYILGLDIGITSVGYGII--DYETRD----VID-----------AGVRLFKEANVENN-----EGRRSKRGARRLKRRRRHRIQRVKKLLFDYNLLTDHSELSGI---------------------------------NPYEAR--VKGLSQKLSEEEFSAALLHLAKRRGVHNVNEVEEDTGNEL--STKEQISRNSKALEE------------------------------------------------------------KYVAELQ--------------------------------------------------------------LERLKKD--------------------------------GEVRG-----------------------SINRFKTSDYVK-----------------EAKQLLKVQKA--YHQLDQSFIDTYIDLLETRRTYYEGP----------------GEGSPFGWKDI-------KEWYEMLMGHCTYFPEELRSVKYAYNADLYNALNDLNNLVITRDENEKLEYY---EKFQIIENVFKQKKKPTLKQIAKEILVNEEDIKGYRVTS--TGKP---EFTNLKVYHDIKDITARKEIIE-------NAELLDQIAKILTIYQSSEDIQEELTNLNSE--LTQEEIEQISNLKGYT------GTHNLSLKAINLILDELWH------------TNDNQIAIFNRLKLVPKKVDLSQQKEIPTT------LVDDFILSPVVKRSFIQSIKVINAIIKKYG--LPNDIIIELAREKN-SKDAQKMINEMQKR-NRQTNERIEEIIRTT------GKEN----AKYLIEKIKLHDMQEGKCLYSLEAIPLEDLLNNPFNYEVDHIIPRSVSFDNSFNNKVLVKQEENSKKGNRTPFQYLSSSDSKISYETFKKHILNLAKGKGRISKTKKEYLLEE--RDINRFSVQKDFINRNLVDTRYATRGLMNLLRSYFRVNNLD-------VKVKSINGGFTSFLRRKWKFKKERNKGYKHHAEDALIIA-NADFIFKEWKKLDKAKK-VMENQMFEEKQAESMPEIETEQEYKEIFITPHQIKHIKD--------------------------FKDYKY---------------------------------------SHRVDKKPNRELINDTL--YSTRKDDKG--------NTLIVNNL--NGLYDKDNDKLKKLINKSPEKLLMYHHDPQTYQK-LKLIMEQYGDE----------KNPLYKYYEETGNYLTKY-------SKKDNGPVIKKIKYYGNKLNAHLDITDDYPNSRNKVVKLSLKPYRFDVYLDN--GVYKFVTVKNL--DVIKKENYYEVNSKCYEEAKKLKKISNQAEFIASFY------NNDLIKINGEL------YRVIGVNNDLLNRIEVNMIDITYREYLENMNDKRPPRIIKTIA------------------SKTQSIKKYSTDILGNLYEVKSKKHPQIIKKG
Q9CLT2     ------------------MQTTNLSYILGLDLGIASVGWAVV--EINENEDPIGLID-----------VGVRIFERAEVPKTGESLALSRRLARSTRRLIRRRAHRLLLAKRFLKREGILST-------------------------------IDLEKGLPNQAWELR--VAGLERRLSAIEWGAVLLHLIKHRGYLSKRKNESQTNNKELGALLSGVAQNHQLLQS---------------------------------------------------------DDYRTPAELA--------------------------------------------------------------LKKFAKEE-------------------------------GHIRNQRGA-------------------YTHTFNRLDLLA-----------------ELNLLFAQQHQFGNPHCKEHIQQYMTELLMWQKPALSG------------------------------------EAILKMLGKCTHEKNEFKAAKHTYSAERFVWLTKLNNLRILEDGAERA--LNEEERQLLINHPYEKSKLTYAQVRKLLGLSEQAIFKHLRYSKENAESA---TFMELKAWHAIRKALENQGLKDTWQDLAKKPDLLDEIGTAFSLYKTDEDIQQYLTNKVPN--SVINALLVSLNFDK---------FIELSLKSLRKILP----------LMEQGKRYDQACREIYGHHYGEANQKTSQ---------LLPAIPAQEIRNPVVLRTLSQARKVINAIIRQYG--SPARVHIETGRELGKSFKERREIQKQQED-NRTKRESAVQKFKELFSDFSSEPKS----KDIL--KFRLYEQQHGKCLYSGKEINIHRL-NEKGYVEIDHALPFSRTWDDSFNNKVLVLASENQNKGNQTPYEWLQGKINSERWKNFVALVLGSQ-----CSAAKKQRLLTQVIDDNK-------FIDRNLNDTRYIARFLSNYIQENLLLVGKN------KKNVFTPNGQITALLRSRWGLIKARENNNRHHALDAIVVACATPSMQQKITR----FIRFKEVHPYKIENRYEMVDQESGEIISP--HFPEPWAYFRQEVN-----------IRVFDN-HPDTVLKEML----------------------------------------PDRPQANHQFVQPL-----FVSRAPTRKMSGQGHMETIKSAKRL--AEGISVLRIPLTQLKPNLLENMVNKEREPALYAG-LKARLAEFNQDP-----AKAFATPFYKQG-----------------GQQVKAIRVEQVQKSGVLVRENNGVADN------------ASIVRTDVFIKN--NKFFLVPIYTW--QVAKGILPNKA-------IVAHKNEDEWEEMDEGAKFKFSLFPNDLVELKTKKEYF---FGYY-IGLD------------RATGNISLKEHDGEISKGKDGVYR-VGV------------KLALSFEKYQVDELGKNRQICRPQQRQPVR--
A1IQ68     ---------------MAAFKPNPINYILGLDIGIASVGWAMV--EIDEDENPICLID-----------LGVRVFERAEVPKTGDSLAMARRLARSVRRLTRRRAHRLLRARRLLKREGVLQAADF--------------------------DENGLIKSLPNTPWQLR--AAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELGALLKGVADNAHALQT---------------------------------------------------------GDFRTPAELA--------------------------------------------------------------LNKFEKES-------------------------------GHIRNQRGD-------------------YSHTFSRKDLQA-----------------ELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSG------------------------------------DAVQKMLGHCTFEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERP--LTDTERATLMDEPYRKSKLTYAQARKLLGLEDTAFFKGLRYGKDNAEAS---TLMEMKAYHAISRALEKEGLKDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRIQP--EILEALLKHISFDK---------FVQISLKALRRIVP----------LMEQGKRYDEACAEIYGDHYGKKNTEEKI---------YLPPIPADEIRNPVVLRALSQARKVINGVVRRYG--SPARIHIETAREVGKSFKDRKEIEKRQEE-NRKDREKAAAKFREYFPNFVGEPKS----KDIL--KLRLYEQQHGKCLYSGKEINLGRL-NEKGYVEIDHALPFSRTWDDSFNNKVLVLGSENQNKGNQTPYEYFNGKDNSREWQEF-KARV----ETSRFPRSKKQRILLQKFDEDG-------FKERNLNDTRYVNRFLCQFVADRMRLTGKG------KKRVFASNGQITNLLRGFWGLRKVRAENDRHHALDAVVVACSTVAMQQKITR----FVRYKEMNAFDGKTI----DKETGEVLHQKTHFPQPWEFFAQEVM-----------IRVFGKPDGKPEFEEADTPEKLRTLLAEKL---------------------------SSRPEAVHEYVTPL-----FVSRAPNRKMSGQGHMETVKSAKRL--DEGVSVLRVPLTQLKLKDLEKMVNREREPKLYEA-LKARLEAHKDDP-----AKAFAEPFYKYDKAGNR------------TQQVKAVRVEQVQKTGVWVRNHNGIADN------------ATMVRVDVFEKG--DKYYLVPIYSW--QVAKGILPDRA-------VVQGKDEEDWQLIDDSFNFKFSLHPNDLVEVITKKARM---FGYF-ASCH------------RGTGNINIRIHDLDHKIGKNGILEGIGV------------KTALSFQKYQIDELGKEIRPCRLKKRPPVR--
C9X1G5     ---------------MAAFKPNSINYILGLDIGIASVGWAMV--EIDEEENPIRLID-----------LGVRVFERAEVPKTGDSLAMARRLARSVRRLTRRRAHRLLRTRRLLKREGVLQAANF--------------------------DENGLIKSLPNTPWQLR--AAALDRKLTPLEWSAVLLHLIKHRGYLSQRKNEGETADKELGALLKGVAGNAHALQT---------------------------------------------------------GDFRTPAELA--------------------------------------------------------------LNKFEKES-------------------------------GHIRNQRSD-------------------YSHTFSRKDLQA-----------------ELILLFEKQKEFGNPHVSGGLKEGIETLLMTQRPALSG------------------------------------DAVQKMLGHCTFEPAEPKAAKNTYTAERFIWLTKLNNLRILEQGSERP--LTDTERATLMDEPYRKSKLTYAQARKLLGLEDTAFFKGLRYGKDNAEAS---TLMEMKAYHAISRALEKEGLKDKKSPLNLSPELQDEIGTAFSLFKTDEDITGRLKDRIQP--EILEALLKHISFDK---------FVQISLKALRRIVP----------LMEQGKRYDEACAEIYGDHYGKKNTEEKI---------YLPPIPADEIRNPVVLRALSQARKVINGVVRRYG--SPARIHIETAREVGKSFKDRKEIEKRQEE-NRKDREKAAAKFREYFPNFVGEPKS----KDIL--KLRLYEQQHGKCLYSGKEINLGRL-NEKGYVEIDHALPFSRTWDDSFNNKVLVLGSENQNKGNQTPYEYFNGKDNSREWQEF-KARV----ETSRFPRSKKQRILLQKFDEDG-------FKERNLNDTRYVNRFLCQFVADRMRLTGKG------KKRVFASNGQITNLLRGFWGLRKVRAENDRHHALDAVVVACSTVAMQQKITR----FVRYKEMNAFDGKTI----DKETGEVLHQKTHFPQPWEFFAQEVM-----------IRVFGKPDGKPEFEEADTLEKLRTLLAEKL---------------------------SSRPEAVHEYVTPL-----FVSRAPNRKMSGQGHMETVKSAKRL--DEGVSVLRVPLTQLKLKDLEKMVNREREPKLYEA-LKARLEAHKDDP-----AKAFAEPFYKYDKAGNR------------TQQVKAVRVEQVQKTGVWVRNHNGIADN------------ATMVRVDVFEKG--DKYYLVPIYSW--QVAKGILPDRA-------VVQGKDEEDWQLIDDSFNFKFSLHPNDLVEVITKKARM---FGYF-ASCH------------RGTGNINIRIHDLDHKIGKNGILEGIGV------------KTALSFQKYQIDELGKEIRPCRLKKRPPVR--
Q03LF7     ----------------------MSDLVLGLDIGIGSVGVGILN----------KVTGEIIHK-------NSRIFPAAQAENN-----LVRRTNRQGRRLARRKKHRRVRLNRLFEESGLITD---FTKISINL-----------------------------NPYQLR--VKGLTDELSNEELFIALKNMVKHRGISYLDDASDDGNSSV-GDYAQIVKENSKQLET------------------------------------------------------------KTPGQIQ--------------------------------------------------------------LERYQTY--------------------------------GQLRG---------------------DFTVEKDGKKHRLI-------NVFPTSAYRSEALRILQTQQEF-NPQITDEFINRYLEILTGKRKYYHGP---------------GNEKSRTDYGRYRTSGETLDNIFGILIGKCTFYPDEFRAAKASYTAQEFNLLNDLNNLTVPTETKK----LSKEQKNQIIN-YVKNEKAMGPAKLFK-YIAKLLSCDVADIKGYRIDKSGKAEIHTFEAYRKMKTL---ETLDIE----QMDRETLDKLAYVLTLNTEREGIQEALEHEFADGSFSQKQVDELVQFRKANSSIFGKGWHNFSVKLMMELIPELYE------------TSEEQMTIL--TRLGKQKTTSSSNKTKYID----EKLLTEEIYNPVVAKSVRQAIKIVNAAIKEYG--DFDNIVIEMARETN--EDDEKKAIQKIQKANKDEKDAAMLKAANQYNGKAELPHSVFHGHKQLATKIRLWHQQGERCLYTGKTISIHDLINNSNQFEVDHILPLSITFDDSLANKVLVYATANQEKGQRTPYQALDSMDDAWSFREL-KAFV---RESKTLSNKKKEYLLTE--EDISKFDVRKKFIERNLVDTRYASRVVLNALQEHFRAHKID-------TKVSVVRGQFTSQLRRHWGIEKTRDTYH-HHAVDALIIA--ASSQLNLWKKQKNTLVSYSEDQLLDIETG-----ELISDDEYKESVFKAPYQHFVDT-------------LKSKEF-EDSILF--------------------------------------------SYQVDSKFNRKISDATI--YATRQAKVGKDKADETYVLGKIKDIYTQDGYD----AFMKIYKKDKSKFLMYRHDPQTFEKVIEPILENYPNKQINEKGKEVPCNPFLKYKEEHGYIRKY--------SKKGNGPEIKSLKYYDSKLGNHIDITPK--DSNNKVVLQSVSPWRADVYFNKTTGKYEILGLKYADLQFEKGTGTYKISQEKYNDIKKKEGVDSDSEFKFTLY------KNDLLLVKDTETKEQQLFRFLSRTMP------------KQKHYVELKPYDKQKFEGGEALIKVLGNVANSGQCKKGLGKSNISIYKVRTDVLGNQHIIKNEGDKPKLDF-
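
Each row of the output above pairs a UniProt accession with its gapped sequence, and all rows of a valid alignment have the same length. The following is a minimal sketch, using only the standard library, of how such lines could be parsed and checked. `parse_msa_lines` and `is_rectangular` are hypothetical helpers (not part of Biotite), and the short rows are made-up toy data, not the real Cas9 alignment.

```python
def parse_msa_lines(lines):
    """Parse 'ID   GAPPED_SEQUENCE' lines into a dict (hypothetical helper)."""
    rows = {}
    for line in lines:
        if not line.strip():
            continue  # skip empty lines
        # The ID and the gapped sequence are separated by whitespace
        seq_id, gapped = line.split(maxsplit=1)
        rows[seq_id] = gapped.strip()
    return rows


def is_rectangular(rows):
    """Return True if all gapped sequences have the same length."""
    return len({len(gapped) for gapped in rows.values()}) <= 1


# Toy data in the same layout as the printed MSA (not real sequences)
toy_output = [
    "SEQ_A     MK-YSIG",
    "SEQ_B     MKKY-IG",
    "SEQ_C     M--YSIG",
]
rows = parse_msa_lines(toy_output)
print(is_rectangular(rows))
```

A check like this is a cheap safeguard before feeding copied or re-saved alignment text into downstream analysis.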

# Code source: Patrick Kunzmann
# License: BSD 3 clause

import biotite
import biotite.sequence as seq
import biotite.sequence.io.fasta as fasta
import biotite.sequence.graphics as graphics
import biotite.application.muscle as muscle
import biotite.application.blast as blast
import biotite.database.entrez as entrez
import matplotlib.pyplot as plt

# Download sequence of Streptococcus pyogenes Cas9
file_name = entrez.fetch("Q99ZW2", biotite.temp_dir(), "fa", "protein", "fasta")
file = fasta.FastaFile()
file.read(file_name)
ref_seq = fasta.get_sequence(file)
# Find homologous proteins using NCBI BLAST
# Search only the UniProt/SwissProt database
# obey_rules=False turns off Biotite's check against the NCBI usage rules
blast_app = blast.BlastWebApp("blastp", ref_seq, "swissprot", obey_rules=False)
blast_app.start()
blast_app.join()
alignments = blast_app.get_alignments()
# Get hit IDs for hits with score > 200
hits = [ali.hit_id for ali in alignments if ali.score > 200]
# Get the sequences from hit IDs
hit_seqs = []
for hit in hits:
    file_name = entrez.fetch(hit, biotite.temp_dir(), "fa", "protein", "fasta")
    file = fasta.FastaFile()
    file.read(file_name)
    hit_seqs.append(fasta.get_sequence(file))

# Perform a multiple sequence alignment using MUSCLE
app = muscle.MuscleApp(hit_seqs)
app.start()
app.join()
alignment = app.get_alignment()
# Print the MSA with hit IDs
print("MSA results:")
gapped_seqs = alignment.get_gapped_sequences()
for hit, gapped_seq in zip(hits, gapped_seqs):
    print(hit, " "*3, gapped_seq)

# Visualize the first 200 columns of the alignment
# Reorder the aligned sequences to reflect sequence distance
fig = plt.figure(figsize=(8.0, 8.0))
ax = fig.add_subplot(111)
order = app.get_alignment_order()
graphics.plot_alignment_type_based(
    ax, alignment[:200, order.tolist()], labels=[hits[i] for i in order],
    show_numbers=True, color_scheme="clustalx"
)
fig.tight_layout()

plt.show()
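
The script reorders the aligned sequences to reflect sequence distance. One simple way to quantify how close two aligned sequences are is their pairwise identity over the columns where neither has a gap. The sketch below illustrates that idea on plain gapped strings. `pairwise_identity` is a hypothetical helper, the input strings are toy data, and this is not necessarily the measure MUSCLE uses to order its output.

```python
def pairwise_identity(gapped_a, gapped_b):
    """Fraction of identical residues over columns without gaps.

    Hypothetical helper for illustration; expects two rows of the
    same alignment, i.e. gapped strings of equal length.
    """
    if len(gapped_a) != len(gapped_b):
        raise ValueError("Aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(gapped_a, gapped_b):
        if a == "-" or b == "-":
            continue  # ignore columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    # Avoid division by zero if no column is gap-free in both rows
    return matches / compared if compared else 0.0


# Toy rows: 6 gap-free columns, 5 of them identical
identity = pairwise_identity("MKAYSIG", "MK-YTIG")
print(f"{identity:.2f}")
```

Applied to the rows of the printed MSA, a function like this would give a rough similarity score for each pair of homologs.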
