Getting Started
You can use guido to search for gRNAs in a reference genome or DNA sequence. The following example shows how to search for gRNAs in the malaria mosquito (Anopheles gambiae) reference genome (AgamP4):
Genome instance
First we need to create a Genome instance. This instance will be used to search for gRNAs in the genome. The Genome class will link the genome sequence to the annotation file and will create a bowtie index for the genome. The Genome class takes the following arguments: genome_name (name of the genome), genome_file_abspath (FASTA sequence), and annotation_file_abspath (GTF annotation file).
import guido
genome = guido.Genome(genome_name='AgamP4',
genome_file_abspath='data/AgamP4.fa',
annotation_file_abspath='data/AgamP4.12.gtf')
Build the FASTA index and bowtie index files and create a AgamP4.guido file in genome_file_abspath that contains the genome information and can be used to create a Genome instance next time without needing to build the genome indices again.
genome.build(n_threads=2)
The Genome class also has a bowtie_index_abspath argument that can be used to specify the directory and bowtie index name. If this argument is not specified, the bowtie index will be created when running genome.build().
To load a Genome instance from a AgamP4.guido file that was created using genome.build(), use the Genome.load() method:
genome = guido.load_genome_from_file(guido_file='data/AgamP4.guido')
Locus instance
Locus instances are used to search for gRNAs in a specific genomic region. The Locus class takes the following arguments: genome (a Genome instance), chromosome (chromosome name), start (start position), and end (end position). The start and end positions are 1-based and inclusive.
genome = guido.load_genome_from_file(guido_file='data/AgamP4.guido')
loc = guido.locus_from_coordinates(genome, 'AgamP4_2R', 48714541, 48714666)
Alternatively, you can use the locus_from_gene() function to create a Locus instance and limit the search for gRNAs to a specific feature that is defined in the genome annotation file:
Find gRNAs
import guido
loc = guido.locus_from_gene(genome, 'AGAP005958')
loc.find_guides(selected_features='exon')
['gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|)',
'gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|)',
...
'gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|)',
'gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)']
After running loc.find_guides(), the Locus instance will contain a list of Guide instances that can be accessed using the loc.guides attribute. Each gRNA is a Guide instance that contains different attributes.
loc.guides
['gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|)',
'gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|)',
...
'gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|'),
'gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)']
You can access a gRNA by its index or a name:
loc.guides[0]
'gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|)'
loc.guides['gRNA-1']
'gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|)'
Analysis
You can analyze the gRNAs using the different methods. For example loc.simulate_end_joining() will simulate MMEJ events for each gRNA in the Locus instance and will attach the results to the Guide instances. Other useful methods include: loc.find_off_targets() and loc.add_azimuth_score(), which will search for off-target sites and add the Azimuth score (on-target sgRNA activity score) to the Guide instances, respectively.
import guido
genome = guido.load_genome_from_file(guido_file='data/AgamP4.guido')
loc = guido.locus_from_coordinates(genome, 'AgamP4_2R', 48714541, 48714666)
loc.find_guides(selected_features='exon')
loc.simulate_end_joining()
loc.find_off_targets()
loc.add_azimuth_score()
To access the results, use the Guide instance attributes. For example, to access the MMEJ results for the first gRNA in the Locus instance:
loc.find_off_targets()
loc.guide(0).off_targets
[{'ix': 0,
'mismatches': {21: 'A', 20: 'G', 18: 'G', 10: 'C', 6: 'C'},
'mismatches_string': '.....C...C.......G.GA..',
'chromosome': 'AgamP4_2L',
'start': 8079046,
'strand': '+',
'seq': 'AAGTTCATCCTCCACTCGGGAGG',
'cfd_score': 0.07724301822162806},
...]
Certain methods will add a layer with the results of the analysis to each gRNA. Layers are a handy way to attach different data, values and/or scores to each gRNA. For example, the loc.find_off_targets() method will add a few layers with different scores pertaining the off-target analysis. To access the layers, use the Guide instance layers attribute:
loc.guide(0).layers
{'ot_sum_score': 8
'ot_cfd_score_mean': 0.07819736379860964
'ot_cfd_score_max': 0.1777777776973545
'ot_cfd_score_sum': 0.46918418279165786
'azimuth_score': 0.5720475679414121
'mmej_sum_score': 946.8
'mmej_oof_score': 40.95901985635826}
The description of the different layers can be found in the Guide class documentation. You can also add your own layers to the Guide instances by using the loc.add_layer() method. For example:
layer_data = np.random.rand(len(loc.sequence))
loc.add_layer("random", layer_data)
loc.guide(0).layers
{'ot_sum_score': 8
'ot_cfd_score_mean': 0.07819736379860964
'ot_cfd_score_max': 0.1777777776973545
'ot_cfd_score_sum': 0.46918418279165786
'azimuth_score': 0.5720475679414121
'mmej_sum_score': 946.8
'mmej_oof_score': 40.95901985635826
'random': 0.5}
Layers can be added either to all gRNAs in the Locus instance or to a specific gRNA. To add a layer to a specific gRNA, use the guide.add_layer() method.
Note
When adding a layer to the locus, you need to ensure the length of the layer data is equal to the length of the locus sequence. Each gRNA will then have this layer applied with the values corresponding to the gRNA sequence.
Example: Adding Conservation score (Cs) to a locus You can download the Cs for Anopheles gambiae from github repo https://github.com/nkran/AgamP4_conservation_score. Follow the instructions in the README file to download the data and create the AgamP4_conservation.h5 file. Then, you can add the Cs layer to the Locus instance:
import h5py
import numpy as np
with h5py.File('path/to/AgamP4_conservation.h5', mode='r+') as data_h5:
snp_density = data_h5[l.chromosome.split('_')[1]]['snp_density'][0,l.start-1:l.end]
phylop = data_h5[l.chromosome.split('_')[1]]['phyloP'][0,l.start-1:l.end]
cs = data_h5[l.chromosome.split('_')[1]]['Cs'][0,l.start-1:l.end]
l.add_layer('cs', layer_data=np.array(cs))
rank = l.rank_guides(layer_names=['mmej_sum_score', 'mmej_oof_score', 'azimuth_score', 'ot_sum_score', 'ot_cfd_score_mean', 'ot_cfd_score_max', 'ot_cfd_score_sum', 'cs'],\
layer_is_benefit=[True, True, True, False, True, True, True, True])
Ranking
You can rank the gRNAs in a Locus instance using the loc.rank_guides() method. The ranking is based on the different layers that are attached to each gRNA. You can specify which layers to use for the ranking and the weights of each layer. The default ranking will take into account all layers and will assign equal weights to each layer.
loc.rank_guides()
loc.guides(0).rank
3
loc.guides(0).rank_score
0.4596
rank informs the rank of the gRNA in the locus, where 1 is the highest ranking gRNA, while rank_score is the final ranking score of the gRNA based on the comparison of the gRNA to all other gRNAs in the locus. It can be used to compare its value relative to other gRNAs in the locus, rather than just to order in which gRNAs are ranked.
Export
gRNAs can be exported to a CSV file using methods loc.guides_to_bed(), loc.guides_to_csv(), loc.guides_detailed_table() and loc.guides_to_dataframe(). The first two methods will export the gRNAs to a BED file, to a CSV file, and to a TXT file with detailed information about each gRNA, respectively. loc.guides_to_dataframe() method will export the gRNAs to a pandas DataFrame. Additionaly, to transform the gRNAs from Guide instances to a dictionary, use the loc.guides_to_dict() method.