guido
Submodules
Package Contents
Classes
Functions
- Load a genome from a .guido file.
- Check whether name is on PATH and marked as executable.
- Create a locus from coordinates. Coordinates are 1-based.
- Create a locus from gene name.
- Create a locus from sequence.
- class guido.Genome(genome_name, genome_file_abspath=None, annotation_file_abspath=None, bowtie_index_abspath=None)[source]
- property is_built
Check if genome index is built.
- Returns:
- bool
True if genome index is built, False otherwise.
- property sequence
- build(n_threads=1, save_pickle=True, bowtie_ignore=False, bowtie_path='')[source]
Build genome index.
This method creates index files for the genome and annotation files and saves them in a .guido file in the same directory as the genome FASTA file. This file can later be loaded using guido.genome.load_genome_from_file() without having to rebuild the index.
- Parameters:
- n_threads: int, optional
Number of threads, by default 1
- save_pickle: bool, optional
Pickle the dictionary into a .guido file, by default True
- bowtie_ignore: bool, optional
Skip building the Bowtie index. Use this if the Bowtie index is already built, by default False
- bowtie_path: str, optional
Path to the Bowtie binary if it is not on the PATH, by default “”
- guido.load_genome_from_file(guido_file)[source]
Load a genome from a .guido file. This file is created when a genome is built with the build() method, and it contains all the information needed to use the genome. Guido files are saved in the same directory as the genome FASTA file. They are named after the genome name and have the .guido extension.
- Parameters:
- guido_file: str
Path to the .guido file.
- Returns:
- Genome
Genome object.
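The save-and-reload cycle described above can be illustrated with the standard library alone. This is a minimal sketch, not guido's implementation: save_index and load_index are hypothetical helpers and the dictionary contents are made up. It assumes only what the text states, namely that a pickled dictionary is written next to the FASTA file and named after the genome with a .guido extension.

```python
import pickle
import tempfile
from pathlib import Path


def save_index(genome_name, fasta_path, index):
    """Pickle `index` next to the FASTA file as <genome_name>.guido (hypothetical helper)."""
    guido_path = Path(fasta_path).with_name(f"{genome_name}.guido")
    with open(guido_path, "wb") as handle:
        pickle.dump(index, handle)
    return guido_path


def load_index(guido_path):
    """Read the pickled dictionary back from a .guido file (hypothetical helper)."""
    with open(guido_path, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    # Toy FASTA file standing in for a real reference genome
    fasta = Path(tmp) / "AgamP4.fa"
    fasta.write_text(">AgamP4_2R\nACGT\n")
    saved = save_index("AgamP4", fasta, {"genome_name": "AgamP4"})
    restored = load_index(saved)
    print(saved.name, restored)
```

Because the file is a pickle, it should only be loaded from a trusted source.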
- class guido.Guide(sequence, pam_position, pam_len, strand='+', max_flanking_length=75, cut_offset=3, chromosome='seq', start=0)[source]
- property location
Returns the location of the guide in the format: chr:start-end.
- Returns:
- str
String representation of gRNA location
- property off_targets_string
Returns a string representation of the off-targets.
The string representation captures the number of off-targets with a given number of mismatches: n0|n1|n2|n3|n4|n5 (total), where n0 is the number of off-targets with 0 mismatches, n1 is the number of off-targets with 1 mismatch, etc.
For example, if there are 3 off-targets with 0 mismatches, 2 with 1 mismatch, 1 with 2 mismatches, 0 with 3 mismatches, 5 with 4 mismatches and 1 with 5 mismatches, the string representation will be “3|2|1|0|5|1 (12)”. The total number of off-targets is given in parentheses.
- Returns:
- str
String representation of off-targets.
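The format can be reproduced in a few lines. The sketch below is only an illustration of the n0|n1|n2|n3|n4|n5 (total) layout; format_off_targets is a hypothetical helper, not part of guido, and it assumes the counts are already available as a list ordered by number of mismatches.

```python
def format_off_targets(counts):
    """Join per-mismatch counts with '|' and append their total in parentheses."""
    return "|".join(str(n) for n in counts) + f" ({sum(counts)})"


# Counts for 0, 1, 2, 3, 4 and 5 mismatches (made-up values)
print(format_off_targets([1, 0, 2, 4, 0, 3]))  # prints 1|0|2|4|0|3 (10)
```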
- property layers
- simulate_end_joining(n_patterns=5, length_weight=20)[source]
Simulate Microhomology-Mediated End Joining (MMEJ) events for the gRNA.
MMEJ scoring is based on the Bae et al. 2014 paper (https://doi.org/10.1038/nmeth.3015)
- Parameters:
- n_patterns: int, optional
Number of top-scoring MMEJ patterns to keep, by default 5
- length_weight: int, optional
Length weight, by default 20
- find_off_targets(genome, **kwargs)[source]
Finds off-targets for the guide. The off-targets are found using Bowtie. Bowtie index for the genome must be built before running this function.
- Parameters:
- genome: Genome
Genome object with the Bowtie index built
Notes
The off-targets are stored in the off_targets attribute. Based on the off-targets, the following layers are added to the guide:
ot_sum_score: sum of the off-target scores - the lower the better
ot_cfd_score_mean: mean of the CFD scores of the off-targets
ot_cfd_score_max: max CFD scores of the off-targets
ot_cfd_score_sum: sum CFD scores of the off-targets
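The three CFD layers are plain summaries of the per-off-target scores. As a rough sketch, assuming the CFD scores are available as a list of floats, they could be derived as follows. summarise_cfd is a hypothetical helper, and the handling of an empty list is an assumption rather than documented guido behaviour.

```python
from statistics import fmean


def summarise_cfd(cfd_scores):
    """Return the mean, max and sum of off-target CFD scores keyed by layer name."""
    if not cfd_scores:
        # Assumption: with no off-targets, every summary is reported as 0.0
        return {
            "ot_cfd_score_mean": 0.0,
            "ot_cfd_score_max": 0.0,
            "ot_cfd_score_sum": 0.0,
        }
    return {
        "ot_cfd_score_mean": fmean(cfd_scores),
        "ot_cfd_score_max": max(cfd_scores),
        "ot_cfd_score_sum": sum(cfd_scores),
    }


# Made-up CFD scores for three off-targets
layers = summarise_cfd([0.5, 0.25, 0.25])
print(layers)
```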
- add_layer(name, layer_data)[source]
Adds a layer with the data to the guide.
- Parameters:
- name: str
Name of the layer
- layer_data: float
Layer data.
- layer(key)[source]
Fetch a layer of the guide by its key.
- Parameters:
- key: _type_
_description_
- Returns:
- _type_
_description_
- Raises:
- ValueError
_description_
- add_azimuth_score(model_file='V3_model_nopos.pickle')[source]
Apply Azimuth score to a list of guides.
Azimuth is a machine-learning-based predictive model of CRISPR/Cas9 guide efficiency. It is sometimes referred to as the Doench 2016 score.
Described in https://doi.org/10.1038/nbt.3437 (Doench et al., 2016)
- Returns:
- float
Azimuth score.
- class guido.Locus(sequence, name=None, start=1, end=None, genome=None, annotation=None, **kwargs)[source]
- property layers
Layers of the locus.
- Returns:
- Layers
List of layers
- guide(ix)[source]
Fetch a guide from the locus by its index or name.
- Parameters:
- ix: str or int
Index or name of the gRNA.
- Returns:
- g: Guide
Guide object representing a gRNA
Examples
>>> import guido
>>> seq = "TTATCATCCACTCTGACGGGTGGTATTGCGCAACTCCACGCCATCAAACATGTTCAGATTATGCAATCGTGAGTATTCGTTGACCACCGCTTGACCTGTGT"
>>> loc = guido.Locus(
...     sequence=seq, name="AgamP4_2R", start=48714554, end=48714654
... )
>>> loc.find_guides()
>>> loc.guide("gRNA-1")
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
>>> loc.guide(0)
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
- find_guides(pam='NGG', min_flanking_length=0, selected_features='all')[source]
Find gRNAs in the locus.
- Parameters:
- pam: str, optional
gRNA PAM sequence, by default “NGG”
- min_flanking_length: int, optional
Defines the flanking region at the edges of the locus in which gRNAs are ignored. By default 0; however, simulate_end_joining() requires a flanking region of 75 bp to simulate MMEJ.
- selected_features: str, optional
Limit the gRNA search to the specified genomic features. Features are defined in the provided genome annotation file. By default “all”
- Returns:
- sorted_guides: list
List of gRNAs sorted by their position in the locus.
Examples
>>> import guido
>>> genome = guido.load_genome_from_file(
...     guido_file="/Users/nkranjc/imperial/ref/new/AgamP4.guido"
... )
>>> loc = guido.locus_from_coordinates(genome, "AgamP4_2R", 48714541, 48714666)
>>> loc.find_guides()
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
 gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
 gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
 gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
 gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
 gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|),
 gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|),
 gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)]
Searching for gRNAs in a specific genomic feature:
>>> loc.find_guides(selected_features="exon")
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
 gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
 gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
 gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
 gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
 gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|)]
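To make the PAM search itself concrete, here is a minimal pure-Python sketch of scanning both strands of a sequence for 20-nt protospacers followed by a PAM. It is not guido's implementation: scan_for_guides and its helpers are hypothetical, only the N wildcard is supported, and min_flanking_length and selected_features are ignored. The test sequence is the one used in the Locus.guide example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_pam(site, pam):
    """True if `site` matches `pam`, where N in the PAM matches any base."""
    return all(p in ("N", s) for p, s in zip(pam, site))


def scan_for_guides(seq, pam="NGG", guide_len=20):
    """Return (protospacer+PAM, 0-based start, strand) for every PAM hit on both strands."""
    seq = seq.upper()
    width = guide_len + len(pam)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Forward strand: the PAM sits at the 3' end of the window
        if matches_pam(window[guide_len:], pam):
            hits.append((window, start, "+"))
        # Reverse strand: test the reverse complement of the same window
        rc = reverse_complement(window)
        if matches_pam(rc[guide_len:], pam):
            hits.append((rc, start, "-"))
    return hits


seq = "TTATCATCCACTCTGACGGGTGGTATTGCGCAACTCCACGCCATCAAACATGTTCAGATTATGCAATCGTGAGTATTCGTTGACCACCGCTTGACCTGTGT"
hits = scan_for_guides(seq)
for guide, start, strand in hits:
    print(guide, start, strand)
```

The start values are 0-based offsets into the input string, so they need to be shifted by the locus start to obtain genomic coordinates.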
- simulate_end_joining(n_patterns=5, length_weight=20)[source]
Simulate end-joining and find MMEJ deletion patterns for each gRNA.
Microhomology scores are calculated using the scoring model proposed by Bae et al. 2014.
- Parameters:
- n_patterns: int, optional
Number of top-scoring MMEJ deletion patterns reported. By default 5.
- length_weight: int, optional
Length weight parameter used in MMEJ scoring as defined by Bae et al. 2014. By default 20.
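The role of length_weight is easiest to see in the score of a single deletion pattern. The sketch below assumes the commonly cited form of the Bae et al. pattern score, in which the deletion length is penalised exponentially and G/C bases in the microhomology count twice as much as A/T bases. mmej_pattern_score is a hypothetical helper; guido's own implementation may differ in details such as rounding.

```python
import math


def mmej_pattern_score(microhomology, deletion_length, length_weight=20):
    """Score one deletion pattern (assumed form of the Bae et al. pattern score)."""
    gc = sum(base in "GC" for base in microhomology.upper())
    at = len(microhomology) - gc
    # Longer deletions are penalised; a larger length_weight softens the penalty
    length_factor = math.exp(-deletion_length / length_weight)
    return 100 * length_factor * (at + 2 * gc)


# Same microhomology, two made-up deletion lengths
short_deletion = mmej_pattern_score("ATGC", deletion_length=10)
long_deletion = mmej_pattern_score("ATGC", deletion_length=30)
print(short_deletion, long_deletion)
```

Under this form, a shorter deletion or a more GC-rich microhomology yields a higher pattern score.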
- find_off_targets(external_genome=None, **kwargs)[source]
Find off-targets in the genome for each gRNA.
- Parameters:
- external_genome: Genome, optional
If provided, the off-target search is performed in the external genome rather than in the genome the Locus belongs to. By default None.
- _guide_sequence_diversity(guide, g, pos)[source]
Calculate sequence diversity for each region of the guide.
- _guide_alt_ac(guide, g, pos)[source]
Calculate alternative allele count for each region of the guide.
- _guide_n_variants(guide, g, pos)[source]
Calculate number of variants for each region of the guide.
- _apply_variation_layer_data(guides, layer_name, layer_genotype_data, layer_pos)[source]
Apply sequence diversity, alternative allele count and number of variants as layers.
- add_layer(name, layer_data, layer_pos=None, apply_to_guides=True, is_variation=False)[source]
Adds a layer with the data to the locus.
- Parameters:
- name: str
Name of the layer
- layer_data: np.ndarray
Layer data. Needs to be the same shape as the locus.
- apply_to_guides: bool, optional
Apply layer data to gRNAs when adding it to the locus. By default True.
Examples
>>> import numpy as np
>>> from guido import Locus
>>> locus = Locus("ACGT" * 25, name="chr1", start=100, end=199)
>>> layer_data = np.random.rand(100)
>>> locus.add_layer("random", layer_data)
- _prepare_alt_matrix(rank_layer_names, method=np.mean)[source]
Prepares numerical matrix with the gRNA layer data to be used later in the ranking.
- Parameters:
- rank_layer_names: list
List of layer names to be used in the ranking.
- method: callable, optional
Method to use to combine the layer data, by default np.mean
- Returns:
- np.ndarray
Matrix with the layer data for each gRNA.
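One way to picture this matrix is one row per gRNA and one column per layer, with each cell reducing the layer values under the guide to a single number. The sketch below is an assumption about that shape, not guido's code: build_layer_matrix is a hypothetical helper, guides are represented by 0-based half-open (start, end) offsets into the locus, and statistics.fmean stands in for np.mean.

```python
from statistics import fmean


def build_layer_matrix(guide_spans, layers, layer_names, method=fmean):
    """Return one row per guide and one column per layer, reducing each slice with `method`."""
    return [
        [method(layers[name][start:end]) for name in layer_names]
        for start, end in guide_spans
    ]


# Two made-up per-position layers over a 4-bp locus and two guide spans
layers = {"diversity": [1, 2, 3, 4], "n_variants": [0, 0, 10, 10]}
matrix = build_layer_matrix([(0, 2), (2, 4)], layers, ["diversity", "n_variants"])
print(matrix)  # prints [[1.5, 0.0], [3.5, 10.0]]
```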
- rank_guides(layer_names=None, layer_is_benefit=None, weight_vector=None, ranking_method='TOPSIS', norm_method='Vector')[source]
Ranks guides based on the layer data.
- Returns:
- list
List of ranked guides.
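The default ranking_method is TOPSIS with vector normalisation. As a rough illustration of that technique, and not of guido's implementation, the sketch below scores alternatives by their closeness to an ideal solution. topsis_scores is a hypothetical helper: its is_benefit and weights arguments play the roles of layer_is_benefit and weight_vector, and the equal default weights are an assumption.

```python
import math


def topsis_scores(matrix, is_benefit, weights=None):
    """Return TOPSIS closeness scores (higher is better) for each row of `matrix`."""
    n_cols = len(matrix[0])
    if weights is None:
        # Assumption: equal weights when none are given
        weights = [1.0 / n_cols] * n_cols
    # Vector normalisation: divide each column by its Euclidean norm
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    columns = list(zip(*weighted))
    # Ideal best and worst values depend on whether a column is a benefit or a cost
    best = [max(col) if is_benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if is_benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    return scores


# Made-up layer values: column 0 is a benefit (efficiency), column 1 a cost (off-targets)
guides = ["gRNA-1", "gRNA-2", "gRNA-3"]
matrix = [[0.9, 1.0], [0.5, 5.0], [0.7, 3.0]]
scores = topsis_scores(matrix, is_benefit=[True, False])
ranked = [g for _, g in sorted(zip(scores, guides), reverse=True)]
print(ranked)  # prints ['gRNA-1', 'gRNA-3', 'gRNA-2']
```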
- add_azimuth_score()[source]
Apply Azimuth score to a list of guides.
Azimuth is a machine-learning-based predictive model of CRISPR/Cas9 guide efficiency. It is sometimes referred to as the Doench 2016 score.
Described in https://doi.org/10.1038/nbt.3437 (Doench et al., 2016)
- guido.locus_from_coordinates(genome, chromosome, start, end)[source]
Create a locus from coordinates. Coordinates are 1-based. If annotation file is provided, it will be used to annotate the locus.
- Parameters:
- genome: Genome
Genome object. Can be created using Genome class.
- chromosome: str
Chromosome name.
- start: int
Start position.
- end: int
End position.
- Returns:
- Locus
Locus object.
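Because the coordinates are 1-based, they have to be shifted before slicing a Python string, which is 0-based. The sketch below shows that conversion under the assumption that the end coordinate is inclusive. extract_locus is a hypothetical helper that works on a plain string rather than a Genome object.

```python
def extract_locus(chromosome_seq, start, end):
    """Return the subsequence for 1-based coordinates, assuming `end` is inclusive."""
    if not 1 <= start <= end <= len(chromosome_seq):
        raise ValueError("coordinates fall outside the chromosome")
    # Shift the 1-based start to a 0-based index; the inclusive end needs no shift
    return chromosome_seq[start - 1:end]


print(extract_locus("ACGTACGT", 2, 4))  # prints CGT
```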
- guido.locus_from_gene(genome, gene_name)[source]
Create a locus from gene name. If annotation file is provided, it will be used to annotate the locus.
- Parameters:
- genome: Genome
Genome object. Can be created using Genome class.
- gene_name: str
Gene name. Needs to be present in the annotation file.
- Returns:
- Locus
Locus object.