guido

Submodules

Package Contents

Classes

Genome

Guide

Locus

Functions

load_genome_from_file(guido_file)

Load a genome from a .guido file. This file is created when a genome is built.

is_tool(name)

Check whether name is on PATH and marked as executable.

rev_comp(seq[, rna])

locus_from_coordinates(genome, chromosome, start, end)

Create a locus from coordinates. Coordinates are 1-based. If an annotation file is provided, it will be used to annotate the locus.

locus_from_gene(genome, gene_name)

Create a locus from a gene name. If an annotation file is provided, it will be used to annotate the locus.

locus_from_sequence(sequence[, sequence_name])

Create a locus from a sequence.

class guido.Genome(genome_name, genome_file_abspath=None, annotation_file_abspath=None, bowtie_index_abspath=None)
property is_built

Check if genome index is built.

Returns:
bool

True if genome index is built, False otherwise.

property sequence
build(n_threads=1, save_pickle=True, bowtie_ignore=False, bowtie_path='')

Build genome index.

This method creates index files for the genome and annotation files and saves them in a .guido file in the same directory as the genome FASTA file. This file can later be loaded using guido.genome.load_genome_from_file() without having to rebuild the index.

Parameters:
n_threads : int, optional

Number of threads, by default 1

save_pickle : bool, optional

Pickle the dictionary into a .guido file, by default True

bowtie_ignore : bool, optional

Skip building the Bowtie index. Use this if a Bowtie index has already been built, by default False

bowtie_path : str, optional

Path to the Bowtie binary if it is not on PATH, by default “”

guido.load_genome_from_file(guido_file)

Load a genome from a .guido file. This file is created by Genome.build() and contains all the information needed to use the genome. Guido files are saved in the same directory as the genome FASTA file; they are named after the genome and have the .guido extension.

Parameters:
guido_file : str

Path to the .guido file.

Returns:
Genome

Genome object.
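The save/load round trip described above (a pickled dictionary stored as <genome name>.guido next to the FASTA file) can be pictured with plain pickle. This is a minimal sketch under assumptions: the helper names guido_path_for, save_guido and load_guido are hypothetical, and the dictionary contents are placeholders rather than guido's actual .guido layout.

```python
import os
import pickle
import tempfile


def guido_path_for(genome_file_abspath: str, genome_name: str) -> str:
    """Hypothetical helper: <genome_name>.guido in the FASTA file's directory."""
    directory = os.path.dirname(genome_file_abspath)
    return os.path.join(directory, f"{genome_name}.guido")


def save_guido(data: dict, path: str) -> None:
    """Hypothetical helper: pickle a genome dictionary to `path`."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_guido(path: str) -> dict:
    """Hypothetical helper: restore a dictionary written by save_guido()."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "AgamP4.fa")
        path = guido_path_for(fasta, "AgamP4")
        # Placeholder contents only; not the real .guido structure.
        save_guido({"genome_name": "AgamP4"}, path)
        print(os.path.basename(path), load_guido(path))
```

Because unpickling can execute arbitrary code, .guido files (like any pickle) should only be loaded from trusted sources.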

class guido.Guide(sequence, pam_position, pam_len, strand='+', max_flanking_length=75, cut_offset=3, chromosome='seq', start=0)
property location

Returns the location of the guide in the format: chr:start-end.

Returns:
str

String representation of gRNA location

property off_targets_string

Returns a string representation of the off-targets.

The string representation captures the number of off-targets with a given number of mismatches: n0|n1|n2|n3|n4|n5 (total), where n0 is the number of off-targets with 0 mismatches, n1 is the number of off-targets with 1 mismatch, etc.

For example, if there are 3 off-targets with 0 mismatches, 2 with 1 mismatch, 1 with 2 mismatches, 0 with 3 mismatches, 5 with 4 mismatches and 1 with 5 mismatches, the string representation will be “3|2|1|0|5|1 (12)”. The total number of off-targets is given in parentheses.
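To make the format concrete, here is a small sketch that builds the n0|...|n5 (total) string from a list of per-off-target mismatch counts. The helpers mismatch_histogram and off_targets_string are hypothetical and assume off-targets are summarised as plain integers; guido's internal off-target records may look different.

```python
from collections import Counter
from typing import Iterable, List


def mismatch_histogram(mismatches: Iterable[int], max_mismatches: int = 5) -> List[int]:
    """Count off-targets per mismatch number: [n0, n1, ..., n_max]."""
    counter = Counter(mismatches)
    return [counter.get(k, 0) for k in range(max_mismatches + 1)]


def off_targets_string(counts: List[int]) -> str:
    """Format counts as 'n0|n1|...|n5 (total)'; the total is the sum of the counts."""
    return "|".join(str(n) for n in counts) + f" ({sum(counts)})"


# One entry per off-target, giving its number of mismatches.
mismatches = [0, 0, 0, 1, 1, 2, 4, 4, 4, 4, 4, 5]
print(off_targets_string(mismatch_histogram(mismatches)))  # 3|2|1|0|5|1 (12)
```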

Returns:
str

String representation of off-targets.

property layers
__repr__()

Return repr(self).

__getattr__(attr)
_create_mmej_oof_string(mmej_patterns)
simulate_end_joining(n_patterns=5, length_weight=20)

Simulate Microhomology-Mediated End Joining (MMEJ) events for the gRNA.

MMEJ scoring is based on the Bae et al. 2014 paper (https://doi.org/10.1038/nmeth.3015)

Parameters:
n_patterns : int, optional

Number of top-scoring MMEJ patterns to keep, by default 5

length_weight : int, optional

Length weight used in MMEJ scoring, by default 20
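The role of length_weight and n_patterns can be illustrated with the pattern score from the Bae et al. 2014 model as we understand it: 100 × exp(−deletion length / length_weight) × (A/T bases + 2 × G/C bases in the microhomology). This is a hedged sketch, not guido's code; the original model also rounds the length factor to three decimals (omitted here), and the function names are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def mmej_pattern_score(microhomology: str, deletion_length: int, length_weight: float = 20.0) -> float:
    """Score one MMEJ deletion pattern (sketch of the Bae et al. 2014 model)."""
    # Longer deletions are penalised exponentially; length_weight sets the decay rate.
    length_factor = math.exp(-deletion_length / length_weight)
    # G/C bases in the microhomology count twice, A/T bases once.
    gc = sum(1 for base in microhomology.upper() if base in "GC")
    at = len(microhomology) - gc
    return 100 * length_factor * (at + 2 * gc)


def is_out_of_frame(deletion_length: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deletion_length % 3 != 0


def top_patterns(patterns: Iterable[Tuple[str, int]], n_patterns: int = 5,
                 length_weight: float = 20.0) -> List[Tuple[float, str, int]]:
    """Keep the n_patterns highest-scoring (microhomology, deletion_length) pairs."""
    scored = [(mmej_pattern_score(mh, dl, length_weight), mh, dl) for mh, dl in patterns]
    return sorted(scored, reverse=True)[:n_patterns]


print(top_patterns([("GC", 10), ("AT", 0), ("A", 30)], n_patterns=2))
```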

find_off_targets(genome, **kwargs)

Finds off-targets for the guide using Bowtie. A Bowtie index for the genome must be built before running this function.

Parameters:
genome : Genome

Genome object with the Bowtie index built

Notes

The off-targets are stored in the off_targets attribute. Based on the off-targets, the following layers are added to the guide:

  • ot_sum_score: sum of the off-target scores - the lower the better

  • ot_cfd_score_mean: mean of the CFD scores of the off-targets

  • ot_cfd_score_max: maximum CFD score of the off-targets

  • ot_cfd_score_sum: sum of the CFD scores of the off-targets
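The three CFD layers listed above are simple aggregates of per-off-target CFD scores. A minimal sketch, assuming the CFD scores are already available as a list of floats; returning zeros when there are no off-targets is a hypothetical choice, and ot_sum_score is left out because its per-off-target score is not described here.

```python
from statistics import mean
from typing import Dict, List


def cfd_layers(cfd_scores: List[float]) -> Dict[str, float]:
    """Aggregate per-off-target CFD scores into mean, max and sum layers."""
    if not cfd_scores:
        # Hypothetical convention for a guide with no off-targets.
        return {"ot_cfd_score_mean": 0.0, "ot_cfd_score_max": 0.0, "ot_cfd_score_sum": 0.0}
    return {
        "ot_cfd_score_mean": mean(cfd_scores),
        "ot_cfd_score_max": max(cfd_scores),
        "ot_cfd_score_sum": sum(cfd_scores),
    }


print(cfd_layers([0.5, 0.25, 0.25]))
```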

add_layer(name, layer_data)

Add a layer to the guide.

Parameters:
name : str

Name of the layer.

layer_data : float

Layer data.

layer(key)

Return the data of a layer.

Parameters:
key : str

Name of the layer.

Returns:
float

Layer data.

Raises:
ValueError

add_azimuth_score(model_file='V3_model_nopos.pickle')

Apply the Azimuth score to the guide.

Azimuth is a machine-learning model that predicts CRISPR/Cas9 guide efficiency. It is sometimes referred to as the Doench 2016 score.

Described in https://doi.org/10.1038/nbt.3437 (Doench et al., 2016)

Returns:
float

Azimuth score.

guido.is_tool(name)

Check whether name is on PATH and marked as executable.
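One common way to implement this check in Python is shutil.which(), which only returns a path for files found on PATH that are executable. This is a sketch of the idea; guido's own implementation may differ.

```python
from shutil import which


def is_tool(name: str) -> bool:
    """Return True if `name` is found on PATH and is executable."""
    # shutil.which() returns None when no executable with this name is on PATH.
    return which(name) is not None


# For example, check for Bowtie before building a genome index.
print(is_tool("bowtie"))
```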

guido.rev_comp(seq, rna=False)
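No description is given for rev_comp. Judging by its name, it returns a reverse complement; the sketch below shows that operation with an rna flag. Treating rna=True as "emit U instead of T" is an assumption, as is the N handling.

```python
def rev_comp(seq: str, rna: bool = False) -> str:
    """Reverse complement of a DNA sequence (sketch; `rna` semantics assumed)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A", "N": "N"}
    # Complement each base while walking the sequence backwards; unknown bases raise KeyError.
    out = "".join(complement[base] for base in reversed(seq.upper()))
    # Assumption: rna=True means the result uses U in place of T.
    return out.replace("T", "U") if rna else out


print(rev_comp("ATGC"))  # GCAT
```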
class guido.Locus(sequence, name=None, start=1, end=None, genome=None, annotation=None, **kwargs)
property layers

Layers of the locus.

Returns:
Layers

List of layers

__repr__()

Returns a string representation of the locus object.

to_dict()

Converts the locus object to a dictionary.

guide(ix)

Fetch a guide from the locus by its index or name.

Parameters:
ix : str or int

Index or name of the gRNA.

Returns:
g: Guide

Guide object representing a gRNA

Examples

>>> import guido
>>> seq = "TTATCATCCACTCTGACGGGTGGTATTGCGCAACTCCACGCCATCAAACATGTTCAGATTATGCAATCGTGAGTATTCGTTGACCACCGCTTGACCTGTGT"
>>> loc = guido.Locus(
...     sequence=seq, name="AgamP4_2R", start=48714554, end=48714654
... )
>>> loc.find_guides()
>>> loc.guide("gRNA-1")
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
>>> loc.guide(0)
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
_flatten_intervals(intervals)

Flattens overlapping intervals into a union.
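Flattening overlapping intervals into their union is the classic sort-and-merge algorithm. A minimal sketch, assuming (start, end) tuples with inclusive ends, so intervals that touch (end == next start) are merged; whether guido also merges merely adjacent intervals is not stated here.

```python
from typing import Iterable, List, Tuple


def flatten_intervals(intervals: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) intervals into their union."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


print(flatten_intervals([(5, 10), (1, 3), (8, 12), (12, 15)]))  # [(1, 3), (5, 15)]
```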

_find_guides_in_interval(sequence, start, pam)

Finds guides in an interval.
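For the default PAM "NGG", finding guides amounts to scanning both strands for a 20-nt spacer followed by NGG (on the reverse strand this appears as CCN followed by the spacer on the forward strand). The sketch below hard-codes NGG and a 20-nt spacer, which are assumptions; it does not handle general IUPAC PAM strings as the pam argument may.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def _rev_comp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_guides(sequence: str, spacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """Return (strand, 0-based forward-strand offset, guide + PAM) for NGG sites."""
    seq = sequence.upper()
    hits = []
    # Forward strand: spacer then NGG. The lookahead lets matches overlap.
    for m in re.finditer(rf"(?=([ACGT]{{{spacer_len}}}[ACGT]GG))", seq):
        hits.append(("+", m.start(), m.group(1)))
    # Reverse strand: CCN then spacer on the forward strand; report it 5'->3'.
    for m in re.finditer(rf"(?=(CC[ACGT][ACGT]{{{spacer_len}}}))", seq):
        hits.append(("-", m.start(), _rev_comp(m.group(1))))
    return sorted(hits, key=lambda hit: hit[1])


seq = "TTATCATCCACTCTGACGGGTGGTATTGCGCAACTCCACGCCATCAAACATGTTCAGATTATGCAATCGTGAGTATTCGTTGACCACCGCTTGACCTGTGT"
for hit in find_ngg_guides(seq)[:3]:
    print(hit)
```

Adding the locus start to the 0-based offset gives a 1-based coordinate: for the sequence from the guide() example (start=48714554), the reverse-strand hit at offset 7 lands at 48714561.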

find_guides(pam='NGG', min_flanking_length=0, selected_features='all')

Find gRNAs in the locus.

Parameters:
pam : str, optional

gRNA PAM sequence, by default “NGG”

min_flanking_length : int, optional

Defines the flanking region of the locus in which gRNAs are ignored. By default 0; however, simulate_end_joining() requires a flanking region of 75 bp to simulate MMEJ.

selected_features : str, optional

Limit the gRNA search to the specified genomic features. Features are defined in the provided genome annotation file. By default “all”

Returns:
sorted_guides : list

List of gRNAs sorted by their position in the locus.

Examples

>>> import guido
>>> genome = guido.load_genome_from_file(
...     guido_file="path/to/AgamP4.guido"
... )
>>> loc = guido.locus_from_coordinates(genome, "AgamP4_2R", 48714541, 48714666)
>>> loc.find_guides()
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|),
gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|),
gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)]

Searching for gRNAs in a specific genomic feature:

>>> loc.find_guides(selected_features="exon")
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|)]
simulate_end_joining(n_patterns=5, length_weight=20)

Simulate end joining and find MMEJ deletion patterns for each gRNA.

Microhomology scores are calculated using the scoring model proposed by Bae et al. 2014.

Parameters:
n_patterns : int, optional

Number of top-scoring MMEJ deletion patterns reported. By default 5.

length_weight : int, optional

Length weight parameter used in MMEJ scoring, as defined by Bae et al. 2014. By default 20.

find_off_targets(external_genome=None, **kwargs)

Find off-targets in the genome for each gRNA.

Parameters:
external_genome : Genome, optional

If provided, the off-target search is performed in the external genome rather than in the genome the Locus belongs to. By default None.

_apply_clipped_layer_data(guides, layer_name, layer_data)

Apply layer data to guides.

_get_guide_regions(guide)
_guide_sequence_diversity(guide, g, pos)

Calculate sequence diversity for each region of the guide.

_guide_alt_ac(guide, g, pos)

Calculate alternative allele count for each region of the guide.

_guide_n_variants(guide, g, pos)

Calculate number of variants for each region of the guide.

_apply_variation_layer_data(guides, layer_name, layer_genotype_data, layer_pos)

Apply sequence diversity, alternative allele count and number of variants as layers.
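The three variation layers can be sketched from per-site allele counts. This is an illustration under assumptions: allele index 0 is the reference, "sequence diversity" is taken as the summed per-site mean pairwise difference (the probability that two haplotypes drawn without replacement differ), and the function names are hypothetical; guido's exact definitions are not given here.

```python
from typing import Dict, List


def mean_pairwise_difference(allele_counts: List[int]) -> float:
    """Probability that two haplotypes sampled without replacement differ at one site."""
    n = sum(allele_counts)
    if n < 2:
        return 0.0
    n_pairs = n * (n - 1) / 2
    n_same = sum(c * (c - 1) / 2 for c in allele_counts)
    return (n_pairs - n_same) / n_pairs


def region_variation(allele_counts_per_site: List[List[int]]) -> Dict[str, float]:
    """Summarise a guide region; index 0 of each site's counts is the reference allele."""
    diversity = sum(mean_pairwise_difference(ac) for ac in allele_counts_per_site)
    alt_ac = sum(sum(ac[1:]) for ac in allele_counts_per_site)
    n_variants = sum(1 for ac in allele_counts_per_site if sum(ac[1:]) > 0)
    return {"diversity": diversity, "alt_ac": alt_ac, "n_variants": n_variants}


# Three sites, four haplotypes each: invariant, 2/2 split, 3/1 split.
print(region_variation([[4, 0], [2, 2], [3, 1]]))
```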

add_layer(name, layer_data, layer_pos=None, apply_to_guides=True, is_variation=False)

Add a data layer to the locus.

Parameters:
name : str

Name of the layer

layer_data : np.ndarray

Layer data. Must have the same length as the locus.

apply_to_guides : bool, optional

Apply layer data to gRNAs when adding it to the locus. By default True.

Examples

>>> import numpy as np
>>> import guido
>>> locus = guido.Locus(sequence="ACGT" * 25, name="chr1", start=101, end=200)
>>> layer_data = np.random.rand(100)
>>> locus.add_layer("random", layer_data)
_guide_layers()

Returns a list of all the layers that are present in the gRNAs.

_prepare_alt_matrix(rank_layer_names, method=np.mean)

Prepares a numerical matrix with the gRNA layer data to be used later in the ranking.

Parameters:
rank_layer_names : list

List of layer names to be used in the ranking.

method : callable, optional

Method used to combine the layer data, by default np.mean

Returns:
np.ndarray

Matrix with the layer data for each gRNA.

rank_guides(layer_names=None, layer_is_benefit=None, weight_vector=None, ranking_method='TOPSIS', norm_method='Vector')

Ranks guides based on the layer data.

Returns:
list

List of ranked guides.
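rank_guides() defaults to ranking_method='TOPSIS' with norm_method='Vector'. To show what that combination does, here is a textbook TOPSIS sketch with vector normalization in pure Python. It is not guido's implementation; the layout (one row per gRNA, one column per layer, is_benefit playing the role of layer_is_benefit, weights the role of weight_vector) is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def topsis(matrix: List[List[float]], weights: Sequence[float],
           is_benefit: Sequence[bool]) -> Tuple[List[int], List[float]]:
    """Rank rows by TOPSIS closeness to the ideal; returns (row indices best-first, scores)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # 1. Vector normalization: divide each column by its Euclidean norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)]
    # 2. Apply the criterion weights.
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 3. Ideal best/worst: max is best for benefit criteria, min is best for cost criteria.
    cols = [[r[j] for r in v] for j in range(n_cols)]
    best = [max(c) if is_benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if is_benefit[j] else max(c) for j, c in enumerate(cols)]
    # 4. Closeness = distance to worst / (distance to best + distance to worst).
    scores = []
    for r in v:
        d_best, d_worst = math.dist(r, best), math.dist(r, worst)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.0)
    ranking = sorted(range(n_rows), key=lambda i: scores[i], reverse=True)
    return ranking, scores


# Columns: an efficiency score (higher is better) and an off-target count (lower is better).
matrix = [[0.9, 1.0], [0.5, 5.0], [0.7, 0.0]]
ranking, scores = topsis(matrix, weights=[1.0, 1.0], is_benefit=[True, False])
print(ranking)  # [2, 0, 1]
```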

guides_to_dataframe()

Returns gRNAs as a pandas DataFrame.

guides_to_csv(filename)

Save gRNAs to a CSV file.

guides_to_bed(filename)

Save gRNAs to a BED file.
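BED uses 0-based, half-open coordinates, while gRNA locations in this documentation are shown as chr:start-end in 1-based coordinates. A minimal sketch of the conversion for one BED6 line, assuming 1-based inclusive input; the helper name is hypothetical and guido's exact column choices are not documented here.

```python
def guide_to_bed_line(chrom: str, start_1based: int, end_1based: int,
                      name: str, strand: str, score: int = 0) -> str:
    """Convert a 1-based, inclusive interval into a tab-separated BED6 line."""
    # BED start is 0-based (subtract 1); a 1-based inclusive end equals the 0-based exclusive end.
    fields = [chrom, str(start_1based - 1), str(end_1based), name, str(score), strand]
    return "\t".join(fields)


print(guide_to_bed_line("AgamP4_2R", 48714550, 48714572, "gRNA-1", "+"))
```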

guides_detailed_table(filename)

Save gRNAs to a detailed text file.

add_azimuth_score()

Apply the Azimuth score to a list of guides.

Azimuth is a machine-learning model that predicts CRISPR/Cas9 guide efficiency. It is sometimes referred to as the Doench 2016 score.

Described in https://doi.org/10.1038/nbt.3437 (Doench et al., 2016)

guido.locus_from_coordinates(genome, chromosome, start, end)

Create a locus from coordinates. Coordinates are 1-based. If an annotation file is provided, it will be used to annotate the locus.

Parameters:
genome : Genome

Genome object, created with the Genome class.

chromosome : str

Chromosome name.

start : int

Start position.

end : int

End position.

Returns:
Locus

Locus object.
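Because the coordinates are 1-based, extracting the locus sequence from a Python string needs an offset. A minimal sketch, assuming start and end are both inclusive (the guide() example's 101-bp sequence spanning start=48714554 to end=48714654 is consistent with that); slice_1based is a hypothetical helper, not part of guido.

```python
def slice_1based(sequence: str, start: int, end: int) -> str:
    """Return the bases from start to end using 1-based, inclusive coordinates."""
    if start < 1 or end < start or end > len(sequence):
        raise ValueError("coordinates out of range")
    # Shift start to 0-based; the inclusive 1-based end works as an exclusive slice bound.
    return sequence[start - 1:end]


print(slice_1based("ACGTACGT", 2, 4))  # CGT
```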

guido.locus_from_gene(genome, gene_name)

Create a locus from a gene name. If an annotation file is provided, it will be used to annotate the locus.

Parameters:
genome : Genome

Genome object, created with the Genome class.

gene_name : str

Gene name. Must be present in the annotation file.

Returns:
Locus

Locus object.

guido.locus_from_sequence(sequence, sequence_name=None)

Create a locus from a sequence.

Parameters:
sequence : str

DNA sequence

sequence_name : str, optional

Sequence name, by default None

Returns:
Locus

Locus object representing the given sequence.