guido.locus

Module Contents

Classes

Locus

Functions

_prepare_annotation(annotation_file_abspath[, as_df])

Prepare annotation file for use with pyranges.

locus_from_coordinates(genome, chromosome, start, end)

Create a locus from coordinates. Coordinates are 1-based. If annotation file is provided, it will be used to annotate the locus.

locus_from_sequence(sequence[, sequence_name])

Create a locus from sequence.

locus_from_gene(genome, gene_name)

Create a locus from gene name. If annotation file is provided, it will be used to annotate the locus.

class guido.locus.Locus(sequence, name=None, start=1, end=None, genome=None, annotation=None, **kwargs)[source]

property layers[source]

Layers of the locus.

Returns:
Layers

List of layers

__repr__()[source]

Returns a string representation of the locus object.

to_dict()[source]

Converts the locus object to a dictionary.

guide(ix)[source]

Fetch a guide from the locus by its index or name.

Parameters:
ix : str or int

Index (int) or name (str) of the gRNA.

Returns:
g : Guide

Guide object representing a gRNA

Examples

>>> import guido
>>> seq = "TTATCATCCACTCTGACGGGTGGTATTGCGCAACTCCACGCCATCAAACATGTTCAGATTATGCAATCGTGAGTATTCGTTGACCACCGCTTGACCTGTGT"
>>> loc = guido.Locus(
...     sequence=seq, name="AgamP4_2R", start=48714554, end=48714654
... )
>>> loc.find_guides()
>>> loc.guide("gRNA-1")
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
>>> loc.guide(0)
gRNA-1(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714584|-|)
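
The dual lookup shown above (by position or by name) can be sketched as follows. This is only an illustration of the idea, not guido's implementation: `fetch_guide` and the dictionary-based guide records are hypothetical stand-ins for the real `Guide` objects.

```python
from typing import Union


def fetch_guide(guides: list, ix: Union[int, str]) -> dict:
    """Return a guide by integer position or by name (hypothetical helper)."""
    if isinstance(ix, int):
        # Integer: treat as a 0-based position in the guide list.
        return guides[ix]
    # String: scan for a guide whose name matches.
    for guide in guides:
        if guide["name"] == ix:
            return guide
    raise KeyError(f"No guide named {ix!r}")


# Hypothetical guide records; the real library returns Guide objects.
guides = [
    {"name": "gRNA-1", "sequence": "CGCAATACCACCCGTCAGAGTGG"},
    {"name": "gRNA-2", "sequence": "TCTGAACATGTTTGATGGCGTGG"},
]
```

With this sketch, `fetch_guide(guides, 0)` and `fetch_guide(guides, "gRNA-1")` return the same record, mirroring the two calls in the example.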

_flatten_intervals(intervals)[source]

Flattens overlapping intervals into a union.
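
Flattening overlapping intervals is a standard sort-and-merge operation. A minimal sketch, assuming intervals are `(start, end)` tuples with inclusive bounds; the function name and the interval representation are assumptions, and the library's private method may differ:

```python
def flatten_intervals(intervals):
    """Merge overlapping (start, end) intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # No overlap: start a new interval.
            merged.append([start, end])
    return [tuple(interval) for interval in merged]
```

For example, `[(5, 10), (1, 3), (2, 6), (12, 15)]` collapses to `[(1, 10), (12, 15)]`.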

_find_guides_in_interval(sequence, start, pam)[source]

Finds guides in interval.

find_guides(pam='NGG', min_flanking_length=0, selected_features='all')[source]

Find gRNAs in the locus.

Parameters:
pam : str, optional

gRNA PAM sequence, by default “NGG”.

min_flanking_length : int, optional

Length of the flanking region of the locus in which gRNAs are ignored. By default 0; however, simulate_end_joining() requires a flanking region of 75 bp to simulate MMEJ.

selected_features : str, optional

Restrict the gRNA search to the specified genomic features. Features are defined in the provided genome annotation file. By default “all”.

Returns:
sorted_guides : list

List of gRNAs sorted by their position in the locus.

Examples

>>> import guido
>>> genome = guido.load_genome_from_file(
...     guido_file="/Users/nkranjc/imperial/ref/new/AgamP4.guido"
... )
>>> loc = guido.locus_from_coordinates(genome, "AgamP4_2R", 48714541, 48714666)
>>> loc.find_guides()
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|),
gRNA-7(GTTTAACACAGGTCAAGCGGTGG|AgamP4_2R:48714637-48714659|-|),
gRNA-8(TATGTTTAACACAGGTCAAGCGG|AgamP4_2R:48714640-48714662|-|)]

Searching for gRNAs in a specific genomic feature:

>>> loc.find_guides(selected_features="exon")
>>> loc.guides
[gRNA-1(AAGTTTATCATCCACTCTGACGG|AgamP4_2R:48714550-48714572|+|),
gRNA-2(CGCAATACCACCCGTCAGAGTGG|AgamP4_2R:48714561-48714583|-|),
gRNA-3(AGTTTATCATCCACTCTGACGGG|AgamP4_2R:48714551-48714573|+|),
gRNA-4(TTATCATCCACTCTGACGGGTGG|AgamP4_2R:48714554-48714576|+|),
gRNA-5(TCTGAACATGTTTGATGGCGTGG|AgamP4_2R:48714589-48714611|-|),
gRNA-6(CATAATCTGAACATGTTTGATGG|AgamP4_2R:48714594-48714616|-|)]
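
To illustrate what a PAM search on both strands involves, here is a minimal regex-based sketch. It is not guido's implementation: `find_pam_sites`, the 20-nt protospacer length, and the partial IUPAC table are assumptions made for the example. Start positions are 0-based offsets on the forward strand.

```python
import re

# Partial IUPAC table; extend it if other ambiguity codes are needed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_pam_sites(sequence, pam="NGG", protospacer_len=20):
    """Return (guide_with_pam, forward_start, strand) tuples sorted by start."""
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}}{pam_re}))")
    seq = sequence.upper()
    total = protospacer_len + len(pam)

    hits = [(m.group(1), m.start(), "+") for m in pattern.finditer(seq)]
    # Search the reverse complement and map starts back to the forward strand.
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((m.group(1), len(seq) - m.start() - total, "-"))
    return sorted(hits, key=lambda hit: hit[1])
```

Applied to the first 30 bases of the sequence used in the `guide()` example, the sketch reports `TTATCATCCACTCTGACGGGTGG` on the `+` strand at offset 0 and `CGCAATACCACCCGTCAGAGTGG` on the `-` strand at offset 7.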

simulate_end_joining(n_patterns=5, length_weight=20)[source]

Simulate end-joining and find MMEJ deletion patterns for each gRNA.

Microhomology scores are calculated using the scoring model proposed by Bae et al. (2014).

Parameters:
n_patterns : int, optional

Number of top-scoring MMEJ deletion patterns reported. By default 5.

length_weight : int, optional

Length weight parameter used in MMEJ scoring, as defined by Bae et al. (2014). By default 20.
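
As a rough illustration of how `length_weight` enters the scoring, the sketch below follows a common reading of the Bae et al. pattern score: longer deletions are down-weighted exponentially, and G/C bases in the microhomology count double. The function name and the exact constants are assumptions that should be checked against the paper and the library source.

```python
import math


def pattern_score(microhomology, deletion_length, length_weight=20.0):
    """Score one MMEJ deletion pattern (illustrative, unverified formula)."""
    gc = sum(base in "GC" for base in microhomology.upper())
    at = len(microhomology) - gc
    # Exponential penalty on deletion length, scaled by length_weight.
    length_factor = math.exp(-deletion_length / length_weight)
    return 100.0 * length_factor * (at + 2 * gc)
```

Under this reading, a larger `length_weight` makes the score less sensitive to deletion length.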

find_off_targets(external_genome=None, **kwargs)[source]

Find off-targets in the genome for each gRNA.

Parameters:
external_genome : Genome, optional

If provided, the off-target search is performed in the external genome rather than in the genome the locus belongs to. By default None.
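
The documentation does not say how the off-target search is performed. Conceptually, an off-target is a genomic site that matches the guide with a small number of mismatches. The brute-force sketch below illustrates that idea only; `naive_off_targets` and `max_mismatches` are hypothetical, only the forward strand is scanned, and real tools rely on indexed aligners.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_off_targets(guide, genome_seq, max_mismatches=3):
    """Return (offset, site, mismatches) for every near-match on the forward strand."""
    n = len(guide)
    hits = []
    for i in range(len(genome_seq) - n + 1):
        site = genome_seq[i:i + n]
        mismatches = hamming(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits
```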

_apply_clipped_layer_data(guides, layer_name, layer_data)[source]

Apply layer data to guides.

_get_guide_regions(guide)[source]

_guide_sequence_diversity(guide, g, pos)[source]

Calculate sequence diversity for each region of the guide.

_guide_alt_ac(guide, g, pos)[source]

Calculate alternative allele count for each region of the guide.

_guide_n_variants(guide, g, pos)[source]

Calculate number of variants for each region of the guide.

_apply_variation_layer_data(guides, layer_name, layer_genotype_data, layer_pos)[source]

Apply sequence diversity, alternative allele count and number of variants as layers.
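
Two of these per-region statistics can be sketched with plain NumPy. The sketch assumes a genotype array of shape `(n_variants, n_samples, ploidy)` in which 0 is the reference allele, together with a matching array of variant positions; these layouts, and the name `region_variation`, are assumptions rather than the library's documented interface.

```python
import numpy as np


def region_variation(genotypes, positions, start, end):
    """Alternative allele count and number of variant sites in [start, end]."""
    in_region = (positions >= start) & (positions <= end)
    # Count non-reference alleles per site across samples and ploidy.
    alt_per_site = (genotypes[in_region] > 0).sum(axis=(1, 2))
    return {
        "alt_ac": int(alt_per_site.sum()),
        "n_variants": int((alt_per_site > 0).sum()),
    }


genotypes = np.array([
    [[0, 0], [0, 1]],
    [[1, 1], [0, 2]],
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
])
positions = np.array([10, 12, 15, 30])
stats = region_variation(genotypes, positions, 10, 20)
```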

add_layer(name, layer_data, layer_pos=None, apply_to_guides=True, is_variation=False)[source]

Adds a layer with the data to the locus.

Parameters:
name : str

Name of the layer.

layer_data : np.ndarray

Layer data. Must have the same shape as the locus.

apply_to_guides : bool, optional

Apply layer data to gRNAs when adding it to the locus. By default True.

Examples

>>> import numpy as np
>>> import guido
>>> locus = guido.Locus(sequence="ACGT" * 25, name="chr1", start=100, end=199)
>>> layer_data = np.random.rand(100)
>>> locus.add_layer("random", layer_data)
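
Because a layer holds one value per base of the locus, applying it to a guide amounts to slicing out the guide's span. A minimal sketch, assuming 1-based inclusive genomic coordinates; `clip_layer_to_guide` is a hypothetical helper, not part of the library:

```python
import numpy as np


def clip_layer_to_guide(layer_data, locus_start, guide_start, guide_end):
    """Slice a per-base layer down to the bases covered by one guide."""
    lo = guide_start - locus_start        # 0-based offset of the first base
    hi = guide_end - locus_start + 1      # exclusive end, since guide_end is inclusive
    return layer_data[lo:hi]


locus_start, locus_length = 100, 100
layer = np.arange(locus_length, dtype=float)   # one value per base of the locus
clipped = clip_layer_to_guide(layer, locus_start, 110, 132)
```

Here `clipped` holds the 23 values that cover positions 110 to 132.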

_guide_layers()[source]

Returns a list of all the layers that are present in the gRNAs.

_prepare_alt_matrix(rank_layer_names, method=np.mean)[source]

Prepares numerical matrix with the gRNA layer data to be used later in the ranking.

Parameters:
rank_layer_names : list

List of layer names to be used in the ranking.

method : callable, optional

Function used to combine the layer data, by default np.mean.

Returns:
np.ndarray

Matrix with the layer data for each gRNA.
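
One way to picture the resulting matrix: each row is a gRNA, each column is a layer, and each cell is the layer's per-base values over that gRNA reduced with `method`. The sketch below assumes each guide's layers are available as a dictionary of arrays, which is an assumption about the internal representation.

```python
import numpy as np


def prepare_alt_matrix(guide_layers, rank_layer_names, method=np.mean):
    """Build a (n_guides, n_layers) matrix by reducing each layer per guide."""
    return np.array(
        [[method(layers[name]) for name in rank_layer_names]
         for layers in guide_layers],
        dtype=float,
    )


# Hypothetical per-guide layer data.
guide_layers = [
    {"a": np.array([1.0, 3.0]), "b": np.array([0.0, 0.0])},
    {"a": np.array([2.0, 2.0]), "b": np.array([4.0, 6.0])},
]
matrix = prepare_alt_matrix(guide_layers, ["a", "b"])
```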

rank_guides(layer_names=None, layer_is_benefit=None, weight_vector=None, ranking_method='TOPSIS', norm_method='Vector')[source]

Ranks guides based on the layer data.

Returns:
list

List of ranked guides.
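
The defaults name TOPSIS with vector normalization. The sketch below shows the textbook form of that method in NumPy so the roles of `layer_is_benefit` and `weight_vector` are easier to follow. It is not the library's code, and the function and argument names are assumptions.

```python
import numpy as np


def topsis(matrix, weights, is_benefit):
    """Return closeness scores in [0, 1]; higher means closer to the ideal."""
    m = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(is_benefit, dtype=bool)

    # Vector normalization: divide each column by its Euclidean norm.
    norm = np.sqrt((m ** 2).sum(axis=0))
    norm[norm == 0] = 1.0
    v = m / norm * w

    # Ideal and anti-ideal points depend on whether a criterion is a benefit or a cost.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))

    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))
    denom = d_pos + d_neg
    denom[denom == 0] = 1.0  # all alternatives identical
    return d_neg / denom
```

Sorting guides by descending closeness score gives the ranking; a criterion flagged as a cost (for example, a count of off-targets) is minimized rather than maximized.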

guides_to_dataframe()[source]

Returns the gRNAs as a pandas DataFrame.

guides_to_csv(filename)[source]

Saves the gRNAs to a CSV file.

guides_to_bed(filename)[source]

Saves the gRNAs to a BED file.
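
BED uses 0-based, half-open intervals, whereas the coordinates elsewhere on this page are 1-based. If guide coordinates are 1-based and inclusive (an assumption, as is the column layout used here), the conversion subtracts 1 from the start only:

```python
def guide_to_bed_line(chrom, start, end, name, strand, score=0):
    """Format one guide as a 6-column BED line from 1-based inclusive coordinates."""
    # BED chromStart is 0-based; chromEnd is exclusive, so it equals the 1-based end.
    fields = [chrom, start - 1, end, name, score, strand]
    return "\t".join(str(field) for field in fields)


line = guide_to_bed_line("AgamP4_2R", 48714550, 48714572, "gRNA-1", "+")
```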

guides_detailed_table(filename)[source]

Saves the gRNAs to a detailed text file.

add_azimuth_score()[source]

Apply Azimuth score to a list of guides.

Azimuth is a machine-learning-based predictive model of CRISPR/Cas9 guide efficiency. It is sometimes referred to as the Doench 2016 score.

Described in https://doi.org/10.1038/nbt.3437 (Doench et al., 2016)

guido.locus._prepare_annotation(annotation_file_abspath, as_df=True)[source]

Prepare annotation file for use with pyranges.

guido.locus.locus_from_coordinates(genome, chromosome, start, end)[source]

Create a locus from coordinates. Coordinates are 1-based. If annotation file is provided, it will be used to annotate the locus.

Parameters:
genome : Genome

Genome object. Can be created using Genome class.

chromosome : str

Chromosome name.

start : int

Start position.

end : int

End position.

Returns:
Locus

Locus object.
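
Since coordinates are 1-based, extracting the corresponding bases from a Python string needs an offset on the start only. The sketch assumes the end coordinate is inclusive, which agrees with the `Locus` example above (a 101-base sequence spanning 48714554 to 48714654). `slice_one_based` is a hypothetical helper.

```python
def slice_one_based(chrom_seq, start, end):
    """Return the bases from start to end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(chrom_seq):
        raise ValueError("coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive.
    return chrom_seq[start - 1:end]
```

For instance, positions 2 to 4 of `"ACGTACGT"` give `"CGT"`, and a request spanning `start` to `end` always yields `end - start + 1` bases.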

guido.locus.locus_from_sequence(sequence, sequence_name=None)[source]

Create a locus from sequence.

Parameters:
sequence : str

DNA sequence.

sequence_name : str, optional

Sequence name, by default None.

Returns:
Locus

Object representing a locus from given sequence.

guido.locus.locus_from_gene(genome, gene_name)[source]

Create a locus from gene name. If annotation file is provided, it will be used to annotate the locus.

Parameters:
genome : Genome

Genome object. Can be created using Genome class.

gene_name : str

Gene name. Must be present in the annotation file.

Returns:
Locus

Locus object.