epivizfileserver.parser package
Submodules
epivizfileserver.parser.BamFile module

class epivizfileserver.parser.BamFile.BamFile(file, columns=None)
Bases: epivizfileserver.parser.SamFile.SamFile
Bam File Class to parse bam files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame')
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.BaseFile module
Genomics file classes

class epivizfileserver.parser.BaseFile.BaseFile(file)
Bases: object
Base file class for parser module
This class provides various useful functions
Parameters: file – file location
local – whether the file is local or hosted on a public server
endian – check for endianness
HEADER_STRUCT = <Struct object>
SUMMARY_STRUCT = <Struct object>

bin_rows(data, chr, start, end, columns=None, metadata=None, bins=400)
Bin genome by bin length and summarize the bin

decompress_binary(bin_block)
Decompress a binary string
Parameters: bin_block – binary string
Returns: a zlib decompressed binary string

formatAsJSON(data)
Encode a data object as JSON
Parameters: data – any data object to encode
Returns: data encoded as JSON

get_bytes(offset, size)
Get bytes within a given range
Returns: binary string from offset to (offset + size)

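The binning that bin_rows describes (split a region into bins, then summarize each bin) can be pictured with a small standalone sketch. This is not the package's implementation: the function name bin_average, the (start, end, value) tuple layout, and the overlap-weighted mean are all assumptions made for illustration.

```python
def bin_average(rows, start, end, bins=400):
    """Split [start, end) into equal-width bins and average the values in each.

    rows is an iterable of (row_start, row_end, value) tuples; this layout is
    hypothetical and is not taken from epivizfileserver.
    Returns a list of (bin_start, bin_end, mean) with mean=None for empty bins.
    """
    span = end - start
    bins = max(1, min(bins, span))  # never use more bins than bases
    width = span / bins
    summary = []
    for i in range(bins):
        b_start = start + i * width
        b_end = start + (i + 1) * width
        total = 0.0
        covered = 0.0
        for r_start, r_end, value in rows:
            # length of the overlap between this row and the current bin
            overlap = min(b_end, r_end) - max(b_start, r_start)
            if overlap > 0:
                total += value * overlap
                covered += overlap
        mean = total / covered if covered else None
        summary.append((int(b_start), int(b_end), mean))
    return summary


# Toy input, made up for the example
toy_rows = [(0, 10, 2.0), (10, 20, 4.0)]
two_bins = bin_average(toy_rows, 0, 20, bins=2)
one_bin = bin_average(toy_rows, 0, 20, bins=1)
```

Weighting by overlap keeps a long interval from counting the same as a short one inside a bin; other summaries (max, min) would replace only the accumulation step.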
epivizfileserver.parser.BigBed module

class epivizfileserver.parser.BigBed.BigBed(file, columns=None)
Bases: epivizfileserver.parser.BigWig.BigWig
Bed file parser
Parameters: file (str) – bigbed file location

get_autosql()
parse autosql stored in file
Returns: an array of columns in file parsed from autosql

magic = '0x8789F2EB'

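To make "an array of columns in file parsed from autosql" concrete, here is a minimal sketch of pulling field names out of an autoSql declaration. It is not the code behind get_autosql: the helper name autosql_columns, the regular expression, and the sample declaration are assumptions, and reading the autoSql text out of the bigBed file itself is left out.

```python
import re

# One autoSql field per line: <type>[optional size] <name>; "<comment>"
FIELD_RE = re.compile(r'^\s*(\w+)(?:\[\w+\])?\s+(\w+)\s*;\s*"([^"]*)"')


def autosql_columns(text):
    """Return the field names declared in an autoSql string, in order."""
    columns = []
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            columns.append(match.group(2))
    return columns


# Made-up declaration in autoSql syntax, for illustration only
SAMPLE_AUTOSQL = '''table bedExample
"A simple BED example"
    (
    string chrom;       "Reference sequence chromosome"
    uint   chromStart;  "Start position"
    uint   chromEnd;    "End position"
    string name;        "Name of item"
    char[1] strand;     "+ or -"
    )
'''

sample_columns = autosql_columns(SAMPLE_AUTOSQL)
```

The table name, description, and parentheses lines carry no semicolon, so the pattern skips them and only field declarations contribute column names.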
epivizfileserver.parser.BigWig module

class epivizfileserver.parser.BigWig.BigWig(file, columns=None)
Bases: epivizfileserver.parser.BaseFile.BaseFile
BigWig file parser
Parameters: file (str) – bigwig file location
tree – chromosome tree parsed from file
columns – column names
cacheData – locally cached data for this file

daskWrapper(fileObj, chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='JSON')
Dask Wrapper

getId(chrmzone)
Get mapping of chromosome to id stored in file
Parameters: chrmzone (str) – chromosome
Returns: id in file for the given chromosome

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame', treedisk=None)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

getTree(zoomlvl)
Get chromosome tree for a given zoom level
Parameters: zoomlvl (int) – zoomlvl to get
Returns: Tree binary bytes

getValues(chr, start, end, zoomlvl)
Get data for a region
Note: Do not use this directly, use getRange
Returns: data for the region

getZoom(zoomlvl, binSize)
Get Zoom record for the given bin size
Returns: zoom level

get_autosql()
parse autosql in file
Returns: an array of columns in file parsed from autosql

locateTree(chrmId, start, end, zoomlvl, offset)
Locate tree for the given region
Returns: nodes in the stored R-tree

magic = '0x888FFC26'

parseLeafDataNode(chrmId, start, end, zoomlvl, rStartChromIx, rStartBase, rEndChromIx, rEndBase, rdataOffset, rDataSize)
Parse an Rtree leaf node

readRtreeHeaderNode(zoomlvl)
Parse an Rtree Header node
Parameters: zoomlvl (int) – zoom level
Returns: header node Rtree object

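The magic values listed for BigWig ('0x888FFC26') and BigBed ('0x8789F2EB') are the first four bytes of the respective file types, and trying both byte orders against them is one common way to settle endianness. The sketch below illustrates that idea only; detect_format is a hypothetical helper, not a function of this package, and it is an assumption that the package performs its check in a similar way.

```python
import struct

BIGWIG_MAGIC = 0x888FFC26
BIGBED_MAGIC = 0x8789F2EB


def detect_format(header):
    """Return (format_name, byte_order) from the first four bytes, or None.

    byte_order is a struct prefix: '<' for little-endian, '>' for big-endian.
    """
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", header[:4])
        if magic == BIGWIG_MAGIC:
            return ("bigwig", order)
        if magic == BIGBED_MAGIC:
            return ("bigbed", order)
    return None


# Synthetic headers built in memory; no real file is read here
little_endian_bigwig = struct.pack("<I", BIGWIG_MAGIC)
big_endian_bigbed = struct.pack(">I", BIGBED_MAGIC)
```

The byte order found this way would then be reused as the prefix for every later struct.unpack call on the same file.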
epivizfileserver.parser.GWASBigBed module

class epivizfileserver.parser.GWASBigBed.GWASBigBed(file, columns=None)
Bases: epivizfileserver.parser.BigBed.BigBed
Bed file parser
Parameters: file (str) – GWASBigBed file location

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame', treedisk=None)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

magic = '0x8789F2EB'

epivizfileserver.parser.GtfFile module

class epivizfileserver.parser.GtfFile.GtfFile(file, columns=['chr', 'source', 'feature', 'start', 'end', 'score', 'strand', 'frame', 'group'])
Bases: object
GTF File Class to parse gtf/gff files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame')
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

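The default columns of GtfFile correspond to the nine tab-separated fields of a GTF/GFF line. As a rough illustration of how one line maps onto those names and how a region match could be decided, consider the sketch below. The helpers parse_gtf_line and overlaps are hypothetical and do not come from the package, and the sample line is invented.

```python
GTF_COLUMNS = ['chr', 'source', 'feature', 'start', 'end',
               'score', 'strand', 'frame', 'group']


def parse_gtf_line(line, columns=GTF_COLUMNS):
    """Map one tab-separated GTF line onto column names.

    Returns None for blank lines and '#' comment lines.
    """
    if not line.strip() or line.startswith('#'):
        return None
    fields = line.rstrip('\n').split('\t')
    if len(fields) != len(columns):
        raise ValueError('expected %d fields, got %d' % (len(columns), len(fields)))
    record = dict(zip(columns, fields))
    # GTF coordinates are 1-based and inclusive on both ends
    record['start'] = int(record['start'])
    record['end'] = int(record['end'])
    return record


def overlaps(record, chr, start, end):
    """True if the record intersects chr:start-end (closed interval)."""
    return (record['chr'] == chr
            and record['start'] <= end
            and record['end'] >= start)


# Invented line for illustration only
sample = parse_gtf_line('chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n')
```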
epivizfileserver.parser.GtfParsedFile module

class epivizfileserver.parser.GtfParsedFile.GtfParsedFile(file, columns=['chr', 'start', 'end', 'width', 'strand', 'geneid', 'exon_starts', 'exon_ends', 'gene'])
Bases: object
GTF File Class to parse gtf/gff files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame')
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.GtfTabixFile module

class epivizfileserver.parser.GtfTabixFile.GtfTabixFile(file, columns=None)
Bases: epivizfileserver.parser.SamFile.SamFile
GTF File Class to parse gtf/gff files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame', ensembl=True)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.HDF5File module

class epivizfileserver.parser.HDF5File.HDF5File(file)
Bases: object
HDF5 File Class to parse only local hdf5 files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start=None, end=None, row_names=None)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.Helper module
epivizfileserver.parser.InteractionBigBed module

class epivizfileserver.parser.InteractionBigBed.InteractionBigBed(file, columns=['chr', 'start', 'end', 'name', 'score', 'value', 'exp', 'color', 'region1chr', 'region1start', 'region1end', 'region1name', 'region1strand', 'region2chr', 'region2start', 'region2end', 'region2name', 'region2strand'])
Bases: epivizfileserver.parser.BigBed.BigBed
BigBed file parser for chromosome interaction Data
Columns in the bed file are
(chr, start, end, name, score, value (strength of interaction, same as value), exp, color, region1chr, region1start, region1end, region1name, region1strand, region2chr, region2start, region2end, region2name, region2strand)
Parameters: file (str) – InteractionBigBed file location

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame', treedisk=None)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

magic = '0x8789F2EB'

epivizfileserver.parser.SamFile module

class epivizfileserver.parser.SamFile.SamFile(file, columns=None)
Bases: object
SAM File Class to parse sam files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame')
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.TbxFile module

class epivizfileserver.parser.TbxFile.TbxFile(file, columns=['chr', 'start', 'end', 'width', 'strand', 'geneid', 'exon_starts', 'exon_ends', 'gene'])
Bases: epivizfileserver.parser.SamFile.SamFile
TBX File Class to parse tbx files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use

getRange(chr, start, end, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame')
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process

epivizfileserver.parser.TileDB module

class epivizfileserver.parser.TileDB.TileDB(path)
Bases: object
TileDB Class to parse only local tiledb files
Detail: The tiledb_folder should contain:
- ‘data.tiledb’ directory - corresponds to the URI of a tiledb array. The tiledb array must have a ‘vals’ attribute from which values are read. The array should have as many rows as the number of lines in the ‘rows’ file, and as many columns as the number of lines in the ‘cols’ file.
- ‘rows’ file - a tab-separated value file describing the rows of the tiledb array. It must have as many lines as rows in the tiledb file. There should be no index column in this file (i.e., it is read with pandas.read_csv(…, sep='\t', index_col=False)). It must have columns ‘chr’, ‘start’ and ‘end’.
- ‘cols’ file - a tab-separated value file describing the columns of the tiledb array. It must have as many lines as columns in the tiledb file. Column names for the tiledb array will be obtained from the first column in this file (i.e., it is read with pandas.read_csv(…, sep='\t', index_col=0)).

getRange(chr, start=None, end=None, bins=2000, zoomlvl=-1, metric='AVG', respType='DataFrame', treedisk=None)
Get data for a given genomic location
Returns:
- result – a DataFrame with matched regions from the input genomic location if respType is DataFrame else result is an array
- error – if there was any error during the process
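Because the ‘rows’ file carries ‘chr’, ‘start’ and ‘end’ columns, a genomic query can first be turned into row indices, which would then select rows of the array. The sketch below shows only that lookup, using the standard csv module instead of pandas so that it runs without extra dependencies. The function rows_in_region, the closed-interval overlap test, and the sample contents are assumptions and are not taken from the package.

```python
import csv
import io


def rows_in_region(rows_tsv, chr, start, end):
    """Return 0-based indices of 'rows' entries overlapping chr:start-end.

    rows_tsv is any iterable of text lines (an open file or io.StringIO)
    whose header names include 'chr', 'start' and 'end'.
    """
    reader = csv.DictReader(rows_tsv, delimiter='\t')
    hits = []
    for index, row in enumerate(reader):
        if (row['chr'] == chr
                and int(row['start']) <= end
                and int(row['end']) >= start):
            hits.append(index)
    return hits


# Invented contents standing in for a 'rows' file
SAMPLE_ROWS = 'chr\tstart\tend\nchr1\t0\t100\nchr1\t100\t200\nchr2\t0\t100\n'

hits_a = rows_in_region(io.StringIO(SAMPLE_ROWS), 'chr1', 150, 250)
hits_b = rows_in_region(io.StringIO(SAMPLE_ROWS), 'chr1', 50, 120)
```

The returned indices line up with array rows only as long as the ‘rows’ file has exactly one data line per array row, which is the requirement stated above.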
epivizfileserver.parser.TranscriptTbxFile module

class epivizfileserver.parser.TranscriptTbxFile.TranscriptTbxFile(file, columns=['chr', 'start', 'end', 'strand', 'transcript_id', 'exon_starts', 'exon_ends', 'gene'])
Bases: epivizfileserver.parser.TbxFile.TbxFile
Class for tabix indexed transcript files
file – a pysam file object
fileSrc – location of the file
cacheData – cache of accessed data in memory
columns – column names to use