epivizfileserver.measurements package

Submodules

epivizfileserver.measurements.measurementClass module

class epivizfileserver.measurements.measurementClass.ComputedMeasurement(mtype, mid, name, measurements, source='computed', computeFunc=None, datasource='computed', genome=None, annotation={'group': 'computed'}, metadata=None, isComputed=True, isGenes=False, fileHandler=None, columns=None, computeAxis=1)[source]

Bases: epivizfileserver.measurements.measurementClass.Measurement

Class for representing computed measurements

In addition to the parameters of the base Measurement class:

Parameters:
  • computeFunc – a NumPy function to apply on our dataframe
  • source – defaults to ‘computed’
  • datasource – defaults to ‘computed’
computeWrapper(computeFunc, columns)[source]

a wrapper for the ‘computeFunc’ function

Parameters:
  • computeFunc – a NumPy compute function
  • columns – columns from file to apply
Returns:

a dataframe with results

get_columns()[source]

get columns from file

get_data(chr, start, end, bins, dropna=True)[source]

Get data for a genomic region from files and apply the computeFunc function

Parameters:
  • chr (str) – chromosome
  • start (int) – genomic start
  • end (int) – genomic end
  • dropna (bool) – True to drop rows with missing values from a measurement, since any computation on such a row is going to fail
Returns:

a dataframe with results
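
The drop-missing-then-compute flow described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: a list of dicts stands in for the dataframe, statistics.mean stands in for the NumPy function, and the helper name apply_compute and the column names sample1/sample2 are hypothetical.

```python
from statistics import mean


def apply_compute(rows, columns, compute_func, dropna=True):
    """Apply compute_func across `columns` of each row (row-wise)."""
    results = []
    for row in rows:
        values = [row[col] for col in columns]
        if any(value is None for value in values):
            if dropna:
                continue  # drop rows with a missing value
            # Keep the row but mark the result as missing.
            results.append({**row, "computed": None})
            continue
        results.append({**row, "computed": compute_func(values)})
    return results


# Hypothetical rows for one genomic region; None marks a missing value.
rows = [
    {"start": 100, "end": 200, "sample1": 2.0, "sample2": 4.0},
    {"start": 200, "end": 300, "sample1": None, "sample2": 1.0},
    {"start": 300, "end": 400, "sample1": 3.0, "sample2": 9.0},
]
result = apply_compute(rows, ["sample1", "sample2"], mean)
```

With dropna=True the second row is skipped entirely; with dropna=False it is kept and its computed value is left as None.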

class epivizfileserver.measurements.measurementClass.DbMeasurement(mtype, mid, name, source, datasource, dbConn, genome=None, annotation=None, metadata=None, isComputed=False, isGenes=False, minValue=None, maxValue=None, columns=None)[source]

Bases: epivizfileserver.measurements.measurementClass.Measurement

Class representing a database measurement

In addition to the parameters of the base Measurement class:

Parameters:
  • dbConn – a database connection object
connection

a database connection object

get_data(chr, start, end, bin=False)[source]

Get data for a genomic region from database

Parameters:
  • chr (str) – chromosome
  • start (int) – genomic start
  • end (int) – genomic end
  • bin (bool) – True to bin the results, defaults to False
Returns:

a dataframe with results

query(obj, params)[source]

Query from db/source

Parameters:
  • obj – the query string
  • params – query parameters to search
Returns:

a dataframe of results from the database
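
The pattern of a query string plus separately bound parameters can be sketched with the standard-library sqlite3 module. This is an assumption-laden illustration: the database backend, the table name signal, and its column names are hypothetical, and the sketch returns a list of dicts rather than a dataframe.

```python
import sqlite3


def query(connection, obj, params):
    """Run the query string `obj` with bound `params` and return rows as dicts."""
    cursor = connection.cursor()
    cursor.execute(obj, params)  # parameters are bound, never string-formatted
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Hypothetical in-memory table standing in for a measurement table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signal (chrom TEXT, start_pos INTEGER, end_pos INTEGER, value REAL)"
)
conn.executemany(
    "INSERT INTO signal VALUES (?, ?, ?, ?)",
    [("chr1", 100, 200, 1.5), ("chr1", 500, 600, 2.5), ("chr2", 100, 200, 9.0)],
)

# Rows on chr1 that overlap the region 150..550.
rows = query(
    conn,
    "SELECT * FROM signal WHERE chrom = ? AND end_pos >= ? AND start_pos < ? "
    "ORDER BY start_pos",
    ("chr1", 150, 550),
)
```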

class epivizfileserver.measurements.measurementClass.FileMeasurement(mtype, mid, name, source, datasource='files', genome=None, annotation=None, metadata=None, isComputed=False, isGenes=False, minValue=None, maxValue=None, fileHandler=None, columns=None)[source]

Bases: epivizfileserver.measurements.measurementClass.Measurement

Class for file based measurement

In addition to the parameters of the base Measurement class:

Parameters:
  • fileHandler – an optional file handler object to process query requests (uses dask)
create_parser_object(type, name, columns=None)[source]

Create appropriate File class based on file format

Parameters:
  • type (str) – format of file
  • name (str) – location of file
  • columns ([str]) – list of columns from file
Returns:

a file object
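
Choosing a file class from a format string is commonly done with a lookup table. The sketch below shows that idea only; the class names BaseParser, BigWigParser and BigBedParser, the PARSERS table, and the supported format strings are all hypothetical stand-ins, and the first argument is named file_type here to avoid shadowing Python's built-in type.

```python
class BaseParser:
    """Hypothetical stand-in for a file class."""

    def __init__(self, name, columns=None):
        self.name = name        # location of the file
        self.columns = columns  # optional list of column names


class BigWigParser(BaseParser):
    pass


class BigBedParser(BaseParser):
    pass


# Map each (assumed) format string to the class that handles it.
PARSERS = {"bigwig": BigWigParser, "bigbed": BigBedParser}


def create_parser_object(file_type, name, columns=None):
    """Return a file object for `name`, chosen by its format."""
    try:
        parser_class = PARSERS[file_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported file format: {file_type}") from None
    return parser_class(name, columns)


parser = create_parser_object("bigwig", "signal.bw", columns=["score"])
```

Keeping the mapping in one table means a new format only needs a new entry, and an unknown format fails early with a clear error.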

get_data(chr, start, end, bins, bin=True)[source]

Get data for a genomic region from file

Parameters:
  • chr (str) – chromosome
  • start (int) – genomic start
  • end (int) – genomic end
  • bin (bool) – True to bin the results, defaults to True
Returns:

a dataframe with results

get_status()[source]

Get status of this measurement (most pertinent for files)

search_gene(query, maxResults)[source]

Search for genes matching a query

Parameters:
  • query – the gene search query
  • maxResults – maximum number of results to return
Returns:

an array of matched genes
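
A gene search with the signature search_gene(query, maxResults) could, for example, match a query string against gene names and cap the number of hits. The sketch below assumes case-insensitive substring matching and takes the gene list as an explicit argument; the matching rule, the record layout, and the gene names and coordinates are all made up for illustration.

```python
def search_gene(genes, query, maxResults):
    """Return up to maxResults gene records whose name contains `query`."""
    needle = query.lower()
    matches = [gene for gene in genes if needle in gene["gene"].lower()]
    return matches[:maxResults]  # cap the number of results


# Made-up gene records; names and coordinates are not real annotations.
genes = [
    {"gene": "ABC1", "chr": "chr1", "start": 1000, "end": 2000},
    {"gene": "ABC2", "chr": "chr2", "start": 5000, "end": 6500},
    {"gene": "XYZ1", "chr": "chr1", "start": 9000, "end": 9800},
]
hits = search_gene(genes, "abc", maxResults=1)
```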

class epivizfileserver.measurements.measurementClass.Measurement(mtype, mid, name, source, datasource, genome=None, annotation=None, metadata=None, isComputed=False, isGenes=False, minValue=None, maxValue=None, columns=None)[source]

Bases: object

Base class for managing measurements from files

Parameters:
  • mtype – Measurement type, either ‘file’ or ‘db’
  • mid – unique id to use for this measurement
  • name – name of the measurement
  • source – location of the measurement: the table name if mtype is ‘db’, the file location if mtype is ‘file’
  • datasource – the database name if mtype is ‘db’, else ‘files’
  • annotation – annotation for this measurement, defaults to None
  • metadata – metadata for this measurement, defaults to None
  • isComputed – True if this measurement is Computed from other measurements, defaults to False
  • isGenes – True if this measurement is an annotation (for example: reference genome hg19), defaults to False
  • minValue – min value of all values, defaults to None
  • maxValue – max value of all values, defaults to None
  • columns – column names for the file
bin_rows(data, chr, start, end, bins=2000)[source]

Bin genome by bin length and summarize the bin

Parameters:
  • data – DataFrame from the file
  • chr – chromosome
  • start – genomic start
  • end – genomic end
  • bins – max rows to summarize the data frame into, defaults to 2000
Returns:

a binned data frame with at most bins rows
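
One way to summarize a region into a bounded number of rows is to split it into equal-width bins and average the values in each bin. The sketch below shows that approach in plain Python; it is an assumption about how such binning can work, not the library's implementation. A list of dicts stands in for the data frame, the chr argument is omitted, and the value key and the use of the mean as the summary are hypothetical.

```python
import math
from statistics import mean


def bin_rows(rows, start, end, bins=2000):
    """Summarize rows into at most `bins` equal-width bins over [start, end)."""
    if len(rows) <= bins:
        return rows  # already within the row limit, nothing to summarize
    # Width of each bin in base pairs; at least 1 to avoid dividing by zero.
    width = max(1, math.ceil((end - start) / bins))
    grouped = {}
    for row in rows:
        # Clamp so rows starting outside [start, end) land in the edge bins.
        index = min(max((row["start"] - start) // width, 0), bins - 1)
        grouped.setdefault(index, []).append(row["value"])
    return [
        {
            "start": start + index * width,
            "end": min(start + (index + 1) * width, end),
            "value": mean(values),
        }
        for index, values in sorted(grouped.items())
    ]


# Ten 10 bp rows over 0..100, summarized into five 20 bp bins.
rows = [{"start": i * 10, "end": i * 10 + 10, "value": float(i)} for i in range(10)]
binned = bin_rows(rows, 0, 100, bins=5)
```

Empty bins are simply absent from the output, so the result can have fewer than bins rows but never more.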

get_columns()[source]

get columns from file

get_data(chr, start, end)[source]

Get Data for this measurement

Parameters:
  • chr – chromosome
  • start – genomic start
  • end – genomic end
get_measurement_annotation()[source]

Get measurement annotation

get_measurement_genome()[source]

Get measurement genome

get_measurement_id()[source]

Get measurement id

get_measurement_max()[source]

Get measurement max value

get_measurement_metadata()[source]

Get measurement metadata

get_measurement_min()[source]

Get measurement min value

get_measurement_name()[source]

Get measurement name

get_measurement_source()[source]

Get source

get_measurement_type()[source]

Get measurement type

get_status()[source]

Get status of this measurement (most pertinent for files)

is_computed()[source]

Is the measurement computed?

is_file()[source]

Is the measurement a file?

is_gene()[source]

Is the file a genome annotation?

query(obj, query_params)[source]

Query from db/source

Parameters:
  • obj – db obj
  • query_params – query parameters to search
class epivizfileserver.measurements.measurementClass.WebServerMeasurement(mtype, mid, name, source, datasource, datasourceGroup, annotation=None, metadata=None, isComputed=False, isGenes=False, minValue=None, maxValue=None)[source]

Bases: epivizfileserver.measurements.measurementClass.Measurement

Class representing a web server measurement

In addition to the parameters of the base Measurement class, source is now the server API endpoint

get_data(chr, start, end, bin=False, requestId=624)[source]

Get data for a genomic region from the API

Parameters:
  • chr (str) – chromosome
  • start (int) – genomic start
  • end (int) – genomic end
  • bin (bool) – True to bin the results, defaults to False
Returns:

a dataframe with results

epivizfileserver.measurements.measurementManager module

class epivizfileserver.measurements.measurementManager.EMDMeasurementMap(url, fileHandler)[source]

Bases: object

Manage mapping between measurements in EFS and metadata service

add_new_collections(new_collection_ids)[source]
add_new_measurements(new_ms_ids)[source]
init()[source]
init_collections()[source]
init_measurements()[source]
process_emd_record(rec)[source]
sync(current_ms)[source]
sync_collections()[source]
sync_measurements(current_ms)[source]
class epivizfileserver.measurements.measurementManager.MeasurementManager[source]

Bases: object

Measurement manager class

measurements

list of all measurements managed by the system

add_computed_measurement(mtype, mid, name, measurements, computeFunc, genome=None, annotation=None, metadata=None, computeAxis=1)[source]

Add a Computed Measurement

Parameters:
  • mtype – measurement type, defaults to ‘computed’
  • mid – measurement id
  • name – name for this measurement
  • measurements – list of measurement to use
  • computeFunc – NumPy function to apply
Returns:

a ComputedMeasurement object

add_genome(genome, url='http://obj.umiacs.umd.edu/genomes/', type=None, fileHandler=None)[source]

Add a genome to the list of measurements. The genome has to be tabix-indexed for the file server to make remote queries. Our tabix-indexed files are available at https://obj.umiacs.umd.edu/genomes/index.html

Parameters:
  • genome – for example : hg19 if type = “tabix” or full location of gtf file if type = “gtf”
  • genome_id – required if type = “gtf”
  • url – url to the genome file
get_from_emd(url=None)[source]

Make a GET request to a metadata api

Parameters:
  • url – the url of the epiviz-md api. If None, the url in self.emd_endpoint is used, if available. Defaults to None
get_genomes()[source]

Get all available genomes

get_measurement(ms_id)[source]

Get a specific measurement

get_measurements()[source]

Get all available measurements

import_ahub(ahub, handler=None)[source]

Import measurements from annotationHub objects.

Parameters:
  • ahub – list of file records from annotationHub
  • handler – an optional filehandler to use
import_dbm(dbConn)[source]

Import measurements from a database. The database needs to have a measurements_index table with information about the files imported into the database.

Parameters:
  • dbConn – a database connection
import_emd(url, fileHandler=None, listen=True)[source]

Import measurements from an epiviz-md metadata service api.

Parameters:
  • url – the url of the epiviz-md api
  • fileHandler – an optional filehandler to use
  • listen – activate ‘updateCollections’ endpoint to add measurements from the service upon request
import_files(fileSource, fileHandler=None, genome=None)[source]

Import measurements from a file.

Parameters:
  • fileSource – location of the configuration file to load
  • fileHandler – an optional filehandler to use
import_records(records, fileHandler=None, genome=None)[source]

Import measurements from a list of records (usually from a decoded json string)

Parameters:
  • records – list of measurement records to import (usually from a decoded json string)
  • fileHandler – an optional filehandler to use
import_trackhub(hub, handler=None)[source]

Import measurements from a track hub.

Parameters:
  • hub – the track hub to import measurements from
  • handler – an optional filehandler to use
use_emd(url, fileHandler=None)[source]

Delegate all getMeasurement calls to an epiviz-md metadata service api

Parameters:
  • url – the url of the epiviz-md api
  • fileHandler – an optional filehandler to use
class epivizfileserver.measurements.measurementManager.MeasurementSet[source]

Bases: object

append(ms)[source]
get(key)[source]
get_measurements()[source]
get_mids()[source]
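
The source lists these four methods without descriptions. As a rough sketch of what a collection with this interface might look like, the stand-in below keys measurements by their id. The behaviour is assumed rather than taken from the library: the internal dict, the mid attribute on each measurement, and the SimpleNamespace objects used as measurements are all hypothetical.

```python
from types import SimpleNamespace


class MeasurementSet:
    """Hypothetical stand-in: an ordered collection keyed by measurement id."""

    def __init__(self):
        self._by_mid = {}  # dicts preserve insertion order

    def append(self, ms):
        # Assumes each measurement exposes its id as `ms.mid`.
        self._by_mid[ms.mid] = ms

    def get(self, key):
        # Returns None when no measurement has this id.
        return self._by_mid.get(key)

    def get_measurements(self):
        return list(self._by_mid.values())

    def get_mids(self):
        return list(self._by_mid.keys())


# SimpleNamespace objects stand in for Measurement instances.
mset = MeasurementSet()
mset.append(SimpleNamespace(mid="ms1", name="first"))
mset.append(SimpleNamespace(mid="ms2", name="second"))
```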

Module contents