Epiviz File Server - Query & Transform Data from Indexed Genomic Files in Python

Epiviz File Server is a scalable data query and compute system for indexed genomic files. Beyond querying data, users can compute transformations, summaries, and aggregations by applying NumPy functions directly to the data queried from files.

Because the genomic files are indexed, the library requests and parses only the bytes needed to serve a request, without loading the entire file into memory. We implemented a cache system to manage bytes of a file that have already been accessed, and we use dask to parallelize query and transformation requests. Together, these allow the system to scale to large data repositories.
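To make the idea of reading and caching only the needed bytes concrete, here is a minimal sketch using only the Python standard library. It is not the library's implementation: the class name `CachedRangeReader`, the block size, and the block-level LRU cache are all assumptions chosen for illustration, and a local temporary file stands in for a real indexed genomic file.

```python
import tempfile
from functools import lru_cache

BLOCK_SIZE = 4096  # hypothetical cache granularity, in bytes


class CachedRangeReader:
    """Illustrative reader: serves byte ranges and caches fixed-size blocks."""

    def __init__(self, path, block_size=BLOCK_SIZE, max_blocks=128):
        self.path = path
        self.block_size = block_size
        self.reads = 0  # counts reads that actually touched the file
        # Per-instance LRU cache keyed by block index.
        self._block = lru_cache(maxsize=max_blocks)(self._read_block)

    def _read_block(self, index):
        # Seek straight to the block instead of reading the whole file.
        self.reads += 1
        with open(self.path, "rb") as fh:
            fh.seek(index * self.block_size)
            return fh.read(self.block_size)

    def read(self, start, length):
        """Return `length` bytes beginning at byte offset `start`."""
        if length <= 0:
            return b""
        first = start // self.block_size
        last = (start + length - 1) // self.block_size
        data = b"".join(self._block(i) for i in range(first, last + 1))
        offset = start - first * self.block_size
        return data[offset:offset + length]


# Demo data: a 10,240-byte temporary file standing in for an indexed file.
payload = bytes(range(256)) * 40
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)

reader = CachedRangeReader(tmp.name)
chunk = reader.read(100, 50)          # touches block 0 only
chunk_again = reader.read(120, 10)    # served from the cached block
```

Caching whole blocks rather than exact ranges is one possible design choice: nearby queries, which are common when browsing a genomic region, then reuse the same cached bytes.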

This blog post (Jupyter notebook) describes various features of the file server library using genomic files hosted by the NIH Roadmap Epigenomics project.

The library provides the following modules:
  • Parser: read various genomic file formats.
  • Query: access only the necessary bytes of a file for a given genomic location.
  • Compute: apply transformations to data.
  • Server: instantly expose the datasets as a REST API.
  • Visualization: explore data interactively using Epiviz (uses the Server module above).
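As an illustration of the kind of summarization the Compute step refers to, the sketch below averages a per-interval signal into equal-width bins with NumPy. It is not the library's API: the function `bin_means` and the interval arrays are hypothetical, and the overlap-weighted mean is just one plausible way to summarize values returned by a query.

```python
import numpy as np


def bin_means(starts, ends, values, region_start, region_end, n_bins):
    """Overlap-weighted mean of interval values in equal-width bins.

    Intervals are half-open [start, end). Bins with no coverage yield NaN.
    """
    starts = np.asarray(starts, dtype=float)
    ends = np.asarray(ends, dtype=float)
    values = np.asarray(values, dtype=float)
    edges = np.linspace(region_start, region_end, n_bins + 1)

    # Overlap in bases between every bin (rows) and every interval (columns).
    overlap = (np.minimum(edges[1:, None], ends[None, :])
               - np.maximum(edges[:-1, None], starts[None, :]))
    overlap = np.clip(overlap, 0, None)

    covered = overlap.sum(axis=1)   # bases covered per bin
    weighted = overlap @ values     # signal summed over covered bases
    return np.divide(weighted, covered,
                     out=np.full(n_bins, np.nan), where=covered > 0)


# Hypothetical query result: three intervals with one signal value each.
starts = [0, 10, 20]
ends = [10, 20, 40]
values = [1.0, 3.0, 5.0]

summary = bin_means(starts, ends, values, region_start=0, region_end=40, n_bins=2)
```

Because the input is plain NumPy arrays, the same pattern extends to other reductions (for example a maximum or a sum per bin) by swapping the final aggregation.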

Note

  • The Epiviz File Server is an open source project on GitHub.
  • Let us know what you think, and send us any feedback or feature requests to improve the library!
