BCyBS - Tools

Tools

P3BSseq

Bisulfite sequencing (BSseq) processing is among the most cumbersome next-generation sequencing (NGS) applications. Although some BSseq processing tools are available, they are scattered, require puzzling parameters, and are demanding in running time and memory usage. We have developed P3BSseq, a parallel processing pipeline for fast, accurate and automatic analysis of BSseq reads that trims, aligns, annotates, records the intermediate results, performs bisulfite conversion quality assessment, and generates BED methylome and report files following the NIH standards. P3BSseq outperforms the known BSseq mappers in running time and computer hardware requirements. We optimized the P3BSseq parameters for both directional and non-directional libraries, and for both single-end and paired-end reads of Whole Genome and Reduced Representation BSseq. P3BSseq is a user-friendly, streamlined solution for BSseq upstream analysis, requiring only basic computer and NGS knowledge.
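
The two quantities behind a methylome and a bisulfite conversion quality assessment can be illustrated with a small sketch. It is not P3BSseq code: the function names and read counts are hypothetical, and it assumes the usual bisulfite logic in which unmethylated cytosines are read as T while methylated cytosines remain C. The conversion rate is assumed to be measured on cytosines presumed unmethylated, for example an unmethylated spike-in control.

```python
def methylation_level(methylated_reads: int, unmethylated_reads: int) -> float:
    """Fraction of reads that kept a C (methylated) at one cytosine position."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        raise ValueError("no coverage at this position")
    return methylated_reads / total


def conversion_rate(unconverted_c: int, converted_c: int) -> float:
    """Share of presumed-unmethylated cytosines that were read as T."""
    total = unconverted_c + converted_c
    if total == 0:
        raise ValueError("no control cytosines observed")
    return converted_c / total


# Hypothetical counts, for illustration only.
site_level = methylation_level(methylated_reads=18, unmethylated_reads=6)
control_rate = conversion_rate(unconverted_c=5, converted_c=995)
print(f"methylation level: {site_level:.2f}")
print(f"conversion rate: {control_rate:.3f}")
```

A low conversion rate would suggest that some unmethylated cytosines were miscounted as methylated, which is why this check usually precedes any interpretation of the per-site levels.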

P3BSseq is available here and the publication here.

NaviSE: superenhancer navigator integrating epigenomics signal algebra

NaviSE is a user-friendly, streamlined tool that performs fully automated parallel processing of genome-wide epigenomics data, from sequencing files to a final report. The report is built from a comprehensive set of annotated files that are navigated through a graphical user interface dynamically generated by NaviSE. NaviSE also implements an 'epigenomics signal algebra' that allows the combination of multiple activation and repression epigenomics signals. NaviSE provides an interactive chromosomal landscape of superenhancer locations, which can be navigated to obtain annotated information about superenhancer signal profiles, associated genes, gene ontology enrichment analysis, motifs of transcription factor binding sites enriched in superenhancers, graphs of the metrics evaluating superenhancer quality, protein-protein interaction networks and enriched metabolic pathways, among other features.
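
One possible reading of combining activation and repression signals is sketched below. This is an assumption for illustration, not NaviSE's actual algebra: the function `combine_signals`, the binned tracks and the rule (sum of activating tracks minus sum of repressing tracks, floored at zero) are all hypothetical. The histone mark names are used only as familiar examples of activating and repressive signals.

```python
from typing import List, Sequence


def combine_signals(
    activating: Sequence[Sequence[float]],
    repressing: Sequence[Sequence[float]],
) -> List[float]:
    """Per-bin sum of activating tracks minus repressing tracks, floored at 0."""
    tracks = list(activating) + list(repressing)
    if not tracks:
        raise ValueError("at least one signal track is required")
    n_bins = len(tracks[0])
    if any(len(track) != n_bins for track in tracks):
        raise ValueError("all tracks must cover the same bins")

    combined = []
    for i in range(n_bins):
        plus = sum(track[i] for track in activating)
        minus = sum(track[i] for track in repressing)
        combined.append(max(0.0, plus - minus))
    return combined


# Hypothetical per-bin signal values over four genomic bins.
h3k27ac = [1.0, 4.0, 6.0, 2.0]   # activating mark
h3k4me1 = [0.5, 3.0, 5.0, 1.0]   # activating mark
h3k27me3 = [2.0, 0.5, 0.0, 4.0]  # repressive mark

combined = combine_signals([h3k27ac, h3k4me1], [h3k27me3])
print(combined)  # [0.0, 6.5, 11.0, 0.0]
```

Under this rule, bins where repression outweighs activation drop to zero, so only regions with net activating signal would remain as superenhancer candidates.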

NaviSE is available here and the publication here.

FOntCell

FOntCell is a Python software module for automatic, parallel-computed fusion of ontologies, which we used to fuse cell ontologies. FOntCell produces a fused ontology in OBO format and a circular Directed Acyclic Graph (DAG) representation of the fused ontology in an interactive HTML file that summarizes all the results of the fusion.

FOntCell is available here and the publication here.

BigMPI4py: Python module for parallelization of Big Data objects

Big Data analysis is a powerful discipline due to the growing number of areas where technologies extract huge amounts of knowledge from data, thus increasing the demand for storage and computational resources. Python was one of the 5 most used programming languages in 2018 and is widely used in Big Data. Parallelization in Python integrates High Performance Computing (HPC) communication protocols like Message Passing Interface (MPI) via the mpi4py module. However, mpi4py does not support parallelization of objects larger than 2^31 bytes, which are common in Big Data projects. To overcome this limitation we developed BigMPI4py, a Python module that surpasses the parallelization capabilities of mpi4py and supports object sizes beyond the 2^31-byte boundary, up to the RAM limit of the computer. BigMPI4py automatically determines, taking into account the data type, the optimal object division strategy for parallelization, and uses vectorized methods for arrays of numeric types, achieving higher parallelization efficiency. Our module has a simpler syntax than mpi4py and ensures robustness and seamless integration of complex data analysis pipelines. Thus, it facilitates the implementation of Python for Big Data applications by taking advantage of the computational power of multicore workstations and HPC systems.
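
The general idea of working around a 2^31-byte limit is to divide a large object into pieces that each fit below it. The sketch below shows only that division step on a raw byte payload, using the standard library. It is not BigMPI4py's API or its data-type-aware strategy: `chunk_bounds`, `split_payload` and `MPI_COUNT_LIMIT` are hypothetical names, and no MPI communication is performed.

```python
from typing import List, Tuple

# Largest value of a signed 32-bit integer, used here as the per-message cap.
MPI_COUNT_LIMIT = 2**31 - 1


def chunk_bounds(total_bytes: int, max_chunk: int = MPI_COUNT_LIMIT) -> List[Tuple[int, int]]:
    """Return (start, stop) byte offsets that cover total_bytes in pieces of at most max_chunk."""
    if total_bytes < 0 or max_chunk <= 0:
        raise ValueError("total_bytes must be >= 0 and max_chunk must be > 0")
    return [
        (start, min(start + max_chunk, total_bytes))
        for start in range(0, total_bytes, max_chunk)
    ]


def split_payload(payload: bytes, max_chunk: int = MPI_COUNT_LIMIT) -> List[memoryview]:
    """Slice a byte payload into zero-copy views, each no larger than max_chunk."""
    view = memoryview(payload)
    return [view[start:stop] for start, stop in chunk_bounds(len(payload), max_chunk)]


# A 5 GiB object needs three messages under the default cap (offsets only, no allocation).
print(len(chunk_bounds(5 * 2**30)))

# Small demonstration with a tiny cap so the chunks are easy to inspect.
payload = bytes(range(10))
chunks = split_payload(payload, max_chunk=4)
print([len(chunk) for chunk in chunks])
assert b"".join(bytes(chunk) for chunk in chunks) == payload
```

Each chunk could then be sent as a separate message and reassembled on the receiving process. Choosing the split by data type, as the module is described as doing, would replace the plain byte offsets used here.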

BigMPI4py is available here and the publication here.