All code is provided under the GNU General Public License (GPL) or as a web service, which guarantees your freedom to use the software for academic purposes. For more information, help, or comments, please contact Dr. Saeed.
Our recently introduced MaSS-Simulator simulates highly accurate MS/MS spectra for LC-MS/MS based proteomics experiments. It provides a great degree of control over the simulation through multiple configurable parameters. MaSS-Simulator offers a platform to assess and mark the limitations of MS-proteomics algorithms by testing them against a curated set of data and a whole range of parameters. A complete evaluation report of an algorithm, covering all relevant parameter settings, provides much deeper insight into the performance of that algorithm. Such evaluation helps address the reproducibility issues that are frequently faced in proteomics algorithm development. Testing tools on a curated data set whose parameters (e.g. peptide coverage, S/N ratio) can be carefully explored allows proteomics algorithms to be evaluated rigorously. In general, one can vary (to a practical degree) the size of the peptide, the coverage, the S/N ratio of the spectra, and the addition of PTMs in the simulated spectra created for benchmarking. Such benchmarks, when available, inform proteomics practitioners which tool will work for their data. Further, reproducible evaluation of proteomics algorithms enables method developers to endorse algorithms with confidence and reliability. Users can also evaluate their dataset and cross-reference its properties with an algorithm's evaluation report to decide whether the algorithm will serve their purpose and achieve the required level of performance.
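To make the role of the coverage and S/N parameters concrete, here is a minimal Python sketch of how a simulated spectrum could be assembled. It is not MaSS-Simulator's code or interface: the function names, the fixed signal intensity of 100, and the uniform noise model are all assumptions chosen for illustration. Only singly charged b- and y-ions are generated, using standard monoisotopic residue masses.

```python
import random

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mz(peptide):
    """Return sorted m/z values of the singly charged b- and y-ions."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    prefix = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1)
        prefix += m
        ions.append(prefix + PROTON)
    suffix = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1)
        suffix += m
        ions.append(suffix + WATER + PROTON)
    return sorted(ions)


def simulate_spectrum(peptide, coverage=1.0, snr=10.0, n_noise=20, seed=0):
    """Hypothetical simulator: keep a fraction of the fragment ions and add noise.

    coverage: fraction of theoretical ions that appear as signal peaks.
    snr:      signal peaks have intensity 100; noise peaks are drawn
              uniformly from [0, 100 / snr] (an assumed noise model).
    """
    rng = random.Random(seed)
    ions = fragment_mz(peptide)
    keep = max(1, round(coverage * len(ions)))
    signal = [(mz, 100.0) for mz in rng.sample(ions, keep)]
    noise = [(rng.uniform(50.0, max(ions)), rng.uniform(0.0, 100.0 / snr))
             for _ in range(n_noise)]
    return sorted(signal + noise)


if __name__ == "__main__":
    for mz, intensity in simulate_spectrum("PEPTIDE", coverage=0.5, n_noise=5):
        print(f"{mz:10.4f}  {intensity:6.2f}")
```

Lowering `coverage` removes fragment ions that a search algorithm would otherwise rely on, while lowering `snr` raises the noise ceiling; sweeping both is the kind of parametric benchmark described above.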
If you use the simulator please cite: Muaaz Gul Awan, and Fahad Saeed*, “MaSS‐Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms.” Proteomics (2018): 1800206 (https://doi.org/10.1002/pmic.201800206)
The tool can be downloaded from the following link as open-source GPL code: https://github.com/pcdslab/MaSS-Simulator
The simulator is also available on CodeOcean as executable code at: https://codeocean.com/capsule/7820894
How to use MaSS-Simulator: Tutorial (PDF, 1.4 MB)
J-Eros is a machine learning algorithm that computes the similarity between two multivariate time series and combines it with a k-Nearest-Neighbor classifier to classify healthy vs. ADHD children using only fMRI data, without any other data or demographics. We applied this technique to the public data provided by the ADHD-200 Consortium competition, and our results show that J-Eros discriminates healthy from ADHD children well enough to outperform other state-of-the-art techniques. This machine learning algorithm is a major step towards diagnosing ADHD using quantitative methods and will be an essential part of diagnosing mental illnesses.
The tool can be downloaded from the following link: https://github.com/pcdslab/J-Eros
If you use the tool please cite: Taban Eslami, and Fahad Saeed*, “Similarity based classification of ADHD using Singular Value Decomposition“, Proceedings of ACM Conference on Computing Frontiers (ACM-CF), Ischia, Italy, May 2018 Tech Report | Presentation (YouTube)
J-Eros is also available on CodeOcean as executable code at: https://codeocean.com/capsule/1256395
phyNGSC is a hybrid MPI and OpenMP strategy to accelerate the compression of big FASTQ datasets. It combines the best features of distributed and shared memory architectures to balance the workload among processes, alleviate memory latency by exploiting locality, and accelerate I/O by reducing excessive read/write operations and inter-node message exchange. The algorithm introduces a novel timestamp-based approach that allows concurrent writing of compressed data in a non-deterministic order and thereby lets us exploit a high degree of parallelism. As a proof of concept, we implemented some methods developed for DSRC v1 to underlie the compression portion of our hybrid parallel strategy, since DSRC exhibits superior performance among sequential solutions. The parallel algorithm is developed using C/C++, MPI, and OpenMP constructs.
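The idea of writing compressed blocks in whatever order they finish, and restoring order afterwards from a per-block stamp, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses threads and `zlib` in place of MPI, OpenMP, and the DSRC-based methods, the stamp is simply the block's sequence number, and the table-of-contents layout is hypothetical rather than phyNGSC's actual file format.

```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_out_of_order(blocks, workers=4):
    """Compress blocks concurrently; each is appended as soon as it is ready.

    Returns the payload and a table of contents of (stamp, offset, length)
    entries. The stamp (here the block's sequence number, an assumption)
    is what lets a reader restore the original order later.
    """
    out = io.BytesIO()
    toc = []
    lock = threading.Lock()

    def work(stamp, data):
        packed = zlib.compress(data)       # done in parallel, outside the lock
        with lock:                         # only the append is serialized
            offset = out.tell()
            out.write(packed)
            toc.append((stamp, offset, len(packed)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, i, b) for i, b in enumerate(blocks)]
        for f in futures:
            f.result()                     # re-raise any worker exception
    return out.getvalue(), toc


def decompress_in_order(payload, toc):
    """Rebuild the original byte stream by visiting blocks in stamp order."""
    parts = []
    for _stamp, offset, length in sorted(toc):
        parts.append(zlib.decompress(payload[offset:offset + length]))
    return b"".join(parts)


if __name__ == "__main__":
    record = b"@read\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
    blocks = [record * n for n in (200, 5, 90, 40)]
    payload, toc = compress_out_of_order(blocks)
    assert decompress_in_order(payload, toc) == b"".join(blocks)
    print("write order of stamps:", [stamp for stamp, _, _ in toc])
```

Because no worker waits for its predecessor before writing, the on-disk order is non-deterministic; the table of contents carries the information needed to undo that.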
The tool can be downloaded from the following link: https://github.com/pcdslab/PHYNGSC
If you use the tool please cite: Sandino Vargas-Pérez and Fahad Saeed*, “A Hybrid MPI-OpenMP Strategy to Speedup the Compression of Big Next-Generation Sequencing Datasets“, IEEE Transactions on Parallel and Distributed Systems, March 2017 Tech Report | IEEE Xplore
Sandino N. V. Perez and Fahad Saeed*, “A Parallel Algorithm for Compression of Big Next Generation Sequencing Datasets”, IEEE International Workshop on Parallelism in Bioinformatics (PBio), Proceedings of Parallel and Distributed Processing with Applications (IEEE ISPA-15), Helsinki Finland, August 2015 Tech Report
GPU-ArraySort is a highly scalable parallel algorithm for sorting a large number of arrays using a GPU. Existing techniques focus on sorting a single large array and cannot be used to sort a large number of smaller arrays efficiently. Such large numbers of small arrays are common in many big data applications in fields such as proteomics, genomics, connectomics, and astronomy. Our algorithm performs in-place operations and makes minimal use of temporary run-time memory. Our results indicate that we can sort up to 2 million arrays of 1000 elements each within a few seconds. We compare our results with the unorthodox tagged array sorting technique based on NVIDIA's Thrust library. GPU-ArraySort outperforms the tagged array sorting technique by sorting three times more data in much less time.
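The difference between the two problem formulations can be shown on the CPU with a short Python sketch. It does not reproduce the GPU-ArraySort kernels or any Thrust calls; it assumes the tagged technique works as it is commonly described (flatten all arrays, attach the owning array's index to every element, sort by value, then stable-sort by tag), and both function names are hypothetical.

```python
def sort_each_in_place(arrays):
    """Sort every small array independently, in place (no extra buffers)."""
    for a in arrays:
        a.sort()


def tagged_sort(arrays):
    """Sort many arrays via one flat, tagged sequence.

    Every element is paired with the index (tag) of the array it belongs to.
    Sorting by value and then stable-sorting by tag leaves each array's
    elements contiguous and ordered, at the cost of a full tagged copy.
    """
    tagged = [(tag, value) for tag, a in enumerate(arrays) for value in a]
    tagged.sort(key=lambda pair: pair[1])   # order by value
    tagged.sort(key=lambda pair: pair[0])   # stable: regroup by array
    result = [[] for _ in arrays]
    for tag, value in tagged:
        result[tag].append(value)
    return result


if __name__ == "__main__":
    data = [[3, 1, 2], [9, 7, 8, 6], [5]]
    print(tagged_sort(data))
    sort_each_in_place(data)
    print(data)
```

The tagged version needs memory for the tags and two passes over the whole data set, which is the overhead an in-place, per-array approach avoids.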
The tool can be downloaded from the following link: https://github.com/PCDS/GPU-ArraySort-2.0
If you use the tool please cite: Muaaz Gul Awan and Fahad Saeed*, “GPU-ArraySort: A parallel, in-place algorithm for sorting large number of arrays“, Proceedings of Workshop on High Performance Computing for Big Data, International Conference on Parallel Processing (ICPP-2016), Philadelphia PA, August 2016 Tech Report | IEEE Xplore
MS-REDUCE is a linear-time tool that allows a massive reduction in the amount of mass spectrometry data without significantly reducing the quality of peptide deduction. This novel data-reductive strategy for the analysis of Big MS data eliminates noisy peaks, as well as peaks that do not contribute to peptide deduction, before any peptide deduction is attempted. Our experiments have shown up to 100x speedup over existing state-of-the-art noise elimination algorithms while maintaining comparably high-quality matches. Using our approach, we were able to process a million spectra in just under an hour on a moderate server, which will be especially useful in high-throughput environments. The algorithm has been implemented in Java, and the code and associated data sets are available.
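As a rough picture of what a linear-time peak reduction step looks like, the following Python sketch drops every peak below a fraction of the base peak. This is a generic stand-in, not the MS-REDUCE algorithm, whose selection strategy is not described here; the function name and the 5% default threshold are assumptions.

```python
def reduce_spectrum(peaks, rel_threshold=0.05):
    """Keep peaks whose intensity is at least rel_threshold * base peak.

    peaks is a list of (mz, intensity) pairs. Two passes over the list
    (one to find the base peak, one to filter) give O(n) running time.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    cutoff = rel_threshold * base
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


if __name__ == "__main__":
    spectrum = [(100.0, 2.0), (200.0, 50.0), (300.0, 100.0), (400.0, 4.9)]
    print(reduce_spectrum(spectrum))
```

Any reduction of this kind shrinks the input before the expensive peptide matching stage runs, which is where the overall speedup comes from.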
The tool can be downloaded from the following link: https://github.com/pcdslab/MSREDUCE
If you use the tool please cite: Muaaz Awan and Fahad Saeed*, “MS-REDUCE: An ultrafast technique for reduction of Big Mass Spectrometry Data for high-throughput processing“, Oxford Bioinformatics, Jan 2016 Tech Report | PubMed | Oxford
Muaaz Awan and Fahad Saeed*, “On the sampling of Big Mass Spectrometry Data“, Proceedings of Bioinformatics and Computational Biology (BICoB) Conference, Honolulu Hawaii, March 2015 Tech Report
PhosSA is a program for phosphorylation site assignment of LC-MS/MS data. It uses a linear-time, linear-space dynamic programming strategy for phosphorylation site assignment. The algorithm optimizes an objective function defined as the sum of the intensities of peaks associated with theoretical peptide fragmentation ions. A classifier introduced in the algorithm exploits the specific characteristics of mass spectrometry data to distinguish between correctly and incorrectly assigned site(s). The algorithm has been implemented in Java. An executable and instructions for using the software are available.
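One way a dynamic program over this objective could be organized is sketched below in Python. It is an illustration under assumptions, not the published PhosSA implementation: it considers only singly charged b- and y-ions, treats S, T, and Y as candidate sites, omits the classifier, and the function names and the 0.02 Da tolerance are hypothetical. The key observation is that the fragment masses at cleavage position i depend only on how many phosphate groups lie to its left, so the state (i, j) is sufficient.

```python
import bisect

# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON, WATER, PHOSPHO = 1.007276, 18.010565, 79.966331
NEG = float("-inf")


def assign_sites(peptide, k, peaks, tol=0.02):
    """Place k phosphate groups to maximize the matched fragment intensity.

    best[i][j] is the best score using the first i residues with j groups
    placed. Returns (1-based site positions, score).
    """
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]
    ints = [inten for _, inten in peaks]

    def matched(mz):
        # Sum the intensity of all peaks within +/- tol of a theoretical m/z.
        lo = bisect.bisect_left(mzs, mz - tol)
        hi = bisect.bisect_right(mzs, mz + tol)
        return sum(ints[lo:hi])

    n = len(peptide)
    total = sum(MONO[aa] for aa in peptide)
    best = [[NEG] * (k + 1) for _ in range(n + 1)]
    took = [[False] * (k + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    prefix = 0.0
    for i in range(1, n + 1):
        aa = peptide[i - 1]
        prefix += MONO[aa]
        for j in range(k + 1):
            skip = best[i - 1][j]
            take = best[i - 1][j - 1] if j > 0 and aa in "STY" else NEG
            prev = max(skip, take)
            if prev == NEG:
                continue                       # state (i, j) is unreachable
            gain = 0.0
            if i < n:                          # cleavage after residue i
                b = prefix + j * PHOSPHO + PROTON
                y = (total - prefix) + (k - j) * PHOSPHO + WATER + PROTON
                gain = matched(b) + matched(y)
            best[i][j] = prev + gain
            took[i][j] = take > skip
    if best[n][k] == NEG:
        raise ValueError("not enough S/T/Y residues for k phosphate groups")

    sites, j = [], k                           # backtrack the chosen sites
    for i in range(n, 0, -1):
        if took[i][j]:
            sites.append(i)
            j -= 1
    return sorted(sites), best[n][k]
```

For a fixed number of phosphate groups k, the table has O(n k) states and each is filled with two peak lookups, so the work grows linearly with peptide length.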
The tool can be downloaded from the following link: https://hpcwebapps.cit.nih.gov/ESBL/PhosSA/
If you use the tool please cite: Fahad Saeed, Trairak Pisitkun, Jason Hoffert, Guanghui Wang, Marjan Gucek, and Mark Knepper, “An Efficient Dynamic Programming Algorithm for Phosphorylation Site Assignment of Large-Scale Mass Spectrometry Data“, accepted in International Workshop on Computational Proteomics, proceedings of IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Philadelphia USA, Oct 2012 IEEE Xplore | PubMed
Fahad Saeed*, Trairak Pisitkun, Jason D. Hoffert, Sara Rashidian, Guanghui Wang, Marjan Gucek, and Mark A. Knepper, “PhosSA: Fast and Accurate Phosphorylation Site Assignment Algorithm for Mass Spectrometry Data“, Proteome Science Volume 11, Supplement 1, November 2013 Proteome Science | PubMed
NHLBI-AbDesigner is a tool for analyzing the amino acid sequence of a given protein to identify optimal immunizing peptides for production of antibodies. NHLBI-AbDesigner displays the information needed for choice of immunizing peptides, allowing the user to recognize trade-offs between immunogenicity, specificity, animal species targets, and post-translational modifications.
The tool can be downloaded from the following link: https://hpcwebapps.cit.nih.gov/AbDesigner/
If you use the tool please cite: Trairak Pisitkun, Jason D. Hoffert, Fahad Saeed and Mark Knepper, “NHLBI-AbDesigner: An online tool for design of peptide-directed antibodies“, American Journal of Physiology (AJP), September 2011, (doi:10.1152/ajpcell.00325.2011) AJP | PubMed
P-Pyro-Align is an open-source parallel algorithm for multiple alignment of pyrosequencing reads from multiple genomes. The proposed alignment algorithm accurately aligns the erroneous reads, and the accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments. The algorithm uses domain decomposition for parallel computation of the local multiple alignments and a novel merging technique for global alignment of the reads. The proposed algorithm shows super-linear speedups for large numbers of reads. Note that the algorithm targets multiple alignment of reads coming from different strains of genomes, which cannot be handled by mapping the reads to a reference genome.
The code has been implemented using C/C++ and the MPI library.
The tool can be downloaded from the following link: (Download 375kb)
If you use the tool please cite: Fahad Saeed, Alan Perez-Rathke, Jarek Gwarnicki, Tanya Y. Berger-Wolf and Ashfaq Khokhar, “High performance multiple sequence alignment system for pyrosequencing reads from multiple genomes” Journal of Parallel and Distributed Computing (JPDC) August 2011 (10.1016/j.jpdc.2011.08.001) JPDC
Pyro-Align is an open-source, computationally efficient method based on domain decomposition for multiple alignment of a large number of pyrosequencing reads. The proposed alignment algorithm accurately aligns the erroneous reads, and the accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments. Functions are provided to multiply align the reads in the presence of a wild-type reference genome. A proof-of-concept Java program with a command-line interface is available for non-programmers.
The tool can be downloaded from the following link: (Download 32KB)
If you use the tool please cite: Fahad Saeed, Ashfaq Khokhar, Osvaldo Zagordi and Niko Beerenwinkel. “Multiple Sequence Alignment System for Pyrosequencing Reads” Bioinformatics and Computational Biology (BICoB) conference, LNBI 5462, pp 362-375, 2009. arXiv:0901.2753 | Springer