We design and implement parallel algorithms (using manycore processors, GPUs, and memory-distributed clusters) for solving scientific research problems with machine-learning techniques. The research focus of the Saeed Lab is at the intersection of high performance computing and real-world applications, especially in computational and systems biology. We are particularly interested in designing and implementing high performance computing (HPC) solutions to Big Data problems in high-throughput proteomics, genomics, and connectomics using a variety of architectures and algorithms. We are also interested in developing novel ways of dealing with big data computational biology problems using application-specific domain knowledge and artificial intelligence strategies, including machine-learning and deep-learning algorithms.
Technical Focus: At the Saeed Lab we strive to solve big data problems emanating from scientific high-throughput technologies. Our techniques include novel reductive analytics algorithms, high performance algorithms for compressive analytics, high performance computing solutions to general big data problems, and efficient protocols for sharing and transferring big genomic and proteomic data sets.
Scientific Focus: We are primarily a computational lab that builds computational infrastructure driven by scientific questions and needs. We are in an era marked by extreme and pervasive data generation from high-throughput technologies, with dramatic changes in the scale and nature of cyberinfrastructure requirements: modeling scientific data more holistically, the desire (and the need) for near-real-time processing, and the scalability of the proposed infrastructure in the machine-learning space. This is exciting as well as challenging for computational scientists.
At the Saeed Lab we therefore address challenges and opportunities that allow us to solve a spectrum of research problems related to computation, data, software, networking, high performance computing, and human capital development that can collectively enable new discoveries across science.
Our immediate goal is to design, develop, and implement machine-learning HPC computational infrastructure that allows us to discover the genomic and proteomic underpinnings of mental disorders. This requires infrastructure development that is scalable and can deal simultaneously with: 1) genomic big data (from next generation sequencing techniques); 2) proteomic big data (from high-throughput mass spectrometry techniques); and 3) connectomic big data (from brain imaging such as fMRI).
More detailed information about the research groups working under the Saeed Lab is presented in the following pages:
Investigation of high-performance computing algorithms for compression of next generation sequencing data sets using memory-distributed supercomputers. Further data encoding using machine-learning techniques is being investigated for scalable I/O performance on peta-scale supercomputers as well as CPU-GPU architectures.
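The core idea of distributing compression work can be illustrated with a minimal single-node sketch: split the sequence data into independent chunks and compress them concurrently. This toy version uses Python's `multiprocessing` and `zlib` as stand-ins; a real implementation would distribute chunks across MPI ranks on a memory-distributed supercomputer and use domain-specific encoders.

```python
import zlib
from multiprocessing import Pool

def compress_chunk(chunk: bytes) -> bytes:
    # Each chunk is compressed independently, so chunks can be handled in parallel.
    return zlib.compress(chunk, 6)

def parallel_compress(data: bytes, n_chunks: int = 4):
    """Split the input into chunks and compress them in a process pool.
    (The pool is a local stand-in for distributing chunks across MPI ranks.)"""
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool() as pool:
        return pool.map(compress_chunk, chunks)

def decompress_all(blocks) -> bytes:
    # Decompressing and concatenating the chunks restores the original input.
    return b"".join(zlib.decompress(b) for b in blocks)
```

Because chunks are independent, each one can also be written to disk separately, which is what makes the scheme attractive for scalable parallel I/O.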
Investigation of high-performance computing algorithms for analysis of big proteogenomics data sets obtained by combining next generation sequencing data and mass spectrometry data. Novel sketching, sampling, and dimensionality reduction strategies that will allow computations in sub-linear time and sub-linear space on high-performance supercomputers are being investigated.
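The sub-linear-space idea can be illustrated with a count-min sketch, a standard textbook data structure (not the lab's specific algorithm): it estimates item frequencies, e.g. k-mer counts in sequence data, using a fixed-size table regardless of how many distinct items appear.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counting in sub-linear space: memory is fixed by
    (width, depth), and estimates never undercount the true frequency."""
    def __init__(self, width: int = 1024, depth: int = 4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item: str):
        # One independent hash per row, derived by salting BLAKE2b with the row index.
        for row in range(self.depth):
            h = hashlib.blake2b(item.encode(), salt=row.to_bytes(8, "little"))
            yield row, int.from_bytes(h.digest()[:8], "little") % self.width

    def add(self, item: str, count: int = 1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions only inflate counters, so the minimum over rows is tightest.
        return min(self.table[row][col] for row, col in self._cells(item))

def kmers(seq: str, k: int = 4):
    # All overlapping substrings of length k (the usual k-mer decomposition).
    return (seq[i:i + k] for i in range(len(seq) - k + 1))
```

For a stream of k-mers that would not fit in memory as an exact hash table, the sketch trades a bounded overestimate for constant space, which is the flavor of guarantee the sub-linear strategies above aim for.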
Machine-Learning Strategies for Computational Connectomics
Design and implementation of machine-learning algorithms that can distinguish diseased brains (e.g., ADHD, ASD, bipolar disorder) from healthy ones using fMRI brain scans are being investigated. High-performance algorithms that can run on CPU-GPU architectures and are suitable for clinical settings are also being investigated.
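The classification setting can be sketched with a deliberately simple toy: represent each scan as a feature vector (e.g. derived functional-connectivity values) and assign a label with a nearest-centroid rule. The feature vectors and labels below are synthetic and purely illustrative; they stand in for real fMRI-derived features and real models.

```python
import math

def centroid(vectors):
    # Mean feature vector of a class (e.g. averaged connectivity features).
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(train, sample):
    """train maps label -> list of feature vectors; the sample gets the label
    of the closest class centroid."""
    cents = {label: centroid(vecs) for label, vecs in train.items()}
    return min(cents, key=lambda label: distance(cents[label], sample))

# Synthetic connectivity-style features (hypothetical, for illustration only).
train = {
    "healthy":  [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2]],
    "diseased": [[0.2, 0.1, 0.9], [0.3, 0.2, 0.8]],
}
```

Each distance computation is independent across samples and classes, which is why such classifiers map naturally onto the CPU-GPU architectures mentioned above.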
High Performance Computing Algorithms for Big Data and Machine-Learning Strategies
Design and implementation of algorithmic mappings of data- and compute-intensive applications onto accelerators such as CPU-GPU systems and reconfigurable platforms such as FPGAs, with the aim of improving resource utilization, including the amount of compute and memory logic, data I/O, throughput, and power consumption. Applications of interest include embedded systems for medical imaging (e.g., fMRI data), proteomics, proteogenomics, and big graph data.
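One standard analysis behind such mappings is the roofline model: a kernel's arithmetic intensity (floating-point operations per byte moved) is compared against an accelerator's compute-to-bandwidth ratio to predict whether the kernel is compute- or memory-bound. A minimal sketch, with all hardware numbers in the test being hypothetical examples rather than any specific device:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    # FLOPs performed per byte of data moved to or from memory.
    return flops / bytes_moved

def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline model: achievable performance is capped by either the
    compute roof (peak GFLOP/s) or the memory roof (GB/s * intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)

def is_memory_bound(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> bool:
    # Memory-bound when the bandwidth roof sits below the compute roof.
    return bandwidth_gbs * intensity < peak_gflops
```

A memory-bound verdict suggests optimizing data movement (tiling, on-chip buffering, I/O layout), while a compute-bound one rewards adding parallel compute logic, which is exactly the resource-utilization trade-off described above for GPUs and FPGAs.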
We are thankful to the following federal agencies and institutes for supporting our research efforts: