Kesheng (John) Wu

Dr. Kesheng (John) Wu works on several topics in data management, data analysis, and scientific computing. His algorithmic research includes statistical methods for feature extraction, indexing techniques for searching, and tensor-based techniques for machine learning and scientific computing. He is a developer of software packages including FasTensor, IDEALEM, FastBit, and TRLan, as well as a contributor to community software projects such as HDF5 and ADIOS. He has authored more than 200 technical publications, 18 of which have more than 100 citations each. He holds a doctorate in Computer Science from the University of Minnesota.

Dr. Wu works at the intersection of Big Data and mathematics. One theme of his work is how to find the right data for a user task. On this front, he has developed efficient indexing techniques and turned these algorithms into a software package named FastBit. The FastBit indexing software has won an R&D 100 Award and is used by many organizations. For example, a German bioinformatics company uses FastBit to make its molecular docking software hundreds of times faster, and a US internet company uses it daily to sift through terabytes of advertisement-related data. FastBit is also counted among the 40 major works funded by the US Department of Energy (DOE) Office of Science, as part of its 40th Anniversary celebration in 2018. The second theme of his work is how to use data storage systems effectively for Big Data applications. Take compression as an example. Conventional storage systems treat user data as bytes, while much sensor data and many instrument measurements are numerical values. Treating these numerical values as bytes makes them nearly impossible to compress. By exploiting the fact that these values are numbers, he has been developing compression techniques that reduce the storage requirement more than 100-fold while retaining important features in the data.