Iowa State professors, doctoral students and other academic collaborators have created BoaC, a cyberinfrastructure of compiled research papers related to coronaviruses. They hope the open data set will make research more efficient for scientists studying COVID-19.
BoaC is a platform containing 44,000 research papers related to various human coronaviruses, published over the course of 64 years. The BoaC project is led by Hridesh Rajan, interim chair of the computer science department.
“In particular, a number of coronaviruses are similar and the one that we are seeing today, COVID-19, is very similar to a number of other coronaviruses we have seen in the past,” Rajan said. “There are some similar properties and some different properties. If scientists have studied all these properties in the past, could they make use of that knowledge and get experiments done faster, get results faster and most importantly not go into blind alleys that have been explored before.”
Rajan said he made the BoaC platform for the scientists who couldn’t, and he hopes the platform will lead to scientific collaboration, thought-provoking questions and ultimately valuable information that could lead to effective treatment for COVID-19 patients.
The BoaC team consists of Rajan; Yijia Huang, computer science graduate student; Rangeet Pan, computer science graduate student; Robert Dyer, assistant professor of computer science at Bowling Green State University; Simon Galetta, Des Moines University public health professor; Jianqiang Zhang, associate professor of veterinary diagnostic and production animal medicine; and Tomislav Jelesijevic, assistant professor in veterinary pathology.
The team was given access to all of the data through the Allen Institute for AI, but it came in a massive zip folder with no organization. Rajan said the folder was well and good, but the team wanted to find an efficient way for people to sift through the data so researchers could get to the point of asking the right questions.
“We thought, ‘What if we create an infrastructure where someone could just open a browser, log on and start asking questions,’” Rajan said.
The BoaC platform can analyze all texts and filter them by keyword, exact phrase, excluded words and publication date.
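For readers curious how such filters combine, here is a minimal sketch in Python. It is not BoaC's actual code or query language: the `Paper` record, the `search` function and all of its parameters are hypothetical names invented for illustration, and the matching rules (whole-word keywords, case-insensitive phrases) are assumptions.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Sequence


@dataclass
class Paper:
    """Hypothetical record; the real BoaC schema may look quite different."""
    title: str
    abstract: str
    published: date


def _words(text: str) -> set:
    # Lowercase the text and split it into alphanumeric tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(
    papers: Iterable[Paper],
    keywords: Sequence[str] = (),
    phrase: Optional[str] = None,
    exclude: Sequence[str] = (),
    after: Optional[date] = None,
    before: Optional[date] = None,
) -> List[Paper]:
    """Return the papers that pass every filter that was supplied."""
    results = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}"
        words = _words(text)
        # Keywords: every keyword must appear as a whole word.
        if any(k.lower() not in words for k in keywords):
            continue
        # Exact phrase: case-insensitive substring match.
        if phrase and phrase.lower() not in text.lower():
            continue
        # Word exclusion: drop the paper if any excluded word appears.
        if any(x.lower() in words for x in exclude):
            continue
        # Publication date: inclusive bounds on either side.
        if after and paper.published < after:
            continue
        if before and paper.published > before:
            continue
        results.append(paper)
    return results
```

A call such as `search(papers, keywords=["coronavirus"], phrase="spike protein", exclude=["camels"], after=date(2000, 1, 1))` would keep only papers that satisfy all four conditions at once.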
Advanced text analysis is also available throughout the platform. It includes analyzing each part of a paper, including title, abstract, all sections and references; analyzing metadata, including authors, affiliation and other criteria, and making search decisions based on that data; analyzing references and cross-links between various sections of a paper; analyzing cited references and accumulating the papers that cite certain papers; and removing stop words during analysis to focus on important content.
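Two of those analyses, removing stop words and accumulating the papers that cite a given paper, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny made-up sample, and `content_terms`, `top_terms` and `cited_by` are hypothetical helpers, not functions the BoaC platform is known to provide.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "of", "in", "to", "is", "for", "on", "with"}
)


def content_terms(text: str) -> List[str]:
    """Lowercase and tokenize the text, then drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_terms(text: str, n: int = 3) -> List[Tuple[str, int]]:
    """Return the n most frequent remaining terms with their counts."""
    return Counter(content_terms(text)).most_common(n)


def cited_by(references: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a 'paper -> papers it cites' map into 'paper -> papers citing it'.

    The paper ids are assumed to be plain strings.
    """
    index = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            index[cited].add(citing)
    # Sort the citing ids so the output is deterministic.
    return {cited: sorted(citing) for cited, citing in index.items()}
```

Inverting the reference lists once up front means a question like "which papers cite this one?" becomes a single dictionary lookup rather than a scan of every paper.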
Anyone may access the research information after requesting an account. The BoaC team reviews and approves registrations to reduce the risk of attacks on the cyberinfrastructure.
Rajan said there are similar infrastructures related to general medical information but nothing as extensive as BoaC, to his knowledge.
More BoaC updates and upgrades can be expected soon. The BoaC team intends to use scientists’ feedback to improve the infrastructure and to keep the database up to date as more research is conducted on COVID-19.
The team then hopes to use the data infrastructure itself to contribute to the growing body of knowledge on treatment, vaccinations and preventative measures that reduce the risk of pandemics like COVID-19.
“I miss my students,” Rajan said. “Doing work remotely has been hard. I miss my colleagues, who are doing most of their work remotely. So this has been a challenging time for me. I would really like to do what I can to get everybody back on campus and do my part.”