BioGPT is a type of generative language model, trained on millions of previously published biomedical research articles.
Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Of the two main branches of pre-trained language models in the general domain, BERT (and its variants) and GPT (and its variants), the first has been studied extensively in the biomedical domain, producing models such as BioBERT and PubMedBERT.
BioGPT relies on deep learning, where artificial neural networks—meant to mimic neurons in the human brain—learn to process increasingly complex data on their own. As a result, the new AI program is a type of “black box” technology, meaning developers do not know how individual components of neural networks work together to create the output.
To assess the accuracy of generative AI models, researchers have developed benchmarks for natural language processing (NLP), the ability to understand text and spoken language. Microsoft’s recent paper assessed BioGPT on six biomedical NLP tasks, reporting that the new model outperformed previous models on most of them. These include the well-established PubMedQA benchmark, on which Microsoft reported that BioGPT achieved human parity.
The rise of BioGPT forms part of the wider push towards AI solutions in healthcare and the clinical trials industry. Recently, AI has shown the potential to improve clinical trial patient selection, predict drug development outcomes, and develop digital biomarkers.
The team studied the prompt design and target sequence design when applying BioGPT to downstream tasks and found that target sequences with natural language semantics are better than structured prompts explored in previous works.
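To make the contrast concrete, the sketch below formats the same question-answering label in two ways, as a tag-style structured target and as a natural-language sentence, and then recovers the label from generated text. The tag format, the sentence template, and the function names are illustrative assumptions, not BioGPT's actual code.

```python
import re

# Label set used by yes/no/maybe question-answering tasks.
LABELS = ("yes", "no", "maybe")


def structured_target(label: str) -> str:
    """Hypothetical tag-style target, in the spirit of structured prompts."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return f"<answer> {label} </answer>"


def natural_target(label: str) -> str:
    """Hypothetical target phrased as an ordinary sentence."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return f"the answer to the question given the context is {label}."


def parse_label(generated: str):
    """Recover the label from a generated sentence, or None if absent."""
    match = re.search(r"\bis (yes|no|maybe)\b", generated.lower())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(structured_target("yes"))
    print(natural_target("yes"))
    print(parse_label(natural_target("maybe")))
```

A sentence-style target resembles the text the model saw during pre-training, which is one plausible reason such targets work better than tag-style ones.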
BioGPT is based on GPT-2 and was pre-trained on a corpus of 15 million PubMed abstracts. When applying the pre-trained model to downstream tasks, the team designed and examined both the prompt and the target sequence format. BioGPT performs better than earlier models on most of the six biomedical NLP tasks on which it was evaluated.
In PubMedQA, the task is to answer “yes,” “no,” or “maybe” to a series of biomedical questions based on corresponding abstracts from the database PubMed. For example, one PubMedQA prompt asks, “Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?”
BioGPT-Large, the largest version of the model, achieved a record 81% accuracy on PubMedQA, compared with 78% for a single human annotator. Most other NLP programs, including Google’s BERT family of language models, have not surpassed human accuracy.
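Accuracy on a yes/no/maybe benchmark of this kind is the fraction of questions whose predicted label matches the reference label. The following is a minimal sketch of that calculation; the function name and the five toy labels are made up for illustration and are not PubMedQA data or results.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy labels for illustration only.
gold = ["yes", "no", "maybe", "yes", "no"]
pred = ["yes", "no", "yes", "yes", "maybe"]

if __name__ == "__main__":
    # Three of the five toy predictions match, so this prints 60%.
    print(f"{accuracy(pred, gold):.0%}")
```

On this scale, the reported 81% for BioGPT-Large corresponds to 81 correct answers per 100 questions, against 78 per 100 for the single human annotator.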