OpenBioLLM, a collection of fine-tuned Llama models developed by life sciences company Saama, is revolutionizing workflows in clinical trials and personalized medicine. These models streamline the creation of clinical trial protocols, study reports, and other essential documents, speeding up data analysis and protocol generation to bring life-saving treatments to patients faster. Additionally, they enhance diagnostic accuracy and treatment planning by processing information efficiently, equipping doctors and patients with data-driven insights for informed care decisions.
“As an open-source model, OpenBioLLM is available to researchers and healthcare providers globally, with a tangible real-world impact,” says Malaikannan Sankarasubbu, Chief Technology & AI Officer at Saama.
The two models, OpenBioLLM-8B and OpenBioLLM-70B, leverage Llama 3’s architecture to accelerate tasks such as extracting insights from clinical trial documents, data analysis, clinical protocol development, and reasoning over medical knowledge graphs.
OpenBioLLM has been widely adopted across biomedical and healthcare applications in clinical development, facilitating research and analysis, data management, and operational efficiency. The models can also aid drug discovery and support genomics analysis. Other researchers are building on it for their own work, including a recent paper presented at an Association for Computational Linguistics conference.
“The tangible impacts showcase how AI, specifically Llama-based models, can revolutionize healthcare and life sciences, potentially improving patient outcomes and saving lives,” Sankarasubbu says. “Our commitment to open source development has allowed us to share advancements with the broader scientific community, fostering collaboration and innovation in biomedical AI. These models are paving the way for highly personalized medical care.”
Building with Llama in complex use cases
Saama has significantly advanced its use of Llama, extending its capabilities to tackle complex tasks such as protocol generation, medical knowledge graph reasoning, and clinical trial document analysis. By developing specialized models tailored to various medical domains, the team has achieved notable improvements in biomedical task performance. Scaling its models to 8B and 70B parameters with the introduction of Llama 2 and 3 further enhanced these capabilities.
Currently, Saama is exploring multimodal applications, combining Llama-based models with medical imaging and genomics data, opening new frontiers in precision medicine and data-driven healthcare solutions.
To ensure privacy and compliance in the highly regulated healthcare environment, Saama developed advanced de-identification techniques and secure data handling protocols to adhere to healthcare regulations like HIPAA. Saama’s in-house AI researchers address any challenges that arise to ensure OpenBioLLM maintains its status as a state-of-the-art biomedical language model. The team employed rigorous testing protocols and collaborated with medical professionals to validate model outputs and mitigate biases.
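To make the idea of de-identification concrete, here is a minimal rule-based sketch in Python. It is not Saama's method, which the article does not describe in detail. The pattern set, the placeholder labels, and the `deidentify` function name are all assumptions chosen for illustration, and a production pipeline would need to cover many more identifier types than these four.

```python
import re

# Hypothetical patterns for illustration only. Real HIPAA-oriented
# de-identification covers far more identifier types (names, addresses,
# record numbers, and so on) and is usually not purely regex-based.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def deidentify(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


note = "Seen on 2024-03-15, contact jane.doe@example.org or 555-123-4567."
print(deidentify(note))
```

Replacing identifiers with typed placeholders, rather than deleting them, keeps the sentence structure intact for downstream model training while removing the sensitive values.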
When the team implemented Llama for biomedical applications, they built on their experience with MedMCQA, a dataset designed to address real-world medical entrance exam questions. The two-stage fine-tuning process involved several key steps, including curating a high-quality medical instruction dataset and creating a Direct Preference Optimization (DPO) dataset from medical expert evaluations. For the fine-tuning framework, the team adapted the Hugging Face Transformers library and the TRL module for specific biomedical use cases.
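The preference stage can be illustrated with a small, library-free Python sketch. It shows one plausible way to turn expert scores into preference pairs, followed by the standard DPO loss for a single pair. The helper names (`build_preference_pairs`, `dpo_loss`), the scoring scheme, and the `beta` value are assumptions for illustration; the article does not say how Saama's experts rated responses or which hyperparameters were used. The `prompt`/`chosen`/`rejected` field names follow the convention commonly used for preference datasets.

```python
import math
from itertools import combinations


def build_preference_pairs(prompt, rated_responses):
    """Turn (response, expert_score) tuples into chosen/rejected pairs.

    Assumes a higher score means the expert preferred that response.
    """
    pairs = []
    for (resp_a, score_a), (resp_b, score_b) in combinations(rated_responses, 2):
        if score_a == score_b:
            continue  # a tie carries no preference signal
        if score_a > score_b:
            chosen, rejected = resp_a, resp_b
        else:
            chosen, rejected = resp_b, resp_a
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    Inputs are summed log-probabilities of each response under the model
    being trained (policy) and the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = -beta * margin
    # Numerically stable softplus: log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


pairs = build_preference_pairs(
    "What is the first-line treatment for condition X?",  # placeholder prompt
    [("answer A", 5), ("answer B", 2), ("answer C", 2)],   # placeholder scores
)
print(len(pairs))
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The loss falls as the policy assigns relatively more probability to the expert-preferred response than the reference model does, which is what pushes the fine-tuned model toward expert judgments without training a separate reward model.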
“A comprehensive approach to fine-tuning enabled the team to create models that excel in biomedical tasks and outperform larger, proprietary models on specific benchmarks,” says Sankarasubbu. “The team used Llama 3 as the base model for both 8B and 70B parameter versions.”
The open source path to success
Saama opened its AI research lab in 2017, enabling collaborative innovation with talented developers and researchers worldwide. Sankarasubbu credits open source as fundamental to Saama’s success.
“Open source is poised to revolutionize biomedical AI, fostering a more inclusive, innovative ecosystem and democratizing access to advanced healthcare technologies,” Sankarasubbu says.
Saama’s collaborations with universities have played a pivotal role in making its open-source projects and research more practical and impactful. The company actively promotes knowledge sharing by publishing papers in top-tier conferences and open-sourcing numerous projects on GitHub. This approach not only fosters innovation but also allows Saama to contribute to and benefit from the global knowledge ecosystem.
Its open-source contributions, including datasets and benchmarks, have been widely utilized by major AI organizations, further cementing its position as a key player in advancing AI research and applications.
“The positive response and appreciation we’ve received have reinforced our commitment to an open research culture,” Sankarasubbu says. “Collaborative approaches lead to faster advancements, and aligning open source projects with medical guidelines ensures responsible innovation in healthcare AI.”
As the Llama ecosystem evolves, Saama expects to upgrade its models regularly with each new iteration of Llama, including Llama 3.1 and future releases.