Enhancing an Existing LLM Model with Domain-specific Jenkins Knowledge

Goal: Develop an application around an existing open-source LLM that is fine-tuned locally on data collected for domain-specific Jenkins knowledge, with a proper UI for the user to interact with it.

Skills: Python, React.js, LLM, AI/ML, Jenkins, Ollama, LangChain, UI

Status: Selected

Team

Contributor: Nour Almulhem

Mentor(s): Kris Stern (krisstern), Bruno Verachten (gounthar), Shivay Lamba (shivaylamba), Harsh Pratap Singh (harsh ps 2003)

Details

Abstract

This full-stack project is a proof of concept (PoC) for fine-tuning an existing open-source large language model (LLM), such as Llama 2, with domain-specific Jenkins data. The contributor will compile, wrangle, and process the data as part of building an AI-driven application. The goal is a minimalistic user interface (UI) through which users interact with the fine-tuned LLM, delivered as a complete end-to-end product. The product is meant to be installed and run locally on the user’s laptop, using Ollama to set up and run LLMs locally and LangChain as the framework for building the LLM-powered app. The contributor will be involved in every step of application development, from data collection, wrangling, and processing to fine-tuning the model and building the UI. They may also learn how to package software for distribution as a standalone application for end users.
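To make the "run locally with Ollama" part concrete, here is a minimal sketch of how an application could send a Jenkins question to a model served by Ollama on the same machine. It assumes Ollama is running at its default local address and that a model tagged `llama2` has already been pulled. The helper names `build_payload` and `ask` and the prompt wording are hypothetical and are not taken from the project. It uses only the Python standard library rather than LangChain, so it illustrates the request flow and not the project's actual architecture.

```python
import json
import urllib.request

# Default address of a locally running Ollama server (assumption: default port).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(question, model="llama2"):
    """Build the JSON body for one non-streaming generation request."""
    # Hypothetical prompt template; a real app would tune this wording.
    prompt = (
        "You are an assistant for Jenkins users.\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def ask(question, model="llama2", url=OLLAMA_URL, timeout=120):
    """Send the question to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(question, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

With the server running, calling `ask("How do I declare an agent in a Jenkinsfile?")` would return the model's answer as a string. Passing `model="llama3"` would target a different locally pulled model in the same way.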

Rationale

Currently, we do not have any AI-driven assistive technology to help Jenkins users with domain-specific knowledge. This project aims to fill that gap by developing an AI-driven application that can assist Jenkins users with the knowledge that is normally possessed by a Jenkins expert.

Implementation

Because the project is cutting-edge, we will combine research and experimentation to arrive at an LLM-driven application, trying more than one approach to fine-tuning the model with domain-specific Jenkins data. We will use both Llama 2 and Llama 3 at various stages of the project: the former to establish some benchmarks, and the latter to experiment with the latest advancements in LLMs.
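Whichever fine-tuning approach is used, the collected Jenkins data first has to be put into a consistent shape. The sketch below shows one possible preprocessing step, which turns raw question-and-answer pairs into cleaned, de-duplicated JSON Lines records. The record fields (`instruction`, `response`) and the function names are assumptions made for illustration. The project description does not specify a data format, and different fine-tuning tools expect different schemas.

```python
import json


def to_training_records(qa_pairs):
    """Clean (question, answer) pairs and drop empty or duplicate questions."""
    records = []
    seen = set()
    for question, answer in qa_pairs:
        # Collapse runs of whitespace and trim both ends.
        question = " ".join(question.split())
        answer = " ".join(answer.split())
        if not question or not answer:
            continue  # skip incomplete pairs
        key = question.lower()
        if key in seen:
            continue  # keep only the first answer for a repeated question
        seen.add(key)
        # Hypothetical schema; adapt the field names to the chosen tool.
        records.append({"instruction": question, "response": answer})
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Writing the output of `to_jsonl(to_training_records(pairs))` to a file gives a dataset that can be inspected line by line before it is fed to a fine-tuning run.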

Office hours

  • (General) Official weekly Jenkins office hours: Thursdays 13:00 UTC

  • (Project-based) Weekly project-based office hours after the Bonding Period: Fridays 12:00 UTC

Chat

We use the #gsoc-2024-llm channel in the CDF Slack workspace.