Build a Domain-Specific Custom ChatGPT With Your Own Data

Step 1: Install and Download Software and Pre-Made Script

Please note that the following instructions are for a Windows 10 or Windows 11 machine.

To provide custom data to ChatGPT, you’ll need to download and install the latest versions of Python3, Git, and Microsoft C++, and download the ChatGPT-retrieval script from GitHub. If you already have some of this software installed on your PC, make sure it is updated to the latest version to avoid any hiccups during the process.

Start by installing Python3, Git, and Microsoft C++.

Python3 and Microsoft C++ Installation Notes

When installing Python3, make sure that you tick the Add python.exe to PATH option before clicking Install Now. This is important as it allows you to run Python from any directory on your computer.

When installing Microsoft C++, you’ll need to install Microsoft Visual Studio Build Tools first. Once it is installed, tick the Desktop development with C++ option and click Install, leaving the optional tools that are automatically ticked in the right sidebar as they are.

Now that you have installed the latest versions of Python3, Git, and Microsoft C++, you can download the Python script to easily query custom local data.
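Before moving on, it can help to confirm that the installers actually put the tools on your PATH. The following is a minimal sketch, not part of the downloaded script, that uses Python's standard shutil.which to look up each command. The tool names checked here (python and git) are assumptions based on the software listed above.

```python
import shutil


def find_tools(names=("python", "git")):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        # A missing path usually means the PATH option was skipped at install time.
        status = path if path else "NOT FOUND (check the installer's PATH option)"
        print(f"{name}: {status}")
```

If a tool is reported as not found, rerunning its installer with the PATH option ticked is usually the quickest fix.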

Download: ChatGPT-retrieval script (Free)

To download the script, click on Code, then select Download ZIP. This should download the Python script into your default or selected directory.

Step 2: Set Up the Local Environment
To set up the environment, you’ll need to open a terminal in the chatgpt-retrieval-main folder you downloaded. To do that, open the chatgpt-retrieval-main folder, right-click inside it, and select Open in Terminal.

Once the terminal is open, copy and paste this command:

This command uses Python’s package manager to create and manage the Python virtual environment needed.
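If you want to create an isolated virtual environment yourself, Python's standard library can do it. This is a minimal sketch using the venv module, which is an assumption on my part and not necessarily what the command above does. The function names and the .venv folder name are hypothetical.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its config file path."""
    env_dir = Path(env_dir)
    # EnvBuilder is the standard-library API behind "python -m venv".
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_dir / "pyvenv.cfg"


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix
```

Calling create_env(".venv") creates the environment in the current folder. On Windows, it can then be activated from the terminal with .venv\Scripts\activate.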

After creating the virtual environment, we need to supply an OpenAI API key to access their services. We’ll first need to generate an API key from the OpenAI API keys site by clicking on Create new secret key, adding a name for the key, then hitting the Create secret key button.

You will be provided with a string of characters. This is your OpenAI API key. Copy it by clicking on the copy icon next to the API key. Keep in mind that this API key should be kept secret. Do not share it with others unless you really intend for them to use it with you.

Once copied, return to the chatgpt-retrieval-main folder and open constants with Notepad. Now replace the placeholder with your API key. Remember to save the file!
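To illustrate what happens to the key once it is saved, here is a minimal sketch of how a script might validate a key and expose it as the OPENAI_API_KEY environment variable, which is the variable the OpenAI Python library reads. The load_api_key function and the placeholder text are hypothetical and are not taken from the downloaded script.

```python
import os

# Hypothetical placeholder text; the real file may use different wording.
PLACEHOLDER = "your-api-key-here"


def load_api_key(key, env_var="OPENAI_API_KEY"):
    """Validate the key and export it as an environment variable."""
    key = key.strip()
    if not key or key == PLACEHOLDER:
        raise ValueError("Replace the placeholder with your real API key.")
    # Only this process and its children see the variable; nothing is written to disk.
    os.environ[env_var] = key
    return key
```

A check like this catches the most common mistake, which is saving the file with the placeholder still in place.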

Now that you have successfully set up your virtual environment and added your OpenAI API key as an environment variable, you can provide your custom data to ChatGPT.

Step 3: Adding Custom Data
To add custom data, place all your custom text data in the data folder within chatgpt-retrieval-main. The data can be in PDF, TXT, or DOC format.
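To double-check what is in the data folder, a short sketch like the one below lists the files whose extensions match the formats mentioned above. The list_data_files helper is hypothetical, and the extension set is an assumption drawn from this article, not from the script itself.

```python
from pathlib import Path

# Extensions taken from the formats named in the article.
SUPPORTED = {".pdf", ".txt", ".doc"}


def list_data_files(data_dir="data"):
    """Return the sorted names of supported files directly inside data_dir."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in data_dir.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```

Running list_data_files() from inside chatgpt-retrieval-main shows which files match and makes it easy to spot any that were saved in a different format.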

As you can see from the screenshot above, I’ve added a text file containing a made-up personal schedule, an article I wrote on AMD’s Instinct Accelerators, and a PDF document.

Step 4: Querying ChatGPT Through Terminal

The Python script allows us to query both the custom data we’ve added to the data folder and the model’s general knowledge. In other words, you will have access to the usual ChatGPT backend and all the data stored locally in the data folder.

To use the script, run the Python script with your question or query as the argument.

Make sure to put your questions in quotation marks.
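The reason for the quotation marks is that the terminal splits a command on spaces before Python ever sees it. Here is a minimal sketch of the effect, using shlex.split to imitate that splitting. The file name script.py and the extract_query helper are hypothetical. shlex follows POSIX shell rules, so it only approximates the Windows terminal, but the quoting behavior shown is the same.

```python
import shlex


def extract_query(argv):
    """Return the first argument after the script name, or None if absent."""
    return argv[1] if len(argv) > 1 else None


# With quotes, the whole question arrives as a single argument.
quoted = shlex.split('script.py "What is on my schedule today?"')

# Without quotes, each word becomes its own argument.
unquoted = shlex.split("script.py What is on my schedule today?")

print(extract_query(quoted))    # What is on my schedule today?
print(extract_query(unquoted))  # What
```

A script that reads only its first argument would therefore receive just the first word of an unquoted question.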

To test if we have successfully fed ChatGPT our data, I’ll ask a personal question regarding the Personal Sched.txt file.

It worked! This means ChatGPT was able to read the Personal Sched.txt provided earlier. Now let’s see if we have successfully fed ChatGPT with information it does not know due to its knowledge cutoff date.

You need to provide all the data yourself. You can still access all the knowledge of GPT-3.5 up to its knowledge cutoff date; however, you must provide any extra data. This means that if you want your local model to be knowledgeable about a subject on the internet that GPT-3.5 doesn’t already know, you’ll have to scrape the data yourself and save it as a text file in the data folder of chatgpt-retrieval-main.
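Saving a web page as text can be done with the standard library alone. Below is a minimal sketch under stated assumptions: html_to_text and save_page are hypothetical helpers, the page is assumed to be UTF-8 HTML, and the text extraction is deliberately naive (it keeps visible text and drops script and style content). Real pages often need a dedicated HTML library.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping anything inside script or style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, one fragment per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def save_page(url, data_dir="data", filename="page.txt"):
    """Download url and save its text into data_dir; returns the output path."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    out = Path(data_dir) / filename
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(html_to_text(html), encoding="utf-8")
    return out
```

Calling save_page with the address of an article and a file name of your choice writes a plain-text copy into the data folder, where it can sit alongside your other custom data.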

Another issue is that querying ChatGPT like this takes longer to respond than asking ChatGPT directly.