Professor Digital Clone
This is an example of a lecture generated by the model on a topic it wasn't fine-tuned on.
Key Learnings
- LLM fine-tuning techniques
- How to be resourceful with limited data
- Importance of data preprocessing
- Voice cloning technology
- Structured data preparation for LLMs
Project Overview
I created an AI system that generates educational lectures by cloning my professor's speaking style and voice. This project was a lot of fun to build, since the OpenAI API was still in its early stages and the technology felt new and exciting. The project scope included:
- Used the YouTube API to scrape the professor's lecture videos and their auto-generated transcripts
- Processed the unstructured lecture transcripts into an LLM fine-tuning format
- Tested different data-preprocessing methods to improve model performance
- Fine-tuned early OpenAI models to generate the lecture scripts
- Integrated ElevenLabs voice cloning to create the lecture audio and DiD to animate the professor's face cam
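Auto-generated transcripts typically arrive as a list of timed caption segments rather than clean text. A minimal sketch of flattening them into a single lecture transcript (the segment dict shape with a `"text"` key mirrors what libraries like youtube-transcript-api return, which is an assumption here, not the project's exact pipeline):

```python
import re

def flatten_transcript(segments):
    """Join timed caption segments into one cleaned transcript string.

    `segments` is a list of dicts with a "text" key, the shape returned
    by auto-caption scrapers such as youtube-transcript-api (assumed).
    """
    text = " ".join(seg["text"] for seg in segments)
    text = re.sub(r"\[.*?\]", "", text)       # drop cues like [Music], [Applause]
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace and newlines

segments = [
    {"text": "Welcome back, everyone.", "start": 0.0},
    {"text": "[Music]", "start": 2.1},
    {"text": "Today we cover  gradient descent.", "start": 4.0},
]
print(flatten_transcript(segments))
# Welcome back, everyone. Today we cover gradient descent.
```

Cleaning at this stage matters because any caption artifacts left in would be faithfully reproduced by the fine-tuned model.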
Technical Details
- The tight token limits of the early OpenAI API, combined with having only ~30 lectures available (each containing a large number of tokens), led me to attempt fine-tuning with two different dataset formats
- The first dataset used just the 30 lecture examples, with the lecture topic as the input and the first 500 words of the transcript as the output
- The second created many more samples (~500), with the lecture topic as the input and a 500-word snippet randomly sampled from the transcript as the output
- The fine-tuned model generated lectures that closely matched the professor's style and voice, and quality held up even on topics it wasn't fine-tuned on
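The two dataset formats above can be sketched as follows, using the prompt/completion JSONL records that early OpenAI fine-tunes expected; the function names, the 500-word window, and the sampling scheme are illustrative assumptions, not the project's exact code:

```python
import json
import random

WORDS_PER_SAMPLE = 500  # snippet length used for both formats

def first_snippet(topic, transcript):
    """Format A: one example per lecture -- topic -> first 500 words."""
    words = transcript.split()
    return {"prompt": topic, "completion": " ".join(words[:WORDS_PER_SAMPLE])}

def random_snippets(topic, transcript, n, rng):
    """Format B: n examples per lecture -- topic -> random 500-word windows."""
    words = transcript.split()
    samples = []
    for _ in range(n):
        start = rng.randrange(max(1, len(words) - WORDS_PER_SAMPLE + 1))
        samples.append({"prompt": topic,
                        "completion": " ".join(words[start:start + WORDS_PER_SAMPLE])})
    return samples

def write_jsonl(records, path):
    """Write records in the legacy OpenAI fine-tuning JSONL format."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

With ~30 lectures, format A yields ~30 training examples, while format B with roughly 17 random windows per lecture yields the ~500 examples mentioned above.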
Technologies Used
Python
LLM
YouTube API
Pandas
OpenAI API
ElevenLabs
DiD