Manage YouTube videos with AI

In this article we will discover how to automate some of the boring parts of YouTube video management: description and chapters.

We are going to talk about:

  • Audio transcription
  • SRT file analysis
  • Video description generation
  • Video chapter creation

Get your video ready!

First, we need a library that can transcribe our audio. Since I am on macOS, I will use mlx-whisper, but you can choose a similar package for your OS; a good starting point is whisper.cpp.

Audio transcription – Whisper

If you are on macOS, let’s install mlx-whisper.

mlx-examples/whisper at main · ml-explore/mlx-examples · GitHub

Depending on your system configuration, you might need to use venv, as I did on my machine:

python3.9 -m venv youtube-markers-env
source ./youtube-markers-env/bin/activate
pip install mlx-whisper

Audio extraction

Now we have everything we need to transcribe our video, but first we need to extract the audio from the video!

You can do this with a wide range of tools: on a Mac you can use QuickTime (File > Export As > Audio Only), or if you have ffmpeg installed on your system you can run:

ffmpeg -i inputVideo.mp4 the_audio.wav

Audio transcription

Transcribing the audio is as easy as running this command:

mlx_whisper the_audio.wav --model "mlx-community/whisper-medium-mlx" --output-format srt

I’ve used the mlx-community/whisper-medium-mlx model, but you can choose any model from mlx-community: https://huggingface.co/mlx-community?search_models=whisper. Of course, the bigger the model, the longer it takes to complete its task and the higher its transcription accuracy. Test different models to see which one strikes the best balance for your use case.

The command will generate an SRT file in the same folder as the audio file.

Analyzing the SRT file

An SRT file is a text file that looks like this:

1
00:00:00,000 --> 00:00:03,840
This is the first sentence

2
00:00:04,640 --> 00:00:12,320
and this is the second one.

It is essentially a list of sentences with timestamps.

Now we want to determine which moments correspond to the beginning of a chapter. Done manually, that process takes some time, but it turns out to be a fantastic task for an LLM!
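Before handing the file to an LLM, it can help to see how mechanical the format really is. Here is a minimal sketch of parsing an SRT string into start-timestamp/text pairs; `parseSrt` is a hypothetical helper, not part of the article's code.

```php
// Sketch: parse an SRT string into [start, text] pairs.
// parseSrt() is a hypothetical helper for illustration only.
function parseSrt(string $srt): array
{
    $entries = [];
    // Blocks are separated by blank lines: index, timing line, text lines.
    foreach (preg_split('/\R\R+/', trim($srt)) as $block) {
        $lines = preg_split('/\R/', trim($block));
        if (count($lines) < 3) {
            continue;
        }
        // The timing line looks like "00:00:00,000 --> 00:00:03,840".
        [$start] = explode(' --> ', $lines[1]);
        $entries[] = [
            'start' => $start,
            'text'  => implode(' ', array_slice($lines, 2)),
        ];
    }
    return $entries;
}
```

We won't need this parser for the LLM call (we pass the raw file), but it is handy if you ever want to pre-process or trim the transcript.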

I ❤️ Laravel, so let’s write some code.

First, let’s ask Anthropic’s Claude some questions using the Prism package. Prism is a great package and you should check it out if you haven’t already.

protected function generateChapters(string $srtContent): array
{
    $userMessage = new UserMessage(content: "From this SRT file, detect the main arguments and their timestamps. Detect a maximum of 10 arguments and propose only a reasonable number of them. Return them in JSON format with title and timestamp. Here's the SRT content:\n\n{$srtContent}");
    $assistantMessage = new AssistantMessage(content: 'here is the JSON:');

    $response = Prism::text()
        ->using(Provider::Anthropic, 'claude-3-5-sonnet-latest')
        ->withMessages([$userMessage, $assistantMessage])
        ->withSystemPrompt("You are an expert at analyzing video transcripts and identifying main arguments.
         Always return your response in valid JSON format with an array of objects containing 'title' and 'timestamp' keys.
         The title must be in the SRT language.
         The JSON must be in this form:
         [
            {
                \"title\": \"Argument 1\",
                \"timestamp\": \"00:00\"
            },
            {
                \"title\": \"Argument 2\",
                \"timestamp\": \"00:30\"
            }
         ]")
        ->generate();

    // Decode as associative arrays so we can use $argument['timestamp'] later.
    return json_decode(Str::trim($response->text), true);
}

What just happened? We sent the SRT file to an LLM and asked it to find at most 10 arguments (the main topics) in the video transcription. That set of arguments will then become our YouTube video chapters.
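One caveat: LLM output is not guaranteed to be valid JSON, so it is worth decoding it defensively instead of passing `null` downstream. Here is a minimal sketch; `decodeChapters` is a hypothetical helper, not part of the article's code.

```php
// Sketch: defensively decode the chapter JSON returned by the model.
// decodeChapters() is a hypothetical helper for illustration only.
function decodeChapters(string $raw): array
{
    $chapters = json_decode(trim($raw), true);

    if (!is_array($chapters)) {
        throw new InvalidArgumentException(
            'Model did not return valid JSON: ' . json_last_error_msg()
        );
    }

    // Keep only well-formed entries that carry both expected keys.
    return array_values(array_filter($chapters, function ($c) {
        return is_array($c) && isset($c['title'], $c['timestamp']);
    }));
}
```

Failing loudly here makes it much easier to spot a bad model response than debugging a broken description on YouTube later.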

The Anthropic API will return JSON with a list of chapters and their timestamps. Later, we will feed that JSON to another piece of code that calls the YouTube Data API. But first things first: let’s ask Claude to create a nice description of our video by passing the SRT file to the API once again.

You are absolutely right: this process could be done in a single API call, which would be more efficient and cheaper.

What about a description for our video?

It should be pretty easy for an LLM to create a description directly from our video transcription, right?

We call Claude once again to create a nice description for us:

protected function generateDescription(string $srtContent): string
{
    $response = Prism::text()
        ->using(Provider::Anthropic, 'claude-3-5-sonnet-latest')
        ->withPrompt("From this SRT file, generate a description for the video. The description must be in the same language as the SRT file. Here's the SRT content:\n\n{$srtContent}")
        ->withSystemPrompt("You are an expert at analyzing video transcripts and identifying main arguments. Do not talk in third person, use the I form. Be concise and clear and friendly.
         Always return your response in the same language as the SRT file. The response must be in Markdown compatible with YouTube. Just return the description without anything else.")
        ->withMaxTokens(300)
        ->generate();

    return Str::trim($response->text);
}

In the example we explicitly cap the response at 300 tokens: we don’t want the description to be too long.

YouTube Data API

After receiving the JSON, we can invoke the YouTube Data API. Oh, I forgot: your video needs to already be on YouTube for this code to work. It doesn’t need to be public, and it’s probably better to keep it private until you have run the code, so you have time to edit parts of the description or chapters.

protected function setYouTubeMarkers(array $arguments, string $description): void
{
    $client = new \Google\Client();
    $client->setAuthConfig(Storage::path('google-credentials.json'));
    $client->setDeveloperKey(config('services.google.api_key'));
    $client->addScope(\Google_Service_YouTube::YOUTUBE_FORCE_SSL);

    // Load a previously stored access token
    if (Storage::exists('google-access-token.json')) {
        $accessToken = json_decode(Storage::get('google-access-token.json'), true);
        $client->setAccessToken($accessToken);
    }

    // If the token is expired, refresh it
    if ($client->isAccessTokenExpired() && Storage::exists('google-refresh-token.json')) {
        $client->fetchAccessTokenWithRefreshToken(Storage::get('google-refresh-token.json'));
        Storage::put('google-access-token.json', json_encode($client->getAccessToken()));
    }

    $youtube = new \Google_Service_YouTube($client);

    try {
        // Get the current video details
        $video = $youtube->videos->listVideos('snippet', ['id' => $this->videoId]);

        if (empty($video->items)) {
            throw new Exception('Video not found');
        }

        $videoSnippet = $video->items[0]->getSnippet();

        // Format the timestamps for YouTube
        $timestampText = "";
        foreach ($arguments as $argument) {
            $timestampText .= "{$argument['timestamp']} - {$argument['title']}\n";
        }

        // Combine the generated description with the timestamps
        $newDescription = $description . "\n\n" . $timestampText;

        // Create the update request
        $updateVideo = new \Google_Service_YouTube_Video();
        $updateVideo->setId($this->videoId);

        $videoSnippet->setDescription($newDescription);
        $updateVideo->setSnippet($videoSnippet);

        $youtube->videos->update('snippet', $updateVideo);
    } catch (Exception $e) {
        logger()->error('Failed to update YouTube video:', [
            'error' => $e->getMessage(),
            'videoId' => $this->videoId,
        ]);
        throw $e;
    }
}
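One more thing worth checking before updating the video: YouTube only renders chapters when the first timestamp is 0:00, there are at least three chapters, and each chapter lasts at least 10 seconds. A small validation sketch could look like this; `toSeconds` and `chaptersAreValid` are hypothetical helpers, not part of the article's code.

```php
// Sketch: validate the chapter list against YouTube's chapter rules
// (first chapter at 0:00, at least three chapters, 10-second minimum).
// toSeconds() and chaptersAreValid() are hypothetical helpers.
function toSeconds(string $timestamp): int
{
    $parts = array_map('intval', explode(':', $timestamp));
    // Accept both "MM:SS" and "HH:MM:SS" formats.
    return count($parts) === 3
        ? $parts[0] * 3600 + $parts[1] * 60 + $parts[2]
        : $parts[0] * 60 + $parts[1];
}

function chaptersAreValid(array $chapters): bool
{
    if (count($chapters) < 3 || toSeconds($chapters[0]['timestamp']) !== 0) {
        return false;
    }
    // Each chapter must last at least 10 seconds before the next one starts.
    for ($i = 1; $i < count($chapters); $i++) {
        if (toSeconds($chapters[$i]['timestamp']) - toSeconds($chapters[$i - 1]['timestamp']) < 10) {
            return false;
        }
    }
    return true;
}
```

Note that the last chapter’s duration (up to the end of the video) can’t be checked without knowing the video length, so this sketch skips it.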

If everything goes according to plan, we now have a nice description and our chapters ready to go!

I hope you enjoyed this short article; please let me know what you think. You can reach me on X (@sfolador).

You can find all the code in this public GitHub repository: https://github.com/sfolador/youtube-markers.

Have a look at the video on my YouTube channel.