diff --git a/Project-proposal.md b/Project-proposal.md
index 0332a5e..c9f42a2 100644
--- a/Project-proposal.md
+++ b/Project-proposal.md
@@ -18,7 +18,7 @@ Note that by framing our query with `"` we filter content containing strictly th
 
 Focusing on French channels would restrict the dataset we are looking for, however as I experienced (notably with [this implementation](https://github.com/Benjamin-Loison/YouTube-comments-graph/blob/9802fd2c5d11c6dd866f4e39343630b98a01b4e3/CPP/main.cpp#L1140-L1233)) the country isn't given for every YouTube channel so not restricting on a given country sounds like a less feverish approach.
 
-YouTube Data API v3 interesting [Captions: download](https://developers.google.com/youtube/v3/docs/captions/download) endpoint is only usable by the channel owning the given videos we want the captions of (source: comments of [this StackOverflow answer](https://stackoverflow.com/a/30660549), I verified this fact).
+YouTube Data API v3 interesting [Captions: download](https://developers.google.com/youtube/v3/docs/captions/download) endpoint is only usable by the channel owning the given videos we want the captions of (source: [this StackOverflow comment](https://stackoverflow.com/questions/30653865/downloading-captions-always-returns-a-403#comment49414961_30660549), I verified this fact).
 
 I know how to retrieve captions of a video using [a reverse-engineered approach](https://stackoverflow.com/a/70013529) I developed, but we will try to focus on less technical tools such as `yt-dlp` to get the captions of videos. To retrieve not auto-generated captions `yt-dlp --all-subs --skip-download 'VIDEO_ID'` works fine, however both `youtube-dl --write-auto-sub --skip-download 'VIDEO_ID'` and `yt-dlp --write-auto-sub --skip-download 'VIDEO_ID'` return incorrect format files.
 If we have time, we will try to also download auto-generated video captions to be able to make comparison of our results with YouTube ones, so maybe by using a reverse-engineering approach (this works for sure).