Add: Note that we aren't interested in `Auto-translate`d captions.
parent 0bd7a90c74
commit b202a37560
@ -20,6 +20,6 @@ Focusing on French channels would restrict the dataset we are looking for, howev
The otherwise interesting YouTube Data API v3 [Captions: download](https://developers.google.com/youtube/v3/docs/captions/download) endpoint can only be used by the channel that owns the videos whose captions we want (source: [this StackOverflow comment](https://stackoverflow.com/questions/30653865/downloading-captions-always-returns-a-403#comment49414961_30660549); I verified this fact).
I know how to retrieve the captions of a video using [a reverse-engineered approach](https://stackoverflow.com/a/70013529) I developed, but we will focus on less technical tools such as `yt-dlp` to get video captions. To retrieve captions that are not auto-generated, `yt-dlp --all-subs --skip-download 'VIDEO_ID'` works fine; however, both `youtube-dl --write-auto-sub --skip-download 'VIDEO_ID'` and `yt-dlp --write-auto-sub --skip-download 'VIDEO_ID'` return files in an incorrect format, even with the latest releases. Nevertheless, `yt-dlp --write-auto-subs --sub-format ttml --convert-subs vtt --skip-download 'VIDEO_ID'` works (source: [this Stack Overflow answer](https://stackoverflow.com/a/74935253)). If we have time, we will also try to download auto-generated captions so that we can compare our results with YouTube's, possibly using the reverse-engineered approach (which definitely works). Note that we aren't interested in `Auto-translate`d captions.
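To run the `yt-dlp` commands above over many videos, one could wrap them in a small script. The following is only a sketch: the helper names `caption_command` and `download_captions` are hypothetical, and it assumes `yt-dlp` is installed and on the `PATH` (along with `ffmpeg`, which `--convert-subs` relies on). It assembles exactly the two command lines quoted above, adding `--` before the id because YouTube video ids may start with `-`.

```python
import subprocess
from typing import List


def caption_command(video_id: str, auto_generated: bool = False) -> List[str]:
    """Build the yt-dlp argv for one video (hypothetical helper)."""
    if auto_generated:
        # Auto-generated captions: fetch them as TTML, then let yt-dlp convert
        # them to WebVTT, since plain --write-auto-sub output was malformed.
        options = ["--write-auto-subs", "--sub-format", "ttml", "--convert-subs", "vtt"]
    else:
        # Captions that are not auto-generated, in every available language.
        options = ["--all-subs"]
    # "--" ends option parsing, so an id such as "-wNyEUrxzFU" is not
    # mistaken for a command-line flag.
    return ["yt-dlp", *options, "--skip-download", "--", video_id]


def download_captions(video_ids: List[str], auto_generated: bool = False) -> None:
    """Run yt-dlp once per video id; raises CalledProcessError on failure."""
    for video_id in video_ids:
        subprocess.run(caption_command(video_id, auto_generated), check=True)
```

For example, `download_captions(["VIDEO_ID"], auto_generated=True)` runs the TTML-then-VTT variant. Passing the arguments as a list avoids the shell, so the quotes around `'VIDEO_ID'` in the shell form are not needed here.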
As I explained in my answer to [this StackOverflow question](https://stackoverflow.com/q/68970958), [YouTube Data API v3 doesn't provide a way to enumerate all videos (even for a single country)](https://github.com/Benjamin-Loison/YouTube-comments-graph/issues/2). The idea for retrieving all video ids is therefore to start from an initial set of channels, list their videos using YouTube Data API v3 [PlaylistItems: list](https://stackoverflow.com/a/74579030), then list the comments on these videos, and then repeat the process, since the authors of comments on videos from already known channels potentially give us new channels.
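The iterative discovery described above is essentially a breadth-first "snowball" crawl over channels. A minimal sketch, assuming two hypothetical callables that hide the API paging: `list_videos(channel_id)` (which would wrap PlaylistItems: list) and `list_comment_authors(video_id)` (which would return the channel ids of comment authors, for instance from CommentThreads: list; this is our assumption, not detailed here). The optional `max_channels` cap is also hypothetical, added to bound the crawl.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Set, Tuple


def crawl_channels(
    seed_channels: Iterable[str],
    list_videos: Callable[[str], Iterable[str]],
    list_comment_authors: Callable[[str], Iterable[str]],
    max_channels: Optional[int] = None,
) -> Tuple[Set[str], Set[str]]:
    """Channels -> their videos -> comment authors -> new channels, until no new channel appears."""
    known_channels: Set[str] = set(seed_channels)
    known_videos: Set[str] = set()
    queue = deque(known_channels)  # channels whose videos are not listed yet
    while queue:
        channel_id = queue.popleft()
        for video_id in list_videos(channel_id):
            if video_id in known_videos:
                continue
            known_videos.add(video_id)
            for author_channel_id in list_comment_authors(video_id):
                if author_channel_id in known_channels:
                    continue
                if max_channels is not None and len(known_channels) >= max_channels:
                    continue  # cap reached: stop discovering new channels
                # A previously unknown comment author: crawl its channel later.
                known_channels.add(author_channel_id)
                queue.append(author_channel_id)
    return known_channels, known_videos
```

Injecting the two fetch functions keeps the crawl logic separate from quota handling and pagination, and makes it easy to exercise on a small in-memory graph before pointing it at the real API.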