Clarify that automatically generated captions are still problematic even with the latest releases, but that there is a workaround
parent
f58dfdf16f
commit
0bd7a90c74
@ -20,6 +20,6 @@ Focusing on French channels would restrict the dataset we are looking for, howev
The interesting YouTube Data API v3 [Captions: download](https://developers.google.com/youtube/v3/docs/captions/download) endpoint is only usable by the owner of the channel that owns the videos whose captions we want (source: [this StackOverflow comment](https://stackoverflow.com/questions/30653865/downloading-captions-always-returns-a-403#comment49414961_30660549); I verified this fact).
I know how to retrieve the captions of a video using [a reverse-engineered approach](https://stackoverflow.com/a/70013529) I developed, but we will try to focus on less technical tools such as `yt-dlp` to get the captions of videos. Retrieving non-auto-generated captions with `yt-dlp --all-subs --skip-download 'VIDEO_ID'` works fine; however, both `youtube-dl --write-auto-sub --skip-download 'VIDEO_ID'` and `yt-dlp --write-auto-sub --skip-download 'VIDEO_ID'` return files in an incorrect format. If we have time, we will also try to download auto-generated video captions to compare our results with YouTube's, possibly using the reverse-engineering approach (which works for sure).
I know how to retrieve the captions of a video using [a reverse-engineered approach](https://stackoverflow.com/a/70013529) I developed, but we will try to focus on less technical tools such as `yt-dlp` to get the captions of videos. Retrieving non-auto-generated captions with `yt-dlp --all-subs --skip-download 'VIDEO_ID'` works fine; however, both `youtube-dl --write-auto-sub --skip-download 'VIDEO_ID'` and `yt-dlp --write-auto-sub --skip-download 'VIDEO_ID'` return files in an incorrect format, even with the latest releases. Nevertheless, `yt-dlp --write-auto-subs --sub-format ttml --convert-subs vtt --skip-download 'VIDEO_ID'` works (source: [this Stack Overflow answer](https://stackoverflow.com/a/74935253)). If we have time, we will also try to download auto-generated video captions to compare our results with YouTube's, possibly using the reverse-engineering approach (which works for sure).
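The working invocation above can also be scripted, e.g. to batch over many video ids. A minimal sketch (the helper function name is mine, the flags are exactly those of the working command):

```python
# Build the yt-dlp argument list that downloads only the
# auto-generated captions in TTML format and converts them
# locally to WebVTT (the workaround described above).
def build_caption_cmd(video_id):
    return [
        "yt-dlp",
        "--write-auto-subs",      # request auto-generated subtitles
        "--sub-format", "ttml",   # TTML is served in a correct format
        "--convert-subs", "vtt",  # convert locally to WebVTT
        "--skip-download",        # captions only, no video stream
        video_id,
    ]

cmd = build_caption_cmd("VIDEO_ID")
print(" ".join(cmd))
```

The resulting list could then be passed to `subprocess.run(cmd)` for each video id of interest.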
As I answered in [this StackOverflow question](https://stackoverflow.com/q/68970958), and since [YouTube Data API v3 doesn't provide a way to enumerate all videos (even for just one country)](https://github.com/Benjamin-Loison/YouTube-comments-graph/issues/2), the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 [PlaylistItems: list](https://stackoverflow.com/a/74579030) endpoint, then list the comments on those videos, and restart the process with the new channels potentially discovered through the comment authors on videos from already known channels.
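The discovery process above is essentially a breadth-first traversal over channels. A sketch of the loop, where `list_videos` and `list_comment_authors` are hypothetical stand-ins for the actual YouTube Data API v3 calls (PlaylistItems: list for a channel's videos, CommentThreads: list for the comments on a video):

```python
from collections import deque

def crawl_channels(seed_channels, list_videos, list_comment_authors):
    """Breadth-first discovery of channels and video ids.

    `list_videos(channel_id)` and `list_comment_authors(video_id)`
    are placeholders for the real API calls; this only sketches
    the traversal logic.
    """
    seen_channels = set(seed_channels)
    queue = deque(seed_channels)
    video_ids = set()
    while queue:
        channel = queue.popleft()
        for video in list_videos(channel):
            video_ids.add(video)
            # Comment authors may reveal channels not seen so far;
            # enqueue them so their videos get listed too.
            for author in list_comment_authors(video):
                if author not in seen_channels:
                    seen_channels.add(author)
                    queue.append(author)
    return seen_channels, video_ids
```

Since the comment graph is traversed breadth-first with a `seen_channels` set, each channel's videos are listed exactly once even if it is reached through several videos.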