Benjamin Loison
3aa9947f8e
As we want to retrieve as many comments as possible, we have to proceed video by video. For instance, [`3F8dFt8LsXY`](https://www.youtube.com/watch?v=3F8dFt8LsXY) has comments, but using the YouTube Data API v3 CommentThreads: list endpoint with the `allThreadsRelatedToChannelId` filter for `UCWIdqSQekeGmUWlSFeCiEnA` returns:

```json
{
  "error": {
    "code": 403,
    "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
    "errors": [
      {
        "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
        "domain": "youtube.commentThread",
        "reason": "commentsDisabled",
        "location": "videoId",
        "locationType": "parameter"
      }
    ]
  }
}
```
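A minimal sketch of the per-video approach, assuming requests authenticate with an API key passed in the `key` query parameter. It builds a CommentThreads: list URL with the `videoId` filter instead of `allThreadsRelatedToChannelId`, and recognizes the `commentsDisabled` error reason from the response above so that such a video can be skipped rather than stopping the crawl. The helper names `comment_threads_url` and `is_comments_disabled` are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

COMMENT_THREADS_ENDPOINT = "https://www.googleapis.com/youtube/v3/commentThreads"


def comment_threads_url(video_id: str, api_key: str, page_token: Optional[str] = None) -> str:
    """Build a CommentThreads: list URL for one video (videoId filter)."""
    params = {
        "part": "snippet,replies",
        "videoId": video_id,
        "maxResults": 100,  # largest page size accepted by this endpoint
        "key": api_key,
    }
    if page_token is not None:
        params["pageToken"] = page_token
    return f"{COMMENT_THREADS_ENDPOINT}?{urlencode(params)}"


def is_comments_disabled(body: str) -> bool:
    """True if an error response body carries the commentsDisabled reason."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    errors = data.get("error", {}).get("errors", [])
    return any(err.get("reason") == "commentsDisabled" for err in errors)
```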
10 lines
1.2 KiB
Markdown
As explained in the project proposal, the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 PlaylistItems: list endpoint, then list the comments on those videos, and then repeat the process, since the comment authors on videos from already known channels potentially give us new channels.
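The step that grows the channel set can be sketched as follows, assuming each CommentThreads: list page has already been decoded into a Python `dict`. It reads `authorChannelId.value` from the top-level comment and from the replies embedded in each thread; since those embedded replies may be incomplete, Comments: list would still be needed for the rest. The function name `author_channel_ids` is hypothetical.

```python
from typing import Any


def author_channel_ids(response: dict[str, Any]) -> set[str]:
    """Collect comment author channel ids from one CommentThreads: list page."""
    found: set[str] = set()
    for thread in response.get("items", []):
        comments = [thread.get("snippet", {}).get("topLevelComment", {})]
        # Replies embedded in the thread may be partial; Comments: list returns the rest.
        comments += thread.get("replies", {}).get("comments", [])
        for comment in comments:
            channel = comment.get("snippet", {}).get("authorChannelId", {}).get("value")
            if channel:
                found.add(channel)
    return found
```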
For a given channel, there is a single way to list comments users published on it:
As explained, the YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)**, and the CommentThreads: list and Comments: list endpoints enable us to retrieve their comments.
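Listing a channel's videos means following `nextPageToken` until it is absent. This sketch assumes the commonly observed convention that a channel's uploads playlist id is its channel id with the `UC` prefix replaced by `UU`, and takes a hypothetical `fetch` callable that performs the HTTP request and returns the decoded JSON, so the paging logic can be tested offline.

```python
from typing import Callable, Iterator

# Hypothetical: fetch(params) sends PlaylistItems: list with these params and returns decoded JSON.
Fetch = Callable[[dict], dict]


def uploads_playlist_id(channel_id: str) -> str:
    """Assumed convention: replace the "UC" prefix of a channel id with "UU"."""
    if not channel_id.startswith("UC"):
        raise ValueError(f"unexpected channel id: {channel_id}")
    return "UU" + channel_id[2:]


def channel_video_ids(channel_id: str, fetch: Fetch) -> Iterator[str]:
    """Yield every video id of a channel's uploads playlist, page after page."""
    params = {
        "part": "snippet",
        "playlistId": uploads_playlist_id(channel_id),
        "maxResults": 50,  # largest page size accepted by this endpoint
    }
    while True:
        page = fetch(params)
        for item in page.get("items", []):
            yield item["snippet"]["resourceId"]["videoId"]
        token = page.get("nextPageToken")
        if token is None:
            return
        params = {**params, "pageToken": token}  # request the next page
```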
We can multi-thread this process per channel, or per video of a given channel.
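Per-channel multi-threading could look like the sketch below, where `process_channel` is a hypothetical callable that retrieves all comments of one channel and returns some result, for instance how many comments it found. Threads fit here because the work is mostly waiting on HTTP responses; passing video ids and a per-video function instead gives the per-video variant.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

R = TypeVar("R")


def process_channels(
    channel_ids: Iterable[str],
    process_channel: Callable[[str], R],
    workers: int = 4,
) -> dict[str, R]:
    """Run process_channel on several channels concurrently, keyed by channel id."""
    ids = list(channel_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so results line up with ids.
        results = pool.map(process_channel, ids)
        return dict(zip(ids, results))
```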
As we would like to proceed channel by channel, the question is **how long does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long time, then multi-threading per video of a given channel may make sense.**
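To reason about that question, a rough lower bound on the number of requests for one channel can be computed, assuming pages of at most 50 items for PlaylistItems: list and 100 for CommentThreads: list, that the number of comment threads of each video is known, and ignoring replies fetched through Comments: list. The per-request latency is an input, not a measured value, and both functions are hypothetical.

```python
import math
from typing import Sequence


def request_counts(thread_counts: Sequence[int]) -> tuple[int, int]:
    """Return (playlist pages, comment thread pages) for one channel.

    thread_counts[i] is the assumed number of comment threads on video i.
    """
    playlist_pages = max(1, math.ceil(len(thread_counts) / 50))
    # Even a video without comments (or with comments disabled) costs one request.
    thread_pages = sum(max(1, math.ceil(t / 100)) for t in thread_counts)
    return playlist_pages, thread_pages


def estimated_seconds(thread_counts: Sequence[int], seconds_per_request: float, video_threads: int = 1) -> float:
    """Idealized duration: playlist pages are sequential (each needs the previous
    nextPageToken), comment pages of different videos are spread evenly over threads."""
    playlist_pages, thread_pages = request_counts(thread_counts)
    return (playlist_pages + thread_pages / video_threads) * seconds_per_request
```

Only the comment part shrinks with more per-video threads, which is where per-video multi-threading would pay off for a channel with many videos.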
We have to proceed with a breadth-first search approach, as treating all *child* channels first (depth-first) might take as long as treating the whole original tree.
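A breadth-first crawl keeps a FIFO queue of discovered channels and a set of already seen ones, so channels closer to the starting set are always treated first. In this sketch `discover` is a hypothetical callable returning the comment author channel ids found on a channel's videos, and `max_channels` is an optional stopping bound added for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Optional


def crawl_channels_bfs(
    seeds: Iterable[str],
    discover: Callable[[str], Iterable[str]],
    max_channels: Optional[int] = None,
) -> list[str]:
    """Return channels in the order they are treated, level by level."""
    seen: set[str] = set()
    queue: deque[str] = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    order: list[str] = []
    while queue and (max_channels is None or len(order) < max_channels):
        channel = queue.popleft()  # FIFO: earliest discovered channel first
        order.append(channel)
        for found in discover(channel):
            if found not in seen:  # enqueue each channel only once
                seen.add(found)
                queue.append(found)
    return order
```

Replacing `popleft` with `pop` would give a depth-first-like order instead, which is exactly the behavior to avoid here.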