Update README.md to remove possibility to proceed using YouTube Data API v3 CommentThreads: list endpoint with allThreadsRelatedToChannelId filter

As we want to retrieve as many comments as possible, we have to proceed video per video, as [`3F8dFt8LsXY`](https://www.youtube.com/watch?v=3F8dFt8LsXY) for instance has comments but using YouTube Data API v3 CommentThreads: list endpoint with `allThreadsRelatedToChannelId` filter returns for `UCWIdqSQekeGmUWlSFeCiEnA`:
```json
{
  "error": {
    "code": 403,
    "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
    "errors": [
      {
        "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
        "domain": "youtube.commentThread",
        "reason": "commentsDisabled",
        "location": "videoId",
        "locationType": "parameter"
      }
    ]
  }
}
```
This commit is contained in:
Benjamin Loison 2022-12-21 23:49:27 +01:00
parent b4e99c1eca
commit db3db57a9f

View File

@ -1,10 +1,9 @@
As explained in the project proposal, the idea to retrieve all video ids is to start from a starting set of channels, then list their videos using YouTube Data API v3 PlaylistItems: list, then list the comments on their videos and then restart the process as we potentially retrieved new channels thanks to comment authors on videos from already known channels. As explained in the project proposal, the idea to retrieve all video ids is to start from a starting set of channels, then list their videos using YouTube Data API v3 PlaylistItems: list, then list the comments on their videos and then restart the process as we potentially retrieved new channels thanks to comment authors on videos from already known channels.
For a given channel, there are two ways to list comments users published on it: For a given channel, there is a single way to list comments users published on it:
1. As explained, YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)** and CommentThreads: list and Comments: list endpoints enable us to retrieve their comments As explained, YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)** and CommentThreads: list and Comments: list endpoints enable us to retrieve their comments
2. A simpler approach consists in using YouTube Data API v3 CommentThreads: list endpoint with `allThreadsRelatedToChannelId`. The main upside of this method, in addition to be simpler, is that for channels with many videos we spare much time by working 100 comments at a time instead of a video at a time with possibly not a single comment. Note that this approach doesn't list all videos etc so we don't retrieve some information. **As I haven't gone this way previously (or I forgot) making sure that for a given video we retrieve all its comments would make sense. Note that this approach may doesn't work for some *peppa-pig* like channels, as in my other project.**
We can multi-thread this process by channel in both cases, however for the first one we can multi-thread per videos. We can multi-thread this process by channel or we can multi-thread per videos of a given channel.
As would like to proceed channel per channel, the question is **how much time does it take to retrieve all comments from the biggest YouTube channel? In case of willing to support *peppa-pig* like channels (if there is a problem for them (cf above)), this question doesn't matter, as we would be enforced to work video per video.** As would like to proceed channel per channel, the question is **how much time does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long period of time, then multi-threading per videos of a given channel may make sense.**
Have to proceed with a breadth-first search approach as treating all *child* channels might take a time equivalent to treating the whole original tree. Have to proceed with a breadth-first search approach as treating all *child* channels might take a time equivalent to treating the whole original tree.