Update README.md to remove possibility to proceed using YouTube Data API v3 CommentThreads: list endpoint with allThreadsRelatedToChannelId filter

As we want to retrieve as many comments as possible, we have to proceed video by video. For instance, [`3F8dFt8LsXY`](https://www.youtube.com/watch?v=3F8dFt8LsXY) has comments, but calling the YouTube Data API v3 CommentThreads: list endpoint with the `allThreadsRelatedToChannelId` filter for `UCWIdqSQekeGmUWlSFeCiEnA` returns:
```json
{
  "error": {
    "code": 403,
    "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
    "errors": [
      {
        "message": "The video identified by the \u003ccode\u003e\u003ca href=\"/youtube/v3/docs/commentThreads/list#videoId\"\u003evideoId\u003c/a\u003e\u003c/code\u003e parameter has disabled comments.",
        "domain": "youtube.commentThread",
        "reason": "commentsDisabled",
        "location": "videoId",
        "locationType": "parameter"
      }
    ]
  }
}
```
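A crawler can recognize this kind of response and react to it, for instance by skipping the request or falling back to the per-video approach. The following is a minimal sketch, assuming the response body has already been decoded into a Python dict; `is_comments_disabled` is a hypothetical helper name, not part of any library.

```python
def is_comments_disabled(response: dict) -> bool:
    """Return True if an API error body reports the `commentsDisabled` reason."""
    error = response.get("error", {})
    return any(
        entry.get("reason") == "commentsDisabled"
        for entry in error.get("errors", [])
    )


# Trimmed copy of the error body shown above (messages omitted).
sample = {
    "error": {
        "code": 403,
        "errors": [
            {
                "domain": "youtube.commentThread",
                "reason": "commentsDisabled",
                "location": "videoId",
                "locationType": "parameter",
            }
        ],
    }
}

if is_comments_disabled(sample):
    print("comments reported as disabled")
```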
Benjamin Loison 2022-12-21 23:49:27 +01:00
parent c828b118d3
commit 3aa9947f8e
Signed by: Benjamin_Loison
SSH Key Fingerprint: SHA256:BtnEgYTlHdOg1u+RmYcDE0mnfz1rhv5dSbQ2gyxW8B8


@@ -1,10 +1,9 @@
As explained in the project proposal, the idea to retrieve all video ids is to start from an initial set of channels, list their videos using YouTube Data API v3 PlaylistItems: list, list the comments on these videos, and then restart the process, as we potentially discovered new channels thanks to comment authors on videos from already known channels.
For a given channel, there are two ways to list comments users published on it:
1. As explained, YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)** and CommentThreads: list and Comments: list endpoints enable us to retrieve their comments
2. A simpler approach consists in using YouTube Data API v3 CommentThreads: list endpoint with `allThreadsRelatedToChannelId`. The main upside of this method, in addition to being simpler, is that for channels with many videos we save much time by working 100 comments at a time instead of a video at a time with possibly not a single comment. Note that this approach doesn't list all videos, etc., so we don't retrieve some information. **As I haven't gone this way previously (or I forgot), making sure that for a given video we retrieve all its comments would make sense. Note that this approach may not work for some *peppa-pig* like channels, as in my other project.**
For a given channel, there is a single way to list comments users published on it:
As explained, YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)** and CommentThreads: list and Comments: list endpoints enable us to retrieve their comments
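Both PlaylistItems: list and CommentThreads: list return results page by page, with a `nextPageToken` field in the response that is passed back as the `pageToken` parameter. A minimal sketch of that loop follows, with the HTTP call left out: `iterate_pages`, `fetch_page` and `FAKE_PAGES` are hypothetical names, and the fake pages only stand in for real API responses.

```python
from typing import Callable, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item of a paginated `list` endpoint.

    `fetch_page` (hypothetical) takes a page token, or None for the first
    page, and returns the decoded JSON response.
    """
    page_token = None
    while True:
        response = fetch_page(page_token)
        yield from response.get("items", [])
        page_token = response.get("nextPageToken")
        if not page_token:
            # No further page is advertised, so the listing is complete.
            break


# Stand-in for real HTTP calls, used only to demonstrate the loop.
FAKE_PAGES = {
    None: {"items": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"items": [{"id": "c"}]},
}

video_ids = [item["id"] for item in iterate_pages(FAKE_PAGES.__getitem__)]
```

In a real crawler, `fetch_page` would issue the request to the relevant endpoint and add the token to the query parameters.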
We can multi-thread this process by channel in both cases; however, for the first one we can also multi-thread per video.
As we would like to proceed channel per channel, the question is **how long does it take to retrieve all comments from the biggest YouTube channel? If we want to support *peppa-pig* like channels (if there is a problem for them (cf above)), this question doesn't matter, as we would be forced to work video per video.**
We can multi-thread this process by channel, or we can multi-thread per video of a given channel.
As we would like to proceed channel per channel, the question is **how long does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long period of time, then multi-threading per video of a given channel may make sense.**
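Multi-threading per video of a given channel could look like the sketch below, which uses the standard library `ThreadPoolExecutor`. It is only an illustration: `fetch_comments_concurrently`, `fetch_video_comments` and the `max_workers` default are assumptions, and the per-video fetcher is injected so that the actual API calls stay out of the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def fetch_comments_concurrently(
    video_ids: Iterable[str],
    fetch_video_comments: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Fetch the comments of several videos in parallel, one task per video.

    `fetch_video_comments` (hypothetical) returns all comments of one video.
    """
    ids = list(video_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # `map` preserves the input order, so results line up with `ids`.
        results = executor.map(fetch_video_comments, ids)
        return dict(zip(ids, results))


def fake_fetch(video_id: str) -> List[dict]:
    """Stand-in fetcher that pretends each video has a single comment."""
    return [{"videoId": video_id, "text": "example"}]


comments = fetch_comments_concurrently(["v1", "v2", "v3"], fake_fetch)
```

An exception raised by the fetcher for one video propagates when its result is consumed, so a real crawler would probably catch errors inside the fetcher itself.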
We have to proceed with a breadth-first search approach, as treating all *child* channels might take as long as treating the whole original tree.
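The breadth-first order can be sketched with a FIFO queue: every channel of the starting set is treated before any channel discovered through comment authors. This is a minimal sketch under assumptions: `crawl_breadth_first`, `discover_channels` and `max_channels` are hypothetical names, and the discovery step (list videos, list comments, collect author channel ids) is injected as a function.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl_breadth_first(
    seed_channels: Iterable[str],
    discover_channels: Callable[[str], Iterable[str]],
    max_channels: int,
) -> List[str]:
    """Return the channels in the order they are treated, breadth first.

    `discover_channels` (hypothetical) treats one channel and returns the
    channel ids of the comment authors found on its videos.
    """
    queue = deque()
    seen = set()
    for channel in seed_channels:
        if channel not in seen:
            seen.add(channel)
            queue.append(channel)

    treated = []
    while queue and len(treated) < max_channels:
        channel = queue.popleft()  # FIFO: oldest discovered channel first
        treated.append(channel)
        for author in discover_channels(channel):
            if author not in seen:
                seen.add(author)
                queue.append(author)
    return treated


# Toy discovery graph standing in for real API results.
GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": []}
order = crawl_breadth_first(["A"], lambda c: GRAPH.get(c, []), max_channels=10)
```

The `seen` set keeps a channel from being queued twice when several comment authors lead back to it.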