From b4e99c1ecab45d6d63a027710f5ac4a10b1a88a0 Mon Sep 17 00:00:00 2001
From: Benjamin Loison
Date: Wed, 21 Dec 2022 23:46:14 +0100
Subject: [PATCH] Add `README.md` with first sketching questions

---
 README.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0cf87e5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,10 @@
+As explained in the project proposal, the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 PlaylistItems: list endpoint, list the comments on these videos, and then restart the process, as the authors of comments on videos from already known channels potentially give us new channels.
+
+For a given channel, there are two ways to list the comments users published on it:
+1. As explained, the YouTube Data API v3 PlaylistItems: list endpoint enables us to list the channel videos **(isn't there any limit?)**, and the CommentThreads: list and Comments: list endpoints enable us to retrieve their comments.
+2. A simpler approach consists in using the YouTube Data API v3 CommentThreads: list endpoint with `allThreadsRelatedToChannelId`. The main upside of this method, in addition to being simpler, is that for channels with many videos we save much time by retrieving up to 100 comments per request instead of requesting each video, possibly without finding a single comment. Note that this approach doesn't list all videos (a video without any comment never shows up, for instance), so we don't retrieve some information. **As I haven't gone this way previously (or I forgot), it would make sense to make sure that for a given video we retrieve all its comments. Note that this approach may not work for some *peppa-pig*-like channels, as in my other project.**
+
+We can multi-thread this process per channel in both cases; however, for the first one we can also multi-thread per video.
+As we would like to proceed channel by channel, the question is **how much time does it take to retrieve all comments from the biggest YouTube channel? If we want to support *peppa-pig*-like channels (in case there is a problem with them, cf. above), this question doesn't matter, as we would be forced to work video by video.**
+
+We have to proceed with a breadth-first search approach, as recursively treating all *child* channels of a channel before its siblings might take as long as treating the whole original tree.
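Both ways of listing comments can be sketched against the YouTube Data API v3 REST endpoints. This is a minimal sketch assuming an API key and Python's standard library only; the helper names (`paginate`, `uploads_video_ids`, `channel_comment_threads`, `author_channel_ids`) and the injectable `get_json` fetcher are hypothetical. For the first way, the channel's uploads playlist is resolved with Channels: list (`contentDetails.relatedPlaylists.uploads`) and then paged through with PlaylistItems: list, 50 items per request; each video's comments would then come from CommentThreads: list with `videoId`. For the second way, CommentThreads: list with `allThreadsRelatedToChannelId` is paged through 100 threads per request. Both follow `nextPageToken` until it is absent.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, Optional, Set

API_ROOT = "https://www.googleapis.com/youtube/v3/"
# Hypothetical fetcher signature: (url, query parameters) -> decoded JSON.
GetJson = Callable[[str, Dict[str, str]], dict]


def http_get_json(url: str, params: Dict[str, str]) -> dict:
    """Default fetcher: a plain HTTPS GET returning the decoded JSON body."""
    query = urllib.parse.urlencode(params)
    with urllib.request.urlopen(f"{url}?{query}") as response:
        return json.load(response)


def paginate(endpoint: str, params: Dict[str, str], get_json: GetJson) -> Iterator[dict]:
    """Yield the `items` of every result page, following `nextPageToken`."""
    page_token: Optional[str] = None
    while True:
        page_params = dict(params)
        if page_token:
            page_params["pageToken"] = page_token
        page = get_json(API_ROOT + endpoint, page_params)
        yield from page.get("items", [])
        page_token = page.get("nextPageToken")
        if not page_token:  # the last page has no nextPageToken
            return


def uploads_video_ids(channel_id: str, api_key: str,
                      get_json: GetJson = http_get_json) -> Iterator[str]:
    """First way: list a channel's videos through its uploads playlist."""
    channel = get_json(API_ROOT + "channels",
                       {"part": "contentDetails", "id": channel_id, "key": api_key})
    uploads = channel["items"][0]["contentDetails"]["relatedPlaylists"]["uploads"]
    params = {"part": "contentDetails", "playlistId": uploads,
              "maxResults": "50", "key": api_key}  # 50: maximum for PlaylistItems
    for item in paginate("playlistItems", params, get_json):
        yield item["contentDetails"]["videoId"]


def channel_comment_threads(channel_id: str, api_key: str,
                            get_json: GetJson = http_get_json) -> Iterator[dict]:
    """Second way: every comment thread related to a channel, 100 per request."""
    params = {"part": "snippet", "allThreadsRelatedToChannelId": channel_id,
              "maxResults": "100", "key": api_key}  # 100: maximum for CommentThreads
    return paginate("commentThreads", params, get_json)


def author_channel_ids(threads: Iterable[dict]) -> Set[str]:
    """Channel ids of top-level comment authors, i.e. candidate channels to crawl."""
    ids: Set[str] = set()
    for thread in threads:
        comment = thread["snippet"]["topLevelComment"]["snippet"]
        author = comment.get("authorChannelId")  # treated as optional, defensively
        if author:
            ids.add(author["value"])
    return ids
```

Only top-level comment authors are collected here: with `part=snippet` a thread carries no replies, and even `part=replies` may not return all of them, so reply authors would need Comments: list with `parentId`. Injecting `get_json` keeps the pagination logic testable without network access.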
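The breadth-first crawl, together with the per-channel multi-threading, can be sketched as a level-synchronous BFS. This is a minimal sketch, not the project's implementation: `fetch_commenter_channels` is a hypothetical callback returning the channel ids of the comment authors of a given channel (implemented with either of the two ways of listing comments), and `max_depth` and `workers` are illustrative parameters. Every channel at distance d from the seeds is treated before any channel at distance d + 1, so no single subtree can monopolize the crawl, and the channels of one level are treated in parallel by a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set


def crawl_channels(
    seed_channels: Iterable[str],
    fetch_commenter_channels: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    workers: int = 8,
) -> List[List[str]]:
    """Level-synchronous BFS over channels; returns the list of BFS levels.

    Level 0 holds the deduplicated seeds, level d the channels first
    discovered through comments on channels of level d - 1.
    """
    seen: Set[str] = set()
    frontier: List[str] = []
    for channel_id in seed_channels:
        if channel_id not in seen:
            seen.add(channel_id)
            frontier.append(channel_id)

    levels: List[List[str]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(max_depth + 1):
            if not frontier:
                break
            levels.append(frontier)
            next_frontier: List[str] = []
            # One task per channel of the current level; map() returns the
            # results in input order, so the traversal is deterministic.
            results = pool.map(lambda c: list(fetch_commenter_channels(c)), frontier)
            for authors in results:
                for author_id in authors:
                    if author_id not in seen:  # queue each channel only once
                        seen.add(author_id)
                        next_frontier.append(author_id)
            frontier = next_frontier
    return levels


if __name__ == "__main__":
    # Toy "who commented on whom" graph standing in for real API calls.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["A", "E"], "D": [], "E": ["B"]}
    print(crawl_channels(["A"], lambda c: graph.get(c, [])))
    # prints [['A'], ['B', 'C'], ['D', 'E']]
```

The `seen` set and the next frontier are only updated by the main thread, so no lock is needed; the trade-off is that the slowest channel of a level (the biggest channel, for instance) delays the start of the next level.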