Neither the YouTube UI nor the YouTube Data API v3 search feature searches through video captions that are not auto-generated.

As explained in the project proposal, the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 PlaylistItems: list endpoint, then list the comments on those videos, and then restart the process, as comment authors on videos from already known channels may reveal channels we have not seen yet.
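
As a rough sketch of the first step (not the project's actual code, which is in `main.cpp`), the PlaylistItems: list request for a channel's uploads can be built as follows. The API key is a placeholder, and the "replace the `UC` prefix with `UU` to get the uploads playlist id" trick is a well-known convention of YouTube channel ids:

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"

def uploads_playlist_id(channel_id):
    # For a channel id starting with "UC", the channel's "uploads" playlist
    # id is the same id with the "UC" prefix replaced by "UU".
    assert channel_id.startswith("UC")
    return "UU" + channel_id[2:]

def playlist_items_url(channel_id, api_key, page_token=None):
    # Build the PlaylistItems: list request listing the channel's videos,
    # 50 items (the maximum page size) at a time.
    params = {
        "part": "snippet",
        "playlistId": uploads_playlist_id(channel_id),
        "maxResults": 50,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{API_BASE}/playlistItems?{urlencode(params)}"
```

Paging through results with `pageToken` and extracting `snippet.resourceId.videoId` from each item then yields the channel's video ids.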

For a given channel, there are two ways to list the comments users have published on it:

  1. As explained, the YouTube Data API v3 PlaylistItems: list endpoint lets us list a channel's videos (up to 20,000 videos, so we will not treat and write down channels beyond that limit), and the CommentThreads: list and Comments: list endpoints let us retrieve the comments on those videos
  2. A simpler approach consists in using the YouTube Data API v3 CommentThreads: list endpoint with the allThreadsRelatedToChannelId parameter. The main upside of this method, besides being simpler, is that for channels with many videos we save a lot of time by processing 100 comments at a time instead of one video at a time (a video possibly not having a single comment). Note that this approach doesn't list the videos themselves, so we miss some information, and that it doesn't work for some channels that have comments enabled on some videos but not channel-wide. So when possible we will proceed with approach 2 and use approach 1 as a fallback.
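
The channel-wide request of approach 2 can be sketched like this (again a sketch with a placeholder API key, not the project's code). The fallback to approach 1 would be triggered when this request fails for a channel:

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"

def comment_threads_url(channel_id, api_key, page_token=None):
    # Approach 2: a single paged request covers the whole channel,
    # 100 comment threads (the maximum page size) at a time, thanks to
    # the allThreadsRelatedToChannelId parameter.
    params = {
        "part": "snippet",
        "allThreadsRelatedToChannelId": channel_id,
        "maxResults": 100,
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{API_BASE}/commentThreads?{urlencode(params)}"
```

Each returned thread exposes the top-level comment's `snippet.authorChannelId`, which is what feeds new channels into the crawl.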

We can multi-thread this process by channel, or per video within a given channel (losing the optimization of CommentThreads: list with allThreadsRelatedToChannelId). In any case we shouldn't do anything hybrid in terms of multi-threading, as it would be too complex. As we would like to proceed channel by channel, the question is: how much time does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long period of time, then multi-threading per video within a given channel may make sense. There are two possibilities, following our two methods:

  1. Here the complexity is linear in the number of the channel's comments, more precisely this number divided by 100 (the maximum page size of CommentThreads: list) - we could guess that the channel with the most subscribers (T-Series) also has the most comments
  2. Here the complexity is linear in the number of videos - as far as I know, RoelVandePaar has the most videos, 2,026,566 according to SocialBlade. However, due to the 20,000-video limit of the YouTube Data API v3 PlaylistItems: list endpoint, the actual limit we face is 20,000 videos.
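
The request counts behind these two complexities can be made concrete. The figures below are only lower bounds on requests (ignoring comment replies and quota costs), under the page sizes and the 20,000-video cap stated above:

```python
import math

def requests_for_method_2(comment_count):
    # Approach 2: CommentThreads: list with allThreadsRelatedToChannelId
    # returns up to 100 comment threads per request.
    return math.ceil(comment_count / 100)

def requests_for_method_1(video_count):
    # Approach 1: PlaylistItems: list returns up to 50 items per request
    # and is capped at 20,000 videos; then at least one CommentThreads
    # request is needed per video, even for videos without comments.
    listed = min(video_count, 20_000)
    return math.ceil(listed / 50) + listed
```

So a channel with 1,000,000 comments costs at least 10,000 requests with approach 2, while a channel at the 20,000-video cap costs at least 20,400 requests with approach 1, regardless of its comment count.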

We have to proceed with a breadth-first search approach, as treating all of a channel's child channels depth-first might take a time equivalent to treating the whole original tree.
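
The breadth-first traversal over channels can be sketched as follows, where `list_comment_author_channels` stands in for the API work described above (a hypothetical callback, not an actual function of this repository):

```python
from collections import deque

def crawl(starting_channels, list_comment_author_channels):
    # Breadth-first search over channels: treat a channel, collect the
    # channels of its commenters, and enqueue the ones not seen yet.
    seen = set(starting_channels)
    queue = deque(starting_channels)
    treated = []
    while queue:
        channel = queue.popleft()
        treated.append(channel)
        for author_channel in list_comment_author_channels(channel):
            if author_channel not in seen:
                seen.add(author_channel)
                queue.append(author_channel)
    return treated
```

Using a FIFO queue means channels discovered early (closer to the starting set) are treated before their descendants, which is exactly the breadth-first order we want.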

To compile and run:

```sh
sudo apt install nlohmann-json3-dev
make
./main
```