Update README.md
to clean notes concerning optimized approaches
parent
daf14d4b5b
commit
d776c09fec
@@ -1,12 +1,11 @@
|
|||||||
As explained in the project proposal, the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 PlaylistItems: list endpoint, list the comments on those videos, and then restart the process, since the comment authors on videos from already known channels may reveal new channels.
|
As explained in the project proposal, the idea for retrieving all video ids is to start from an initial set of channels, list their videos using the YouTube Data API v3 PlaylistItems: list endpoint, list the comments on those videos, and then restart the process, since the comment authors on videos from already known channels may reveal new channels.
|
||||||
|
|
||||||
For a given channel, there is a single way to list the comments users published on it:
|
For a given channel, there are two ways to list the comments users published on it:
|
||||||
As explained, the YouTube Data API v3 PlaylistItems: list endpoint enables us to list the videos of a channel, up to 20,000 videos, and the CommentThreads: list and Comments: list endpoints enable us to retrieve their comments.
|
1. As explained, the YouTube Data API v3 PlaylistItems: list endpoint enables us to list the videos of a channel, up to 20,000 videos (so we will not process and write down channels in this case), and the CommentThreads: list and Comments: list endpoints enable us to retrieve their comments.
|
||||||
|
2. A simpler approach consists in using the YouTube Data API v3 CommentThreads: list endpoint with `allThreadsRelatedToChannelId`. The main upside of this method, in addition to being simpler, is that for channels with many videos we save a lot of time by working 100 comments at a time instead of one video at a time, each of which may not have a single comment. Note that this approach doesn't list the videos themselves, so we don't retrieve some information. **As I haven't gone this way previously (or I forgot), it would make sense to make sure that for a given video we retrieve all its comments.** Note that this approach doesn't work for some channels that have comments enabled on some videos but not on the whole channel.
|
||||||
|
So when possible we will proceed with 2. and use 1. as a fallback approach.
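The strategy above (approach 2. first, approach 1. as a fallback) could be sketched as follows. This is a minimal illustration, not the project's actual code: `CommentsUnavailable`, `collect_comment_threads`, the `per_video_fallback` callable and the injectable `fetch` parameter are hypothetical names, and treating any HTTP 403 as "channel-wide listing unavailable" is an assumption (quota errors also use 403, so a real implementation would inspect the error reason).

```python
import json
import urllib.error
import urllib.parse
import urllib.request

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


class CommentsUnavailable(Exception):
    """Hypothetical signal that approach 2. failed for a channel."""


def http_get_json(url, params):
    """Default fetcher: GET `url` with `params` and decode the JSON body."""
    query = urllib.parse.urlencode(params)
    try:
        with urllib.request.urlopen(f"{url}?{query}") as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        # Assumption: every 403 means the channel-wide listing is unavailable.
        if error.code == 403:
            raise CommentsUnavailable(str(params)) from error
        raise


def list_channel_comment_threads(channel_id, api_key, fetch=http_get_json):
    """Yield the comment threads of a channel, up to 100 per request."""
    params = {
        "part": "snippet",
        "allThreadsRelatedToChannelId": channel_id,
        "maxResults": 100,
        "key": api_key,
    }
    while True:
        page = fetch(API_URL, params)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if token is None:
            return
        params = {**params, "pageToken": token}


def comment_author_channel_ids(threads):
    """Collect the channel ids of top-level comment authors, when present."""
    authors = set()
    for thread in threads:
        snippet = thread.get("snippet", {}).get("topLevelComment", {}).get("snippet", {})
        author = snippet.get("authorChannelId", {}).get("value")
        if author is not None:
            authors.add(author)
    return authors


def collect_comment_threads(channel_id, api_key, per_video_fallback, fetch=http_get_json):
    """Try approach 2.; fall back to approach 1. (supplied by the caller)."""
    try:
        return list(list_channel_comment_threads(channel_id, api_key, fetch))
    except CommentsUnavailable:
        return per_video_fallback(channel_id)
```

Passing `fetch` as a parameter keeps the pagination logic testable without network access or an API key.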
|
||||||
|
|
||||||
We can multi-thread this process per channel, or we can multi-thread per video of a given channel.
|
We can multi-thread this process per channel, or we can multi-thread per video of a given channel.
|
||||||
As we would like to proceed channel per channel, the question is **how much time does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long period of time, then multi-threading per video of a given channel may make sense.**
|
As we would like to proceed channel per channel, the question is **how much time does it take to retrieve all comments from the biggest YouTube channel? If the answer is a long period of time, then multi-threading per video of a given channel may make sense.**
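Multi-threading per channel could look like the sketch below, which uses Python's standard `concurrent.futures` module. The function name `process_channels_in_parallel` and the `process_channel` callable (whatever retrieves the comments of one channel) are hypothetical; the same pattern would apply per video by passing video ids and a per-video worker instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_channels_in_parallel(channel_ids, process_channel, max_workers=4):
    """Run `process_channel` on each channel id using a pool of threads.

    Returns a dict mapping each channel id to the worker's result.
    An exception raised by a worker propagates when its result is read.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which future belongs to which channel.
        futures = {executor.submit(process_channel, c): c for c in channel_ids}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Since the work is dominated by waiting on HTTP responses, threads are a reasonable fit here; the worker count is an arbitrary default, not a measured value.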
|
||||||
|
|
||||||
**In fact, we should proceed quickly with CommentThreads: list with `allThreads...` when possible**
|
|
||||||
**Do I have an example of a channel where CommentThreads: list works but doesn't list a comment of a video ... ?**
|
|
||||||
|
|
||||||
We have to proceed with a breadth-first search approach, as treating all *child* channels might take as much time as treating the whole original tree.
|
We have to proceed with a breadth-first search approach, as treating all *child* channels might take as much time as treating the whole original tree.
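A breadth-first traversal of channels can be sketched as below. The names `crawl_breadth_first`, `discover_channels` (a callable returning the channels found through the comments of a given channel) and `max_depth` are hypothetical and only serve the illustration.

```python
from collections import deque


def crawl_breadth_first(seed_channels, discover_channels, max_depth=None):
    """Visit channels level by level, starting from the seed set.

    Returns the channels in visiting order; each channel is visited once.
    """
    seen = set()
    queue = deque()
    for channel in seed_channels:
        if channel not in seen:
            seen.add(channel)
            queue.append((channel, 0))

    order = []
    while queue:
        channel, depth = queue.popleft()
        order.append(channel)
        # Optionally stop expanding beyond a given distance from the seeds.
        if max_depth is not None and depth >= max_depth:
            continue
        for child in discover_channels(channel):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return order
```

With a FIFO queue, all channels at a given distance from the seeds are treated before any of their *child* channels, so no single subtree delays the rest of the crawl.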
|
||||||
|