Add support for multiple keys to be resilient against exceeded quota errors #6

Closed
opened 2023-01-02 19:50:26 +01:00 by Benjamin_Loison · 2 comments

While #4 makes this improvement unnecessary, implementing it would avoid going through a proxy and the latency that adds (in addition, the https://yt.lemnoslife.com no-key service has only two logical cores, so it wouldn't be able to keep up with requests from multi-threaded processes).

I have a personal YouTube Data API v3 key with a quota of 16,000,000 per day and 180,000 per minute. As 180,000 * 60 * 24 = 259,200,000 >> 16,000,000, the per-day quota is our limiting factor, since our algorithm has on average no special quota-burning period.
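A quick sanity check of that arithmetic (plain Python, using the quota figures above):

```python
# Quota figures for the personal YouTube Data API v3 key quoted above.
QUOTA_PER_DAY = 16_000_000
QUOTA_PER_MINUTE = 180_000

# If we could sustain the per-minute rate for a whole day:
per_minute_ceiling = QUOTA_PER_MINUTE * 60 * 24
print(per_minute_ceiling)  # 259200000, far above the 16,000,000 daily quota

# So the per-day quota is the binding limit.
assert per_minute_ceiling > QUOTA_PER_DAY
```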

From my home server and my VPS it takes 250 ms and 175 ms respectively to complete (tested twice):

```sh
time curl 'https://www.googleapis.com/youtube/v3/commentThreads?part=snippet,replies&allThreadsRelatedToChannelId=UC4QobU6STFB0P71PMvOGN5A&maxResults=100&key=AIzaSy...' > /dev/null
```

So we can assume that in the best case it takes on average 50 ms to complete a request to YouTube Data API v3.

Both YouTube Data API v3 [CommentThreads: list](https://developers.google.com/youtube/v3/docs/commentThreads/list) and [Comments: list](https://developers.google.com/youtube/v3/docs/comments/list) endpoints cost 1 quota unit per call.

So per day it limits us to 16,000,000 / (3,600 * 24 * 20) = 9(.259) threads, the 20 being requests per second per thread at 50 ms each.
I will work with this multi-threading limitation for the moment, as it's already a good start. Even with 8 logical cores we could go far beyond this limit, since threads are idle while waiting for responses from the YouTube Data API v3 servers, and treating the responses is quite trivial (only some mutex and logging work).
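The thread-count estimate, spelled out (50 ms per request means 20 requests per second per thread):

```python
QUOTA_PER_DAY = 16_000_000
SECONDS_PER_DAY = 3_600 * 24
REQUESTS_PER_SECOND_PER_THREAD = 20  # one request every 50 ms, 1 quota unit each

# Quota one thread burns in a day, then how many threads the daily quota allows.
quota_per_thread_per_day = SECONDS_PER_DAY * REQUESTS_PER_SECOND_PER_THREAD  # 1,728,000
max_threads = QUOTA_PER_DAY / quota_per_thread_per_day
print(round(max_threads, 3))  # 9.259
```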

Related to #7 and #9.

Benjamin_Loison changed title from Add support for multiple keys to Add support for multiple keys to be resilient against exceeded quota errors 2023-01-02 19:50:43 +01:00
Author
Owner

With 10 threads, for no clear reason the 8 logical cores are used at maximum capacity according to `htop`; that should be investigated (finally treated in fdfec17817 thanks to a different search approach for already treated channels). However, with this number of threads I observed that in practice we treat about 100,000 comments per minute.

This might highlight the importance of being more precise with the data we store (cf. #2).

Benjamin_Loison added the
medium
label 2023-01-06 18:50:46 +01:00
Benjamin_Loison added the
enhancement
label 2023-01-06 18:52:52 +01:00
Benjamin_Loison added the
medium priority
label 2023-01-06 19:33:01 +01:00
Author
Owner

Making statistics of the estimated quota per key on the official instance's no-key service would be interesting, to know how many threads I can use.

There is no particular need to make statistics about quota usage, as we can't do better anyway. However, it may become interesting if we reach the quota limit of all keys of the YouTube operational API no-key service. Otherwise we could use a dichotomy approach on the number of threads, but we have to keep in mind that these keys are also used by their original owners and by the no-key service, so we can't guarantee that they have any quota left at all.
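The dichotomy approach on the number of threads could be sketched as follows (a hypothetical illustration, not project code): binary search for the largest thread count whose trial run completes without a `quotaExceeded` error.

```python
def find_max_threads(run_without_quota_error, low=1, high=64):
    """Binary search the largest thread count whose trial run hits no quota error.

    `run_without_quota_error(n)` is assumed to run the crawler with n threads
    for a while and return True if no quotaExceeded error occurred."""
    best = low
    while low <= high:
        mid = (low + high) // 2
        if run_without_quota_error(mid):
            best = mid
            low = mid + 1   # this count is fine, try more threads
        else:
            high = mid - 1  # quota exceeded, try fewer threads
    return best
```

Since the keys are shared with their original owners and with the no-key service, the remaining quota fluctuates, so the result of such a search would only be a rough, temporary estimate.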

Benjamin_Loison added
high priority
and removed
medium
labels 2023-01-08 16:59:49 +01:00
Reference: Benjamin_Loison/YouTube_captions_search_engine#6