43 Commits

Make the first channel of channels.txt be processed again, solve the temporary empty response issue from YouTube Data API v3, and temporarily remove a sanity check that very rarely fails (#39) 2023-02-14 23:15:07 +01:00
Prevent the COMMUNITY tab process from looping infinitely
Related to https://github.com/Benjamin-Loison/YouTube-operational-API/issues/49
2023-02-13 06:17:23 +01:00
Fix #37: Use the number of channels seen (possibly repeated) instead of the number of YouTube Data API v3 Comment(Thread) resources 2023-02-12 16:31:27 +01:00
Add logging to exec and make it crash-free, add compression support for the requests and captions folders, and handle captions cleanly for videos that are livestreams and for videos whose id starts with `-` 2023-02-12 16:24:16 +01:00
Move YouTube API request logging to the requests/ channel sub-folder 2023-02-10 20:17:49 +01:00
Fix #13: Add captions extraction
I was about to also commit the following:

// Videos with automatically generated captions that are set to `Off` by default aren't retrieved with `--sub-langs '.*orig'`.
// My workaround is to first call YouTube Data API v3 Captions: list endpoint with `part=snippet` and retrieve the language that has `"trackKind": "asr"` (automatic speech recognition) in `snippet`.
json data = getJson(threadId, "captions?part=snippet&videoId=" + videoId, true, channelToTreat),
     items = data["items"];
for(const auto& item : items)
{
    json snippet = item["snippet"];
    if(snippet["trackKind"] == "asr")
    {
        string language = snippet["language"];
        cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
        exec(threadId, cmd);
        // As there should be a single automatic speech recognized track, there is no need to go through all tracks.
        break;
    }
}

Instead of:

cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '.*orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
exec(threadId, cmd);

But I realized that, as the GitHub comment I was about to add to https://github.com/yt-dlp/yt-dlp/issues/2655, I was

> `yt-dlp --cookies cookies.txt --sub-langs 'en.*,.*orig' --write-auto-subs https://www.youtube.com/watch?v=tQqDBySHYlc` work as expected. Many thanks again.
> ```
> 'subtitleslangs': ['en.*','.*orig'],
> 'writeautomaticsub': True,
> ```
> Work as expected too. Thank you
> Very sorry for the video sample. I even not watched it.

Thank you for this workaround. However, note that videos that have automatically generated subtitles but are set to `Off` by default aren't retrieved with your method (example of such a video: [`mozyXsZJnQ4`](https://www.youtube.com/watch?v=mozyXsZJnQ4)). My workaround is to first call the [YouTube Data API v3](https://developers.google.com/youtube/v3) [Captions: list](https://developers.google.com/youtube/v3/docs/captions/list) endpoint with [`part=snippet`](https://developers.google.com/youtube/v3/docs/captions/list#part) and retrieve the [`language`](https://developers.google.com/youtube/v3/docs/captions#snippet.language) of the track that has [`"trackKind": "asr"`](https://developers.google.com/youtube/v3/docs/captions#snippet.trackKind) (automatic speech recognition) in its [`snippet`](https://developers.google.com/youtube/v3/docs/captions#snippet).
2023-02-10 20:03:08 +01:00
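The ASR-track lookup described in the commit above can be sketched in Python as follows. It assumes the Captions: list response has already been fetched and parsed into a dict shaped as in the linked documentation (`items[].snippet.trackKind` and `items[].snippet.language`); the function names are hypothetical, and the yt-dlp options simply mirror the ones in the commit's command.

```python
from typing import List, Optional


def asr_caption_language(captions_list_response: dict) -> Optional[str]:
    """Return the language of the first "asr" track in a Captions: list response."""
    for item in captions_list_response.get("items", []):
        snippet = item.get("snippet", {})
        if snippet.get("trackKind") == "asr":
            # A video should have a single ASR track, so stop at the first match.
            return snippet.get("language")
    return None


def yt_dlp_caption_args(language: str) -> List[str]:
    """Build the yt-dlp subtitle options used in the commit, for one language."""
    return [
        "--write-auto-subs",
        "--sub-langs", f"{language}-orig",
        "--sub-format", "ttml",
        "--convert-subs", "vtt",
    ]
```

Returning the options as a list (rather than one shell string) lets them be passed to a process without shell quoting of the `-orig` language pattern.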
Fix #36: Make the program stop (by crashing) when the YouTube operational API instance is detected as sending unusual traffic 2023-02-10 12:02:39 +01:00
Fix the termination of the COMMUNITY tab process caused by missing page tokens 2023-02-10 00:37:28 +01:00
Add a check that snippet/authorChannelId/value isn't null when using commentThreads for COMMUNITY
As it can happen, see https://www.youtube.com/channel/UCWeg2Pkate69NFdBeuRFTAw/community?lc=UgwGfjNxGuwqP8qYPPN4AaABAg&lb=UgkxYiEAo9-b1vWPasxFy13f959rrctQpZwW
2023-02-09 01:51:22 +01:00
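A minimal sketch of the null check above, assuming the comment `snippet` has already been parsed into a dict; `author_channel_id` is a hypothetical helper, not a function from main.cpp. It treats a missing `authorChannelId`, a missing `value` and a null `value` the same way, which also covers the author-less comment mentioned in the first commit of this log.

```python
from typing import Optional


def author_channel_id(comment_snippet: dict) -> Optional[str]:
    """Return snippet.authorChannelId.value, or None if it is missing or null."""
    author = comment_snippet.get("authorChannelId")
    if not isinstance(author, dict):
        # No authorChannelId object at all.
        return None
    value = author.get("value")
    # Reject null (None) and empty values so callers can simply skip the comment.
    return value if isinstance(value, str) and value else None
```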
Record in urls.txt whether each URL is related to YouTube Data API v3 or YouTube operational API 2023-02-08 16:05:03 +01:00
Fix #34: Correct JSON files by moving their first line to a separate metadata file 2023-02-07 23:08:09 +01:00
Correct typo: the channel tab is LIVE, not LIVES 2023-01-25 01:00:29 +01:00
Introduce {,MAIN_}EXIT_WITH_ERROR macros for exiting with an error 2023-01-22 15:17:14 +01:00
#11: Add discovery of channels that commented on ended livestreams 2023-01-22 15:15:27 +01:00
#11: Add support for current livestreams to discover channels 2023-01-22 04:00:11 +01:00
Instead of looping over items when we expect only one, just use items[0] 2023-01-22 02:19:26 +01:00
Make PRINT not require specifying threadId 2023-01-22 02:04:03 +01:00
#11: Process COMMUNITY post comments to discover channels 2023-01-22 01:37:32 +01:00
#11: Update the processing of the channel CHANNELS tab following the closure of YouTube-operational-API/issues/121 2023-01-21 02:24:42 +01:00
#11: Add processing of channels' tabs, but only postpone the processing of unlisted videos 2023-01-15 14:56:44 +01:00
#7: Make commentsCount and requestsPerChannel compatible with multithreading 2023-01-15 14:31:55 +01:00
#11: Add a first iteration of CHANNELS retrieval 2023-01-15 02:19:31 +01:00
#11: Add --youtube-operational-api-instance-url parameter and use exit(EXIT_{SUCCESS, FAILURE}) instead of exit({0, 1}) 2023-01-15 00:49:32 +01:00
Fix #26: Keep the search algorithm efficient while preserving order (notably of the starting set) 2023-01-14 15:14:24 +01:00
Fix #24: Stop using macros for user inputs, notably to make releases possible 2023-01-08 18:26:20 +01:00
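One common way to keep membership tests fast while preserving discovery order is to pair a FIFO queue with a set of already-seen ids. The sketch below illustrates that general technique under this assumption; it is not necessarily how main.cpp implements #26, and `ChannelFrontier` is a hypothetical name.

```python
from collections import deque
from typing import Iterable


class ChannelFrontier:
    """Channels to process, in discovery order, each enqueued at most once."""

    def __init__(self, starting_channels: Iterable[str]):
        self._queue = deque()
        self._seen = set()
        # The starting set is enqueued first and keeps its original order.
        for channel_id in starting_channels:
            self.add(channel_id)

    def add(self, channel_id: str) -> bool:
        # Average O(1) duplicate check via the set; the deque keeps the order.
        if channel_id in self._seen:
            return False
        self._seen.add(channel_id)
        self._queue.append(channel_id)
        return True

    def pop(self) -> str:
        """Return the oldest channel not yet processed."""
        return self._queue.popleft()

    def __len__(self) -> int:
        return len(self._queue)
```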
Fix #6: Add support for multiple keys to be resilient against exceeded quota errors 2023-01-08 17:59:08 +01:00
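The multiple-keys resilience can be sketched as "try the current key, move to the next one on a quota error". This assumes the standard Google API error body, where `error.errors[].reason` is `quotaExceeded`; `send` stands for a hypothetical function performing one request with a given key, and the rotation policy is illustrative rather than the program's exact behavior.

```python
from typing import Callable, Sequence, Tuple


def is_quota_exceeded(response: dict) -> bool:
    """True if a parsed API response reports a quotaExceeded error."""
    errors = response.get("error", {}).get("errors", [])
    return any(error.get("reason") == "quotaExceeded" for error in errors)


def request_with_key_rotation(
    keys: Sequence[str], send: Callable[[str], dict], start: int = 0
) -> Tuple[dict, int]:
    """Try each key once, beginning at `start`.

    Returns the first response that is not a quota error, with the index of
    the key that produced it, so the caller can keep using that key.
    """
    for offset in range(len(keys)):
        index = (start + offset) % len(keys)
        response = send(keys[index])
        if not is_quota_exceeded(response):
            return response, index
    raise RuntimeError("all keys have exceeded their quota")
```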
Fix #23: YouTube Data API v3 PlaylistItems: list endpoint returns a playlistNotFound error for regular uploads playlists 2023-01-08 16:31:57 +01:00
Fix #20: YouTube Data API v3 rarely and suddenly returns a commentsDisabled error, which triggers an unwanted method switch
Also modified the compression command, as I got `sh: 1: zip: Argument list too long` when compressing the 248,868 JSON files of the most-subscribed French channel.
2023-01-08 15:43:27 +01:00
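`Argument list too long` is raised when the shell passes more file names to a command than the system allows. The commit does not show the modified command; as an alternative sketch, Python's standard `zipfile` module can archive a folder without putting any file name on a command line. `zip_directory` is a hypothetical helper, and `archive_path` is assumed to lie outside `source_dir` so the archive does not try to include itself.

```python
import os
import zipfile


def zip_directory(source_dir: str, archive_path: str) -> int:
    """Zip every file under source_dir into archive_path and return the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir, as if zipping from inside it.
                archive.write(full_path, os.path.relpath(full_path, source_dir))
                count += 1
    return count
```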
Fix #9: Make sure that when the YouTube Data API v3 returns an error, the algorithm handles it correctly
Note that, in case of error, the algorithm used to skip the received content, as if it simply contained no `items`.
2023-01-06 20:55:32 +01:00
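The distinction #9 draws, between an error response and a successful response that has no `items`, can be sketched like this. It assumes the standard Google API error body (`error.message`, `error.errors[].reason`); `YouTubeApiError` and `items_or_raise` are hypothetical names, and how the caller reacts to each `reason` is left open.

```python
class YouTubeApiError(Exception):
    """Raised when a YouTube Data API v3 response carries an `error` object."""

    def __init__(self, reason: str, message: str):
        super().__init__(f"{reason}: {message}")
        self.reason = reason


def items_or_raise(response: dict) -> list:
    """Return `items`, but never mistake an error response for an empty result."""
    error = response.get("error")
    if error is not None:
        errors = error.get("errors") or [{}]
        raise YouTubeApiError(errors[0].get("reason", "unknown"), error.get("message", ""))
    # Only a successful response without results is treated as "no items".
    return response.get("items", [])
```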
#7: Remove remaining undefined behavior due to missing mutex use 2023-01-06 18:00:51 +01:00
Fix #17: Add live statistics to stdout of the number of comments processed per second 2023-01-06 17:55:16 +01:00
Add try/catch around JSON parser
As I got:
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
terminate called recursively
  what():  [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
terminate called recursively
2023-01-06 00:31:05 +01:00
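The log above comes from nlohmann::json throwing a parse error on an empty body. A Python analogue of the guard, as a sketch only (`parse_json_or_none` is a hypothetical name), catches `json.JSONDecodeError` and returns `None`; note that a body containing the literal `null` also yields `None`.

```python
import json
from typing import Any, Optional


def parse_json_or_none(text: str) -> Optional[Any]:
    """Parse text as JSON, returning None instead of raising on malformed input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # For example an empty body, the "unexpected end of input" case in the log.
        return None
```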
#2: Add compression to channels/ folder
The following Python script can be used to compress an existing uncompressed
`channels/` folder.


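import os, shutil

path = 'channels/'

# Work from inside the channels/ folder, where each channel has its own sub-folder.
os.chdir(path)

d = next(os.walk('.'))[1]
for channelIndex, channelId in enumerate(d):
    print(f'{channelIndex} / {len(d)}: {channelId}')
    # Create channelId.zip from the contents of the channelId/ sub-folder.
    shutil.make_archive(channelId, 'zip', channelId)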
2023-01-04 03:06:33 +01:00
Fix #7: Add multi-threading 2023-01-03 04:56:19 +01:00
Fix #8: Support comments disabled channels
Tested with `UCWIdqSQekeGmUWlSFeCiEnA`, which correctly processed the 36 comments of its only comments-enabled video `3F8dFt8LsXY`.

Note that this commit doesn't support comments-disabled channels with more than 20,000 videos.
2023-01-03 02:56:07 +01:00
#2: Add data logging 2023-01-02 19:46:32 +01:00
Apply astyle formatting to main.cpp 2023-01-02 18:31:16 +01:00
Fix #4: Provide a version relying on the no-key service of https://yt.lemnoslife.com 2023-01-02 18:30:18 +01:00
Make compatible with Debian
More precisely, make compatible with `gcc version 10.2.1 20210110 (Debian 10.2.1-6)`
2023-01-02 18:23:30 +01:00
Add progress saving and use spaces instead of tabs 2022-12-22 06:18:22 +01:00
Add time to logging 2022-12-22 05:47:16 +01:00
Add resilience to missing authorChannelId in main.cpp 2022-12-22 05:41:38 +01:00
Add main.cpp, Makefile and channelsToTreat.txt
Note that running this algorithm ends up at channel [`UC-99odscxh1xxTyxHyXuRrg`](https://www.youtube.com/channel/UC-99odscxh1xxTyxHyXuRrg), more precisely at the video [`Tq5aPNzfYcg`](https://www.youtube.com/watch?v=Tq5aPNzfYcg) and its comment [`Ugx-TlSq6SNCbOX04mx4AaABAg`](https://www.youtube.com/watch?v=Tq5aPNzfYcg&lc=Ugx-TlSq6SNCbOX04mx4AaABAg), [which doesn't have any author](https://yt.lemnoslife.com/noKey/comments?part=snippet&id=Ugx-TlSq6SNCbOX04mx4AaABAg)...
2022-12-22 05:20:32 +01:00