Commit Graph

67 Commits

SHA1 Message Date
3c4664a4b1 Fix #13: Add captions extraction
In addition, I was about to commit:

```c++
// Videos that have automatically generated captions set to `Off` by default aren't retrieved with `--sub-langs '.*orig'`.
// My workaround is to first call the YouTube Data API v3 Captions: list endpoint with `part=snippet` and retrieve the language whose `snippet` contains `"trackKind": "asr"` (automatic speech recognition).
/*json data = getJson(threadId, "captions?part=snippet&videoId=" + videoId, true, channelToTreat),
     items = data["items"];
for(const auto& item : items)
{
    json snippet = item["snippet"];
    if(snippet["trackKind"] == "asr")
    {
        string language = snippet["language"];
        cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
        exec(threadId, cmd);
        // As there should be a single automatic speech recognition track, there is no need to go through all tracks.
        break;
    }
}*/
```

Instead of:

```c++
cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '.*orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
exec(threadId, cmd);
```

But I realized, as the GitHub comment I was about to add to https://github.com/yt-dlp/yt-dlp/issues/2655 shows, that I was wrong:

> `yt-dlp --cookies cookies.txt --sub-langs 'en.*,.*orig' --write-auto-subs https://www.youtube.com/watch?v=tQqDBySHYlc` works as expected. Many thanks again.
>
> ```
> 'subtitleslangs': ['en.*','.*orig'],
> 'writeautomaticsub': True,
> ```
>
> Works as expected too. Thank you.
>
> Very sorry for the video sample. I hadn't even watched it.

Thank you for this workaround. However, note that videos whose automatically generated subtitles are set to `Off` by default aren't retrieved with your method (an example of such a video: [`mozyXsZJnQ4`](https://www.youtube.com/watch?v=mozyXsZJnQ4)). My workaround is to first call the [YouTube Data API v3](https://developers.google.com/youtube/v3) [Captions: list](https://developers.google.com/youtube/v3/docs/captions/list) endpoint with [`part=snippet`](https://developers.google.com/youtube/v3/docs/captions/list#part) and retrieve the [`language`](https://developers.google.com/youtube/v3/docs/captions#snippet.language) of the track whose [`snippet`](https://developers.google.com/youtube/v3/docs/captions#snippet) contains [`"trackKind": "asr"`](https://developers.google.com/youtube/v3/docs/captions#snippet.trackKind) (automatic speech recognition).
2023-02-10 20:03:08 +01:00
7fcc8b09fa Fix #36: Make the program stop by crashing when the YouTube operational API instance is detected as sending unusual traffic 2023-02-10 12:02:39 +01:00
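A minimal sketch of what this detection could look like in the C++ crawler, assuming the instance answers with a page mentioning unusual traffic instead of JSON; the marker string and helper name are assumptions:

```c++
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: terminate as soon as the instance's response looks
// like an "unusual traffic" interstitial instead of the expected JSON.
void checkUnusualTraffic(const std::string& response)
{
    if (response.find("unusual traffic") != std::string::npos)
    {
        std::cerr << "The YouTube operational API instance is detected as sending unusual traffic, exiting!" << std::endl;
        exit(EXIT_FAILURE);
    }
}
```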
87d67e4e85 Correct the termination of COMMUNITY tab process due to missing page tokens 2023-02-10 00:37:28 +01:00
8f9b1275be Remove the Content-Type: application/json HTTP header when retrieving urls.txt inside a .zip 2023-02-09 02:07:10 +01:00
afd9e1b0b6 Add a verification that snippet/authorChannelId/value isn't null when using commentThreads for COMMUNITY
As it can happen, see https://www.youtube.com/channel/UCWeg2Pkate69NFdBeuRFTAw/community?lc=UgwGfjNxGuwqP8qYPPN4AaABAg&lb=UgkxYiEAo9-b1vWPasxFy13f959rrctQpZwW
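A minimal sketch of such a verification with nlohmann::json, assuming the `commentThreads` item layout documented by YouTube Data API v3; the helper name is hypothetical:

```c++
#include <nlohmann/json.hpp>
#include <optional>
#include <string>

using json = nlohmann::json;

// Returns the author channel id of a comment `snippet`, or std::nullopt when
// `authorChannelId/value` is missing or null, as in the COMMUNITY comment
// linked above.
std::optional<std::string> getAuthorChannelId(const json& snippet)
{
    if (snippet.contains("authorChannelId"))
    {
        const json& authorChannelId = snippet["authorChannelId"];
        if (authorChannelId.contains("value") && !authorChannelId["value"].is_null())
            return authorChannelId["value"].get<std::string>();
    }
    return std::nullopt;
}
```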
2023-02-09 01:51:22 +01:00
5a1df71bb9 Append to channels.txt all channels mentioned in the Wiki 2023-02-08 16:28:44 +01:00
622188d6d9 Add in urls.txt whether the URL is related to YouTube Data API v3 or the YouTube operational API 2023-02-08 16:05:03 +01:00
0c51bd05bc Fix #34: Correct JSON files by putting first line in another metadata file 2023-02-07 23:08:09 +01:00
e0f521d572 Restore ability to download whole archives
As API keys aren't written in the first line of JSON files.
2023-02-07 23:01:26 +01:00
e5a50bcba4 Remove the ability in channels.php to download a whole archive, so as not to leak the API keys used 2023-02-07 22:42:24 +01:00
2179e9b6f4 Add channels.php, adding support for downloading a zip (or a file within it) 2023-02-07 22:39:43 +01:00
e9b77369fb #31: Add zip files search 2023-02-07 20:15:36 +01:00
b45384bab7 Comment the WebSocket mechanism to work with an arbitrary number of independent sends 2023-02-07 18:14:49 +01:00
126cc75dc6 Make WebSocket able to manage arbitrary feedback to end-user
While the previous implementation was able to send two independent messages, now we can send an arbitrary number of independent messages.
2023-02-07 17:25:17 +01:00
7302679a81 Make websockets.php able to process blocking treatments 2023-02-07 01:22:26 +01:00
0dba8e0c7d Make a WebSocket example work with crawler.yt.lemnoslife.com 2023-01-31 01:05:09 +01:00
155d372186 Run php-cs-fixer fix --rules=@PSR12 websocket.php 2023-01-31 00:57:06 +01:00
bd184bd0f0 Rename chat.php to websocket.php 2023-01-30 22:24:02 +01:00
0193f05143 Copy-paste the quick example from the README.md of ratchetphp/Ratchet
5012dc9545 (a-quick-example)
2023-01-30 22:19:04 +01:00
931b2df563 Add static website/index.php 2023-01-30 22:14:05 +01:00
0f4b89ccd9 Correct typo: the channel tab is LIVE, not LIVES 2023-01-25 01:00:29 +01:00
4e162e34c3 Add comment in README.md about the usage of --no-keys or generating a YouTube Data API v3 key 2023-01-22 15:41:13 +01:00
10e8811817 Introduce {,MAIN_}EXIT_WITH_ERROR macros for exiting with an error 2023-01-22 15:17:14 +01:00
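A minimal sketch of what such a macro could look like; the actual definitions in the repository (notably the MAIN_ variant and any threadId handling) may differ:

```c++
#include <cstdlib>
#include <iostream>

// Print the error message to stderr, then terminate with a failure status.
#define EXIT_WITH_ERROR(message)           \
    do                                     \
    {                                      \
        std::cerr << message << std::endl; \
        exit(EXIT_FAILURE);                \
    } while (0)

// Usage example:
// EXIT_WITH_ERROR("Unable to open channels.txt!");
```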
0f15bb0235 #11: Add the discovery of channels having commented on ended livestreams 2023-01-22 15:15:27 +01:00
bdb4e6443a #11: Add current livestreams support to discover channels 2023-01-22 04:00:11 +01:00
d2391e5d54 Instead of looping on items when we expect only one, we just use items[0] 2023-01-22 02:19:26 +01:00
993d0b9771 Make PRINT not require specifying threadId 2023-01-22 02:04:03 +01:00
0fcb5a0426 #11: Treat COMMUNITY post comments to discover channels 2023-01-22 01:37:32 +01:00
57200da482 Add in README.md the fact that, as documented in #30, this algorithm is only known to be working fine on Linux 2023-01-21 22:20:45 +01:00
a0880c79bb #11: Update channel CHANNELS tab treatment following YouTube-operational-API/issues/121 closure 2023-01-21 02:24:42 +01:00
10c5c1d605 #11: Add the treatment of channels' CHANNELS tab, but postpone only the treatment of unlisted videos 2023-01-15 14:56:44 +01:00
51a70f6e54 #7: Make commentsCount and requestsPerChannel compatible with multithreading 2023-01-15 14:31:55 +01:00
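A minimal sketch of one way to make these shared counters thread-safe, assuming a std::mutex guards them; the function name and counter types are assumptions:

```c++
#include <map>
#include <mutex>
#include <string>

// Counters shared by the worker threads and the mutex guarding them.
unsigned int commentsCount = 0;
std::map<std::string, unsigned int> requestsPerChannel;
std::mutex countersMutex;

// Worker threads call this instead of touching the counters directly.
void onRequestTreated(const std::string& channelId, unsigned int commentsTreated)
{
    std::lock_guard<std::mutex> lock(countersMutex);
    commentsCount += commentsTreated;
    requestsPerChannel[channelId]++;
}
```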
aa97c94bf8 #11: Add a first iteration for the CHANNELS retrieval 2023-01-15 02:19:31 +01:00
d1b84335d1 #11: Add --youtube-operational-api-instance-url parameter and use exit(EXIT_{SUCCESS, FAILURE}) instead of exit({0, 1}) 2023-01-15 00:49:32 +01:00
6ce29051c0 Fix #26: Keep efficient search algorithm while keeping order (notably of the starting set) 2023-01-14 15:14:24 +01:00
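A minimal sketch of one way to combine efficient membership tests with preserved insertion order, pairing a std::set with a std::deque; whether the actual fix uses these exact containers is an assumption:

```c++
#include <deque>
#include <set>
#include <string>

std::set<std::string> channelsAlreadyTreatedOrToTreat; // fast membership tests
std::deque<std::string> channelsToTreat;               // preserves discovery order

// Only enqueue channels never seen before, without losing the order in which
// they were discovered (notably the starting set from channels.txt).
void addChannel(const std::string& channelId)
{
    if (channelsAlreadyTreatedOrToTreat.insert(channelId).second)
        channelsToTreat.push_back(channelId);
}
```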
ad9f96b33c Fix #24: Stop using macros for user inputs, notably to make releases 2023-01-08 18:26:20 +01:00
d498c86058 Fix #6: Add support for multiple keys to be resilient against exceeded quota errors 2023-01-08 17:59:08 +01:00
1ee767abbc Fix #23: YouTube Data API v3 PlaylistItems: list endpoint returns a playlistNotFound error for regular uploads playlists 2023-01-08 16:31:57 +01:00
7e35a6473a Fix #20: YouTube Data API v3 rarely and suddenly returns a commentsDisabled error, which involves an unwanted method switch
Also modified the compression command, as I got `sh: 1: zip: Argument list too long` when compressing the 248,868 JSON files of the French channel with the most subscribers.
2023-01-08 15:43:27 +01:00
ba37d6a111 Make all Python scripts executable and add findAlreadyTreatedCommentsCount.py to find how many comments were already treated 2023-01-07 15:45:31 +01:00
5a7e5b6f78 Add a note about the timing percentage of findLatestTreatedCommentsForChannelsBeingTreated.py going backward 2023-01-07 15:35:12 +01:00
e3cab4c204 Fix #9: Make sure that in case of an error returned by the YouTube Data API v3 the algorithm treats it correctly
Note that in case of an error, the algorithm used to skip the received content, as if no `items` were in it.
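A minimal sketch of such a check with nlohmann::json, assuming the API reports failures through a top-level `error` object; the helper name is hypothetical:

```c++
#include <nlohmann/json.hpp>
#include <iostream>

using json = nlohmann::json;

// Returns true when the API response is an error, so the caller treats it as
// a failure instead of silently behaving as if no `items` were returned.
bool isApiError(const json& data)
{
    if (data.contains("error"))
    {
        std::cerr << "YouTube Data API v3 error: " << data["error"].dump() << std::endl;
        return true;
    }
    return false;
}
```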
2023-01-06 20:55:32 +01:00
156a621413 Fix #15: Provide an algorithm to retrieve the list of the 100 French channels with the most subscribers (and provide the list too) 2023-01-06 18:06:00 +01:00
fdfec17817 #7: Remove remaining undefined behavior due to missing mutex use 2023-01-06 18:00:51 +01:00
3ef5fa0707 Fix #17: Add to stdout live statistics of the number of comments treated per second 2023-01-06 17:55:16 +01:00
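A minimal sketch of printing such live statistics from a dedicated thread, assuming an atomic counter incremented by the worker threads; the names are assumptions:

```c++
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>

std::atomic<unsigned int> commentsPerSecondCount{0}; // incremented by worker threads

// Every second, print and reset the counter to get comments treated per second.
void printCommentsPerSecond()
{
    while (true)
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::cout << "Comments treated per second: " << commentsPerSecondCount.exchange(0) << std::endl;
    }
}
```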
0259dfb3fb Fix #16: Provide an algorithm to determine the progress of retrieving comments for huge YouTube channels 2023-01-06 17:51:00 +01:00
b2fafb721c #1: Add GNU AGPLv3 license 2023-01-06 16:09:12 +01:00
01394769fd Add try/catch around json parser
As I got:
```
terminate called after throwing an instance of 'nlohmann::detail::parse_error'
terminate called recursively
  what():  [json.exception.parse_error.101] parse error at line 1, column 1: syntax error while parsing value - unexpected end of input; expected '[', '{', or a literal
terminate called recursively
```
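A minimal sketch of the guard with nlohmann::json; returning an empty object so the caller skips the content, mirroring the error treatment of the surrounding commits, is an assumption:

```c++
#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

using json = nlohmann::json;

json parseJsonSafely(const std::string& content)
{
    try
    {
        return json::parse(content);
    }
    catch (const json::parse_error& e)
    {
        // Truncated or empty responses previously terminated the program.
        std::cerr << "JSON parse error: " << e.what() << std::endl;
        return json::object();
    }
}
```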
2023-01-06 00:31:05 +01:00
5d13bd3c44 Modify removeChannelsBeingTreated.py to be more resilient against non-existent files in the treatment process 2023-01-04 03:10:28 +01:00
512485b1b8 #2: Add compression to channels/ folder
The following Python script can be used to compress an existing uncompressed
`channels/` folder.

```py
import os, shutil

path = 'channels/'

os.chdir(path)

# List the channel directories directly under `channels/`.
channelIds = next(os.walk('.'))[1]
for channelIndex, channelId in enumerate(channelIds):
    print(f'{channelIndex} / {len(channelIds)}: {channelId}')
    # Compress the channel directory into `CHANNEL_ID.zip`, then remove the
    # uncompressed directory.
    shutil.make_archive(channelId, 'zip', channelId)
    shutil.rmtree(channelId)
```
2023-01-04 03:06:33 +01:00