Commit Graph

91 Commits

Author SHA1 Message Date
8475463d29 Advertise pip instead of apt in README.md to install the latest version of yt-dlp 2023-02-23 23:16:36 +01:00
e2f0150151 #19: Detail how to run the website and reference channels.txt on it 2023-02-23 23:12:18 +01:00
b5cc3d5547 Fix #19: Improve documentation and code comments 2023-02-23 22:50:30 +01:00
3fa4d43bbd #44: Allow arbitrary end-user requests 2023-02-22 17:48:24 +01:00
4a11ac4196 Fix #51: In recent days the algorithm does not seem to finish treating the starting set of channels before treating discovered channels
I verified that this commit solves the issue by treating only `CHANNELS` tab of the channels in `channels.txt`.
2023-02-22 04:09:35 +01:00
c30847c1f5 #48: Stop relying on echo, tee and /dev/null for redirecting compression command to debug/ 2023-02-22 03:47:06 +01:00
221956438d #48: Redirect compression command echo to /dev/null 2023-02-22 03:37:07 +01:00
ba78223c0c Fix #48: Redirect compression execution logs so that they do not overlap PRINTs 2023-02-22 03:27:49 +01:00
e86d629597 #48: Modify removeChannelsBeingTreated.py to temporarily solve the issue 2023-02-19 02:04:28 +01:00
78b2bf18fa #35: Make captions that are not automatically generated download correctly 2023-02-17 16:57:11 +01:00
5bfceccb8e Change the EXIT_WITH_ERROR to PRINT for channels not having an enumerable uploads playlist 2023-02-16 12:21:28 +01:00
eb8431746e Make the first channel of channels.txt be treated again, solve the temporary empty response from YouTube Data API v3 issue and temporarily remove the sanity check that fails very rarely #39 2023-02-14 23:15:07 +01:00
a7f6e1cd85 Fix #31: List all occurrences of search within video captions 2023-02-14 02:56:11 +01:00
21ad878be8 Fix #31: Make a website with a search engine notably based on the captions extracted 2023-02-14 02:00:23 +01:00
57572c6d6c #31: Make search within captions not limited by line wrapping 2023-02-14 01:32:36 +01:00
e0faf053a1 Fix #38: Add a loading message with progress on end-user interface 2023-02-14 01:08:05 +01:00
77bafdd592 #31: Add a first search only captions support 2023-02-14 00:59:37 +01:00
fa7da64879 Add .gitignore to ignore {keys, channels}.txt 2023-02-13 06:18:42 +01:00
9e650cf72a Make the COMMUNITY tab process not infinitely loop
Related to https://github.com/Benjamin-Loison/YouTube-operational-API/issues/49
2023-02-13 06:17:23 +01:00
dc63de82f5 Add link to channels/ to index.php 2023-02-13 05:55:44 +01:00
dfdfbe3272 Modify website to support new sub-folders architecture 2023-02-13 05:45:08 +01:00
a51e3b1a9a Fix #37: Use a number of channels seen (possibly repeated) instead of YouTube Data API v3 Comment(Thread): resource 2023-02-12 16:31:27 +01:00
b572d078dd Add logging to exec and make it crashless, requests and captions folders support for compressing, clean captions support for videos being livestreams and videos starting with - 2023-02-12 16:24:16 +01:00
8df226e2bc Move YouTube API requests logging to requests/ channel sub-folder 2023-02-10 20:17:49 +01:00
3c4664a4b1 Fix #13: Add captions extraction
I was about to commit in addition:

```c++
// Videos with automatically generated captions that are set to `Off` by default aren't retrieved with `--sub-langs '.*orig'`.
// My workaround is to first call YouTube Data API v3 Captions: list endpoint with `part=snippet` and retrieve the language that has `"trackKind": "asr"` (automatic speech recognition) in `snippet`.
/*json data = getJson(threadId, "captions?part=snippet&videoId=" + videoId, true, channelToTreat),
     items = data["items"];
for(const auto& item : items)
{
    json snippet = item["snippet"];
    if(snippet["trackKind"] == "asr")
    {
        string language = snippet["language"];
        cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
        exec(threadId, cmd);
        // As there should be a single automatic speech recognized track, there is no need to go through all tracks.
        break;
    }
}*/
```

Instead of:

```c++
cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '.*orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
exec(threadId, cmd);
```

But I realized that, like the GitHub comment I was about to add to https://github.com/yt-dlp/yt-dlp/issues/2655, I was wrong:

> `yt-dlp --cookies cookies.txt --sub-langs 'en.*,.*orig' --write-auto-subs https://www.youtube.com/watch?v=tQqDBySHYlc` work as expected. Many thanks again.
>
> ```
> 'subtitleslangs': ['en.*','.*orig'],
> 'writeautomaticsub': True,
> ```
>
> Work as expected too. Thank you
>
> Very sorry for the video sample. I even not watched it.

Thank you for this workaround. However note that videos having automatically generated subtitles but being set to `Off` by default aren't retrieved with your method (example of such video: [`mozyXsZJnQ4`](https://www.youtube.com/watch?v=mozyXsZJnQ4)). My workaround is to first call [YouTube Data API v3](https://developers.google.com/youtube/v3) [Captions: list](https://developers.google.com/youtube/v3/docs/captions/list) endpoint with [`part=snippet`](https://developers.google.com/youtube/v3/docs/captions/list#part) and retrieve the [`language`](https://developers.google.com/youtube/v3/docs/captions#snippet.language) that has [`"trackKind": "asr"`](https://developers.google.com/youtube/v3/docs/captions#snippet.trackKind) (automatic speech recognition) in [`snippet`](https://developers.google.com/youtube/v3/docs/captions#snippet).
2023-02-10 20:03:08 +01:00
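
The workaround described in the message of commit 3c4664a4b1 (look up the caption track whose `trackKind` is `asr`, then request that language with the `-orig` suffix) can be sketched as follows. This is only an illustration under stated assumptions, not the repository's code: `CaptionTrack`, `findAsrLanguage` and `buildAutoSubsOptions` are hypothetical names, and the struct stands in for the `snippet` fields of a Captions: list response, which the commented-out code reads from JSON instead. The yt-dlp options are the ones quoted in the commit message. It needs C++17 for `std::optional`.

```c++
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one item of a Captions: list response:
// only the two `snippet` fields that the workaround reads.
struct CaptionTrack
{
    std::string language;
    std::string trackKind;
};

// Returns the language of the first automatic speech recognition track, if any.
// As the commit message notes, a single such track is expected, so the search stops at the first match.
std::optional<std::string> findAsrLanguage(const std::vector<CaptionTrack>& tracks)
{
    for(const auto& track : tracks)
    {
        if(track.trackKind == "asr")
        {
            return track.language;
        }
    }
    return std::nullopt;
}

// Builds the subtitle options for that language, mirroring the options quoted in the commit message.
std::string buildAutoSubsOptions(const std::string& language)
{
    return "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt";
}
```

A caller would run `findAsrLanguage` on the tracks of a video and, only when it returns a value, pass that value to `buildAutoSubsOptions`. Note that the commit message concludes this extra request turned out to be unnecessary, so the sketch documents the abandoned approach only.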
7fcc8b09fa Fix #36: Make the program stop by crashing when the YouTube operational API instance is detected as sending unusual traffic 2023-02-10 12:02:39 +01:00
87d67e4e85 Correct the termination of COMMUNITY tab process due to missing page tokens 2023-02-10 00:37:28 +01:00
8f9b1275be Remove the Content-Type: application/json HTTP header when retrieving urls.txt inside a .zip 2023-02-09 02:07:10 +01:00
afd9e1b0b6 Add a verification that snippet/authorChannelId/value isn't null when using commentThreads for COMMUNITY
As it can happen cf https://www.youtube.com/channel/UCWeg2Pkate69NFdBeuRFTAw/community?lc=UgwGfjNxGuwqP8qYPPN4AaABAg&lb=UgkxYiEAo9-b1vWPasxFy13f959rrctQpZwW
2023-02-09 01:51:22 +01:00
5a1df71bb9 Append to channels.txt all channels mentioned in the Wiki 2023-02-08 16:28:44 +01:00
622188d6d9 Add in urls.txt if the URL is related to YouTube Data API v3 or YouTube operational API 2023-02-08 16:05:03 +01:00
0c51bd05bc Fix #34: Correct JSON files by putting first line in another metadata file 2023-02-07 23:08:09 +01:00
e0f521d572 Restore ability to download whole archives
As API keys aren't written in the first line of JSON files.
2023-02-07 23:01:26 +01:00
e5a50bcba4 Remove ability in channels.php to download whole archive to avoid leaking the API keys used 2023-02-07 22:42:24 +01:00
2179e9b6f4 Add channels.php adding support for (file in) zip download 2023-02-07 22:39:43 +01:00
e9b77369fb #31: Add zip files search 2023-02-07 20:15:36 +01:00
b45384bab7 Comment WebSocket mechanism to work with an arbitrary number of independent sends 2023-02-07 18:14:49 +01:00
126cc75dc6 Make WebSocket able to manage arbitrary feedback to end-user
While previous implementation was able to send two independent messages, now we can send an arbitrary amount of independent messages.
2023-02-07 17:25:17 +01:00
7302679a81 Make websockets.php able to process blocking treatments 2023-02-07 01:22:26 +01:00
0dba8e0c7d Make a WebSocket example work with crawler.yt.lemnoslife.com 2023-01-31 01:05:09 +01:00
155d372186 Run php-cs-fixer fix --rules=@PSR12 websocket.php 2023-01-31 00:57:06 +01:00
bd184bd0f0 Rename chat.php to websocket.php 2023-01-30 22:24:02 +01:00
0193f05143 Copy-pasted the README.md quick example of ratchetphp/Ratchet
5012dc9545 (a-quick-example)
2023-01-30 22:19:04 +01:00
931b2df563 Add static website/index.php 2023-01-30 22:14:05 +01:00
0f4b89ccd9 Correct typo: the channel tab is LIVE, not LIVES 2023-01-25 01:00:29 +01:00
4e162e34c3 Add comment in README.md about the usage of --no-keys or generating a YouTube Data API v3 key 2023-01-22 15:41:13 +01:00
10e8811817 Introduce {,MAIN_}EXIT_WITH_ERROR macros for exiting with an error 2023-01-22 15:17:14 +01:00
0f15bb0235 #11: Add the discovering of channels having commented on ended livestreams 2023-01-22 15:15:27 +01:00
bdb4e6443a #11: Add current livestreams support to discover channels 2023-01-22 04:00:11 +01:00
d2391e5d54 Instead of looping on items where we expect only one to be, we just use items[0] 2023-01-22 02:19:26 +01:00