Commit Graph

87 Commits

Author SHA1 Message Date
7626c7bad1
Fix #51: Recently, the algorithm seems not to completely treat the starting set of channels before treating discovered channels
I verified that this commit solves the issue by treating only the `CHANNELS` tab of the channels in `channels.txt`.
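The ordering this fix aims for can be sketched with two queues. The starting channels (as read from `channels.txt`) are always drained before any discovered channel is treated. This is a minimal, single-threaded sketch under stated assumptions: `treatChannels` and its `discover` callback are hypothetical names, not the crawler's actual code.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical discovery callback: returns the channels found while treating `channel`.
using DiscoverFn = std::function<std::vector<std::string>(const std::string&)>;

// Hypothetical sketch: treat every starting channel before any discovered one.
std::vector<std::string> treatChannels(const std::vector<std::string>& startingChannels,
                                       const DiscoverFn& discover)
{
    std::deque<std::string> startingQueue(startingChannels.begin(), startingChannels.end());
    std::deque<std::string> discoveredQueue;
    // Channels already queued or treated, so that none is treated twice.
    std::set<std::string> seen(startingChannels.begin(), startingChannels.end());
    std::vector<std::string> treatedOrder;

    while (!startingQueue.empty() || !discoveredQueue.empty())
    {
        // Discovered channels are only picked once the starting set is exhausted.
        std::deque<std::string>& queue = startingQueue.empty() ? discoveredQueue : startingQueue;
        std::string channel = queue.front();
        queue.pop_front();
        treatedOrder.push_back(channel);
        for (const std::string& found : discover(channel))
        {
            // insert(...).second is false when the channel was already seen.
            if (seen.insert(found).second)
                discoveredQueue.push_back(found);
        }
    }
    return treatedOrder;
}
```

With starting channels `A` and `B`, a channel `X` discovered while treating `A` is only treated after `B`.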
2023-02-22 04:09:35 +01:00
76e41f2f7b
#48: Stop relying on echo, tee and /dev/null for redirecting compression command to debug/ 2023-02-22 03:47:06 +01:00
8d4f31d106
#48: Redirect compression command echo to /dev/null 2023-02-22 03:37:07 +01:00
c0db4eb437
Fix #48: Redirect compression execution logs so that they don't overlap PRINTs 2023-02-22 03:27:49 +01:00
e23a0fc4c7
#48: Modify removeChannelsBeingTreated.py to temporarily solve the issue 2023-02-19 02:04:28 +01:00
e5dd476490
#35: Make captions that are not automatically generated download correctly 2023-02-17 16:57:11 +01:00
fd75d27b99
Change the EXIT_WITH_ERROR to PRINT for channels not having an enumerable uploads playlist 2023-02-16 12:21:28 +01:00
cb7b68342a
Make the first channel of channels.txt treated again, solve the issue of YouTube Data API v3 temporarily returning an empty response, and temporarily remove a sanity check that fails very rarely #39 2023-02-14 23:15:07 +01:00
e166fdb4e5
Fix #31: List all occurrences of search within video captions 2023-02-14 02:56:11 +01:00
8d34cf33ae
Fix #31: Make a website with a search engine notably based on the extracted captions 2023-02-14 02:00:23 +01:00
09f7675bf7
#31: Make search within captions not limited by line wrapping 2023-02-14 01:32:36 +01:00
4449d488c9
Fix #38: Add a loading message with progress on the end-user interface 2023-02-14 01:08:05 +01:00
34c0d03587
#31: Add first support for searching captions only 2023-02-14 00:59:37 +01:00
8d20327b67
Add .gitignore to ignore {keys, channels}.txt 2023-02-13 06:18:42 +01:00
09c2a8eafe
Prevent the COMMUNITY tab process from looping infinitely
Related to https://github.com/Benjamin-Loison/YouTube-operational-API/issues/49
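The commit does not detail the cause of the infinite loop. As a hedged illustration of one common guard, the sketch below follows page tokens until none is returned, and also stops if a token repeats. `Page` and `fetchAllPages` are hypothetical names, and `fetchPage` stands in for whatever request the crawler actually makes.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical page type: the items of one page and the token of the next page, if any.
struct Page
{
    std::vector<std::string> items;
    std::optional<std::string> nextPageToken;
};

// Hypothetical sketch: collect all items, stopping on a missing or repeated page token.
std::vector<std::string> fetchAllPages(const std::function<Page(const std::string&)>& fetchPage)
{
    std::vector<std::string> allItems;
    std::set<std::string> seenTokens;
    std::string pageToken; // An empty token requests the first page.
    while (true)
    {
        Page page = fetchPage(pageToken);
        allItems.insert(allItems.end(), page.items.begin(), page.items.end());
        // Missing token: this was the last page.
        if (!page.nextPageToken || page.nextPageToken->empty())
            break;
        // Token already seen: pages are cycling, so stop instead of looping forever.
        if (!seenTokens.insert(*page.nextPageToken).second)
            break;
        pageToken = *page.nextPageToken;
    }
    return allItems;
}
```

Tracking seen tokens costs a little memory but bounds the loop even if the server keeps returning a token it already gave.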
2023-02-13 06:17:23 +01:00
9cd7c57e2f
Add link to channels/ to index.php 2023-02-13 05:55:44 +01:00
94ad823e3b
Modify website to support new sub-folders architecture 2023-02-13 05:45:08 +01:00
454503271e
Fix #37: Use the number of channels seen (possibly repeated) instead of the YouTube Data API v3 Comment(Thread): resource 2023-02-12 16:31:27 +01:00
54fe40e588
Add logging to exec and make it crash-free, add support for compressing the requests and captions folders, and add clean captions support for videos that are livestreams and videos whose ID starts with - 2023-02-12 16:24:16 +01:00
8cf5698051
Move YouTube API requests logging to requests/ channel sub-folder 2023-02-10 20:17:49 +01:00
04c59eb025
Fix #13: Add captions extraction
I was about to commit in addition:

```c++
// Videos with automatically generated captions that are set to `Off` by default aren't retrieved with `--sub-langs '.*orig'`.
// My workaround is to first call YouTube Data API v3 Captions: list endpoint with `part=snippet` and retrieve the language that has `"trackKind": "asr"` (automatic speech recognition) in `snippet`.
/*json data = getJson(threadId, "captions?part=snippet&videoId=" + videoId, true, channelToTreat),
     items = data["items"];
for(const auto& item : items)
{
    json snippet = item["snippet"];
    if(snippet["trackKind"] == "asr")
    {
        string language = snippet["language"];
        cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
        exec(threadId, cmd);
        // As there should be a single automatic speech recognized track, there is no need to go through all tracks.
        break;
    }
}*/
```

Instead of:

```c++
cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '.*orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
exec(threadId, cmd);
```

But I realized that, like the GitHub comment I was about to add to https://github.com/yt-dlp/yt-dlp/issues/2655, I was wrong:

> `yt-dlp --cookies cookies.txt --sub-langs 'en.*,.*orig' --write-auto-subs https://www.youtube.com/watch?v=tQqDBySHYlc` work as expected. Many thanks again.
>
> ```
> 'subtitleslangs': ['en.*','.*orig'],
> 'writeautomaticsub': True,
> ```
>
> Work as expected too. Thank you
>
> Very sorry for the video sample. I even not watched it.

Thank you for this workaround. However, note that videos having automatically generated subtitles set to `Off` by default aren't retrieved with your method (example of such a video: [`mozyXsZJnQ4`](https://www.youtube.com/watch?v=mozyXsZJnQ4)). My workaround is to first call the [YouTube Data API v3](https://developers.google.com/youtube/v3) [Captions: list](https://developers.google.com/youtube/v3/docs/captions/list) endpoint with [`part=snippet`](https://developers.google.com/youtube/v3/docs/captions/list#part) and retrieve the [`language`](https://developers.google.com/youtube/v3/docs/captions#snippet.language) of the track that has [`"trackKind": "asr"`](https://developers.google.com/youtube/v3/docs/captions#snippet.trackKind) (automatic speech recognition) in its [`snippet`](https://developers.google.com/youtube/v3/docs/captions#snippet).
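The selection step of the commented-out workaround, which was ultimately not committed, can be sketched without the HTTP and JSON handling. It picks the `language` of the caption track whose `trackKind` is `asr`, then builds the `--sub-langs` value. `CaptionSnippet` and `asrSubLangs` are hypothetical names; a real implementation would fill the snippets from the Captions: list response.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one item returned by Captions: list with part=snippet.
struct CaptionSnippet
{
    std::string language;  // snippet.language
    std::string trackKind; // snippet.trackKind, e.g. "asr" or "standard"
};

// Return the yt-dlp --sub-langs value for the automatic speech recognition track, if any.
std::optional<std::string> asrSubLangs(const std::vector<CaptionSnippet>& snippets)
{
    for (const CaptionSnippet& snippet : snippets)
    {
        // There should be a single ASR track, so the first match is enough.
        if (snippet.trackKind == "asr")
            return snippet.language + "-orig";
    }
    return std::nullopt;
}
```

The returned value would then be passed as `--sub-langs '<value>'` together with `--write-auto-subs`, as in the commented-out code.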
2023-02-10 20:03:08 +01:00
9b792015fa
Fix #36: Make the program stop by crashing when the YouTube operational API instance is detected as sending unusual traffic 2023-02-10 12:02:39 +01:00
4508b12b6c
Correct the termination of the COMMUNITY tab process due to missing page tokens 2023-02-10 00:37:28 +01:00
ea604cce40
Remove the Content-Type: application/json HTTP header when retrieving urls.txt inside a .zip 2023-02-09 02:07:10 +01:00
01aac3f66e
Add a check that snippet/authorChannelId/value isn't null when using commentThreads for COMMUNITY
As it can be null, cf. https://www.youtube.com/channel/UCWeg2Pkate69NFdBeuRFTAw/community?lc=UgwGfjNxGuwqP8qYPPN4AaABAg&lb=UgkxYiEAo9-b1vWPasxFy13f959rrctQpZwW
2023-02-09 01:51:22 +01:00
d0dee043c6
Append to channels.txt all channels mentioned in the Wiki 2023-02-08 16:28:44 +01:00
d5c55e756a
Add in urls.txt whether the URL is related to YouTube Data API v3 or YouTube operational API 2023-02-08 16:05:03 +01:00
50306aff5d
Fix #34: Correct JSON files by putting the first line in a separate metadata file 2023-02-07 23:08:09 +01:00
4fa433495b
Restore the ability to download whole archives
As API keys aren't written in the first line of JSON files.
2023-02-07 23:01:26 +01:00
a0ba474fcc
Remove the ability in channels.php to download the whole archive to avoid leaking the API keys used 2023-02-07 22:42:24 +01:00
b4cb072770
Add channels.php adding support for (file in) zip download 2023-02-07 22:39:43 +01:00
fda8fc728e
#31: Add zip files search 2023-02-07 20:15:36 +01:00
82e597f205
Comment the WebSocket mechanism that works with an arbitrary number of independent sends 2023-02-07 18:14:49 +01:00
03c2566a20
Make WebSocket able to manage arbitrary feedback to end-user
While previous implementation was able to send two independent messages, now we can send an arbitrary amount of independent messages.
2023-02-07 17:25:17 +01:00
a116b29df9
Make websockets.php able to process blocking treatments 2023-02-07 01:22:26 +01:00
1fe92ec2d0
Make a WebSocket example work with crawler.yt.lemnoslife.com 2023-01-31 01:05:09 +01:00
411a3db465
Run php-cs-fixer fix --rules=@PSR12 websocket.php 2023-01-31 00:57:06 +01:00
08b465753d
Rename chat.php to websocket.php 2023-01-30 22:24:02 +01:00
45c5d8a940
Copy-pasted the README.md quick example of ratchetphp/Ratchet
5012dc9545 (a-quick-example)
2023-01-30 22:19:04 +01:00
668aa608ed
Add static website/index.php 2023-01-30 22:14:05 +01:00
c746d43ddf
Correct typo: the channel tab is LIVE, not LIVES 2023-01-25 01:00:29 +01:00
05cd243abd
Add comment in README.md about the usage of --no-keys or generating a YouTube Data API v3 key 2023-01-22 15:41:13 +01:00
9d40fef429
Introduce {,MAIN_}EXIT_WITH_ERROR macros for exiting with an error 2023-01-22 15:17:14 +01:00
d34fade0cd
#11: Add the discovering of channels having commented on ended livestreams 2023-01-22 15:15:27 +01:00
68b1f9a77f
#11: Add current livestreams support to discover channels 2023-01-22 04:00:11 +01:00
c17a33d181
Instead of looping on items where we expect only one item, we just use items[0] 2023-01-22 02:19:26 +01:00
59dc5676cc
Make PRINT not require specifying threadId 2023-01-22 02:04:03 +01:00
548a797ee8
#11: Treat COMMUNITY post comments to discover channels 2023-01-22 01:37:32 +01:00
46ef8146f8
Add in README.md that, as documented in #30, this algorithm is only known to work fine on Linux 2023-01-21 22:20:45 +01:00
4133faad41
#11: Update channel CHANNELS tab treatment following YouTube-operational-API/issues/121 closure 2023-01-21 02:24:42 +01:00