Commit Graph

101 Commits

Author SHA1 Message Date
e493eaeb49
Add an optimization to the website when providing a channel id as a file path filter 2023-02-26 15:56:16 +01:00
e1aff6f469
#35: Move Python scripts to scripts/ and describe the project structure in README.md 2023-02-26 15:12:06 +01:00
ff5542d8b0
Add sudo apt install nginx to README.md for hosting the website 2023-02-25 15:55:24 +01:00
65bc8853e6
Make the website support regex for both search and path filtering 2023-02-24 15:38:51 +01:00
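A minimal sketch of what regex support for both the search and the path filter could look like. It is written in Python for illustration only; the website's actual implementation is not shown in this log, and `filter_and_search`, the file paths and the sample contents are all hypothetical.

```python
import re

def filter_and_search(files, path_pattern, search_pattern):
    """Return {path: [matches]} for files whose path matches path_pattern.

    `files` is a hypothetical in-memory mapping of path -> text content.
    """
    path_re = re.compile(path_pattern)
    search_re = re.compile(search_pattern)
    results = {}
    for path, text in files.items():
        # Skip files excluded by the path filter before searching their content.
        if not path_re.search(path):
            continue
        matches = [m.group(0) for m in search_re.finditer(text)]
        if matches:
            results[path] = matches
    return results

# Hypothetical sample data, not taken from the project.
files = {
    'UCaaa/captions/video1.en.vtt': 'Linux is in millions of computers',
    'UCbbb/captions/video2.en.vtt': 'nothing relevant here',
    'UCaaa/requests/1.json': '{"kind": "youtube#videoListResponse"}',
}
results = filter_and_search(files, r'^UCaaa/.*\.vtt$', r'millions? of \w+')
```

Applying the path filter first means that files outside the requested scope are never searched.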
d3c87d3b6f
Use with open(filePath) as f: instead of f = open(filePath) in search.py 2023-02-24 15:15:42 +01:00
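For illustration, a minimal sketch of the pattern this commit switches to. The helper `read_text` and the temporary file are hypothetical and not taken from search.py.

```python
import os
import tempfile

def read_text(file_path):
    # The context manager closes the file when the block exits,
    # even if read() raises, unlike a bare `f = open(file_path)`.
    with open(file_path) as f:
        return f.read()

# Hypothetical demo file standing in for a captions file.
with tempfile.NamedTemporaryFile('w', suffix='.vtt', delete=False) as tmp:
    tmp.write('Linux is in millions of computers\n')

content = read_text(tmp.name)
os.remove(tmp.name)
```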
58f25a114e
#44: Enable end-users to filter path for searches 2023-02-24 15:12:07 +01:00
3bba97e90c
Make search.py search across displayed captions.
Otherwise `Linux, is in millions of computers` doesn't match the non-automatically generated caption of [`o8NPllzkFhE`](https://www.youtube.com/watch?v=o8NPllzkFhE). Not to be confused with the search across captions that already worked, for instance with `is in millions of computers, it`.
2023-02-24 14:46:00 +01:00
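One plausible reading of this commit, sketched under assumptions: a phrase that spans two consecutive caption cues only matches once the cues are joined before searching. The function names and cue texts below are hypothetical and are not the code of search.py.

```python
import re

def search_per_cue(cues, needle):
    # Naive approach: the needle must fit entirely inside one cue.
    return any(needle in cue for cue in cues)

def search_across_cues(cues, needle):
    # Join consecutive cues and collapse whitespace, so that neither cue
    # boundaries nor line wrapping inside a cue prevent a match.
    joined = re.sub(r'\s+', ' ', ' '.join(cue.strip() for cue in cues))
    return needle in joined

# Hypothetical cue texts; the real caption content is not reproduced here.
cues = ['and Linux,', 'is in millions\nof computers']
needle = 'Linux, is in millions of computers'

per_cue = search_per_cue(cues, needle)
across = search_across_cues(cues, needle)
```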
9d433ba2f3
Remove unused setFromVector function 2023-02-23 23:50:07 +01:00
572f9e121f
Specify in README.md in which folder each command has to be run 2023-02-23 23:48:40 +01:00
02c3d74eb6
Add support for channelsToTreat to be empty
It's the case when providing a single channel in `channels.txt` for
instance.
2023-02-23 23:45:36 +01:00
95e96d08e1
Advertise pip instead of apt in README.md to install the latest version of yt-dlp 2023-02-23 23:16:36 +01:00
df3af2780b
#19: Detail how to run the website and reference channels.txt on it 2023-02-23 23:12:18 +01:00
68cd27c263
Fix #19: Improve documentation and code comments 2023-02-23 22:50:30 +01:00
f44ee4b3c1
#44: Allow arbitrary end-user requests 2023-02-22 17:48:24 +01:00
7626c7bad1
Fix #51: Recently the algorithm seems not to finish treating the starting set of channels before treating discovered channels
I verified that this commit solves the issue by treating only `CHANNELS` tab of the channels in `channels.txt`.
2023-02-22 04:09:35 +01:00
76e41f2f7b
#48: Stop relying on echo, tee and /dev/null for redirecting compression command to debug/ 2023-02-22 03:47:06 +01:00
8d4f31d106
#48: Redirect compression command echo to /dev/null 2023-02-22 03:37:07 +01:00
c0db4eb437
Fix #48: Redirect compression execution logs so that they don't overlap PRINTs 2023-02-22 03:27:49 +01:00
e23a0fc4c7
#48: Modify removeChannelsBeingTreated.py to temporarily solve the issue 2023-02-19 02:04:28 +01:00
e5dd476490
#35: Make the non-automatically generated captions download correctly 2023-02-17 16:57:11 +01:00
fd75d27b99
Change the EXIT_WITH_ERROR to PRINT for channels not having an enumerable uploads playlist 2023-02-16 12:21:28 +01:00
cb7b68342a
Make the first channel of channels.txt be treated again, solve the temporary empty response from YouTube Data API v3 issue and temporarily remove the sanity check that fails very rarely #39 2023-02-14 23:15:07 +01:00
e166fdb4e5
Fix #31: List all occurrences of search within video captions 2023-02-14 02:56:11 +01:00
8d34cf33ae
Fix #31: Make a website with a search engine notably based on the captions extracted 2023-02-14 02:00:23 +01:00
09f7675bf7
#31: Make search within captions not limited by line wrapping 2023-02-14 01:32:36 +01:00
4449d488c9
Fix #38: Add a loading message with progress on end-user interface 2023-02-14 01:08:05 +01:00
34c0d03587
#31: Add initial support for searching only captions 2023-02-14 00:59:37 +01:00
8d20327b67
Add .gitignore to ignore {keys, channels}.txt 2023-02-13 06:18:42 +01:00
09c2a8eafe
Make the COMMUNITY tab process not infinitely loop
Related to https://github.com/Benjamin-Loison/YouTube-operational-API/issues/49
2023-02-13 06:17:23 +01:00
9cd7c57e2f
Add link to channels/ to index.php 2023-02-13 05:55:44 +01:00
94ad823e3b
Modify website to support new sub-folders architecture 2023-02-13 05:45:08 +01:00
454503271e
Fix #37: Use a number of channels seen (possibly repeated) instead of YouTube Data API v3 Comment(Thread): resource 2023-02-12 16:31:27 +01:00
54fe40e588
Add logging to exec and make it crashless, requests and captions folders support for compressing, clean captions support for videos being livestreams and videos starting with - 2023-02-12 16:24:16 +01:00
8cf5698051
Move YouTube API requests logging to requests/ channel sub-folder 2023-02-10 20:17:49 +01:00
04c59eb025
Fix #13: Add captions extraction
I was about to commit in addition:

```c++
// Videos with automatically generated captions that are set to `Off` by default aren't retrieved with `--sub-langs '.*orig'`.
// My workaround is to first call YouTube Data API v3 Captions: list endpoint with `part=snippet` and retrieve the language that has `"trackKind": "asr"` (automatic speech recognition) in `snippet`.
/*json data = getJson(threadId, "captions?part=snippet&videoId=" + videoId, true, channelToTreat),
     items = data["items"];
for(const auto& item : items)
{
    json snippet = item["snippet"];
    if(snippet["trackKind"] == "asr")
    {
        string language = snippet["language"];
        cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '" + language + "-orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
        exec(threadId, cmd);
        // As there should be a single automatic speech recognized track, there is no need to go through all tracks.
        break;
    }
}*/
```

Instead of:

```c++
cmd = cmdCommonPrefix + "--write-auto-subs --sub-langs '.*orig' --sub-format ttml --convert-subs vtt" + cmdCommonPostfix;
exec(threadId, cmd);
```

But I realized that I was wrong, as was the GitHub comment I was about to add to https://github.com/yt-dlp/yt-dlp/issues/2655:

> `yt-dlp --cookies cookies.txt --sub-langs 'en.*,.*orig' --write-auto-subs https://www.youtube.com/watch?v=tQqDBySHYlc` work as expected. Many thanks again.
>
> ```
> 'subtitleslangs': ['en.*','.*orig'],
> 'writeautomaticsub': True,
> ```
>
> Work as expected too. Thank you
>
> Very sorry for the video sample. I even not watched it.

Thank you for this workaround. However note that videos having automatically generated subtitles but being set to `Off` by default aren't retrieved with your method (example of such video: [`mozyXsZJnQ4`](https://www.youtube.com/watch?v=mozyXsZJnQ4)). My workaround is to first call [YouTube Data API v3](https://developers.google.com/youtube/v3) [Captions: list](https://developers.google.com/youtube/v3/docs/captions/list) endpoint with [`part=snippet`](https://developers.google.com/youtube/v3/docs/captions/list#part) and retrieve the [`language`](https://developers.google.com/youtube/v3/docs/captions#snippet.language) that has [`"trackKind": "asr"`](https://developers.google.com/youtube/v3/docs/captions#snippet.trackKind) (automatic speech recognition) in [`snippet`](https://developers.google.com/youtube/v3/docs/captions#snippet).
2023-02-10 20:03:08 +01:00
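The workaround described in this commit message (which, per the message itself, was not the code finally committed) can be sketched in Python as follows. The response below is a hand-written sample shaped like a YouTube Data API v3 Captions: list result with `part=snippet`. The helper names and the sample values are hypothetical, and no network request is made.

```python
def find_asr_language(captions_response):
    """Return the language of the first track whose trackKind is "asr", else None."""
    for item in captions_response.get('items', []):
        snippet = item.get('snippet', {})
        if snippet.get('trackKind') == 'asr':
            # There should be a single automatic speech recognition track,
            # so stop at the first one found.
            return snippet.get('language')
    return None

def build_sub_langs(language):
    # Mirrors the `--sub-langs '<language>-orig'` value built in the C++ snippet above.
    return f'{language}-orig'

# Hand-written sample response; real responses contain more fields.
response = {
    'items': [
        {'snippet': {'trackKind': 'standard', 'language': 'fr'}},
        {'snippet': {'trackKind': 'asr', 'language': 'en'}},
    ]
}

language = find_asr_language(response)
sub_langs = build_sub_langs(language) if language is not None else None
```

When no `asr` track is listed, `find_asr_language` returns `None`, and the caller can skip the automatic captions download for that video.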
9b792015fa
Fix #36: Make the program stop by crashing when the YouTube operational API instance is detected as sending unusual traffic 2023-02-10 12:02:39 +01:00
4508b12b6c
Correct the termination of COMMUNITY tab process due to missing page tokens 2023-02-10 00:37:28 +01:00
ea604cce40
Remove the Content-Type: application/json HTTP header when retrieving urls.txt inside a .zip 2023-02-09 02:07:10 +01:00
01aac3f66e
Add a verification that snippet/authorChannelId/value isn't null when using commentThreads for COMMUNITY
As it can happen, cf. https://www.youtube.com/channel/UCWeg2Pkate69NFdBeuRFTAw/community?lc=UgwGfjNxGuwqP8qYPPN4AaABAg&lb=UgkxYiEAo9-b1vWPasxFy13f959rrctQpZwW
2023-02-09 01:51:22 +01:00
d0dee043c6
Append to channels.txt all channels mentioned in the Wiki 2023-02-08 16:28:44 +01:00
d5c55e756a
Add in urls.txt if the URL is related to YouTube Data API v3 or YouTube operational API 2023-02-08 16:05:03 +01:00
50306aff5d
Fix #34: Correct JSON files by putting first line in another metadata file 2023-02-07 23:08:09 +01:00
4fa433495b
Restore ability to download whole archives
As API keys aren't written in the first line of JSON files.
2023-02-07 23:01:26 +01:00
a0ba474fcc
Remove the ability in channels.php to download the whole archive, to avoid leaking the API keys used 2023-02-07 22:42:24 +01:00
b4cb072770
Add channels.php adding support for (file in) zip download 2023-02-07 22:39:43 +01:00
fda8fc728e
#31: Add zip files search 2023-02-07 20:15:36 +01:00
82e597f205
Comment the WebSocket mechanism to work with an arbitrary number of independent sends 2023-02-07 18:14:49 +01:00
03c2566a20
Make WebSocket able to manage arbitrary feedback to end-user
While the previous implementation was able to send two independent messages, it can now send an arbitrary number of independent messages.
2023-02-07 17:25:17 +01:00
a116b29df9
Make websockets.php able to process blocking treatments 2023-02-07 01:22:26 +01:00
1fe92ec2d0
Make a WebSocket example work with crawler.yt.lemnoslife.com 2023-01-31 01:05:09 +01:00