# The algorithm: To retrieve the most YouTube video ids in order to retrieve the most video captions, we need to retrieve the most YouTube channels. So to discover the YouTube channels graph with a breadth-first search, we proceed as follows: 1. Provide a starting set of channels. 2. Given a channel, retrieve other channels thanks to its content by using [YouTube Data API v3](https://developers.google.com/youtube/v3) and [YouTube operational API](https://github.com/Benjamin-Loison/YouTube-operational-API) and then repeat 1. for each retrieved channel. A ready to be used by the end-user website instance of this project is hosted at: https://crawler.yt.lemnoslife.com See more details on [the Wiki](https://gitea.lemnoslife.com/Benjamin_Loison/YouTube_captions_search_engine/wiki). # Running the algorithm: Because of [the current compression mechanism](https://gitea.lemnoslife.com/Benjamin_Loison/YouTube_captions_search_engine/issues/30), Linux is the only known OS able to run this algorithm. ```sh sudo apt install nlohmann-json3-dev yt-dlp make ./youtubeCaptionsSearchEngine -h ``` If you plan to use the front-end website, also run: ```sh pip install webvtt-py ``` Except if you provide the argument `--youtube-operational-api-instance-url https://yt.lemnoslife.com`, you have [to host your own instance of the YouTube operational API](https://github.com/Benjamin-Loison/YouTube-operational-API/#install-your-own-instance-of-the-api). Except if you provide the argument `--no-keys`, you have to provide at least one [YouTube Data API v3 key](https://developers.google.com/youtube/v3/getting-started) in `keys.txt`. ```sh ./youtubeCaptionsSearchEngine ```