Move from Firefox to Chromium to be able to retrieve download URL

Thanks to `chrome://downloads`.
Benjamin Loison 2023-02-04 19:08:02 +01:00
parent 73a8d77a32
commit 4f7e9ac336
Signed by: Benjamin_Loison
SSH Key Fingerprint: SHA256:BtnEgYTlHdOg1u+RmYcDE0mnfz1rhv5dSbQ2gyxW8B8
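
A minimal sketch of the title filtering described in the updated comment of this diff: after filtering by `Track title`, keep only the entries whose title is exactly the one we are looking for, since the UI filter only checks `contains` and some tracks share a title. It assumes each entry is a dict with a `title` key; the sample entries, the `trackId` values and both function names are hypothetical and are not taken from `music.json`.

```python
from collections import defaultdict


def exact_title_matches(entries, wanted_title):
    """Keep entries whose title equals wanted_title, not merely contains it."""
    return [entry for entry in entries if entry["title"] == wanted_title]


def group_by_title(entries):
    """Map each exact title to the list of entries carrying it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["title"]].append(entry)
    return dict(groups)


# Hypothetical sample data (assumed layout, not the real metadata).
sample = [
    {"title": "Ringside", "trackId": "a1"},
    {"title": "Ringside", "trackId": "b2"},
    {"title": "Ringside Remix", "trackId": "c3"},
]

# Both entries titled exactly "Ringside" are kept; "Ringside Remix" is dropped.
matches = exact_title_matches(sample, "Ringside")
```

`group_by_title` can be used beforehand to spot which titles are ambiguous, i.e. which ones map to more than one entry.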

@@ -1,26 +1,23 @@
-from selenium import webdriver
+import undetected_chromedriver.v2 as uc
 from selenium.webdriver.common.by import By
-from selenium.webdriver.firefox.options import Options
+from selenium.webdriver.chrome.options import Options
 import json
 """
 As there is something looking as an anti-bot for downloading media files, we use a Selenium-based approach.
 """
-profile_path = '/home/benjamin/.mozilla/firefox/ilfnifi0.default-release'
-fp = webdriver.FirefoxProfile(profile_path)
-
-browser = webdriver.Firefox(fp)
+options = Options()
+options.add_argument("--user-data-dir=selenium")
+browser = uc.Chrome(options=options)
 browser.get('https://studio.youtube.com/channel/UC/music')
 """
 For `Music` tab, YouTube UI returns 3,000 entries while my reverse-engineering approach returns 5,819 entries.
 For `Sound effects` tab, YouTube UI returns 400 entries while my reverse-engineering approach returns 2021 entries.
-So I assume YouTube UI pagination doesn't work fine, so to retrieve all media files (for `Music`), the idea is to filter by `Track title` and download one entry that perfectly (not just `contains`) matches `artist/name`, `title` and `duration/nanos` (converted if only `seconds`), as some tracks have the same titles.
-Only `trackId` and `viperId` differ when identifying with `artist/name`, `title` and `duration/nanos` (cf above comment) (example: `Dyalla_Ringside_116`), as I verified all duplicates, they are binary identical.
-So we will have to duplicate the media file with the different `trackId`s for files being *identitcal* (note that `trackId`, as well as `viperId` are uniquely identified).
-Otherwise I could clean the metadata by removing duplicates (but then if we update the database we have to make sure that ids that we have kept are still kept).
+So I assume YouTube UI pagination doesn't work fine, so to retrieve all media files, the idea is to filter by `Track title` and download all entries, preferably only those that have the title we are looking for, as some tracks have the same titles.
+As for `Sound effects`, even with `Sound effect`, `Duration`, `Category` and `Added` there is an ambiguity on which files do we refer to (for instance for `Truck Driving in Parking Structure`, as they all are different).
 """
 with open('music.json') as json_file: