Move from Firefox to Chromium to be able to retrieve download URL
Thanks to `chrome://downloads`.
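For context, one way to read download URLs back from `chrome://downloads` through Selenium might look like the sketch below. The shadow DOM selectors (`downloads-manager`, `#downloadsList`) and the `url` field on each item are assumptions about Chromium's internal downloads page, which is not a stable API and may differ between browser versions. `list_download_urls` is a hypothetical helper and is not part of this commit.

```python
# Hypothetical helper. The selectors below are assumptions about Chromium's
# internal downloads page and may break between browser versions.
DOWNLOADS_SCRIPT = """
const manager = document.querySelector('downloads-manager');
if (!manager || !manager.shadowRoot) { return []; }
const list = manager.shadowRoot.querySelector('#downloadsList');
if (!list || !list.items) { return []; }
return list.items.map(item => item.url);
"""


def list_download_urls(browser):
    """Return the source URLs listed on chrome://downloads (may be empty)."""
    browser.get('chrome://downloads')
    urls = browser.execute_script(DOWNLOADS_SCRIPT)
    # Drop missing entries in case an item exposes no URL.
    return [url for url in (urls or []) if url]
```

With the `browser` object created in the diff below, this would be called as `list_download_urls(browser)` once a download has started.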
@@ -1,26 +1,23 @@
-from selenium import webdriver
+import undetected_chromedriver.v2 as uc
 from selenium.webdriver.common.by import By
-from selenium.webdriver.firefox.options import Options
+from selenium.webdriver.chrome.options import Options
 import json
 
 """
 As there is something looking as an anti-bot for downloading media files, we use a Selenium-based approach.
 """
 
-profile_path = '/home/benjamin/.mozilla/firefox/ilfnifi0.default-release'
-fp = webdriver.FirefoxProfile(profile_path)
-
-browser = webdriver.Firefox(fp)
+options = Options()
+options.add_argument("--user-data-dir=selenium")
+browser = uc.Chrome(options=options)
 browser.get('https://studio.youtube.com/channel/UC/music')
 
 """
 For `Music` tab, YouTube UI returns 3,000 entries while my reverse-engineering approach returns 5,819 entries.
 For `Sound effects` tab, YouTube UI returns 400 entries while my reverse-engineering approach returns 2021 entries.
 
-So I assume YouTube UI pagination doesn't work fine, so to retrieve all media files (for `Music`), the idea is to filter by `Track title` and download one entry that perfectly (not just `contains`) matches `artist/name`, `title` and `duration/nanos` (converted if only `seconds`), as some tracks have the same titles.
-Only `trackId` and `viperId` differ when identifying with `artist/name`, `title` and `duration/nanos` (cf above comment) (example: `Dyalla_Ringside_116`), as I verified all duplicates, they are binary identical.
-So we will have to duplicate the media file with the different `trackId`s for files being *identitcal* (note that `trackId`, as well as `viperId` are uniquely identified).
-Otherwise I could clean the metadata by removing duplicates (but then if we update the database we have to make sure that ids that we have kept are still kept).
+So I assume YouTube UI pagination doesn't work fine, so to retrieve all media files, the idea is to filter by `Track title` and download all entries, preferably only those that have the title we are looking for, as some tracks have the same titles.
+As for `Sound effects`, even with `Sound effect`, `Duration`, `Category` and `Added` there is an ambiguity on which files do we refer to (for instance for `Truck Driving in Parking Structure`, as they all are different).
 """
 
 with open('music.json') as json_file:
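The matching idea described in the comment removed above (identify a track by `artist/name`, `title` and `duration/nanos`, then find entries that differ only by `trackId`) can be sketched as follows. The nested layout of each entry (`entry['artist']['name']`, `entry['duration']['seconds']` and so on) is an assumption, since the structure of `music.json` is not shown in this diff. Treating `seconds` and `nanos` as additive is also an assumption. `track_key` and `group_duplicates` are hypothetical names.

```python
from collections import defaultdict

NANOS_PER_SECOND = 1_000_000_000


def track_key(entry):
    """Build an exact-match key from artist name, title and duration."""
    duration = entry.get('duration', {})
    # Assumption: `seconds` and `nanos` are additive, so both are folded
    # into one total in nanoseconds; a missing field counts as zero.
    total_nanos = (int(duration.get('seconds', 0)) * NANOS_PER_SECOND
                   + int(duration.get('nanos', 0)))
    return (entry['artist']['name'], entry['title'], total_nanos)


def group_duplicates(entries):
    """Map each key shared by several entries to the list of their `trackId`s."""
    groups = defaultdict(list)
    for entry in entries:
        groups[track_key(entry)].append(entry['trackId'])
    # Keep only keys that identify more than one entry.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Each returned group lists the `trackId`s that would share a single downloaded media file, for example the entries behind `Dyalla_Ringside_116`.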