Full text to avoid paywall
If you’ve left a comment on a YouTube video, a new website claims it might be able to find every comment you’ve ever left on any video you’ve ever watched. Then an AI can build a profile of the commenter and guess where you live, what languages you speak, and what your politics might be.
The service is called YouTube-Tools and is just the latest in a suite of web-based tools that started life as a site to investigate League of Legends usernames. Now it uses a modified large language model created by the company Mistral to generate a background report on YouTube commenters based on their conversations. Its developer claims it’s meant to be used by the cops, but anyone can sign up. It costs about $20 a month to use and all you need to get started is a credit card and an email address.
The tool presents a significant privacy risk, and shows that people may not be as anonymous in the YouTube comments sections as they may think. The site’s report is ready in seconds and provides enough data for an AI to flag identifying details about a commenter. The tool could be a boon for harassers attempting to build profiles of their targets, and 404 Media has seen evidence that harassment-focused communities have used the developer’s other tools.
YouTube-Tools also appears to be a violation of YouTube’s privacy policies, and raises questions about what YouTube is doing to stop the scraping and repurposing of people’s data like this. “Public search engines may scrape data only in accordance with YouTube’s robots.txt file or with YouTube’s prior written permission,” the policy says.
To test the service, I plugged a random YouTube commenter into the system and within seconds the site found dozens of comments on multiple videos and produced an AI-generated paragraph about them. “Possible Location/Region: The presence of Italian language comments and references to ‘X Factor Italia’ and Italian cooking suggest an association with Italy,” the report said.
“Political/Social/Cultural Views: Some comments reflect a level of criticism towards interviewers and societal norms (e.g., comments on masculinity), indicating an engagement with contemporary cultural discussions. However, there is no overtly political stance expressed,” it continued.
According to the site, it has access to “1.4 billion users & 20 billion comments.” The dataset is not complete; YouTube has more than 2.5 billion users.
YouTube-Tools launched about a week ago and is an outgrowth of LoL-Archiver. There’s also nHentai-Archiver, which can give you a comprehensive comment history of a user on the popular adult manga sharing site. Kick-Tools can produce the chat history or ban history of a user on the streaming site Kick. Twitch-Tools can give you the chat history for an account sorted by timestamp and sortable by all the channels they interact on.
Twitch-Tools only monitors a channel that users have specifically requested it to monitor. As of this writing, the website says it is monitoring 39,057 Twitch channels. For example, I was able to pull a username from a popular Twitch stream, plug it into the tool and then track every time that user had made a comment on another one of the tracked channels.
Reached for comment, the developer of these tools didn’t dance around the reason they built them. “The end goal of people tracking Twitch channels would certainly be to gather information on specific users,” they said.
Twitch did not respond to 404 Media’s request for comment, and YouTube acknowledged a request but did not provide a statement in time for publication. But I spoke with someone in control of a contact email address listed on the LoL-Archiver’s “about” page. They said they’re based in Europe, have a background in OSINT, and often partnered with law enforcement in their country. “I decided I launched [sic] these tools in the first place as a project to build the tool that could be use by LEAs [law enforcement agencies] and PIs [private investigators.]”
According to the developer, they’ve provided the tool to cops in Portugal, Belgium, and “other countries in Europe.” They told 404 Media that the website is meant for private investigators, journalists, and cops.
“To prevent abuses [sic] we only allow the website to people with legitimate purposes,” they said. I asked how the site vets users. “We ask the users to accept our Terms of Use and do targeted KYC [know your customer] requests to people we estimate have an illegitimate reason to use our website. If we find that a user doesn’t have a legitimate purpose to use our service according to our terms of use, we reserve the right to terminate that user’s access to our website.”
The site’s Terms of Service makes this explicit in the first paragraph. “The Service is distributed only to licensed professional investigators and law enforcement. Non-professional individuals are not allowed to subscribe to the Service,” it says.
But YouTube-Tools is a “grant access first, ask for proof later” kind of website. 404 Media was able to set up an account and begin browsing information within minutes of paying for a month of the service with a credit card. It didn’t ask me any questions about how I planned to use the service, nor did it need any other information about me.
I asked the developer for an example of a time they had removed someone from the platform. They said they’d removed a client a few weeks ago after they realized the email the client used to obtain their license was “temporary.” The developer said they reached out to the client to ask why they wanted the tool and didn’t get a response. “They ignored us, and we therefore reported the issue to Stripe and terminated their access.”
The AI summaries are new and only exist for the YouTube tools. “The AI summary is to provide points of interest, so that an investigator doesn’t have to go through the (potentially) thousand [sic] of comments,” the developer said. “This summary is not to replace the research and investigation process of the investigator, but to give clues on where they can start looking at first.”
I asked them about the possible privacy violations the tool presents and the developer acknowledged that they’re real. “But we try to limit them during [our] vetting process,” they said. Again, I was able to sign up for the site with a credit card and an email. I was not vetted.
“I also believe that the tool can be a very valuable source of information for professionals such as police agencies, private investigators, journalists,” the developer said. “That is why we currently offer free access to police agencies requesting it, and have offered [it] to several agencies already. If someone wants to remove any information that the tools has archived they can make a formal request to us, to which we will comply, as we’ve always done.”
Scraping public data is a big problem. Last month, researchers in Brazil published a dataset built from 2 billion Discord messages they’d pulled from publicly available servers. Last year, Discord shut down a service called Spy Pet that’s similar to YouTube-Tools.
- Developers like this should be considered collaborators with fascist elements.
- Why? They are just bringing to light the tools already being used by corps behind closed doors.
- Edit: Seems the author wants to paint a different picture. Either extreme CYA or you were correct.
- Anyone who builds tools explicitly for mass surveillance of the public is a collaborator, no matter who writes their paycheck.
- And don’t let the narrative be “MegaCorp built these tools.” No, human beings with names and ostensibly consciences built these tools for MegaCorp. They are just as guilty if not more so.
 
 
- I like to throw out a random y’all just to throw off scrapers like this. In my mind, they’ll either think I’m in the southern US or just get confused.
- Brilliant! I should start writing and speaking like a Brit.
- You should go on fetlife. Whole bunch of kinky motherfuckers in the science field doing research in Antarctica.
 
- to Predict Where Users Live
- sign up
- no proper examples
- made for cops, journalists and PIs
- Translation: Unless you’ve revealed a bunch of personal information, it won’t “predict” where you live.
- As someone who only speaks English as a third language after Latin and Mandarin living in Anchorage I tend to agree with you.
- Except you don’t need personal information. Look at supermarket data: even when ‘anonymized’, it can predict pregnancy and even a rough geographic location, based on what items are available at various locations and times, as well as by correlating your purchases with the weather.
 
- deleted by creator
- This is exactly what I did on Reddit.
 
- YouTube-Tools also appears to be a violation of YouTube’s privacy policies, and raises questions about what YouTube is doing to stop the scraping and repurposing of peoples’ data like this. “Public search engines may scrape data only in accordance with YouTube’s robots.txt file or with YouTube’s prior written permission,” it says.
- Yeah right Google, scrape people at will, scrape artists’ copyrighted material without approbation or repercussions, and you dare to tell us to obey your little concerns…
- Fuck off will ya
- Pretty fucking awful 
- This is something an LLM can do really well, really easily, with little engineering effort.
- The cat is out of the bag. Most of us have a bunch of unstructured comment data that is peppered with references to local weather, policies, sports, etc. Determining what city or what district you live in is, sadly, not hard in this day and age.
- This only requires a quick integration and some prompt engineering. Assume people are already doing it.
- We need a tool to post random comments on random YT videos.
- So… bots? I’m pretty sure we already have those.
- That is even better: become a bot account.
 
 
- deleted by creator
- Seeing how this article exists, you’re already too late.
- I was with ya until that last bit 
 
- “This summary is not to replace the research and investigation process of the investigator, but to give clues on where they can start looking at first.”
- I bet most of them live in the cloud with this AI. They’re probably owned or financed by the same company, because if you generate fake data then who cares these days as long as you paid for it.





