"I'm Pomu"Pomu Rainpuff

Archiving and You

Compressed jpg

wtf is happening
Joined:  Dec 21, 2022
Text (Mostly Tweet) Archiving and You

Kourai

Not pictured: the will to live
Early Adopter
Archivist
Joined:  Sep 16, 2022

Text (Mostly Tweet) Archiving and You​

Why?​

People delete things. To discuss news and drama accurately, it's important to have a record of who said what.

How?​

Archive sites:​

Archive sites take a snapshot of a web page as it was on the day you accessed it. Unlike screenshots, they are not easily faked.

I know of three archive sites. There may be others. Unfortunately all three sites have their own quirks when it comes to tweets, which are the majority of what should be archived.
  • archive.today - Paste URL in. Press Save. The Firefox add-on in the post above reduces this to one button. When the page changes to archive.ph/wip/[characters], copy that URL and put it in your post. You do not need to wait for it to finish.
    • Only preserves embedded images as thumbnails
    • Does not preserve replies to a tweet (currently)
  • ghostarchive.org - Paste URL in. Press Submit for Archival. When the page changes to ghostarchive.org/archive/[characters], copy that URL and put it in your post. You do not need to wait for it to finish.
    • Preserves replies to a tweet (currently)
    • Seems slower than archive.today
    • Seems to intend to preserve full embedded images in tweets, but gets stuck on loading
  • archive.org - Click on Web. Paste URL into Save Page Now and press Save.
    • Has a process for takedown requests, so not the first choice. If you're paranoid that a page which has been deleted is only on archive.org, you can re-archive it using archive.today.
    • Useless for new tweets when I tested it
    • Very useful for edge cases like video, audio, Google Docs that require you to be signed in
Important embedded media, like images of ominous white pages with text, should be saved separately if possible.

All three sites can be searched for existing archives as well, provided you have a URL.
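To make the "search for existing archives" step concrete, here is a minimal sketch that builds lookup links for one page on each of the three sites. The URL patterns (`/newest/`, `/web/`, `/search?term=`) are assumptions based on how these sites have behaved, not something stated in this thread, and the helper names are made up for illustration. It only builds strings; it makes no network requests.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop the query string and fragment (e.g. ?s=20&t=...) from a tweet URL.

    Only safe for pages like tweets, where the query string is not part of
    the page identity.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def archive_lookup_urls(url: str) -> dict[str, str]:
    """Return one lookup link per archive site (URL patterns are assumptions)."""
    clean = strip_tracking(url)
    return {
        # Assumed pattern: newest existing snapshot on archive.today
        "archive.today": "https://archive.ph/newest/" + clean,
        # Assumed pattern: redirects to the latest Wayback Machine capture
        "archive.org": "https://web.archive.org/web/" + clean,
        # Assumed pattern: Ghostarchive search by URL
        "ghostarchive.org": "https://ghostarchive.org/search?term=" + quote(clean, safe=""),
    }
```

Calling `archive_lookup_urls("https://twitter.com/example/status/123?s=20")` returns a dict with three links you can open in a browser to check whether someone has already archived the tweet.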

One weird trick (the TVA embed method):​

If you have a lot of tweets you want to archive, paste them all into a post on this forum and press Post. Then use the Archive Page button on the top right of the page. This opens a new archive.today tab. Edit that archive URL into your post. The embedded tweets will still be visible, even if the original tweets are deleted. However, anything in spoiler tags is not archived.
1701433807923.png

Screenshots:​

Archive sites are an independent record. Screenshots are not independent; they can be faked. But sometimes you have no choice (e.g. site that requires login), and in those cases screenshots are better than nothing.

Edge: ... (Menu) > Web Capture, or Ctrl+Shift+S - Capture Full Page is an option
Firefox: Right-click > Take Screenshot - Save Full Page is an option (but see the post below)
Chrome/Brave/etc.: GoFullPage add-on

I have not looked into mobile browser options. If nothing else, you can screenshot your phone screen and crop it down as needed.
 

SZ 109

Guest
Joined:  Nov 13, 2022
Keep in mind that with Firefox's Right-click > Take Screenshot > Save Full Page method, websites can not only detect that you've done this, but also track you to that particular installation of Firefox.
Demo link
In short, it's because that screenshot feature is implemented as a WebExtension that lives on a unique moz-extension:// URL generated at install time, and that URL is leaked via the Origin header when the extension interacts with the outside world.
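As a rough illustration of why this leak identifies an installation, here is a minimal sketch of the check a server could run, assuming the leak works as described above. The function name, the header dict, and the example UUID are hypothetical.

```python
from typing import Optional


def extension_origin(headers: dict) -> Optional[str]:
    """Return the moz-extension:// origin if a request carries one, else None.

    Assumption: the request's Origin header contains the extension's
    per-install URL, as described in the post above.
    """
    for name, value in headers.items():
        if name.lower() == "origin" and value.startswith("moz-extension://"):
            return value
    return None
```

Because the value after `moz-extension://` is generated once per install, seeing the same value twice would mean the same Firefox installation made both requests.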
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023


Do you have any experience or opinion on which one of these is the most reliable? I feel like it's a crapshoot as to whether these services work, or shit the bed because their X accounts are rate limited or banned.
 

Kourai

Not pictured: the will to live
Early Adopter
Archivist
Joined:  Sep 16, 2022
I usually start with archive.today. I use Ghostarchive or the TVA embed method if there are important replies or multiple tweets. My experience may be different because of time zones and how many people are hitting the services and/or Twitter when I'm online. The TVA embed method may actually be the most resilient.
 
Mass Downloading Videos and Images from Xitter

Porean

Lavender Spider Lover & Tsunderia Scholar
Early Adopter
Joined:  Sep 16, 2022
I've been searching for a way to mass download videos and images from Twitter for a while, and I have now found a solution;
Now, you might click that and say to yourself; "Porean is trying to give me a virus and make me pay to remove it via Google store gift cards."
To that i say; "fuck you bloody banchod bitch show bob and vagene"

But for real, it might look scummy but it works. As a man with an oshi who constantly posts drawings (and I HAVE TO HAVE THEM ALL), it's been very useful. Not only does it let you download an entire Twitter account's media tab, you can also use it on a specific hashtag, which is useful for art tags you want to archive. You have to give the programme your Twitter cookies for it to work properly; if you don't know how, the programme itself tells you.

I will attach an image here to show you my folder after I've downloaded Amiya's media tab and her fan-art hashtag;
folder example.PNG
As you can see, it sorts by username if you tell it to, which I love, so I very much recommend this if you have a lot of images you want to download from VTubers you follow. It is slow if it has to search through many posts (Ami posts a LOT). It also works on other sites; however, I have not used it on them, so I cannot speak for its efficacy there.
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023
gallery-dl seems to do the exact same thing from the command line, and it doesn't look and feel like computer AIDS. It's also maintained more actively, which means it has a better chance of working whenever the Twitter API gets fucked with - a problem I had when I tried out wfdownloader a few months ago.
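For anyone who wants to script this, here is a minimal sketch of a gallery-dl call that grabs an account's media tab. It assumes gallery-dl is installed and on your PATH and that you have exported a cookies.txt; the username, file, and folder names are placeholders, and the function names are made up for illustration.

```python
import subprocess


def gallery_dl_command(target_url: str, cookies_file: str, dest_dir: str) -> list:
    """Build the argument list for one gallery-dl run."""
    return [
        "gallery-dl",
        "--cookies", cookies_file,   # Netscape-format cookies.txt from a logged-in browser
        "--destination", dest_dir,   # base folder the downloads are sorted into
        target_url,
    ]


def run(cmd: list) -> None:
    """Run the command and raise if gallery-dl exits with an error."""
    subprocess.run(cmd, check=True)


# Example (placeholder account name):
# run(gallery_dl_command("https://twitter.com/USERNAME/media", "cookies.txt", "downloads"))
```

Passing the arguments as a list avoids any quoting problems with spaces in folder names.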
 

The Proctor

Manager Arc Unlocked?
Staff member
Lovebug Proctologist
Joined:  Sep 9, 2022
I'm going to update the OP a bit in a little while, since so many people have come forward during the Marina Saga saying 'I know this stuff but didn't record anything.' Going to make it a bit more concise and easy to understand for innocent zoomers who think that Discord texts vanish if they're no longer in the same window.
 

naganon

#1 Hexa Fan
Joined:  Feb 26, 2023
@Short I'm using your streamlink setup to archive Hexa's VODs. Is there any way to bypass the ads with it, instead of just cutting them out? I've got a sub.
Edit: would adding "--twitch-api-header=Authorization=OAuth abcdefghijklmnopqrstuvwxyz0123" after "--twitch-disable-ads" work?
 

Short

God Damn the Sun
Joined:  Apr 4, 2023
It should work
 

naganon

#1 Hexa Fan
Joined:  Feb 26, 2023
Thanks, nigga. You're one of the good ones.
Edit:
cmd-OAVD3y-AWWa.png

It says abcdefghijklmnopqrstuvwxyz0123 is an unrecognized argument.
Edit 2: I tried "--twitch-api-header=Authorization=OAuth abcdefghijklmnopqrstuvwxyz0123" with the whole option in quotes, so the space before the token no longer splits it into a separate argument. That got rid of the unrecognized-arguments error, but I have no clue whether it will actually work until Hexa starts streaming.
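The quoting issue can be reproduced without streamlink at all. This is a small sketch using Python's `shlex`, which follows POSIX shell splitting rules; Windows cmd differs in details, but for a plain space inside double quotes the effect should be the same. The token is the placeholder from the post, not a real one.

```python
import shlex

TOKEN = "abcdefghijklmnopqrstuvwxyz0123"  # placeholder from the post, not a real token

# Without quotes, the space after "OAuth" ends the option and the token
# becomes a stray argument of its own.
unquoted = f"--twitch-disable-ads --twitch-api-header=Authorization=OAuth {TOKEN}"

# With the whole option in quotes, the space stays inside one argument.
quoted = f'--twitch-disable-ads "--twitch-api-header=Authorization=OAuth {TOKEN}"'


def split_args(command_line: str) -> list:
    """Split a command line the way a POSIX shell would."""
    return shlex.split(command_line)
```

`split_args(unquoted)` yields three arguments, with the token on its own, which matches the "unrecognized arguments" error. `split_args(quoted)` yields two, with the header value intact.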
 

Seth

Well-known member
Fubuki's Best Friendo
Joined:  Feb 11, 2023

A simple remark concerning Ghostarchive: yes, you can archive Twitter threads, but when you try to archive using the URL of a reply in that thread, the site simply keeps loading until the archive fails.

I could be totally wrong or just very unlucky, but it has never worked for me.
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023
Every archival service seems iffy for Xitter. I think they all use throwaway accounts that keep getting banned and/or rate limited.
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023
Brothers, i bring you two new(?) tools;

1. I see many complaints about keeping yt-dl and yt-dlp updated, well i have recommended this before but TARTUBE is excellent for this. Besides having a GUI making it useable for those of us with merely Asperger's and not full-blown Autism, it's one click updatable and includes installs for FFMPEG, matplotlib and streamlink, and has a built in clipper via timestamps.

2. Given the talk of subtitling, and since I recently did some of that myself, I have tried this program: WHISPER GUI. It is pretty reliant on having an Nvidia card (10xx series or up), but it transcribes video or audio files and makes a subtitle file from them. I've tried it on both videos and podcasts with multiple speakers and the results have been great, apart from some hang-ups on names and such. Even counting the time you have to spend cleaning that up, it is a huge time-saver.

[EDIT] Here is an example: attached to this post is a .rar with a generated, unedited SRT file made from this clip I made:

It's VERY accurate, and it even timestamps every line of dialogue. I highly recommend you check Whisper GUI out.

An update on this:

WhisperGUI hasn't been updated since its release, and it's still missing a lot of features. On default settings, it also seems to hallucinate and fuck up timings if there are any sections without speech in whatever you're feeding it, and there's no way to adjust this from the interface.

WhisperCPP runs a lot better for me, doesn't require you to fuck with Python, and supports quantized models. Quantized models require less memory, meaning you can run them on lesser hardware. WhisperGUI recommends 10 GB for large-v2, but a quantized large-v2 should do with about 5! It'll also let you fine-tune the settings.
The obvious downside is that you need to run it from the command line, and it only takes .wav files, but you can convert to .wav easily using FFmpeg.


-After downloading it, grab one or more models from here: https://huggingface.co/ggerganov/whisper.cpp. Apparently large-v2 is still the best for English, but large-v3 works better with translation.
-Example command to transcribe a file named output.wav into .srt, using the model "ggml-large-v2-q5_0.bin" placed in the models subfolder of your WhisperCPP folder:
main -f output.wav -m models/ggml-large-v2-q5_0.bin -osrt
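Here is a minimal sketch of the two steps chained together, converting with FFmpeg and then transcribing. It assumes ffmpeg is on your PATH, that you run it from the WhisperCPP folder, and that the binary is called `main` as in the post (newer builds may name it differently). The 16 kHz mono 16-bit settings are what whisper.cpp expects as far as I know; the input file name and function names are placeholders.

```python
import subprocess


def ffmpeg_to_wav(source: str, wav: str = "output.wav") -> list:
    """Build an ffmpeg command that converts any input to 16 kHz mono 16-bit WAV."""
    return ["ffmpeg", "-i", source, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]


def whisper_cpp_srt(wav: str = "output.wav",
                    model: str = "models/ggml-large-v2-q5_0.bin",
                    binary: str = "main") -> list:
    """Build a WhisperCPP command that writes an .srt next to the input file."""
    return [binary, "-m", model, "-f", wav, "-osrt"]


def transcribe(source: str) -> None:
    """Convert, then transcribe; raises if either program exits with an error."""
    subprocess.run(ffmpeg_to_wav(source), check=True)
    subprocess.run(whisper_cpp_srt(), check=True)


# Example (placeholder file name):
# transcribe("clip.mp4")
```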
 

Porean

Lavender Spider Lover & Tsunderia Scholar
Early Adopter
Joined:  Sep 16, 2022
This made me curious whether these new models were an improvement over Whisper GUI, which admittedly is rather outdated at this point (I think it had gone about 180 days without updates when I first recommended it). CPP does indeed give better results, with the exception of sometimes turning background noise into SDH subs (like interpreting the Super Mario Bros. theme as "waterfall noises" lol).
This video was used for testing:

I ran several versions of Whisper at different model sizes; CPP large-v2 gave superior results to all others.

And since the command line killed my dog (LET'S FUCKING GOOOOO) and I have sworn revenge, I found a way to give a GUI to Whisper CPP: Subtitle Edit.
Despite being Danish, it is a good subtitle editor (I prefer Aegisub myself, mostly out of nostalgia). Subtitle Edit has a built-in "generate text from audio" option, letting you download several Whisper forks:

You can either download the models yourself or do it through the programme itself. So for those who are command-line avoidant, such as myself, I recommend Subtitle Edit; being able to download everything through the programme and not having to trawl several GitHub repos is very nice.
Also an interesting note: in my testing the worst version was always CPP cuBLAS. It just really sucked shit each time, despite requiring you to install Nvidia's CUDA.
 

httn

Panko of color
Joined:  Dec 27, 2022
Does anyone have experience using chat-downloader for members streams?

One of the chuubas I watch has now moved their freechat to members only, and I'm finding that the chat recording seems to fail after around 30 minutes and requires a fresh cookies.txt to be exported each time. Checking the cookie files themselves, they shouldn't expire for another 2 weeks. It seems to fail whether or not I set a timeout duration using
Code:
--timeout


I'm wondering if i'm just being a tard with chat-downloaders cli?

example of the command i am running:
Code:
chat_downloader https://www.youtube.com/watch?v=*** --cookies examplecookie.txt --output file.txt
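For checking the expiry dates inside a cookies.txt without reading raw timestamps, here is a minimal sketch. It assumes the usual Netscape format (seven tab-separated fields per line, expiry as a Unix timestamp); the function name is made up. Note that, as this thread shows, a far-off expiry date does not guarantee the site still accepts the cookie.

```python
from datetime import datetime, timezone


def cookie_expiries(cookies_txt: str) -> dict:
    """Map (domain, cookie name) to its expiry time, skipping session cookies."""
    expiries = {}
    for line in cookies_txt.splitlines():
        if line.startswith("#HttpOnly_"):
            # Some exporters mark HttpOnly cookies with this prefix
            line = line[len("#HttpOnly_"):]
        elif not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie line
        domain, _flag, _path, _secure, expires, name, _value = fields
        if expires.isdigit() and int(expires) > 0:
            expiries[(domain, name)] = datetime.fromtimestamp(int(expires), tz=timezone.utc)
    return expiries
```

Usage would be something like `cookie_expiries(open("examplecookie.txt").read())`, then printing each entry to see when it is due to expire.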
 

Porean

Lavender Spider Lover & Tsunderia Scholar
Early Adopter
Joined:  Sep 16, 2022
How did you export the cookies? With an add-on into a Netscape-format cookies.txt, or did you copy the SQLite file?
[edit] i see cookies.txt i am wood retard
@Short
Save us oh chat master
 

Short

God Damn the Sun
Joined:  Apr 4, 2023
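chat-downloader should be able to download members-only chat with no problem. Try creating the cookies.txt file from a browser you never use, so that nothing refreshes the session after you export it.
If you have uBlock Origin, you can also try adding these filters before making a cookie file; judging by the endpoint names, they block YouTube's cookie-rotation requests: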

Code:
||accounts.youtube.com/RotateCookiesPage
||accounts.youtube.com/RotateCookies
 

httn

Panko of color
Joined:  Dec 27, 2022
This seems to have done the trick. Thank you african prince!
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023
Does anyone have a fast and easy way of getting the unmuted Twitch VOD? VOD recovery tools should be able to do it, but the way they check and download the video by segments is incredibly slow, and takes a bit of fiddling.
 