
Archiving and You

Icy

Member
Joined:  Jan 27, 2024

Text (Mostly Tweet) Archiving and You

Why?

People delete things. To discuss news and drama accurately, it's important to have a record of who said what.

How?

Archive sites:

Archive sites take a snapshot of a web page as it was on the day you accessed it. Unlike screenshots, they are not easily faked.

I know of three archive sites. There may be others. Unfortunately, all three have their own quirks when it comes to tweets, which make up the majority of what should be archived.
  • archive.today - Paste URL in. Press Save. The Firefox add-on in the post above reduces this to one button. When the page changes to archive.ph/wip/[characters], copy that URL and put it in your post. You do not need to wait for it to finish.
    • Only preserves embedded images as thumbnails
    • Does not preserve replies to a tweet (currently)
  • ghostarchive.org - Paste URL in. Press Submit for Archival. When the page changes to ghostarchive.org/archive/[characters], copy that URL and put it in your post. You do not need to wait for it to finish.
    • Preserves replies to a tweet (currently)
    • Seems slower than archive.today
    • Seems to intend to preserve full embedded images in tweets, but gets stuck on loading
  • archive.org - Click on Web. Paste URL into Save Page Now and press Save.
    • Has a process for takedown requests, so not the first choice. If you're paranoid that a page which has been deleted is only on archive.org, you can re-archive it using archive.today.
    • Useless for new tweets when I tested it
    • Very useful for edge cases like video, audio, Google Docs that require you to be signed in
Important embedded media, like images of ominous white pages with text, should be saved separately if possible.

All three sites can be searched for existing archives as well, provided you have a URL.
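For checking all three sites at once, here is a minimal Python sketch that builds one lookup URL per archive site from a page URL. The URL patterns (`/newest/` on archive.ph, `search?term=` on ghostarchive.org, `/web/*/` on web.archive.org) are assumptions based on how the sites behave in a browser, not documented guarantees, and the function name is made up for illustration.

```python
from urllib.parse import quote


def archive_lookup_urls(page_url: str) -> dict[str, str]:
    """Return one lookup URL per archive site for the given page URL."""
    return {
        # archive.today: /newest/<url> is assumed to redirect to the latest snapshot.
        "archive.today": "https://archive.ph/newest/" + page_url,
        # ghostarchive.org: search page with a 'term' query parameter (assumed pattern).
        "ghostarchive.org": "https://ghostarchive.org/search?term="
        + quote(page_url, safe=""),
        # archive.org: Wayback Machine calendar view listing all captures of the URL.
        "archive.org": "https://web.archive.org/web/*/" + page_url,
    }
```

Open each returned URL in a browser. If a site has no snapshot, it should say so or offer to create one.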

One weird trick (the TVA embed method):

If you have a lot of tweets you want to archive, paste them all into a post on this forum and press Post. Then use the Archive Page button on the top right of the page. This opens a new archive.today tab. Edit that archive URL into your post. The embedded tweets will still be visible, even if the original tweets are deleted. However, anything in spoiler tags is not archived.

Screenshots:

Archive sites are an independent record. Screenshots are not independent; they can be faked. But sometimes you have no choice (e.g. site that requires login), and in those cases screenshots are better than nothing.

Edge: ... (Menu) > Web Capture, or Ctrl+Shift+S - Capture Full Page is an option
Firefox: Right-click > Take Screenshot - Save Full Page is an option (but see the post below)
Chrome/Brave/etc.: GoFullPage add-on

I have not looked into mobile browser options. If nothing else, you can screenshot your phone screen and crop it down as needed.
What about megalodon? It's basically a Japanese archive.today. It's useless for tweet threads, but it archives a lot of other sites well.
 

naganon

#1 Hexa Fan
Joined:  Feb 26, 2023
My stream archive setup:

I use ytarchive, streamlink, FFmpeg, and chat downloader, mainly with custom .bat files.
Just install chat downloader, streamlink, ytarchive, and FFmpeg, and put them in the same folder as the .bat files.

ytarchive:
Code:
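@echo off
rem Prompt for the URL and, optionally, the quality (for example "best").
set /p "url_and_quality=URL and quality: "
rem Launch ytarchive in its own window with the entered URL and quality.
start "" ytarchive -t --vp9 --add-metadata --monitor-channel %url_and_quality%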
It will ask you for the video or channel URL and the video quality.
Ex: https://www.youtube.com/watch?v= best
If no quality is input it will ask you for one after you press enter
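For anyone not on Windows, the same launcher can be sketched in Python. This is only a rough equivalent of the .bat file, using the same ytarchive flags. The function name is made up for illustration, and it assumes ytarchive is on your PATH.

```python
import shlex


def build_ytarchive_command(url_and_quality: str) -> list[str]:
    """Split 'URL [quality]' input and build the ytarchive argument list."""
    parts = shlex.split(url_and_quality)
    if not parts:
        raise ValueError("expected a URL, optionally followed by a quality")
    # Same flags as the .bat file; the quality is passed through only if given,
    # otherwise ytarchive asks for it itself.
    return ["ytarchive", "-t", "--vp9", "--add-metadata", "--monitor-channel", *parts]
```

To actually start the download, pass the result to `subprocess.run(...)` from the standard library.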
Hate to bother you again, but is there any way to make the ytarchive setup work with member streams too? ytarchive is what I have been using. I set up yt-dlp, but I'd like another option just in case I fucked it up.
 

GOD'S STRONGEST BUILDERBEAR

"Shut up, Dazzle. I will clip your balls" -SB
Early Adopter
Joined:  Sep 12, 2022
Yes, if you use
Code:
    --members-only
        Only download members-only streams. Can only be used with channel URLs
        such as /live, /streams, etc, and requires cookies.
        Useful when monitoring channels and you only want membership streams.
and
Code:
--cookies COOKIES_FILE
        Give a cookies.txt file that has your youtube cookies. Allows
        the script to access members-only content if you are a member
        for the given stream's user. Must be netscape cookie format.
 

naganon

#1 Hexa Fan
Joined:  Feb 26, 2023
So if I wanted to edit the .bat file, would this work?
Code:
@echo off
set /p url_and_quality="Url and quality"
start "" ytarchive -t --vp9 --cookies cookies-youtube-com --members-only --add-metadata --monitor-channel %url_and_quality%
 

GOD'S STRONGEST BUILDERBEAR

"Shut up, Dazzle. I will clip your balls" -SB
Early Adopter
Joined:  Sep 12, 2022
Probably. I've never actually archived anything, but the docs say it should work.
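Since the docs quoted above say the cookies file must be in Netscape cookie format, a quick sanity check before launching ytarchive can save a failed run. The sketch below uses Python's standard `http.cookiejar.MozillaCookieJar`, which reads that format. The helper names are made up for illustration, and this only checks the file format, not whether the cookies are still valid for members-only content.

```python
from http.cookiejar import LoadError, MozillaCookieJar


def is_netscape_cookie_file(cookies_path: str) -> bool:
    """Return True if the file parses as a Netscape-format cookies file."""
    try:
        # Keep session and expired cookies: only the file format is being checked.
        MozillaCookieJar().load(cookies_path, ignore_discard=True, ignore_expires=True)
    except LoadError:
        return False
    return True


def youtube_cookie_names(cookies_path: str) -> list[str]:
    """List the names of cookies in the file that belong to a youtube.com domain."""
    jar = MozillaCookieJar()
    jar.load(cookies_path, ignore_discard=True, ignore_expires=True)
    return sorted(c.name for c in jar if c.domain.endswith("youtube.com"))
```

If `is_netscape_cookie_file` returns False, the export was probably saved in another format (JSON, for example) and needs to be re-exported.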
 

Willemshaven

Outlasted the Chinese Community Sinicization Group
Joined:  Sep 23, 2023
A simple remark concerning ghostarchive: yes, you can archive Twitter threads, but when you try to archive using the URL of a comment in said thread, the website simply keeps loading until the archive fails.

I could be totally wrong or just very unlucky, but it has never worked for me.
Repeating what I've said in the general thread:
Ghostarchive is much better at archiving Tweets on Firefox than on (Ungoogled) Chromium for some reason. So far, archiving Tweets using Firefox has been successful 100% of the time I've tried, reply Tweets included. The quirk is that a Tweet archived from Firefox will not load unless you click "Archived page not showing up?" in the left menu.
 

Seth

Well-known member
Fubuki's Best Friendo
Joined:  Feb 11, 2023
I'm currently having this issue with one of Kuri Rinji's Twitter threads that I shared in the niji thread. I can archive it, but it doesn't load. I press "Archived page not showing up?", but since it's flagged as sensitive content for some reason, the screenshots won't show up. The archive basically becomes useless.
 

Willemshaven

Outlasted the Chinese Community Sinicization Group
Joined:  Sep 23, 2023
Well, I've tried archiving the Tweet on Firefox, making sure the unnecessary bits in the link are removed. Not only does the Tweet load on the first try, the "Show" button fully works. (FYI, I opened the archived page back in Chromium.)

LAST UPDATED on 01-5-2024
Appendix about Nitter
Watch this page for a streak of bad archives which will indicate if Twitter archiving will work or not.

Archiving Tweet threads using Ghostarchive reliably
version 6 by Willemshaven

Ghostarchive is currently the only remaining web archiver that can save entire Twitter threads and NSFW-marked Tweets, but it can be a crapshoot to successfully get it to archive Tweets.
That is, unless you're using Firefox. It's by far the most reliable way to archive Tweets with Ghostarchive for some reason. FYI, my Firefox setup also has the uBlock Origin and LocalCDN add-ons installed.
Despite it being a very good tool, there are still a few important things and caveats to keep in mind:

And there you have it. You can now reliably utilize Ghostarchive to archive entire Tweet threads at a time. I will keep updating this post when new developments happen.
Version 1
-Initial version

Version 2 (11-4-2024)
-Updated the situation on archiving Tweets with videos.
-Re-worded some of the items.
-Changed closing words.

Version 3 (12-4-2024)
-Added situation of archiving quote Tweets of Tweets with videos.

Status Update (14-4-2024)
-Added notice regarding Twitter archiving not working.
-Update ~8PM UTC: Twitter archives work again.

Version 4 (15-4-2024)
-Added example of a fully successful archive of a video Tweet. Revised images and videos item accordingly.
-Updated archiving status to edit time.

Version 5 (18-4-2024)
-Added another scenario where archiving won't work fully.

Status Update (19-4-2024)
-Added notice regarding Twitter archiving not working, again.

Status Update (20-4-2024)
-Added notice that archiving Twitter with Ghostarchive works again.

Version 6 (20-4-2024)
-Added link to my new Appendix about Nitter.
-Updated status on "read more" Tweets.
-Shortening of most "Archived page not showing up?" parts.

Status Update (24-4-2024)
-I went away and Twitter archiving broke again.
-It worked again since ~9PM UTC, but it's in CAPTCHA hell since ~11PM UTC.

Status Update (25-4-2024)
-I went away for a long time and Ghostarchive is no longer loading.

Status Update (26-4-2024)
-Ghostarchive is working again.

Update (01-5-2024)
-I won't update the status of Twitter archiving anymore. I will only update the guide itself from now on.
 

Seth

Well-known member
Fubuki's Best Friendo
Joined:  Feb 11, 2023

The Proctor

Manager Arc Unlocked?
Staff member
Joined:  Sep 9, 2022
This should help a lot with archiving. Thank you. @The Proctor is there any way of highlighting @Willemshaven's post? A lot of people are struggling with archives recently, it would fix a lot of headaches.

Having dinner, will do so right after.
 

The Rrat

Phoneposting, Rat-loving menace
Early Adopter
Joined:  Sep 9, 2022
making sure the unnecessary bits in the link are removed
People should also be doing that because, as far as I am aware, share links sometimes contain a string of characters that is specific to the account you are logged in as.
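A minimal Python sketch of that cleanup, assuming the "unnecessary bits" are everything after the `?` in a share link (for example `?s=20&t=...`). The function name is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_share_params(url: str) -> str:
    """Drop the query string and fragment, keeping only scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
```

Run your link through this (or just delete everything from the `?` onward by hand) before pasting it into an archive site.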
 

Willemshaven

Outlasted the Chinese Community Sinicization Group
Joined:  Sep 23, 2023
I've edited my highlighted post into a more detailed guide as a result of hours of testing.
 

BlueSharkTV

Fucking Riggers
Early Adopter
Yuria's Husband
Joined:  Sep 10, 2022
[NEEDS more testing] Archiving Tweets with videos doesn't seem to work at all. Clicking "Archived page not showing up?" leads me to an error page. I still have to test whether this causes issues with, for example, Tweet chains with a video attached somewhere.
I don't remember if they only mention it when archiving, but ghostarchive says it won't archive pages with media over 50 MB, so videos break the archiving process.
 

Willemshaven

Outlasted the Chinese Community Sinicization Group
Joined:  Sep 23, 2023
I've done more testing, and archiving Tweets with attached videos that are short and small does work (minus the video). Tweet chains with video Tweets also work. I've updated my post to reflect these discoveries.
 

shipmate_F

menhera addicted sister
Joined:  Jun 21, 2023
A bit related to this, but how do you check if something was archived in the past?
For example, can you use search engines such as Google, DDG or Yandex? If so, what's the syntax?
I tried finding a tweet that was deleted back in February on the websites linked here and on others such as archive.ph, with no success.
 

21st Century Pipkin Man

rabbit's foot, vomit drawer
Joined:  Jan 18, 2023
I don't think there's much you can use for that exact purpose if you've already checked archive.today and ghostarchive like you said. It doesn't look like any of the search engines crawl ghostarchive, for example, unless the archived pages are linked somewhere.

It's a long shot, but you could try searching for some of the text in the tweet itself, since there might be a cached search engine result. Even a cached snapshot of someone's timeline works if they retweeted it.
Here's a handy site that lets you search cached versions via URL, and shows you where they're accessed on search engines: https://cache.pw/#howitswork
Google has recently removed the cached links from results, but you can look them up via syntax: cache:https://thevirtualasylum.com
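One more option for the archive.org side: the Wayback Machine has an availability endpoint that returns JSON, so the check can be scripted. The sketch below only builds the query URL and reads an already-fetched response. The helper names are made up for illustration, and the response shape is taken from the Wayback availability API as I understand it, so treat it as an assumption.

```python
from typing import Optional
from urllib.parse import urlencode


def availability_query(page_url: str) -> str:
    """Build the Wayback Machine availability query URL for a page."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a decoded JSON response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None
```

Fetch the query URL with any HTTP client, decode the JSON, and pass the result to `closest_snapshot`. A return value of None means archive.org has nothing for that URL.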
 

Willemshaven

Outlasted the Chinese Community Sinicization Group
Joined:  Sep 23, 2023
LAST UPDATED on 24-4-2024
Twitter archiving guide appendix: The Nitter route
version 2 by Willemshaven
Nitter was discontinued and shut down a few months ago. Despite this, there are still a handful of functional Nitter proxies, such as nitter.poast.org.
Nitter is currently the only way to navigate Twitter without an account and bypass NSFW warnings. You will encounter "not found" errors from time to time, which can be resolved by refreshing a bunch of times. This error will sometimes extend into the archived pages themselves, such as this example.
Another benefit is that you can easily "right click > save as" videos and also obtain the direct video link on the Twitter servers. Make sure to check "Enable HLS video streaming" on the settings page.
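To get the Nitter link for a Tweet, you only need to swap the host. Here is a minimal Python sketch, assuming the proxy mirrors Twitter's path layout (`/user/status/<id>`). The function name and the list of accepted hosts are made up for illustration, and nitter.poast.org is just the instance named above.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as "regular Twitter" in this sketch (an assumption, extend as needed).
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com", "x.com", "www.x.com"}


def to_nitter(url: str, instance: str = "nitter.poast.org") -> str:
    """Rewrite a Twitter URL to the same path on a Nitter instance."""
    parts = urlsplit(url)
    if parts.netloc.lower() not in TWITTER_HOSTS:
        raise ValueError(f"not a Twitter URL: {url}")
    # Query string and fragment are dropped, which also removes share parameters.
    return urlunsplit(("https", instance, parts.path, "", ""))
```

The resulting link can then be pasted into Ghostarchive like any other URL.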

Until this guide, I had been ignoring Nitter, as archiving through it doesn't save as many replies as archiving regular Twitter with Ghostarchive. Also, my previous testing with archiving video Tweets through Nitter produced a result no different from archiving through regular Twitter.
But with the growing number of caveats in archiving Tweets with Ghostarchive and a somewhat common outage issue with archiving Twitter, I've been testing the scenarios through Nitter to see if it fares any better.

In conclusion, archiving through Nitter is a great substitute that will come in handy during Twitter archiving outages, with the main caveat being that it can be more finicky. In other situations, it should be used to cover some of the shortcomings that come with archiving regular Twitter. Nitter archives are superior to the Twitter "not showing up" limp mode.

Version 1 (20-4-2024)
-Initial version

Version 2 (24-4-2024)
-Changed closing words to better convey the purpose of using Nitter to archive Tweets.
 

Godzilla1984

Well-known member
Early Adopter
Joined:  Sep 12, 2022
Twitter archiving guide appendix: The Nitter route
by Willemshaven
Nitter was discontinued and shut down a few months ago. Despite this, there are still a handful of functional Nitter proxies, such as nitter.poast.org.
Nitter is currently the only way to navigate Twitter without an account and bypass NSFW warnings. You will encounter "not found" errors from time to time, which can be resolved by refreshing a bunch of times. This error will sometimes extend into the archived pages themselves, such as this example.
Another benefit is that you can easily "right click > save as" videos and also obtain the direct video link on the Twitter servers. Make sure to check "Enable HLS video streaming" on the settings page.

Until this guide, I had been ignoring Nitter, as archiving through it doesn't save as many replies as archiving regular Twitter with Ghostarchive. Also, my previous testing with archiving video Tweets through Nitter produced a result no different from archiving through regular Twitter.
But with the growing number of caveats in archiving Tweets with Ghostarchive, I've been testing the scenarios through Nitter to see if it fares any better.

In conclusion, archiving through Nitter is more of a supplement, but there are a few scenarios where it's superior to archiving through regular Twitter. Fewer replies are archived, but at least threads that aren't too long get saved in their entirety, which is better than the "not showing up" limp mode.
 

electronic elephant

"I am uncontrollable. I cannot be managed."—Vesper
Early Adopter
Joined:  Sep 10, 2022
Script for downloading subtitles from all streams (and optionally regular videos) of a YouTube channel and transforming them to make them more searchable (stripping out things like timestamps and sound effects). The downloaded files can then be easily searched using a text editor that can search all files in a directory, like VS Code, Sublime Text, Atom, etc.

Dependencies (Python packages): google-api-core google-api-python-client yt-dlp typer
Also requires creating a Google Cloud account to create a YouTube API key, but the free query quota is enough to use this script. Set the environment variable 'YT_API_KEY' to the YouTube API key.
Example usage:
python3 subtitles_download.py --help
python3 subtitles_download.py NinomaeInanis
Subtitle files are written to a directory ending with "_subs" and readable/searchable versions are written to a directory ending with "_transcripts".
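To give an idea of what the transcript step involves, here is a minimal Python sketch of stripping timestamps and sound effects from a WebVTT subtitle file. This is not the attached script, just an illustration with made-up names. It assumes the subtitles are .vtt files with `hh:mm:ss.mmm` cue timings and that sound effects appear in square brackets, like [Music].

```python
import re

TIMESTAMP_LINE = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ")  # cue timing line
INLINE_TAG = re.compile(r"<[^>]+>")        # inline timing/style tags such as <c>
SOUND_EFFECT = re.compile(r"\[[^\]]*\]")   # bracketed cues such as [Music]


def vtt_to_transcript(vtt_text: str) -> str:
    """Reduce WebVTT text to plain, searchable transcript lines."""
    lines: list[str] = []
    for raw in vtt_text.splitlines():
        line = raw.strip()
        # Skip blank lines, the file header, metadata lines and cue timings.
        if not line or line == "WEBVTT" or TIMESTAMP_LINE.match(line):
            continue
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        line = SOUND_EFFECT.sub("", INLINE_TAG.sub("", line))
        line = re.sub(r"\s+", " ", line).strip()
        # Auto-generated captions often repeat a line across cues; skip immediate repeats.
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return "\n".join(lines)
```

The output is one caption line per row with nothing else in it, which is what makes a directory-wide text search practical.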
 

Attachments

  • subtitles_download.py.txt
    13.4 KB · Views: 2