oolite.space on web.archive.org
Posted: Tue May 14, 2024 10:15 am
by user2357
Greetings, commanders! <o
I have just discovered that oolite.space apparently does not save completely on web.archive.org anymore.
https://web.archive.org/web/20230722190 ... ite.space/ was apparently the last good save.
Since then, from a quick count, there have been approximately 20 partially successful saves, approximately 4 redirects, and 1 (or 2?) saves that are marked as redirects.
I thought I should bring this to someone's attention who is more clued-up about such matters.
Thank you for your consideration of this matter.
Sincerely and faithfully
user
Re: oolite.space on web.archive.org
Posted: Wed May 15, 2024 10:10 pm
by hiran
That website is under version control.
Do we have any events between the last good scan and the first partial scan?
Re: oolite.space on web.archive.org
Posted: Thu May 16, 2024 8:36 am
by user2357
hiran wrote: ↑Wed May 15, 2024 10:10 pm
That website is under version control.
Do we have any events between the last good scan and the first partial scan?
Hiran, if your question is addressed to me... I'm sorry, but I don't even know what it means that a "website is under version control". Sorry, don't ask me, I don't know, I just work here. It's above my pay-grade.
Somebody else... please. ... Anybody...? Thanks.
Re: oolite.space on web.archive.org
Posted: Thu May 16, 2024 10:36 am
by hiran
Apparently the last good scan was on 22JUL2023. The next happened one month later, on 22AUG2023, and is no longer considered good.
As partial scans are still possible, I doubt we locked out robots or anything similar. So I'd look at changes to the site's content, which is stored at
https://github.com/OoliteProject/oolite-web
The two important branches are only-static and only-media.
So all we need to check is whether there were changes in the above time range for these two branches. Maybe we will see directly what the issue might be.
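The check described above can be sketched in Python. This is only a rough sketch: it assumes a local clone of oolite-web with the two branches fetched as `origin/only-static` and `origin/only-media`, and the function names and the exact time window are illustrative choices, not anything from the repository.

```python
import subprocess

# Branch names come from the post above. The window runs from the last
# good capture to the first partial one (dates as reported in this thread).
BRANCHES = ["only-static", "only-media"]
SINCE = "2023-07-22 00:00:00"
UNTIL = "2023-08-22 23:59:59"


def git_log_command(branch, since=SINCE, until=UNTIL):
    """Build a `git log` call listing commits on origin/<branch> in the window."""
    return [
        "git", "log",
        f"--since={since}",
        f"--until={until}",
        "--date=short",
        "--pretty=format:%h %ad %s",
        f"origin/{branch}",
    ]


def list_commits(repo_dir, branch):
    """Run the command inside a local clone (repo_dir is an assumed path)."""
    result = subprocess.run(
        git_log_command(branch),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `list_commits("oolite-web", "only-static")` from the directory that holds the clone would print one line per commit in the window, if there are any.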
Re: oolite.space on web.archive.org
Posted: Thu May 16, 2024 1:06 pm
by hiran
With the above assumptions, I have now found two pages that should answer the question of what changed in that timeframe.
only-static:
https://github.com/OoliteProject/oolite ... 65e647+140
only-media:
https://github.com/OoliteProject/oolite ... fe81e4+209
At that time we had lost oolite.org and had only partly recovered to oolite.space. The missing parts had to be rewritten.
Why would the new structure of the site be problematic for the Internet Archive?
Re: oolite.space on web.archive.org
Posted: Thu May 16, 2024 8:21 pm
by hiran
Here is some technical information about the Internet Archive / Wayback machine:
https://help.archive.org/help/using-the ... k-machine/
On the page I found this:
Why are some sites harder to archive than others?
If you look at our collection of archived sites, you will find some broken pages, missing graphics, and some sites that aren’t archived at all. Some of the things that may cause this are:
Robots.txt — A site’s robots.txt document may have prevented the crawling of a site.
Javascript — Javascript elements are often hard to archive, but especially if they generate links without having the full name in the page. Plus, if javascript needs to contact the originating server in order to work, it will fail when archived.
Server side image maps — Like any functionality on the web, if it needs to contact the originating server in order to work, it will fail when archived.
Orphan pages — If there are no links to your pages, the robot won’t find it (the robots don’t enter queries in search boxes.)
As a general rule of thumb, simple html is the easiest to archive.
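The robots.txt item in that list can be checked offline with Python's standard `urllib.robotparser`. This is a minimal sketch: the two robots.txt samples are made up for illustration (they are not oolite.space's real file), and the user-agent string `ia_archiver` is an assumption about how the archive's crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
BLOCKING = """User-agent: ia_archiver
Disallow: /
"""

OPEN = """User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, text in (("blocking", BLOCKING), ("open", OPEN)):
        allowed = is_allowed(text, "ia_archiver", "https://oolite.space/")
        print(name, "->", "allowed" if allowed else "blocked")
```

If the real robots.txt came out as "allowed" for the crawler, that would support the guess that robots are not being locked out.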
Looking at our site's source code, I do not see how the navigation bar on the left works. It is not plain HTML and it is not JavaScript, but something like flex events, which is nothing I am familiar with. Maybe the Wayback Machine is struggling in the same way.
So the question arises: do we want to change that? Is archive.org something we want to have?
Re: oolite.space on web.archive.org
Posted: Fri May 17, 2024 7:26 am
by timer
Yes, many parts of the site are built dynamically through requests from the JS... apparently that's why the Wayback Machine can't pick up the finished view of the site...
Re: oolite.space on web.archive.org
Posted: Fri May 17, 2024 7:38 am
by user2357
hiran wrote: ↑Thu May 16, 2024 8:21 pm
So the question arises: do we want to change that? Is archive.org something we want to have?
Firstly, thank you, hiran, for your diligence in this matter. I wouldn't know where to start. I still don't even know how to continue!
Thanks, timer, for confirmation.
However, to answer that last question of yours, hiran, as far as I am concerned: YES, VERY MUCH SO, PLEASE!!!
I am a great believer in the internet archive, and in its priceless value as a countermeasure against the loss of historical record and any conspirational-theoretical Orwellian Minitrue redaction.
If we as a commOonity can work together to save these our bulletin boards for posterity -- as we have done recently, and previously -- then surely we should also work at least as hard to save all other historical records of our growth and development, including those stored on the internet archive.
Elite and Oolite are phenomenal pop-cultural artefacts like few others; they and all their related information should -- nay, MUST! -- be preserved as far as it is within our abilities. ...please.
...
Another thought, though: If the internet archive proves too problematic, is/are there any other similar, simpler, better-suited alternative(s)?
Kind regards
user
Re: oolite.space on web.archive.org
Posted: Fri May 17, 2024 8:19 am
by timer
Difficult question... In theory, I can try to create a truly static version, and maybe I can even build it automatically from the current one. But doing this within the current implementation of the site is probably too difficult - it would be easier to create it on a separate domain like static.oolite.space and make an invisible link to this domain (hide it using JS - then it will be visible in the archive) - the Wayback Machine will find it and index it (probably). Or try to provide a redirect to the static version when an archive bot is detected... I need to try, and most importantly, find the time to do this...
Re: oolite.space on web.archive.org
Posted: Fri May 17, 2024 9:15 am
by hiran
timer wrote: ↑Fri May 17, 2024 8:19 am
Difficult question... In theory, I can try to create a truly static version, and maybe I can even build it automatically from the current one. But doing this within the current implementation of the site is probably too difficult - it would be easier to create it on a separate domain like static.oolite.space and make an invisible link to this domain (hide it using JS - then it will be visible in the archive) - the Wayback Machine will find it and index it (probably). Or try to provide a redirect to the static version when an archive bot is detected... I need to try, and most importantly, find the time to do this...
Sounds like we have options.
Timer has done a very good job in the past, but his time is limited. The task is more about HTML than programming. If he gave hints on what to look at, who could help finish it off?
Re: oolite.space on web.archive.org
Posted: Fri May 17, 2024 11:22 am
by Cholmondely
One presumes that there is only so much space for stuff at web.archive.org
The archived sites which I've accessed tend to have too little information to be worthwhile - Oosat 2 (only the one page) & Dizzy's backup site.
Re: oolite.space on web.archive.org
Posted: Sun May 19, 2024 7:18 am
by user2357
hiran wrote: ↑Fri May 17, 2024 9:15 am
Timer has done a very good job in the past, but his time is limited. The task is more about HTML than programming. If he gave hints on what to look at, who could help finish it off?
Yes. Time and/or skills are limited. Those of us who have more skills, have less time, and vice versa, it seems. If I could, I would, certainly, but, alas, my skillz am fail.
*
Cholmondely wrote: ↑Fri May 17, 2024 11:22 am
One presumes that there is only so much space for stuff at web.archive.org
The archived sites which I've accessed tend to have too little information to be worthwhile - Oosat 2 (only the one page) & Dizzy's backup site.
Yes. I marvel at the more than 100 web pages per person on this planet at the moment, that the internet archive has been able to save -- not to mention all the books, video, audio, software and images as well.
Where do they put it all?
However, it is sometimes possible to save, manually, more than just the home page or other basic pages by means of the "Save Page Now" feature on https://web.archive.org/.
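For anyone who wants to script this rather than click through the form, the two URLs involved can be built as below. This is a hedged sketch: it assumes the `https://web.archive.org/save/` prefix and the `https://archive.org/wayback/available` lookup behave the way the Internet Archive documents them, and the helper names are made up for this example. Nothing here makes a network request.

```python
from urllib.parse import urlencode

SAVE_PREFIX = "https://web.archive.org/save/"
AVAILABILITY_API = "https://archive.org/wayback/available"


def save_page_now_url(target):
    """URL that asks the archive to capture `target` when it is opened."""
    return SAVE_PREFIX + target


def availability_url(target, timestamp=None):
    """URL that looks up the capture closest to `timestamp` (YYYYMMDD)."""
    params = {"url": target}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_API + "?" + urlencode(params)


if __name__ == "__main__":
    print(save_page_now_url("https://oolite.space/"))
    print(availability_url("oolite.space", "20230722"))
```

Opening the first URL in a browser should request a fresh capture of one page. It does not change how the archive handles pages that are built by JS.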
*
Another thought: since oolite.space has been successfully saved in the past, up to 2023jul22, wouldn't it help to revert to that previous oolite.space web-page format, or would that be more trouble than it's worth? (I hope this question doesn't reveal how little of this I actually understand...)
Re: oolite.space on web.archive.org
Posted: Sun May 19, 2024 12:48 pm
by hiran
user2357 wrote: ↑Sun May 19, 2024 7:18 am
hiran wrote: ↑Fri May 17, 2024 9:15 am
Timer has done a very good job in the past, but his time is limited. The task is more about HTML than programming. If he gave hints on what to look at, who could help finish it off?
Yes. Time and/or skills are limited. Those of us who have more skills, have less time, and vice versa, it seems. If I could, I would, certainly, but, alas, my skillz am fail.
Do not underestimate your skills. HTML is no black magic.
user2357 wrote: ↑Sun May 19, 2024 7:18 am
However, it is sometimes possible to save, manually, more than just the home page or other basic pages by means of the "Save Page Now" feature on https://web.archive.org/.
Are you saying you could trigger correct backups manually?
user2357 wrote: ↑Sun May 19, 2024 7:18 am
Another thought: since oolite.space has been successfully saved in the past, up to 2023jul22, wouldn't it help to revert to that previous oolite.space web-page format, or would that be more trouble than it's worth? (I hope this question doesn't reveal how little of this I actually understand...)
Of course we can go back. That's the whole point of version control. We can go back and forth as we like. But, as in any sci-fi story, problems arise if you want to change the past. Imagine that doing so opens a parallel universe leading to a different future. In version control that is called a branch.
Question to all: shall we go back?
Re: oolite.space on web.archive.org
Posted: Sun May 19, 2024 12:54 pm
by phkb
The thing is, we went forward with fixes to other things. If we go back, we will have to redo all that work. Not impossible, mind - just a lot of work. Might it not be better to work out why the internet archive is having problems with the design as it exists now? Surely we can't be the first website to come up against this issue.
Re: oolite.space on web.archive.org
Posted: Mon May 20, 2024 7:39 am
by timer
Hi, all! )
First of all, I want to explain that the site has been completely rewritten to be static. What does that mean? It means that the site now consists ONLY of files such as html, css, js, json, etc. All these files are publicly available on GitHub, and GitHub itself serves them to everyone. Why was this done, and what did it give us? The previous site ran on a server with PHP and Apache (or an equivalent) installed, and the site pages were generated by the SERVER, access to which was (forgive me for saying so) almost lost or extremely difficult for the community. Currently, no server is required to run the site. The site code is visible to everyone, and many people (those who have rights) can change it, for example to add news or correct a typo. In theory, the site can now live forever (as long as GitHub exists and allows free hosting of projects) without anyone needing to maintain or pay for a separate server. This was the fundamental idea behind our new ecosystem. There is also the difficult issue of the domain, but that is a completely separate matter.
Now for the next point. Roughly speaking, the old site consisted of blocks that were combined by PHP code on the server, and the finished pages were then sent to the browser. That is, the old site did not consist of ready-made pages either. When the problems (domain/site/server) happened, I decided to try to rewrite the site so that it does not require a server. But something still has to “combine” the blocks and build the pages of the site... Now a JS script does this. When a user opens the site, they receive a "blank" page with almost nothing on it except a header and a menu. The browser then runs a JS script, which sends additional requests, receives the necessary blocks and completes the page: the main text is added, news blocks are added, and so on. All this happens in a split second before your eyes. This is how the vast majority of modern sites work - turn off JS in your browser, and you will not recognize familiar sites on the Internet.
And now pay attention - the Wayback Machine does not run JS scripts - it only downloads HTML, CSS and images. That is why you see only a “blank” page of our website in the archive.
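To make the difference concrete, here is a small Python sketch of what a client that does not run JS gets out of a page. Both HTML samples are invented stand-ins (not the real oolite.space markup), and the class and function names are only for this illustration.

```python
from html.parser import HTMLParser

# Invented stand-in for the "blank" shell that is completed by a script.
SHELL = """<html><body>
<header>Oolite</header>
<main id="content"></main>
<script>
fetch("blocks/news.html").then(r => r.text()).then(t => {
  document.getElementById("content").innerHTML = t;
});
</script>
</body></html>"""

# Invented stand-in for the same page after the blocks have been merged in.
BUILT = """<html><body>
<header>Oolite</header>
<main id="content"><h1>News</h1><a href="/download.html">Download</a></main>
</body></html>"""


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def visible_without_js(html):
    """Return (text fragments, links) a non-JS client can read from html."""
    parser = TextAndLinks()
    parser.feed(html)
    parser.close()
    return parser.text, parser.links
```

With these samples, the shell yields only the header text and no links to follow, while the built page yields the news heading and the download link as well.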
We could try to rewrite the site as pages that include everything up front and are complete as delivered. But no programmer will want to work with that - imagine that the menu or header code has to be duplicated on every page. This is hell for any normal programmer (you could say that the history of programming is largely a struggle against code duplication) - any correction to blocks shared between pages turns into a non-trivial task.
In general, IMHO, the solution may be to write a utility that automatically creates a full HTML version of the site from the source blocks and serves this version to the archive. This kind of task requires a server. We have servers, so this is not a problem. I'll try to find time to think about this issue.
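A utility of that kind could look roughly like the Python sketch below. It is a sketch under assumptions, not the actual plan: the directory layout (`header.html`, `menu.html` and a `pages/` folder of content blocks) and all names are hypothetical and do not reflect how the oolite-web repository is really organised.

```python
from pathlib import Path
from string import Template

# Page skeleton. The shared blocks are written once and merged into every page.
PAGE = Template("""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>$title</title></head>
<body>
$header
$menu
<main>
$content
</main>
</body>
</html>
""")


def render_page(title, header, menu, content):
    """Merge the shared blocks and one content block into a complete page."""
    return PAGE.substitute(title=title, header=header, menu=menu, content=content)


def build_site(blocks_dir, out_dir):
    """Write one complete HTML file per content block (hypothetical layout)."""
    blocks = Path(blocks_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    header = (blocks / "header.html").read_text(encoding="utf-8")
    menu = (blocks / "menu.html").read_text(encoding="utf-8")
    written = []
    for src in sorted((blocks / "pages").glob("*.html")):
        content = src.read_text(encoding="utf-8")
        target = out / src.name
        target.write_text(render_page(src.stem, header, menu, content), encoding="utf-8")
        written.append(target)
    return written
```

The header and menu still live in a single place, so the duplication only exists in the generated output, which nobody edits by hand.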