server technology considerations

Added by koszko about 2 years ago

What should we write our platform (i.e. the repo software) in? What comes to my mind right now:

C - because that's what I know best and it's fast; I would probably use it together with Apache (or Nginx)
Erlang - because of its great scalability and robustness

Another issue is what database to use. A relational one (the SQL approach) would probably be the simplest to use, at the cost of scalability. Either Postgres or MySQL/MariaDB would be OK in that case. However, I am also considering NoSQL. I am not too experienced in this field. I know there are even some Erlang-specific databases.

Any suggestions? I hope for some quick ones (I'd like to start this part soon)


Replies (12)

RE: server technology considerations - Added by jahoti about 2 years ago

Use whatever you feel most comfortable with in all cases, I think- the most important thing for now is to have something that works. Even if we throw it away later, it will be easier by then to apply for funds for a re-write (even to hire someone if that's the preferred option), and there will be something to test the client-side tools with in the meantime.

On an associated note, #50 would (in my opinion) be a helpful task to address early on in writing this: an informal standard for the repository API will at least make it possible to start work on integrating support in the extension simultaneously (which is the closest I might get, having no experience with server-side programming).

RE: server technology considerations - Added by colby about 2 years ago

I suggest holding off on this indefinitely. Or at least in the meantime, we can piggyback off the https://hypothes.is infrastructure. Advertising the fact that a site uses non-free scripts and that fully free custom scripts are available instead is a great use case for Web annotations. All we'd need to do is set up a Hachette-specific group, or get people to use the public stream and use a specific tag, which our client can be trained to look for. We can use the same mechanism for handling (a) feedback about custom scripts themselves and (b) sites that are in need of a custom script.

Dan Whaley may or may not be also of some assistance (although it's not strictly necessary to have him personally involved). But who knows, maybe he could even help with the grant application, or alternative funding?

RE: server technology considerations - Added by colby about 2 years ago

Just to be clear: the Hypothesis server and client are free software, but I'm proposing that we not even worry about setting up our own server (with or without customizations for our use case). Just use the existing instance running at https://hypothes.is.

RE: server technology considerations - Added by koszko about 2 years ago

colby wrote:

Just to be clear: the Hypothesis server and client are free software

Doesn't the server rely on nonfree Elasticsearch? Or am I missing something?

Anyway, what you suggest doesn't seem to be a bad idea. I have a personal issue, though - I already pretty much imagined how a repo should function and I know what I would need to program. With Hypothesis, I would need to first bite into the topic.

Annotations seem suitable for advertising facts about websites and getting feedback. But are they also for sharing the actual scripts?

Also, even if we're to use the existing infrastructure, we're not going to be doing that forever, are we? Once we reach the point we depart from Hypothesis service, are we going to have to deploy all their big software system with many dependencies and features we don't need? Or do you mean we would then finally develop our own server software, as I want to do right now?

RE: server technology considerations - Added by colby about 2 years ago

Annotations seem suitable for advertising facts about websites and getting
feedback. But are they also for sharing the actual scripts?

Annotations could be used for very short scripts. But what I had in mind is
hypothes.is being used purely as the side channel for annotating the original
site with notices that there are replacements available for the non-free
scripts used there--but the replacement script itself would be hosted
elsewhere (wherever the script author chooses to put it), and the annotation
would contain a link to it. We get decentralization "for free" this
way--anyone with access to a free/cheap static web host can dump their script
there and then point others to it with an annotation on the original site.

Note also that when I say use Hypothesis here, I'm not necessarily talking
about instructing users to install the official Hypothesis client and create
these annotations by hand (and worry about their well-formedness). I mean
using hypothes.is as a carrier; Hachette itself, in addition to necessarily
being able to understand this "protocol" so it can discover available
replacement scripts from well-formed (appropriately tagged) annotations, would
also be able to create and post them.
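
To make the "carrier" idea a bit more concrete, here is a rough Python sketch of the discovery half. It rests on several assumptions that are not settled anywhere in this thread: the tag name `hachette-script` is made up for illustration, the helper names are hypothetical, and the endpoint and field names (`uri`, `tag`, `rows`, `tags`, `text`) reflect my understanding of the public Hypothesis search API and should be checked against its documentation. No network request is made here; the code only builds the query URL and filters already-fetched annotation rows.

```python
import re
from urllib.parse import urlencode

# Hypothetical tag name; the thread does not fix one.
SCRIPT_TAG = "hachette-script"
# Assumed location of the public Hypothesis search endpoint.
SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"

# Deliberately simple URL matcher; trailing punctuation is not stripped.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")


def build_search_url(page_url, tag=SCRIPT_TAG):
    """Return a search URL for annotations on page_url that carry tag."""
    return SEARCH_ENDPOINT + "?" + urlencode({"uri": page_url, "tag": tag})


def replacement_links(rows, tag=SCRIPT_TAG):
    """Collect links found in the text of annotation rows that carry tag."""
    links = []
    for row in rows:
        if tag not in row.get("tags", []):
            continue  # not one of "our" annotations
        for url in URL_PATTERN.findall(row.get("text", "")):
            if url not in links:
                links.append(url)
    return links
```

A client would fetch the URL from `build_search_url(...)`, pass the `rows` list of the JSON response to `replacement_links(...)`, and then decide what to do with the links it finds there.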

The fact that these are just ordinary annotations that one might stumble upon
"in the wild" outside the context of Hachette is something that also has the
potential to help with project awareness.

Any design for decentralization that depends on people downloading the server
software (or writing their own compatible server) and running an instance
themselves as a precondition to hosting off-main repository scripts is pretty
much doomed. It's a nice idea, but what happens is that by and large no one
really does that, and of the few who do, it tends to be a short-lived hobby;
people are attracted to the idea of doing so, but eventually get bored, no
longer have time to admin the instance, and then move on. This will of course
happen even under my proposed design (people abandoning, say, a tildeverse
account tied to the namespace where their contributed scripts are hosted), but
the fact that contributed scripts just live as ordinary resources on the Web
means that we can pretty easily make sure they're backed up (e.g. using the
Wayback Machine; we would also get versioning "for free" this way).

Doesn't the server rely on nonfree Elasticsearch?

I'm not intimately familiar; I don't know, maybe at some point since the
Elasticsearch license changes in January there has been some statement from
Hypothesis aligning themselves with Elasticsearch and committing to it for the
future, instead of either sticking with an older, free version or moving to
the community-spawned fork. Even if that's true, though, it has low impact;
there should be no reason that anyone looking to deploy their own
infrastructure would be advised to use Elasticsearch themselves instead of
OpenSearch, just because hypothes.is is committed to the former--and again, I
don't even know that that is the case.

even if we're to use the existing infrastructure, we're not going to be
doing that forever, are we?

I dunno. Why not?

Once we reach the point we depart from Hypothesis service, are we going to
have to deploy all their big software system with many dependencies and
features we don't need?

See https://github.com/judell/MinimalHypothesisServer. (I myself am not a
fan of how complex and complicated the official client and server are.)

RE: server technology considerations - Added by colby about 2 years ago

Or do you mean we would then finally develop our own server software,
as I want to do right now?

No one can demand that you spend your time in a certain way to keep you
from doing what you want to do. I'd just recommend, as I always do, to
first do the simplest possible thing that works. From my perspective,
that entails not having to develop and manage our own infrastructure--
especially since that's a serial process, and the development part is a
blocker.

If you are committed to your existing vision, how would you feel about
writing such a "server" that is intended to be installed on a multi-user
system (like any tildeverse host) for tracking the scripts contributed
by the community there--as a microcosm (and proving ground) for what you
have in mind? Globally, it would still make things public for discovery
by Hachette under the principles described before, but this server could
be e.g. tailored towards helping that system's users get their scripts
into their Web space on that system conveniently, provide an additional
frontend for showcasing (to the public and/or internally) the work that
that community is doing to create free alternatives of non-free scripts,
etc.

I will say that I think going with C is a bad choice for a server, for
both technical and social reasons (except perhaps for the microcosmic
use case above as an exception, where it might be considered a strength,
i.e., attractive to the types of people who operate and sign up for
tilde-style systems). Have you considered Go?

RE: server technology considerations - Added by koszko about 2 years ago

We get decentralization "for free" this
way--anyone with access to a free/cheap static web host can dump their script
there and then point others to it with an annotation on the original site.

Only script hosting would be decentralized this way and not annotation hosting.

The fact that these are just ordinary annotations that one might stumble upon
"in the wild" outside the context of Hachette is something that also has the
potential to help with project awareness.

It does... although it's a factor small enough we don't have to consider it.

Any design for decentralization that depends on people downloading the server
software (or writing their own compatible server) and running an instance
themselves as a precondition to hosting off-main repository scripts is pretty
much doomed. It's a nice idea, but what happens is that by and large no one
really does that, and of the few who do, it tends to be a short-lived hobby;
people are attracted to the idea of doing so, but eventually get bored, no
longer have time to admin the instance, and then move on.

It seems you care about decentralization a lot. The idea of "downloading the server software and running an instance" is used in case of Debian's repos and Wikipedia - 2 projects I take inspiration from. Making off-main repository hosting that dumb simple and effortless does not yet - in my opinion - justify a bizarre design choice. We could (and should) instead provide a sandbox for users within the platform we develop. Consider it an equivalent of Ubuntu's PPA service or Wikipedia pages where beginner edits have to be approved by some more credited wikipedian but can be viewed by those who explicitly click to see them.

even if we're to use the existing infrastructure, we're not going to be
doing that forever, are we?

I dunno. Why not?

I see our expectations of the final outcome of this project are quite different. I imagine a strong-standing platform like Wikipedia or Debian, which is the one meant to be primarily used with the tool(s) we're developing. Such a platform would be forkable (which is a must if we're not to violate software freedom principles); an example of that in Debian's case is Ubuntu. However, the platform itself should be centralized enough to allow us to impose some rules on the scripts served, most notably FSDG compliance.

If you're much into decentralization, you might dislike that idea. However, the everybody-can-easily-host idea, in the case of software, leads to the problem we have with AUR, Dockerhub, as well as most language-specific package repositories (there are exceptions, though) - they mostly allow nonfree packages and also impose no requirements or even recommendations regarding good practices. This leads to packages with dependencies on multiple versions of the same library, unclearly indicated licensing, packages built in a "dirty" way (e.g. the build utilizes some bundled binary programs or libs instead of using 1st party versions of these), the lack of a standardized way of building from source and probably other issues that didn't come to my mind right now. A playground for users is not a bad thing, I just don't want the main repo to become a playground.

The level of control I expect is not impossible to achieve with hypothes.is annotations, once we start utilizing PGP signatures (which we should anyway at some point - that's how serious software repos work; mere TLS is not trustworthy enough). However, annotations don't help (with the issues I described), either.

Btw, I do realize js for sites is somewhat different from usual software packages, because sites change often and we need to be elastic. However, I still want to maintain some level of stability and security (probably justified given that we might at some point start writing custom js for online banking sites). So there is obviously not going to be a 2-year schedule, but some policy is to be enforced in the "official" repo.

Related topic: https://hachettebugs.koszko.org/boards/1/topics/28

If you are committed to your existing vision, how would you feel about
writing such a "server" that is intended to be installed on a multi-user
system (like any tildeverse host)

Could you elaborate? How does a tildeverse host work and how are you going to utilize it? (Unfortunately, looking up things like this costs me too much time, so I decided to start asking the people who bring the topic up instead; I hope you understand.)

I will say that I think going with C is a bad choice for a server, for
both technical and social reasons (except perhaps for the microcosmic
use case above as an exception, where it might be considered a strength,
i.e., attractive to the types of people who operate and sign up for
tilde-style systems). Have you considered Go?

I haven't because I don't know it. Wait, Go is a language with a build system that fetches dependencies from places like GitHub repos, right? That kind of falls in line with your other preferences :P

As to simplicity, C is simple in some senses. It requires a minimal amount of runtime dependencies. Another plus is that C libraries are most often available from distros' repos, making it quite effortless to avoid FSDG-noncompliant dependency sources. In that it beats both Erlang and Go.

RE: server technology considerations - Added by jahoti about 2 years ago

If you are committed to your existing vision, how would you feel about writing such a "server" that is intended to be installed on a multi-user system (like any tildeverse host)
Could you elaborate? How does a tildeverse host work and how are you going to utilize it? (Unfortunately, looking up things like this costs me too much time, so I decided to start asking the people who bring the topic up instead; I hope you understand.)

Tildeverse hosts are more-or-less standard shared hosting, except with the goal of incubating a "maker community" rather than selling a service. They therefore offer free accounts to interested individuals, offer on-host IRC and bulletin boards, install all sorts of compilers and scripts, and so on. The idea of writing for a multi-user system would then be, I assume, that such hosts can install the program and all the hackers who use them can then publish their scripts.

Go is a language with a build system that fetches dependencies from places like GitHub repos, right? That kind of falls in line with your other preferences :P

It can do that; however, it is also strongly oriented towards creating servers and server software, which does objectively fit this application.

As to the actual topic at hand, there are really two separate points to make here.

Firstly, while I am sympathetic to the importance of decentralization, the point koszko makes in regard to maintaining primacy of the project-maintained repository is also critical to ensuring users do not have to choose between manually vetting most scripts they install and missing out on a significant proportion of available fixes. It is nonetheless an entirely social problem we have not assessed other possible solutions for; do you have any suggestions, colby?

Secondly, I fully agree with the idea of not re-inventing the wheel and can definitely see the potential in using an already existing setup like Hypothes.is, provided we first confirm it could easily serve our needs now and into the future. The specific case of Hypothes.is nevertheless appears problematic; the official implementation is (as noted) overcomplicated, and https://github.com/judell/MinimalHypothesisServer is not released under a free license.

RE: server technology considerations - Added by koszko about 2 years ago

Go is a language with a build system that fetches dependencies from places like GitHub repos, right? That kind of falls in line with your other preferences :P

It can do that; however, it is also strongly oriented towards creating servers and server software, which does objectively fit this application.

Btw, Go, C and Erlang all share one uncommon quality - they have not fallen for the OOP fashion.
Erlang is also suited for creating servers and C... well, Apache and Nginx are themselves written in C. I can agree to Erlang but not to Go (which I don't know); make a choice ;)

RE: server technology considerations - Added by jahoti about 2 years ago

My comment about Go was probably too emphatic; it was only intended as an alternative explanation for why Go might be relevant, rather than as advice which would be very inappropriate from somebody who has almost no experience with any of those languages...

Perhaps I should learn them, given they aren't OOP-based and I am a proceduralist/functionalist :).

My choice remains the same as originally: do whatever is easiest for you, and worry about technical merit only once we have a proof of concept.

RE: server technology considerations - Added by koszko about 2 years ago

jahoti wrote:

somebody who has almost no experience with any of those languages...

Out of curiosity: what languages do you have experience with?

RE: server technology considerations - Added by jahoti about 2 years ago

Really only Python, JavaScript, and shell scripting (if that can really be called a language). I've studied functional programming theoretically and very much love the idea; however, I've never actually taken the time to learn such a language.
