scripts, bags and pages - redesign?

Added by koszko 6 months ago

The current schema for handling custom page resources (currently only scripts) defines 3 types of objects:

  1. Scripts - A script has a unique name and text contents (or url+hash for fetching the text contents).
  2. Bags - A bag has a unique name and a list of subresources that can be other bags or scripts.
  3. Pages - A "page" has a unique pattern specifying to which URLs it applies, a whitelist boolean flag and a payload (either a script or a bag) to inject into an actual opened page.

After some time I came to think this schema is suboptimal. Before I start changing anything I'd like to gather some feedback, though. The issues I see are:

Pages are not flexible enough

Right now a "page" is basically a pattern with some settings applied to it. In practice, however, it would be useful to be able to specify multiple pattern-payload mappings for a single site. Let's assume there is a site like fansofbloat.com. This site can have different things under https://fansofbloat.com/about, https://fansofbloat.com/contact and https://fansofbloat.com/user/<name-of-user>, and maybe even https://<name-of-project>.fansofbloat.com/details. Now, let's assume there's some Mr Cooldev who develops libre script replacements, custom styling and enhancements for fansofbloat.com. He manages to deliver his custom stuff for all of the URLs mentioned, with the same kind of changes, although using slightly different sets of JavaScript libraries under different kinds of URLs. Unfortunately, because of the way pages are specified right now, Mr Cooldev's work has to be split into several page objects, and a typical user who wants to enable all of Mr Cooldev's fixes for fansofbloat.com has to enable several pages in the settings, which is inconvenient.

Additionally, if Miss Greatskill decides to make her own fixes for all or part of fansofbloat.com, her patterns will collide with those used by Mr Cooldev. Sure, nobody would be using those 2 sets of fixes simultaneously. However, it would be nice to be able to store them all in the settings, just without having them all enabled at once.

I believe we should make the page object able to store multiple pattern->payload mappings. It should also have a unique name, just as scripts and bags have.
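A minimal sketch of what such a named page object could look like, written in Python purely for illustration. The names `Page`, `mappings`, `payloads_for` and `payloads_to_inject`, as well as all payload names, are hypothetical, and `fnmatch`-style wildcards are only a stand-in for the real URL pattern syntax.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Dict, List


@dataclass
class Page:
    """A named set of pattern->payload mappings (hypothetical layout)."""
    name: str
    # Maps a URL pattern to the name of the payload to inject there.
    mappings: Dict[str, str] = field(default_factory=dict)
    enabled: bool = False

    def payloads_for(self, url: str) -> List[str]:
        """Return the payloads whose pattern matches the given URL."""
        # fnmatchcase() is only a stand-in for the real pattern matching.
        return [payload for pattern, payload in self.mappings.items()
                if fnmatchcase(url, pattern)]


def payloads_to_inject(pages: List[Page], url: str) -> List[str]:
    """Collect matching payloads from enabled pages only."""
    return [payload for page in pages if page.enabled
            for payload in page.payloads_for(url)]


# Mr Cooldev ships all his fixes for the site as a single object.
cooldev = Page("cooldev-fansofbloat", {
    "https://fansofbloat.com/about": "cooldev-static-pages",
    "https://fansofbloat.com/contact": "cooldev-static-pages",
    "https://fansofbloat.com/user/*": "cooldev-user-pages",
    "https://*.fansofbloat.com/details": "cooldev-project-pages",
}, enabled=True)

# Miss Greatskill's alternative is stored next to it, but left disabled.
greatskill = Page("greatskill-fansofbloat", {
    "https://fansofbloat.com/*": "greatskill-everything",
})
```

With this layout, enabling Mr Cooldev's work is a single switch, and colliding patterns from a disabled object are simply ignored by `payloads_to_inject`.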

Having bags and scripts as separate objects is not very convenient

The current approach is flexible enough, but it is a bit weird, and having to pass information on whether some component is a script or a bag is irritating. When we package software for a distro, the equivalent of a bag would be, for example, a package of a library. Such a package has its files and dependencies as completely separate objects. Also, files and packages live on different levels of abstraction.

I think we should embrace a schema similar to that one. We could have dependencies and files (be they scripts or later maybe also CSS files or some arbitrary data files) listed as 2 different properties of a bag. A payload would have to be a bag and not a script. Scripts would no longer be able to live independently of bags and would not have their own tab in the settings page.
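To make the idea concrete, here is a rough sketch of a bag that lists dependencies and files as 2 separate properties, together with one possible way of flattening it into an injection order. It is written in Python only for illustration; `Bag`, `files_to_inject` and the example bag and file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Bag:
    """A bag with dependencies and files kept apart (hypothetical layout)."""
    name: str
    dependencies: List[str] = field(default_factory=list)  # names of other bags
    files: List[str] = field(default_factory=list)  # scripts, later maybe CSS


def files_to_inject(bags: Dict[str, Bag], root: str) -> List[str]:
    """List the files of `root` and of everything it depends on.

    Files of a dependency come before files of the bag that needs it.
    """
    ordered: List[str] = []
    visited: Set[str] = set()

    def visit(name: str) -> None:
        if name in visited:  # already handled; also guards against cycles
            return
        visited.add(name)
        bag = bags[name]
        for dependency in bag.dependencies:
            visit(dependency)
        ordered.extend(bag.files)

    visit(root)
    return ordered


bags = {bag.name: bag for bag in [
    Bag("jquery", files=["jquery.js"]),
    Bag("cooldev-common", dependencies=["jquery"], files=["common.js"]),
    Bag("cooldev-user-pages", dependencies=["cooldev-common", "jquery"],
        files=["user.js"]),
]}
```

Because a payload is always a bag here, the resolver never has to ask whether a given name refers to a script or to a bag.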

WDYT?


Replies (18)

RE: scripts, bags and pages - redesign? - Added by jahoti 6 months ago

The second sounds perfect to me!

The first is also perfectly good, and I fully support it; while we could for logical completeness adopt a recursive structure analogous to the new bags/files system for pages/pattern->payload mappings, that would still preserve the basic system you suggest (if it isn't identical to this "modification") and could easily be added on later.

The remaining issue is the situation with conflicts like for Mr Cooldev and Miss Greatskill, for which I can see three options:

  1. Check for conflicts and then "group" colliding patterns, forcing the user to select just one. This could be difficult to implement in a UI, as mappings in different pages will have to somehow be symbolically linked for the user to make a choice between them. Also, it might lead to superposition: if Mr Cooldev only makes critical fixes and so (say) doesn't create a payload for https://fansofbloat.com/flashybuttons, while Miss Greatskill creates one unified script that fully cleans up and restores every page of the form https://fansofbloat.com/*, then anyone who installs both and selects Mr Cooldev's work is not going to get the intended effect on the flashybuttons page.
  2. Execute every payload mapped from a compatible pattern. It fixes the issues above; of course, anybody who doesn't realize they have conflicting extensions installed might get a very big surprise. There's also no chance of streamlining the settings query with this approach, and in fact it could grow more complicated.
  3. We could mark packages as conflicting in the same way Debian does. The overwhelming difficulty here might be the fact that it prevents the "superposition" described above: as bad as it can be in some cases, the user could actually want it if (say) Mr Cooldev hasn't fixed https://fansofbloat.com/donate yet.

Perhaps the best approach is some combination of 1 and 3.
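As a rough illustration of option 3, the check below refuses nothing by itself; it only reports which enabled packages declare a conflict with a candidate (or are declared as conflicting by it). It is a Python sketch loosely modelled on the idea of a Debian-style conflicts declaration; `Package`, `conflicts`, `blocking_conflicts` and the package names are hypothetical, and treating a one-sided declaration as symmetric is an assumption made here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Package:
    name: str
    # Names of packages that must not be enabled together with this one.
    conflicts: Set[str] = field(default_factory=set)


def blocking_conflicts(packages: Dict[str, Package], enabled: Set[str],
                       candidate: str) -> List[str]:
    """Return the enabled packages that conflict with `candidate`.

    A conflict declared by either side counts (assumed symmetric).
    """
    declared = packages[candidate].conflicts
    return sorted(name for name in enabled
                  if name in declared
                  or candidate in packages[name].conflicts)


packages = {package.name: package for package in [
    Package("cooldev-fansofbloat", conflicts={"greatskill-fansofbloat"}),
    Package("greatskill-fansofbloat"),
    Package("unrelated-fix"),
]}
enabled = {"cooldev-fansofbloat", "unrelated-fix"}
```

A settings UI could use the returned list either to block enabling the candidate or merely to warn, which would leave room for the "superposition" case.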

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

As a response to https://hachettebugs.koszko.org/issues/38#change-111, I imagined a bag would contain a list of css files analogous to a list of scripts. This is, however, not urgent (for now CSS can also be embedded inside script) and should be easy to add later, so I concluded it should be left for now.

As to your other ideas, I was not commenting, because I have nothing to add

EDIT: Another IMHO cool idea would be to actually facilitate storing 2 versions of a script: a minified and a non-minified one (and the same for minified CSS files, too!). The user could then choose which version (minified or not) to inject. Btw, I noticed Debian puts JavaScript libraries under /usr/share/javascript/. Each script there comes in both a minified and a non-minified version. Hydrilla will (by default) look for scripts under /var/lib/hydrilla/content. We can thus make symlinks there to actual Debian-supplied scripts, dropping some of the burden and getting some cleanly packaged js libraries to start with :)

RE: scripts, bags and pages - redesign? - Added by jahoti 6 months ago

As a response to https://hachettebugs.koszko.org/issues/38#change-111, I imagined a bag would contain a list of css files analogous to a list of scripts. This is, however, not urgent (for now CSS can also be embedded inside script) and should be easy to add later, so I concluded it should be left for now.

That makes sense, and indeed, it is not at all a priority.

EDIT: Another IMHO cool idea would be to actually facilitate storing 2 versions of a script: a minified and a non-minified one (and the same for minified CSS files, too!). The user could then choose which version (minified or not) to inject. Btw, I noticed Debian puts JavaScript libraries under /usr/share/javascript/. Each script there comes in both a minified and a non-minified version. Hydrilla will (by default) look for scripts under /var/lib/hydrilla/content. We can thus make symlinks there to actual Debian-supplied scripts, dropping some of the burden and getting some cleanly packaged js libraries to start with :)

The idea of symlinking to Debian-supplied scripts would be really useful. I love it!

Storing minified and non-minified versions of files client-side (I assume that's what you meant?) could be interesting too; however, could you clarify where it might be used? I can understand maybe wanting a copy of the source as well as the "compiled" code, yet would have thought source maps or whatever they're called serve that purpose better. Nevertheless, if there's a reason I've missed, then that would definitely be a cool idea to implement!

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

Storing minified and non-minified versions of files client-side (I assume that's what you meant?) could be interesting too; however, could you clarify where it might be used? I can understand maybe wanting a copy of the source as well as the "compiled" code, yet would have thought source maps or whatever they're called serve that purpose better.

I know something like source maps exists but haven't yet learned how they work. Debian packages all three: non-minified js, source maps and minified js. If source maps indeed render original code unneeded, we can instead make it possible to store the maps in a bag. Or make all 3 possible to store simultaneously.

Anyway, the point is indeed to make the user able to easily (i.e. without downloading Debian-format source package) access the original code for reference/modifications but also have the minified version for when fast loading of page is preferred.

RE: scripts, bags and pages - redesign? - Added by jahoti 6 months ago

I know something like source maps exists but haven't yet learned how they work. Debian packages all three: non-minified js, source maps and minified js. If source maps indeed render original code unneeded, we can instead make it possible to store the maps in a bag. Or make all 3 possible to store simultaneously.

Anyway, the point is indeed to make the user able to easily (i.e. without downloading Debian-format source package) access the original code for reference/modifications but also have the minified version for when fast loading of page is preferred.

They might be a better fit here then. Source maps can contain the original files from which a bundled and minified script is built, along with mappings from locations in the compiled script to locations in the originals. That helps make debugging easier with supportive devtools, and IIRC the data is stored in a JSON file from which we could then recover the originals for modification.

That said, it will make things more difficult for us and mean even bigger files doing it this way.
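For reference, recovering the originals from such a JSON file could look roughly like the Python sketch below. It relies on the `sources` and optional `sourcesContent` fields of the version 3 source map format; the helper name `recover_sources` and the tiny example map (with an empty `mappings` string) are made up for illustration.

```python
import json
from typing import Dict


def recover_sources(source_map_text: str) -> Dict[str, str]:
    """Map each original file name to its embedded text, where present."""
    source_map = json.loads(source_map_text)
    names = source_map.get("sources", [])
    # "sourcesContent" is optional, and single entries may be null.
    contents = source_map.get("sourcesContent") or []
    return {name: text for name, text in zip(names, contents)
            if text is not None}


# A made-up map: only the first original file is embedded.
example_map = json.dumps({
    "version": 3,
    "file": "bundle.min.js",
    "sources": ["src/a.js", "src/b.js"],
    "sourcesContent": ["function a() { return 1; }\n", None],
    "names": [],
    "mappings": "",
})
```

Since the embedded text is optional, a map without `sourcesContent` would still require the non-minified files to be shipped separately.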

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

That said, it will make things more difficult

Let's consider this a low-priority feature for adding later on.

and mean even bigger files doing it this way.

Well, maps don't need to be always downloaded. Only when the user requests them

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

we could for logical completeness adopt a recursive structure analogous to the new bags/files system for pages/pattern->payload mappings

With time I came to think this is indeed a good idea.

As to the recursive structure for pages and pattern mappings, let me give an argument in favor of it in case some passersby happen to read this thread. We could have 3 fixes, one for each of Google Docs, Google Sheets and Google Drive. These ought to be separate - the user should be able to enable one and not the others. At the same time, it would be nice if a user who actually wishes to install the whole set of related fixes could do that conveniently. If we allow these 3 fixes to be implemented in separate packages and a 4th one to depend on those, everybody's happy.

Actually, now I would opt for an even more ambitious change to the schema I proposed: merging bags and pages into one kind of entity that has a recursive relationship. Why?

  1. Because managing 1 kind of entity is easier than managing 2 kinds.
  2. Because many js files or sets thereof are going to be site-specific, and so, many fixes would unnecessarily require creation of 2 entities.

Despite proposing it I am not 100% sure it's a good idea, so I am waiting for comments now

RE: scripts, bags and pages - redesign? - Added by jahoti 6 months ago

That's even better! If we do merge everything, however, and maybe even if we don't, the "script/bag/etc." editor might need to be shown in a separate view to the list of items; otherwise the settings will become far too unwieldy to use.

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

That's even better!

I am now troubled as to how we're going to define re-usable payloads :/

the "script/bag/etc." editor might need to be shown in a separate view to the list of items; otherwise the settings will become far too unwieldy to use.

A pop-up dialog like the ones used when importing settings or selecting payloads?

EDIT: Perhaps with some good styling the separate view would not be necessary. Not that it would be bad, though.

RE: scripts, bags and pages - redesign? - Added by jahoti 6 months ago

I am now troubled as to how we're going to define re-usable payloads :/

We could just define them as packages that don't contain any pattern->payload mappings, couldn't we? Admittedly, that would make settings_query somewhat more difficult.

the "script/bag/etc." editor might need to be shown in a separate view to the list of items; otherwise the settings will become far too unwieldy to use.
A pop-up dialog like the ones used when importing settings or selecting payloads?
EDIT: Perhaps with some good styling the separate view would not be necessary. Not that it would be bad, though.

Either of those could work; now that you mention it, however, avoiding a separate view might make it easier to apply whatever ends up being developed to the popup as well, which already has the issue of insufficient space.

RE: scripts, bags and pages - redesign? - Added by koszko 6 months ago

avoiding a separate view might make it easier to apply whatever ends up being developed to the popup as well, which already has the issue of insufficient space.

Actually, the popup window can be stretched a bit. I just happened to give it some pretty low temporary dimensions.
But anyway, I am going to do something about the popup. Maybe add tabs to it? Actually, I already slightly modified the order of popup contents as part of my (not yet committed) developments

RE: scripts, bags and pages - redesign? - Added by jahoti about 2 months ago

The new version looks great! Apart from signatures and related information, it looks to have everything we might need too.

The only aspect I don't quite understand is the UUID. Isn't the string identifier "+" dot-separated version sufficient for that purpose, or is there some particular use I'm missing that makes having a pre-defined/arbitrary UUID more suitable?

RE: scripts, bags and pages - redesign? - Added by koszko about 2 months ago

jahoti wrote:

The new version looks great! Apart from signatures and related information, it looks to have everything we might need too.

Really? I am constantly noticing shortcomings in what I prepare (just look at the number of edits...)

The only aspect I don't quite understand is the UUID. Isn't the string identifier "+" dot-separated version sufficient for that purpose, or is there some particular use I'm missing that makes having a pre-defined/arbitrary UUID more suitable?

This is purely to avoid having someone accidentally create a resource with the same name as another already-existing resource. Maybe there is a better way to prevent this that I haven't realized?

EDIT
My current thoughts regarding the schema I came up with so far:

  1. We're indicating licenses of files that are served. It doesn't work well in case of resources that need to be built. We should instead indicate licenses of unbuilt sources.
  2. The above means it would be best to always convey license information regarding the entire source of a Haketilo package, even when only one out of many of the package's built files is downloaded.
  3. My current way of indicating licenses is "reinventing the wheel".
    • I looked at REUSE today and I believe we could use it to generate SPDX documents for packages and use that. It seems waaaaay clearer than both my JSON definitions and Debian-format copyright file.
    • To make it easier to integrate existing code, we could optionally allow other copyright formats to be used, just as Debian allows a non-standard copyright file to be used.
  4. Python Hydrilla is currently meant to compute some values (auto license detection, etc.) and put them in JSON definitions it serves. Once we come to cryptographically sign those definitions, we'll need to have the computation happen beforehand, on the developer's machine.

RE: scripts, bags and pages - redesign? - Added by jahoti about 2 months ago

Really? I am constantly noticing shortcomings in what I prepare (just look at the number of edits...)

Admittedly, I am not very good at thinking through all the possibilities it may have to deal with :). Perhaps it would be more accurate to say the general principle looks great.

The only aspect I don't quite understand is the UUID. Isn't the string identifier "+" dot-separated version sufficient for that purpose, or is there some particular use I'm missing that makes having a pre-defined/arbitrary UUID more suitable?

This is purely to avoid having someone accidentally create a resource with the same name as another already-existing resource. Maybe there is a better way to prevent this that I haven't realized?

You were right I think; the suggestion essentially amounts to computing the UUID live from other parameters, which adds another layer of difficulty.

We're indicating licenses of files that are served. It doesn't work well in case of resources that need to be built. We should instead indicate licenses of unbuilt sources.

Indeed, in the eventual case of source packages we may need both sets of licenses indicated. Does that need to be accounted for in the current protocol?

I looked at REUSE today and I believe we could use it to generate SPDX documents for packages and use that. It seems waaaaay clearer than both my JSON definitions and Debian-format copyright file.

That's a good idea actually! The REUSE tool might also be usable then once we start accepting uploads from the general public, as a form of automated checking.

RE: scripts, bags and pages - redesign? - Added by koszko about 2 months ago

The only aspect I don't quite understand is the UUID. Isn't the string identifier "+" dot-separated version sufficient for that purpose, or is there some particular use I'm missing that makes having a pre-defined/arbitrary UUID more suitable?

This is purely to avoid having someone accidentally create a resource with the same name as another already-existing resource. Maybe there is a better way to prevent this that I haven't realized?

You were right I think; the suggestion essentially amounts to computing the UUID live from other parameters, which adds another layer of difficulty.

Other parameters? What do you mean? The only thing that's as unique as UUID is the identifier.

Anyway, it seems we so far agree that UUIDs are OK and may stay in the schema.

We're indicating licenses of files that are served. It doesn't work well in case of resources that need to be built. We should instead indicate licenses of unbuilt sources.

Indeed, in the eventual case of source packages we may need both sets of licenses indicated.

Both sets? I'd rather indicate the licensing of sources in all cases, regardless of whether those are then built or not. Btw, that's what Debian does :) It uses an (already known to you) copyright file to describe the licenses of source files and installs it under /usr/share/doc/<package-name>.

Does that need to be accounted for in the current protocol?

I initially thought we could make the changes later. But now that I see so many shortcomings in the current protocol and actually have an idea of what it should look like, I think it'd be better to make it good from the beginning. We'll be adding things to it in the future, of course, but many of the things we plan could be added without breaking backward compatibility if we make the initial protocol sane...

I looked at REUSE today and I believe we could use it to generate SPDX documents for packages and use that. It seems waaaaay clearer than both my JSON definitions and Debian-format copyright file.

That's a good idea actually! The REUSE tool might also be usable then once we start accepting uploads from the general public, as a form of automated checking.

Indeed. Does the REUSE-generated SPDX report contain the actual license texts? If not, we'd take from each package a REUSE-style LICENSES/ directory together with the SPDX report (and maybe also the DEP5 file if it proves needed and is present) and distribute that as the copyright info of each package. We'd then modify Haketilo to store files by their sha256 hashes so that even when many packages use GPLv3, it only gets downloaded and stored once.
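The storing-by-hash idea can be sketched as follows. This is an in-memory Python illustration only (the extension itself would use browser storage); `FileStore`, its methods, the file path and the placeholder license text are all hypothetical.

```python
import hashlib
from typing import Dict


class FileStore:
    """In-memory stand-in for a store that keys files by SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store `data` unless already present and return its hex digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


store = FileStore()
# Placeholder bytes standing in for the full license text.
gpl_text = b"(placeholder for the text of the GPLv3)"

# Two packages reference the same license; each keeps only a digest.
package_a = {"LICENSES/GPL-3.0-or-later.txt": store.put(gpl_text)}
package_b = {"LICENSES/GPL-3.0-or-later.txt": store.put(gpl_text)}
```

A client that already holds a given digest could also skip the download entirely, which is where the bandwidth saving would come from.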

RE: scripts, bags and pages - redesign? - Added by jahoti about 2 months ago

Does the REUSE-generated SPDX report contain the actual license texts?

Only for licenses not in the official list; however, it shouldn't be too hard to patch the relevant code (https://git.fsfe.org/reuse/tool/src/branch/master/src/reuse/report.py#L166) for that purpose.

As for everything else, it appears there are many things I am either confused about or didn't know. I will have to look further into this!

RE: scripts, bags and pages - redesign? - Added by koszko about 2 months ago

Only for licenses not in the official list; however, it shouldn't be too hard to patch the relevant code (https://git.fsfe.org/reuse/tool/src/branch/master/src/reuse/report.py#L166) for that purpose.

Let's not... After the policy smuggling methods we tried, I think we've already used up our limit for ugly hacks...
This will do. In most cases the licenses will belong to those from the list hence we won't be wasting space for them anyway :)

EDIT

As for everything else, it appears there are many things I am either confused about or didn't know. I will have to look further into this!

Not that I knew about everything from the beginning. I first looked into the REUSE specification a few days ago. I also first read an IndexedDB API guide this week.
