[rb-general] SOURCE_PREFIX_MAP format specification proposals
Daniel Shahaf
danielsh at apache.org
Fri Jan 13 04:45:02 CET 2017
Ximin Luo wrote on Thu, Jan 12, 2017 at 13:40:00 +0000:
> Daniel Shahaf:
> > My opinion:
> >
> > a) Avoid the characters 0x00 through 0x1F inclusive (note: this includes tab)
> >
> > b) Avoid the character 0x7F (DEL)
> >
> > This is to make things as portable as possible. (Even through
> > screenshots, printouts, contexts that don't permit control characters
> > (like email subject lines), etc)
> >
> > [..]
> >
>
> Could you explain in some more detail what you mean by "portable" and what you think are important use-cases? A comparison table like I did for characters in the first email, would be very useful.
I mean a value that can round-trip through as many contexts as
possible.
Does it work through

    x=`env`;
    ⋮
    env $x /foo/bar

? (at(1) does this)
Can it be printed and typed back in from the hardcopy?
Can it be typed in *anywhere*, even where .XCompose or $EDITOR is not
available? (e.g., Notepad on a friend's computer, or a friend's phone)
Can I write it on a whiteboard?
Can I dictate it over the phone?
If the value is interpolated into an error message that I cannot
copy-paste, can I unambiguously recover the value? (Many contexts
wouldn't escape [0x01..0x1F] in error messages)
Can I put it in an email subject line? In a git log message? In
a source code comment?
The surest way to meet all these requirements is to stick to 0x21
through 0x7E. Conversely, using anything below 0x20 is a sure way to
break most of these criteria — even if it is copy-paste-able in many
contexts.
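
To make the 0x21 through 0x7E criterion concrete, here is a minimal
Python sketch of a check for it. The function name is_portable is made
up for illustration and is not part of any proposal in this thread.

```python
def is_portable(value: str) -> bool:
    """Return True if every character lies in 0x21..0x7E inclusive.

    This rejects control characters (0x00..0x1F), space (0x20),
    DEL (0x7F) and anything outside ASCII.
    Hypothetical helper; an empty string trivially passes.
    """
    return all(0x21 <= ord(ch) <= 0x7E for ch in value)
```

A value that passes this check contains only printable, non-whitespace
ASCII, which is the subset argued for above.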
These are largely the same criteria as in your table; the difference is
probably that I consider [0x01..0x1F] insufficiently "round-trippable"
regardless of whether they happen to copy-paste correctly in a few
situations I checked.
(That difference may be due to my background. I'm used to security
design, where something is not considered working merely because it was
tested and found to work; something is only considered working when one
has proven that it can't possibly fail under any situation within the
system's parameters.)
>
> > 2) Like (1a) but escape any backslash and colon in the keys and values
> > with a backslash; that is:
> >
> > [..]
> >
> > This format is fully general, printable non-whitespace ASCII only,
> > round-trips through everything including dead trees and screenshot, and
> > so on. Yes, parsing this requires more code than just split(), which
> > means the consumers' code is a little longer; but in return the
> > producers' code is dead simple. I think that'd be a good trade-off to
> > make.
> >
> > [..]
> >
>
> If this would be similar to SOURCE_DATE_EPOCH, we would have more
> consumers to convince and they would also approach it more critically,
> than producers (most of whom already like reproducible builds and
> treat it with significant priority).
I expect an encoding that's restricted to [0x21, 0x7E] to make the
standard easier for people to adopt: envvars are traditionally
restricted to this subset, notwithstanding that getenv(3)'s return type
allows any NUL-terminated string.
My point was that SOURCE_PREFIX_MAP will have more producers than it
will have consumers.
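
For a rough idea of the producer/consumer asymmetry under the
backslash-escaping proposal quoted above, here is a Python sketch. The
exact layout of that format is elided in the quote ([..]), so this only
illustrates the escaping step; it assumes, for illustration, that an
unescaped colon separates fields. The names escape and split_escaped
are hypothetical.

```python
from typing import List


def escape(field: str) -> str:
    """Producer side: backslash-escape backslashes, then colons."""
    return field.replace("\\", "\\\\").replace(":", "\\:")


def split_escaped(text: str) -> List[str]:
    """Consumer side: split on unescaped colons and undo the escaping.

    Assumption: ':' is the field separator. Only '\\\\' and '\\:' are
    accepted as escape sequences; anything else raises ValueError.
    """
    fields: List[str] = []
    current: List[str] = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt not in ("\\", ":"):
                raise ValueError("invalid or dangling backslash escape")
            current.append(nxt)
        elif ch == ":":
            # Unescaped colon: the current field ends here.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

The producer needs a single line of string replacement, while the
consumer needs a small loop instead of a plain split().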
> Anyway, I will play with your code samples in the meantime, thanks for those!
You're welcome.