[rb-general] SOURCE_PREFIX_MAP format specification proposals

Daniel Shahaf danielsh at apache.org
Fri Jan 13 04:45:02 CET 2017

Ximin Luo wrote on Thu, Jan 12, 2017 at 13:40:00 +0000:
> Daniel Shahaf:
> > My opinion:
> > 
> > a) Avoid the characters 0x00 through 0x1F inclusive (note: this includes tab)
> > 
> > b) Avoid the character 0x7F (DEL)
> > 
> > This is to make things as portable as possible.  (Even through
> > screenshots, printouts, contexts that don't permit control characters
> > (like email subject lines), etc)
> > 
> > [..]
> > 
> Could you explain in some more detail what you mean by "portable" and what you think are important use-cases? A comparison table like I did for characters in the first email, would be very useful.

I mean a value that can round-trip through as many contexts as
possible:

Does it work through
    env $x /foo/bar
?  (at(1) does this)

Can it be printed and typed back in from the hardcopy?

Can it be typed in *anywhere*, even where .XCompose or $EDITOR is not
available?  (e.g., Notepad on a friend's computer, or a friend's phone)

Can I write it on a whiteboard?

Can I dictate it over the phone?

If the value is interpolated into an error message that I cannot
copy-paste, can I unambiguously recover the value?  (Many contexts
wouldn't escape [0x01..0x1F] in error messages)

Can I put it in an email subject line?  In a git log message?  In
a source code comment?

The surest way to meet all these requirements is to stick to 0x21
through 0x7E.  Conversely, using anything below 0x20 is a sure way to
break most of these criteria, even if it is copy-paste-able in many
contexts.

These are largely the same criteria as in your table; the difference is
probably that I consider [0x01..0x1F] insufficiently "round-trippable"
regardless of whether they happen to copy-paste correctly in a few
situations I checked.

(That difference may be due to my background.  I'm used to security
design, where something is not considered working merely because it was
tested and found to work; something is only considered working when one has
proven that it can't possibly fail under any situation within the
system's parameters.)
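
To make the [0x21..0x7E] criterion concrete, here is a minimal sketch of
a check a producer could run before exporting a value.  The function
names are hypothetical and not part of any proposal; the only thing
taken from the discussion above is the byte range itself.

```python
def is_portable(value):
    """Return True if every character lies in 0x21..0x7E inclusive.

    This rejects control characters (0x00..0x1F, including tab),
    space (0x20), DEL (0x7F) and anything non-ASCII.
    """
    return all(0x21 <= ord(ch) <= 0x7E for ch in value)


def offending(value):
    """List (index, hex code) for each character outside 0x21..0x7E."""
    return [
        (i, "0x%02X" % ord(ch))
        for i, ch in enumerate(value)
        if not 0x21 <= ord(ch) <= 0x7E
    ]
```

A value that passes is_portable() can be written on a whiteboard or
read over the phone without inventing a notation for invisible
characters; offending() is only there to produce a useful error message.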


> > 2) Like (1a) but escape any backslash and colon in the keys and values
> > with a backslash; that is:
> > 
> > [..]
> > 
> > This format is fully general, printable non-whitespace ASCII only,
> > round-trips through everything including dead trees and screenshot, and
> > so on.  Yes, parsing this requires more code than just split(), which
> > means the consumers' code is a little longer; but in return the
> > producers' code is dead simple.  I think that'd be a good trade-off to
> > make.
> > 
> > [..]
> > 
> If this would be similar to SOURCE_DATE_EPOCH, we would have more
> consumers to convince and they would also approach it more critically,
> than producers (most of whom already like reproducible builds and
> treat it with significant priority).

I expect an encoding that's restricted to [0x21, 0x7E] would make the
standard easier for people to adopt: envvars are traditionally
restricted to this subset, notwithstanding that getenv(3)'s return type
allows any NUL-terminated string.

My point was that SOURCE_PREFIX_MAP will have more producers than it
will have consumers.
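
To illustrate the trade-off, here is a rough sketch of both sides of
the backslash-escaping idea.  It is not the format from the proposal
(that part is elided above): it assumes, purely for illustration, that
fields are joined with a bare ":" and that a backslash makes the next
character literal.  The names escape() and split_escaped() are
hypothetical.

```python
def escape(field):
    """Producer side: backslash-escape backslashes and colons."""
    # Backslashes go first, so the ones added for colons are not doubled.
    return field.replace("\\", "\\\\").replace(":", "\\:")


def split_escaped(text):
    """Consumer side: split on unescaped colons and undo the escaping."""
    fields, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Take the next character literally (assumption: any
            # character may follow a backslash, not only "\" and ":").
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of input")
            current.append(nxt)
        elif ch == ":":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

The producer is two replace() calls; the consumer is a short loop
instead of a single split().  Note that escaping alone does not keep
the value within [0x21, 0x7E]; that still depends on what the fields
contain.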

> Anyway, I will play with your code samples in the meantime, thanks for those!

You're welcome.
