[rb-general] BUILD_PATH_PREFIX_MAP code examples and test cases
infinity0 at debian.org
Mon Feb 13 18:39:00 CET 2017
> Sorry about the delay replying.
> I think the original proposal's %+ %; %% were OK. I think the
> problems with %% are overblown. But to satisfy those who think %% is
> a problem, and avoid encodings which expose Unix shell metacharacters
> (and avoid adding /s), I suggest
> = => %+ (mnemonic: same key on many keyboards)
> : => %. (mnemonic: visually similar)
> % => %# (weak mnemonic: both are quite full character cells)
> (The other characters which meet all the nice-to-haves are ^ @ , ~
> and are, I think, less memorable. @ is a metacharacter in Perl
> ""-strings, too, and "," seems a poor choice.)
> These have the following good properties:
> * If filenames do not contain = : % then no encoding is needed.
> * If a filename can be written unquoted in Unix shell, so can its
> encoding.
> * Decoding does not involve resolving the semi-ambiguity of `%%'.
> * Word-breaking algorithms based on [A-Za-z0-9]+ [-A-Za-z0-9]+
> [_A-Za-z0-9]+ will treat encoding of punctuation as punctuation.
>> This isn't meant to be a generic communications encoding; I don't know why anyone would do what you're suggesting, and the % character (or any other reasonable character we could use) would already mess with "word boundary" algorithms.
> I think you have misunderstood my objection to %p %e %c.
> Suppose I have a PREFIX_MAP value that mentions a directory
> "blork=wombat.d", which is encoded as "blork%ewombat.d". Such
> filenames are not likely to occur other than as formulaic compositions
> by a build system, or similar, so it is likely that "wombat" is an
> interesting token.
> The encoding "blork%ewombat.d" is suboptimal because it "looks" like
> it was made out of "ewombat", rather than "wombat". Examples where
> this might be annoying:
> $ less +/'\bwombat\b' build.log # misses the mention in PREFIX_MAP
> $ printenv | grep '\bwombat\b' # misses the mention in PREFIX_MAP
> double-click on wombat in an xterm selects "ewombat", not "wombat"
> Of course more formal setups would probably not make the assumption
> which "ewombat" violates. But I think we would prefer to avoid
> misleading users who type informal and ad-hoc shell runes, and to
> avoid breaking their finger macros, etc.
> Encoding punctuation characters as somewhat different punctuation
> characters avoids this problem. In my suggestion you end up with
> "blork%+wombat.d", which matches \bwombat\b
> Of course it doesn't match \bblork=wombat\b, but someone who is typing
> that will hopefully suspect that the = will cause trouble and
> search for \bblork.*wombat\b or something, perhaps even \bwombat\b.
> Sorry for perpetuating this bike shed conversation.
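A minimal Python sketch of the escaping proposed in the quoted message (= to %+, : to %., % to %#), used to check the word-boundary argument with Python's re module. For plain ASCII input like this, \b should behave the same way as in the less and grep examples above. The names ESCAPES, encode() and WOMBAT are illustrative only, not taken from any existing implementation.

```python
import re

# Escape table from the proposal quoted above (illustrative names).
ESCAPES = {"%": "%#", "=": "%+", ":": "%."}


def encode(path: str) -> str:
    # Map character by character, so an existing "%" is never escaped twice.
    return "".join(ESCAPES.get(c, c) for c in path)


WOMBAT = re.compile(r"\bwombat\b")

# "=" becomes "%+"; "+" is not a word character, so \b still finds
# a boundary in front of "wombat".
print(encode("blork=wombat.d"))                             # blork%+wombat.d
print(WOMBAT.search(encode("blork=wombat.d")) is not None)  # True

# With the letter-based %e escape, "e" is a word character, so the
# token reads as "ewombat" and a \b-anchored search misses it.
print(WOMBAT.search("blork%ewombat.d") is not None)         # False

# Paths without = : % pass through unchanged.
print(encode("/usr/src/pkg-1.0"))                           # /usr/src/pkg-1.0
```

The same check passes for any escape whose second character is punctuation, which is why letters such as p, e and c are the problematic choice here.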
Thanks for the very detailed reply and explanations! I actually realised my "already mess with" comment was wrong right after I sent that email, but didn't have anything else to say at the time so I just left it.
I think I generally agree with this, and I wasn't *too* pleased with the %pec stuff myself, but I thought picking another symbol would be "too random". I think % -> %# is fine though. I also had another idea:
% -> %@
= -> %-
: -> %.
The % sign could be thought of as "doubling" the next character, with @ used instead of 0/o to avoid "word" characters. Anyway, this is very easy to change in the code so I'll wait for a few more days in case anyone else wants to comment, after which I'll pick one of these schemes at random.
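Both schemes are just a three-entry table, so one way to keep the choice cheap is to build the encoder and decoder from that table. The Python sketch below is hypothetical (make_codec, QUOTED and ALTERNATIVE are made-up names). It assumes every escape is % followed by exactly one distinct character, which lets decoding be a single left-to-right pass with no %%-style ambiguity. Rejecting any other character after % is a choice made for this sketch.

```python
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[str], str], Callable[[str], str]]


def make_codec(table: Dict[str, str]) -> Codec:
    """Build (encode, decode) from a table such as {"%": "%#", ...}.

    Assumes each escape is "%" plus exactly one character, and that
    those second characters are all distinct.
    """
    reverse = {esc[1]: ch for ch, esc in table.items()}

    def encode(s: str) -> str:
        # Character-by-character mapping: "%" is never escaped twice.
        return "".join(table.get(c, c) for c in s)

    def decode(s: str) -> str:
        # Single pass: each "%" consumes exactly one following character.
        out = []
        i = 0
        while i < len(s):
            if s[i] != "%":
                out.append(s[i])
                i += 1
                continue
            if i + 1 == len(s) or s[i + 1] not in reverse:
                raise ValueError(f"malformed escape at offset {i} in {s!r}")
            out.append(reverse[s[i + 1]])
            i += 2
        return "".join(out)

    return encode, decode


# The two tables discussed in this thread.
QUOTED = {"%": "%#", "=": "%+", ":": "%."}       # proposal quoted above
ALTERNATIVE = {"%": "%@", "=": "%-", ":": "%."}  # alternative in this reply

sample = "blork=wombat.d:100%"
for name, table in (("quoted", QUOTED), ("alternative", ALTERNATIVE)):
    enc, dec = make_codec(table)
    encoded = enc(sample)
    assert dec(encoded) == sample  # round trip
    print(name, encoded)
# Prints:
# quoted blork%+wombat.d%.100%#
# alternative blork%-wombat.d%.100%@
```

Switching between the proposals only changes the table passed to make_codec; the encode and decode logic stays the same.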