[rb-general] BUILD_PATH_PREFIX_MAP format spec, draft #1
Ximin Luo
infinity0 at debian.org
Sat Jan 21 21:49:00 CET 2017
Daniel Shahaf:
> [..]
>
> "Postprocess the value" sounds like you mean:
>
> BUILD_PATH_PREFIX_MAP=$(printf %s "$BUILD_PATH_PREFIX_MAP" | base64 -e)
>
> [..]
Envvars never get transmitted directly between two systems like this. What I meant was that they should invent their own way of communicating arbitrary envvars, that can handle arbitrary data inside the names or values.
> [..] It would be better for data that is not encodeable in the
> envvar's value to be transmitted out-of-band and the envvar
> reconstructed to a conforming value by the recipient.
I don't understand what you mean by "data that is not encodeable in the envvar's value", nor what you mean by "transmitted out-of-band". Environment variable values don't get "transmitted" anywhere; you have to expressly read the value and turn it into a string, in which case you are transmitting a string, not an envvar. You should then "postprocess" this string (to reuse my earlier terminology) if you expect it to contain characters that aren't suited to your transmission protocol or your recipient.
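To make "postprocess" and "reverse" concrete, here is a minimal Python sketch. It assumes base64 as the transport wrapping, which is only one possible choice and not something the draft mandates; the function names read_envvar_bytes, postprocess and reverse are hypothetical.

```python
import base64
import os


def read_envvar_bytes(name: str = "BUILD_PATH_PREFIX_MAP") -> bytes:
    """Expressly read the envvar and turn it into a byte string.

    os.fsencode() undoes the decoding Python applied when it built
    os.environ, so on POSIX this recovers the original raw bytes.
    """
    return os.fsencode(os.environ.get(name, ""))


def postprocess(value: bytes) -> str:
    """Sender side: wrap the raw value so it survives a text-only channel.

    base64 is an assumption made for this sketch, not part of the draft.
    """
    return base64.b64encode(value).decode("ascii")


def reverse(payload: str) -> bytes:
    """Recipient side: undo postprocess() to get the original value back."""
    return base64.b64decode(payload)
```

The recipient would then place the result of reverse() back into its own environment before invoking any tool that reads BUILD_PATH_PREFIX_MAP.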
>> You SHOULD NOT hexencode any additional characters in the _enquote() step, or
>> anything equivalent to this. Although decoders can process this correctly, this
>> is only meant to simplify implementations of the decode algorithm and not as a
>> general data-encoding mechanism. In particular, this only works if T is "bytes"
>> on the transmitter side. For other T types, you would need some extra encoding
>> step similar to the previous paragraph *anyways* and augmenting _enquote would
>> split your logic across two places - not clean.
>
> I'm just lost here. It sounds like it'd be a lot easier to specify that
> the platform must specify an encoding of T to bytes — Windows, for
> example, could specify UTF-16BE (since big endian is the network byte
> order) — and then the decoding process is two-stepped: first decode
> %-encoding to get a sequence of bytes, then decode that sequence into
> a sequence of T. (The "outer" encoding/decoding would be a no-op when
> T == bytes.) [..]
This is basically what I said in the paragraph above ("postprocess" and "reverse") except that I'm leaving the exact method open for the future because it really is a separate concern and we don't need to finalise that at the moment.
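For concreteness, the two-step scheme from the quote could look roughly like this in Python. This is only a sketch of that suggestion, not the draft's _enquote() step: it uses urllib's generic percent-encoding, which escapes more characters than the draft recommends, and it takes UTF-16BE as the platform encoding purely because that is the example given above. The names encode_path and decode_path are hypothetical.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Assumption for this sketch: the platform maps T to bytes via UTF-16BE.
PLATFORM_ENCODING = "utf-16-be"


def encode_path(path: str) -> str:
    """Outer step (T -> bytes), then inner step (bytes -> %-encoded text)."""
    raw = path.encode(PLATFORM_ENCODING)
    # safe="" so that ":" and "=" are escaped as well.
    return quote_from_bytes(raw, safe="")


def decode_path(field: str) -> str:
    """Inner step (%-decode to bytes), then outer step (bytes -> T)."""
    raw = unquote_to_bytes(field)
    return raw.decode(PLATFORM_ENCODING)
```

When T is already "bytes", the outer step is a no-op and only the percent-decoding remains.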
>
>> On the other hand, if you expect that your paths do *not* contain such
>> characters, e.g. if they only contain printable ASCII characters, then you
>> could transmit the value of BUILD_PATH_PREFIX_MAP as-is.
>>
>> Rejected options
>> ================
>>
>> - Any variant of backslash-escape, because it is annoying to implement in
>> higher-level languages. Backslash-escape is an encoding that is optimised for
>> being typed manually by humans, but I don't expect that will be a major
>> use-case for this encoding.
>
> Both of these are your subjective opinions, not objective properties of
> backslash escaping.
>
> Regarding interactive use, I'd say that backslash encoding is superior to
> URL-encoding since it doesn't involve looking up byte values, but that
> neither of them is the holy grail of UX.
>
> Regarding ease of implementation...
>
> import re
>
> def decode(s):
>     "Decode a $PATH-with-backslash-escaping-encoded value into a list"
>     return \
>         "".join(
>             '\0' if x == ':' else x[-1]
>             for x in re.compile(r'[\\]?.').findall(s)
>         ).split('\0')
>
Sure, but I didn't want to make this dependent on regex either, since every language does those very slightly differently. That makes it more time-consuming to verify that an implementation follows the spec exactly, including handling all the error conditions, and that it behaves the same way as another implementation in a different language.
split(":"), without having to worry about a backslash before the colon, is an operation that really is available in every language and is implemented in obviously the same way everywhere. It probably takes <1/100 the amount of time to glance over and understand, compared with a regexp or a for loop.
All of these factors are objective differences between urlencode and backslash-escape. I shortened them all to "annoying" because I'd really like to close this topic ASAP and bury the bikeshed.
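As a rough illustration of the split-based approach, here is a minimal Python sketch. It assumes a ":"-separated list in which any literal ":" or "%" inside an element has been percent-encoded (as %3A and %25); that is a simplification for this example, not the draft's exact decode algorithm, and decode_list is a hypothetical name. Note that urllib's unquote_to_bytes passes malformed escapes through unchanged, so a real implementation would need to add the spec's error handling.

```python
from typing import List
from urllib.parse import unquote_to_bytes


def decode_list(value: str) -> List[bytes]:
    """Split on ":" first, then percent-decode each element.

    Because a literal ":" never appears unescaped inside an element,
    a plain split is enough; no lookbehind, regex or state machine.
    """
    if value == "":
        return []
    return [unquote_to_bytes(part) for part in value.split(":")]
```

For example, decode_list("/a%3Ab:/c%25d:/e") yields the three elements b"/a:b", b"/c%d" and b"/e".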
X
--
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git