[rb-general] SOURCE_PREFIX_MAP format specification proposals

Ximin Luo infinity0 at debian.org
Sat Jan 14 11:45:00 CET 2017


Daniel Kahn Gillmor:
> On Thu 2017-01-12 22:45:02 -0500, Daniel Shahaf wrote:
>> I expect an encoding that's restricted to [0x21, 0x7E] makes the
>> standard easier for people to adopt: envvars are traditionally
>> restricted to this subset, notwithstanding that the getenv(3) return
>> type allows any NUL-terminated string.
>>
>> My point was that SOURCE_PREFIX_MAP will have more producers than it
>> will have consumers.
> 
> the deployment challenge here is that we need the consumers to be
> willing to adopt it first; without any consumers, there will be no
> producers.
> 
> That said, i find myself swayed by Daniel's arguments for
> round-trippable data.
> 

I agree that the principles expressed are very important for a communications protocol or encoding system, but that is not what SOURCE_PREFIX_MAP is. The only way it will be transmitted is via .buildinfo files, which already quote all envvar values (see deb-buildinfo(1)).

But sure, it's a "nice to have".

> Daniel's 1a ("key1:value1:key2:value2" on non-windows,
> "key1;value1;key2;value2" on windows) seems the simplest to me.  People
> already know how to parse $PATH, and this is just that with an even
> number of values.
> 
> If people complain about colons, it's the same problem they're already
> used to dealing with with $PATH, right?
> 

PATH does not escape colons; it simply disallows paths that contain colons.
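
To make the limitation concrete, here is a minimal sketch of a PATH-style parser for the "key1:value1:key2:value2" form. The function name parse_path_style is hypothetical and the separator is assumed to be ":" (the non-Windows case); it is not taken from any existing implementation.

```python
def parse_path_style(value, sep=":"):
    """Split a PATH-like string into (key, value) pairs.

    There is no escaping: every occurrence of `sep` is a field boundary,
    so a path that itself contains `sep` cannot be represented.
    """
    parts = value.split(sep)
    if len(parts) % 2 != 0:
        raise ValueError("expected an even number of fields")
    return list(zip(parts[0::2], parts[1::2]))


print(parse_path_style("/build/dir:/src:/home/u:/usr"))

# Trying to map the single path "/a:b" to "x" produces three fields,
# so the parser has no way to tell where the key ends.
try:
    parse_path_style("/a:b:x")
except ValueError as exc:
    print("rejected:", exc)
```

With an even number of stray colons the input would not even be rejected; it would silently parse into the wrong pairs.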

I had another idea last night which is to use urlencoding, so it would look like this:

SOURCE_PREFIX_MAP=/a/b%3Dyyy=ERROR&/a=lol&/b=foo&/a/b%3Dyyy=secret

The minor difference is that application/x-www-form-urlencoded requires almost all non-alphanumeric characters to be escaped, but we would only require %=;& to be escaped, which is easier to write in a Makefile. (This matters for producers that append to the variable.)
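
As a rough illustration of the producer side, the sketch below escapes only the four characters %=;& and appends one mapping to an existing value. The helper names escape_component and append_mapping are hypothetical, and joining entries with "&" is assumed from the example above.

```python
def escape_component(s):
    """Percent-escape only '%', '=', ';' and '&'."""
    # '%' is handled first so the '%' introduced by later escapes
    # is not escaped a second time.
    for ch in "%=;&":
        s = s.replace(ch, "%{:02X}".format(ord(ch)))
    return s


def append_mapping(current, src, dst):
    """Return `current` with the mapping src -> dst appended."""
    pair = escape_component(src) + "=" + escape_component(dst)
    return current + "&" + pair if current else pair


value = append_mapping("", "/a/b=yyy", "ERROR")
value = append_mapping(value, "/a", "lol")
print(value)  # /a/b%3Dyyy=ERROR&/a=lol
```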

For parsing it, in Python 3 it's just urllib.parse.parse_qsl(SOURCE_PREFIX_MAP), and in C it would be a simple adaptation of the escaping code that Daniel already posted earlier.
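
A minimal sketch of the Python 3 side, using the example value from above. The wrapper name parse_prefix_map is hypothetical; urllib.parse.parse_qsl is the standard-library call. Two caveats that a real consumer would need to consider: parse_qsl also decodes "+" as a space, which the proposed escape set does not cover, and older Python releases additionally treat ";" as a separator.

```python
from urllib.parse import parse_qsl


def parse_prefix_map(value):
    """Return the (source, destination) pairs in their original order."""
    # keep_blank_values=True keeps entries such as "/a=" (empty destination)
    # instead of silently dropping them.
    return parse_qsl(value, keep_blank_values=True)


pairs = parse_prefix_map("/a/b%3Dyyy=ERROR&/a=lol&/b=foo&/a/b%3Dyyy=secret")
print(pairs)
# [('/a/b=yyy', 'ERROR'), ('/a', 'lol'), ('/b', 'foo'), ('/a/b=yyy', 'secret')]

# Caveat: a literal '+' in a path is turned into a space by parse_qsl.
print(parse_prefix_map("/c++=x"))
# [('/c  ', 'x')]
```

The result is kept as an ordered list rather than a dict, since the same key may appear more than once and how duplicates are resolved is a separate decision.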

The benefit over backslash-escaping is that if producers of the variable "over-escape" their values by accident (e.g. they escape / to %2f), naively-written consumers that unescape all %nn codes will still read these values fine. By contrast, with backslash-escaping we'd need to add extra code to support "expected" escape codes like \n \f \t etc.
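
The difference can be sketched as follows. The function naive_backslash_unescape is a hypothetical stand-in for a simple consumer that just drops the backslash; it is not code from this thread.

```python
import re
from urllib.parse import unquote

# Over-escaped and minimally-escaped percent forms decode to the same path.
assert unquote("%2fa%2fb") == unquote("/a/b") == "/a/b"


def naive_backslash_unescape(s):
    """Drop each backslash and keep the character that follows it."""
    return re.sub(r"\\(.)", r"\1", s)


# A producer that writes the two characters backslash + 'n', meaning a
# newline, is misread: the consumer yields a plain 'n' instead.
decoded = naive_backslash_unescape("a\\nb")
print(repr(decoded))  # 'anb'
```

To read such values as intended, the backslash consumer would need a table of named escapes, whereas the %nn decoder needs no special cases.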

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git