[rb-general] SOURCE_PREFIX_MAP format specification proposals
infinity0 at debian.org
Sat Jan 14 11:45:00 CET 2017
Daniel Kahn Gillmor:
> On Thu 2017-01-12 22:45:02 -0500, Daniel Shahaf wrote:
>> I expect an encoding that's restricted to [0x21, 0x7E] makes the
>> standard easier for people to adopt: envvars are traditionally
>> restricted to this subset, notwithstanding that getenv(3)'s return type
>> allows any NUL-terminated string.
>> My point was that SOURCE_PREFIX_MAP will have more producers than it
>> will have consumers.
> the deployment challenge here is that we need the consumers to be
> willing to adopt it first; without any consumers, there will be no
> That said, i find myself swayed by Daniel's arguments for
> round-trippable data.
I agree that the principles expressed are very important for a communications protocol or encoding system, but that is not what SOURCE_PREFIX_MAP is. The only way it will be transmitted is via .buildinfo files, which already quote all envvar values (see deb-buildinfo(1)).
But sure, it's a "nice to have".
> Daniel's 1a ("key1:value1:key2:value2" on non-windows,
> "key1;value1;key2;value2" on windows) seems the simplest to me. People
> already know how to parse $PATH, and this is just that with an even
> number of values.
> If people complain about colons, it's the same problem they're already
> used to dealing with with $PATH, right?
PATH does not escape colons; it just disallows paths containing colons.
I had another idea last night which is to use urlencoding, so it would look like this:
The minor difference is that application/x-www-form-urlencoded requires all non-alphanumeric characters to be escaped, but we would only require the characters %, =, ; and & to be escaped, which is easier to write in a Makefile when appending to the variable.
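
To illustrate the producer side, here is a minimal sketch of appending one mapping while escaping only those four characters. The function names (escape_component, append_mapping) and the choice of "&" as the pair separator are my assumptions for illustration, not part of any agreed format.

```python
def escape_component(text):
    """Percent-encode only the four reserved characters: % = ; &"""
    # '%' is handled first so that the escapes produced for the other
    # characters are not themselves escaped a second time.
    for char in "%=;&":
        text = text.replace(char, "%{:02X}".format(ord(char)))
    return text


def append_mapping(current, source, target):
    """Return `current` with one extra source=target pair appended.

    Assumption: pairs are joined with '&'; the thread has not fixed this.
    """
    entry = escape_component(source) + "=" + escape_component(target)
    return current + "&" + entry if current else entry
```

A build tool would call append_mapping() with the existing value of the variable (or an empty string) and export the result.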
Parsing it in python3 is just urllib.parse.parse_qsl(SOURCE_PREFIX_MAP), and in C it would be a simple adaptation of the escaping code that Daniel already posted earlier.
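
A rough sketch of the Python consumer, under some assumptions: the wrapper name parse_prefix_map is hypothetical, and pairs are assumed to be separated by "&". Two caveats I am aware of: recent Python versions split only on "&" by default (";" needs an explicit separator argument where that is supported), and parse_qsl decodes "+" as a space, so a path containing "+" would have to be written as %2B by the producer.

```python
from urllib.parse import parse_qsl


def parse_prefix_map(value):
    """Decode a urlencoded SOURCE_PREFIX_MAP string into (source, target) pairs.

    keep_blank_values=True keeps pairs whose target is the empty string.
    """
    return parse_qsl(value, keep_blank_values=True)


# Typical use (the variable may be unset, hence the default):
#   import os
#   pairs = parse_prefix_map(os.environ.get("SOURCE_PREFIX_MAP", ""))
```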
The benefit over backslash-escaping is that if producers of the variable "over-escape" their values by accident (e.g. they escape / to %2f), naively-written consumers that unescape all %nn codes will still read these values fine. By contrast, with backslash-escaping we'd need to add extra code to support "expected" escape codes like \n, \f, \t, etc.
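
The over-escaping point can be checked with a naive consumer that simply unescapes every %nn code. This is only a sketch: decode_pairs is a hypothetical helper, and "&" as the pair separator is again my assumption.

```python
from urllib.parse import unquote


def decode_pairs(value):
    """Naive consumer: split on '&' and '=', then unescape every %nn code."""
    pairs = []
    for item in value.split("&"):
        if not item:
            continue  # tolerate empty segments such as a trailing '&'
        key, _, val = item.partition("=")
        pairs.append((unquote(key), unquote(val)))
    return pairs


# A minimally escaped value and an accidentally over-escaped one.
minimal = "/build/src=/usr/src"
over_escaped = "%2fbuild%2fsrc=%2fusr%2fsrc"
```

Both strings decode to the same single pair, so the consumer needs no special handling for producers that escape more than required.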