[rb-general] SOURCE_PREFIX_MAP format specification proposals

Daniel Shahaf danielsh at apache.org
Sat Jan 14 20:01:24 CET 2017


Ximin Luo wrote on Sat, Jan 14, 2017 at 10:45:00 +0000:
> > That said, I find myself swayed by Daniel's arguments for
> > round-trippable data.
> > 
> 
> I agree that the principles expressed are very important for
> a communications protocol or encoding system, but this is not what
> SOURCE_PREFIX_MAP is. The only way it will be transmitted is via
> .buildinfo files, which already quote all envvar values (see
> deb-buildinfo(1)).

Successful things tend to be used beyond their original scope.
Facilitating this means making it easy to reuse your thing, to compose
it with other things, and to plug it into them.  (For example, that's one
reason why CLI tools are designed around the "line-based files are the
universal interface" paradigm.)

Incidentally, this is also an argument in favour of using either URL
encoding or $PATH-style non-encoding (no escaping at all) rather than
any backslashy design.

> For parsing it, in python3 it's just
> urllib.parse.parse_qsl(SOURCE_PREFIX_MAP) and in C it would be
> a simple adaptation of the escaping code that Daniel already posted
> earlier.

For what it's worth, that C code was buggy:

[[[
--- a/1.c
+++ b/1.c
@@ -32,6 +32,7 @@
             switch (*p) {
                 case '\\': p += 2; continue;
                 case ':': *p = '\0'; p++; continue;
+                default: p++; continue;
             }
         }
 
@@ -42,7 +43,7 @@
             while (*p) p++;
             /* On the last iteration, the following line will cause p to be
              * a one-past-the-end pointer — which is well-defined */
-            ++p;
+            while (!*p) ++p;
         } while (p <= END);
 
         return ret;
]]]

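It also needs to strdup() the return value of getenv() for POSIX
compliance: it writes NUL bytes into that string in place, and POSIX
does not allow modifying the string getenv() returns.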
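
For illustration only, here is a self-contained sketch of such a splitter,
written from scratch rather than as a patch to the earlier code (which
isn't quoted here).  It assumes a $PATH-like value of ':'-separated fields
in which a backslash quotes whatever character follows it; split_escaped()
and its signature are hypothetical.  It advances past every character it
examines, so it cannot stall the way the unpatched switch did, and it
copies the getenv() result before modifying it:

```c
#define _POSIX_C_SOURCE 200809L  /* expose strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from any existing tool): split a ':'-separated
 * value in which a backslash quotes the following character.
 *
 * It works on a strdup()ed copy because POSIX says the string returned by
 * getenv() must not be modified.  On success it stores up to max_fields
 * pointers into that copy, hands the copy to the caller via *copy_out
 * (the caller must free() it) and returns the number of fields.  It returns
 * -1 on allocation failure, a dangling trailing backslash, or too many
 * fields; in those cases nothing needs to be freed. */
int split_escaped(const char *value, char **fields, int max_fields,
                  char **copy_out)
{
    assert(value != NULL && fields != NULL && copy_out != NULL);
    if (max_fields < 1)
        return -1;

    char *copy = strdup(value);
    if (copy == NULL)
        return -1;

    /* Unescape in place: dst never overtakes src, so no extra buffer. */
    char *src = copy, *dst = copy;
    int n = 0;
    fields[n++] = dst;
    while (*src != '\0') {
        if (*src == '\\') {
            if (src[1] == '\0') {       /* backslash with nothing to quote */
                free(copy);
                return -1;
            }
            *dst++ = src[1];            /* keep the quoted char literally */
            src += 2;
        } else if (*src == ':') {
            *dst++ = '\0';              /* terminate the current field */
            src++;
            if (*src == '\0')           /* trailing ':' ends the list */
                break;
            if (n == max_fields) {
                free(copy);
                return -1;
            }
            fields[n++] = dst;
        } else {
            *dst++ = *src++;            /* ordinary char: always advance */
        }
    }
    *dst = '\0';
    *copy_out = copy;
    return n;
}
```

A caller would fetch `getenv("SOURCE_PREFIX_MAP")`, pass it in only if it
is non-NULL, and then read fields[0], fields[1], ... as alternating keys
and values (that pairing is itself an assumption about the format).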

> The benefit over backslash-escaping is that if producers of the
> variable "over-escape" their values by accident (e.g. they escape / to
> %2f), naively-written consumers that unescape all %nn codes will still
> read these values fine. By contrast, with backslash-escaping we'd need
> to add extra code to support "expected" escape codes like \n \f \t
> etc.

printf(3) escape sequences in SOURCE_PREFIX_MAP?  YAGNI.  The required
functionality is achieved without them, and there is plenty of precedent
for using backslash-escaping without supporting printf(3) escape
sequences.

For the backslash route, I would recommend either of:

15) Producers may backslash-escape any character, and a backslash
followed by any character means that character.

    % export SOURCE_PREFIX_MAP='foo\n:bar:'
    % printf %s "$SOURCE_PREFIX_MAP" | cut -c4
    \
    % printf %s "$SOURCE_PREFIX_MAP" | cut -c5
    n
    % parse_S_P_M
    { "foon": "bar" }

16a) Producers MUST NOT backslash-escape any character other than
backslash itself and the delimiter characters (the one between key and
value, and the one between one key-value pair and the next).  Consumers
MUST issue fatal errors when such an escape is encountered.

    % export SOURCE_PREFIX_MAP='foo\n:bar:'
    % parse_S_P_M
    parse_S_P_M: error: invalid escape sequence at offset 4

16b) The same as 16a, but with "Consumers MUST NOT use
$SOURCE_PREFIX_MAP values that contain such escapes and SHOULD issue
fatal errors".  (In case there's a context that can't just abort().)

    % export SOURCE_PREFIX_MAP='foo\n:bar:'
    % parse_S_P_M
    parse_S_P_M: warning: invalid escape sequence at offset 4
    { }
    % parse_S_P_M -Werror
    parse_S_P_M: error: invalid escape sequence at offset 4
    % 
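
A sketch of the validation that 16a and 16b call for, assuming ':' is used
for both delimiters (as in the examples above) and reporting 1-based
offsets to match them.  find_invalid_escape() and check_spm() are
hypothetical names; the werror flag stands in for the -Werror switch of
the imaginary parse_S_P_M:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumption: ':' is the only delimiter (key/value and pair alike).  If the
 * two delimiters differ, list both here. */
static const char SPM_DELIMS[] = ":";

/* Return 0 if every backslash escapes either a backslash or a delimiter,
 * else the 1-based offset of the first offending backslash. */
size_t find_invalid_escape(const char *s)
{
    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != '\\')
            continue;
        char next = s[i + 1];
        /* strchr() would match the terminating NUL, so test it explicitly */
        if (next != '\0' && (next == '\\' || strchr(SPM_DELIMS, next) != NULL)) {
            i++;                        /* skip over the escaped character */
            continue;
        }
        return i + 1;                   /* disallowed or dangling escape */
    }
    return 0;
}

/* werror != 0: rule 16a (always fatal), or 16b run with -Werror.
 * werror == 0: rule 16b default (warn, then ignore the whole value).
 * Returns 1 if the value may be used, 0 if it must be ignored, -1 if fatal. */
int check_spm(const char *s, int werror)
{
    size_t off = find_invalid_escape(s);
    if (off == 0)
        return 1;
    fprintf(stderr, "parse_S_P_M: %s: invalid escape sequence at offset %zu\n",
            werror ? "error" : "warning", off);
    return werror ? -1 : 0;
}
```

Under 16a a consumer would always call check_spm(s, 1); under 16b it would
pass 0 by default and treat a 0 result as an empty map, like the `{ }`
output above.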

All that said, urlencoding would work just as well (assuming we pick
delimiter characters that all URL libraries quote by default).

Daniel

