[rb-general] BUILD_PATH_PREFIX_MAP format spec, draft #1

Ximin Luo infinity0 at debian.org
Mon Jan 23 23:14:00 CET 2017


After experimenting with various test cases today:

https://github.com/infinity0/rb-prefix-map/tree/master/consume/testcases

Ximin Luo:
> [..]
> def _dequote(part):
>     subs = part.split("%")
>     # Will raise if there are <2 chars after % or if these aren't valid hex
>     return subs[0] + "".join(chr(int(sub[0:2], 16)) + sub[2:] for sub in subs[1:])
>     # In a lower-level language one would manually iterate through the string,
>     # see bottom of this email for an example.
> [..]

^this part turns out to be problematic. Specifically, languages differ in how they represent characters >127 in strings. For example, some languages will decode %ff into "ÿ" (U+00FF), but this is then actually represented as the UTF-8 byte sequence 0xC3 0xBF, either when writing the value back out or when comparing it against external values that are read into the program.
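A quick illustration in Python, used here only as one example of a language whose strings hold code points rather than bytes. The point is that the "same" decoded value turns into different bytes depending on which encoding is used to write it back out:

```python
# Decoding the old-style "%ff" escape yields the code point U+00FF,
# not the single byte 0xFF.
decoded = chr(int("ff", 16))
print(repr(decoded))                       # 'ÿ'

# Writing that string back out as UTF-8 produces two bytes...
print(decoded.encode("utf-8"))             # b'\xc3\xbf'
# ...so it no longer matches the raw byte the map was meant to contain.
print(decoded.encode("utf-8") == b"\xff")  # False

# Only an encoding that maps U+00FF back to 0xFF (e.g. Latin-1) recovers it,
# and which encoding a program uses varies by language and environment.
print(decoded.encode("latin-1") == b"\xff")  # True
```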

To avoid this I'll drop the hex-encoding, and instead do a simpler mapping:

'%' -> '%%'
'=' -> '%+'
':' -> '%;'
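A minimal sketch of the new mapping in Python. The names `enquote`/`dequote` are hypothetical, and rejecting any other %-code with an error is an assumption made only for this sketch, since that behaviour isn't settled yet:

```python
# Hypothetical helpers for the draft escaping rules:
#   '%' -> '%%', '=' -> '%+', ':' -> '%;'
# Unknown %-codes are rejected here; that is an assumption, not the draft.

_ENCODE = {ord("%"): "%%", ord("="): "%+", ord(":"): "%;"}
_DECODE = {"%": "%", "+": "=", ";": ":"}


def enquote(part: str) -> str:
    """Escape '%', '=' and ':' in one map component."""
    # str.translate substitutes every character in a single pass,
    # so a '%' produced by an escape is never escaped again.
    return part.translate(_ENCODE)


def dequote(part: str) -> str:
    """Reverse enquote(), scanning left to right one escape at a time."""
    out = []
    i = 0
    while i < len(part):
        c = part[i]
        if c != "%":
            out.append(c)
            i += 1
            continue
        if i + 1 >= len(part):
            raise ValueError("dangling '%' at end of input")
        code = part[i + 1]
        if code not in _DECODE:
            raise ValueError(f"unknown escape '%{code}'")
        out.append(_DECODE[code])
        i += 2
    return "".join(out)


if __name__ == "__main__":
    original = "/build/foo=1.0:bar%baz"
    encoded = enquote(original)
    print(encoded)  # /build/foo%+1.0%;bar%%baz
    assert dequote(encoded) == original
```

The left-to-right scan matters: chaining `str.replace` calls would mis-decode "%%+" (the encoding of "%+") in either replacement order. Since every escape involves only ASCII characters, bytes >127 pass through untouched, which is what avoids the problem above.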

Hopefully this is easier to type as well: on most keyboards, each of those characters is on the same key as its replacement.

How other %-codes are handled will depend on what's easiest to code up.

X

-- 
GPG: ed25519/56034877E1F87C35
GPG: rsa4096/1318EFAC5FBBDBCE
https://github.com/infinity0/pubkeys.git

