author Anders Svensson <[email protected]> 2014-03-19 17:57:31 +0100
committer Anders Svensson <[email protected]> 2014-03-21 09:14:01 +0100
commit 2aa88958de6b07f35eea5e26a65adb69619daa7b
tree d5ad9d921300d60c6cf7b040f1b8ba2ba2a45376
parent fdcdaca338849d7f63d4300e489318f6ee275d82
Adapt dictionary compilation to new default encoding
The change in default encoding to utf8 in 17.0, in commit 00e42967, changes the behaviour of erl_parse:abstract/1, which the dictionary compiler uses to turn terms into abstract code. In particular, the compiler transforms the orddict representation of a parsed dictionary to construct the return value of a dictionary module's dict/0 function. This orddict contains various lists, one of which is a list of tuples of the form

  {Name, Code, [VendorId], [Avp]}

where Name is an ASCII string and VendorId is a non-negative integer.

Using erl_parse:abstract/2 instead allows a string encoding to be specified, but regardless of the encoding used, the result of transforming this tuple might not be what we really want, which is for Name to always be represented as a string form and [VendorId] to always be represented as a cons form: [VendorId] will always end up as a string form if the integers are small enough. The only way around this is to transform the tuple bit by bit, but modifying the code to do so is quite a lot of work for not much gain: it would be nice to produce more readable output, but nothing stops working without it.

This commit restores the pre-17.0 conversion by explicitly specifying latin1 as the string encoding to erl_parse:abstract/2. The utf8 encoding broke the compilation of some dictionaries since unicode strings aren't expected when writing the generated code to file.

Note that the latin1 encoding does reasonably well in practice, although it mangles the Ericsson Vendor Id list [193] into a "LATIN CAPITAL LETTER A WITH ACUTE". The utf8 encoding does worse, also mangling the 3GPP Vendor Id list [10415] into a Braille pattern character (U+28AF). An ascii encoding would do better than latin1 but doesn't yet exist. (Encoding isn't really what the option is. It's a string predicate: if the predicate is true for every element of a list then the list is represented as a string form, otherwise as a cons form.)
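The effect described in the commit message can be reproduced with a minimal sketch like the one below. The module name abstract_demo and the function demo/0 are hypothetical and are not part of the commit; only erl_parse:abstract/2 and its {encoding, latin1 | utf8} option are taken from the text above. The annotation field of each abstract form is matched with _ since its exact value is not relevant here.

```erlang
-module(abstract_demo).
-export([demo/0]).

%% Hypothetical demo: show how the encoding option acts as a string
%% predicate that picks between a string form and a cons form.
demo() ->
    Latin1 = [{encoding, latin1}],
    Utf8   = [{encoding, utf8}],

    %% Ericsson Vendor Id list: 193 passes the latin1 predicate, so the
    %% list is represented as a string form.
    {string, _, [193]} = erl_parse:abstract([193], Latin1),

    %% 3GPP Vendor Id list: 10415 fails the latin1 predicate, so the
    %% list stays a cons form, which is what we want.
    {cons, _, {integer, _, 10415}, {nil, _}} =
        erl_parse:abstract([10415], Latin1),

    %% 10415 is a valid Unicode code point, so with utf8 the same list
    %% becomes a string form.
    {string, _, [10415]} = erl_parse:abstract([10415], Utf8),
    ok.
```

If the three pattern matches behave as the commit message describes, abstract_demo:demo() should return ok; a badmatch would indicate that the running OTP release treats one of these lists differently.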