author | Anders Svensson <[email protected]> | 2014-03-19 17:57:31 +0100 |
---|---|---|
committer | Anders Svensson <[email protected]> | 2014-03-21 09:14:01 +0100 |
commit | 2aa88958de6b07f35eea5e26a65adb69619daa7b (patch) | |
tree | d5ad9d921300d60c6cf7b040f1b8ba2ba2a45376 /HOWTO | |
parent | fdcdaca338849d7f63d4300e489318f6ee275d82 (diff) | |
Adapt dictionary compilation to new default encoding
The problem is that the change in default encoding to utf8 in 17.0, in
commit 00e42967, changes the behaviour of erl_parse:abstract/1, which is
used by the dictionary compiler to turn terms into abstract code. In
particular, it transforms the orddict representation of a parsed
dictionary to construct the return value of a dictionary module's dict/0
function. This orddict contains various lists, one of which is a list of
tuples of the form
{Name, Code, [VendorId], [Avp]}
where Name is an ASCII string and VendorId is a non-negative integer.
Using erl_parse:abstract/2 instead allows a string encoding to be
specified, but whatever encoding is used, the result of transforming
our tuple might not be what we want. Name should always be represented
as a string form and [VendorId] as a cons form, but [VendorId] always
ends up as a string form if its integers are small enough. The only way
around this is to transform the tuple piece by piece, but modifying the
code to do so is a lot of work for little gain: more readable output
would be nice, but nothing stops working without it.
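For illustration only, a piecewise transformation of one such tuple might
look like the sketch below. The module and function names are
hypothetical and not part of the dictionary compiler, and the Avp
elements are assumed to be terms that can simply be passed to
erl_parse:abstract/2.

```erlang
-module(avp_abstract).
-export([grouped/1]).

%% Hypothetical helper: abstract {Name, Code, [VendorId], [Avp]} piece by
%% piece, so that Name is always a string form and the vendor id list is
%% always a cons form, whatever the integer values are.
grouped({Name, Code, VendorIds, Avps}) ->
    {tuple, 0, [{string, 0, Name},
                {integer, 0, Code},
                cons([{integer, 0, V} || V <- VendorIds]),
                %% Assumption: each Avp can be abstracted as a whole.
                cons([erl_parse:abstract(A, [{encoding, latin1}]) || A <- Avps])]}.

%% Build a cons form from a list of already abstracted elements.
cons(Forms) ->
    lists:foldr(fun(F, Acc) -> {cons, 0, F, Acc} end, {nil, 0}, Forms).
```

Passing the result to erl_parse:normalise/1 should give back the
original tuple, which is a quick way to check the hand-built forms.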
This commit restores the pre-17.0 conversion by explicitly specifying
latin1 as the string encoding to erl_parse:abstract/2. The utf8 encoding
broke the compilation of some dictionaries since unicode strings aren't
expected when writing the generated code to file.
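As a minimal sketch of the kind of call involved (the helper below is
hypothetical and is not the code changed by this commit), a term can be
abstracted with an explicit latin1 encoding and then pretty-printed as
source text:

```erlang
-module(dict_abstract).
-export([to_source/1]).

%% Hypothetical helper: turn a term into Erlang source text, asking
%% erl_parse:abstract/2 to use string forms only for latin1 characters.
to_source(Term) ->
    Form = erl_parse:abstract(Term, [{encoding, latin1}]),
    lists:flatten(erl_pp:expr(Form)).
```

With this option a list such as [10415] is kept as a list of integers in
the generated text rather than becoming a unicode string literal.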
Note that the latin1 encoding does reasonably well in practice, although
it mangles the Ericsson Vendor Id list [193] into a "LATIN CAPITAL
LETTER A WITH ACUTE". The utf8 encoding does worse, mangling the 3GPP
Vendor Id 10415 into "BRAILLE PATTERN DOTS-123468" (U+28AF). An ascii encoding
would do better than latin1 but doesn't yet exist. (Encoding isn't
really what the option is. It's a string predicate: if the predicate is
true then represent as a string form, otherwise a cons form.)
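The predicate view can be sketched as follows. This assumes an OTP
release in which the encoding option of erl_parse:abstract/2 also
accepts a fun of arity 1, which postdates this commit; the module name
and the Ascii fun are illustrative only.

```erlang
-module(abstract_pred).
-export([kinds/1]).

%% Report whether erl_parse:abstract/2 produces a string form or a cons
%% form for List under three different predicates.
kinds(List) ->
    %% Assumption: a fun is accepted as encoding in the OTP release used.
    Ascii = fun(C) -> C < 128 end,
    [{Label, element(1, erl_parse:abstract(List, [{encoding, Enc}]))}
     || {Label, Enc} <- [{latin1, latin1}, {utf8, utf8}, {ascii, Ascii}]].
```

Under that assumption, kinds([193]) returns
[{latin1,string},{utf8,string},{ascii,cons}] and kinds([10415]) returns
[{latin1,cons},{utf8,string},{ascii,cons}].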