otp.git - Mirror of Erlang/OTP repository.

author	Anders Svensson	2014-03-19 17:57:31 +0100
committer	Anders Svensson	2014-03-21 09:14:01 +0100
commit	2aa88958de6b07f35eea5e26a65adb69619daa7b
tree	d5ad9d921300d60c6cf7b040f1b8ba2ba2a45376 /system
parent	fdcdaca338849d7f63d4300e489318f6ee275d82

Adapt dictionary compilation to new default encoding

The problem is that the change in default encoding to utf8 in 17.0, in commit 00e42967, changes the behaviour of erl_parse:abstract/1, which is used by the dictionary compiler to turn terms into abstract code. In particular, it transforms the orddict representation of a parsed dictionary to construct the return value of a dictionary module's dict/0 function. This orddict contains various lists, one of which is a list of tuples of the form {Name, Code, [VendorId], [Avp]}, where Name is an ASCII string and VendorId is a non-negative integer.

Using erl_parse:abstract/2 instead allows a string encoding to be specified, but regardless of which encoding is used, the result of transforming our tuple might not be what we really want, which is for Name to always be represented as a string form and [VendorId] to always be represented as a cons form: the [VendorId] will always end up as a string form if the integers are small enough. The only way around this is to transform the tuple bit by bit, but modifying the code to do this is quite a lot of work for not much gain: it would be nice to produce more readable output, but nothing stops working without it.

This commit restores the pre-17.0 conversion by explicitly specifying latin1 as the string encoding to erl_parse:abstract/2. The utf8 encoding broke the compilation of some dictionaries, since unicode strings aren't expected when writing the generated code to file.

Note that the latin1 encoding does reasonably well in practice, although it mangles the Ericsson Vendor Id list [193] into a "LATIN CAPITAL LETTER A WITH ACUTE". The utf8 encoding does worse, additionally mangling the 3GPP Vendor Id list [10415] into a "BRAILLE PATTERN DOTS-123468" (decimal 10415 is U+28AF). An ascii encoding would do better than latin1 but doesn't yet exist. (Encoding isn't really what the option is. It's a string predicate: if the predicate is true for every element of a list then the list is represented as a string form, otherwise as a cons form.)
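The effect of the encoding option on the Vendor Id lists mentioned above can be checked with a small sketch like the following. The module name abstract_demo and the AVP name "Example-AVP" are made up for illustration and are not part of the dictionary compiler; the sketch only assumes that erl_parse:abstract/2 accepts the {encoding, latin1 | utf8} option as it does in 17.0.

```erlang
-module(abstract_demo).
-export([run/0]).

%% Each match below crashes with badmatch if the abstract form
%% is not the one described in the commit message.
run() ->
    %% latin1: 193 is below 256, so the Vendor Id list [193]
    %% is turned into a string form rather than a cons form.
    {string, _, [193]} =
        erl_parse:abstract([193], [{encoding, latin1}]),

    %% latin1: 10415 is not a latin1 character, so the list
    %% keeps its cons form.
    {cons, _, {integer, _, 10415}, {nil, _}} =
        erl_parse:abstract([10415], [{encoding, latin1}]),

    %% utf8: 10415 is a valid code point, so this list becomes
    %% a string form as well.
    {string, _, [10415]} =
        erl_parse:abstract([10415], [{encoding, utf8}]),

    %% A whole tuple of the form {Name, Code, [VendorId], [Avp]}
    %% (hypothetical values): Name is a string form as wanted,
    %% but so is the Vendor Id list.
    {tuple, _, [{string, _, "Example-AVP"},
                {integer, _, 1},
                {string, _, [193]},
                {nil, _}]} =
        erl_parse:abstract({"Example-AVP", 1, [193], []},
                           [{encoding, latin1}]),
    ok.
```

Calling abstract_demo:run() returns ok when all four matches succeed, which is consistent with latin1 leaving larger Vendor Ids such as 10415 alone while still converting small ones such as 193.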

Diffstat (limited to 'system')

0 files changed, 0 insertions, 0 deletions

