Found a peculiar situation that stumped us for quite some time.
There is a HashMap ht = {ShareStatus=True} and an ArrayList att = [ShareStatus].
When I call ht.get((String) att.get(0)), it returns null.
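Upon close inspection, I found that the key in the HashMap is not the same string as the list element. Viewed in ISO-8859-1 encoding, the key carries an extra leading character that is invisible in normal output, so the map still prints as {ShareStatus=True}.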
This is caused by the zero-width no-break space (U+FEFF), which serves as the byte order mark (BOM) and appears at the start of certain files.
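To make the failure concrete, here is a minimal sketch that reproduces it. It assumes the key picked up a leading U+FEFF when it was read from a file; the class and method names (BomKeyDemo, mapWithBomKey, escapeNonAscii) are made up for illustration and are not from our code base.

```java
import java.util.HashMap;
import java.util.Map;

class BomKeyDemo {
    // U+FEFF: zero-width no-break space, also used as the byte order mark.
    static final char BOM = '\uFEFF';

    // Builds a map whose only key carries a leading BOM, as if it had been
    // read from the first line of a BOM-prefixed file (an assumption).
    static Map<String, String> mapWithBomKey() {
        Map<String, String> ht = new HashMap<>();
        ht.put(BOM + "ShareStatus", "True");
        return ht;
    }

    // Renders every character outside printable ASCII as a Unicode escape
    // so that hidden characters become visible in logs.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7E) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> ht = mapWithBomKey();

        // The BOM has zero width, so the printed map looks perfectly normal.
        System.out.println(ht);

        // The plain key does not match the BOM-prefixed key.
        System.out.println(ht.get("ShareStatus"));

        // Escaping the key reveals the hidden leading character.
        for (String key : ht.keySet()) {
            System.out.println(escapeNonAscii(key));
        }
    }
}
```

Dumping suspicious keys through something like escapeNonAscii is a quick way to confirm whether a hidden character is the reason two strings that look identical do not compare equal.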
We generally use Java's String.trim() to remove leading and trailing whitespace from text, but that does not help here: trim() only strips characters with code points up to U+0020, so it does not handle the many other spaces and breaks defined in the Unicode standard, U+FEFF included.
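The limits of trim() are easy to check. The sketch below is only an illustration (TrimCheck and trimRemoves are hypothetical names): it tests a few characters against trim() and against Character.isWhitespace(), which also does not classify U+FEFF as whitespace.

```java
class TrimCheck {
    // Returns true if String.trim() strips the given character when it
    // appears at the start of a string.
    static boolean trimRemoves(char c) {
        return (c + "x").trim().equals("x");
    }

    public static void main(String[] args) {
        // Space, tab, no-break space, em space, zero-width no-break space (BOM).
        char[] samples = {' ', '\t', '\u00A0', '\u2003', '\uFEFF'};
        for (char c : samples) {
            System.out.println(String.format(
                "U+%04X trim=%b isWhitespace=%b",
                (int) c, trimRemoves(c), Character.isWhitespace(c)));
        }
    }
}
```

Only the space and the tab are removed by trim() here. Character.isWhitespace() additionally accepts the em space, but it rejects both the no-break space and U+FEFF, so a helper built purely on that method would still leave the BOM in place.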
We might have to come up with our own API to handle this or rely on available APIs such as Apache Commons Lang.
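As a starting point for a home-grown API, something along these lines might do. It is a sketch using only the JDK, not a finished utility: BomStripper, stripLeadingBom and normalizeKeys are hypothetical names, and it assumes the only unwanted character is a single U+FEFF at the very start of the string.

```java
import java.util.HashMap;
import java.util.Map;

class BomStripper {
    // U+FEFF: zero-width no-break space, also used as the byte order mark.
    static final char BOM = '\uFEFF';

    // Removes one leading BOM if present; returns the input unchanged otherwise.
    static String stripLeadingBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    // Returns a copy of the map with a leading BOM stripped from every key.
    static Map<String, String> normalizeKeys(Map<String, String> in) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : in.entrySet()) {
            out.put(stripLeadingBom(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ht = new HashMap<>();
        ht.put(BOM + "ShareStatus", "True");

        // After normalization the plain key finds the value again.
        Map<String, String> clean = normalizeKeys(ht);
        System.out.println(clean.get("ShareStatus"));
    }
}
```

Stripping the BOM once, at the point where the file is read, is probably cleaner than normalizing every map afterwards, but the map version is shown because that is where we first noticed the problem.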
For more details on whitespace, refer to the links in the mail and the ones listed below:
- whitespace? what's that? - https://spreadsheets.google.com/pub?key=pd8dAQyHbdewRsnE5x5GzKQ
- Unicode Look up - http://unicodelookup.com/#ShareStatus/1
- zero-width no-break character's properties - http://www.fileformat.info/info/unicode/char/feff/index.htm
- Unicode spaces - http://www.cs.tut.fi/~jkorpela/chars/spaces.html
- Byte-order-mark wiki link - https://en.wikipedia.org/wiki/Byte_order_mark
- Strange working of java's String.trim() - http://closingbraces.net/2008/11/11/javastringtrim/
Note:
1. This issue is almost always seen in Windows environments and not on Linux.
2. Please suggest the following solution to customers using PowerShell, to remove the BOM character from the output file itself.