This document adds MSL to UNICODE translations to Table D-1 in Appendix D of the HP PCL/PJL PCL 5 Comparison Guide. The RedTitan equivalent table index is shown on the right.
The Comparison Guide goes on to define a subset of character collections, indexed by Unicode, in Table D-4 (right).
All characters referenced in Table D-1 are now defined in the RedTitan MSL/UNICODE master database.
Page references point to tables that can be compared visually with the graphics in the PCL Comparison Guide, so that MSL numbers can be identified.
The character names are based on the HP PCL Comparison Guide, Xerox documentation or the Unicode.org name.
The RedTitan MSL/UNICODE master database defines about 619 MSL numbers as follows...
0..6, 8..64, 66..97, 99..303, 305..333, 335, 338..342, 400..407, 410..411, 414..423, 428..429, 432..435, 438..463, 466..471, 474..487, 500..615, 617..669, 1000..1006, 1017..1021, 1023, 1028, 1030..1031, 1034, 1036, 1040..1045, 1047, 1060..1065, 1067..1069, 1084..1116
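To check whether a given MSL number falls inside the defined set, the range notation above can be expanded mechanically. The following is a minimal sketch, not part of the RedTitan database tooling; the function name `parse_msl_ranges` and the shortened `SAMPLE` string are illustrative assumptions.

```python
def parse_msl_ranges(spec):
    """Expand a spec such as '0..6, 8..64, 335' into a set of integers."""
    numbers = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if ".." in item:
            # 'low..high' is an inclusive range
            low, high = item.split("..")
            numbers.update(range(int(low), int(high) + 1))
        else:
            numbers.add(int(item))
    return numbers


# Shortened copy of the list above, for illustration only.
SAMPLE = "0..6, 8..64, 66..97, 99..303, 305..333, 335, 338..342"
defined = parse_msl_ranges(SAMPLE)
print(7 in defined, 335 in defined)
```

Pasting the full list in place of `SAMPLE` gives a membership test for any MSL number.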
Character graphics in the following tables are based on the font Arial Unicode MS. This font contains all the required characters except those flagged *** NG (No Graphic) in the database. Currently only two characters fall into this category: U+2384 Composition symbol and U+03DD Greek small letter Digamma.
U+FFFF is used as the "Not Unicode" character. The character name is flagged *** NU.
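A lookup routine has to treat the U+FFFF sentinel as "no translation" rather than as a real character. The sketch below assumes a hypothetical in-memory layout (MSL number mapped to a code point and a name); the actual database format is not described here, and `SAMPLE_DB` and `msl_to_char` are illustrative names only.

```python
NOT_UNICODE = 0xFFFF  # sentinel described above for "Not Unicode"

# Hypothetical excerpt: MSL number -> (code point, character name).
# The three entries are taken from elsewhere in this document.
SAMPLE_DB = {
    216: (0x266B, "beamed eighth notes"),
    559: (NOT_UNICODE, "Vector Symbol *** NU"),
    1116: (NOT_UNICODE, "Visible End-of-File *** NU"),
}


def msl_to_char(msl, db=SAMPLE_DB):
    """Return the Unicode character for an MSL number, or None if there is none."""
    entry = db.get(msl)
    if entry is None:
        return None  # MSL number not defined in the database
    code_point, _name = entry
    if code_point == NOT_UNICODE:
        return None  # defined, but flagged as having no Unicode equivalent
    return chr(code_point)
```

A caller that needs to tell "undefined" from "defined but not Unicode" would return two distinct markers instead of `None` in both cases.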
Table D-4 Unicode index
ASCII | Bit 31
Latin 1 | Bit 30
Latin 2 | Bit 29
Latin 5 | Bit 28
Latin 6 | Bit 20
Desktop publishing | Bit 27
Accents | Bit 26
PCL | Bit 25
Mac | Bit 24
PostScript | Bit 23
Table D-1 MSL indexed
Collection flag | Page reference
Basic Latin (Bit 63) | 294, 295, 296, 297, 298, 299, 300
East European (Bit 62) | 301
Turkish (Bit 61) | 302
Baltic/Nordic/Latin-6 (Bit 60) | 303, 304
Math (Bit 34) | 305, 306, 307, 308
Semi-Graphics (Bit 33) | 309, 310, 311
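The bit numbers in the two tables can be read as positions in a single flag word. The sketch below assumes a 64-bit field in which bit 0 is the least significant bit, and it only reports which listed bits are set; what a set bit means (collection present or absent) is defined by the Comparison Guide and is not assumed here. The dictionary and function names are illustrative.

```python
# Bit numbers copied from the two tables above (assumed: bit 0 = least significant).
D1_MSL_BITS = {
    63: "Basic Latin",
    62: "East European",
    61: "Turkish",
    60: "Baltic/Nordic/Latin-6",
    34: "Math",
    33: "Semi-Graphics",
}
D4_UNICODE_BITS = {
    31: "ASCII",
    30: "Latin 1",
    29: "Latin 2",
    28: "Latin 5",
    20: "Latin 6",
    27: "Desktop publishing",
    26: "Accents",
    25: "PCL",
    24: "Mac",
    23: "PostScript",
}


def flagged_collections(flags, table):
    """List the collection names whose bit is set in flags, highest bit first."""
    return [
        name
        for bit, name in sorted(table.items(), reverse=True)
        if (flags >> bit) & 1
    ]


example = (1 << 63) | (1 << 34)
print(flagged_collections(example, D1_MSL_BITS))
```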
PCL MSL numbers in the database that have no Unicode equivalent:
- 0317 uppercase acute (Xerox EFF9)
- 0318 uppercase grave (Xerox EFF8)
- 0319 uppercase circumflex (Xerox EFF7)
- 0320 uppercase dieresis (Xerox EFF6)
- 0321 uppercase tilde (Xerox EFF5)
- 0322 uppercase caron (Xerox EFF4)
- 0323 uppercase ring above (Xerox EFF3)
- 0330 uppercase cedilla (Xerox EFF2)
- 1030 uppercase ogonek (Xerox EFF2)
- 1045 uppercase double acute (Xerox EFF0)
- 1085 uppercase macron (Xerox EFEF)
- 1087 uppercase breve (Xerox EFEE)
- 1089 uppercase overdot (Xerox EFED)
- 0661 Large solid box (Xerox EFFB)
- 0559 Vector Symbol
- 0560 Overline comp.
- 0520 Underline, Composite
- 0600 Top Left brace
- 0601 Middle Left brace
- 0602 Bottom Left brace
- 0603 Middle Curve Integral
- 0604 Top Left Summation
- 0605 Dbl Vert Line Composite (Arrows)
- 0606 Bottom Left Summation
- 0607 Bottom Diagonal Summation
- 0610 Top Right brace
- 0611 Middle Right brace
- 0612 Bottom brace
- 0613 Thick Vert Line, Composite
- 0614 Thin Vert Line, Composite
- 0615 Bottom Radical, Vert
- 0616 Top Right Summation, Composite
- 0617 Middle Summation
- 0618 Bottom Right Summation
- 0619 Top Diagonal Summation
- 0623 Mask Symbol, Sup
- 0640 Power Set Symbol
- 0643 Left double Bracket
- 0644 Middle double Bracket
- 0645 Right double Bracket
- 0648 Ext Lrg Union/Product
- 0649 Bottom Lrg Union
- 0650 Top Large Intersection
- 0651 Top Left double Bracket
- 0652 Bottom Left double Bracket
- 0657 Bottom Lrg Bott Product
- 0658 Top Large Top Product
- 0659 Top Right double Bracket
- 0660 Bottom Right double Bracket
- 0665 Horz. Arrow Ext
- 0666 Dbl Horz. Arrow Ex
- 0667 Complement of #617
- 1116 Visible End-of-File
These characters, unknown in UNICODE, fall into four main categories.
- UNICODE has no concept of uppercase floating accents. The first 13 unknown characters in the list above (0317 to 1089) do not exist in UNICODE. Xerox evidently chose to create fonts with these characters defined in the Unicode Private Use Area. In principle, these characters could be made from combining diacritics. The notional lowercase floating accents do exist; see Basic Latin, page 296.
- Except for box drawing, the world has moved away from gluing several small characters together to make one large one, and Unicode does not try (except, strangely, for the integral sign). A large number of the unknowns are just character fragments. This rather messes up Math (Bit 34), page 307.
- Should be in UNICODE but are not (or at least I can't find 'em).
- The pathological.
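The remark about combining diacritics in the first category can be sketched as follows. The pairing of each MSL accent number with a Unicode combining mark is a proposal based on the accent names in the list above, not something taken from the database, and `accent_letter` is a hypothetical helper. It builds an accented letter rather than a standalone uppercase-height accent glyph, so it sidesteps the missing characters rather than reproducing them.

```python
import unicodedata

# MSL floating-accent number -> Unicode combining mark
# (assumed pairing, chosen by accent name).
COMBINING = {
    317: "\u0301",   # combining acute accent
    318: "\u0300",   # combining grave accent
    319: "\u0302",   # combining circumflex accent
    320: "\u0308",   # combining diaeresis
    321: "\u0303",   # combining tilde
    322: "\u030C",   # combining caron
    323: "\u030A",   # combining ring above
    330: "\u0327",   # combining cedilla
    1030: "\u0328",  # combining ogonek
    1045: "\u030B",  # combining double acute accent
    1085: "\u0304",  # combining macron
    1087: "\u0306",  # combining breve
    1089: "\u0307",  # combining dot above
}


def accent_letter(base, accent_msl):
    """Combine a base letter with the mark for an MSL accent, NFC-normalized."""
    return unicodedata.normalize("NFC", base + COMBINING[accent_msl])


print(accent_letter("E", 317))
```

Where Unicode has a precomposed form, NFC normalization returns it as a single code point; otherwise the base letter and combining mark stay as a two-code-point sequence.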
MSL conversion requiring more research
There are a few characters that I should not have put into the database (they should be flagged unknown).
- MSL 1100 ® registered mark is treated as the same as 1103. It should be a serif version. (Shouldn't this be a font style issue?)
- MSL 1102 ™ trade mark is treated as the same as 1105. It should be a serif version.
- MSL 1101 © copyright mark is treated as the same as 1104. It should be a serif version.
- MSL 0216 U+266B beamed eighth notes does not look like the HP graphic.
- The round corners 646, 647, 655, 656 on page 307 won't join up!
- The bullets (both round and square) need some thought.
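One consequence of treating the serif and sans-serif marks as the same character is that the translation is many-to-one, so a Unicode-to-MSL reverse lookup cannot recover the original MSL number. A small sketch for spotting such collisions, using only the three pairs mentioned above; `SAMPLE_MAP` and `shared_code_points` are illustrative names.

```python
from collections import defaultdict

# MSL number -> code point, for the three pairs discussed above.
SAMPLE_MAP = {
    1100: 0x00AE, 1103: 0x00AE,  # registered mark
    1102: 0x2122, 1105: 0x2122,  # trade mark
    1101: 0x00A9, 1104: 0x00A9,  # copyright mark
}


def shared_code_points(mapping):
    """Return {code point: [MSL numbers]} for code points used more than once."""
    by_code_point = defaultdict(list)
    for msl, code_point in sorted(mapping.items()):
        by_code_point[code_point].append(msl)
    return {cp: msls for cp, msls in by_code_point.items() if len(msls) > 1}


for cp, msls in shared_code_points(SAMPLE_MAP).items():
    print(f"U+{cp:04X} <- MSL {msls}")
```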
MSL to UNICODE conversion tables