hidden object results from WMF import

Bug #942050 reported by David Mathog
Affects: Inkscape
Status: Fix Released
Importance: Low
Assigned to: David Mathog
Milestone: 0.49

Bug Description

Inkscape 0.48+devel Windows XP SP3

A WMF file generated by ACD Chemsketch contains the character delta in two positions, in the strings <delta>+ and <delta>-. When this is imported into Inkscape the + and - remain, but the delta goes into a very odd state. This is illustrated in the attached SVG, which has been edited down from the full WMF. The "delta" is not visible on the screen. It may be selected with "select all", but it cannot be selected with the arrow tool.

The SVG lines that correspond to this delta look like this (and I have no idea what those characters are supposed to be)

       transform="scale(0.99925522,1.0007453)"></text>

I understand that the primary character map issue may be in the (external?) wmf import routines. However, there needs to be some sanity checking so that invisible, unselectable characters are not created.

Requested change: imported characters which do not map to printable characters (which I think is what is going on here) should be replaced with a red "?" character. This is no more "wrong" than what we have now, and at least with the red question mark it would be obvious there is a problem, whereas with the current behavior, on a complex page, it is hard to see when a couple of characters have disappeared.

David Mathog (mathog) wrote :
David Mathog (mathog) wrote :

Note: the deltas are visible in Windows Preview.

David Mathog (mathog) wrote :

Note the two delta characters near the center.

David Mathog (mathog) wrote :

Just to clarify: when I say the "delta may be selected" I mean the point object corresponding to the delta may be selected. It just shows up as a tiny square; there is no indication that it is actually a character, let alone a "delta".

David Mathog (mathog) wrote :

Also, in the XML editor, the "text" for objects text124 and text128 is shown as 4 little grey dots within a black rectangle. I could not determine what those grey dots were (characters?) even on inspecting them with a magnifying glass. They looked sort of like they might be the letters "F Q 4 4", but I would not bet a nickel that is what they actually are.

su_v (suv-lp) wrote :

Did you import the WMF file via internal routines or via UniConvertor?

The file type differs (in 'File > Open…'):
"Windows Metafiles (*.wmf)" -> internal
"Windows Metafile (*.wmf)" -> UniConvertor

If you double-clicked the file in the Explorer, the one listed first (in the drop-down list of the 'File > Open…' dialog) gets used.

tags: added: importing wmf
David Mathog (mathog) wrote :

Just selected the file from the file browser using the default file type of "all inkscape files".

 Until you pointed it out just now I never even noticed that there were two wmf options in the "files of type" pull down list. The first one behaves as I described in post 1. The second wmf selection in the "files of type" list drops all of the text. After import there are a bunch of "path####" objects but no "text####" objects.

su_v (suv-lp)
tags: added: win32
Alvin Penner (apenner) wrote :

attached is the svg result I get with Windows XP and Inkscape rev 11012 and using the file type "Windows Metafiles", which is the internal version, not Uniconvertor.
The rendering appears fairly normal as far as I can tell. The deltas look better in IE8 and IE9 than they do in Inkscape, for reasons I do not understand, but in any event they are present.
- can you indicate which rev of Inkscape you are using? Changes were made in rev 10045 which would affect this behaviour.

Alvin Penner (apenner) wrote :

something very strange happened, the file size of the file I just uploaded is much smaller than the original file on my hard drive, I am uploading it again just for fun. I'm changing the filetype to xml for fun.

Alvin Penner (apenner) wrote :

yes, that is the right size, rename that to svg before viewing it

Alvin Penner (apenner) wrote :

sorry, my mistake, here is the file I had originally intended to upload in the first place

Alvin Penner (apenner) wrote :

you can get the Inkscape rev from Help->About Inkscape. It will say something like
Inkscape 0.48 + devel r11020

or you can get it from DOS, which will give you a message like:

C:\Program Files\Inkscape>inkscape -V
Inkscape 0.48.2 r9819 (Aug 14 2011)

as for devlibs, there have been some recent changes, so it would be worthwhile to check out rev 29
from
https://code.launchpad.net/~inkscape.dev/inkscape-devlibs/trunk

David Mathog (mathog) wrote :

Just completed a clean build from today's (2/29/2012) devlib and Inkscape (trunk). As before, the test WMF (from the 2nd post, above) imports incorrectly, with the two deltas becoming invisible characters.

Alvin Penner (apenner) wrote :

could you attach the svg file you get from this build?

David Mathog (mathog) wrote :

Here it is, edited down to just the two bad characters and 4 good ones.

David Mathog (mathog) wrote :

This is a screen shot of the same as it appears in Inkscape with all objects selected. The hidden characters are represented by the small squares at upper right of the O and upper left (and over a bit) of the N.

Alvin Penner (apenner) wrote :

you may need to check to see whether you have the font Symbol installed on your machine, since this svg file uses it. Attached is a screenshot of what this svg file looks like in IE8.

Alvin Penner (apenner) wrote :

attached here is a screenshot of what this file looks like in Inkscape rev 11030. Note that the deltas are rendered in both cases, although the appearance is somewhat different in the two cases.

Note also that the original svg file submitted in comment 1 above, also contains both deltas as well, using the Symbol font.

David Mathog (mathog) wrote :

My version of Inkscape is built without ghostscript support. The deltas you show in the Inkscape example are not Symbol font; they are coming from somewhere else, maybe ghostscript?

Symbol font is definitely installed on my machine. It looks to me like the invisible deltas are Unicode encoded, which maybe my Symbol font does not support. Deltas in Symbol should be the ASCII letter "d".

Here is a modified example SVG with the two invisible deltas and the word "delta" in Symbol font. It looks like:

1. Inkscape: deltas invisible, "delta" in Greek
2. exported to EMF - use "PREVIEW": two deltas visible, "delta" in Greek
3. exported to PDF - use PDF viewer: two deltas invisible, "delta" in Greek
4. open the SVG in SeaMonkey: two deltas visible, "delta" in English

Even stranger, if the EMF from 2 is imported then something else happens. The XML still shows the two deltas and one "delta" string, but "select all" will not select the first two, which it did in the original SVG. The same thing happens if I save the file as "plain svg" and then open it again.

David Mathog (mathog) wrote :
David Mathog (mathog) wrote :
David Mathog (mathog) wrote :
David Mathog (mathog) wrote :
David Mathog (mathog) wrote :

This is getting stranger and stranger. Open the SVG from #19 in Inkscape 0.48.2 r9819 and the "delta" is in English. Moreover, I just discovered that control-U 03B4 does not insert a visible lower case delta character in the trunk version built yesterday, while it does in the older version just cited. However, I can cut and paste an Arial Unicode delta created in the older version into the newer version and it will be visible.

David Mathog (mathog) wrote :

Checked the version built from trunk downloaded on 1/23/12 and the ^U mechanism is working correctly there. (However, the hidden delta on import from WMF problem is still present.) Note, both of the trunk versions are using the DLLs from devlib downloaded 2/29/12, so the difference seems to be in Inkscape and not devlib.

Is there a known problem with the ^U mechanism recently in the trunk?

Alvin Penner (apenner) wrote :

> My version of Inkscape is built without ghostscript support

I would recommend building Inkscape the normal way. This may also explain why your Inkscape rev number was not showing up properly.

David Mathog (mathog) wrote :

This was built the normal way, as far as I can tell. Code was downloaded into two directories with:

bzr checkout lp:inkscape-devlibs devlibs
bzr checkout lp:inkscape trunk

respectively. Then:

//modify mingwenv to pick up the right path to devlibs
mingwenv
g++ buildtool.cpp -o btool
btool

The only modification to build.xml (later) was to add "-lwinspool" to support a local modification of the emf code that gets it to use a printer instead of the screen for the reference device (higher resolution). When I run it ghostscript is omitted in the .bat

% cat run_inkscape.bat
@echo Setting environment variables for MinGw build of Inkscape
IF "%DEVLIBS_PATH%"=="" set DEVLIBS_PATH=c:\progs\devlibs2
IF "%MINGW_PATH%"=="" set MINGW_PATH=c:\progs\mingw
set MINGW_BIN=%MINGW_PATH%\bin
set PKG_CONFIG_PATH=%DEVLIBS_PATH%\lib\pkgconfig
REM set GS_BIN=C:\latex\gs\gs8.61\bin
REM set PATH=%DEVLIBS_PATH%\bin;%DEVLIBS_PATH%\python;%MINGW_BIN%;%PATH%;%GS_BIN%
set PATH=%DEVLIBS_PATH%\bin;%DEVLIBS_PATH%\python;%MINGW_BIN%;%PATH%
inkscape

Maybe there is no version number because of the way bzr grabs code from trunk?

David Mathog (mathog) wrote :

The ^U issue was unrelated. Found it and squished it as bug 944183

David Mathog (mathog) wrote :

Edited the SVG file down to just >mysterycharacter< and then used od to see what it is. Inside the >< are the 3 bytes:
EF 81 A4, which is UTF-8 for unicode F064, which is not a delta character. It is some funky rectangle with itty bitty characters in it. That is how it shows up in the XML editor too. Now that I know what it is:

create a string "AB"
use ^UF064<enter> to place this character between the A and the B
remove the A, remove the B
click on another object.
select all - the inserted character resulting from the above manipulation looks just like the ones that resulted from the WMF import.
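
To double-check the byte arithmetic above, here is a minimal sketch of a UTF-8 decoder in C++. The function name decode_utf8 and the 0xFFFFFFFF error value are made up for this illustration and are not part of Inkscape or glib. It also skips the overlong-form and surrogate checks that a real decoder would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decode one UTF-8 sequence starting at bytes[0] and return its code point.
// Returns 0xFFFFFFFF on malformed or truncated input. Illustrative only:
// overlong forms and encoded surrogates are not rejected.
std::uint32_t decode_utf8(const unsigned char* bytes, std::size_t len) {
    const std::uint32_t kBad = 0xFFFFFFFFu;
    if (len == 0) return kBad;

    const unsigned char b0 = bytes[0];
    std::size_t extra = 0;   // number of continuation bytes expected
    std::uint32_t cp = 0;    // code point being assembled

    if (b0 < 0x80)                { extra = 0; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return kBad;             // stray continuation byte or invalid lead

    if (len < extra + 1) return kBad;
    for (std::size_t i = 1; i <= extra; ++i) {
        if ((bytes[i] & 0xC0) != 0x80) return kBad;  // not 10xxxxxx
        cp = (cp << 6) | (bytes[i] & 0x3F);
    }
    return cp;
}
```

Given the three bytes EF 81 A4 this returns 0xF064, a private use code point. Given CE B4, the UTF-8 form of U+03B4 (Greek small letter delta), it returns 0x03B4.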

David Mathog (mathog) wrote :

Alvin, the language setting on my XP machine is English (United States). What is yours set to?

Alvin Penner (apenner) wrote :

yes, in Windows XP my language settings are English (United States)
in DOS the encoding is IBM850 Western Europe

C:\InkscapeBZR>chcp
Active code page: 850

in the Windows\Fonts directory I have installed the file symbol.ttf.

David Mathog (mathog) wrote :

My code page was 437. Changed it to 850 and it didn't make any difference with the existing binary. Possibly it would if I recompiled from scratch with it set to that.

symbol.ttf is in C:\Windows\Fonts here too.

In any case, I have found the problem, or at least part of it. The program which created the problematic deltas encoded them as F064. It turns out that this is in the "private use area" range F020-F0FF that Microsoft used to map Symbol (and other fonts, Wingdings too I think) onto character codes 20-FF. (I know for a fact that that application uses Richedit, so this is not surprising.) So start with F064, knock off the F0 to get 64, and find that it is ASCII lower case "d", as appropriate for a delta in Symbol font. Inkscape doesn't know anything about this private use area though, and just leaves it as F064, which makes a mess, at least on my system. There is a call to:

  g_utf16_to_utf8

at around line 1858 in emf-win32-inout.cpp which is responsible for dumping the UTF-8 representation of F064 into the ASCII string. If you drop a print statement of some sort in after that conversion, on your system, does it come out with "d" or with the 3-byte UTF-8 representation? I see the latter. For lack of a better method, I did this using this line:

ofstream myfile; myfile.open ("debug.txt",ios::out | ios::app); myfile << "EMR_EXTTEXTOUTW ansi " <<ansi_text << " characters " << pEmr->emrtext.nChars<< endl; myfile.close();

and

#include <iostream>
#include <fstream>
using namespace std;

up at the top of the file.

Alvin Penner (apenner) wrote :

     unfortunately, the problem that I have is that I cannot reproduce the problem. The rendering of the deltas appears to be normal on my machine, as far as I can tell.
      Probably the best thing to do is wait and see if someone else encounters the same issue.

Alvin Penner (apenner) wrote :

and yes, in my case I also see a three character utf-8 representation, the same as you, except that on my computer this renders correctly on the screen.
    - could you check the encoding of your Internet Explorer to confirm that it is utf-8. Right click on a page and select encoding, and confirm that it is Unicode UTF-8

David Mathog (mathog) wrote :

IE was using "Western European (Windows)". Changed that to "Unicode UTF-8" and ran test again in inkscape, no change there.
Why would the setting in IE change what happens in inkscape?

What piece of code is it that determines if a character is visible or not? On your system in inkscape this code is looking at the symbol F064 and processing it as a visible delta, on mine this same piece of code thinks the same character is not visible.

David Mathog (mathog) wrote :

Since we are seeing different things a few screen shots may help.

This one shows the text string "delta" in symbol font on the canvas, selected, as well as in the "text and font" window.

The canvas looks right, but shouldn't the text in the text and font window also be in Greek characters?

David Mathog (mathog) wrote :

This one shows the problem F064 character selected. (Through the XML editor).

On the canvas it is invisible, once it is selected (see below) only the little square handle is visible.

In the text and font area this character is shown as a little rectangle with 4 characters (presumably F, 0, 6, and 4) inside. That is what should show up on the canvas too, or a delta character, or even a question mark, but in any case some indicator that the characters are present.

These invisible characters are really a pain because most of the time they cannot be selected with "select all", the only way to make them visible is with the XML editor. By "most of the time" I mean the following: immediately after import from a WMF file and ungroup they may be selected with "select all" or the XML editor - they may not be selected with the arrow tool (by dragging a selection rectangle around them). However, once saved in an SVG file, inkscape closed, inkscape restarted, open the SVG file, they may no longer be selected with "select all", only the XML editor works.

David Mathog (mathog) wrote :

This is what F064 looks like in the glyphs window on my XP system. Little rectangles instead of Greek, even though the font is set to Symbol.

David Mathog (mathog) wrote :

This SVG was made by entering "symbol" in Arial, then inserting UF002 in the middle. In Arial F002 displays like "fl".
Then select the F002 character and change the font to Symbol. In Inkscape the F002 is shown as "88".
In Firefox it looks like the Arial character "fl".

Looking at the glyphs window, script=all, range=all, none of Symbol or Wingdings* have a visible glyph. Also they only show "glyphs" (as dots, or rectangles with 4 digits in them) for the range F020 to F0FF. This is very confusing, as on the drawing canvas one can enter a "d", change it to Symbol, and see a Greek lower case delta. It is almost as if in the glyphs window the only valid range for Symbol and Wingdings* is the Private Use Area, whereas on the drawing canvas the only valid range for these is the rest of Unicode (or at least up through Latin-1).

The glyphs window has some other issues. Select Arial, script=all, Latin Extended B. In this range there are a bunch of rectangles with 4 letters in them, so these do not display properly in the glyphs window. However, they do not say "unassigned", as did all of the Symbol glyphs. Click on one, click on append, and the appropriate (from the Unicode references) letter appears on the canvas.

Alvin Penner (apenner) wrote :

could you provide a screenshot of what the file hiddendelta.svg (from comment 1 above) looks like when it is viewed in IE8 or IE9?

David Mathog (mathog) wrote :

The SVG displays as text in IE8. There is no SVG support without a plugin, and none is installed.

David Mathog (mathog) wrote :

Here is an even simpler example - just a single lower case delta in a WMF file. Viewed with "preview" or IE8 it shows the expected lower case delta character.

With inkscape:

new drawing
import justdelta.wmf (nothing will be visible)
immediately ungroup (the text handle will become visible, not the character)
edit XML
  open until the rectangle with the 4 tiny characters becomes visible.
  click on it
  in the right pane, enter an "a" before the rectangle, a "d" after it

On the drawing canvas "<alpha><space><delta>" will become visible. Will post that svg next.

David Mathog (mathog) wrote :
David Mathog (mathog) wrote :

Note that alphaspacedelta.svg is not rendered consistently by different applications.
I tested 5 and none of them gave the same view!

inkscape: <alpha><space><delta>
firefox: <a><delta><d>
gimp 2.6.10: <a><rectangle with F064 in it><d>
LODraw 3.5.orc: <alpha><delta><delta>
ImageMagick Display: <alpha><?><delta>

Alvin Penner (apenner) wrote :

for IE8 you can get Adobe SVG Viewer 3.0, which works quite well

David Mathog (mathog) wrote :

The attached patch works for me. It scans through the wide chars as they come in from the EMF/WMF, if any are in the range F020-F0FF then subtract F000. Once the patch was applied all of the problem WMF examples imported correctly.
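
The remapping described above can be sketched roughly as follows. This is not the attached patch: fold_ms_pua is a hypothetical name standing in for the msdepua step, and the sketch assumes the text has already been read into a vector of 16-bit code units.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the msdepua step: fold code units in the
// Microsoft Symbol/Wingdings private use range F020-F0FF back onto 20-FF,
// in place. Returns how many code units were changed.
std::size_t fold_ms_pua(std::vector<std::uint16_t>& text) {
    std::size_t changed = 0;
    for (std::uint16_t& unit : text) {
        if (unit >= 0xF020 && unit <= 0xF0FF) {
            unit = static_cast<std::uint16_t>(unit - 0xF000);
            ++changed;
        }
    }
    return changed;
}
```

With this, F064 becomes 0064 (ASCII "d"), while ordinary characters and private use values outside F020-F0FF are left untouched.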

AFAIK MS only used this range for the Symbols and Wingdings* fonts, so it might speed things up slightly on EMF/WMF import to only call the new msdepua function for text when it is in one of those fonts. But I suspect it would only matter if the imported metafile was about a book's worth of text.

Does Inkscape have a philosophy/design document concerning Private Use Areas? If so, this patch may not be acceptable.

In any case, the odds are that if characters are found in this range during an EMF,WMF import, then they probably came from MS software or from software that used the same conventions. Mostly because it was too dangerous for anybody else to step into Microsoft's PUA, as doing that would have been a good recipe for incompatibility with this de facto "standard".

David Mathog (mathog) wrote :

I'm biting the bullet and writing font translation Symbol -> Unicode for EMF import, and after that works, Unicode -> Symbol (optionally, eventually) for EMF export. The attached image shows the current state of things. What you are looking at is the named font/characters post translation into Times New Roman (except for the headers, which stayed Arial). Will attach the bits of code needed to do this in a second - not a final version yet, there are still debug statements in it!

David Mathog (mathog) wrote :
David Mathog (mathog) wrote :

One note - in PowerPoint and some other MS programs, when a Zapf Dingbats symbol is inserted in a document the program says "Zapf Dingbat", but what it shows is Wingdings. Not sure what happens on a Mac, this was all done on a PC.

David Mathog (mathog) wrote :

This is the test file which was shown in Inkscape after import in the png a couple of posts up. It was prepared in PPT 2003, all objects selected, then "save as picture" to EMF.

David Mathog (mathog) wrote :

Something I left out of post 47. The reason I'm doing this is that I have pretty much given up on the Pango folks fixing the Symbol font rendering issues for characters above 0xA0 described in Bug #948245. The other reason is that SVG files really should only have Unicode characters, as most browsers will not show other characters correctly, yet there are still a lot of programs in circulation that use Symbol or Wingdings instead of Unicode for some characters.

I know there are a lot of holes in some of the translations, but the corresponding characters seem not to have been mapped into the lower 64K Unicode character space. Some of the missing ones have been assigned above that (like file folders, or clock faces, try U1F550), but that is problematic, as in my limited testing Inkscape could not insert a character like that using the ^U mechanism, probably because it passes through a two-byte integer.
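
To illustrate why a value like U+1F550 cannot survive a two-byte integer, here is a small sketch. The helper names are hypothetical, and whether the ^U code path in Inkscape really truncates this way is only the guess stated above. In UTF-16 such code points need two 16-bit units, a surrogate pair.

```cpp
#include <cassert>
#include <cstdint>

struct SurrogatePair {
    std::uint16_t high;  // lead surrogate, D800-DBFF
    std::uint16_t low;   // trail surrogate, DC00-DFFF
};

// True when the code point fits in a single 16-bit value.
bool fits_in_16_bits(std::uint32_t cp) { return cp <= 0xFFFF; }

// Split a code point in U+10000..U+10FFFF into its UTF-16 surrogate pair.
SurrogatePair to_surrogates(std::uint32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    const std::uint32_t v = cp - 0x10000;  // 20 significant bits remain
    return { static_cast<std::uint16_t>(0xD800 + (v >> 10)),
             static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)) };
}
```

For U+1F550 the pair is D83D DD50. Simply truncating 0x1F550 to 16 bits would give 0xF550, a different (private use) code point.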

David Mathog (mathog) wrote :

I have unicode <-> nonunicode conversion working pretty well now. It is only applied when an EMF file is read in or written out, and it only handles 3 nonunicode fonts: symbol, wingdings, and Zapf Dingbats. All of the code changes are in the attached .tar.gz.

The way it works is this:

1. EMF/WMF read: all symbol, wingdings, and dingbats are automatically converted to the closest Unicode character in Times New Roman font. If there is no matching unicode, it is converted to an unknown symbol character.

2. EMF write: optionally (pop up window, same one as "convert to path"), Unicode characters which map to one of the nonunicode fonts may be converted back.

3. optionally, on output conversion the character value can be moved up into the F020-F0FF Microsoft PUA area.

The conversion ONLY applies to the EMF written. What is left in Inkscape is formatted the same as before. To see the changes the EMF must be explicitly read in.

It is still possible to work with Symbol font (broken as that is, see Bug #948245), and to keep the characters as such, so long as the file is saved in any format other than EMF.
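
As a rough sketch of the read-side conversion (step 1) only: the table below is a tiny illustrative subset of the Symbol encoding, not the tables in the attached code. The name symbol_to_unicode and the use of U+FFFD as the "unknown" replacement are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Replacement used when no mapping is known. U+FFFD is this sketch's choice,
// not necessarily the "unknown symbol" character the real code emits.
const std::uint32_t kUnknown = 0xFFFD;

// Map one Symbol-font character code to a Unicode code point.
// Only a handful of Greek letters are covered here.
std::uint32_t symbol_to_unicode(std::uint8_t code) {
    switch (code) {
        case ' ': return 0x0020;  // space stays a space
        case 'a': return 0x03B1;  // alpha
        case 'b': return 0x03B2;  // beta
        case 'g': return 0x03B3;  // gamma
        case 'd': return 0x03B4;  // delta
        case 'p': return 0x03C0;  // pi
        default:  return kUnknown;
    }
}
```

A "d" in Symbol font comes out as U+03B4, which any Unicode font with Greek coverage can render; codes missing from the table fall back to the replacement character.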

The only problem I have found so far is that since the glyphs are not quite the same widths, text like:

   blah blah (the next word is in greek letters, but the period isn't) symbol.

will write out with the period slightly offset relative to the Symbol font text. The longer the text, the worse this effect is. For me not such a problem since the symbols tend to be one or two greek characters in diagrams.

Anyway, this lets me take the SVG files I have currently with symbol font in them, and convert them to unicode, while still being able to get the Symbol back for other applications. So the SVG will work on the web, and the EMF it generates will go into MS applications.

David Mathog (mathog) wrote :

Rows 2-F (not A-F, the file name is wrong) for various nonunicode fonts. This is a cleaned up version of the one posted
earlier in this thread.

David Mathog (mathog) wrote :

This is the example EMF after it has been automatically converted to Unicode. Note that the dingbats part isn't right, because the application that generated the EMF (MS PowerPoint 2003) used Wingdings for dingbats. (There is no Zapf Dingbats font on the test PC. If somebody can generate a test file with Zapf Dingbats on a Mac and post it here, that would be very helpful.)

In any case, the SVG can be opened both in Inkscape and in Firefox, and most of the characters are mapped. The ones that aren't have no Unicode match with a value below 64K.

David Mathog (mathog) wrote :

The guts of the code that do this are in a separate directory /src/libunicode-convert. That part is written in C.

David Mathog (mathog) wrote :

See bug 919728 post 63 for updated code.

Kris (kris-degussem)
Changed in inkscape:
status: New → In Progress
importance: Undecided → Low
assignee: nobody → David Mathog (mathog)
su_v (suv-lp) wrote :
Changed in inkscape:
milestone: none → 0.49
status: In Progress → Fix Committed
Bryce Harrington (bryce)
Changed in inkscape:
status: Fix Committed → Fix Released