Merge into bzr.dev : i18n-msgextract : Code : Bazaar

Reviewer	Review Type	Date Requested	Status
Vincent Ladeuil		2011-05-05	Approve on 2011-05-13
Review via email: mp+60033@code.launchpad.net

Revision history for this message

Alexander Belchenko (bialix) wrote on 2011-05-05:

#

INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/i18n-msgextract into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/i18n-msgextract/+merge/60033
>
> I18n part2 - extract messages for translate.

I think it would be very cool if we could automatically extract help
strings for command-line options, so we can avoid wrapping them in
N_() function calls. I fear that N_() will slow down the CLI. Of course
somebody has to measure the time; maybe I'm totally wrong here.
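
One way to check the timing concern is to measure a marker function directly. The sketch below assumes N_() is the usual gettext-style identity marker (the thread does not show its definition), and `time_marker` is a hypothetical helper name used only for illustration.

```python
import timeit


def N_(msg):
    """Identity marker: tags a string for extraction and returns it unchanged."""
    return msg


def time_marker(calls=100000):
    """Return the seconds spent on `calls` marker invocations (hypothetical helper)."""
    return timeit.timeit(lambda: N_("show help message"), number=calls)


if __name__ == "__main__":
    print("%d calls: %.4f s" % (100000, time_marker()))
```

The measured time depends on the machine, so no expected figure is given here.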

--
All the dude wanted was his rug back

Revision history for this message

methane (songofacandy) wrote on 2011-05-05:

#

> I think it would be very cool if we could automatically extract help
> strings for command-line options, so we can avoid wrapping them in
> N_() function calls. I fear that N_() will slow down the CLI. Of course
> somebody has to measure the time; maybe I'm totally wrong here.

Done.

Revision history for this message

Alexander Belchenko (bialix) wrote on 2011-05-05:

#

>
> > I think it would be very cool if we could automatically extract help
> > strings for command-line options, so we can avoid wrapping them in
> > N_() function calls. I fear that N_() will slow down the CLI. Of course
> > somebody has to measure the time; maybe I'm totally wrong here.
>
> Done.

Cool!

Revision history for this message

Alexander Belchenko (bialix) wrote on 2011-05-05:

#

INADA Naoki wrote:
> INADA Naoki has proposed merging lp:~songofacandy/bzr/i18n-msgextract into lp:bzr.
>
> Requested reviews:
> bzr-core (bzr-core)
> Related bugs:
> Bug #83941 in Bazaar: "Bzr doesn't speak my tongue"
> https://bugs.launchpad.net/bzr/+bug/83941
>
> For more details, see:
> https://code.launchpad.net/~songofacandy/bzr/i18n-msgextract/+merge/60033
>
> I18n part2 - extract messages for translate.

Why did you give your new Python script a filename without the .py
extension? That makes life on Windows a bit harder. Is it really
important to omit the .py extension here?

--
All the dude wanted was his rug back

Revision history for this message

methane (songofacandy) wrote on 2011-05-05:

#

> Why did you put your new python script into filename without .py
> extension? That makes windows life a bit harder. Is it really important
> to omit .py extension here?

No reason. The script originally comes from Mercurial, where it is named hggettext.
I've added the .py extension.

Revision history for this message

John A Meinel (jameinel) wrote on 2011-05-06:

#

That isn't all of the .py files. Is there a reason not to do:

$(PYTHON) tools/bzrgettext bzrlib/*.py \
bzrlib/*/*.py \

I guess you would get tests/ which isn't really necessary but you might want to be getting stuff like transport/ and ui/

Also, for plugins, you really need:
bzrlib/plugins/*/*.py

Because cmd_* functions will be in the plugins themselves, not in the containing dir.

If we are doing xargs anyway, why not use a "find" command.
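
For illustration, a find-style recursive walk can also be written in Python. This is a minimal sketch, not the Makefile rule from the branch; `collect_py_files` and its `exclude` parameter are hypothetical names.

```python
import os


def collect_py_files(root, exclude=("tests",)):
    """Return sorted paths of all .py files under root, skipping excluded dirs."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude)
        for name in filenames:
            if name.endswith(".py"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Unlike the two glob patterns, the walk reaches any depth, so files such as bzrlib/plugins/*/*.py are picked up without listing each level.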

Revision history for this message

methane (songofacandy) wrote on 2011-05-06:

#

Fixed.

I've found that some modules raise errors on import because modules they depend on are missing.
Modules that provide commands must be importable while the command help is exported.
Otherwise, the help messages of those commands are not extracted for translation.

I don't think this is a big problem, because updating the pot file is as specialized
a task as building a package. Only a few developers and buildbots do it.
Is this an acceptable limitation?

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-06:

#

> Fixed.
>
> I've found some modules raises error on importing because of lacking some
> modules.

Can you elaborate on that ?

> Modules providing commands should be able to import while exporting command
> helps.
> Otherwise, help messages of the commands are not translated.

Right, this could be addressed by using the command registry probably.

>
> I don't think this is a big problem because updating pot is as special as
> making
> package. Only few developers and buildbots do this task.
> Is this an acceptable limitation?

Yes and no, it depends ;)

More importantly, I think we need... tests ;)

Especially for the case you're encountering right now but more
generally so we can clearly define which texts should be
translated and ensure that a test will fail if whatever code
modification happens to escape the collection stage.

22 + xgettext --package-name "Bazaar" \

Really ? Is it just some internal id or can it be referenced by
say, launchpad ? If the latter is true, we probably want bzr no ?

134 +def importpath(path):

I think we have some better implementation for that in pyutils
(get_name_object ?), this may also be related to the import
errors you're encountering.

I don't want to sound NIH-ish here, but... it seems to me we'll
do a better job (in terms of coverage and precision) by using
bzrlib and all its facilities no ?

Getting all commands, their help and all exceptions while
excluding tests for example sounds like an uphill battle to fight
with find and grep...

Do you get a feeling about how close you're coming to get *all*
the needed strings and can you categorize them (my intuition
being that there are ways to get them all reliably and precisely
by leveraging some existing APIs... and if we can do that,
designing for tests for them should be straightforward).

review: Needs Information

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-06:

#

http://blog.launchpad.net/translations/sharing-translations-2 may be relevant here...

Revision history for this message

methane (songofacandy) wrote on 2011-05-06:

#

> > Fixed.
> >
> > I've found some modules raises error on importing because of lacking some
> > modules.
> 
> Can you elaborate on that ?

Before I excluded "bzrlib/doc", there were four import errors:
Can't import 'bzrlib/doc_generate/builders/texinfo.py': No module named sphinx
Can't import 'bzrlib/doc_generate/writers/texinfo.py': No module named docutils
Can't import 'bzrlib/util/simplemapi.py': name 'windll' is not defined
Can't import 'bzrlib/transport/ftp/_gssapi.py': Unable to import library "kerberos": No module named kerberos

I think bzrlib/doc and bzrlib/doc_generate should be excluded, so I added a filter to the Makefile.
Neither bzrlib.util.simplemapi nor bzrlib.transport.ftp._gssapi provides
any commands.

> > Modules providing commands should be able to import while exporting command
> > helps.
> > Otherwise, help messages of the commands are not translated.
> 
> Right, this could be addressed by using the command registry probably.

Okay, I'll try a command-registry-based approach.

> > I don't think this is a big problem because updating pot is as special as
> > making
> > package. Only few developers and buildbots do this task.
> > Is this an acceptable limitation?
> 
> Yes and no, it depends ;)
> 
> More importantly, I think we need... tests ;)
> 
> Especially for the case you're encountering right now but more
> generally so we can clearly define which texts should be
> translated and ensure that a test will fail if whatever code
> modification happens to escape the collection stage.
>

How can I write tests for tools like bzrgettext?

One idea I have is to make an "xx" pseudo-language that is generated
automatically from the pot file. For example, "Display status" is
translated to "xx{{Display status}}".
With this language, a command test can ensure that the messages that
should be translated are really exported and translated.
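
The "xx" idea can be sketched in a few lines. This is only an illustration of the proposal, not code from the branch; `pseudo_translate`, `make_xx_catalog` and `translate` are hypothetical names, and the catalog is a plain dict rather than a real gettext catalog.

```python
def pseudo_translate(msgid):
    """Wrap a message so that translated output is easy to spot in tests."""
    return "xx{{%s}}" % msgid


def make_xx_catalog(msgids):
    """Map every extracted msgid to its pseudo-translation."""
    return dict((m, pseudo_translate(m)) for m in msgids)


catalog = make_xx_catalog(["Display status", "Show help"])


def translate(msg):
    """Look up msg; a message that was never exported comes back unchanged."""
    return catalog.get(msg, msg)
```

A test can then assert that command output contains the xx{{...}} wrapper; a message that comes back unwrapped was either not exported or not passed through the translation call.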

> 22      + xgettext --package-name "Bazaar" \
> 
> Really ? Is it just some internal id or can it be referenced by
> say, launchpad ? If the later is true, we probably want bzr no ?

I don't know whether this name affects anything.
Should I create a test project for it and play with Rosetta?
Or can someone in the bzr-core team help me by making a test branch that
includes the po/bzr.pot generated by my branch?

BTW, there is no reason not to rename it to 'bzr'. I'll do it.

> 134     +def importpath(path):
> 
> I think we have some better implementation for that in pyutils
> (get_name_object ?), this may also be related to the import
> errors you're encountering.
> 
> I don't want to sound NIH-ish here, but... it seems to me we'll
> do a better job (in terms of coverage and precision) by using
> bzrlib and all its facilities no ?
> 
> Getting all commands, their help and all exceptions while
> excluding tests for example sounds like an uphill battle to fight
> with find and grep...

As I said above, I'll try the command-registry-based approach.
With this approach, there is no need for manual importing like
"importpath()".

> Do you get a feeling about how close you're coming to get *all*
> the needed strings and can you categorize them (my intuition
> being that there are ways to get them all reliably and precisely
> by leveraging some existing APIs... and if we can do that,
> designing for tests for them should be straightforward).

It is far from *all*, because there are no N_() and gettext() calls yet.
I can't estimate how far.

But I think the message categories that are most important to users are:
* help topics
* command help
* Error messages that give an important hint to the user (e.g. NotWorkingTree)

For help topics, I'll add scanning of them to bzrgettext.py.
But we need to decide how to handle the text files under bzrlib/help_topics/en/.

For command help, I don't know how to prepare a command registry that
includes all bundled plugins but no 3rd-party plugins.

For error messages, looking at bzrlib.errors is enough.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-06:

#

> It is far from *all* because there are no N_() and gettext() now.
> I can't imagine how far.
>

Fair enough.

> But I think message categories that is most important to users is:
> * help topics
> * command help
> * Error messages giving a important hint to user. (ex. NotWorkingTree)

Right, so I think a better approach would be to indeed focus on
these ones and neglect (to begin with) the other strings. This
will also reduce the amount of needed translations while we
bootstrap the whole process.

Defining a *bzr* command to do that (may be hidden as this is not
targeted at regular bzr users) will make things simpler.

>
> About help topics, I'll implement scanning of it to bzrgettext.py.
> But we need to decide how handle text files under bzrlib/help_topics/en/.

bzrlib.help is probably the way to go then since it already
provides the registry for that and several other utilities that
already extract the relevant texts (and may in fact be the site
where the localization should occur).

>
> About command help, I don't know how to prepare command registry that
> includes all bundled plugins but does not include 3rd party plugins.

Good point. But again, defining a proper bzr command will mean we
can use BZR_PLUGIN_PATH and friends to control which plugins are
loaded/seen by bzrlib. If you want to focus on bzr and its
bundled plugins, using 'BZR_PLUGIN_PATH=-site' will do just
that.

>
> About error messages, looking on bzrlib.errors is enough.

I think so too, filtering the bzrlib.errors module for classes
inheriting from BzrError should do.

This would miss some errors defined locally in some modules but
we could ignore them to start with and file bugs for them later.
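
The filtering described here could look roughly like the sketch below. It runs against stand-in classes, not the real bzrlib.errors module; the `_fmt` attribute name, the format string and `error_formats` are assumptions made for illustration.

```python
import inspect
import types


class BzrError(Exception):
    """Stand-in for bzrlib.errors.BzrError (not the real class)."""
    _fmt = None


class NotWorkingTree(BzrError):
    _fmt = "%(path)s is not a working tree."  # hypothetical format string


class Unrelated(object):
    _fmt = "never collected"  # not a BzrError subclass, so it is skipped


def error_formats(module, base):
    """Yield (class name, format string) for subclasses of base found in module."""
    for name in sorted(dir(module)):
        obj = getattr(module, name)
        if inspect.isclass(obj) and issubclass(obj, base) and obj is not base:
            fmt = getattr(obj, "_fmt", None)
            if fmt:
                yield name, fmt


# A namespace standing in for the errors module.
fake_errors = types.SimpleNamespace(
    BzrError=BzrError, NotWorkingTree=NotWorkingTree, Unrelated=Unrelated)
```

Errors defined locally in other modules would not be seen by such a scan, which matches the limitation noted above.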

Finally, an important point is the order of the strings in the
generated file.

Relying on 'find' means that different users running the script
are likely to get different file orders (hence vastly different
file content IIUC), whereas relying on internal registries means
that we can force the lexicographical order on command names or
help topics to ensure a consistent order.

This will also mean that we can rely on the existing tests for
coverage and focus on tests specific to the problem we're
addressing here (string order or avoiding duplicate strings (if
that matters, I don't know) for example).
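
A minimal sketch of the ordering point, assuming the registry can be viewed as a mapping from name to help text (the real bzrlib registries are not used here, and `ordered_entries` is a hypothetical name):

```python
def ordered_entries(registry):
    """Yield (name, help text) sorted by name, skipping repeated help strings."""
    seen = set()
    for name in sorted(registry):
        text = registry[name]
        if text in seen:
            continue
        seen.add(text)
        yield name, text


registry = {
    "status": "Display status",
    "st": "Display status",
    "add": "Add files",
}
entries = list(ordered_entries(registry))
```

Because the names are sorted before anything is emitted, two developers running this on different machines get the same sequence, which is what keeps the generated file stable between runs.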

231 + with bzrlib.initialize():

2.6 specific you evil :)

Revision history for this message

Alexander Belchenko (bialix) wrote on 2011-05-06:

#

INADA Naoki wrote:
>
> But I think message categories that is most important to users is:
> * help topics
> * command help
> * Error messages giving a important hint to user. (ex. NotWorkingTree)
>
> About help topics, I'll implement scanning of it to bzrgettext.py.
> But we need to decide how handle text files under bzrlib/help_topics/en/.

I think the initial idea was to put translations into corresponding text
files under bzrlib/help_topics/$LANG_CODE/

Is that not what you expect? What's your intent here? To extract and
convert these files into PO files? I suppose big, long help topics
should be translated not paragraph by paragraph, but as whole texts.
Therefore I'd prefer to extract all other help topics from Python
modules and put them into plain text files. That won't work very well
with Launchpad, but this is another problem.

I think we should discuss this more broadly on the bzr ML, because in
the past some people had concerns about txt files vs py files. IIRC
Aaron didn't like the fact we're using txt files.

But! In my opinion such text files are much better for translators.

PO files are good only for relatively short strings. They're very bad
at conveying context.

--
All the dude wanted was his rug back

Revision history for this message

methane (songofacandy) wrote on 2011-05-06:

#

> > But I think message categories that is most important to users is:
> > * help topics
> > * command help
> > * Error messages giving a important hint to user. (ex. NotWorkingTree)
>
> Right, so I think a better approach would be to indeed focus on
> these ones and neglect (to begin with) the other strings. This
> will also reduce the amount of needed translations while we
> bootstrap the whole process.
>
> Defining a *bzr* command to do that (may be hidden as this is not
> targeted at regular bzr users) will make things simpler.
>
...
> > About command help, I don't know how to prepare command registry that
> > includes all bundled plugins but does not include 3rd party plugins.
>
> Good point. But again, defining a proper bzr command will mean we
> can use BZR_PLUGIN_PATH and friends to control which plugins are
> loaded/seen by bzrlib. If you want to focus on bzr and its
> bundled plugins, using 'BZR_PLUGINS_PATH=-site' will do just
> that.
>

OK. I'll do it.

I want the bzr command to be usable for 3rd-party plugins too, if possible.

But we should focus on starting translation on Launchpad to give
translators enough time before bzr-2.4. So BZR_PLUGIN_PATH=-site is
a good starting point.

> > About help topics, I'll implement scanning of it to bzrgettext.py.
> > But we need to decide how handle text files under bzrlib/help_topics/en/.
>
> bzrlib.help is probably the way to go then since it already
> provides the registry for that and several other utilities who
> already extract the relevant texts (and may in fact be the site
> where the localization should occur).

As Alexander mentioned, my question was whether help_topics/en/*.txt
should be translated with Rosetta or not.
I'll post to the ML about this.

> >
> > About error messages, looking on bzrlib.errors is enough.
>
> I think so too, filtering the bzrlib.errors module for classes
> inheriting from BzrError should do.
>
> This would miss some errors defined locally in some modules but
> we could ignore them to start with and file bugs for them later.
>

The bzrerrors() function in bzrgettext.py filters out error classes with
internal_error=True.

> Finally, an important point is the order of the strings in the
> generated file.
>
> Relying on 'find' means that different users running the script
> are likely to get different file orders (hence vastly different
> file content IIUC), whereas relying on internal registries means
> that we can force the lexicographical order on command names or
> help topics to ensure a consistent order.
>
> This will also means that we can rely on the existing tests for
> coverage and focus on tests specific to the problem we're
> addressing here (string order or avoiding duplicate strings (if
> that matters, I don't know) for example).

OK, I'll pay attention to the order.

>
> 231 + with bzrlib.initialize():
>
> 2.6 specific you evil :)

It can run under 2.5, because there is "from __future__ import with_statement".
But once bzrgettext.py becomes a bzr command, it should be able to run
under 2.4 until we decide to drop 2.4 support.
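
Since Python 2.4 has no with statement at all, a 2.4-compatible version would have to spell the block out with try/finally. The sketch below shows the general shape; `Session` is a hypothetical stand-in for whatever bzrlib.initialize() returns, and the rewrite is only roughly equivalent because exception details are not passed to __exit__.

```python
class Session(object):
    """Hypothetical stand-in for the object returned by bzrlib.initialize()."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.events.append("exit")
        return False


def run_without_with(session, work):
    """Approximate `with session: work()` without using the with statement."""
    session.__enter__()
    try:
        return work()
    finally:
        # Cleanup always runs, but the exception (if any) is not forwarded.
        session.__exit__(None, None, None)
```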

Revision history for this message

John A Meinel (jameinel) wrote on 2011-05-06:

#

On 05/06/2011 08:45 AM, INADA Naoki wrote:
> Fixed.
>
> I've found some modules raises error on importing because of lacking some modules.
> Modules providing commands should be able to import while exporting command helps.
> Otherwise, help messages of the commands are not translated.
>
> I don't think this is a big problem because updating pot is as special as making
> package. Only few developers and buildbots do this task.
> Is this an acceptable limitation?
>

Modules ideally should not have dependency issues, i.e. you should be able
to run "python -c 'import bzrlib.XXX'" for any XXX and not have it fail.
If there are circular dependencies or missing dependencies, we should
fix this.

I had thought you were just grepping the code, but if you are importing
it, then we should probably go via a different path.

from bzrlib import initialize, plugin, commands
initialize().__enter__()
plugin.load_plugins()

# do stuff with the registered commands. Which will now include plugin-
# provided commands.

And then whatever else you need. Once you've done initialize +
load_plugins, I think everything should be in a happy enough state for
importing.

John
=:->

Revision history for this message

Martin Packman (gz) wrote on 2011-05-07:

#

Would using _ast be an alternative to needing to import the modules and worry about dependencies?
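
To make the suggestion concrete, here is a minimal sketch using the public `ast` module (the wrapper around `_ast`). It parses source text without importing it, so missing dependencies such as sphinx or kerberos do not matter. The sample source and `command_docstrings` are hypothetical and not taken from the branch.

```python
import ast

SOURCE = '''
class cmd_status(Command):
    """Display status summary."""

class Helper(object):
    """Not a command."""

class cmd_hidden(Command):
    pass
'''


def command_docstrings(source):
    """Return (lineno, class name, docstring) for cmd_* classes, without importing."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name.startswith("cmd_"):
            doc = ast.get_docstring(node)
            if doc:
                found.append((node.lineno, node.name, doc))
    return sorted(found)
```

One caveat: a command that assigns `__doc__` explicitly instead of using a plain docstring, as cmd_export_pot in the diff does, is not picked up by `ast.get_docstring` and would need extra handling.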

+ if fmt:
+ poentry('bzrlib/erros.py', inspect.findsource(klass)[1], fmt)

Typo, "errors.py"?

Revision history for this message

methane (songofacandy) wrote on 2011-05-08:

#

Yey! I did it!
This is my first bzr command!

Revision history for this message

methane (songofacandy) wrote on 2011-05-09:

#

Then, what kind of tests should I write?

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-10:

#

>>>>> INADA Naoki <email address hidden> writes:

> Yey! I did it!

Hurray !

> This is my first bzr command!

Congrats :)

Revision history for this message

methane (songofacandy) wrote on 2011-05-12:

#

Then, is the last thing I should do to write more tests?
Is the way of extracting messages OK?

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-13:

#

I think that's good enough to start experimenting, thanks !

I'm still unclear about the overall approach to extract the messages but I don't want to block on that either. We can still revisit it later once we all know a bit more about i18n and l10n ;)

Unless the second reviewer objects, I think we should land this and see how it goes for the whole translation process.

review: Approve

Revision history for this message

methane (songofacandy) wrote on 2011-05-14:

#

I want to enable extracting help topics.
It is implemented in lp:~songofacandy/bzr/i18n
Should it be a separate merge request?

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-14:

#

> Should it be a separate merge request?

Yes.

*This* mp has been approved and only needs a second review.

Start a new one for the new work so we can land *this* mp (when approved) while discussing the other one.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-16:

#

It seems there are too many possible ways to address the message extraction and that we won't progress without making at least some experiments.

This proposal implements one way to extract the messages and it's worth trying.

So I'll land it and if we change our mind later, we'll fix the fallout if needed.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-16:

#

@Naoki: Also, it would be nice if you collect the knowledge you acquire in some document we can refer to *in* the code base (doc/developers/i18n.txt comes to mind)

Don't worry about it being perfect or even finalized, just put what you know today there and we'll refine it as we get a better collective understanding.

Revision history for this message

Vincent Ladeuil (vila) wrote on 2011-05-16:

#

sent to pqm by email

Merge lp:~songofacandy/bzr/i18n-msgextract into lp:bzr

Preview Diff

 === modified file 'Makefile'
 --- Makefile	2011-04-18 00:44:08 +0000
 +++ Makefile	2011-05-12 12:21:48 +0000
@@ -420,6 +420,26 @@
  	$(PYTHON) tools/win32/ostools.py remove dist
++# i18n targets
++
++.PHONY: update-pot po/bzr.pot
++update-pot: po/bzr.pot
++
++TRANSLATABLE_PYFILES:=$(shell find bzrlib -name '*.py' \
++    		| grep -v 'bzrlib/tests/' \
++    		| grep -v 'bzrlib/doc' \
++		)
++
++po/bzr.pot: $(PYFILES) $(DOCFILES)
++	$(PYTHON) ./bzr export-pot > po/bzr.pot
++	echo $(TRANSLATABLE_PYFILES) | xargs \
++	  xgettext --package-name "bzr" \
++	  --msgid-bugs-address "<bazaar@canonical.com>" \
++	  --copyright-holder "Canonical" \
++	  --from-code ISO-8859-1 --join --sort-by-file --add-comments=i18n: \
++	  -d bzr -p po -o bzr.pot
++
++
  ### Packaging Targets ###
  .PHONY: dist check-dist-tarball
 === modified file 'bzrlib/builtins.py'
 --- bzrlib/builtins.py	2011-05-03 13:53:46 +0000
 +++ bzrlib/builtins.py	2011-05-12 12:21:48 +0000
@@ -6154,6 +6154,16 @@
              self.outf.write('%s %s\n' % (path, location))
++class cmd_export_pot(Command):
++    __doc__ = """Export command helps and error messages in po format."""
++
++    hidden = True
++
++    def run(self):
++        from bzrlib.export_pot import export_pot
++        export_pot(self.outf)
++
++
  def _register_lazy_builtins():
      # register lazy builtins from other modules; called at startup and should
      # be only called once.
 === added file 'bzrlib/export_pot.py'
 --- bzrlib/export_pot.py	1970-01-01 00:00:00 +0000
 +++ bzrlib/export_pot.py	2011-05-12 12:21:48 +0000
@@ -0,0 +1,241 @@
++# Copyright (C) 2011 Canonical Ltd
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
++
++# The normalize function is taken from pygettext which is distributed
++# with Python under the Python License, which is GPL compatible.
++
++"""Extract docstrings from Bazaar commands.
++"""
++
++import inspect
++import os
++
++from bzrlib import (
++    commands as _mod_commands,
++    errors,
++    help_topics,
++    plugin,
++    )
++from bzrlib.trace import (
++    mutter,
++    note,
++    )
++
++
++def _escape(s):
++    s = (s.replace('\\', '\\\\')
++        .replace('\n', '\\n')
++        .replace('\r', '\\r')
++        .replace('\t', '\\t')
++        .replace('"', '\\"')
++        )
++    return s
++
++def _normalize(s):
++    # This converts the various Python string types into a format that
++    # is appropriate for .po files, namely much closer to C style.
++    lines = s.split('\n')
++    if len(lines) == 1:
++        s = '"' + _escape(s) + '"'
++    else:
++        if not lines[-1]:
++            del lines[-1]
++            lines[-1] = lines[-1] + '\n'
++        lines = map(_escape, lines)
++        lineterm = '\\n"\n"'
++        s = '""\n"' + lineterm.join(lines) + '"'
++    return s
++
++
++_FOUND_MSGID = None # set by entry function.
++
++def _poentry(outf, path, lineno, s, comment=None):
++    if s in _FOUND_MSGID:
++        return
++    _FOUND_MSGID.add(s)
++    if comment is None:
++        comment = ''
++    else:
++        comment = "# %s\n" % comment
++    mutter("Exporting msg %r at line %d in %r", s[:20], lineno, path)
++    print >>outf, ('#: %s:%d\n' % (path, lineno) +
++           comment+
++           'msgid %s\n' % _normalize(s) +
++           'msgstr ""\n')
++
++def _poentry_per_paragraph(outf, path, lineno, msgid):
++    # TODO: How to split long help?
++    paragraphs = msgid.split('\n\n')
++    for p in paragraphs:
++        _poentry(outf, path, lineno, p)
++        lineno += p.count('\n') + 2
++
++_LAST_CACHE = _LAST_CACHED_SRC = None
++
++def _offsets_of_literal(src):
++    global _LAST_CACHE, _LAST_CACHED_SRC
++    if src == _LAST_CACHED_SRC:
++        return _LAST_CACHE.copy()
++
++    import ast
++    root = ast.parse(src)
++    offsets = {}
++    for node in ast.walk(root):
++        if not isinstance(node, ast.Str):
++            continue
++        offsets[node.s] = node.lineno - node.s.count('\n')
++
++    _LAST_CACHED_SRC = src
++    _LAST_CACHE = offsets.copy()
++    return offsets
++
++def _standard_options(outf):
++    from bzrlib.option import Option
++    src = inspect.findsource(Option)[0]
++    src = ''.join(src)
++    path = 'bzrlib/option.py'
++    offsets = _offsets_of_literal(src)
++
++    for name in sorted(Option.OPTIONS.keys()):
++        opt = Option.OPTIONS[name]
++        if getattr(opt, 'hidden', False):
++            continue
++        if getattr(opt, 'title', None):
++            lineno = offsets.get(opt.title, 9999)
++            if lineno == 9999:
++                note("%r is not found in bzrlib/option.py" % opt.title)
++            _poentry(outf, path, lineno, opt.title,
++                     'title of %r option' % name)
++        if getattr(opt, 'help', None):
++            lineno = offsets.get(opt.help, 9999)
++            if lineno == 9999:
++                note("%r is not found in bzrlib/option.py" % opt.help)
++            _poentry(outf, path, lineno, opt.help,
++                     'help of %r option' % name)
++
++def _command_options(outf, path, cmd):
++    src, default_lineno = inspect.findsource(cmd.__class__)
++    offsets = _offsets_of_literal(''.join(src))
++    for opt in cmd.takes_options:
++        if isinstance(opt, str):
++            continue
++        if getattr(opt, 'hidden', False):
++            continue
++        name = opt.name
++        if getattr(opt, 'title', None):
++            lineno = offsets.get(opt.title, default_lineno)
++            _poentry(outf, path, lineno, opt.title,
++                     'title of %r option of %r command' % (name, cmd.name()))
++        if getattr(opt, 'help', None):
++            lineno = offsets.get(opt.help, default_lineno)
++            _poentry(outf, path, lineno, opt.help,
++                     'help of %r option of %r command' % (name, cmd.name()))
++
++
++def _write_command_help(outf, cmd_name, cmd):
++    path = inspect.getfile(cmd.__class__)
++    if path.endswith('.pyc'):
++        path = path[:-1]
++    path = os.path.relpath(path)
++    src, lineno = inspect.findsource(cmd.__class__)
++    offsets = _offsets_of_literal(''.join(src))
++    lineno = offsets[cmd.__doc__]
++    doc = inspect.getdoc(cmd)
++
++    _poentry_per_paragraph(outf, path, lineno, doc)
++    _command_options(outf, path, cmd)
++
++def _command_helps(outf):
++    """Extract help strings from builtin and core plugin commands.
++
++    Hidden commands and commands provided by non-core plugins are
++    skipped.
++    """
++    from glob import glob
++
++    # builtin commands
++    for cmd_name in _mod_commands.builtin_command_names():
++        command = _mod_commands.get_cmd_object(cmd_name, False)
++        if command.hidden:
++            continue
++        note("Exporting messages from builtin command: %s", cmd_name)
++        _write_command_help(outf, cmd_name, command)
++
++    plugin_path = plugin.get_core_plugin_path()
++    core_plugins = glob(plugin_path + '/*/__init__.py')
++    core_plugins = [os.path.basename(os.path.dirname(p))
++                        for p in core_plugins]
++    # core plugins
++    for cmd_name in _mod_commands.plugin_command_names():
++        command = _mod_commands.get_cmd_object(cmd_name, False)
++        if command.hidden:
++            continue
++        if command.plugin_name() not in core_plugins:
++            # skip non-core plugins
++            # TODO: Support extracting from third party plugins.
++            continue
++        note("Exporting messages from plugin command: %s in %s",
++             cmd_name, command.plugin_name())
++        _write_command_help(outf, cmd_name, command)
++
++
++def _error_messages(outf):
++    """Extract fmt string from bzrlib.errors."""
++    path = errors.__file__
++    if path.endswith('.pyc'):
++        path = path[:-1]
++    offsets = _offsets_of_literal(open(path).read())
++
++    base_klass = errors.BzrError
++    for name in dir(errors):
++        klass = getattr(errors, name)
++        if not inspect.isclass(klass):
++            continue
++        if not issubclass(klass, base_klass):
++            continue
++        if klass is base_klass:
++            continue
++        if klass.internal_error:
++            continue
++        fmt = getattr(klass, "_fmt", None)
++        if fmt:
++            note("Exporting message from error: %s", name)
++            _poentry(outf, 'bzrlib/errors.py',
++                     offsets.get(fmt, 9999), fmt)
++
++def _help_topics(outf):
++    topic_registry = help_topics.topic_registry
++    for key in topic_registry.keys():
++        doc = topic_registry.get(key)
++        if isinstance(doc, str):
++            _poentry_per_paragraph(
++                    outf,
++                    'dummy/help_topics/'+key+'/detail.txt',
++                    1, doc)
++
++        summary = topic_registry.get_summary(key)
++        if summary is not None:
++            _poentry(outf, 'dummy/help_topics/'+key+'/summary.txt',
++                     1, summary)
++
++def export_pot(outf):
++    global _FOUND_MSGID
++    _FOUND_MSGID = set()
++    _standard_options(outf)
++    _command_helps(outf)
++    _error_messages(outf)
++    # Disable exporting help topics until we decide how to translate them.
++    #_help_topics(outf)
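
Note on the string layout used in `bzrlib/export_pot.py` above: the quoting that `_escape` and `_normalize` apply can be reproduced outside bzrlib. The following is a minimal sketch, assuming Python 3 (the branch itself targets Python 2); the names `escape` and `normalize` are illustrative stand-ins and not part of the proposed module.

```python
def escape(s):
    # Escape the backslash first, so the backslashes introduced by the
    # later replacements are not themselves doubled.
    return (s.replace('\\', '\\\\')
             .replace('\n', '\\n')
             .replace('\r', '\\r')
             .replace('\t', '\\t')
             .replace('"', '\\"'))


def normalize(s):
    # Single-line strings become one quoted C-style string.
    lines = s.split('\n')
    if len(lines) == 1:
        return '"' + escape(s) + '"'
    # A trailing newline leaves an empty last element; fold it back into
    # the previous line so it is emitted as a literal "\n".
    if not lines[-1]:
        del lines[-1]
        lines[-1] = lines[-1] + '\n'
    escaped = [escape(line) for line in lines]
    # Multi-line strings start with an empty "" and continue with one
    # quoted string per source line.
    return '""\n"' + '\\n"\n"'.join(escaped) + '"'


example = normalize('foo\nbar\n')
```

The expected values asserted in `TestNormalize` follow from this layout: each source line ends up on its own quoted line, preceded by an empty `""`.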
 === modified file 'bzrlib/tests/__init__.py'
 --- bzrlib/tests/__init__.py	2011-05-10 07:46:15 +0000
 +++ bzrlib/tests/__init__.py	2011-05-12 12:21:48 +0000
@@ -3783,6 +3783,7 @@
          'bzrlib.tests.test_eol_filters',
          'bzrlib.tests.test_errors',
          'bzrlib.tests.test_export',
++        'bzrlib.tests.test_export_pot',
          'bzrlib.tests.test_extract',
          'bzrlib.tests.test_fetch',
          'bzrlib.tests.test_fixtures',
 === added file 'bzrlib/tests/test_export_pot.py'
 --- bzrlib/tests/test_export_pot.py	1970-01-01 00:00:00 +0000
 +++ bzrlib/tests/test_export_pot.py	2011-05-12 12:21:48 +0000
@@ -0,0 +1,147 @@
++# Copyright (C) 2011 Canonical Ltd
++#
++# This program is free software; you can redistribute it and/or modify
++# it under the terms of the GNU General Public License as published by
++# the Free Software Foundation; either version 2 of the License, or
++# (at your option) any later version.
++#
++# This program is distributed in the hope that it will be useful,
++# but WITHOUT ANY WARRANTY; without even the implied warranty of
++# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
++# GNU General Public License for more details.
++#
++# You should have received a copy of the GNU General Public License
++# along with this program; if not, write to the Free Software
++# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
++
++from cStringIO import StringIO
++import textwrap
++
++from bzrlib import (
++    export_pot,
++    tests,
++    )
++
++class TestEscape(tests.TestCase):
++
++    def test_simple_escape(self):
++        self.assertEqual(
++                export_pot._escape('foobar'),
++                'foobar')
++
++        s = '''foo\nbar\r\tbaz\\"spam"'''
++        e = '''foo\\nbar\\r\\tbaz\\\\\\"spam\\"'''
++        self.assertEqual(export_pot._escape(s), e)
++
++    def test_complex_escape(self):
++        s = '''\\r \\\n'''
++        e = '''\\\\r \\\\\\n'''
++        self.assertEqual(export_pot._escape(s), e)
++
++
++class TestNormalize(tests.TestCase):
++
++    def test_single_line(self):
++        s = 'foobar'
++        e = '"foobar"'
++        self.assertEqual(export_pot._normalize(s), e)
++
++        s = 'foo"bar'
++        e = '"foo\\"bar"'
++        self.assertEqual(export_pot._normalize(s), e)
++
++    def test_multi_lines(self):
++        s = 'foo\nbar\n'
++        e = '""\n"foo\\n"\n"bar\\n"'
++        self.assertEqual(export_pot._normalize(s), e)
++
++        s = '\nfoo\nbar\n'
++        e = ('""\n'
++             '"\\n"\n'
++             '"foo\\n"\n'
++             '"bar\\n"')
++        self.assertEqual(export_pot._normalize(s), e)
++
++
++class PoEntryTestCase(tests.TestCase):
++
++    def setUp(self):
++        self.overrideAttr(export_pot, '_FOUND_MSGID', set())
++        self._outf = StringIO()
++        super(PoEntryTestCase, self).setUp()
++
++    def check_output(self, expected):
++        self.assertEqual(
++                self._outf.getvalue(),
++                textwrap.dedent(expected)
++                )
++
++class TestPoEntry(PoEntryTestCase):
++
++    def test_simple(self):
++        export_pot._poentry(self._outf, 'dummy', 1, "spam")
++        export_pot._poentry(self._outf, 'dummy', 2, "ham", 'EGG')
++        self.check_output('''\
++                #: dummy:1
++                msgid "spam"
++                msgstr ""
++
++                #: dummy:2
++                # EGG
++                msgid "ham"
++                msgstr ""
++
++                ''')
++
++    def test_duplicate(self):
++        export_pot._poentry(self._outf, 'dummy', 1, "spam")
++        # This should be ignored.
++        export_pot._poentry(self._outf, 'dummy', 2, "spam", 'EGG')
++
++        self.check_output('''\
++                #: dummy:1
++                msgid "spam"
++                msgstr ""\n
++                ''')
++
++
++class TestPoentryPerParagraph(PoEntryTestCase):
++
++    def test_single(self):
++        export_pot._poentry_per_paragraph(
++                self._outf,
++                'dummy',
++                10,
++                '''foo\nbar\nbaz\n'''
++                )
++        self.check_output('''\
++                #: dummy:10
++                msgid ""
++                "foo\\n"
++                "bar\\n"
++                "baz\\n"
++                msgstr ""\n
++                ''')
++
++    def test_multi(self):
++        export_pot._poentry_per_paragraph(
++                self._outf,
++                'dummy',
++                10,
++                '''spam\nham\negg\n\nSPAM\nHAM\nEGG\n'''
++                )
++        self.check_output('''\
++                #: dummy:10
++                msgid ""
++                "spam\\n"
++                "ham\\n"
++                "egg"
++                msgstr ""
++
++                #: dummy:14
++                msgid ""
++                "SPAM\\n"
++                "HAM\\n"
++                "EGG\\n"
++                msgstr ""\n
++                ''')
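
Note on the line numbers expected in `test_multi` above (`dummy:10`, then `dummy:14`): they follow from the `lineno += p.count('\n') + 2` step in `_poentry_per_paragraph`. The following is a minimal sketch of that bookkeeping, assuming Python 3; `paragraph_linenos` is a hypothetical helper written for illustration and does not exist in the branch.

```python
def paragraph_linenos(text, first_lineno):
    """Yield (lineno, paragraph) pairs for blank-line separated text."""
    lineno = first_lineno
    for paragraph in text.split('\n\n'):
        yield lineno, paragraph
        # count('\n') covers every line of the paragraph except its last;
        # the extra 2 accounts for that last line and the blank separator.
        lineno += paragraph.count('\n') + 2


pairs = list(paragraph_linenos('spam\nham\negg\n\nSPAM\nHAM\nEGG\n', 10))
```

With the input from `test_multi`, the first paragraph spans three lines starting at 10, so the second one is reported at 14.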
 === added directory 'po'