Bazaar

Code review comment for lp:~songofacandy/bzr/i18n

i18n
Merge into bzr.dev

David Planella (dpm) wrote on 2011-05-10:

> David Planella wrote:
> > I've been asked to provide some feedback and answer any questions, so I'll
> focus on those rather than reviewing the actual code.
> >
> > Here are a couple of things that caught my eye:
> >
> > Standard i18n Tools Integration
> > -------------------------------
> >
> > It strikes me that you seem to rewrite much of the functionality provided by
> gettext and other tools already.
> > Is there a reason why you are not using something like intltool
>
> What's intltool? Is it this one: https://launchpad.net/intltool ?
> Does it work on Windows? Mac?
>

Yeah, that's intltool. Intltool is a higher-level tool that builds on gettext: it can extract translatable strings from a variety of file formats, and it provides some extra functionality on top. It is a standard tool used by many translatable projects and a dependency of python-distutils-extra.

It's a Perl script, so it should run on any platform where Perl can be run.

> > and python-distutils-extra (e.g. using the build_i18n command from p-d-e
> instead of writing extras/build_mo.py),
> > which are used by most OSS Python projects implementing i18n support? Is it
> because of platform compatibility issues?
>
> What are the restrictions of build_i18n? Can it be extended/customized?
>

I'm not sure I understand the question on restrictions, but in any case, here's the code:

http://bazaar.launchpad.net/~python-distutils-extra-hackers/python-distutils-extra/debian/view/head:/DistUtilsExtra/auto.py#L505

> > Why do you use the gettext() call directly instead of the more usual _()?
>
> Because _ has special meaning in PDB. Why does it matter?
>

This is not a real issue, just a cosmetic one.

The feedback I get is that developers prefer wrapping a translatable message in the shorter _() rather than in the longer and perhaps less readable gettext(). This has been the standard practice in nearly all localizable Open Source projects I've seen. The only exception I can think of off the top of my head is bzr-gtk, where i18n() is used.

Why is it an issue in bzr but not in all the other Python projects that make use of gettext?

> > Why is extras/bzrgettext needed?
> > The file is very well documented, but I'm not sure I can follow
> > why standard gettext cannot be used.
>
> What is standard gettext? xgettext? This one is unable to extract
> docstrings without wrapping them into gettext() calls.
>

Sorry, I should have been a bit clearer. By standard gettext I mean both the gettext API and the set of developer tools provided by the gettext package. xgettext is part of the gettext package.

Could you tell me more about how docstrings are used in the bzr context and why they should be translatable? (I'm not against this, the more translatable the better, but I'd suggest sticking to standard tools for now and focusing first on translating the strings that are exposed to users, e.g. the command line.)
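For context on the docstring question, the following is a rough sketch of how docstrings could be collected from Python source with the standard `ast` module, under the assumption that they end up as user-visible help text. It is not the implementation in extras/bzrgettext; `collect_docstrings`, `SAMPLE` and the `cmd_status` class are hypothetical names used only for illustration.

```python
import ast


def collect_docstrings(source):
    """Return (name, docstring) pairs for the module, its classes and functions."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef)):
            doc = ast.get_docstring(node)
            if doc:
                # Module nodes have no name attribute, so fall back to a label.
                found.append((getattr(node, "name", "<module>"), doc))
    return found


# Hypothetical command class whose docstring doubles as help text.
SAMPLE = '''
class cmd_status(object):
    """Display status summary."""

    def run(self):
        return 0
'''

for name, doc in collect_docstrings(SAMPLE):
    print(name, "->", doc)
```

A tool built along these lines could write the collected strings into a .pot template without requiring each docstring to be wrapped in a gettext() call.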

> > May I ask you to elaborate on this?
> >
> > The same with bzrlib/utextwrap.py, what's its purpose?
>
> This one is required to deal with multibyte characters. Some
> (Japanese) unicode characters actually require 2 positions on the
> screen, so if you need to wrap a long string using the width of the
> terminal then you need to take care of multibyte characters, because
> len(string) != len(visible characters).
>

Would gettext not take care of this already?
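For reference, the mismatch described in the quoted reply can be reproduced with the standard library alone. This is a simplified sketch, not the code in bzrlib/utextwrap.py: `display_width` is a hypothetical helper that counts East Asian Wide and Fullwidth characters as two columns and everything else as one, ignoring combining and ambiguous-width characters.

```python
import unicodedata


def display_width(text):
    """Count terminal columns, treating Wide ('W') and Fullwidth ('F') characters as 2."""
    width = 0
    for ch in text:
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


sample = "日本語"
# len() counts code points, display_width() counts screen columns.
print(len(sample), display_width(sample))
```

A wrapper that breaks lines by `len()` alone would therefore let such text overflow the terminal width.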

> > Note that I'm not arguing that you are doing anything wrong,
> > I'm just wondering if we could make use of more standard practices
> > in the internationalization world.
>
> Can you provide some guidelines or hints/tips then?
>

I've tried to provide that in my previous comment, but I'm happy to elaborate on particular points. Is there an area in particular where you'd like me to give more specific details?

As an example, you can have a look at the merge request for adding i18n support to bzr-gtk:

https://code.launchpad.net/~dpm/bzr-gtk/i18n/+merge/52318

> > Integration with Launchpad
> > --------------------------
> >
> > It just makes sense to have bzr use all the Launchpad integration
> > features regarding translations.
> > For this, at the code level the only thing that is needed is to:
> >
> > * Have a tool in the build system that can extract translatable strings from
> the code and merge them into a bzr.pot template file
> > * Have the right source tree layout:
> >
> > po/bzr.pot
> > po/ja.po
> > po/de.po
> > po/pt_BR.po
> > ...
> >
> > Where translations are in the same directory as the .pot file and are named
> with ISO 639 2-letter or 3-letter codes, with an optional country code.
> >
> > The rest can be done at the Launchpad project level and it's up to the bzr
> project admins:
> >
> > * Set up the translation focus and permissions: my recommendation would be
> Restricted or Structured, assigned to the Launchpad Translators group
> > * Set up automatic translation imports, so that on every commit of the
> bzr.pot file translations are exposed in Launchpad
> > * Set up automatic exports, so that translations can be exported to a branch
> of your choice automatically (currently daily). My recommendation would be to
> use the same branch for imports and exports, so that the manual intervention
> in managing translations is reduced to 0 in this aspect.
>
> Can you tell us how Launchpad shows the translatable strings in its UI:
> are they shown in the same order as they appear in the PO file, or
> could Launchpad sort/shuffle them as it thinks appropriate?
>

Here's an example of how strings are exposed for translation:

https://translations.launchpad.net/bzr-gtk/trunk/+pots/bzr-gtk/ca/+translate

The order is kept, but afaik, this is not guaranteed. In any case, this shouldn't be much of an issue for translators or maintainers.

> What about translating really long texts, say 10 or more lines? What
> would be the best strategy? Split them into hunks or keep as one block?

Split them. It's better to break long text into separate translatable paragraphs. It makes life much easier for translators.
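As a rough sketch of the paragraph-splitting approach, assuming paragraphs are separated by blank lines: `translate_paragraphs` and `HELP_TEXT` are hypothetical names, and `NullTranslations` stands in for a real catalog so the example runs as is.

```python
import gettext

# Stand-in for a real catalog; returns each message unchanged.
_ = gettext.NullTranslations().gettext

# Hypothetical long help text made of two paragraphs.
HELP_TEXT = """First paragraph of a long help topic.

Second paragraph, translated independently."""


def translate_paragraphs(text):
    """Look up each blank-line-separated paragraph as its own message."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(_(p) for p in paragraphs)


print(translate_paragraphs(HELP_TEXT))
```

Note that the tool generating the .pot template would need to split the text the same way, so that each paragraph appears as a separate msgid.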

> Will our hunks be shown in LP following each other, or is this not
> guaranteed?
>

Good point, now I see where your previous question was coming from :) So yeah, I believe the order is kept, but it is not something to take for granted.

I hope the comments help. Feel free to ping me on IRC ('dpm') as well.

Cheers,
David.
