sloccount should be updated to properly handle UTF-8 locales

Bug #1524 reported by Ralph Corderoy
Affects             Status        Importance  Assigned to  Milestone
sloccount (Debian)  Fix Released  Unknown
sloccount (Ubuntu)  Fix Released  Low         MOTU

Bug Description

Package: sloccount
Version: 2.26-2

sloccount generates warnings from perl.

    $ mkdir foo
    $ cd foo
    $ sloccount .
    Creating filelist for foo
    Categorizing files.
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_GB:en",
            LC_ALL = (unset),
            LANG = "en_GB"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    Computing results.

    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_GB:en",
            LC_ALL = (unset),
            LANG = "en_GB"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    SLOC Directory SLOC-by-Language (Sorted)
    0 foo (none)
    SLOC total is zero, no further analysis performed.
    $ cd ..
    $ rmdir foo
    $

I think it's because it meddles with $LANG.

    $ awk '/Perl/, /^fi/' /usr/bin/sloccount
    # Perl 5.8.0 handles the "LANG" environment variable oddly;
    # if it includes ".UTF-8" (which is does in Red Hat Linux 9 and others)
    # then it will bitterly complain about ordinary text.
    # So, we'll need to filter ".UTF-8" out of LANG.
    if [ x"$LANG" != x ]
    then
     LANG=`echo "$LANG" | sed -e 's/\.UTF-8//'`
     export LANG
     # echo "New LANG variable: $LANG"
    fi
    $

A simpler example.

    $ perl -w -e 0
    $ LANG=${LANG%.UTF-8} perl -w -e 0
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
            LANGUAGE = "en_GB:en",
            LC_ALL = (unset),
            LANG = "en_GB"
        are supported and installed on your system.
    perl: warning: Falling back to the standard locale ("C").
    $

Perhaps it no longer needs to meddle, or its meddling isn't complete
enough.
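
One way the meddling could be made "complete enough" is to strip the suffix only when the stripped locale is actually installed. The sketch below is hypothetical: `safe_lang` is an invented name, and this is not claimed to be how sloccount or any later fix behaves. It takes the current LANG and a newline-separated list of installed locales (for example the output of `locale -a`) and prints the value to use.

```shell
# Hypothetical helper, not part of sloccount.
# $1 = current LANG value
# $2 = newline-separated list of installed locales (e.g. from `locale -a`)
safe_lang() {
    # Same idea as sloccount's filter, but anchored to the end of the value.
    stripped=$(printf '%s\n' "$1" | sed -e 's/\.UTF-8$//')
    # Use the stripped name only if it matches an installed locale exactly.
    if printf '%s\n' "$2" | grep -qxF -- "$stripped"; then
        printf '%s\n' "$stripped"
    else
        printf '%s\n' "$1"
    fi
}

# Possible use before invoking perl (assumes `locale -a` is available):
#   LANG=$(safe_lang "$LANG" "$(locale -a)"); export LANG
```

With this guard, a system that has only the UTF-8 variant of en_GB installed would keep LANG unchanged rather than hand Perl a locale it cannot set.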

-- System Information:
Debian Release: 3.1
Architecture: i386 (i686)
Kernel: Linux 2.6.10-5-386
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages sloccount depends on:
ii libc6 2.3.2.ds1-20ubuntu13 GNU C Library: Shared libraries an
ii perl 5.8.4-6ubuntu1 Larry Wall's Practical Extraction

-- no debconf information

Changed in sloccount:
assignee: nobody → motu
Daniel Robitaille (robitaille) wrote :

Confirmed on Dapper Beta

Changed in sloccount:
status: Unconfirmed → Confirmed
Scott Kitterman (kitterman) wrote :

It looks like this is duplicated by Bug #67054. Bug #67054 should be closed as a duplicate of this bug. I still see this with Feisty on an initial chroot install.

Ralph Corderoy (ralph-inputplus) wrote :

I disagree. See my comment on Bug #67054. I think that one is a general problem with which locales are defined, whereas this one is specific to sloccount's own code.

Scott Kitterman (kitterman) wrote :

Sorry about that. I agree that this is not the same as Bug #67054. I'm having some locale-related issues in my Feisty chroot, but this isn't one of them.

I just did the short test cases above in the Feisty chroot and could not replicate the problem described.

I ran sloccount as described above outside the chroot (on Edgy) and got the same locale problem.

If someone has an actual Feisty box available to try this outside a chroot, I think that would be useful. Perhaps this problem has been corrected.

Scott Kitterman (kitterman) wrote :

Never mind. Bug still exists in Feisty.

mustapha benali (mustap) wrote :

I'm using Ubuntu Gutsy. It seems sloccount is complaining about the dash in UTF-8. My environment has LANG=en_DK.UTF-8, but when I run sloccount with:

    $ LANG=en_DK.UTF8 sloccount .

I get no warning.

Ralph Corderoy (ralph-inputplus) wrote :

Hi mustapha, it isn't the dash that sloccount is complaining about. sloccount alters your $LANG if and only if it ends with ".UTF-8", stripping that suffix. Perl is then asked for the stripped locale, which may not be installed, and that is what triggers the warnings. If you've previously changed your $LANG to end with ".UTF8", the filter doesn't match and you avoid sloccount's meddling.
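
To illustrate, here is a small sketch that applies the same sed expression quoted from /usr/bin/sloccount to the two LANG values discussed here. The function name `strip_utf8` is made up for this example; only the sed expression comes from the script.

```shell
# Hypothetical wrapper around the sed filter quoted from /usr/bin/sloccount.
strip_utf8() {
    printf '%s\n' "$1" | sed -e 's/\.UTF-8//'
}

strip_utf8 en_DK.UTF-8   # prints "en_DK": the suffix matched and was removed
strip_utf8 en_DK.UTF8    # prints "en_DK.UTF8": no dash, so nothing matches
```

In the first case Perl ends up being asked for "en_DK", which produces the warnings if that locale isn't installed. In the second case LANG reaches Perl untouched.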

Ralph Corderoy (ralph-inputplus) wrote :

Confirmed: the problem still exists in 7.10's sloccount 2.26-2, as described when the bug was opened.

Daniel T Chen (crimsun) wrote :

Lowering severity due to apparently innocuous behaviour.

Changed in sloccount:
importance: Medium → Low
Colin Watson (cjwatson) wrote :

Looks like Debian bug 414656, which was fixed in sloccount 2.26-2.1 (also in Ubuntu 8.04).

Changed in sloccount:
status: Confirmed → Fix Released
Changed in sloccount:
status: Unknown → Fix Released