lp:debian/collectl
- Get this branch:
- bzr branch lp:debian/collectl
Branch information
- Owner:
- Ubuntu branches
- Status:
- Development
Recent revisions
- 13. By Troy Heber
-
Add Replaces and Breaks collectl-utils dependencies to ensure proper
upgrade. (Closes #785038)
- 12. By Troy Heber
-
* rare, but if selecting processes by parent pid or command name, it's
possible when a new pid is seen that it's already exited by the time
we try to read /proc/pid/stat, and it will return an undef value
* finally cleaned up code to read speeds from /sys to use internal
cat() to avoid misc 'Invalid Arg' errors. also fixed cat() to return
null when nothing read.
* added mlx5 as a new type of IB device name [thanks fred]
* get lustre version a different way because format changed [thanks Jeff]
also note that native lustre support in collectl is going away in
summer of 2015!
* lexpr was incorrectly reporting sys/user cpu details in the wrong
place and as a result showed up before the timestamp in some cases
* colmux has now been moved to the collectl package, release notes
to be continued here going forward
* COLMUX: -oT -test wasn't including time column in help output whereas -od
and -oD did
* COLMUX: new switch: -retaddr tells collectl to connect back to this
address rather than the one colmux chooses by default which is default
interface's addr
* COLMUX: change in way return address is determined because RHEL 7 changed
the format of the ifconfig output, changing Bcast to broadcast and
dropping addr:
- 11. By Troy Heber
-
* New upstream release 3.7.4
* typo in $netFilt (should have been $netFiltIgnore) preventing any
network from being included in totals when --netfilt specified, but
also made me rethink the way summaries are calculated (see next item)
* 2 more network types were discovered to be causing double counting
in summaries, specifically vibr and vnets. since the exceptions occur
at a far greater rate it was decided that rather than have a default list
of those network types to exclude from the summaries, it makes far more
sense to have a list with those that SHOULD be included as well as a
mechanism for handling new summary types. This led to a reinterpretation
of --netfilt. see the man page and Network.html for more details
* removed references to XC, which is no longer supported
* use abs to generate path to exe, simpler and cleaner [thanks Jeff]
* extended the way formatit is loaded and changed the order that collectl.conf
is discovered, noting it should only affect people actually modifying
code or moving things to non-standard locations. it IS now documented
in Startup and Initialization. [thanks again, Jeff]
* set max lines to read for diskstats to 20000 for those with real large
disk counts where 10000 wasn't enough [thanks jean-marc]
* very rare, but if doing timing and no hires present, $microInterval gets set
to zero and the division by the interval blows up
* finally remembered to remove -G and --group which were replaced by --tworaw
* clarified description of -s defaults in manpage as well as adding a
pointer to the online documentation on file naming [thanks rob]
* added additional error message for when files match selection string
but none contain *date-time.raw [thanks rob]
* add support for newer kernel CPU stats: guest, guest_nice
* now that 2.4 kernels no longer supported, make sure CPU stats contain
at least softirq field
* change headers with % to PCT and remove space, also remove whitespace in
interrupt detail output for type and devices columns [thanks rob]
* new switch --ALL, selects summary and detail data for all subsystems
[thanks rob]
* new switch --full, selects --verbose, always includes RECORD separator and
includes which subsystem data is being reported with each interval in
the RECORD header to make parsing easier for rob [thanks rob]
* if you DON'T collect tcp data but want to play it back, variables weren't
initialized to 0 and you get uninit variable warnings
* if disk name ends with a digit (can only happen when manually changing
disk filtering in either collectl.conf or with --rawdskfilt), don't
include in disk summary stats [thanks guy]
* discovered a place where some numa counters go backwards! This MUST be a
kernel bug but inserted code to mitigate and warn if it happens [thanks rob]
* removed a line of code incorrectly initializing $HCAPosts[] because that is
now a doubly indexed array [thanks Jeff]
* discovered tap devices don't set default network speeds correctly and can
cause 'bogus' messages so use default max
* make 'Intrpt' header mixed case for CPU details, not all upper
* new 3rd option for --top, allows one to display the top-n processes sorted
by any column vertically, similar to playback mode, which in some cases
can be very handy
* if only 1 tcp subtype selected with --tcpfilt, was printing column
header of ERR and I've no idea why. Changed it to TCP.
* I didn't like --tcpfilt I by itself forcing --verbose so changed it to just
being in the --tcpfilt string will force it and updated man page as well
since --tcpfilt wasn't even documented in it
* As warned I'm in the process of removing direct support for lustre and you
should contact Peter Piela at TeraScala to get a copy of his lustre plugin.
Therefore -sl is being removed as a default. To get collectl's native
lustre support in daemon mode, you must add it to -s. Native support will
be completely removed around the summer of 2015.
- 10. By Troy Heber
-
* Support for infiniband extended counters also allows multiple copies to
run
* Removed myrinet and quadrics support. Also dropped nvidia and sexpr as
promised
* New switch --cpufilt, allows displaying a subset of CPUs for machines with
high cpu counts
- 9. By Troy Heber
-
* typo in network plot header loop resulted in infinite loop
* remove $int/secs from numa hit rate calc AND add more precision to its
output [thanks stig]
- 8. By Troy Heber
-
* new flag $exportComm must be set in gexpr/ganglia so that they won't
generate an error if run without -f or -A [thanks tom]
* new switch: --intfilt allows filtering of interrupts
* always log messages of type F/E to syslog in daemon mode even if -m is not
set [thanks again, tom]
* wasn't dealing correctly with missing whitespace after network name in
/proc/dev/net in initRecord() [thanks andy]
* updated init.d script for suse per the maintainer's instructions [thanks
tom]
* extra spaces were being printed in plot mode for tcp stats
* added entry to envrules.std to deal with intel Phi Co-Processor
* debian init.d script now does 'exit 1' if status reports 'not running'
* rawnetignore switch wasn't working correctly
* found/fixed some subtle problems with --procanalyze as well as some
cleanup
* need to ignore first sample after initializing summary arrays
* need to init summary hashes for thrutime and accumT because get uninit var
in print routine if only a single process entry
* found a typo in procAnalyze() to a $usecs which wasn't being used!
* added error check to make sure --procanalyze with -P requires -s
* added a little more debugging output for -d128
* discovered dynamic disk/network detail names for interactive mode were not
being reported correctly. sounds a lot worse than it is because this is
typically not done very often nor are disks/networks very dynamic except
in large, virtualized environments such as clouds
* add to list of devices to exclude from network summary data: tap, dp and
nl, which are associated with openstack cinder. remember you can always
add more to that list with --netfilt
* $lastHour was never referenced and dayInit() called every time a log was
created so fix logic to update $lastHour correctly AND call initDay() one
time and do it before newLog() called.
* closed a couple of file handles that were left open and reportedly causing
some defunct processes with -sx. [thanks brian]
* fixed bug in lustre stats recording [thanks roland]
* clarified --showsubopts text about disk and network filters in that they
apply to both summary and detail data output
* fixed problem with --import and --stats
* --statsopt a didn't work because when changed some internal logic missed
changing a test of $timestampFlag to $timestampCounter[$rawPFlag] and so
now $timestampCount can be removed entirely
* clear $firstpass after 1st pass during playback
* make sure filename initialized before calling loadConfig so if there is an
error logsys() doesn't get an undefined var warning
* to be safe, remove any quotes on net/dsk filters in case included by
mistake in DaemonCommands string
* tightened up tests to see if daemonized collectl already running
* if no hiRes::Time, fudge the value of $microInterval based on -i [thanks
Domi]
* new --procOpt k, removes known shells from process listing with -sZ,
currently set to /bin.sh, /usr/bin/perl, /usr/bin/python and python
- 7. By Troy Heber
-
* set network speed for vnets to '??' so they'll use $DefNetSpeed for bogus
checks since the kernel hardcodes them to 10 which makes no sense
* code to print brief totals for -st wasn't included in a conditional so
you'd always get extra columns of output when -st was NOT included
* needed to initialize numaMem->{lock} for cases where user selects -sM and
no data collected
* added randomize and align switches to graphite module and align switch
only to gexpr.ph since gexpr uses current times in messages
* added escape switch to graphite to allow one to change the dots in
hostname
* change to suse startup script to look in /usr/sbin instead of /usr/bin
* added debug mask of 16 to lexpr to help test x= switch
* can now use commas OR colons with lexpr,x= though commas preferred and
colons may go away
* added disk qlen, wait, svctime and util to lexpr
* it was pointed out that in getExec() I'm initializing $oneline instead of
$oneLine
* for debian init script, reverse logic for running start-stop-daemon with
-test so it will work with busybox too
* new switch: --cpuopts z (the only option) which suppresses lines of idle
activity from detailed stats
* when purging imported detail plot data, only do so if file had changed
* when playing back multiple files, do NOT try to process a new file that
has not yet seen the end of the current interval ($timestampCound==1)
* fix SuSE init.d script
- 6. By Troy Heber
-
* was not updating new major/minor numbers for a disk when they changed so
got stuck in a loop which kept disk maj/min changed every interval
* new -r option to purge older .log files, def=12 months
* fixed DaemonCommands to preserve order so you can override anything by
adding on the right side of it
* new 'align' switch added to lexpr so default is NOT to align to whole min
* for -sE do not convert negative temperatures [thanks kevin]
* add error handling to 'print' in logmsg
* vmstat needs to set $sameColsFlag to make header pagination work with -p
* new graphite switch f, use fqdn for host [thanks Bryant]
* when lexpr called with x= it needs to set summary data flag in case
nothing else is being reported, otherwise timestamps print after the data
instead of before
* lexpr typos: $tcpError, $udpError and $icmpError should not be singular
* timestamp wasn't being updated for -sD because it was specified in
$dskdetFormat
* explicitly close logs before opening new ones in the hope that the
occasionally corrupted file problems with gunzip will go away
* tcp 'last' variables weren't correctly initialized and so was printing bad
data on first line of output
* modified lexpr, gexpr and graphite such that when i= is used, to align
sending on whole minute boundaries which is particularly useful with rrd
* merged snmp and tcp stats under -st and changed export routines to show
summary error counts for -st. removed snmp.ph from kit. summaries
(based on --tcpfilt) as does brief format
* correctly deal with dynamic disks/networks instead of pulling names from
header, get them from raw file when discovered
* simplify code that deals with changed disks, now that more cleanly handled
* replace runtime calls to 'die' with calls to syslog
* readS was still left in INSTALL! [thanks gavin]
* added system boot time to header
* new values for procopts s/S to show process start times
* graphite.ph now prints loadavgs to 2 decimal places [thanks brandon]
* extended lexpr,x= functionality to also call an init routine
* initFormat now returns entire header!
* if nothing returned from an import module on a printVerbose or printPlot
call for detail data do not call printText() since it will screw up colmux
and plot detail file with empty lines
* new --rawdskignore AND --rawnetignore because sometimes easier to specify
a pattern of things to ignore
* removed restriction for running as root to get network speeds via ethtool
by looking in /sys/devices now
* slight change to way the disk queue depth is being calculated to provide
better accuracy [thanks ken]
* new --dskopts f reports disk details with some fractional values
* always calculate disk details even when only doing -sd since a plugin
might want to get at them
* new graphite switch b, will cause output to be prefaced by a specified
string [thanks justin]
* slight change to s= functionality for lexpr, gexpr and graphite: no
arguments will disable all but imported data, allowing you to log -s
data to files sending over socket
* need to give other routines (specifically --import) access to the lexpr
interval by declaring it with 'our'
* had to change the way lexpr/gexpr/graphite do min/max/avg since they were
using a positional index to track intermediate values when clearly a hash
is required for cases where not all intervals contain same elements
* -P and --plotflag had different effects on $headerRepeat because prior to
calling getopts I was peeking ahead for an ARG of -P and not including
--plo [thanks devilized]
* gexpr module has wrong units for network packets and with 'g' modes had to
multiply kb counts by 1024 to convert to bytes, which is the units for
these that ganglia uses [thanks, trevor]
* clean up handling of missing ipmitool and root access [thanks trevor]
* finally remembered to remove readS from the kit [thanks joseba]
* when filtering a process by the full path with 'f', never include collectl
itself
* documented utime in manpage
* if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
messages
* new switches, --rawdiskfilt and --rawnetfilt, allow one to filter
disks/nets at time of data collection so they never appear in raw file
* added call to IntervalEnd() (if it exists) for --import
* add option timeout to --address when connecting back to explicit address
* moved code that deals with fractional intervals and !HiRes closer to other
interval processing
* added 'strict' to snmp module as well as 'help' option: snmp,h
* fixed problems with --import
* if --import is used to generate detail data with -f and -P not
specified, collectl throws an error trying to close the detail log which
clearly hasn't been created
* when using interval other than the default AND -s-all, blank lines are
printed for standard intervals which don't have imported data. this
applied to brief, verbose AND detail data
* added some more systems to envrules: Proliant SL230/SL250 Gen 8 and
SE1170s
- 5. By Troy Heber
-
* New upstream release 3.6.3
* finally remembered to remove readS from the kit
* when filtering a process by the full path with 'f', never include collec
itself
* documented utime in manpage
* if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
messages
* new switches, --rawdiskfilt and --rawnetfilt, allow one to filter
disks/nets at time of data collection so they never appear in raw file
* added call to IntervalEnd() (if it exists) for --import
* add option timeout to --address when connecting back to explicit address
* moved code that deals with fractional intervals and !HiRes closer to othe
interval processing
* added 'strict' to snmp module as well as 'help' option: snmp,h
* fixed problems with --import
* if --import is used to generate detail data with -f and -P not specifi
collectl throws an error trying to close the detail log which clearly
hasn't been created
* when using interval other than the default AND -s-all, blank lines are
printed for standard intervals which don't have imported data. this
applied to brief, verbose AND detail data
* added some more systems to envrules: Proliant SL230 /SL250 Gen 8 and SE1
* fixed serious bug introduced a number of versions ago, which during
playback of multiple files and specifying date/time caused collectl to
continue reading first timestamp in each file and generating 'uninit
variable' errors. not harmful, but inefficient and ugly!
* added exit codes of 0/1 to all the exit points
* moved help text for --stats from basic to extended
* found $file=~/rawp/ near line 1440 clearing $1, $2 and $3 and so $prefix
$fileDate and $fileTime were not getting set correctly
* clarified 'No files processed' message to be a little more explicit
* broaden where collectl looks for lustre modules and also fixed a typo of
$lustops to $lustOpts
* procAnalyze incorrectly totaling fault totals instead of interval values
* optimize new pid processing with --procfilt
* add new pids to pidSkip{} as appropriate
* undef pidSkip{} whenever pids wrap
* added hello.ph and graphite.ph to INSTALL
* was incorrectly setting DiskFilterFlag to 1 all the time, even when not
overridden in collectl.conf. while not a bug, it does cause a slight
increase in overhead
- 4. By Troy Heber
-
* New upstream release 3.6.1
* removed --ssh switch, making detecting the parent going away the default
behavior
* added switch --nohup which allows collectl to continue running if
parent exits, which is more consistent with how --nohup itself works
* in logmsg ONLY write to STDERR when attached to a terminal
* serious problem when using --tworaw and a flush interval < that for the
process data occurs because newer versions of zlib will fail if you try to
flush to a file that has not been updated. since I don't know which
version of zlib this started happening in and feel this is a relatively
rare case, we're just rejecting this combination regardless of zlib
version. I do have an email out to the zlib author and if I ever get to
the bottom of this will be able to relax this restriction.
* use gettimeofday() for timestamps in logmsg()
* enhanced timing parameters when -i0 used. if specified, use 2nd/3rd
parameters as ratios to the first, making it possible to measure loads of
different ratios other than 1:6:30.
* discovered --import was missing from man pages and so added it
* when playing back a file, set $verboseFlag if user specified --verbose but
NEVER clear it
* experimental import: snmp, see http://collectl.sourceforge.net/Snmp.html
for details
* printf in record() blows up if formatting chars in command string!
[thanks mike]
* added accumulated time as a --top sort option
* changed formatting of accumulated time in process output to simply be
hh:mm:ss or mm::ss.ss when less than an hour to be more in line with top
* new switches, --stats and --sumstats report stats in brief mode, the latter
only summary data
* during playback need to check $numProcessed before reporting none were
processed
* stats reporting logic wasn't processing 1st file, checking for
$numProcessed>1
* removed -oA and replaced/extended functionality with --stats/--statopts
* wasn't allowing --procopts playing back process data unless -sZ which was
silly
* subtle problem found: illegal 'last' in pidNew() because file disappeared
between initial -e and trying to open it a few usecs later! can't exit a
sub via last so changed to return(0)
* our friends at OFED slightly changed the output of perfquery again [thanks
frederic]
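The accumulated-time formatting change noted in revision 4 can be illustrated with a short sketch. This is not collectl's code (collectl is written in Perl); the function name format_accum_time is hypothetical, and the sub-hour layout is read here as mm:ss.ss, in line with what top prints. Rounding to hundredths first is also an assumption, made so that a value such as 59.999 does not print as 00:60.00.

```python
def format_accum_time(seconds: float) -> str:
    """Format accumulated CPU time as hh:mm:ss, or mm:ss.ss under an hour.

    Hypothetical helper for illustration only; not taken from collectl.
    """
    # Round to hundredths up front so the seconds field never shows 60.00.
    seconds = round(seconds, 2)
    if seconds < 3600:
        minutes, secs = divmod(seconds, 60)
        return f"{int(minutes):02d}:{secs:05.2f}"
    # One hour or more: drop the fractional part and show hh:mm:ss.
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    print(format_accum_time(75.5))   # 01:15.50
    print(format_accum_time(3725))   # 01:02:05
```

Values below one hour keep two decimal places, while longer totals fall back to whole seconds, which keeps the column width short in both cases.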
Branch metadata
- Branch format:
- Branch format 7
- Repository format:
- Bazaar repository format 2a (needs bzr 1.16 or later)