lp:debian/collectl

Created by Ubuntu Package Importer and last modified
Get this branch:
bzr branch lp:debian/collectl

Branch information

Owner:
Ubuntu branches
Status:
Development

Recent revisions

13. By Troy Heber

Add Replaces and Breaks collectl-utils dependencies to ensure proper
upgrade. (Closes #785038)

12. By Troy Heber

* rare, but if selecting processes by parent pid or command name, it's
  possible when a new pid is seen that it's already exited by the time
  we try to read /proc/pid/stat, and it will return an undef value
* finally cleaned up code to read speeds from /sys to use internal
  cat() to avoid misc 'Invalid Arg' errors. also fixed cat() to return
  null when nothing read.
* added mlx5 as a new type of IB device name [thanks fred]
* get lustre version a different way because format changed [thanks Jeff]
  also note that native lustre support in collectl is going away in
  summer of 2015!
* lexpr was incorrectly reporting sys/user cpu details in the wrong
  place and as a result showed up before the timestamp in some cases
* colmux has now been moved to the collectl package, release notes
  to be continued here going forward
* COLMUX: -oT -test wasn't including time column in help output whereas -od
  and -oD did
* COLMUX: new switch: -retaddr tells collectl to connect back to this
  address rather than the one colmux chooses by default which is default
  interface's addr
* COLMUX: change in way return address is determined because RHEL 7 changed
  the format of the ifconfig output, changing Bcast to broadcast and
  dropping addr:
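
The ifconfig format change in the last COLMUX item can be illustrated with a small sketch. It is written in Python rather than collectl's Perl and is not colmux's actual code; the helper name `find_ipv4_addr` and the sample lines are assumptions made for illustration. The idea is simply to accept both the older `inet addr:... Bcast:...` layout and the newer `inet ... broadcast ...` layout.

```python
import re

# Hypothetical helper, not taken from colmux: pull the first IPv4 address
# out of an ifconfig "inet" line in either the old or the new layout.
_INET_RE = re.compile(r"\binet\s+(?:addr:)?(\d{1,3}(?:\.\d{1,3}){3})\b")


def find_ipv4_addr(line):
    """Return the IPv4 address found on an ifconfig 'inet' line, or None."""
    match = _INET_RE.search(line)
    return match.group(1) if match else None


# Sample lines are illustrative only.
old_style = "inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0"
new_style = "inet 192.168.1.10  netmask 255.255.255.0  broadcast 192.168.1.255"
```

Making the `addr:` prefix optional in a single pattern avoids having to detect which distribution produced the output.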

11. By Troy Heber

* New upstream release 3.7.4
* typo in $netFilt (should have been $netFiltIgnore) preventing any
  network from being included in totals when --netfilt specified, but
  also made me rethink the way summaries are calculated (see next item)
* 2 more network types were discovered to be causing double counting
  in summaries, specifically vibr and vnets. since the exceptions occur
  at a far greater rate it was decided that rather than have a default list
  of those network types to exclude from the summaries, it makes far more
  sense to have a list with those that SHOULD be included as well as a
  mechanism for handling new summary types. This led to a reinterpretation
  of --netfilt. see the man page and Network.html for more details
* removed references to XC, which is no longer supported
* use abs to generate path to exe, simpler and cleaner [thanks Jeff]
* extended the way formatit is loaded and changed the order that collectl.conf
  is discovered, noting it should only affect people actually modifying
  code or moving things to non-standard locations. it IS now documented
  in Startup and Initialization. [thanks again, Jeff]
* set max lines to read for diskstats to 20000 for those with real large
  disk counts where 10000 wasn't enough [thanks jean-marc]
* very rare, but if doing timing and no hires present, $microInterval gets set
  to zero and the division by the interval blows up
* finally remembered to remove -G and --group which were replaced by --tworaw
* clarified description of -s defaults in manpage as well as adding a
  pointer to the online documentation on file naming [thanks rob]
* added additional error message for when files match selection string
  but none contain -date-time.raw [thanks rob]
* add support for newer kernel CPU stats: guest, guest_nice
* now that 2.4 kernels no longer supported, make sure CPU stats contain
  at least softirq field
* change headers with % to PCT and remove space, also remove whitespace in
  interrupt detail output for type and devices columns [thanks rob]
* new switch --ALL, selects summary and detail data for all subsystems
  [thanks rob]
* new switch --full, selects --verbose, always includes RECORD separator and
  includes which subsystem data is being reported with each interval in
  the RECORD header to make parsing easier for rob [thanks rob]
* if you DON'T collect tcp data but want to play it back, variables weren't
  initialized to 0 and you get uninit variable warnings
* if disk name ends with a digit (can only happen when manually changing
  disk filtering in either collectl.conf or with --rawdskfilt), don't
  include in disk summary stats [thanks guy]
* discovered a place where some numa counters go backwards! This MUST be a
  kernel bug but inserted code to mitigate and warn if it happens [thanks rob]
* removed a line of code incorrectly initializing $HCAPosts[] because that is
  now a doubly indexed array [thanks Jeff]
* discovered tap devices don't set default network speeds correctly and can
  cause 'bogus' messages so use default max
* make 'Intrpt' header mixed case for CPU details, not all upper
* new 3rd option for --top, allows one to display the top-n processes sorted
  by any column vertically, similar to playback mode, which in some cases
  can be very handy
* if only 1 tcp subtype selected with --tcpfilt, was printing column
  header of ERR and I've no idea why. Changed it to TCP.
* I didn't like --tcpfilt I by itself forcing --verbose so changed it to just
  being in the --tcpfilt string will force it and updated man page as well
  since --tcpfilt wasn't even documented in it
* As warned I'm in the process of removing direct support for lustre and you
  should contact Peter Piela at TeraScala to get a copy of his lustre plugin.
  Therefore -sl is being removed as a default. To get collectl's native
  lustre support in daemon mode, you must add it to -s. Native support will
  be completely removed around the summer of 2015.
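
The move from an exclude list to an include list for network summaries, described in this revision, might look like the following sketch. It is Python, not collectl's Perl, and the include pattern, the interface names and the function `summary_total` are all assumptions for illustration; the man page and Network.html remain the reference for how the netfilt switch really behaves.

```python
import re

# Assumed include pattern: only interfaces whose names start with one of
# these prefixes are counted. collectl's real default list may differ.
INCLUDE_RE = re.compile(r"^(eth|em|ib)\d")


def summary_total(kb_by_interface, include_re=INCLUDE_RE):
    """Sum traffic only for interfaces that match the include pattern."""
    return sum(kb for name, kb in kb_by_interface.items()
               if include_re.match(name))


# Illustrative sample: the bridge and the vnet carry the same traffic as
# eth0, so adding them to the total would double count it.
sample = {"eth0": 100, "vibr0": 100, "vnet0": 100, "lo": 5}
total = summary_total(sample)
```

With an include list, a newly invented virtual device type is ignored by default instead of silently inflating the totals.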

10. By Troy Heber

* Support for infiniband extended counters also allows multiple copies to
  run
* Removed myrinet and quadrics support. Also dropped nvidia and sexpr as
  promised
* New switch --cpufilt, allows display a subset of CPUs for machines with
  high cpu counts

9. By Troy Heber

* typo in network plot header loop resulted in infinite loop
* remove $int/secs from numa hit rate calc AND add more precision to its
  output [thanks stig]

8. By Troy Heber

* new flag $exportComm must be set in gexpr/ganglia so that they won't
  generate an error if run without -f or -A [thanks tom]
* new switch: --intfilt allows filtering of interrupts
* always log messages of type F/E to syslog in daemon mode even if -m is not
  set [thanks again, tom]
* wasn't dealing correctly with missing whitespace after network name in
  /proc/net/dev in initRecord() [thanks andy]
* updated init.d script for suse per the maintainer's instructions [thanks
  tom]
* extra spaces were being printed in plot mode for tcp stats
* added entry to envrules.std to deal with intel Phi Co-Processor
* debian init.d script now does 'exit 1' if status reports 'not running'
* rawnetignore switch wasn't working correctly
* found/fixed some subtle problems with --procanalyze as well as some
  cleanup
* need to ignore first sample after initializing summary arrays
* need to init summary hashes for thrutime and accumT because get uninit var
  in print routine if only a single process entry
* found a typo in procAnalyze() to a $usecs which wasn't being used!
* added error check to make sure --procanalyze with -P requires -s
* added a little more debugging output for -d128
* discovered dynamic disk/network detail names for interactive mode were not
  being reported correctly. sounds a lot worse than it is because this is
  typically not done very often nor are disks/networks very dynamic except
  in large, virtualized environments such as clouds
* add to list of devices to exclude from network summary data: tap, dp and
  nl, which are associated with openstack cinder. remember you can always
  add more to that list with --netfilt
* $lastHour was never referenced and dayInit() called every time a log was
  created so fix logic to update $lastHour correctly AND call initDay() one
  time and do it before newLog() called.
* closed a couple of file handles that were left open and reportedly causing
  some defunct processes with -sx. [thanks brian]
* fixed bug in lustre stats recording [thanks roland]
* clarified --showsubopts text about disk and network filters in that they
  apply to both summary and detail data output
* fixed problem with --import and --stats
* --statsopt a didn't work because when changed some internal logic missed
  changing a test of $timestampFlag to $timestampCounter[$rawPFlag] and so
  now $timestampCount can be removed entirely
* clear $firstpass after 1st pass during playback
* make sure filename initialized before calling loadConfig so if there is an
  error logsys() doesn't get an undefined var warning
* to be safe, remove any quotes on net/dsk filters in case included by
  mistake in DaemonCommands string
* tightened up tests to see if daemonized collectl already running
* if no hiRes::Time, fudge the value of $microInterval based on -i [thanks
  Domi]
* new --procOpt k, removes known shells from process listing with -sZ,
  currently set to /bin/sh, /usr/bin/perl, /usr/bin/python and python
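
The missing-whitespace fix for the network statistics file in this revision can be sketched as follows. On Linux, a line in /proc/net/dev may read `eth0: 1500 ...` or, when the first counter is wide, `eth0:1500 ...` with no space after the colon. The Python below is an illustration only, not the code in initRecord(); the function name and sample lines are assumptions.

```python
def split_net_line(line):
    """Split one interface line into (name, list of integer counters)."""
    # Split on the first colon rather than on whitespace, so that
    # "eth0:1500 ..." and "eth0: 1500 ..." are handled the same way.
    name, _, rest = line.partition(":")
    return name.strip(), [int(field) for field in rest.split()]


# Illustrative sample lines with and without a space after the colon.
spaced = "  eth0: 1500 10 0 0 0 0 0 0 900 8 0 0 0 0 0 0"
packed = "  eth0:1500 10 0 0 0 0 0 0 900 8 0 0 0 0 0 0"
```

Splitting on whitespace alone would treat `eth0:1500` as the interface name and shift every counter by one column.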

7. By Troy Heber

* set network speed for vnets to '??' so they'll use $DefNetSpeed for bogus
  checks since the kernel hardcodes them to 10 which makes no sense
* code to print brief totals for -st wasn't included in a conditional so
  you'd always get extra columns of output when -st was NOT included
* needed to initialize numaMem->{lock} for cases where user selects -sM and
  no data collected
* added randomize and align switches to graphite module and align switch
  only to gexpr.ph since gexpr uses current times in messages
* added escape switch to graphite to allow one to change the dots in
  hostname
* change to suse startup script to look in /usr/sbin instead of /usr/bin
* added debug mask of 16 to lexpr to help test x= switch
* can now use commas OR colons with lexpr,x= though commas preferred and
  colons may go away
* added disk qlen, wait, svctime and util to lexpr
* it was pointed out that in getExec() I'm initializing $oneline instead of
  $oneLine
* for debian init script, reverse logic for running start-stop-daemon with
  -test so it will work with busybox too
* new switch: --cpuopts z (the only option) which suppresses lines of idle
  activity from detailed stats
* when purging imported detail plot data, only do so if file had changed
* when playing back multiple files, do NOT try to process a new file that
  has not yet seen the end of the current interval ($timestampCound==1)
* fix SuSE init.d script

6. By Troy Heber

* was not updating new major/minor numbers for a disk when they changed so
  got stuck in a loop which kept disk maj/min changed every interval
* new -r option to purge older .log files, def=12 months
* fixed DaemonCommands to preserve order so you can override anything by
  adding on the right side of it
* new 'align' switch added to lexpr so default is NOT to align to whole min
* for -sE do not convert negative temperatures [thanks kevin]
* add error handling to 'print' in logmsg
* vmstat needs to set $sameColsFlag to make header pagination work with -p
* new graphite switch f, use fqdn for host [thanks Bryant]
* when lexpr called with x= it needs to set summary data flag in case
  nothing else is being reported, otherwise timestamps print after the data
  instead of before
* lexpr typos: $tcpError, $udpError and $icmpError should not be singular
* timestamp wasn't being updated for -sD because it was specified in
  $dskdetFormat
* explicitly close logs before opening new ones in the hope that the
  occasionally corrupted file problems with gunzip will go away
* tcp 'last' variables weren't correctly initialized and so was printing bad
  data on first line of output
* modified lexpr, gexpr and graphite such that when i= is used, to align
  sending on whole minute boundaries which is particularly useful with rrd
* merged snmp and tcp stats under -st and changed export routines to show
  summary error counts for -st. removed snmp.ph from kit. summaries
  (based on --tcpfilt) as does brief format
* correctly deal with dynamic disks/networks instead of pulling names from
  header, get them from raw file when discovered
* simplify code that deals with changed disks, now that more cleanly handled
* replace runtime calls to 'die' with calls to syslog
* readS was still left in INSTALL! [thanks gavin]
* added system boot time to header
* new values for procopts s/S to show process start times
* graphite.ph now prints loadavgs to 2 decimal places [thanks brandon]
* extended lexpr,x= functionality to also call an init routine
* initFormat now returns entire header!
* if nothing returned from an import module on a printVerbose or printPlot
  call for detail data do not call printText() since it will screw up colmux
  and plot detail file with empty lines
* new --rawdskignore AND --rawnetignore because sometimes easier to specify
  a pattern of things to ignore
* removed restriction for running as root to get network speeds via ethtool
  by looking in /sys/devices now
* slight change to way the disk queue depth is being calculated to provide
  better accuracy [thanks ken]
* new --dskopts f reports disk details with some fractional values
* always calculate disk details even when only doing -sd since a plugin
  might want to get at them
* new graphite switch b, will cause output to be prefaced by a specified
  string [thanks justin]
* slight change to s= functionality for lexpr, gexpr and graphite: no
  arguments will disable all but imported data, allowing you to log -s
  data to files sending over socket
* need to give other routines (specifically --import) access to the lexpr
  interval by declaring it with 'our'
* had to change the way lexpr/gexpr/graphite do min/max/avg since they were
  using a positional index to track intermediate values when clearly a hash
  is required for cases where not all intervals contain same elements
* -P and --plotflag had different effects on $headerRepeat because prior to
  calling getopts I was peeking ahead for an ARG of -P and not including
  --plo [thanks devilized]
* gexpr module has wrong units for network packets and with 'g' modes had to
  multiply kb counts by 1024 to convert to bytes, which is the units for
  these that ganglia uses [thanks, trevor]
* clean up handling of missing ipmitool and root access [thanks trevor]
* finally remembered to remove readS from the kit [thanks joseba]
* when filtering a process by the full path with 'f', never include collectl
  itself
* documented utime in manpage
* if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
  messages
* new switches, --rawdiskfilt and --rawnetfilt, allow one to filter
  disks/nets at time of data collection so they never appear in raw file
* added call to IntervalEnd() (if it exists) for --import
* add option timeout to --address when connecting back to explicit address
* moved code that deals with fractional intervals and !HiRes closer to other
  interval processing
* added 'strict' to snmp module as well as 'help' option: snmp,h
* fixed problems with --import
  * if --import is used to generate detail data with -f and -P not
    specified, collectl throws an error trying to close the detail log which
    clearly hasn't been created
  * when using interval other than the default AND -s-all, blank lines are
    printed for standard intervals which don't have imported data. this
    applied to brief, verbose AND detail data
* added some more systems to envrules: Proliant SL230/SL250 Gen 8 and
  SE1170s
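
The min/max/avg change for lexpr/gexpr/graphite in this revision can be sketched as below: running statistics are keyed by metric name rather than by position, so an interval that omits a metric does not shift the values of the others. This is Python, not collectl's Perl, and the class `RunningStats` and the metric names are hypothetical.

```python
class RunningStats:
    """Track min, max and average per metric name across intervals."""

    def __init__(self):
        # name -> [minimum, maximum, total, count]
        self._stats = {}

    def add_interval(self, values):
        """Fold one interval's {name: value} mapping into the totals."""
        for name, value in values.items():
            entry = self._stats.get(name)
            if entry is None:
                self._stats[name] = [value, value, value, 1]
            else:
                entry[0] = min(entry[0], value)
                entry[1] = max(entry[1], value)
                entry[2] += value
                entry[3] += 1

    def summary(self, name):
        """Return (min, max, avg) for one metric."""
        low, high, total, count = self._stats[name]
        return low, high, total / count


# Illustrative use: the second interval carries only one of the metrics.
stats = RunningStats()
stats.add_interval({"cputotals.user": 10, "nettotals.kbin": 200})
stats.add_interval({"cputotals.user": 30})
```

A positional list would have credited the second interval's single value to whichever metric happened to sit at index 0; the dictionary keeps each metric's count independent.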

5. By Troy Heber

* New upstream release 3.6.3
* finally remembered to remove readS from the kit
* when filtering a process by the full path with 'f', never include collectl
  itself
* documented utime in manpage
* if -i0 set $DefNetSpeed to 0 so we don't throw any 'bogus' network speed
  messages
* new switches, --rawdiskfilt and --rawnetfilt, allow one to filter
  disks/nets at time of data collection so they never appear in raw file
* added call to IntervalEnd() (if it exists) for --import
* add option timeout to --address when connecting back to explicit address
* moved code that deals with fractional intervals and !HiRes closer to other
  interval processing
* added 'strict' to snmp module as well as 'help' option: snmp,h
* fixed problems with --import
* if --import is used to generate detail data with -f and -P not specified,
  collectl throws an error trying to close the detail log which clearly
  hasn't been created
* when using interval other than the default AND -s-all, blank lines are
  printed for standard intervals which don't have imported data. this
  applied to brief, verbose AND detail data
* added some more systems to envrules: Proliant SL230/SL250 Gen 8 and SE1170s
* fixed serious bug introduced a number of versions ago, which during
  playback of multiple files and specifying date/time caused collectl to
  continue reading first timestamp in each file and generating 'uninit
  variable' errors. not harmful, but inefficient and ugly!
* added exit codes of 0/1 to all the exit points
* moved help text for --stats from basic to extended
* found $file=~/rawp/ near line 1440 clearing $1, $2 and $3 and so $prefix
  $fileDate and $fileTime were not getting set correctly
* clarified 'No files processed' message to be a little more explicit
* broaden where collectl looks for lustre modules and also fixed a typo of
  $lustops to $lustOpts
* procAnalize incorrectly totaling fault totals instead of interval values
* optimize new pid processing with --procfilt
* add new pids to pidSkip{} as appropriate
* undef pidSkip{} whenever pids wrap
* added hello.ph and graphite.ph to INSTALL
* was incorrectly setting DiskFilterFlag to 1 all the time, even when not
  overridden in collectl.conf. while not a bug, it does cause a slight
  increase in overhead

4. By Troy Heber

* New upstream release 3.6.1
* removed --ssh switch, making detecting the parent going away the default
  behavior
* added switch --nohup which allows collectl to continue running if
  parent exits, which is more consistent with how --nohup itself works
* in logmsg ONLY write to STDERR when attached to a terminal
* serious problem when using --tworaw and a flush interval < that for the
  process data occurs because newer versions of zlib will fail if you try to
  flush to a file that has not been updated. since I don't know which
  version of zlib this started happening in and feel this is a relatively
  rare case, we're just rejecting this combination regardless of zlib
  version. I do have an email out to the zlib author and if I ever get to
  the bottom of this will be able to relax this restriction.
* use gettimeofday() for timestamps in logmsg()
* enhanced timing parameters when -i0 used. if specified, use 2nd/3rd
  parameter as ratio to first, making it possible to measure loads of
  different ratios other than 1:6:30.
* discovered --import was missing from man pages and so added it
* when playing back a file, set $verboseFlag if user specified --verbose but
  NEVER clear it
* experimental import: snmp, see http://collectl.sourceforge.net/Snmp.html
  for details
* printf in record() blows up if formatting chars in command string!
  [thanks mike]
* added accumulated time as a --top sort option
* changed formatting of accumulated time in process output to simply be
  hh:mm:ss or mm:ss.ss when less than an hour to be more in line with top
* new switches, --stats and --sumstats report stats in brief mode, the latter
  only summary data
* during playback need to check $numProcessed before reporting none were
  processed
* stats reporting logic wasn't processing 1st file, checking for
  $numProcessed>1
* removed -oA and replaced/extended functionality with --stats/--statopts
* wasn't allowing --procopts playing back process data unless -sZ which was
  silly
* subtle problem found: illegal 'last' in pidNew() because file disappeared
  between initial -e and trying to open it a few usecs later! can't exit a
  sub via last so changed to return(0)
* our friends at OFED slightly changed the output of perfquery again [thanks
  frederic]
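
The accumulated-time formatting change in this revision can be illustrated with a short sketch. It is Python rather than collectl's Perl, the function name `format_accum_time` is hypothetical, and it assumes the sub-hour form is minutes, seconds and hundredths separated by a single colon.

```python
def format_accum_time(seconds):
    """Format seconds as hh:mm:ss, or as mm:ss.ss when under an hour."""
    # Work in whole hundredths of a second so that rounding can never
    # produce a seconds field of "60.00".
    centis = int(round(seconds * 100))
    if centis >= 3600 * 100:
        hours, rem = divmod(centis // 100, 3600)
        minutes, secs = divmod(rem, 60)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}"
    minutes, rem = divmod(centis, 60 * 100)
    return f"{minutes:02d}:{rem // 100:02d}.{rem % 100:02d}"
```

Short-lived processes keep sub-second resolution, while long-running ones drop the fraction that no longer matters.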

Branch metadata

Branch format:
Branch format 7
Repository format:
Bazaar repository format 2a (needs bzr 1.16 or later)