Merge lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost into lp:ubuntu-manpage-repository

Proposed by Christian Ehrhardt 
Status: Merged
Approved by: Christian Ehrhardt 
Approved revision: 244
Merged at revision: 244
Proposed branch: lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost
Merge into: lp:ubuntu-manpage-repository
Diff against target: 263 lines (+47/-34)
3 files modified
bin/fetch-man-pages.sh (+12/-12)
bin/make-manpage-repo.sh (+29/-16)
bin/w3mman-to-html.pl (+6/-6)
To merge this branch: bzr merge lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost
Reviewer            Review Type   Date Requested   Status
Christian Ehrhardt                                 Approve
Paride Legovini     lgtm                           Approve
Review via email: mp+456936@code.launchpad.net
Christian Ehrhardt  (paelzer) wrote :

This tries to help with fixing and debugging the slow execution in prodstack.
On the one hand, it runs the releases concurrently, which lets the job make better use of the CPUs now given to us.
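
The concurrency idea can be sketched in a few lines of shell. This is a minimal illustration rather than the script itself: `handle_series` here is a stand-in that only writes a status file, and `workdir` and the release list are assumptions made for the example.

```shell
#!/bin/bash
# Sketch only: start one background job per release, then wait for all.
# handle_series is a placeholder, not the real per-release work.
workdir=$(mktemp -d)

handle_series() {
    local dist="$1"
    # The real job would fetch and convert the manpages of "$dist" here.
    echo "finished" > "${workdir}/${dist}.status"
}

DISTROS="mantic noble"
for dist in $DISTROS; do
    handle_series "${dist}" &   # each release runs in its own subshell
done
wait                            # block until every background job has exited

ls "${workdir}"
```

Since the releases do not share directories or files, the jobs need no locking. Note that a plain `wait` does not report the exit status of the individual jobs.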

On the other hand, adding timestamps to the log will allow comparing this run against execution on other platforms, where it seems faster.
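
The log prefix can be sketched as a small helper. The branch itself inlines the `date` call in every `echo`; the `log_info` function below is a hypothetical refactoring used only for illustration, and `%N` (nanoseconds) assumes GNU `date`.

```shell
#!/bin/bash
# Sketch only: log_info is a hypothetical helper, not part of the branch.
# %N (nanoseconds) is a GNU date extension.
log_info() {
    local dist="$1"
    shift
    printf 'INFO (%s) - %s: %s\n' "$(date '+%H:%M:%S.%N')" "$dist" "$*"
}

line=$(log_info noble "fetching: http://example.invalid/some.deb")
echo "$line"
```

Tagging each line with the release name matters once the releases run concurrently, because their output interleaves in a single log.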

A full run of noble+mantic is running locally at the moment; I'll provide a log later.

Christian Ehrhardt  (paelzer) wrote :

FYI: there is a bit of extra churn as I couldn't ignore the trailing whitespaces anymore :-)

Paride Legovini (paride) wrote :

Another LGTM-level review: I didn't test any of this, but the changes do look good.

review: Approve (lgtm)
Christian Ehrhardt  (paelzer) wrote :

It built the same content just fine, and utilization went up linearly as expected.

Two full releases took 18h.
Formerly that was more like 24h on my system, which is limited by the network.
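
As a rough worked example of what those two figures mean, treating the approximate 24h and 18h as exact (an assumption for the arithmetic only):

```shell
#!/bin/bash
# Sketch only: the inputs are the approximate wall-clock hours quoted above.
summary=$(awk -v before=24 -v after=18 'BEGIN {
    printf "%.0f%% less wall time, %.2fx speedup",
           (before - after) / before * 100, before / after
}')
echo "$summary"
```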

The prod system is less network limited and should gain even more.

And either way, the timestamped logs help to track things better ...

Christian Ehrhardt  (paelzer) wrote :

With that, the tests are good as well.

review: Approve

Preview Diff

=== modified file 'bin/fetch-man-pages.sh'
--- bin/fetch-man-pages.sh 2018-06-05 23:34:38 +0000
+++ bin/fetch-man-pages.sh 2023-12-06 08:10:05 +0000
@@ -43,7 +43,7 @@
 NAME_AND_VER=$(echo "$PKG" | sed "s/\.deb$//")
 DEB="$TEMPDIR/$PKG"
 
-echo "INFO: fetching: $PKGURL"
+echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: fetching: $PKGURL"
 curl --silent "$PKGURL" > "$DEB"
 
 DESTDIR="$PUBLIC_HTML_DIR/manpages/$DIST"
@@ -52,13 +52,13 @@
 export W3MMAN_MAN='man --no-hyphenation'
 export MAN_KEEP_FORMATTING=1
 
-echo "INFO: Looking for manpages in [$DEB]"
+echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Looking for manpages in [$DEB]"
 # The .*man bit is to handle postgres' inane manpage installation
 man=$(dpkg-deb -c "$DEB" | grep -E " \./usr/share.*/man/.*\.[0-9][a-zA-Z0-9\.\-]*\.gz$" | sed -e "s/^.*\.\//\.\//" -e "s/ \-> /\->/")
 
 # Exit immediately if this package does not contain manpages
 if [ -z "$man" ]; then
-	echo "INFO: No manpages: [$DIST] [$PKG]"
+	echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: No manpages: [$DIST] [$PKG]"
 	# Touch the cache file so we don't look again until package updated
 	sha1sum "$DEB" | awk '{ print $1 }' > "$DESTDIR/.cache/$NAME"
 	exit 0
@@ -68,22 +68,22 @@
 
 dpkg-deb -x "$DEB" "$TEMPDIR"
 for i in $man; do
-	#printf "%s\n" "INFO: Considering entry [$i]"
+	#printf "%s\n" "DEBUG: Considering entry [$i]"
 	i=$(printf "%s" "$i" | sed "s/^.*\.\///")
 	if printf "%s" "$i" | grep -qs "\->"; then
 		SYMLINK=1
 		symlink_src_html=$(printf "%s" "$i" | sed -e "s/^.*\->//" -e "s/\.gz$/\.html/")
-		i=$(printf "%s" "$i" | sed "s/\->.*$//") 
-		#printf "%s\n" "INFO: [$i] is a symbolic link"
+		i=$(printf "%s" "$i" | sed "s/\->.*$//")
+		#printf "%s\n" "DEBUG: [$i] is a symbolic link"
 	else
 		SYMLINK=0
 	fi
 	manpage="$TEMPDIR/$i"
 	i=$(printf "%s" "$i" | sed -e "s/usr\/share.*\/man\///i" -e "s/\.gz$//")
-	#printf "%s\n" "INFO: Considering manpage [$i]"
+	#printf "%s\n" "DEBUG: Considering manpage [$i]"
 	# shellcheck disable=SC2166
 	if [ ! -s "$manpage" -o -z "$i" ] && [ "$SYMLINK" = "0" ]; then
-		#printf "%s\n" "INFO: Skipping empty manpage [$manpage]"
+		#printf "%s\n" "DEBUG: Skipping empty manpage [$manpage]"
 		continue
 	fi
 	out="$DESTDIR"/"$i".html
@@ -91,12 +91,12 @@
 	mkdir -p "$(dirname "$out")" "$outgz" > /dev/null || true
 	if [ "$SYMLINK" = "1" ]; then
 		ln -f -s "$symlink_src_html" "$out"
-		printf "%s\n" "INFO: Created symlink [$out]"
+		printf "%s\n" "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created symlink [$out]"
 	else
 		if LN=$(zcat "$manpage" | head -n1 | grep "^\.so "); then
 			LN=$(printf "%s" "$LN" | sed -e 's/^\.so /\.\.\//' -e 's/\/\.\.\//\//g' -e 's/$/\.html/')
 			ln -f -s "$LN" "$out"
-			printf "INFO: Created symlink [%s]" "$out"
+			printf "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created symlink [%s]" "$out"
 		else
 			BODY=$(COLUMNS=100 /usr/lib/w3m/cgi-bin/w3mman2html.cgi "local=$manpage" | grep -A 1000000 "^<b>" | sed -e '/<\/body>/,+100 d' -e 's:^<b>\(.*\)</b>$:</pre><h4><b>\1</b></h4><pre>:g' -e 's:<a href="file\:///[^?]*?\([^(]*\)(\([^)]*\))">:<a href="../man\2/\1.\2.html">:g')
 			TITLE=$(printf "%s" "$BODY" | head -n2 | tail -n1 | sed "s/<[^>]\+>//g")
@@ -110,7 +110,7 @@
 $BODY
 </pre><!--#include virtual='/below.html' -->" > "$out"
 
-			printf "%s\n" "INFO: Created manpage [$out]"
+			printf "%s\n" "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created manpage [$out]"
 		fi
 	fi
 	mv -f "$manpage" "$outgz"
@@ -121,7 +121,7 @@
 done
 
 # After extracting all manpages, cache the sha1sum, so we don't
-# repeat the downloads 
+# repeat the downloads
 sha1sum "$DEB" | awk '{ print $1 }' > "$DESTDIR/.cache/$NAME"
 
 # In the case of freakish package permissions, fix them on rm failure.

=== modified file 'bin/make-manpage-repo.sh'
--- bin/make-manpage-repo.sh 2023-12-01 14:24:45 +0000
+++ bin/make-manpage-repo.sh 2023-12-06 08:10:05 +0000
@@ -2,23 +2,23 @@
 
 ###############################################################################
 # Copyright (C) 2008 Canonical Ltd.
-# 
+#
 # This code was originally written by Dustin Kirkland <kirkland@ubuntu.com>,
 # based on a framework by Kees Cook <kees@ubuntu.com>.
-# 
+#
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
-# 
+#
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
-# 
+#
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
-# 
+#
 # On Debian-based systems, the complete text of the GNU General Public
 # License can be found in /usr/share/common-licenses/GPL-3
 ###############################################################################
@@ -75,27 +75,28 @@
 	fi
 	local deb="$1"
 	local sum="$2"
+	local distnopocket="$3"
 	local name=$(basename "$deb" | awk -F_ '{print $1}')
 	existing_sum=$(cat "$PUBLIC_HTML_DIR/manpages/$dist/.cache/$name" 2>/dev/null)
 
 	# Take the first two digits of the existing_sum modulo 28 to
 	# compare to the current day of month.
-	# 
+	#
 	# Reasoning: this will invalidate the cache for everything ~
 	# once per month (days: 1-28)
 	day_mod=$((0x$(echo "$existing_sum" | cut -b 1-2)%27 + 1))
 	if [ "$day_mod" -eq "$(date +%d)" ]; then
-		echo "INFO: date_mod match, regnerating: $deb ($day_mod)"
+		echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: date_mod match, regnerating: $deb ($day_mod)"
 		return 0
 	fi
 
 	# Of course, if the sum found in the packages file for this
 	# package does not equal the sum I have on disk, regenerate.
 	if [ "$existing_sum" = "$sum" ]; then
-		echo "INFO: cksum skip: $deb"
+		echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: cksum skip: $deb"
 		return 1
 	else
-		echo "INFO: cksum mismatch: $deb"
+		echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: cksum mismatch: $deb"
 		return 0
 	fi
 }
@@ -107,7 +108,7 @@
 	local deburl=$(get_deb_url "$deb")
 	# FIXME: the || true needs to bubble up to a list of things wrong obviously.
 	# shellcheck disable=SC2015
-	is_pkg_cache_invalid "$deb" "$sum" && "$DIR/fetch-man-pages.sh" "$distnopocket" "$deburl" || true
+	is_pkg_cache_invalid "$deb" "$sum" "$distnopocket" && "$DIR/fetch-man-pages.sh" "$distnopocket" "$deburl" || true
 }
 
 link_en_locale() {
@@ -135,8 +136,8 @@
 	return 0
 }
 
-declare -A pkg_handled
-for dist in $DISTROS; do
+handle_series() {
+	local dist="${1}"
 	# On one hand in some cases we do not want to know the pocket, as it
 	# would show up in paths, URLs and bug links
 	distnopocket="${dist}"
@@ -146,6 +147,7 @@
 	# It orders by likely most up-to-date pocket first and only re-renders if
 	# a newer version of the same source package is found later (even single
 	# Packages files can list the same source multiple times).
+	declare -A pkg_handled
 	pkg_handled=()
 	for pocket in "-updates" "-security" ""; do
 		mkdir -p "$PUBLIC_HTML_DIR/manpages/$distnopocket/.cache" "$PUBLIC_HTML_DIR/manpages.gz/$distnopocket" || true
@@ -153,7 +155,7 @@
 		for repo in $REPOS; do
 			for arch in $ARCH; do
 				file=$(get_packages_url "${dist}${pocket}" "$repo" "$arch")
-				echo "INFO: Packages.gz: $file"
+				echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: Packages.gz: $file"
 				plist=$(mktemp "/tmp/XXXXXXX.manpages.${dist}${pocket}.$repo.$arch.plist")
 				curl -s "$file" \
 					| gunzip -c \
@@ -164,20 +166,31 @@
 				while read -r binpkg version deb sum; do
 					if dpkg --compare-versions "${version}" gt "${pkg_handled["$binpkg"]}"; then
 						if [[ -n "${pkg_handled[$binpkg]}" ]]; then
-							echo "INFO: binpkg: $binpkg ${version} > ${pkg_handled["$binpkg"]} (processing it again)"
+							echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: binpkg: $binpkg ${version} > ${pkg_handled["$binpkg"]} (processing it again)"
 						else
-							echo "INFO: First encounter of binpkg: $binpkg ${version} (processing)"
+							echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: First encounter of binpkg: $binpkg ${version} (processing)"
 						fi
 						pkg_handled["$binpkg"]="${version}"
 						handle_deb "$distnopocket" "$deb" "$sum"
 					else
-						echo "INFO: binpkg: $binpkg ${version} < ${pkg_handled["$binpkg"]} (not processing)"
+						echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: binpkg: $binpkg ${version} < ${pkg_handled["$binpkg"]} (not processing)"
 					fi
 				done < "${plist}"
 				rm -f "${plist}"
 			done
 		done
 	done
+
+}
+
+# Simple parallelization on the level of releases; they do not overlap
+# in regard to directories/files, but doing so help to keep the network
+# connection utilized as one can fetch while the other is converting.
+# Furthermore it avoids that issues, or a lot of new content, in one release
+# (e.g. -dev opened) will make the regular update on the others take ages.
+for dist in $DISTROS; do
+	handle_series "${dist}" &
 done
+wait
 
 "$DIR/make-sitemaps.sh"

=== modified file 'bin/w3mman-to-html.pl'
--- bin/w3mman-to-html.pl 2009-02-09 21:37:49 +0000
+++ bin/w3mman-to-html.pl 2023-12-06 08:10:05 +0000
@@ -2,25 +2,25 @@
 
 ##############################################################################
 # This is the Ubuntu manpage repository generator and interface.
-# 
+#
 # Copyright (C) 2008 Canonical Ltd.
-# 
+#
 # This code was originally written by Dustin Kirkland <kirkland@ubuntu.com>,
 # based on a framework by Kees Cook <kees@ubuntu.com>.
-# 
+#
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
-# 
+#
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
-# 
+#
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
-# 
+#
 # On Debian-based systems, the complete text of the GNU General Public
 # License can be found in /usr/share/common-licenses/GPL-3
 ##############################################################################
