Merge lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost into lp:ubuntu-manpage-repository

Proposed by Christian Ehrhardt 
Status: Merged
Approved by: Christian Ehrhardt 
Approved revision: 244
Merged at revision: 244
Proposed branch: lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost
Merge into: lp:ubuntu-manpage-repository
Diff against target: 263 lines (+47/-34)
3 files modified
bin/fetch-man-pages.sh (+12/-12)
bin/make-manpage-repo.sh (+29/-16)
bin/w3mman-to-html.pl (+6/-6)
To merge this branch: bzr merge lp:~paelzer/ubuntu-manpage-repository/ubuntu-manpage-repository-speed-boost
Reviewer            Review Type  Date Requested  Status
Christian Ehrhardt                               Approve
Paride Legovini     lgtm                         Approve
Review via email: mp+456936@code.launchpad.net
Christian Ehrhardt  (paelzer) wrote :

Trying to help with fixing and debugging the slow execution in prodstack.
On the one hand, this runs the releases concurrently, which lets it make more use of the CPUs now given to us.
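
A minimal sketch of that concurrency pattern, assuming a POSIX-style shell with `local` support (bash, dash). The function `process_release`, the scratch directory `workdir` and the example `DISTROS` value are hypothetical stand-ins for illustration; in this branch the per-release work is wrapped in `handle_series` instead.

```shell
# Hypothetical stand-in for the per-release work (fetching and converting).
workdir=$(mktemp -d)

process_release() {
    local dist="$1"
    sleep 1   # placeholder for the real, much longer work
    echo "done: ${dist}" > "${workdir}/${dist}.log"
}

DISTROS="noble mantic"   # example value, assumed for illustration

# Start one background job per release, then block until all have finished.
for dist in $DISTROS; do
    process_release "${dist}" &
done
wait

cat "${workdir}"/*.log
```

Because each job writes only to its own file, the jobs do not need any locking. The same holds for the real script as long as the releases do not share directories or files.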

On the other hand, adding timing info to the log will allow comparing this against execution on other platforms, where it seems faster.
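
A minimal sketch of such a timestamped log line, using the same `INFO (time) - release: message` layout as the diff. The helper name `log_info` and the example URL are assumptions for illustration; the branch itself inlines the `date` call into each `echo`. Note that `%N` (nanoseconds) is a GNU `date` extension.

```shell
# Hypothetical helper: prefix a message with wall-clock time and release name.
log_info() {
    local dist="$1"
    shift
    printf 'INFO (%s) - %s: %s\n' "$(date '+%H:%M:%S.%N')" "$dist" "$*"
}

log_info noble "fetching: http://example.invalid/some-package.deb"
```

Tagging every line with the release name matters once the releases run concurrently, because their output interleaves in a single log.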

A full run of noble+mantic is running locally atm, I'll provide a log later.

Christian Ehrhardt  (paelzer) wrote :

FYI: there is a bit of extra churn as I couldn't ignore the trailing whitespaces anymore :-)

Paride Legovini (paride) wrote :

Another LGTM-level review, meaning that I didn't test any of this, but the changes do look good.

review: Approve (lgtm)
Christian Ehrhardt  (paelzer) wrote :

It built the same content just fine; utilization went up linearly, as expected.

Two full releases in 18h.
Formerly that was more like 24h on my system, which is bottlenecked by the network.

The prod system is less network limited and should gain even more.

And either way, the timestamped logs help to track things better ...
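
One way such timestamped logs could be compared is to print the time elapsed between consecutive lines of the same release. The sketch below assumes the `INFO (HH:MM:SS.N) - release: message` layout from the diff; the function name `log_deltas` and the sample lines are made up for illustration. Since the log carries no date, the sketch assumes at most one midnight wrap between two lines.

```shell
# Hypothetical helper: per release, print seconds since its previous log line.
log_deltas() {
    awk '
    $1 == "INFO" && $3 == "-" {
        ts = $2
        gsub(/[()]/, "", ts)              # strip the surrounding parentheses
        split(ts, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        dist = $4
        sub(/:$/, "", dist)               # drop the colon after the release name
        if (dist in last) {
            delta = now - last[dist]
            if (delta < 0) delta += 86400 # assume a single midnight wrap
            printf "%s %.3f\n", dist, delta
        }
        last[dist] = now
    }'
}

log_deltas <<'EOF'
INFO (08:10:05.000000000) - noble: fetching: a
INFO (08:10:05.500000000) - mantic: fetching: b
INFO (08:10:07.250000000) - noble: Created manpage [x]
INFO (08:10:06.000000000) - mantic: No manpages
EOF
# prints:
# noble 2.250
# mantic 0.500
```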

Christian Ehrhardt  (paelzer) wrote :

Thereby tests are good as well

review: Approve

Preview Diff

=== modified file 'bin/fetch-man-pages.sh'
--- bin/fetch-man-pages.sh 2018-06-05 23:34:38 +0000
+++ bin/fetch-man-pages.sh 2023-12-06 08:10:05 +0000
@@ -43,7 +43,7 @@
 NAME_AND_VER=$(echo "$PKG" | sed "s/\.deb$//")
 DEB="$TEMPDIR/$PKG"

-echo "INFO: fetching: $PKGURL"
+echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: fetching: $PKGURL"
 curl --silent "$PKGURL" > "$DEB"

 DESTDIR="$PUBLIC_HTML_DIR/manpages/$DIST"
@@ -52,13 +52,13 @@
 export W3MMAN_MAN='man --no-hyphenation'
 export MAN_KEEP_FORMATTING=1

-echo "INFO: Looking for manpages in [$DEB]"
+echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Looking for manpages in [$DEB]"
 # The .*man bit is to handle postgres' inane manpage installation
 man=$(dpkg-deb -c "$DEB" | grep -E " \./usr/share.*/man/.*\.[0-9][a-zA-Z0-9\.\-]*\.gz$" | sed -e "s/^.*\.\//\.\//" -e "s/ \-> /\->/")

 # Exit immediately if this package does not contain manpages
 if [ -z "$man" ]; then
- echo "INFO: No manpages: [$DIST] [$PKG]"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: No manpages: [$DIST] [$PKG]"
 # Touch the cache file so we don't look again until package updated
 sha1sum "$DEB" | awk '{ print $1 }' > "$DESTDIR/.cache/$NAME"
 exit 0
@@ -68,22 +68,22 @@

 dpkg-deb -x "$DEB" "$TEMPDIR"
 for i in $man; do
- #printf "%s\n" "INFO: Considering entry [$i]"
+ #printf "%s\n" "DEBUG: Considering entry [$i]"
 i=$(printf "%s" "$i" | sed "s/^.*\.\///")
 if printf "%s" "$i" | grep -qs "\->"; then
 SYMLINK=1
 symlink_src_html=$(printf "%s" "$i" | sed -e "s/^.*\->//" -e "s/\.gz$/\.html/")
- i=$(printf "%s" "$i" | sed "s/\->.*$//")
- #printf "%s\n" "INFO: [$i] is a symbolic link"
+ i=$(printf "%s" "$i" | sed "s/\->.*$//")
+ #printf "%s\n" "DEBUG: [$i] is a symbolic link"
 else
 SYMLINK=0
 fi
 manpage="$TEMPDIR/$i"
 i=$(printf "%s" "$i" | sed -e "s/usr\/share.*\/man\///i" -e "s/\.gz$//")
- #printf "%s\n" "INFO: Considering manpage [$i]"
+ #printf "%s\n" "DEBUG: Considering manpage [$i]"
 # shellcheck disable=SC2166
 if [ ! -s "$manpage" -o -z "$i" ] && [ "$SYMLINK" = "0" ]; then
- #printf "%s\n" "INFO: Skipping empty manpage [$manpage]"
+ #printf "%s\n" "DEBUG: Skipping empty manpage [$manpage]"
 continue
 fi
 out="$DESTDIR"/"$i".html
@@ -91,12 +91,12 @@
 mkdir -p "$(dirname "$out")" "$outgz" > /dev/null || true
 if [ "$SYMLINK" = "1" ]; then
 ln -f -s "$symlink_src_html" "$out"
- printf "%s\n" "INFO: Created symlink [$out]"
+ printf "%s\n" "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created symlink [$out]"
 else
 if LN=$(zcat "$manpage" | head -n1 | grep "^\.so "); then
 LN=$(printf "%s" "$LN" | sed -e 's/^\.so /\.\.\//' -e 's/\/\.\.\//\//g' -e 's/$/\.html/')
 ln -f -s "$LN" "$out"
- printf "INFO: Created symlink [%s]" "$out"
+ printf "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created symlink [%s]" "$out"
 else
 BODY=$(COLUMNS=100 /usr/lib/w3m/cgi-bin/w3mman2html.cgi "local=$manpage" | grep -A 1000000 "^<b>" | sed -e '/<\/body>/,+100 d' -e 's:^<b>\(.*\)</b>$:</pre><h4><b>\1</b></h4><pre>:g' -e 's:<a href="file\:///[^?]*?\([^(]*\)(\([^)]*\))">:<a href="../man\2/\1.\2.html">:g')
 TITLE=$(printf "%s" "$BODY" | head -n2 | tail -n1 | sed "s/<[^>]\+>//g")
@@ -110,7 +110,7 @@
 $BODY
 </pre><!--#include virtual='/below.html' -->" > "$out"

- printf "%s\n" "INFO: Created manpage [$out]"
+ printf "%s\n" "INFO ($(date '+%H:%M:%S.%N')) - ${DIST}: Created manpage [$out]"
 fi
 fi
 mv -f "$manpage" "$outgz"
@@ -121,7 +121,7 @@
 done

 # After extracting all manpages, cache the sha1sum, so we don't
-# repeat the downloads
+# repeat the downloads
 sha1sum "$DEB" | awk '{ print $1 }' > "$DESTDIR/.cache/$NAME"

 # In the case of freakish package permissions, fix them on rm failure.

=== modified file 'bin/make-manpage-repo.sh'
--- bin/make-manpage-repo.sh 2023-12-01 14:24:45 +0000
+++ bin/make-manpage-repo.sh 2023-12-06 08:10:05 +0000
@@ -2,23 +2,23 @@

 ###############################################################################
 # Copyright (C) 2008 Canonical Ltd.
-#
+#
 # This code was originally written by Dustin Kirkland <kirkland@ubuntu.com>,
 # based on a framework by Kees Cook <kees@ubuntu.com>.
-#
+#
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
-#
+#
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
-#
+#
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
-#
+#
 # On Debian-based systems, the complete text of the GNU General Public
 # License can be found in /usr/share/common-licenses/GPL-3
 ###############################################################################
@@ -75,27 +75,28 @@
 fi
 local deb="$1"
 local sum="$2"
+ local distnopocket="$3"
 local name=$(basename "$deb" | awk -F_ '{print $1}')
 existing_sum=$(cat "$PUBLIC_HTML_DIR/manpages/$dist/.cache/$name" 2>/dev/null)

 # Take the first two digits of the existing_sum modulo 28 to
 # compare to the current day of month.
- #
+ #
 # Reasoning: this will invalidate the cache for everything ~
 # once per month (days: 1-28)
 day_mod=$((0x$(echo "$existing_sum" | cut -b 1-2)%27 + 1))
 if [ "$day_mod" -eq "$(date +%d)" ]; then
- echo "INFO: date_mod match, regnerating: $deb ($day_mod)"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: date_mod match, regnerating: $deb ($day_mod)"
 return 0
 fi

 # Of course, if the sum found in the packages file for this
 # package does not equal the sum I have on disk, regenerate.
 if [ "$existing_sum" = "$sum" ]; then
- echo "INFO: cksum skip: $deb"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: cksum skip: $deb"
 return 1
 else
- echo "INFO: cksum mismatch: $deb"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${distnopocket}: cksum mismatch: $deb"
 return 0
 fi
 }
@@ -107,7 +108,7 @@
 local deburl=$(get_deb_url "$deb")
 # FIXME: the || true needs to bubble up to a list of things wrong obviously.
 # shellcheck disable=SC2015
- is_pkg_cache_invalid "$deb" "$sum" && "$DIR/fetch-man-pages.sh" "$distnopocket" "$deburl" || true
+ is_pkg_cache_invalid "$deb" "$sum" "$distnopocket" && "$DIR/fetch-man-pages.sh" "$distnopocket" "$deburl" || true
 }

 link_en_locale() {
@@ -135,8 +136,8 @@
 return 0
 }

-declare -A pkg_handled
-for dist in $DISTROS; do
+handle_series() {
+ local dist="${1}"
 # On one hand in some cases we do not want to know the pocket, as it
 # would show up in paths, URLs and bug links
 distnopocket="${dist}"
@@ -146,6 +147,7 @@
 # It orders by likely most up-to-date pocket first and only re-renders if
 # a newer version of the same source package is found later (even single
 # Packages files can list the same source multiple times).
+ declare -A pkg_handled
 pkg_handled=()
 for pocket in "-updates" "-security" ""; do
 mkdir -p "$PUBLIC_HTML_DIR/manpages/$distnopocket/.cache" "$PUBLIC_HTML_DIR/manpages.gz/$distnopocket" || true
@@ -153,7 +155,7 @@
 for repo in $REPOS; do
 for arch in $ARCH; do
 file=$(get_packages_url "${dist}${pocket}" "$repo" "$arch")
- echo "INFO: Packages.gz: $file"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: Packages.gz: $file"
 plist=$(mktemp "/tmp/XXXXXXX.manpages.${dist}${pocket}.$repo.$arch.plist")
 curl -s "$file" \
 | gunzip -c \
@@ -164,20 +166,31 @@
 while read -r binpkg version deb sum; do
 if dpkg --compare-versions "${version}" gt "${pkg_handled["$binpkg"]}"; then
 if [[ -n "${pkg_handled[$binpkg]}" ]]; then
- echo "INFO: binpkg: $binpkg ${version} > ${pkg_handled["$binpkg"]} (processing it again)"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: binpkg: $binpkg ${version} > ${pkg_handled["$binpkg"]} (processing it again)"
 else
- echo "INFO: First encounter of binpkg: $binpkg ${version} (processing)"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: First encounter of binpkg: $binpkg ${version} (processing)"
 fi
 pkg_handled["$binpkg"]="${version}"
 handle_deb "$distnopocket" "$deb" "$sum"
 else
- echo "INFO: binpkg: $binpkg ${version} < ${pkg_handled["$binpkg"]} (not processing)"
+ echo "INFO ($(date '+%H:%M:%S.%N')) - ${dist}: binpkg: $binpkg ${version} < ${pkg_handled["$binpkg"]} (not processing)"
 fi
 done < "${plist}"
 rm -f "${plist}"
 done
 done
 done
+
+}
+
+# Simple parallelization on the level of releases; they do not overlap
+# in regard to directories/files, but doing so help to keep the network
+# connection utilized as one can fetch while the other is converting.
+# Furthermore it avoids that issues, or a lot of new content, in one release
+# (e.g. -dev opened) will make the regular update on the others take ages.
+for dist in $DISTROS; do
+ handle_series "${dist}" &
 done
+wait

 "$DIR/make-sitemaps.sh"

=== modified file 'bin/w3mman-to-html.pl'
--- bin/w3mman-to-html.pl 2009-02-09 21:37:49 +0000
+++ bin/w3mman-to-html.pl 2023-12-06 08:10:05 +0000
@@ -2,25 +2,25 @@

 ##############################################################################
 # This is the Ubuntu manpage repository generator and interface.
-#
+#
 # Copyright (C) 2008 Canonical Ltd.
-#
+#
 # This code was originally written by Dustin Kirkland <kirkland@ubuntu.com>,
 # based on a framework by Kees Cook <kees@ubuntu.com>.
-#
+#
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
-#
+#
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 # GNU General Public License for more details.
-#
+#
 # You should have received a copy of the GNU General Public License
 # along with this program. If not, see <http://www.gnu.org/licenses/>.
-#
+#
 # On Debian-based systems, the complete text of the GNU General Public
 # License can be found in /usr/share/common-licenses/GPL-3
 ##############################################################################
