Stringify apt completions again

Commit 09685c3682 tried to make the apt
completions faster by doing two things:

1. Introducing a limiting "head"
2. Replacing our "string" usage with "tr" again

Unfortunately, in doing so it introduced a few issues:

1. The "tr" had a dangling "+" so it cut apart package
   descriptions that contained a "+".
   This caused e.g. "a C++ library" to generate another completion
   candidate, "library".
2. In reusing "tr" it probably reintroduced #8575,
   as tr is not 8-bit-clean.
3. It applied the "head" limit too early, on the raw apt-cache output,
   so the limit was used up by long descriptions.
   So e.g. for "texlive" it would only generate 10 completions,
   where it should have matched 54 packages.
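
The first issue can be reproduced outside of fish. Below is a minimal
sketch in plain POSIX shell, assuming a tr that pads a shorter second set
with its last character (GNU tr does this). The sample record and the
function names are made up for illustration:

```shell
# Hypothetical sample: a package name, a tab, a description, then the 0x1A
# marker byte (octal 032) that the sed step appends to each description.
sample() {
    printf 'libfoo\ta C++ library\032'
}

# Roughly what 09685c3682 ran. tr takes sets of characters, not regexes,
# so the "+" is a second character to translate, not a quantifier.
split_with_plus() {
    sample | tr '\032+' '\n'
}

# Translating only the marker byte leaves the description intact.
split_marker_only() {
    sample | tr '\032' '\n'
}
```

With the "+" in the set, the single record comes out as three lines, the
last of which is " library". Without it, the record stays on one line.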

Because most of the speedup comes from the "head" stopping early, we
instead go back to the old "string" way, but introduce a limiting "head"
after the "sed" (which will have removed everything but the package
name line and the first line of the description).
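
To illustrate why the position of the "head" matters, here is a small
sketch in plain POSIX shell. `fake_apt_cache` is a made-up stand-in for
`apt-cache show`, the filter is a simplified version of the real sed
expression, and the limit of 4 is chosen only to keep the example short:

```shell
# Hypothetical stand-in for `apt-cache show`: two records, the first one
# with a multi-line description.
fake_apt_cache() {
    cat <<'EOF'
Package: alpha
Description: first package
 continuation line 1
 continuation line 2
 continuation line 3
Package: beta
Description: second package
EOF
}

# Keep only the Package and Description lines (simplified from the diff).
keep_fields() {
    sed -n -E '/^(Package|Description):/p'
}

# Limit first: the continuation lines use up the line budget.
limit_then_filter() {
    fake_apt_cache | head -n 4 | keep_fields
}

# Filter first: every line that reaches head is one we want.
filter_then_limit() {
    fake_apt_cache | keep_fields | head -n 4
}
```

Here `limit_then_filter` only ever sees "alpha", while `filter_then_limit`
returns both packages with the same limit.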

In my tests this is about 10% slower than doing head early and using
tr, but it's more correct.

Admittedly I haven't been able to reproduce the 35s scenario that
09685 talks about, but the most likely cause of that is *apt-cache*
being slow - I don't see how string can be that much slower on another
system - and so it will most likely also be fixed by doing head here.

Future possibilities here include:

1. Using "apt-cache search --names-only", which gives a much nicer
   format (but only for non-installed packages - the search strings are
   apparently ANDed?)
2. Switching to `string split`, possibly using NUL and using `string
   split0`?
3. Introducing a `string --null-in` switch so we can get by with one
   `string`
4. (multi-threaded execution so the `string`s run in parallel)
Fabian Boehm 2022-09-23 15:30:42 +02:00
parent c90ac7bf7f
commit b88b257726

@@ -7,44 +7,31 @@ function __fish_print_apt_packages
return
end
type -q -f apt-cache || return 1
if not set -q _flag_installed
if test (string length (commandline -ct)) -lt 4
# Only print prefix matches for shorter search strings
__fish_apt_print_matches (commandline -ct)'.*'
else
__fish_apt_print_matches '.*'(commandline -ct)'.*'
end
# Do not generate the cache as apparently sometimes this is slow.
# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
# (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
# if apt-cache was found.)
# Uses the UTF-8/ASCII record separator (0x1A) character.
#
# Note: This can include "Description:" fields which we need to include,
# "Description-en_GB" (or another locale code) fields which we need to include
# as well as "Description-md5" fields which we absolutely do *not* want to include
# The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
# we're safe. (yes, this should absolutely have a better format).
#
# aptitude has options that control the output formatting, but is orders of magnitude slower
#
# sed could probably do all of the heavy lifting here, but would be even less readable
#
# The `head -n2500` causes us to stop once we have 2500 lines. We do it after the `sed` because
# Debian package descriptions can be extremely long - texlive-latex-extra has about 2700 lines in Debian 11.
apt-cache --no-generate show '.*'(commandline -ct)'.*' 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | head -n 2500 | string join "" | string replace --all --regex \x1a+ \n | uniq
return 0
else
set -l packages (dpkg --get-selections | string replace -fr '(\S+)\s+install' "\$1" | string match -e (commandline -ct))
__fish_apt_print_matches $packages
apt-cache --no-generate show $packages 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | head -n 2500 | string join "" | string replace --all --regex \x1a+ \n | uniq
return 0
end
end
function __fish_apt_print_matches
type -q -f apt-cache || return 1
# Do not generate the cache as apparently this can sometimes be very slow.
# http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
# (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
# if apt-cache was found.)
# Uses the UTF-8/ASCII record separator (0x1A) character.
#
# Note: This can include "Description:" fields which we need to include,
# "Description-en_GB" (or another locale code) fields which we need to include
# as well as "Description-md5" fields which we absolutely do *not* want to include
# The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
# we're safe. (yes, this should absolutely have a better format).
#
# aptitude has options that control the output formatting, but is orders of magnitude slower
#
# We limit the number of results generated by `apt-cache` directly (prior to string manipulation
# and deduplication) because it is the bottleneck; since we are limiting before more filtering,
# we use a more generous limit than we otherwise would have. We don't use `string join`/`string
# match` here because of excessive buffering.
# The limit used was experimentally derived by attempting to balance `apt-cache` runtime against
# the number of valid results omitted by reducing the limit for a variety of search types.
# `sed` could probably do all of the heavy lifting here, but would be even less readable.
apt-cache --no-generate show $argv 2>/dev/null | head -n2500 | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | tr -d \n | tr \x1a+ \n | uniq
end