Make apt completions useable once again

`apt-cache` is so slow that filtering its final results just doesn't cut it.
Matching against 'ac.*' (which already benefits from turning short search
terms into prefix-only matches) would take 35 seconds, all of it bottlenecked
before the filtering step. This change applies a heuristic limit to the
`apt-cache` output directly (before any additional filtering) to speed things
up.
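As a rough illustration of the prefix-only heuristic, the sketch below builds the pattern handed to `apt-cache show` from the token being completed. The helper name `__sketch_apt_pattern` is hypothetical and not part of this commit; the four-character threshold is the one visible in the diff.

```fish
# Hypothetical helper (not part of the commit): choose the regex for
# `apt-cache show` based on the length of the token being completed.
function __sketch_apt_pattern --argument-names token
    if test (string length -- "$token") -lt 4
        # Short tokens: prefix-only match to keep the result set small.
        echo "$token"'.*'
    else
        # Longer tokens: match anywhere in the package name.
        echo '.*'"$token"'.*'
    end
end

__sketch_apt_pattern ac      # prints: ac.*
__sketch_apt_pattern fish    # prints: .*fish.*
```

In the commit itself the token comes from `commandline -ct`; passing it as an argument here only makes the sketch runnable outside a completion.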

Limits ranging from 100 to 5000 were timed and their result sets compared to
see which values artificially cut off valid completions and which took too long
to be considered usable; the limit used here is where we ended up.
Mahmoud Al-Qudsi 2022-09-22 13:43:38 -05:00
parent 6a93d58797
commit 09685c3682


@@ -7,28 +7,44 @@ function __fish_print_apt_packages
             return
     end
 
-    type -q -f apt-cache || return 1
     if not set -q _flag_installed
-        # Do not generate the cache as apparently sometimes this is slow.
-        # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
-        # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
-        # if apt-cache was found.)
-        # Uses the UTF-8/ASCII record separator (0x1A) character.
-        #
-        # Note: This can include "Description:" fields which we need to include,
-        # "Description-en_GB" (or another locale code) fields which we need to include
-        # as well as "Description-md5" fields which we absolutely do *not* want to include
-        # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
-        # we're safe. (yes, this should absolutely have a better format).
-        #
-        # aptitude has options that control the output formatting, but is orders of magnitude slower
-        #
-        # sed could probably do all of the heavy lifting here, but would be even less readable
-        apt-cache --no-generate show '.*'(commandline -ct)'.*' 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | string join "" | string replace --all --regex \x1a+ \n | uniq
+        if test (string length (commandline -ct)) -lt 4
+            # Only print prefix matches for shorter search strings
+            __fish_apt_print_matches (commandline -ct)'.*'
+        else
+            __fish_apt_print_matches '.*'(commandline -ct)'.*'
+        end
         return 0
     else
         set -l packages (dpkg --get-selections | string replace -fr '(\S+)\s+install' "\$1" | string match -e (commandline -ct))
-        apt-cache --no-generate show $packages 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | string join "" | string replace --all --regex \x1a+ \n | uniq
+        __fish_apt_print_matches $packages
         return 0
     end
 end
+
+function __fish_apt_print_matches
+    type -q -f apt-cache || return 1
+    # Do not generate the cache as apparently this can sometimes be very slow.
+    # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
+    # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
+    # if apt-cache was found.)
+    # Uses the ASCII substitute character (0x1A) as a record separator.
+    #
+    # Note: This can include "Description:" fields which we need to include,
+    # "Description-en_GB" (or another locale code) fields which we need to include
+    # as well as "Description-md5" fields which we absolutely do *not* want to include
+    # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
+    # we're safe. (yes, this should absolutely have a better format).
+    #
+    # aptitude has options that control the output formatting, but is orders of magnitude slower
+    #
+    # We limit the number of results generated by `apt-cache` directly (prior to string manipulation
+    # and deduplication) because it is the bottleneck; since we are limiting before more filtering,
+    # we use a more generous limit than we otherwise would have. We don't use `string join`/`string
+    # match` here because of excessive buffering.
+    # The limit used was experimentally derived by attempting to balance `apt-cache` runtime against
+    # the number of valid results omitted by reducing the limit for a variety of search types.
+    # `sed` could probably do all of the heavy lifting here, but would be even less readable.
+    apt-cache --no-generate show $argv 2>/dev/null | head -n2500 | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | tr -d \n | tr \x1a+ \n | uniq
+end
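
To make the post-processing stage easier to follow, here is a minimal sketch that runs the same `sed` expression over canned input instead of real `apt-cache show` output. The helper name `__sketch_apt_format` and the package data are made up for illustration, and GNU `sed` is assumed, as in the commit. The sketch uses `tr \x1a \n` rather than the commit's `tr \x1a+ \n`: `tr` has no regex quantifiers, so a `+` in its first set is a literal character that is translated as well, which would matter for package names such as `g++`.

```fish
# Hypothetical helper: the formatting stage of the pipeline, reading
# `apt-cache show`-style records from stdin.
function __sketch_apt_format
    sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' \
        | tr -d \n | tr \x1a \n | uniq
end

# Canned stand-in for `apt-cache show` output (made-up package data).
# "foo" appears twice, as it would with two available versions.
set -l sample \
    'Package: foo' \
    'Version: 1.0' \
    'Description-en: Example package' \
    ' Continuation lines are dropped.' \
    'Description-md5: 0123456789abcdef' \
    '' \
    'Package: foo' \
    'Version: 1.1' \
    'Description-en: Example package' \
    '' \
    'Package: bar' \
    'Description: Another example'

printf '%s\n' $sample | __sketch_apt_format
# prints two lines: "foo<TAB>Example package" and "bar<TAB>Another example"
```

The `Description-md5` line is dropped because the first `sed` pattern does not allow digits in the field name, and `uniq` collapses the two identical `foo` entries because they are adjacent.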