Make apt completions useable once again

`apt-cache` is just so incredibly slow that filtering against the final results just doesn't cut it. Attempting to match against 'ac.*' (already taking advantage of changing short search terms into prefix-only matches) would take 35 seconds, all of it bottlenecked before the filtering step. This change uses more of a heuristic to filter `apt-cache` results directly (before additional filtering) to speed things up. A variety of different limits from 100 to 5000 were timed and their result sets compared to see what ended up artificially limiting valid completions vs. what took too long to be considered functional/usable, and this is where we ended up.
2024-11-10 23:24:39 +00:00 · 2022-09-22 13:43:38 -05:00
commit 09685c3682
parent 6a93d58797
1 changed file with 34 additions and 18 deletions
--- a/share/functions/__fish_print_apt_packages.fish
+++ b/share/functions/__fish_print_apt_packages.fish
@@ -7,28 +7,44 @@ function __fish_print_apt_packages
            return
    end
    type -q -f apt-cache || return 1
    if not set -q _flag_installed
-        # Do not generate the cache as apparently sometimes this is slow.
-        # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
-        # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
-        # if apt-cache was found.)
-        # Uses the UTF-8/ASCII record separator (0x1A) character.
-        #
-        # Note: This can include "Description:" fields which we need to include,
-        # "Description-en_GB" (or another locale code) fields which we need to include
-        # as well as "Description-md5" fields which we absolutely do *not* want to include
-        # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
-        # we're safe. (yes, this should absolutely have a better format).
-        #
-        # aptitude has options that control the output formatting, but is orders of magnitude slower
-        #
-        # sed could probably do all of the heavy lifting here, but would be even less readable
-        apt-cache --no-generate show '.*'(commandline -ct)'.*' 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | string join "" | string replace --all --regex \x1a+ \n | uniq
+        if test (string length (commandline -ct)) -lt 4
+            # Only print prefix matches for shorter search strings
+            __fish_apt_print_matches (commandline -ct)'.*'
+        else
+            __fish_apt_print_matches '.*'(commandline -ct)'.*'
+        end
        return 0
    else
        set -l packages (dpkg --get-selections | string replace -fr '(\S+)\s+install' "\$1" | string match -e (commandline -ct))
-        apt-cache --no-generate show $packages 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | string join "" | string replace --all --regex \x1a+ \n | uniq
+        __fish_apt_print_matches $packages
        return 0
    end
 end
+function __fish_apt_print_matches
+    type -q -f apt-cache || return 1
+    # Do not generate the cache as apparently this can sometimes be very slow.
+    # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
+    # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
+    # if apt-cache was found.)
+    # Uses the ASCII substitute character (SUB, 0x1A) as a record separator.
+    #
+    # Note: This can include "Description:" fields which we need to include,
+    # "Description-en_GB" (or another locale code) fields which we need to include
+    # as well as "Description-md5" fields which we absolutely do *not* want to include.
+    # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
+    # we're safe. (yes, this should absolutely have a better format).
+    #
+    # aptitude has options that control the output formatting, but is orders of magnitude slower
+    #
+    # We limit the number of results generated by `apt-cache` directly (prior to string manipulation
+    # and deduplication) because it is the bottleneck; since we are limiting before more filtering,
+    # we use a more generous limit than we otherwise would have. We don't use `string join`/`string
+    # match` here because of excessive buffering.
+    # The limit used was experimentally derived by attempting to balance `apt-cache` runtime against
+    # the number of valid results omitted by reducing the limit for a variety of search types.
+    # `sed` could probably do all of the heavy lifting here, but would be even less readable.
+    apt-cache --no-generate show $argv 2>/dev/null | head -n2500 | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | tr -d \n | tr -s \x1a \n | uniq
+end
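
The two ideas in this change (prefix-only patterns for short tokens, and capping raw `apt-cache` output before the text processing) can be tried without `apt-cache` at all. What follows is only a rough sketch under stated assumptions: it is written in POSIX shell rather than fish so it runs standalone, it assumes GNU `sed`, and `apt_search_pattern`, `format_apt_records`, and `sample_records` are hypothetical names that do not exist in fish. The package records are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: mirrors the length check in the diff. Tokens shorter
# than 4 characters become prefix-only patterns; longer ones match anywhere.
apt_search_pattern() {
    token=$1
    if [ "${#token}" -lt 4 ]; then
        printf '%s.*\n' "$token"
    else
        printf '.*%s.*\n' "$token"
    fi
}

# Hypothetical helper: the text-processing stage of the pipeline, reading
# `apt-cache show`-style records on stdin. Assumes GNU sed (-r, \t, \x1a).
format_apt_records() {
    # Cap the raw input first, since producing it is the slow part.
    head -n 2500 |
        # Keep only Package/Description lines; end each description with 0x1A.
        sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' |
        # Join everything into one line, then split again on the 0x1A byte only.
        tr -d '\n' |
        tr -s '\032' '\n' |
        # Drop adjacent duplicates (the same package listed more than once).
        uniq
}

# Canned stand-in for `apt-cache --no-generate show` output (made-up records).
sample_records() {
    printf '%s\n' \
        'Package: acl' \
        'Version: 2.3.1' \
        'Description-en: access control list utilities' \
        'Description-md5: 0123456789abcdef' \
        '' \
        'Package: acl' \
        'Version: 2.3.2' \
        'Description-en: access control list utilities' \
        '' \
        'Package: acpid' \
        'Description: ACPI event daemon'
}

sample_records | format_apt_records
```

Run as a script, this prints two tab-separated lines: `acl` with "access control list utilities" and `acpid` with "ACPI event daemon". The `Description-md5` line is dropped by the regex, and the repeated `acl` record is collapsed by `uniq`. `apt_search_pattern ac` yields `ac.*`, while `apt_search_pattern acpi` yields `.*acpi.*`. The sketch translates only the 0x1A byte: `tr` translates every character in its first set, so any extra character placed there (a `+`, for example) would also be turned into a newline, which matters for package names such as `g++`.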