Stringify apt completions again

Commit 09685c3682 tried making the apt completions faster by doing two things:

1. Introduce a limiting "head"
2. Re-replace our "string" usage with tr

Unfortunately, in doing so it introduced a few issues:

1. The "tr" had a dangling "+" so it cut apart package descriptions that contained a "+". This caused e.g. "a C++ library" to generate another completion candidate, "library".
2. In reusing "tr" it probably reintroduced #8575, as tr is not 8-bit-clean.
3. It filtered too early, on the raw apt-cache output, which caused it to fill up with long descriptions. So e.g. for "texlive" it would only generate 10 completions, where it should have matched 54 packages.

Because most of the speedup is in the "head" stopping early, we instead go back to the old string way, but introduce a limiting "head" after the "sed" (which will have removed everything but the package name line and the first line of the description).

In my tests this is about 10% slower than doing head early and using tr, but it's more correct.

Admittedly I haven't been able to reproduce the 35s scenario that 09685 talks about, but the most likely cause of that is *apt-cache* being slow (I don't see how string can be that much slower on another system), and so it will most likely also be fixed by doing head here.

Future possibilities here include:

1. Using "apt-cache search --names-only", which gives a much nicer format (but only for non-installed packages - the search strings are apparently ANDed?)
2. Switching to `string split`, possibly using NUL and using `string split0`?
3. Introducing a `string --null-in` switch so we can get by with one `string`
4. (multi-threaded execution so the `string`s run in parallel)
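
The "+" and early-"head" issues described above can be reproduced without apt-cache. The following is only a rough sketch, not the code from either commit: the sample package data is made up, `grep` stands in for the real `sed` filter, and `head -n 4` stands in for `head -n 2500`. It assumes GNU `tr`, which pads the shorter second set with its last character. The pipelines avoid shell-specific syntax, so they should behave the same in fish and in a POSIX shell.

```shell
# The "+" issue: tr works on character sets, so in '\032+' the "+" is a second
# character to translate, not a regex quantifier. Every "+" becomes a newline.
printf 'libfoo\ta C++ library\032\n' | tr -d '\n' | tr '\032+' '\n'
# The description is cut after "a C" and " library" ends up on its own line.

# Translating only the separator byte leaves the description intact.
printf 'libfoo\ta C++ library\032\n' | tr -d '\n' | tr '\032' '\n'

# The early-head issue: limiting the raw output lets continuation lines of a
# long description use up the line budget before later packages are reached.
printf 'Package: a\nDescription: first\n long line 1\n long line 2\nPackage: b\nDescription: second\n' \
    | head -n 4 | grep -c '^Package:'
# prints 1: package "b" was cut off

# Filtering first and limiting afterwards keeps both packages.
printf 'Package: a\nDescription: first\n long line 1\n long line 2\nPackage: b\nDescription: second\n' \
    | grep '^Package:' | head -n 4 | grep -c '^Package:'
# prints 2
```

This matches the trade-off the message describes: limiting late costs some speed because more lines are filtered before the limit applies, but the limit then counts only lines that can become completions.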
2024-09-20 22:42:04 +00:00 · 2022-09-23 15:30:42 +02:00 · 2022-09-23 15:30:42 +02:00 · b88b257726
commit b88b257726
parent c90ac7bf7f
1 changed file with 21 additions and 34 deletions
--- a/share/functions/__fish_print_apt_packages.fish
+++ b/share/functions/__fish_print_apt_packages.fish
@@ -7,44 +7,31 @@ function __fish_print_apt_packages
            return
    end

+    type -q -f apt-cache || return 1
    if not set -q _flag_installed
-        if test (string length (commandline -ct)) -lt 4
-            # Only print prefix matches for shorter search strings
-            __fish_apt_print_matches (commandline -ct)'.*'
-        else
-            __fish_apt_print_matches '.*'(commandline -ct)'.*'
-        end
+        # Do not generate the cache as apparently sometimes this is slow.
+        # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
+        # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
+        # if apt-cache was found.)
+        # Uses the UTF-8/ASCII record separator (0x1A) character.
+        #
+        # Note: This can include "Description:" fields which we need to include,
+        # "Description-en_GB" (or another locale code) fields which we need to include
+        # as well as "Description-md5" fields which we absolutely do *not* want to include
+        # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
+        # we're safe. (yes, this should absolutely have a better format).
+        #
+        # aptitude has options that control the output formatting, but is orders of magnitude slower
+        #
+        # sed could probably do all of the heavy lifting here, but would be even less readable
+        #
+        # The `head -n2500` causes us to stop once we have 2500 lines. We do it after the `sed` because
+        # Debian package descriptions can be extremely long - texlive-latex-extra has about 2700 lines in Debian 11.
+        apt-cache --no-generate show '.*'(commandline -ct)'.*' 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | head -n 2500 | string join "" | string replace --all --regex \x1a+ \n | uniq
        return 0
    else
        set -l packages (dpkg --get-selections | string replace -fr '(\S+)\s+install' "\$1" | string match -e (commandline -ct))
-        __fish_apt_print_matches $packages
+        apt-cache --no-generate show $packages 2>/dev/null | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | head -n 2500 | string join "" | string replace --all --regex \x1a+ \n | uniq
        return 0
    end
 end
-
-function __fish_apt_print_matches
-    type -q -f apt-cache || return 1
-
-    # Do not generate the cache as apparently this sometimes be very slow.
-    # http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=547550
-    # (It is safe to use `sed -r` here as we are guaranteed to be on a GNU platform
-    # if apt-cache was found.)
-    # Uses the UTF-8/ASCII record separator (0x1A) character.
-    #
-    # Note: This can include "Description:" fields which we need to include,
-    # "Description-en_GB" (or another locale code) fields which we need to include
-    # as well as "Description-md5" fields which we absolutely do *not* want to include
-    # The regex doesn't allow numbers, so unless someone makes a hash algorithm without a number in the name,
-    # we're safe. (yes, this should absolutely have a better format).
-    #
-    # aptitude has options that control the output formatting, but is orders of magnitude slower
-    #
-    # We limit the number of results generated by `apt-cache` directly (prior to string manipulation
-    # and deduplication) because it is the bottleneck; since we are limiting before more filtering,
-    # we use a more generous limit than we otherwised would have. We don't use `string join`/`string
-    # match` here because of excessive buffering.
-    # The limit used was experimentally derived by attempting to balance `apt-cache` runtime against
-    # the number of valid results omitted by reducing the limit for a variety of search types.
-    # `sed` could probably do all of the heavy lifting here, but would be even less readable.
-    apt-cache --no-generate show $argv 2>/dev/null | head -n2500 | sed -r '/^(Package|Description-?[a-zA-Z_]*):/!d;s/Package: (.*)/\1\t/g;s/Description-?[^:]*: (.*)/\1\x1a\n/g' | tr -d \n | tr \x1a+ \n | uniq
-end