* Add heuristic checking for HTML anchors
Previously, only anchors specified or generated in markdown could be
linked to without complaint from the link checker. We now use a
simple heuristic check for `name` or `id` attributes.
Duplicate code has been refactored and all XML anchor checks updated
to use regex rather than substring match.
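To illustrate the kind of heuristic described above, here is a minimal sketch. It is not the link checker's actual code: the commit uses a regex, while this version uses a hand-written scan so that it needs only the standard library. The names `has_anchor` and `value_matches` are hypothetical.

```rust
/// Heuristically reports whether `html` contains an `id` or `name`
/// attribute whose value equals `anchor`. Hypothetical helper, not the
/// actual link_checker implementation (which uses a regex).
fn has_anchor(html: &str, anchor: &str) -> bool {
    for attr in ["id", "name"] {
        let mut rest = html;
        while let Some(pos) = rest.find(attr) {
            // The attribute name must follow whitespace, so that e.g.
            // `data-id` or text inside a word is not matched.
            let before_ok = rest[..pos]
                .chars()
                .next_back()
                .map_or(false, |c| c.is_whitespace());
            let after = &rest[pos + attr.len()..];
            if before_ok && value_matches(after, anchor) {
                return true;
            }
            rest = after;
        }
    }
    false
}

/// Checks that `s` starts with `="anchor"` or `='anchor'`, allowing
/// optional whitespace around the `=`.
fn value_matches(s: &str, anchor: &str) -> bool {
    let s = match s.trim_start().strip_prefix('=') {
        Some(r) => r.trim_start(),
        None => return false,
    };
    for quote in ['"', '\''] {
        if let Some(value) = s.strip_prefix(quote) {
            // The value must be exactly `anchor`, closed by the same quote.
            return value
                .strip_prefix(anchor)
                .map_or(false, |tail| tail.starts_with(quote));
        }
    }
    false
}
```

Being a heuristic, this can still report false positives, for example when matching text appears outside of a tag; it only aims to avoid false complaints about anchors written directly in HTML.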
* Fix regexp and refactor
* cargo/manifests: allow user to use native-tls ...
... if `ring` can't be used on the user's platform (e.g. mips/ppc/riscv)
* CI: test for native tls build
Many servers will return errors (e.g. 400/403) to requests that do not
set a User-Agent header. This results in issues in both the link_checker
and load_data components. With the link_checker these are false positive
dead links. In load_data, remote data fails to be fetched. To mitigate
this issue, this sets a default User-Agent of
`$CARGO_PKG_NAME/$CARGO_PKG_VERSION`.
Note that the root cause of this regression from zola v0.9.0 is that
reqwest 0.10 changed its default behavior and no longer sets a
User-Agent by default:
https://github.com/seanmonstar/reqwest/pull/751
Fixes #950.
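As a rough sketch of how such a default User-Agent string could be assembled from the Cargo package metadata: the function name `default_user_agent` and the fallback values are assumptions for illustration, not the actual code. `option_env!` is used so that the sketch also compiles outside of Cargo.

```rust
/// Builds a default User-Agent of the form `name/version` from the
/// Cargo package metadata captured at compile time.
/// Hypothetical helper; the fallbacks only apply when the code is
/// compiled without Cargo and the variables are unset.
fn default_user_agent() -> String {
    let name = option_env!("CARGO_PKG_NAME").unwrap_or("unknown");
    let version = option_env!("CARGO_PKG_VERSION").unwrap_or("0.0.0");
    format!("{}/{}", name, version)
}
```

The resulting string would then be set once on the HTTP client (reqwest's `ClientBuilder` has a `user_agent` method for this), so that both link_checker and load_data send it on every request.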
* Treat 304 (Not Modified) requests as valid.
* Add tests for 301-to-200 links, 301-to-404 links, and 500 links.
This helps to test redirections and the previously-added
response.status() checking for non-success status codes in check_url().
* Make names for HTTP mock paths unique, to avoid weird behavior. It
seems like mocks with the same path can potentially bleed between
tests, so you may end up with an unexpected response which causes the
test to sometimes pass and sometimes fail.
* Fix Clippy warnings about String::from(format!()).
Certain tests involving HTTP requests were sometimes hanging
indefinitely, so this uses Mockito for HTTP mocking. This seemingly
resolves the issue and makes these tests more reliable.
The existing can_fail_404_links test has been renamed to
can_fail_unresolved_links, to represent what actually occurs in the
test. The can_fail_404_links test now deals with a proper 404
response.
Just to be clear, the check_site test in the site component will
still create outgoing HTTP requests (due to the URLs used in the
test_site), so this commit only uses HTTP mocking where possible.
The can_fail_404_links() test doesn't encounter a 404 response in
actuality, since the google.comys domain doesn't resolve. When the
test is updated such that the response's status code is a 404, the
test fails because the check_url() function doesn't handle
non-success responses how the test's assertions expect. This commit
updates check_url() to handle non-success responses, treating them
much like errors.
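The status handling described in this log can be sketched roughly as follows, assuming success responses and 304 (Not Modified) count as valid and everything else is reported as an error. The names `is_valid_status` and `check_status` and the error message are hypothetical, not the actual check_url() code.

```rust
/// Returns true for status codes treated as a valid link:
/// any 2xx success code, plus 304 (Not Modified).
fn is_valid_status(code: u16) -> bool {
    (200..300).contains(&code) || code == 304
}

/// Turns a final response status into a check result, treating
/// non-success responses much like request errors.
/// Assumes the HTTP client has already followed redirects, so a
/// 301-to-200 chain arrives here as 200 and a 301-to-404 chain as 404.
fn check_status(code: u16) -> Result<(), String> {
    if is_valid_status(code) {
        Ok(())
    } else {
        Err(format!("unexpected status code {}", code))
    }
}
```

With this shape, a real 404 or 500 response fails the check in the same way as an unresolved domain does, which is what the renamed tests rely on.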
* Add anchor existence checking to link_checker component
* Oops, forgot some changes
* Drop scraper dependency and rework tests
* Handle name attributes
As mentioned in #381, crates.io returns a 404 for any request without an
`Accept: text/html` header. It returns a 200 for any request with one, but
at least false successes don't prevent checking any other links.
This also makes it easier to add a custom User-Agent if desired.
Also run rustfmt and fix a clippy nit (unnecessary return) while I'm here.