Writing to the IPI register causes a trap, which sets the event...
causing every CPU to go into a tight loop of traps contended on the BHL.
We don't need to clear IPIs in WFE mode, so don't do that.
Signed-off-by: Hector Martin <marcan@marcan.st>
This stops DCP from killing our modeset if the connection cycles.
Also force a (potential) configure cycle if the display is external;
this makes sure updated stage2s will have a chance at fixing issues of
old stage1s. Modesetting is fast when it's the same mode as before.
Signed-off-by: Hector Martin <marcan@marcan.st>
Get rid of asc_cpu_stop(), which was never a thing. The CPU start bit
should always be off in the steady state; it is only used momentarily to
start the CPU.
Signed-off-by: Hector Martin <marcan@marcan.st>
Commit 9c795fbdbf introduced a WFE/SEV pair for the spinlock, but it
caused delays of tens of seconds. A possible explanation for the delay
is the lack of a data synchronization barrier between the store
instruction and the SEV instruction.
The Arm Architecture Reference Manual for A-profile architecture
(issue H.a) says:
> Arm recommends that software includes a Data Synchronization Barrier
> (DSB) instruction before any SEV instruction. The DSB instruction
> ensures that no instructions, including any SEV instructions, that
> appear in program order after the DSB instruction, can execute until
> the DSB instruction has completed.
However, inserting a DSB instruction still didn't resolve the delay.
The exclusive load is an alternative to the SEV instruction. The manual
says:
> ...However, in Armv8, when the global monitor for a PE changes from
> Exclusive Access state to Open Access state, an event is generated.
> This is equivalent to issuing an SEVL instruction on the PE for which
> the monitor state has changed. It removes the need for spinlock code
> to include an SEV instruction after clearing a spinlock.
As an additional benefit, the exclusive load is local to the PE and
eliminates spurious events for other PEs.
Trusted Firmware-A v2.6 also employs the same algorithm.
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
If an explicitly specified display mode exceeds the allocated
framebuffer, allocate a new one from the top of RAM.
Note: macOS panics immediately with a realloced framebuffer.
Signed-off-by: Janne Grunau <j@jannau.net>
DART nodes for dcp and disp0 have pre-allocated L1 and L2 tables which
are annotated in the ADT. The pre-allocated memory is specified in
"pt-region-${DEVICE}". The first page is used as L1 table and the
following pages are used as L2 tables. The number of valid L2 tables is
specified in "l2-tt-${DEVICE}". The first entry identifies the region
and the second entry is the number of valid L2 tables.
iboot (macOS 12.3) inits just 2 L2 tables. Larger framebuffers require
more. By using the pre-allocated page tables we do not have to worry about
keeping the memory mapped after m1n1 executes the next target.
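The layout described above can be sketched as follows. This is only an
illustration: the 16 KiB page size, the function name and the example
addresses are assumptions and are not taken from the ADT or from m1n1.

```python
PAGE_SIZE = 0x4000  # assumption: 16 KiB pages


def pt_layout(region_base, region_size, l2_tt):
    """Split a pre-allocated page table region into L1 and L2 tables.

    l2_tt mirrors the two-entry property described above:
    (region identifier, number of valid L2 tables).
    """
    _region_id, valid_l2 = l2_tt
    total_pages = region_size // PAGE_SIZE
    l1 = region_base  # the first page is the L1 table
    # Every following page is an L2 table.
    l2 = [region_base + PAGE_SIZE * (i + 1) for i in range(total_pages - 1)]
    # Tables past the valid count are pre-allocated but not yet initialised.
    return l1, l2[:valid_l2], l2[valid_l2:]


# Hypothetical region: 5 pages, of which 2 L2 tables are already valid.
l1, valid, spare = pt_layout(0x1000000, 5 * PAGE_SIZE, (0, 2))
print(hex(l1), [hex(a) for a in valid], [hex(a) for a in spare])
```

The spare entries are the tables a larger framebuffer mapping would have
to initialise.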
Signed-off-by: Janne Grunau <j@jannau.net>
The reserved framebuffer on the Mac Studio is 0x854000 bytes. This is
too small for 1920x1200 with 4 bytes per pixel. Setting 1920x1200 as
the mode crashes dcp but not the actual display controller. The display
remains working and even comes back after display hotplug/power cycle.
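The size mismatch can be checked with a few lines of arithmetic. This is
a sketch that assumes a tightly packed framebuffer with no stride
padding; the helper name is made up for illustration.

```python
# Reserved framebuffer size quoted above for the Mac Studio.
RESERVED_FB_SIZE = 0x854000


def fb_size(width, height, bytes_per_pixel=4):
    """Bytes needed for a tightly packed framebuffer (no stride padding)."""
    return width * height * bytes_per_pixel


needed = fb_size(1920, 1200)
fits = needed <= RESERVED_FB_SIZE
print(hex(needed), hex(RESERVED_FB_SIZE), fits)
```

Under that assumption 1920x1200 needs 0x8ca000 bytes, which is 0x76000
bytes more than the reserved region provides.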
Signed-off-by: Janne Grunau <j@jannau.net>
The device tree for multi-die SoCs such as the M1 Ultra has its devices
under "/soc/dieX" instead of directly under "/soc".
Signed-off-by: Janne Grunau <j@jannau.net>
Get rid of the hv_rearm() thing (which was always a bit dodgy) and
instead properly make sure that all CPUs rendezvous when needed and
switch the active proxy thread without ever exiting exception context.
The Python side can now switch proxy context (by waiting directly for
a proxy boot) without having to exit out of the hypervisor callback,
so cpu() now works as a normal Python method.
Add a cpus() iterator so you can do things like:
>>> for i in cpus(): bt()
Signed-off-by: Hector Martin <marcan@marcan.st>
This should reduce memory traffic spam and power usage from lock
contention when threads are blocked on a spinlock.
Signed-off-by: Hector Martin <marcan@marcan.st>
Previously RAM was mapped ad-hoc, but this can end up interacting
poorly with the tracer infrastructure which we are now using for RAM
too. Move to mapping guest RAM via the tracer infra, and also unmap the
TZ carveouts on the Python side so it knows about them.
This is a HV ABI break.
Signed-off-by: Asahi Lina <lina@asahilina.net>
The HV tick polling now only runs on CPU#0. All CPUs have the 1000Hz
HV tick, but secondaries only use it to poll the FIQ state and that path
does not take the BHL if no other FIQ was pending.
Signed-off-by: Hector Martin <marcan@marcan.st>
This fixes display DART real-time cache hits causing AMCC exceptions.
The relevant carve-outs have flags 0x60004016; 0x60004002 is used for
DCP which is non-realtime, so I'm guessing the '16' means we should map
it uncached.
Signed-off-by: Hector Martin <marcan@marcan.st>
The PCIe 4 link speed is only described as "target-link-speed" in the
"lan-10gb". This changed in macOS 12.3 or earlier. Verified on Mac
Studio and with the template Mac Mini ADT.
Reported-by: Jeff Geerling <geerlingguy@mac.com>
Signed-off-by: Janne Grunau <j@jannau.net>
Adds support for up to 64-byte ops and more SIMD/paired operations.
This is good enough to trace a lot of GPU VM address space.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Turns out it's just an 8-bit bool, not 32 bits, and when cast to int
the top bits can cause it to be interpreted as an error randomly...
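A small sketch of the failure mode, using a made-up raw value: if only
the low 8 bits of a return value are defined, reading all 32 bits as a
signed int can yield a negative number that looks like an error code.

```python
import ctypes


def as_int32(raw):
    """Interpret a 32-bit value as a signed C int."""
    return ctypes.c_int32(raw).value


def as_bool8(raw):
    """Keep only the low 8 bits, which is all an 8-bit bool defines."""
    return (raw & 0xFF) != 0


# Hypothetical value: the low byte says 'true', the top bits are garbage.
raw = 0xFFFFFF01
print(as_int32(raw), as_bool8(raw))
```

Masking to the low byte gives the intended result regardless of what the
upper bits happen to contain.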
Signed-off-by: Hector Martin <marcan@marcan.st>
This code is gated behind the CHAINLOADING define. To build a
release-style m1n1 with chainloading for use with the installer
or kmutil, use:
make CHAINLOADING=1 RELEASE=1
To tell m1n1 to chainload another binary, use this var payload:
chainload=<ESP partition UUID>;<file path>
e.g.
chainload=a17b7e46-e950-bb4f-bc82-8ab1047a058e;m1n1/m1n1.bin
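For illustration, the value format above splits at the first ';' into a
partition UUID and a file path. The parser below is a hypothetical
sketch of that format, not the code m1n1 uses.

```python
def parse_chainload(value):
    """Split '<ESP partition UUID>;<file path>' into its two parts."""
    uuid, sep, path = value.partition(";")
    if not sep or not uuid or not path:
        raise ValueError("expected '<ESP partition UUID>;<file path>'")
    return uuid, path


uuid, path = parse_chainload(
    "a17b7e46-e950-bb4f-bc82-8ab1047a058e;m1n1/m1n1.bin"
)
print(uuid, path)
```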
Closes: #154
Co-authored-by: Finn Behrens <me@kloenk.dev>
Co-authored-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Hector Martin <marcan@marcan.st>
- Hide console by default unless booting in verbose mode.
- In verbose mode, enable USB early and poll for connection before
launching payloads.
- Show console on fallback to proxy.
Signed-off-by: Hector Martin <marcan@marcan.st>
Do not forget to clear the L2C_ERR flags in silent mode also. This is
necessary for proper recovery.
Signed-off-by: Martin Povišer <povik@protonmail.com>
This should not be required but it looks like the 12.1 ANS2 firmware
complains when NVME_CC.EN is cleared before the IO queues have been
deleted.
Signed-off-by: Sven Peter <sven@svenpeter.dev>
Looks like Apple decided to change the compatible to sart,coastguard and rely
on the sart-version ADT property to differentiate between M1 and M1 Pro/Max
instead. sart-version is already present in the 11.x ADT so let's just
always decide based on that.
Signed-off-by: Sven Peter <sven@svenpeter.dev>
When loading m1n1.bin, SEPFW follows m1n1 and thus will show up at
the payload offset if there are no payloads.
Fixes: #158
Signed-off-by: Hector Martin <marcan@marcan.st>
These properties are needed so that U-Boot can enable the
serial port in its early pre-relocation phase.
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
This drops the tx/rxcookie and accepts TYPE_NOTIFY in addition to
TYPE_REPLY for command replies to make the DCP code also work on 11.x.
It'll still complain about an unexpected message during init but work.
Signed-off-by: Sven Peter <sven@svenpeter.dev>
s/IDLE/HIBERNATE/ to keep in sync with the Linux driver and then
hibernate DCP but send ANS2 to sleep to allow reusing both.
Signed-off-by: Sven Peter <sven@svenpeter.dev>
If the co-processor crashes, afk_epic_poll will always fail, which results
in afk_epic_rx getting stuck in an infinite loop calling afk_epic_poll
again and again.
This happens with e.g. old/incompatible DCP firmware.
Make sure the m1n1 proxy still works in those cases by propagating the
error correctly.
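The general pattern can be sketched as below. The return convention
(negative for error, zero for "nothing yet", positive for "message
ready") and the function names are assumptions made for this example
and do not reflect the real afk_epic_* signatures.

```python
def rx_wait(poll):
    """Wait for a message, giving up as soon as poll reports an error."""
    while True:
        ret = poll()
        if ret < 0:
            # Propagate the failure instead of polling forever.
            return ret
        if ret > 0:
            return 0
        # ret == 0: nothing yet, poll again.


# Healthy co-processor: two empty polls, then a message arrives.
healthy = iter([0, 0, 1])
print(rx_wait(lambda: next(healthy)))

# Crashed co-processor: every poll fails, so the error is returned.
print(rx_wait(lambda: -1))
```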
Signed-off-by: Sven Peter <sven@svenpeter.dev>
This calibration blob is stored in the WiFi chipset SROM on other
platforms, but Apple decided to move it to sysconfig instead...
Signed-off-by: Hector Martin <marcan@marcan.st>
Read the antenna SKU from the ADT and store it in a
"apple,antenna-sku" property on the relevant node in the FDT.
Signed-off-by: Mark Kettenis <kettenis@openbsd.org>
This sets both the target and the max link speed of the root ports
to the maximum specified in the ADT.
Signed-off-by: Hector Martin <marcan@marcan.st>
Expose the usb_iodev_vuart_setup function in uartproxy. This opens
the secondary ACM pipe to new uses outside the hypervisor. E.g. it can
be set up as another stream for sending proxy requests.
Sample usage from proxyclient:
p.usb_iodev_vuart_setup(p.iodev_whoami())
p.iodev_set_usage(IODEV.USB_VUART, USAGE.UARTPROXY)
# the second virtual serial now also serves proxy
Signed-off-by: Martin Povišer <povik@protonmail.com>
Turns out this isn't hardware-specific, but rather a change Apple made
retroactively in 12.0 RC. Doesn't look like there's a saner way than
this...
Signed-off-by: Hector Martin <marcan@marcan.st>
This makes it almost as fast as it was before the switch to an
uncached framebuffer, as far as I can tell.
Signed-off-by: Hector Martin <marcan@marcan.st>
This turns on the system level cache. The carveout unmapping also moves
here, and now it handles T8103/T6000 properly.
Signed-off-by: Hector Martin <marcan@marcan.st>
Adds support for the 3rd USB-C port on 2021 MacBook Pros.
Currently up to 8 USB-C ports are supported which should be sufficient
for expected future devices.
Tested on MacBook Pro 14" and Mac Mini.
Signed-off-by: Janne Grunau <j@jannau.net>
This passes through the hypervisor USB device disable to Linux, so we no
longer need to hack device trees.
Signed-off-by: Hector Martin <marcan@marcan.st>
This needs to happen in-place for the iterator to not be invalidated.
Explicitly request that and fail if it does not work.
Signed-off-by: Hector Martin <marcan@marcan.st>
Turns out AMCC on t600x throws errors when DISP0 real-time memory
requests hit the CPU cache, and then macOS panics.
Signed-off-by: Hector Martin <marcan@marcan.st>
This needs an extra L1 translation level, but only on SoCs with support
for >36-bit PAs. On M1, we bypass it and keep starting at L2.
Signed-off-by: Hector Martin <marcan@marcan.st>
This adds some missing fixes for M1/T8103 and reworks the code to split
off common parts, and also handle per-revision bits.
Signed-off-by: Hector Martin <marcan@marcan.st>
For now we compute this as phys_base aligned down to a 4GiB boundary.
Hopefully that works for future SoCs too.
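Aligning down to a 4GiB boundary amounts to clearing the low 32 bits.
A minimal sketch, with hypothetical addresses chosen only to show the
rounding:

```python
SZ_4G = 1 << 32


def align_down(addr, align):
    """Round addr down to a multiple of align (align is a power of two)."""
    return addr & ~(align - 1)


# Hypothetical phys_base values.
print(hex(align_down(0x8_1234_5000, SZ_4G)))
print(hex(align_down(0x100_8000_0000, SZ_4G)))
```

An address that is already 4GiB-aligned is returned unchanged.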
Signed-off-by: Hector Martin <marcan@marcan.st>
The M1 Pro/Max Macs use a different base address for the UART and the
WDT than earlier models.
Remove hardcoded base address constants and replace them with loads
from the ADT.
- The base address of the WDT is already retrieved in wdt_disable; also
use this address when triggering a reboot.
- Retrieve the base address of uart0 in uart_init. If the operation
fails, the error will be signaled on the early uart (if not disabled).
- The early debug UART can’t use the ADT (or shouldn’t) so it is now
disabled by default. To enable it, add
-DEARLY_UART -DEARLY_UART_BASE=0xuart_address
to the CFLAGS.
Signed-off-by: Vincent Duvert <vincent@duvert.net>
This enables parent devices that are required by active child devices,
because iBoot leaves behind some broken dependencies.
Signed-off-by: Hector Martin <marcan@marcan.st>
This unbreaks dcp.py and other things that need to access reserved
regions. This way we don't have to start doing manual MMU maps, but
we're still safe from SErrors caused by hitting TZ carveouts.
Signed-off-by: Hector Martin <marcan@marcan.st>
We saw some crazy speculation running in the HV breaking things by
reading from invalid RAM, so let's actually map only what's available.
For now we do map all lowmem as we haven't seen SErrors there yet, but
we stop at the high boundary.
Fixes: #97
Signed-off-by: Hector Martin <marcan@marcan.st>
Also capture config at cpu0 guest entry time, to make sure we don't
carry over guest changes to EL1 regs after that.
Signed-off-by: Hector Martin <marcan@marcan.st>
This makes sure any pending memory ops that might trigger an
asynchronous SError do so here, and not later. This fixes SErrors
breaking proxy ops.
Signed-off-by: Hector Martin <marcan@marcan.st>
This does an explicit hypervisor rendezvous. It's not great because it
introduces spurious guest IPIs, but xnu doesn't seem to care...
Signed-off-by: Hector Martin <marcan@marcan.st>