Commit 9c795fbdbf introduced the pair of
WFE and SEV for spinlock, but it caused delays of tens of seconds. A
possible explanation for the delay is lack of data synchronization
barrier between the store instruction and SEV instruction.
Arm Architecture Reference Manual for A-profile architecture (issue H.a)
says:
> Arm recommends that software includes a Data Synchronization Barrier
> (DSB) instruction before any SEV instruction. The DSB instruction
> ensures that no instructions, including any SEV instructions, that
> appear in program order after the DSB instruction, can execute until
> the DSB instruction has completed.
However, inserting a DSB instruction still didn't resolve the delay.
The exclusive load is an alternative to the SEV instruction. The manual
says:
> ...However, in Armv8, when the global monitor for a PE changes from
> Exclusive Access state to Open Access state, an event is generated.
> This is equivalent to issuing an SEVL instruction on the PE for which
> the monitor state has changed. It removes the need for spinlock code
> to include an SEV instruction after clearing a spinlock.
As an additional benefit, the exclusive load is local to the PE and
eliminates spurious events for other PEs.
Trusted Firmware-A v2.6 also employs the same algorithm.
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Do not go through a formatted cmdline string when invoking tools. There
are user-controlled paths involved which can contain spaces (and other
foul characters).
Signed-off-by: Martin Povišer <povik@protonmail.com>
If an explicitly specified display mode exceeds the allocated
framebuffer allocate a new one from the top of RAM.
Note: macOS panics immediately with a realloced framebuffer.
Signed-off-by: Janne Grunau <j@jannau.net>
DART nodes for dcp and disp0 have pre-allocated L1 and L2 tables which
are annotated in the ADT. The pre-allocated memory is specified in
"pt-region-${DEVICE}". The first page is used as L1 table and the
following pages are used as L2 tables. The number of valid L2 tables is
specified in "l2-tt-${DEVICE}". The first entry identifies the region
and the second entry is the number of valid L2 tables.
iboot (macOS 12.3) inits just 2 L2 tables. Larger framebuffers require
more. By using the pre-allocated page tables we do not have worry about
keeping the memory mapped after m1n1 executes the next target.
Signed-off-by: Janne Grunau <j@jannau.net>
The reserved framebuffer on the Mac Studio is 0x854000 bytes. This is
too small for 1920x1200 with 4 byte per pixel. Setting 1920x1200 as
mode crashes dcp but not the actual display controller. The display
remains working and even comes back after display hotplug/power cycle.
Signed-off-by: Janne Grunau <j@jannau.net>
The device tree for multi die SoCs as the M1 Ultra has its devices
under "/soc/dieX" instead of directly under "/soc".
Signed-off-by: Janne Grunau <j@jannau.net>
Instead of directly taking a proxy entry code to determine what to
decode, just take an is_fault argument and only show and decode ESR/FAR
if true.
Signed-off-by: Hector Martin <marcan@marcan.st>
Get rid of the hv_rearm() thing (which was always a bit dodgy) and
instead properly make sure that all CPUs rendezvous when needed and
switch the active proxy thread without ever exiting exception context.
The Python side can now switch proxy context (by waiting directly for
a proxy boot) without having to exit out of the hypervisor callback,
so cpu() now works as a normal Python method.
Add a cpus() iterator so you can do things like:
>>> for i in cpus(): bt()
Signed-off-by: Hector Martin <marcan@marcan.st>
This should reduce memory traffic spam and power usage from lock
contention when threads are blocked on a spinlock.
Signed-off-by: Hector Martin <marcan@marcan.st>