I also have an M1 Max 64GB: Qwen 3.6 benefits from MTP (after rounds of parameter optimization). MLX was unstable (haven't tried it recently) and was faster at token generation (TG) but slower at prompt processing (PP), so inconclusive.
Yeah. I have not really tinkered much with parameter optimisation for the 35B model with MTP. Would be interested to see what you've found.
I'm using the GGUF too; it appears slightly faster in llama.cpp now than in current LM Studio, but it's not clear to me whether that is down to LM Studio having a little more code overhead, an older llama.cpp under the hood, or just parameter differences.
"Apple is now focused on its rival to Meta’s Ray-Ban smart glasses in 2027, and a “display-equipped AR/XR smart glasses device powered by optical waveguides” in 2029, Kuo says."
Seems like Vision Pro is basically going to become Vision XR. All roads lead to Rome.
Doubtful. The screen is still the same, the same amount of light is getting projected, and the same amount of graphics calculation is still happening. If there's any impact, it's probably quite low, maybe less than 0.0001%.
I'm a bit disappointed that Opus 4.6 wasn't in this because the tokenizer changed quite a bit from 4.7 onward. I was so annoyed by 4.7 that I've been forcing 4.6 ever since. I've been annoyed by 4.8 a bit too, so I haven't felt the urge to move on.
Tribalism, us vs. them, racism, patriotism/nationalism, etc. all seem closely related.
In terms of social life, and romantic life, it's interesting how heavily we rely on shared/common background, which tends to cause this clustering effect.