My iPhone 16 Pro Max produces garbage output when running MLX LLMs

📝 Discussion Summary

1. Phone calculators are not the real calculators
Users keep replacing the stock app with emulators of TI‑, HP‑, or NumWorks calculators because they need a full history, CAS, or a familiar interface.

“I use the NumWorks emulator app whenever I need something more advanced.” – varun_ch
“I was pretty delighted to realize I could now delete the lame Calculator.app … I settled on NumWorks.” – xp84

2. Built‑in calculator apps feel under‑baked
The default iOS/Android calculators lack history, symbolic evaluation, and a good UI for long expressions.

“Honestly, the main beef I have with Calculator.app is that on a screen this big, I ought to be able to see several previous calculations and scroll up if needed.” – xp84
“Calculator.app does have history now … it goes back to 2025 on my device.” – vscode‑rest

3. Apple’s MLX/LLM bug shows a hardware‑level defect
A specific iPhone 16 Pro Max fails to run Apple’s own LLM correctly, pointing to a defect in the Neural Engine or its driver.

“Apple’s own LLM silently failed on this device … it seems Bad (TM) that Apple would ship devices where their own LLM didn’t work.” – bri3d
“The author’s conclusion was still completely reasonable given the evidence they had.” – TimByte

4. Floating‑point/NaN behaviour is a source of confusion
The discussion turns to IEEE‑754 guarantees, NaN propagation, and the limits of reproducibility across platforms.

“Anything that relies on bit patterns of NaNs behaving in a certain way … is in dangerous territory.” – ekelsen
“Binary operations combining two NaN inputs must result in one of the input NaNs.” – addaon
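The behaviour these comments refer to can be seen in a few lines of Python. This is a minimal sketch using only the standard library; it shows what CPython does with IEEE 754 doubles on a typical CPU and is not a statement about any particular GPU or Neural Engine. The helper name `bits` is made up for illustration.

```python
import math
import struct

nan = float("nan")

# NaN compares unequal to everything, including itself.
nan_equals_itself = (nan == nan)

# Arithmetic with a NaN operand yields NaN.
propagates = math.isnan(nan + 1.0)

# Python's built-in max() relies on comparisons, so once a NaN is involved
# the result depends on argument order.
max_nan_first = max(nan, 1.0)   # stays NaN: 1.0 > nan is False
max_nan_last = max(1.0, nan)    # stays 1.0: nan > 1.0 is False


def bits(x: float) -> str:
    """Return the 64-bit pattern of a double as a hex string (hypothetical helper)."""
    return struct.pack(">d", x).hex()


# The exact payload bits of a NaN can differ between platforms:
# inspect them when debugging, but do not build logic on them.
print(nan_equals_itself, propagates, max_nan_first, max_nan_last, bits(nan))
```

The order-dependent `max` result is the kind of surprise that makes NaN handling hard to reason about across runtimes.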

These four threads capture the bulk of the conversation’s concerns and preferences.

🚀 Project Ideas

MobileCalc REPL

Summary

A mobile calculator app that behaves like a REPL: you can edit previous expressions, assign variables, and re-run dependent calculations.
Provides full history, syntax highlighting, CAS support, and graphing (2D/3D) in a single, lightweight UI.

Details

Key	Value
Target Audience	Students, engineers, hobbyists who need a powerful calculator on their phone.
Core Feature	Interactive expression editor with variable assignment, history navigation, and graphing.
Tech Stack	SwiftUI + Combine (iOS), Kotlin Multiplatform for Android, MathJax for rendering, libqalculate for CAS.
Difficulty	Medium
Monetization	Revenue‑ready: $4.99 one‑time purchase or $0.99/month subscription for advanced features.

Notes

HN commenters lament the lack of variable support in built‑in calculators: “I want to be able to return to an earlier expression, modify it, assign it to a variable…” (varun_ch).
The app would let users “tap to select previous expressions” and “modify the variable and rerun” (josephg).
Previewing the whole expression as you type would address the complaint that “built‑in calculator apps are surprisingly underbaked” (varun_ch).
Discussion potential: comparing to existing emulators (NumWorks, HP Prime) and why a native mobile REPL is superior.
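The edit-and-rerun behaviour described for MobileCalc REPL could look roughly like the sketch below. It is written in Python for brevity rather than in the Swift/Kotlin stack listed above, and every name in it (`evaluate`, `Session`, `add`, `edit`, `rerun`) is hypothetical. It supports only basic arithmetic, with no CAS or graphing.

```python
import ast
import operator

# Binary and unary operators the sketch is willing to evaluate.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def evaluate(expr: str, variables: dict) -> float:
    """Evaluate an arithmetic expression using only whitelisted AST nodes."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax in {expr!r}")
    return walk(ast.parse(expr, mode="eval"))


class Session:
    """History of (name, expression) cells, re-evaluated top to bottom."""

    def __init__(self):
        self.cells = []      # list of [name, expression]
        self.values = {}     # name -> latest value

    def add(self, name: str, expr: str) -> float:
        self.cells.append([name, expr])
        return self.rerun()[name]

    def edit(self, index: int, expr: str) -> dict:
        """Replace an earlier expression and recompute the whole history."""
        self.cells[index][1] = expr
        return self.rerun()

    def rerun(self) -> dict:
        self.values = {}
        for name, expr in self.cells:
            self.values[name] = evaluate(expr, self.values)
        return self.values


session = Session()
session.add("r", "2")
session.add("area", "3.14159 * r ** 2")
session.edit(0, "3")          # change r; area is recomputed
print(session.values["area"])
```

Re-running the whole history on every edit is the simplest design; a real app would more likely track dependencies and recompute only the affected cells.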

FPRepro Checker

Summary

A cross‑platform CLI/web service that runs a suite of floating‑point expressions on multiple devices/architectures and reports discrepancies.
Helps developers detect non‑reproducible results caused by hardware, compiler, or runtime differences.

Details

Key	Value
Target Audience	Mobile and embedded developers, QA engineers, scientific computing teams.
Core Feature	Automated reproducibility tests across iOS, Android, macOS, Linux, and various CPU/GPU backends.
Tech Stack	Rust (performance), Docker for isolated environments, WebAssembly for browser runs, REST API.
Difficulty	High
Monetization	Revenue‑ready: $99/month for enterprise API access, free tier with limited runs.

Notes

Addresses frustration about “floating point accumulation doesn’t commute” and inconsistent results across devices (bri3d, ekelsen).
Provides a practical utility for debugging the “Apple Intelligence” LLM issue where math operations diverge on a specific iPhone 16 Pro Max.
Could spark discussion on IEEE 754 compliance and platform‑specific quirks.
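One possible core for such a checker is to record the exact bit pattern of each result and compare those across machines. The sketch below assumes a tiny in-process suite; the case names and the functions `bit_pattern`, `run_suite`, `fingerprint`, and `diff` are invented for illustration, and Python stands in for the Rust listed in the tech stack.

```python
import hashlib
import math
import struct


def bit_pattern(x: float) -> str:
    """Hex string of the IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x).hex()


# Hypothetical test suite: name -> zero-argument callable returning a float.
SUITE = {
    "sum_left_to_right": lambda: (0.1 + 0.2) + 0.3,
    "sum_right_to_left": lambda: 0.1 + (0.2 + 0.3),
    "fsum": lambda: math.fsum([0.1, 0.2, 0.3]),
    "sqrt2_squared": lambda: math.sqrt(2.0) ** 2,
}


def run_suite(suite: dict) -> dict:
    """Run every case and record the exact bits of each result."""
    return {name: bit_pattern(case()) for name, case in suite.items()}


def fingerprint(results: dict) -> str:
    """Stable digest of all results, suitable for comparing two devices."""
    payload = "\n".join(f"{k}={v}" for k, v in sorted(results.items()))
    return hashlib.sha256(payload.encode()).hexdigest()


def diff(a: dict, b: dict) -> list:
    """Names of cases whose bit patterns differ between two runs."""
    return sorted(name for name in a if a[name] != b.get(name))


results = run_suite(SUITE)
print(fingerprint(results))
# Regrouping the same additions already changes the result on one machine.
print(results["sum_left_to_right"] != results["sum_right_to_left"])
```

Comparing bits rather than printed decimals avoids hiding a one-ulp difference behind rounding in the output format.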

SmartKeyboard

Summary

A lightweight, privacy‑first iOS keyboard that replaces the default predictive text with a modern, ML‑based next‑word model.
Offers customizable language models, offline mode, and real‑time correction.

Details

Key	Value
Target Audience	iOS users frustrated with broken predictive text, developers of custom keyboards.
Core Feature	On‑device language model (e.g., a distilled GPT‑2) with fast inference, context‑aware suggestions, and user‑controlled privacy settings.
Tech Stack	Swift + Core ML, TensorFlow Lite, optional server fallback for heavy models.
Difficulty	Medium
Monetization	Hobby (open source) with optional in‑app purchases for premium language packs.

Notes

Directly responds to “Typing on my iPhone… just gives up and stops correcting anything at all” (sen) and the broader “iOS Keyboard is Broken” discussion (macintux, taneq).
Users would appreciate a keyboard that “doesn’t randomly break” and can be tuned to their language habits.
Potential for community contributions and model fine‑tuning.
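To make the next-word idea concrete, here is a deliberately simple stand-in: a bigram frequency counter, not the neural model the idea calls for. It only illustrates the suggest-from-context interface such a keyboard would expose. The class and method names are hypothetical, and Python is used for brevity instead of Swift.

```python
from collections import Counter, defaultdict


class BigramSuggester:
    """Suggest next words from counts of adjacent word pairs."""

    def __init__(self):
        self.following = defaultdict(Counter)   # word -> Counter of next words

    def train(self, text: str) -> None:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            self.following[current][nxt] += 1

    def suggest(self, previous_word: str, k: int = 3) -> list:
        """Return up to k most frequent followers of previous_word."""
        counts = self.following.get(previous_word.lower())
        if not counts:
            return []
        return [word for word, _ in counts.most_common(k)]


suggester = BigramSuggester()
suggester.train("see you soon . see you later . see you soon")
print(suggester.suggest("you"))   # ['soon', 'later']
```

Training on the user's own text on the device, as here, matches the privacy-first framing; a real model would replace the counter but could keep the same `suggest` shape.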

NeuralEngine Debugger

Summary

A diagnostic app for iOS that runs a battery of tests on the Apple Neural Engine (ANE) and the GPU, reporting performance, correctness, and compatibility.
Includes a UI to run sample LLM inference and compare results across device models.

Details

Key	Value
Target Audience	iOS developers, ML engineers, QA teams testing on Apple hardware.
Core Feature	Automated ANE health checks, benchmark suite, reproducibility checker for tensor operations, LLM inference demo.
Tech Stack	Swift, Metal, MLX, XCTest for automated tests, CoreML for model loading.
Difficulty	High
Monetization	Revenue‑ready: $29.99 one‑time purchase or $4.99/month for cloud‑based test results and analytics.

Notes

Addresses the “Apple Intelligence” bug where a specific iPhone 16 Pro Max produced wrong math results (bri3d, zczb).
Provides a practical tool for developers to verify that their models run correctly on the target device.
Could generate discussion on hardware‑level ML debugging and the need for better diagnostics in the Apple ecosystem.
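The correctness part of such a tool comes down to comparing a device's output with a trusted reference. The sketch below assumes outputs are flat lists of floats and uses made-up tolerances and numbers; `compare_outputs` is a hypothetical name, and a real tool would read tensors from Metal or MLX rather than Python lists.

```python
import math


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Summarize how far a device's output is from a trusted reference."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    non_finite = 0
    mismatches = 0
    max_abs_err = 0.0
    for ref, got in zip(reference, candidate):
        if not math.isfinite(got):
            # NaN or infinity in the output is always counted as a failure.
            non_finite += 1
            mismatches += 1
            continue
        err = abs(got - ref)
        max_abs_err = max(max_abs_err, err)
        if err > atol + rtol * abs(ref):
            mismatches += 1
    return {
        "non_finite": non_finite,
        "mismatches": mismatches,
        "max_abs_err": max_abs_err,
        "ok": mismatches == 0,
    }


# Made-up numbers for illustration only.
reference = [0.12, -1.50, 3.25, 0.00]
healthy = [0.12001, -1.50002, 3.2499, 0.0]
broken = [0.12, float("nan"), 912.0, -4.0e7]

print(compare_outputs(reference, healthy))
print(compare_outputs(reference, broken))
```

A tolerance-based check like this separates ordinary rounding differences between devices from the wildly wrong values described in the original report.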
