Saw this a couple of days ago and really liked the work. I was thinking about your slow UI response time. There really shouldn't be any noticeable lag going on; even the basic Pi Zero has, what, a 1 GHz core? I suspect there's something off with your thread scheduling setup.
I do my stuff on embedded hardware using FreeRTOS, and I set my schedules to update the LCD once every 40 ms, which gives 25 fps, and my LEDs to run their next update at some other interval (can't remember what, tbh).
That leaves me tons of time to set up other things like button detection and whatnot. That runs at 5 ms per check, I think...
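To make the idea concrete, here is a rough Python sketch of that kind of fixed-period scheduling. It is not my FreeRTOS code and not your project's code: `PeriodicTask`, `run_for`, `update_lcd` and `check_buttons` are all made-up names, the 40 ms and 5 ms periods are just the numbers from above, and it runs everything from one loop instead of separate threads. The point is only that each task gets its own period and the loop sleeps until the next one is due, rather than redrawing as fast as it can.

```python
import time


def _now_ms():
    # Monotonic clock in whole milliseconds (unaffected by wall-clock changes).
    return time.monotonic_ns() // 1_000_000


def _sleep_ms(ms):
    time.sleep(ms / 1000)


class PeriodicTask:
    """A callable that should run once every `period_ms` milliseconds."""

    def __init__(self, name, period_ms, func):
        self.name = name
        self.period_ms = period_ms
        self.func = func
        self.next_due = 0  # set properly when the scheduler starts


def run_for(tasks, duration_ms, now_ms=_now_ms, sleep_ms=_sleep_ms):
    """Run each task at its own period for `duration_ms`, then return."""
    start = now_ms()
    end = start + duration_ms
    for task in tasks:
        task.next_due = start  # every task is due immediately at start

    while True:
        now = now_ms()
        if now >= end:
            return
        for task in tasks:
            if now >= task.next_due:
                task.func()
                # Advance from the scheduled time, not from `now`,
                # so small delays do not accumulate into drift.
                task.next_due += task.period_ms
        # Sleep until the earliest upcoming deadline (or the end time).
        wake = min(min(t.next_due for t in tasks), end)
        remaining = wake - now_ms()
        if remaining > 0:
            sleep_ms(remaining)


if __name__ == "__main__":
    counts = {"lcd": 0, "buttons": 0}

    def update_lcd():  # placeholder for the real display refresh
        counts["lcd"] += 1

    def check_buttons():  # placeholder for the real button poll
        counts["buttons"] += 1

    run_for(
        [PeriodicTask("lcd", 40, update_lcd),
         PeriodicTask("buttons", 5, check_buttons)],
        duration_ms=200,
    )
    print(counts)
```

On real hardware the counts will vary a little with timing jitter, and a slow task body will delay the others, since this is a single cooperative loop rather than preemptive tasks like you get on an RTOS.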
It might also help you to use compiled code rather than the Python VM.
VMs have a really large overhead, and on a small SoC it might be pushing its limits.
I don't know if there's any route to directly port Python code to compiled machine code... That's something worth looking into.