hamza_q_ · 2026-02-21T20:25:32 1771705532

Cool! I did an incomplete version in Rust a while back as well. It's not a source port; I tried to recreate the game from scratch myself, without looking at the C source code.

https://github.com/hamzaq2000/wolf3d-reimpl-rs

hamza_q_ · 2025-12-23T20:19:06 1766521146

Use Demucs bruh https://github.com/adefossez/demucs

yunwal · 2025-12-23T20:33:25 1766522005

Hilarious that this is maintained by facebook and yet SAM fails so badly

hamza_q_ · 2025-12-23T20:05:14 1766520314

Yeah I was frustrated by slow and hard to use OSS diarization too; recently released a library to address that, check it out: https://github.com/narcotic-sh/senko

Also https://zanshin.sh, if you'd like speaker diarization when watching YouTube videos

noman-land · 2025-12-23T23:26:38 1766532398

Hey, thanks for this. Been trying it out and it's very fast but seems to hear more speakers than are in the audio. I didn't see a way to tweak speaker similarity settings or merge speakers in some way. Any advice?

hamza_q_ · 2025-12-25T19:30:23 1766691023

Thanks for checking it out!

Yeah, unfortunately, since the diarization is based on acoustic features, it really does need high-fidelity voice recordings to get the best results. However, I just added another knob to the Diarizer class called mer_cos, which controls the speaker merging threshold. The default is 0.875, so perhaps try lowering it to 0.8. That should help.

I'll also get around to adding an oracle/min/max speakers feature at some point, for cases where you know the exact number of speakers ahead of time, or wanna set upper/lower bounds. Gotten busy with another project, so haven't done it yet. PRs welcome though! haha
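
For anyone wondering why lowering a merging threshold like mer_cos reduces the speaker count, here is a minimal sketch of threshold-based merging on cosine similarity. It is not Senko's actual code: `merge_speakers`, `cosine`, and the toy 2-D centroids are hypothetical, and it only assumes that clusters whose centroid embeddings are at least `threshold` similar get folded into one speaker.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def merge_speakers(centroids, threshold):
    """Hypothetical helper: merge clusters whose centroids are similar enough.

    Returns one speaker label per input centroid; merged clusters share a label.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel roots as consecutive speaker ids in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(centroids))]

# Toy centroids (made up): the first two are about 0.85 similar.
centroids = [[1.0, 0.0], [0.85, 0.53], [0.0, 1.0]]
strict = merge_speakers(centroids, 0.875)  # three separate speakers
loose = merge_speakers(centroids, 0.8)     # first two merged
```

With the stricter 0.875 threshold the two similar centroids stay separate; at 0.8 they collapse into one speaker, which is the direction you want when the diarizer hears more speakers than are present.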

noman-land · 2025-12-26T03:01:05 1766718065

Thanks, `mer_cos` definitely gets me closer. I appreciate that. Yeah, I was thinking providing a param for the expected number of speakers would be nice. I'll check out the codebase and see if that's something I can contribute :).

hamza_q_ · 2025-12-26T20:26:32 1766780792

Yeah would love contributions! Here's a brief overview of how I think it can be done:

Senko has two clustering types: (1) spectral for audio < 20 mins in length, and (2) UMAP+HDBSCAN for >= 20 mins. In the clustering code, spectral actually already supports oracle/min/max speakers, but UMAP+HDBSCAN doesn't. However, someone forked Senko and added min/max speakers to that here (for oracle, I guess min = max): https://github.com/DedZago/senko/commit/c33812ae185a5cd420f2...

So I think all that's required is basically just testing this thoroughly to make sure it doesn't introduce any regressions in clustering quality. And then just wiring the oracle/min/max parameters to the Diarizer class, or diarize() func.
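
A rough sketch of the wiring described above, for anyone picking this up. None of these names come from Senko: `choose_clustering`, `resolve_speaker_bounds`, and the string labels are hypothetical. It only encodes the two things stated in the comment, the 20-minute cutoff and treating oracle as min = max.

```python
from typing import Optional, Tuple

# Cutoff from the comment above: under 20 minutes uses spectral clustering.
SPECTRAL_CUTOFF_SECONDS = 20 * 60

def choose_clustering(duration_seconds: float) -> str:
    """Hypothetical dispatcher: pick the clustering type from audio length."""
    if duration_seconds < SPECTRAL_CUTOFF_SECONDS:
        return "spectral"
    return "umap_hdbscan"

def resolve_speaker_bounds(
    oracle: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: normalize oracle/min/max into a (min, max) pair.

    An oracle count is expressed as min == max; None means "no bound".
    """
    if oracle is not None:
        if min_speakers is not None or max_speakers is not None:
            raise ValueError("pass either oracle or min/max, not both")
        if oracle < 1:
            raise ValueError("oracle must be at least 1")
        return oracle, oracle
    if min_speakers is not None and min_speakers < 1:
        raise ValueError("min_speakers must be at least 1")
    if max_speakers is not None and max_speakers < 1:
        raise ValueError("max_speakers must be at least 1")
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers cannot exceed max_speakers")
    return min_speakers, max_speakers
```

The resulting (min, max) pair could then be handed to whichever clustering path `choose_clustering` selects, so both paths see the same bounds regardless of audio length.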

websiteapi · 2025-12-23T21:38:51 1766525931

looks interesting. will check it out.

hamza_q_ · 2025-12-22T20:51:09 1766436669

Thanks for COD: MW2 (2009), Vince. The game of my childhood. Rest in Peace.

hamza_q_ · 2025-11-14T21:34:51 1763156091

Cool use of ONNX! Fluid Inference also have great implementations of Parakeet v2/v3 in CoreML for Apple devices and OpenVINO for Intel:

https://github.com/FluidInference/FluidAudio

https://github.com/FluidInference/eddy-audio

hamza_q_ · 2025-11-04T06:19:06 1762237146

Location: Vancouver, BC, Canada

Remote: Yes

Willing to relocate: Yes

Technologies: diarization, Voice AI, PyTorch, CoreML, Svelte/SvelteKit, Flask, SQLite, Tauri

Résumé/CV: https://hamzaq.com/Hamza_Qayyum_Resume_Public.pdf

Email: mhamzaqayyum [at] icloud [dot] com

---------

Projects:

- Senko: very fast, accurate, speaker diarization (https://senko.sh)

- Zanshin: novel media player that allows you to navigate by speaker (https://zanshin.sh)

hamza_q_ · 2025-10-25T18:26:50 1761416810

Thought about it, but it seems they have some stringent prerequisites they'd like: https://github.com/ghostty-org/ghostty/issues/189

I didn't care for those; just told Claude Code to add in the feature directly. So they probably wouldn't accept the PR if I made one.

hamza_q_ · 2025-09-21T17:17:40 1758475060

Thanks :) Agreed, the limiting factor has been the speed of diarization (generating the "who speaks when" data). But the diarization backend of this app that I developed can now process 1 hour of audio in ~8 seconds on an M3 Mac. So that's more or less a solved problem now (at least on Mac); just UI work remains.

hamza_q_ · 2025-09-11T16:49:58 1757609398

We do know; it's just not in the popular consciousness yet. Read a bit of Marshall McLuhan.

hamza_q_ · 2025-09-11T16:48:04 1757609284

Taking bets on how fast Marshall McLuhan re-enters the public consciousness :)