Hacker News | hamza_q_'s comments

Cool! I did an incomplete version in Rust a while back as well. Not a source port; I tried to recreate the game from scratch myself, without looking at the C source code.

https://github.com/hamzaq2000/wolf3d-reimpl-rs



Hilarious that this is maintained by facebook and yet SAM fails so badly


Yeah I was frustrated by slow and hard to use OSS diarization too; recently released a library to address that, check it out: https://github.com/narcotic-sh/senko

Also https://zanshin.sh, if you'd like speaker diarization when watching YouTube videos


Hey, thanks for this. Been trying it out and it's very fast but seems to hear more speakers than are in the audio. I didn't see a way to tweak speaker similarity settings or merge speakers in some way. Any advice?


Thanks for checking it out!

Yeah unfortunately, since the diarization is acoustic features based, it really does require high recorded voice fidelity/quality to get the best results. However, I just added another knob to the Diarizer class called mer_cos, which controls the speaker merging threshold. The default is 0.875, so perhaps try lowering to 0.8. That should help.
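
To make the effect of a merging threshold like this concrete, here is a minimal sketch of threshold-based speaker merging on cosine similarity. It is not Senko's actual code: `cosine` and `merge_speakers` are hypothetical helpers, the 2-D "centroids" are toy data, and the assumption is simply that two speaker clusters get merged when their centroids' cosine similarity is at or above the threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def merge_speakers(centroids, threshold):
    """Hypothetical helper: merge speakers whose centroids are similar enough.

    Any pair with cosine similarity >= threshold is joined (transitively,
    via union-find). Returns one merged speaker label per input centroid.
    """
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    # Renumber the surviving roots as 0, 1, 2, ...
    roots = sorted({find(i) for i in range(len(centroids))})
    label = {root: k for k, root in enumerate(roots)}
    return [label[find(i)] for i in range(len(centroids))]

# Toy centroids: the first two have cosine similarity of roughly 0.85.
centroids = [[1.0, 0.0], [0.85, 0.53], [0.0, 1.0]]
strict = merge_speakers(centroids, 0.875)  # the default value mentioned above
loose = merge_speakers(centroids, 0.8)     # the lowered value suggested above
```

With the toy data, the stricter threshold keeps all three speakers apart, while the lower one folds the first two together. That is the general reason lowering the threshold tends to reduce over-counted speakers.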

I'll also get around to adding an oracle/min/max speakers feature at some point, for cases where you know the exact number of speakers ahead of time, or wanna set upper/lower bounds. Gotten busy with another project, so haven't done it yet. PRs welcome though! haha


Thanks, `mer_cos` definitely gets me closer. I appreciate that. Yeah, I was thinking providing a param for the expected number of speakers would be nice. I'll check out the codebase and see if that's something I can contribute :).


Yeah would love contributions! Here's a brief overview of how I think it can be done:

Senko has two clustering types: (1) spectral for audio < 20 mins in length, and (2) UMAP+HDBSCAN for >= 20 mins. In the clustering code, spectral actually already supports oracle/min/max speakers, but UMAP+HDBSCAN doesn't. However, someone forked Senko and added min/max speakers to that here (for oracle, I guess min = max): https://github.com/DedZago/senko/commit/c33812ae185a5cd420f2...

So I think all that's required is basically just testing this thoroughly to make sure it doesn't introduce any regressions in clustering quality. And then just wiring the oracle/min/max parameters to the Diarizer class, or diarize() func.
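
A rough sketch of what that wiring could look like, under stated assumptions: the function and parameter names below (`choose_clustering`, `resolve_speaker_bounds`, `num_speakers`, `min_speakers`, `max_speakers`) are hypothetical and not Senko's actual API. It only illustrates the two ideas above: picking the clustering type by audio length, and treating oracle as min = max.

```python
TWENTY_MINUTES_S = 20 * 60

def choose_clustering(duration_s):
    """Pick the clustering type by audio length, per the split described above."""
    return "spectral" if duration_s < TWENTY_MINUTES_S else "umap_hdbscan"

def resolve_speaker_bounds(num_speakers=None, min_speakers=None, max_speakers=None):
    """Hypothetical helper: normalize oracle/min/max into a (min, max) pair.

    An oracle count is treated as min == max. None means "no bound".
    """
    if num_speakers is not None:
        if min_speakers is not None or max_speakers is not None:
            raise ValueError("pass either num_speakers or min/max, not both")
        if num_speakers < 1:
            raise ValueError("num_speakers must be >= 1")
        return num_speakers, num_speakers

    if min_speakers is not None and min_speakers < 1:
        raise ValueError("min_speakers must be >= 1")
    if max_speakers is not None and max_speakers < 1:
        raise ValueError("max_speakers must be >= 1")
    if (min_speakers is not None and max_speakers is not None
            and min_speakers > max_speakers):
        raise ValueError("min_speakers cannot exceed max_speakers")
    return min_speakers, max_speakers

# Example: a 45-minute recording with exactly 3 known speakers.
method = choose_clustering(45 * 60)
bounds = resolve_speaker_bounds(num_speakers=3)
```

The resolved pair could then be handed to whichever clustering path was chosen, so both paths see the same min/max interface.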


looks interesting. will check it out.


Thanks for COD: MW2 (2009), Vince. The game of my childhood. Rest in Peace.


Cool use of ONNX! Fluid Inference also have great implementations of Parakeet v2/v3 in CoreML for Apple devices and OpenVINO for Intel:

https://github.com/FluidInference/FluidAudio

https://github.com/FluidInference/eddy-audio


Location: Vancouver, BC, Canada

Remote: Yes

Willing to relocate: Yes

Technologies: diarization, Voice AI, PyTorch, CoreML, Svelte/SvelteKit, Flask, SQLite, Tauri

Résumé/CV: https://hamzaq.com/Hamza_Qayyum_Resume_Public.pdf

Email: mhamzaqayyum [at] icloud [dot] com

---------

Projects:

- Senko: very fast, accurate, speaker diarization (https://senko.sh)

- Zanshin: novel media player that allows you to navigate by speaker (https://zanshin.sh)


Thought about it, but it seems they have some stringent prereqs they'd like: https://github.com/ghostty-org/ghostty/issues/189

I didn't care for those; just told Claude Code to add in the feature directly. So they probably wouldn't accept the PR if I made one.


Thanks :) Agreed, the limiting factor has been the speed of diarization (generating the "who speaks when" data). But the diarization backend of this app that I developed can now process 1 hour of audio in ~8 seconds on an M3 Mac. So that's more or less a solved problem now (at least on Mac), just UI work remains.


We do know; it's just not in the popular consciousness yet. Read a bit of Marshall McLuhan.


Taking bets on how fast Marshall McLuhan re-enters the public consciousness :)

