Assuming you currently recompute the whole thing on every edit, it might be fairly straightforward to speed it up by reusing the results of the previous run. Areas which do not overlap the edited area shouldn't need to be recalculated. Bounding boxes would be a quick way to test overlap.
Having checked the website https://kidzfun.art, it looks like you compute the masks once, and don't recalculate anything upon edits.
If you would like to dynamically adjust the masks, the following algorithm should be O(sqrt(N)) lines of JavaScript executed for realistic cases:
Split the image up into monochromatic regions of at most 1000-4000 pixels (you want to regions to be roundish rather than long and stringy, so use breadth first search to create them). Then keep track of a graph of these regions with edges between adjacent regions. For a 2000x2000 pixel image, the graph should contain ~4000 nodes, so finding a monochromatic component would not take an appreciable amount of time.
Updating the graph after a fill operation is straightforward - just relabel the color associated with each changed region. The only point I'm uncertain of is whether canvas operations are fast enough that each region can have its own mask and up to 4000 draws as described in the blog post don't take too long. The individual masks wouldn't be very big, and the total number of pixels to be colored isn't any different to the blog post, so it may be possible.
If something is drawn with the pen tool, then every region it touches would need to be recalculated. Assuming the pen tool is circular and not substantially larger than 1000 pixels itself, this should be a fast operation as it won't overlap too many large regions and there are few enough regions we can iterate through all of them every frame for a bounding box test.
There are cases where this goes wrong - for example if the regions end up being very stretched. If there are 1000 1-pixel wide vertical lines, and the drawing tool allows a 1000 pixel horizontal line to be drawn in 1 frame, the algorithm described looks at a million pixels (possibly 4 times each as it checks for adjacency), which might be too slow. However, since this is aimed at a touchscreen, I'd be impressed if your children can and want to draw accurately enough to cause a problem. Further optimisations could work around this problem (and other problems such as having lots of 1-pixel regions that would bloat the graph).
devit's suggestions are also worth incorporating.
Since this algorithm only becomes interesting with edits, it's not really something I could implement as a pull request to the GH repo.