Bootstrapper909's comments

Very neat! Generally speaking, it seems like testing and debugging haven't been getting as much love as other aspects of computer programming.

Debuggers still work pretty much the same as gdb, only now with an integrated UI, and there's so much room for improvement. This is a great start and could be taken much further, with the ability to replay complete sessions so bugs can be easily reproduced and rerun again and again until fixed.
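A minimal sketch of the replay idea, in Python: wrap every nondeterministic input in a recorder during the original run, then feed the log back to reproduce the exact same session (the `Recorder`/`Replayer` names and JSON format are made up for illustration).

```python
import json
import random

class Recorder:
    """Record nondeterministic inputs during a run so the exact
    session can be replayed later (a toy record/replay sketch)."""
    def __init__(self):
        self.log = []
    def record(self, name, value):
        self.log.append({"input": name, "value": value})
        return value
    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.log, f)

class Replayer:
    """Feed the recorded values back in order, making the buggy
    session reproducible as many times as needed."""
    def __init__(self, path):
        with open(path) as f:
            self.log = iter(json.load(f))
    def record(self, name, value_ignored=None):
        return next(self.log)["value"]

def session(io):
    # all nondeterminism goes through io.record(...)
    a = io.record("dice", random.randint(1, 6))
    b = io.record("dice", random.randint(1, 6))
    return a + b

rec = Recorder()
first = session(rec)
rec.save("session.json")

# replaying always reproduces the same run
assert session(Replayer("session.json")) == first
```

Real record/replay debuggers (rr, for instance) apply the same principle at the syscall level rather than requiring the app to be instrumented.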

We see the same in the related field of testing automation (disclaimer: checksum.ai founder). Same testing methods, same testing problems, only fancier packages.


Testing gets a lot of love, at least in theory. Much less in practice because it is a whole lot of extra work.

As for checksum.ai, is it some kind of fuzzer? I think fuzz testing is underutilized. In fact, many of my colleagues don't even know it exists, and I have never been on a project where it was done. Fuzzing is generally done in the context of security, but I see no reason why it should be limited to that.
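For readers who haven't seen fuzzing outside security, a toy sketch: throw random inputs at a function and flag any exception its spec doesn't allow. The `first_word` function and its spec are invented for illustration.

```python
import random
import string

def first_word(s):
    """Toy function under test: return the first whitespace-separated
    word. Spec: raise ValueError on input containing no words."""
    return s.split()[0]   # bug: raises IndexError instead of ValueError

def fuzz(fn, trials=5000, seed=0):
    """Throw random strings at fn and collect any exception the spec
    doesn't allow; each one is a bug worth filing."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + "  .,"
    failures = []
    for _ in range(trials):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            fn(s)
        except ValueError:
            pass                      # allowed by the spec
        except Exception as exc:
            failures.append((s, repr(exc)))
    return failures

bad_inputs = fuzz(first_word)
print(len(bad_inputs), bad_inputs[:2])
```

The empty and whitespace-only strings the fuzzer stumbles on surface the `IndexError` bug with no security angle at all, which is the point: the technique finds plain correctness bugs too.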


Not exactly (disclaimer: on the Checksum team). Our goal is to generate end-to-end tests that mimic user behavior, using real production sessions to train our models. These end-to-end tests can then be run during development, smoking out real potential bugs before your latest deployment makes it into production. Of course, part of the generated tests could involve fuzz testing where it makes sense (if there is a form field, input, etc.).


Very cool! Any specific reason you started with Java?


We built a time-travel debugger for Java, since we worked in the financial domain, where Java was predominant. We couldn't debug on remote machines for lack of access, so we had to log a lot, and it felt like guesswork. A debugger with a back button made a lot of sense: we logged everything by default and reverse-mapped it to the code, so that we could reverse-F8 (step backwards).
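The "log everything, then step backwards" idea can be sketched in a few lines of Python (the original tool targeted Java; this is just an illustration of the principle, not its implementation): a tracer records every executed line together with a snapshot of the locals, and "reverse-F8" is then simply walking the recording backwards.

```python
import copy
import sys

class TimeTravel:
    """Log every executed line with a snapshot of the locals, then
    step backwards through the recording (a toy 'reverse-F8')."""
    def __init__(self):
        self.history = []          # list of (lineno, locals-snapshot)
    def _trace(self, frame, event, arg):
        if event == "line":
            self.history.append((frame.f_lineno, copy.deepcopy(frame.f_locals)))
        return self._trace
    def run(self, fn, *args):
        sys.settrace(self._trace)
        try:
            return fn(*args)
        finally:
            sys.settrace(None)

def buggy_sum(xs):
    total = 0
    for x in xs:
        total += x * x   # suppose we meant 'total += x'
    return total

tt = TimeTravel()
tt.run(buggy_sum, [1, 2, 3])

# step backwards from the end and watch 'total' shrink toward the bug
for lineno, snapshot in reversed(tt.history):
    print(lineno, snapshot.get("total"))
```

A production tool would record far more cheaply (deltas, ring buffers, instrumented bytecode), but the debugging experience is the same: scrub backwards through recorded state instead of guessing from logs.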


Yep totally understand. We are an early stage startup and currently 100% focused on improving our models and our product.

We don’t have pricing, not because we’re trying to be vague, but because we haven’t fully figured out our training costs, which can vary significantly per app. We are very much in the “do things that don’t scale” phase, where we hand-pick our customers, provide white-glove treatment, and prioritize learning over price.


Reasonable.


Thanks for your feedback. It definitely makes sense and we'll incorporate it!


Thanks for your kind words! Yes, many teams struggle with that (I have in the past), and the essence of our mission is to let dev teams focus on progressing on their roadmap and goals instead of wrestling with tests.

Feel free to sign up for a demo if that's a priority for your team. Even if it's just to chat and connect.


It's all of the above but more specifically:

1. We use AI to analyze user patterns and find common paths and edge cases, essentially building a representation of your UX in a DB

2. We then use the DB to train another ML model that learns how to use your app the same way a user does. Given a certain page and user context, the ML can complete UX flows.

3. Finally, we learn to generate assertions, run the tests, and convert the model's actions from step 2 into proper Playwright or Cypress tests
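The pipeline above (minus the ML of step 2) can be sketched with a toy example: mine the most frequent path from recorded sessions, then emit it as a Playwright test. The session format here is entirely made up for illustration; Checksum's real representation and generation are certainly more involved.

```python
from collections import Counter

# Hypothetical recorded sessions: each is the sequence of UI actions
# one real user performed (this format is invented for the sketch).
sessions = [
    [("goto", "/login"), ("fill", "#email"), ("click", "#submit"), ("goto", "/dashboard")],
    [("goto", "/login"), ("fill", "#email"), ("click", "#submit"), ("goto", "/dashboard")],
    [("goto", "/login"), ("click", "#forgot-password")],
]

# Step 1: find the most common end-to-end path across sessions
paths = Counter(tuple(s) for s in sessions)
common_path, freq = paths.most_common(1)[0]

# Step 3: convert the path into a runnable Playwright test body
lines = [
    "import { test, expect } from '@playwright/test';",
    "",
    f"test('common flow (seen {freq}x)', async ({{ page }}) => {{",
]
for action, target in common_path:
    if action == "goto":
        lines.append(f"  await page.goto('{target}');")
    elif action == "fill":
        lines.append(f"  await page.fill('{target}', 'user@example.com');")
    elif action == "click":
        lines.append(f"  await page.click('{target}');")
lines.append("});")

generated = "\n".join(lines)
print(generated)
```

`page.goto`, `page.fill`, and `page.click` are real Playwright calls; the interesting (and hard) parts that this sketch skips are the model that generalizes beyond observed paths and the assertion generation.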


That's a fair comment, and I guess we are missing an "AND" there.

1. We (and others) have tested our tools' impact on memory, CPU, and network performance and found it negligible, even on slower/older devices

2. Also, they are used by F500 companies and have wide adoption, which indicates that other well-established dev teams have run the same tests and decided to move forward.

We'll work on the language there to clarify.


We're currently focusing on web apps.

There's nothing "specific" in the underlying model that prevents it from testing mobile. It's just a matter of focus at the current time.


Our landing page at checksum.ai has a video of a test in the hero section. We added some graphics (e.g. the green checkmark), but the steps executed are real tests that we generated.

But the tl;dr is: 1. We learn how to use your app based on real sessions (we remove sensitive information on the client side). 2. We train a model on this data. 3. We connect this model to a browser and generate Playwright or Cypress tests.

The end result is code written in Playwright or Cypress. You can edit and run the tests as you normally would.


I agree! My experience with test generation tools was also lukewarm, which is why we founded Checksum.

> How is the product different from the other test generation tools

We train our models on real user sessions. So our tests: 1. Are completely auto-generated. 2. Achieve high coverage of real user flows, including edge cases. 3. Are automatically maintained and executed with our models, so they are less flaky.

> How do you check if they are testing the intended behavior

Our models are trained on many real sessions, so they learn how your website (and websites in general) should behave. In that sense, a model is similar to a manual QA tester who can detect bugs. To cover functionality that is not obvious from the UI, we are now looking at adding LLMs to parse code, but most functionality can be inferred from the UI.


So you are saying that your system needs me to do all the testing (it is infeasible to watch our users, because we test the product before it is released to any users) so it can learn how to test?

How can it know, by watching my clicks, how I decide if the behavior is correct on the backend?

