I'm a big fan of lightweight, automated tests. Despite that, I still default to manual verification. Usually I do both.
Automated tests omit a certain type of feedback that I think remains important to the development loop. Automation doesn't care about a poor UX; it only verifies what you tell it to.
For instance, I regularly contribute to a CLI that's widely used at $WORK. I can easily write tests that verify the I/O of a command I'm working on and assert correctness. Yet when I actually use the command I'm changing, usually as part of verifying my work, I tend to discover usability issues the tests would happily ignore; fixing those issues makes the program more pleasant to use.
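A minimal sketch of what such an I/O test might look like, in Python, with an entirely hypothetical `greet` subcommand standing in for the real tool (the names and behavior here are illustrative assumptions, not the actual CLI):

```python
import sys


def greet(name: str) -> str:
    """Build the greeting the hypothetical `greet` command prints."""
    return f"Hello, {name}!"


def main(argv: list[str]) -> int:
    """Simulated CLI entry point: print a greeting for the first argument."""
    if not argv:
        print("usage: greet NAME", file=sys.stderr)
        return 2
    print(greet(argv[0]))
    return 0


def test_greet():
    # Asserts the command's I/O is correct -- but says nothing about
    # whether the command is pleasant to use (flag names, error text, etc.).
    assert greet("world") == "Hello, world!"
    assert main(["world"]) == 0
    assert main([]) == 2
```

A test like this passes as long as the output is correct; nothing in it would catch a confusing flag name or an unhelpful error message, which is exactly the feedback manual use provides.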
Also, there are certainly cases where automation isn't worth the cost, perhaps because the resulting tests are complex or brittle. I've often found UI tests to fall into this category (but maybe I'm doing them wrong).
For these reasons, I think manual testing is the right default. Automated tests should also exist, but manual tests should _always_ be part of the process.
Ai2 (https://allenai.org) is a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. We pursue foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.
My team is a group of software engineers redefining how researchers and engineers use state-of-the-art GPU clusters. We own and actively develop Beaker, a GPU-first job orchestration system used by Ai2 researchers to manage and execute frontier research workloads, such as large-scale, distributed pre-training and online reinforcement learning. We're also responsible for Ai2's on-premise GPU servers from the bare metal up, operating a high-performance storage cluster and designing and developing critical systems that teams across the institute rely on for pushing forward cutting-edge, open science.
We're looking for a Senior Software Engineer to join our team. You should be proficient in Go and Python and have prior experience operating and configuring Linux servers in a professional setting.
Ai2 (https://allenai.org) is a Seattle-based non-profit AI research institute founded in 2014 by the late Paul Allen. We pursue foundational AI research and innovation to deliver real-world impact through large-scale open models, data, robotics, conservation, and beyond.
My team (ReOps) maintains the software and servers that allow research teams at Ai2 to execute machine learning workloads on high-performance, state-of-the-art GPU clusters.
Most of our time is spent contributing to Beaker (https://blog.allenai.org/beaker-ed617d5f4593), a GPU-first job orchestration system that was authored at the institute. We also spend a fair amount of time configuring and operating the underlying on-premise GPU servers.
We're looking for a Senior Software Engineer to join our team. You should be proficient in Go and Python and have prior experience operating and configuring Linux servers in a professional setting.
This inspired us to do a little exploration. We used the top cited papers of a few authors to produce a list that might be interesting, and to do some additional analysis. Take a look: https://github.com/allenai/author-explorer
But falling behind is very different from "being done." I think the original tweet is very much an exaggeration, and I agree with the point made here.
Google is nowhere close to "being done." Sure, their answers aren't perfect. But they've managed to deploy them at scale. They're probably available globally. They're fast. And they probably see way more eyeballs than OpenAI's system.
It's going to take a long time for folks to deploy advanced techniques like this at the scale required for something like Google. And if anyone has the resources to do this, it's Google. So I suspect Google will just learn from these examples and integrate them into their existing offering, which will probably eclipse any chance at disruption -- both because of their existing market share and because of the computational firepower they have to make this happen.
I feel like the mass centralization of content is starting to unwind a bit. As things scale the generalized sources usually become less valuable to me. With more content comes more noise, and that noise is hard to sift through. And while Google isn't perfect, they're better at sifting through this noise than most sites are.
Take StackOverflow as an example. When it first emerged I found it really useful. Answers were generally high quality. There were valuable discussions about the merits of one approach versus another. Now it's a sea of duplicate questions, poor answers, and meandering discussions. I rarely visit it anymore, as it's seldom helpful. And I regularly have to correct information others glean from it, as it's often wrong or incomplete.
So I suppose this all goes to say that I'm optimistic that things are headed in the right direction. I imagine things will ebb and flow for some time. But I believe Google and other search engines will always have a role to play, as there will always be new, valuable things to discover.
The Allen Institute for Artificial Intelligence | Multiple Research & Software Engineering Roles | REMOTE | https://allenai.org
AI2 is a non-profit research institute founded in 2014 with the mission of conducting high-impact AI research and engineering in service of the common good.
Our headquarters are in Seattle, WA. Employees are free to work remotely or on-site.
We have multiple roles open for both Researchers and Engineers. You can find a full list of open positions here:
Which is why it's important for folks to start applying AI to more interesting (but harder, more nuanced) problems. Instead of making it easier for people to write emails, or targeting ads, it should be used to help doctors, surgeons and scientists.
The problem is that these problems are less profitable, and that the companies with enough compute to train these types of models are focused on capturing more eyeballs, not making the world a better place.
The problem is not that those problems are less profitable. The problem is a combination of
1. Those problems are much harder
2. The potential harm from getting them wrong is much larger
Yup, I definitely agree that they're harder (and noted this). But I'm not sure I agree with your second point. Or rather, I think there's some nuance to it.
Sure, using AI to treat people without a human in the loop would clearly do harm. But using AI as an assistant, to help a doctor make the right diagnosis, seems like it'd do the opposite. It'd help doctors serve a larger patient population, make fewer mistakes, and probably result in less harm in the long run.
Anyway, I think we can all agree that using AI for anything other than ad targeting is a net win.