I love this! I use coding agents to generate web-based slide decks where “master slides” are just components, and we already have rules + assets to enforce corporate identity. With content + prompts, it’s straightforward to generate a clean, predefined presentation.
What I’d really want on top is an “improv mode”: during the talk, I can branch off based on audience questions or small wording changes, and the system proposes (say) 3 candidate next slides in real time. I pick one, present it, then smoothly merge back into the main deck.
Example: if I mention a recent news article / study / paper, it automatically generates a slide that includes a screenshot + a QR code link to the source, then routes me back to the original storyline.
With realtime voice + realtime code generation, this could turn the boring old presenter view into something genuinely useful.
There was a pre-LLM version of this called "battledecks" or "PowerPoint Karaoke"[0], where a presenter is given a deck of slides they've never seen and has to present it. With a group of good public speakers it can be loads of fun (and it's really impressive how well some people can pull it off!)
There is a Jackbox game called "Talking Points" that's like this: the players come up with random ideas for presentations, and your "assistant" (one of the other players) picks what's on each slide while you present: https://www.youtube.com/watch?v=gKnprQpQONw
Caro’s first LBJ biography tells how the future president became a congressman in Texas in his 20s by carting around a “claque” of his friends to various stump speeches and having them ask him softball questions and applaud loudly afterward.
I guess you could have two people per presentation: one who confirms whether to drop in the generated slide or maybe regenerate it. And then of course, eventually that's just an agent.
You're describing almost verbatim what we're building at Octigen [1]! Happy to provide a demo and/or give you free access to our alpha version, which is already online.
I built something similar at a hackathon: a dynamic teleprompter that adjusts its scrolling speed based on the speaker's tonality and spoken WPM. I can see extending the same idea to an improv mode. This is a super cool idea.
The end result would be a normal PPT-style presentation: check https://sli.dev as an easy start, and ask Codex/Claude/... to generate the slides with that framework, using data from something.md.
The interesting part here is generating these otherwise boring slide decks not with PowerPoint itself but with AI coding agents plus master slides and AGENTS.md context.
I’ll be showing this to a small group (normally members only) at IPAI in Heilbronn, Germany on 03/03. If you’re in the area and would like to join, feel free to send me a message and I will squeeze you in.
In my AGENTS.md file I have a _rule_ that tells the model to use Apache ECharts; the data comes from the prompt and usually from .csv/.json files.
A prompt would be something like: "After slide 3, add a new content slide that shows a bar chart with data from @data/somefile.csv" ... it works great, and these charts can even be interactive.
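To give a sense of what the agent generates from such a prompt, here is a rough Python sketch of the mapping from CSV to the standard ECharts bar-chart option. The column names "label" and "value" are just assumptions, not from my actual setup:

    # sketch only - column names are placeholders, adjust to the real CSV headers
    import csv
    import json

    with open("data/somefile.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    option = {
        "xAxis": {"type": "category", "data": [r["label"] for r in rows]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [float(r["value"]) for r in rows]}],
    }

    # the agent embeds this option object in the slide component that initialises ECharts
    print(json.dumps(option, indent=2))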
You could try something like mermaid (or ASCII) -> nano banana. You can also go the other way and turn images into embedded diagrams (which can be interactive depending on how you're sharing the presentation)
Not my normal use case, but you can always fall back to asking the AI coding agent to generate the diagram as SVG. For blocky but more complex content like your examples it works well and is still 100% text-based, so the AI coding agent (or you, manually) can fix/adjust any issues.
An image-generation skill is a valid fallback, but in my opinion it's hard to change details (JSON-style image-creation prompts are possible but hard to get right), and you won't see changes nicely in the git history.
In your use case you can ask the AI coding agent to run a script.js that fetches the newest dates for the project from a page/API; it should then only update the dates in the roadmap.svg file on slide x with the new data.
This way you automagically have the newest numbers and can track everything in git. Save this as a rule in AGENTS.md and run it every month to update your slides with a single prompt.
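Roughly, such an update script could look like the Python sketch below (in practice it would be the script.js mentioned above; the API URL, the milestone names, and the id="date-..." markers in the SVG are all placeholders here):

    # sketch only - endpoint, JSON fields, and SVG id conventions are made up
    import json
    import re
    import urllib.request
    from pathlib import Path

    API_URL = "https://example.com/api/project-milestones"   # placeholder endpoint
    SVG_PATH = Path("slides/assets/roadmap.svg")             # placeholder path

    with urllib.request.urlopen(API_URL) as resp:
        milestones = json.load(resp)   # e.g. {"beta": "2026-04-01", "launch": "2026-06-15"}

    svg = SVG_PATH.read_text(encoding="utf-8")
    for name, date in milestones.items():
        # assumes each date sits in a <text id="date-<name>">...</text> element
        svg = re.sub(
            rf'(id="date-{name}"[^>]*>)[^<]*(</text>)',
            rf"\g<1>{date}\g<2>",
            svg,
        )
    SVG_PATH.write_text(svg, encoding="utf-8")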
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But the discussion shouldn't be anchored at those extremes. Instead, I am looking for a realistic estimate from the HN community of where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work f/time for money), only with a cost estimate based on how many hours I think it would take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
This will actually work well with my current workflow: dictation for prompts, parallel execution, and working on multiple bigger and smaller projects so the waiting time while Codex is coding is fully utilized, plus easy commits with auto-generated commit messages. Wow, thank you for this. Since skills are now first-class tools, I will give it a try and see what I can accomplish with them.
I know/hope some OpenAI people are lurking in the comments and perhaps they will implement this, or at least consider it: I would love to be able to use @ to add files via voice input as if I had typed it. So when I say "change the thingy at route slash to slash somewhere slash page dot tsx", I get the same prompt as if I had typed it on my keyboard, including the file-pill UI element in the input box. Same for slash commands. Voice is a great input modality; please make it a first-class input. You are 90% there, and this way I wouldn't need my dictation app (Handy, highly recommended) anymore.
Also, I often find myself using the built-in console to ls, cat, and rg out of old habit, and I would love to pin the console to a specific side of the screen instead of having it at the bottom. And please support terminal tabs, or I'll have to learn tmux.
So much this. I'm eagerly waiting to see what Anthropic and OpenAI do to make dictation-first interaction a first-class citizen instead of requiring a separate app like Super Whisper. It would dramatically improve complex, flow-breaking interactions like adding files, referencing users or commands, etc.
Importantly, I want full voice control over the app and its interactions, not just prompt dictation.
It's on Lovable so you can just fork it and take a look (the prompt is in supabase/functions/transform-render/index.ts):
Transform this idealized architectural rendering into the most brutally realistic, depressing photograph possible. This is the WORST CASE scenario - what the building will actually look like in reality:
- Set on a dreary, grey, overcast late November day with flat, lifeless lighting
- The sky is a uniform dirty grey, threatening rain
- All trees are completely bare - just skeletal branches against the grey sky
- The landscaping is dead, muddy, or non-existent. No lush gardens, just patchy brown grass and bare dirt
- Remove ALL people, the scene should feel empty and abandoned
- Any water features should look stagnant and grey
- Add realistic weathering, dirt streaks, and construction residue on the building
- The building materials should look how they actually appear, not the idealized clean version
- Include visible utility boxes, drainage grates, and other mundane infrastructure usually hidden in renders
- The overall mood should be bleak but realistic - this is what buyers will actually see on a random Tuesday in late autumn
- Maintain the exact building, angle, and composition, just strip away all the marketing polish
The goal is honest truth, not beauty. Show what the architect's client will actually see when they visit the site.
>> Remove ALL people, the scene should feel empty and abandoned
That really captures the vibe in Kendall square on the weekend, but for maximum "honest truth" there should be double-parking, delivery trucks and ubers stuck in traffic waiting on a thousand people to scurry across the street from the subway entrance, huddling against the cold. Some dirty snowbanks and grey slush puddles in the crosswalks would really nail it.
There they say: "Observations made by MTG-S1 will feed into data products that support national weather services …".
So I guess there will be no simple, publicly available REST API or the like... but if anybody finds anything, let us know here :)
For the datasets I tried to access (like the full-disc image in visible wavelengths, MTG 0 degree), it is sufficient to register at EUMETSAT to get a username and password. The eumdac Python tool is probably the easiest way to access the data:
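Going by the eumdac documentation, the Python side is roughly the sketch below; the collection ID and credentials are placeholders, so look up the actual MTG collection in the Data Store catalogue:

    # rough sketch based on the eumdac docs - collection ID and credentials are placeholders
    import datetime
    import shutil
    import eumdac

    # the consumer key/secret come from the EUMETSAT API portal after registering
    token = eumdac.AccessToken(("your-consumer-key", "your-consumer-secret"))
    datastore = eumdac.DataStore(token)

    # placeholder ID - check the Data Store catalogue for the real MTG FCI collection
    collection = datastore.get_collection("EO:EUM:DAT:0662")

    start = datetime.datetime(2026, 1, 1, 12, 0)
    end = datetime.datetime(2026, 1, 1, 13, 0)
    for product in collection.search(dtstart=start, dtend=end):
        with product.open() as fsrc, open(fsrc.name, "wb") as fdst:
            shutil.copyfileobj(fsrc, fdst)   # each product is a zip containing NetCDF files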
(If you do not want to use the Python API, the eumdac CLI's --debug option is quite useful for seeing exactly which requests are made. The output is either some JSON metadata or a large zip with the NetCDF data.)
Most weather data isn't generally available via easy-to-query REST APIs (at least not at the original sources). For one side project I wanted to use NOMADS data, and it was quite a grind downloading and processing the raw datasets into something usable at the application level (or viable to expose via an API).
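To give a flavour of the "processing" part: once you have pulled one of the raw GRIB2 files down, just reading it already needs something like the sketch below (xarray plus the cfgrib backend; the file name and variable selection are illustrative), and that's before any regridding, subsetting, or reshaping into something API-friendly.

    # sketch: inspect one already-downloaded GFS GRIB2 file
    import xarray as xr

    ds = xr.open_dataset(
        "gfs.t00z.pgrb2.0p25.f000",   # one forecast step pulled from NOMADS (illustrative name)
        engine="cfgrib",              # requires the cfgrib backend to be installed
        backend_kwargs={"filter_by_keys": {"typeOfLevel": "surface"}},
    )
    print(ds.data_vars)               # surface-level fields only; other levels need separate opens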
That’s why you have services/products whose sole purpose is taking all these region-specific data sources and processing them into a generic JSON API.
The government orgs probably do it intentionally so they don’t have ten million devices pinging their servers to update weather widgets.
The Latent Space podcast just released a relevant episode today in which they interviewed Kevin Weil and Victor Powell, both now at OpenAI, with some demos, background and context, and a Q&A.
The YouTube link is here: https://www.youtube.com/watch?v=W2cBTVr8nxU
Oh, I was here to post it, haha - thank you for doing that job for me so I'm not a total shill. I really enjoyed meeting them and was impressed by the sheer ambition of the AI for Science effort at OAI. In some sense I'm making a 10,000x smaller-scale bet than OAI on AI for Science "taking off" this year, with the upcoming dedicated Latent Space Science pod.
I generally think there's a lot of fertile ground for smart generalist engineers to make a ton of progress here this year, and it will probably be extremely rewarding, both financially and personally. So I broadly want to create a dedicated pod highlighting the opportunities for people who don't traditionally think of themselves as "in science" to cross over into "AI for hard STEM", because it turns out that 1) they need you, 2) you can fill in what you don't know, 3) it will be impactful/challenging/rewarding, and 4) we've exhausted the common-knowledge frontiers and benchmarks anyway, so the only* people left working on civilization-impacting/change-history-forever hard problems are basically at this frontier.
Wasn't aware you're so active on HN; sorry for stealing your karma.
Love the idea of a dedicated series/pod where normal people take on hard problems by leveraging the emergent capabilities of frontier AI systems.
The song is "Windowdipper", from ꪖꪶꪶ ꪮꪀ ꪗꪖꪶꪶ by Jib Kidder:
https://jibkidder.bandcamp.com/track/windowdipper