
Amazon, Esri, Grab, Hyundai, Meta, Microsoft, Precisely, Tripadvisor and TomTom, along with dozens of other businesses, got together to offer OSM data in Parquet on S3 free of charge. You can query it surgically and run analytics on it using only MBs of bandwidth against what is a multi-TB dataset at this point. https://tech.marksblogg.com/overture-dec-2024-update.html

If you're using ArcGIS Pro, use this plugin: https://tech.marksblogg.com/overture-maps-esri-arcgis-pro.ht...


For real data you can use the Gaia ESA archive: https://gea.esac.esa.int/archive/

I went to study for an MSc in Space Science and Technology as a hobby a few years ago. In one course (2022) we had an assignment to find supernovae in recent Gaia data (Python code), then make sure the candidate was observable by the university's robotic telescope (and compatible with the local weather forecast). Next we requested the observation from the telescope and, if successful, received the pictures the next day. We had to analyse the results as well. It surprised me how much data is actually available in quite open formats from ESA missions.

Controlling a remote telescope a few thousand kilometres away was also a nice experience.


It’s true that ARM alone isn’t the reason for the M1’s performance, but it’s definitely a significant factor. x86 is old — modern x86 chips are still backwards-compatible with the original 8086 from 1978 — and it’s stuck with plenty of design decisions that might have been the correct choice at some point in the past 45 years but no longer are today. The M1, by contrast, only implements AArch64, a complete redesign of the ARM architecture from 2012, so it doesn’t have to deal with that legacy architectural baggage. (We’ve known x86 was the wrong design since the ’80s — hence why there are no Intel chips in smartphones — but it hasn’t been realistic for anybody except Apple to spend 10 years and billions of dollars making a high-performance non-x86 chip.)

Some examples:

- x86 guarantees strong memory ordering on multi-processor systems, which adds completely unnecessary overhead to every memory access. arm64 uses a weak memory model instead, providing atomic instructions with relaxed or acquire/release semantics (see https://youtu.be/KeLBd2EJLOU?t=28m19s for a more detailed discussion). This significantly improves performance across the board, but especially for reference-counting operations, which are extremely common and often a bottleneck in code written in ObjC/Swift (see the sketch after this list): https://twitter.com/Catfish_Man/status/1326238434235568128

> fun fact: retaining and releasing an NSObject takes ~30 nanoseconds on current gen Intel, and ~6.5 nanoseconds on an M1

- x86 instruction decode is pretty awful, a significant bottleneck, and not parallelizable due to the haphazardly-designed variable-length CISC instruction set. arm64’s instruction set is highly regular and easy to decode, so Apple can decode up to 8 instructions per clock (as opposed to 4 for x86 chips). Most sources agree this is why the M1 can have such a big out-of-order-execution window and achieve such high instruction-level parallelism compared to Intel/AMD.

- x86_64 has only 16 architectural registers, compared to 32 for arm64. This means the compiler has a much harder time generating efficient, parallelizable code and must resort to spilling registers much more often.
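To make the first point concrete, here is a rough Rust sketch of refcount-style operations with explicit orderings (illustrative only; the type and method names are made up, and this is not Apple's actual ObjC/Swift runtime):

  use std::sync::atomic::{AtomicUsize, Ordering};

  // Toy refcount. On x86 every atomic read-modify-write is effectively
  // sequentially consistent; on arm64 the relaxed/release orderings below
  // compile down to cheaper instructions.
  struct RefCount(AtomicUsize);

  impl RefCount {
      fn retain(&self) {
          // Incrementing needs no ordering guarantees at all.
          self.0.fetch_add(1, Ordering::Relaxed);
      }

      fn release(&self) -> bool {
          // The decrement that may free the object needs release semantics,
          // but nothing stronger. Returns true when the count hits zero.
          self.0.fetch_sub(1, Ordering::Release) == 1
      }
  }

  fn main() {
      let rc = RefCount(AtomicUsize::new(1));
      rc.retain();
      assert!(!rc.release());
      assert!(rc.release()); // the count hits zero here
  }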



Nothing strange, I have a small script to clean up PDFs in general (reducing their size as well), essentially:

  pdftops -paper A4 -expand -level3 file.pdf   # writes file.ps

  ps2pdf14 -dEmbedAllFonts=true          \
    -dUseFlateCompression=true           \
    -dOptimize=true                      \
    -dProcessColorModel=/DeviceRGB       \
    -r72                                 \
    -dDownsampleGrayImages=true          \
    -dGrayImageResolution=150            \
    -dAutoFilterGrayImages=false         \
    -dGrayImageDownsampleType=/Bicubic   \
    -dDownsampleMonoImages=true          \
    -dMonoImageResolution=150            \
    -dMonoImageDownsampleType=/Subsample \
    -dDownsampleColorImages=true         \
    -dColorImageResolution=150           \
    -dAutoFilterColorImages=false        \
    -dColorImageDownsampleType=/Bicubic  \
    -dPDFSETTINGS=/ebook                 \
    -dNOSAFER                            \
    -dALLOWPSTRANSPARENCY                \
    -dShowAnnots=false                   \
     file.ps file.pdf
That's it. Afterwards, if needed, we can add extra metadata. It's not specifically designed to remove any particular kind of tracking, but it's simple and useful enough in most cases.

If you do not want to mess with the Rust borrow checker, you do not really need a garbage collector: you can rely on Rust's reference counting. Use 1) Rust's reference-counted smart pointers[1] for shareable immutable references and 2) Rust's interior mutability[2] for non-shareable mutable references, checked at runtime instead of compile time. Effectively, you will be writing a kind of verbose Golang with Rust's expressiveness. A minimal sketch follows the links below.

[1] https://doc.rust-lang.org/book/ch15-04-rc.html

[2] https://doc.rust-lang.org/book/ch15-05-interior-mutability.h...
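Something like this (illustrative only; Rc<RefCell<T>> is the single-threaded flavour, Arc<Mutex<T>> is its thread-safe counterpart):

  use std::cell::RefCell;
  use std::rc::Rc;

  fn main() {
      // Rc gives shared ownership; RefCell moves the borrow check to runtime.
      let counter = Rc::new(RefCell::new(0));

      let also_counter = Rc::clone(&counter); // cheap refcount bump, not a deep copy
      *also_counter.borrow_mut() += 1;        // panics at runtime if already borrowed

      println!("count = {}", counter.borrow()); // prints "count = 1"
  }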


I find “4nn4’$ 4rch1v3 dot ORG” actually way better than pirate bay for pirating knowledge.

It’s amazing how many books copyright law prevents us from finding.

https://www.theatlantic.com/technology/archive/2012/03/the-m...


There are "dumb TVs" - https://www.tomsguide.com/how-to/how-to-buy-a-dumb-tv-and-wh...

I have the latest LG OLED and I would have happily paid more to get it without the smarts built in. I use it with an attached Apple TV, and the experience is sublime compared to the built-in crap.


A few years back, I was trying to find out how to reduce mistakes in the programs I write.

I got introduced to Lamport's TLA+ for creating formal specifications, thinking of program behaviors as state machines. TLA+ taught me about abstraction in a clear manner.

Then I also discovered the book series "software foundations", which uses the Coq proof assistant to build formally correct software. The exercises in this book are little games and I found them quite enjoyable to work through.

https://softwarefoundations.cis.upenn.edu/


> even if I do get frustrated with mind-numbing Async issues from time to time.

A noticeable portion of async issues comes from the fact that a lot of people use Tokio's multithreaded async runtime. Tokio allows you to mix async and native threads, which is both a virtuoso technical accomplishment and also a bit ridiculous.

If you use Tokio's single-threaded runtime, things get simpler.

The remaining async challenges are mostly the usual "Rust tax", turned up to 11. Rust wants you to be painfully aware that memory allocations are expensive and that sharing memory in a complex concurrent system is risky.

In sync Rust, the usual advice is "Don't get too tricky with lifetimes. Use `clone` when you need to."

In async Rust without native threads, the rules are something like:

1. Boxing your futures only costs you a heap allocation, and it vastly simplifies many things.

2. If you want a future to remain around while you do other stuff, have it take ownership of all its parameters.

Where people get in the most trouble is when they say, "I want to mix green threads and OS threads willy-nilly, and I want to go to heroic lengths to never call `malloc`." Rust makes that approach look far too tempting. And worse, it requires you to understand and decide whether you're taking that approach.

But if you remember "Box more futures, own more parameters, and consider using a single-threaded runtime" then async Rust offers some pretty unique features in exchange for a pretty manageable amount of pain.
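To make rules 1 and 2 concrete, here is a minimal sketch on a single-threaded runtime (assumes tokio as a dependency with its rt feature; the function is made up for illustration):

  use std::future::Future;
  use std::pin::Pin;

  // Rule 1: boxing erases the concrete future type for the cost of one heap allocation.
  // Rule 2: the future takes `name` by value, so it can outlive the caller's borrows.
  fn greet(name: String) -> Pin<Box<dyn Future<Output = String>>> {
      Box::pin(async move { format!("hello, {name}") })
  }

  fn main() {
      // Single-threaded runtime: no Send bounds to satisfy, no cross-thread sharing.
      let rt = tokio::runtime::Builder::new_current_thread()
          .build()
          .unwrap();

      let fut = greet("world".to_string());
      println!("{}", rt.block_on(fut));
  }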

Also, seriously, more people should consider Kotlin. It has many Rust-like features, but it has a GC. And you don't need to be constantly aware of the tradeoffs between allocation and sharing, if that's not a thing you actually care about.


What is the best (affordable!) CAD software for a hobbyist with no experience? I was recently looking into this and the leading software was like 6,000 USD a year per seat.

TIL mobile JavaScript console https://eruda.liriliri.io/

Every site behind Cloudflare responds with `ip=x.x.x.x` at /cdn-cgi/trace

https://troyhunt.com/cdn-cgi/trace
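For instance, a quick Rust sketch that pulls the ip= line out of that endpoint (assumes the reqwest crate with its blocking feature; any Cloudflare-fronted host would do):

  fn main() -> Result<(), Box<dyn std::error::Error>> {
      // The trace endpoint returns plain key=value lines; pick out ip=.
      let body = reqwest::blocking::get("https://troyhunt.com/cdn-cgi/trace")?.text()?;
      if let Some(ip) = body.lines().find_map(|l| l.strip_prefix("ip=")) {
          println!("{ip}");
      }
      Ok(())
  }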


I honestly didn't expect this from Reddit. It seems like investors are really tightening their grip and they are banning subreddits and long-time users who oppose these changes left and right.

I built a free API emulating the Reddit API[1]. It was returning the same data as the existing publicly accessible .json endpoints on reddit.com (for example https://www.reddit.com/r/Save3rdPartyApps.json). They not only blocked my requests, but also banned the subreddit I created and my 13-year-old personal Reddit account (permanently!).

1 - https://api.reddiw.com


The Bard model (Bison) is available without region lock as part of Google Cloud Platform. In addition to being able to call it via an API, they have a similar developer UI to the OpenAI playground to interactively experiment with it.

https://console.cloud.google.com/vertex-ai/generative/langua...


Highly recommend the CrimeFlare tool for determining the “real” IP of sites so you may bypass CloudFlare and connect to sites directly.

Tangential: you can fine-tune something like Flan-UL2 to do quote extraction using examples generated from ChatGPT. If you have a good enough GPU, it should help cut down costs significantly.

I've solved your data binding problem with: GraphQL (server) + urql (a JS GraphQL client, with the graphcache plugin) + GraphQL Code Generator (creates TS types for your API).

The data syncs perfectly and stays up to date. GraphQL subscriptions allow for real-time updates. Oh, and React Hook Form for forms, which feeds quite nicely into my GraphQL mutations. It's a real neat solution.

As for the server side, I've started using Strawberry for Python (I was using Graphene, but it stopped receiving updates) with Django.

It's a super solid stack, with typings flowing all the way from the server API through to my React components.


It's annoying for sure. I deal with abuse at a large scale.

I'd recommend:

- Rate-limit everything, absolutely everything. Set sane limits (a toy sketch of the idea follows this list).

- Rate-limit POST requests harder. Preferably dynamically based on geoip.

- Rate-limit login and comment POST requests even harder. Ban IPs that exceed the limit.

- Require TLS. Drop TLSv1.0 and TLSv1.1. Bots certainly break.

- Require SNI. Do not reply without SNI (nginx has the 444 return code for that). Ban IPs on first hit if they connect without it. There's no legitimate use, and you'll also disappear from places like Shodan.

- If you can, require HTTP/2.0. Bots break.

- Ban IPs listed on StopForumSpam, and ban destination e-mail addresses listed there. If possible, also contribute back to SFS and AbuseIPDB.

- Collect JA3 hashes, figure out malicious ones, ban IPs that use those hashes. This blocks a lot of shit trivially because targeting tools instead of behaviour is accurate.
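As a toy sketch of the "set sane limits" idea from the first point (illustrative only; in practice this lives in nginx, HAProxy, or your framework's middleware):

  use std::collections::HashMap;
  use std::net::IpAddr;
  use std::time::{Duration, Instant};

  // Fixed-window counter per source IP: allow at most `max_hits` per `window`.
  struct RateLimiter {
      window: Duration,
      max_hits: u32,
      hits: HashMap<IpAddr, (Instant, u32)>,
  }

  impl RateLimiter {
      fn allow(&mut self, ip: IpAddr) -> bool {
          let now = Instant::now();
          let entry = self.hits.entry(ip).or_insert((now, 0));
          if now.duration_since(entry.0) > self.window {
              *entry = (now, 0); // window expired, start counting again
          }
          entry.1 += 1;
          entry.1 <= self.max_hits
      }
  }

  fn main() {
      let mut limiter = RateLimiter {
          window: Duration::from_secs(60),
          max_hits: 100,
          hits: HashMap::new(),
      };
      let ip: IpAddr = "203.0.113.7".parse().unwrap();
      println!("allowed: {}", limiter.allow(ip));
  }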


To what extent do these concerns change or disappear if using alternatives, like Pulsar or Redpanda?

It's been a while since I dug into this, but a couple of jobs ago I was very concerned with what happens when replaying a topic from 0 for a new consumer: existing, up-to-date consumers are negatively impacted! As I recall this was due to the fundamental architecture around partitions, and a notable advantage of Pulsar was not having such issues. Is that correct? Is that still the case?


curl and jq are the good answer here.

More from the fringe, I'm quite enjoying zx, a Node.js-powered scripting environment. https://github.com/google/zx

Covered somewhat in ZX 3.0: https://news.ycombinator.com/item?id=28195580 (189 points, 8 months ago, 163 comments)


ISPs may also be lobbying us through social media. Peter Pomerantsev makes a strong case that moderated forums are for sale in This Is Not Propaganda [1]. For example, he mentions "consensus cracking",

> There are instructions on how to control an internet forum, including tips on “consensus cracking”: using a fake persona to express the ideas you oppose in such a weak and unconvincing manner that you can then use another fake persona to knock them down.

Here is Peter speaking [2] a week ago at a Stanford conference on disinformation where Obama gave the keynote [3].

[1] https://www.goodreads.com/book/show/41717504-this-is-not-pro...

[2] https://youtu.be/Nd1CKG3o818?t=23848

[3] https://youtu.be/YrMMiDXspYo?t=1855


Yeah, youtube-dl is missing workarounds for the throttling implemented by Google. youtube-dl is pretty much unmaintained compared to yt-dlp:

https://github.com/ytdl-org/youtube-dl/graphs/commit-activit...

https://github.com/yt-dlp/yt-dlp/graphs/commit-activity


> The scary part is that it's impossible to verify where most of the online content on both sides come from. The enemies of the West must be having a field day with how easy it is to insert radically opposite and polarising views into each side and then watching big issues become quashed, and little issues become magnified beyond proportion.

I have a theory that I cannot provide verifiable evidence for, but due to the technical fluency of the readers here I believe it may be interesting to some.

I run a small marketing service that ingests new content submitted to a number of social media sites (colloquially known as “social listening”). We run text analytics on the content, primarily to find marketing opportunities for customers. That system also has very rudimentary checks for “bot” accounts.

Starting in early 2020 there was a massive, massive spike in the number of bot accounts creating and responding to content on reddit. Our system doesn’t “cross-reference” flagged accounts very well, but I manually went through the post history on a few of those accounts and found that many of them had responded with congruent comments to submissions of other flagged accounts.

Furthermore, most of the flagged accounts had a similar pattern in the timing of their posts. Posts and comments were relatively irregular and sporadic near the start of the account's history, indicative of a real user. Then submissions completely stopped for a number of months. After the pause, the account would resume submissions and comments with far more regularity. The patterns exhibited by those accounts may indicate that they were taken over and sold in bulk account lists for use as bot accounts.

Every account that I checked was posting content with a clear narrative.

I believe these are very large bot networks upvoting and submitting content of a particular nature in order to sway popular discourse and give an appearance of a particular consensus among conversation participants.

The plausibility of my theory has been augmented by the fact that rudimentary software for creating reddit bot networks can be found for sale on various “botting” forums. Furthermore, I was accepted into the OpenAI GPT-3 beta a few months ago; the capabilities of that model have further convinced me of the validity of my theory.

If you have experience with bots, natural language processing, or another related field, please feel free to point out flaws in my theory!


I would also add that Constraint Programming has a small but very dedicated community, and the amount of innovation in the space is incredible given how niche it is. Some things I find amazing:

* Dozens of solvers with hundreds of search methodologies

* Intermediate-language specifications (FlatZinc) that allow solvers written in one language to interoperate with modelling tools written in completely different languages

* Entire catalogs of useful constraints https://sofdem.github.io/gccat/

* Solver Competitions https://www.minizinc.org/challenge.html

* Peer Reviewed journals https://www.springer.com/journal/10601


I’ve pointed out, before, that one advantage I have, is a huge portfolio (check my SO story[0]).

It’s tens of thousands of lines of code, in the form of multiple shipping products. All you need to do is clone a repo, hit “Archive,” and you have a built and App Store-ready app. Some of these apps are still on the App Store (most have been deprecated, over the years). I have full source for shipping apps (over 20), going back to 2012. I've been writing Swift -every single day-, since the day it was announced, in 2014, and have released quite a few apps, written entirely in native Swift. I'm working on a big one, right now.

Here’s an example of a repo for a currently shipping free app that is available as an iOS/iPadOS app, a Mac app, a Watch app, and a TV app. It’s a Bluetooth BLE explorer app (yes, you can sniff Bluetooth on an Apple Watch)[1], [2], [3], [4]. It uses this cross-platform Swift BLE SPM module[5].

All of the repos also include things like graphic asset originals (usually Adobe Illustrator). I’m a passably good designer. At one time, I considered becoming a professional artist.

All of my work is localizable and accessible. A number of my apps have been localized in multiple languages. These days, I also tend to do things like support Dark Mode.

I have a ton of SPM modules, tested, documented, tagged and available for immediate integration. I use most of them in my own work.

I have full source for a couple of server systems, that are in heavy use, today (I use them in my own work, and one is a worldwide standard, in use by thousands, daily).

I have dozens of blog posts, articles, tutorials, explorations and other online writings[6]. I go into great detail about how I design, test, architect, and think. Most of this stuff is extremely detailed, and comes with supporting playgrounds. I’m a fairly good writer. There’s a lot there, but it’s quite readable.

I have given instruction on technical stuff for years. The most recent one was a Zoom class on intro to Core Bluetooth, using Swift[7]. It was received well.

I don’t know if I have a single fork. I’m the original author of all of it. Since I have over a decade of commit history, across multiple public repository systems, that’s easy to prove. I also tend to have fairly informative (and frequent) checkins. It’s simple to see how I work. My GH Activity Graph is solid green[8].

My technical ability is not a matter for debate, it’s easy to see what I bring to the table (including limitations). I'm satisfied that there's lots of stuff I can do. I won't bother trying to claim abilities that I don't have.

Any interview should be only determining whether or not I’d get along, and whether or not I would be a good “personality fit.” Since I spent decades at my jobs, including at one of the most famous brands on Earth, that should also be easy to figure out. I could definitely see that some companies would not want me, but that should be simple to determine. I’m a completely open book. My LinkedIn profile is full of testimonials, by former managers, coworkers, employees, and open-source project partners.

It’s been my experience that all this has been completely ignored, in favor of ridiculous 50-line binary tree tests.

In one interview, I sent the recruiter links to several public repos of code for shipped applications, that pretty much exactly fit the requirements of the job they contacted me for. This was ignored. Instead, I was passed to an obviously bored tester, who gave me a binary tree test in a language not used by the open position, and I was dinged for not using a formulaic approach, unique to that language (which, did I mention?, was not the one used for the posted job). The repos that I had sent, were in the language that was specified in the opening.

After a few of these broken, insulting, awkward, hazing rituals, I simply gave up looking. It’s plain that no one wants me, and I won’t go where I’m not wanted. I'm fine, doing my own thing.

[0] https://stackoverflow.com/story/chrismarshall (SO Story)

[1] https://github.com/RiftValleySoftware/BlueVanClef (App Source)

[2] https://apps.apple.com/us/app/blue-van-clef-for-mobile/id151... (iOS/iPadOS App - Includes Watch App)

[3] https://apps.apple.com/us/app/blue-van-clef-for-tv/id1529181... (TV App)

[4] https://apps.apple.com/us/app/blue-van-clef/id1529005127?mt=... (Mac App)

[5] https://github.com/RiftValleySoftware/RVS_BlueThoth (BLE SPM Module)

[6] https://littlegreenviper.com/miscellany/ (Writing)

[7] https://github.com/ChrisMarshallNY/ITCB-master (Core Bluetooth Course)

[8] https://github.com/ChrisMarshallNY#github-stuff (GH ID)


Disclaimer: I have a degree in robotics and worked for what is now Lyft Level 5, dealing with autonomous driving.

What it does is basically solve a big optimization problem. You have 6 parameters for each picture (x, y, z, roll, pitch, yaw, although in practice there are better parametrizations based on quaternions), then three parameters for the position of each keypoint. Keypoints are simply the same physical spots matched across multiple images based on their visual appearance.

The last component is a metric that you try to optimize, which is mostly just reprojection error: given all your current estimates, where would the keypoints be projected in this synthetic image? Then you compare that with the pixel locations where they actually are and try to minimize the difference.
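A rough sketch of that metric (illustrative only; real pipelines use robust losses and minimize this jointly over camera poses and 3D points):

  // 2D pixel location of a keypoint.
  struct Point2 { x: f64, y: f64 }

  // Sum of squared pixel distances between where the current estimate says each
  // keypoint should land (projected) and where it was actually detected (observed).
  fn reprojection_error(projected: &[Point2], observed: &[Point2]) -> f64 {
      projected.iter().zip(observed)
          .map(|(p, o)| {
              let dx = p.x - o.x;
              let dy = p.y - o.y;
              dx * dx + dy * dy
          })
          .sum()
  }

  fn main() {
      let projected = [Point2 { x: 100.0, y: 200.0 }, Point2 { x: 50.0, y: 80.0 }];
      let observed  = [Point2 { x: 101.0, y: 198.0 }, Point2 { x: 50.5, y: 80.0 }];
      println!("error = {}", reprojection_error(&projected, &observed));
  }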

It is actually a very versatile and robust pipeline which gives you what you’ve seen on the images.

The last step is to produce the dense reconstruction, which is commonly done using the PatchMatch algorithm.

You can try it for yourself with the OpenMVG library. Very hackable and versatile.


Reminds me of `echo $(dig @ns1.google.com o-o.myaddr.l.google.com TXT +short | tr -d \")`. I have no idea where this DNS query came from, because searching all of Google turns up nothing but https://github.com/GoogleCloudPlatform/cloud-self-test-kit/b..., which is never referenced by anyone. I had to track it down myself for a bootstrap.sh, but I don't like using undocumented sources for critical infrastructure.

My use case was needing to set the result of `hostname -f` in /etc/hosts in an automated fashion if a VPS provider didn't already add a line for the public Internet address in that file. You need to do this so that sendmail doesn't fail on `apt install` when it attempts to read your FQDN. So I couldn't use the NGINX example posted elsewhere here.

It seems like https://checkip.amazonaws.com/ is much more "reliable" in that it is publicly documented at https://docs.aws.amazon.com/sdk-for-net/v3/developer-guide/s....

To anyone who needs to read this: please don't use "services" like icanhazip for your provisioning. Even my examples above are bad.

From my reading, it does strike me as weird that there is seemingly no POSIX-compliant way to get your public Internet address.

Edit: Oh goodness... even Amazon's documentation recommends using Google's undocumented DNS query.[1]

[1]: https://aws.amazon.com/premiumsupport/knowledge-center/route...


RapidAPI. You can consume multiple APIs from one account, and as a developer it's a good way to monetize an API if there isn't too much competition in your niche. Not sure what you mean by firewalling, but RapidAPI does authenticate their requests to your endpoint, so as a developer you can do access control this way.

SourceHut, maybe? https://sourcehut.org/
