I was an SDE on the S3 Index team 10 years ago, but I doubt much of the core stack has changed.
S3 is composed primarily of layers of Java-based web services. The hot-path operations (object get / put / list) are all served by synchronous API servers - no queues or workers. It is the best example I’ve seen in my career of how many transactions per second a fairly standard Java web service stack can handle.
For a get call, you first hit a fleet of front-end HTTP API servers behind a set of load balancers. Partitioning is based on the key name prefixes, although I hear they’ve done work to decouple that recently. Your request is then sent to the Indexing fleet to find the mapping of your key name to an internal storage id. This is returned to the front end layer, which then calls the storage layer with the id to get the actual bits. It is a very straightforward multi-layer distributed system design for serving synchronous API responses at massive scale.
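The flow above can be sketched as a toy in a few lines. All the names and data here are invented for illustration; the real services are internal and vastly more complex:

```python
# Toy sketch of the layered GET path described above:
# front end -> index fleet -> storage fleet.
# All identifiers and data are hypothetical.

INDEX = {("my-bucket", "photos/cat.jpg"): "storage-node-7/blob-42"}
STORAGE = {"storage-node-7/blob-42": b"...the actual object bits..."}

def index_lookup(bucket: str, key: str) -> str:
    # Index fleet: maps the user-visible key name to an internal storage id.
    return INDEX[(bucket, key)]

def storage_fetch(storage_id: str) -> bytes:
    # Storage fleet: maps the storage id to the object bytes.
    return STORAGE[storage_id]

def handle_get(bucket: str, key: str) -> bytes:
    # Front-end API server: two synchronous calls, no queues or workers.
    storage_id = index_lookup(bucket, key)
    return storage_fetch(storage_id)
```

The point is just that each layer is a plain synchronous request/response hop; the front end orchestrates the index and storage calls and returns the result to the client.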
The only novel bit is that all the backend communication uses a home-grown, stripped-down HTTP variant, called STUMPY if I recall correctly. It was a dumb idea not to just use HTTP, but the service is ancient and was originally built back when principal engineers were allowed to YOLO their own frameworks and protocols, so now they are stuck with it. They might have done the massive lift to replace STUMPY with HTTP since my time.
Rest assured, STUMPY was replaced with another home-grown protocol! Though I think a stream-oriented protocol is a better match for large-scale services like S3 storage than a synchronous protocol like HTTP.
> Partitioning is based on the key name prefixes, although I hear they’ve done work to decouple that recently.
They may still use key names for partitioning. But they now hash the user key-name prefix on the back end to handle hotspots generated by similar keys.
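A minimal sketch of that idea (purely illustrative, not S3's actual scheme): instead of partitioning on the raw lexicographic prefix, hash the prefix so that distinct-but-similar prefixes spread across partitions rather than piling into one key range.

```python
# Illustrative only: hash-based partitioning of a key-name prefix.
# Constants and prefix length are invented, not S3's real parameters.
import hashlib

NUM_PARTITIONS = 8

def partition_for(key: str, prefix_len: int = 16) -> int:
    # With raw lexicographic partitioning, keys like timestamped names
    # cluster in one range and create a hotspot. Hashing the prefix
    # distributes distinct prefixes roughly uniformly across partitions.
    prefix = key[:prefix_len]
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```

The mapping is still deterministic (the same key always lands on the same partition, so lookups work), but lexicographic neighbors no longer share a partition by default.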
> The hot path (... list) are all served by synchronous API servers
Wait; how does that work, when a user is PUTting tons of objects concurrently into a bucket, and then LISTing the bucket during that? If the PUTs are all hitting different indexing-cluster nodes, then...?
(Or do you mean that there are queues/workers, but only outside the hot path; with hot-path requests emitting events that then get chewed through async to do things like cross-shard bucket metadata replication?)
LIST is dog slow, and everyone expects it to be. (My research group did a prototype of an ultra-high-speed S3-compatible system, and it really helps not needing to list things quickly.)
I also work at AWS on EC2 and agree with this assessment. Amazon is large and the culture varies across organizations. I’ve worked at multiple big tech companies and AWS EC2, while not perfect, is the best place I’ve been part of.
What causes have the other 19,920 COVID cases been traced to? I read the articles you linked and this is the only other data I could find:
“The results showed that more than a thousand COVID-19 cases had been linked to construction and nursing homes - while bars and restaurants had just 22 cases.”
1,000 cases from construction and nursing homes bring the number to 18,920 cases we need data on. This is the problem. There is so little high-confidence tracing data that we can’t make strong decisions on what quarantine actions are materially impactful. Who’s to say a substantial amount of that remaining number isn’t caused by bars and restaurants? If that data is out there, please share it. If not, please support more rigorous contact tracing.
Grocery stores probably closer to zero. Reminds me of stranger danger.
You most likely will be infected by a family member or office coworker.
In March, during the full lockdowns, something like 20% of the population was still meeting their friends and family behind closed doors like nothing was happening, and this is why the virus didn't die out in 2 weeks.
The reason China welded apartment doors shut was not to prevent people from getting out.
It was to block family and friends from visiting the building's inhabitants.
I happen to have both a Jarvis desk I bought 3 years ago, and just recently bought an UpLift v2 with the commercial crossbar package as an additional desk in my office. Both sit next to each other, making comparison easy.
I was content with my Jarvis until I received the Uplift. Stability comparisons are night and day: Uplift (with crossbar) has vastly superior stability at standing heights. The Jarvis has noticeable sway just using a keyboard and mouse, while the Uplift remains stable even when pounding on music gear. Both are stable at sitting heights. I found accessories (such as cable management) to be better with Uplift as well. It is enough of a difference that I am considering replacing the Jarvis with another Uplift desk in the future.
In Jarvis’s defense, I am comparing their 3-year-old product to their competitor’s product I bought 1 month ago, and the overall desk experience with Jarvis is still good enough that I’m in no rush to replace it. I will only be buying standing desks with crossbars from now on, though.
I went with the commercial C-frame with a 60” desk and I think the 4 legged version would be overkill. I have a 27” iMac sitting on a 4U rack monitor stand filled with devices and probably around 30 pounds of music equipment on top of the desk and it feels consistently stable with no strain when changing heights. Unless you are putting seriously heavy stuff on top I don’t think you need 4 legs.
No CPU holder, but I got the retractable keyboard tray. At the lowest height (under 24”), it collides with the crossbar when fully pushed back. This never impacts me because it is well below the lowest height setting I use for sitting and can’t imagine it being an issue for anyone else. If the CPU holder collides with the crossbar, though, it’d probably happen at an unacceptable height. I would check the dimensions of everything before purchase.
For my purposes, I have an additional desktop machine I simply keep on the floor by the desk.
I am a long-time user of Soundcloud who has recently stepped back from the service. As an EDM enthusiast, I find it the best place for discovering up-and-coming talent and great remixes that you can't find anywhere else. The killer feature is the personal stream. Once you've started following a large enough pool of artists and labels, there is no better mechanism for wading through the pool of mediocrity to search out those rare gems.
Unfortunately, Soundcloud's method for organizing your content really breaks down under heavy use. I have over 1000 songs liked, and scrolling through the like list, dynamically fetching ten songs at a time, is a huge pain. Playlist creation and maintenance is even more cumbersome and limited than Spotify's, which is saying something.
Worst of all, the usefulness of the stream, arguably Soundcloud's most unique and valuable listener-facing feature, really breaks down the longer you use the service. Most heavy users I know end up following a few hundred artists/labels/channels, accumulated over years of use, and the signal-to-noise ratio becomes unbearable. The sad thing is that the breakdown is purely a UX problem. The webapp is an infinite-scroll nightmare: it forces you to start at the top of the stream every time and fetch tracks 10 at a time. It doesn't clean up after itself, so after an hour of slowly chipping away at my feed, the browser gets so slow and unresponsive on my top-spec Macbook Pro that I often give up and don't bother trying to listen to new tracks posted to my feed more than 18 hours earlier. Of course, as I now check Soundcloud less and less frequently, that means I am missing a ton of content.
On top of that, due to Soundcloud's reputation as solely a promotional tool within the artist community, you end up wading through a ton of 1 - 2 minute previews and other low-quality throwaways, and songs that you like can disappear from the service at any time, without notice. You cannot build a stable music library on top of the service. In fact, I used to have a process where I'd look for new songs on Soundcloud, and if I found something good, look it up on Spotify and save it within my Spotify collection, because I knew it wouldn't just disappear on me after a few months. Once Spotify upped their discovery game and the available EDM content grew, I removed Soundcloud from the process entirely.
Soundcloud used to be the best game in town for music discovery, especially EDM, but they let that slip over the last few years of struggling to monetize, and Spotify's constant progression has now really chipped away at that advantage. I've signed up for this GO service, but I highly doubt I'll keep it past the 30-day free trial. $10 a month for offline download and removing ads? As far as I can tell, my stream is the same mess as it was before, and they don't even distinguish between GO-exclusive and free tracks. I don't see anywhere I can find GO-exclusive tracks within the app at all, actually.
New music discovery is hard and the big players (Spotify, Apple Music, Google Play) still haven't fully cracked it, at least for specific-niche enthusiasts like myself. When the Soundcloud stream is working well, it is the best interface I've used for finding new music I like, but the complete lack of focus and neglect on that front from this initiative means that I probably won't be coming back.
1) the bad UX when having a big library
2) the 3rd party developer experience.
Starting from 2), I'd say SoundCloud was fairly open in its early stage, and a lot more conservative later on. I'm not sure what direction they are going to take in the future. If I were running SoundCloud, I'd keep the API as a playground, not for production use, and I'd try to fix the major issues in my product instead of leaving room for someone else to crack that business.
Regarding 1) you are absolutely right. It boils down to 3 major areas of the product: classification, recommendation and search.
Big players are working on that, but they are all implementing some sort of socially driven collaborative filter (which is sad, if we think about how personal the listening experience is). Soundcloud should try a more hybrid approach, introducing more advanced machine-learning techniques to get rid of the 'cold start' effect and better target the 'long tail' of their 125m tracks.
Once they have the 'data' layer done, they should complement it with a better product UX, including curated playlists, radios, automatically generated personal playlists, and an overall better experience for the listener segment of their audience.
Great analysis of Soundcloud's problems. I love it for all the same reasons (something like 90% of the new music I've found over the past 4 years is via soundcloud), but any ability to filter/manage/sort/search your favorites or your stream is basically missing.
And, as you say, it's purely a UX problem.
Also the fact that they don't have limited permission tokens is insane. A lot of artists have a "free download for followers" policy, but the integrations that enforce that require 100% complete access to your account. As in, sign me up to follow anyone, post anything, etc. Just to verify that I'm following you.
I actually spent the last two years writing a music app that integrated with the Soundcloud API to provide the bulk of the music content. To call their iOS SDK "neglected" is an understatement, but what I found worse was the opaque failure modes and restricted access provided by the official REST API.
For example, Soundcloud provides artists with a switch to allow/forbid third-party clients using their API to access the audio streams of the songs they post. This restriction, naturally, doesn't affect the official Soundcloud apps. While it makes my app a second-class citizen for accessing Soundcloud content, I understand the motivation and reasoning for the feature.
However, how does the Soundcloud API surface these restrictions? Through one of the dozens of flags it attaches to the JSON response it returns when you query for a track's metadata? Ha, no, the various "track downloadable" and "streamable" flags are all set to true. Instead, it just returns a 404 when you try to fetch the data from the track URL...except when it started returning 401 instead (an admittedly more appropriate return code)...until it started returning 404 again.
So as a developer, my only avenue for not surfacing non-playable tracks within my client was to attempt to download each track, catch any 4xx responses, ASSUME the reason is due to permissions rather than any other potential causes for 4xx errors, and hide the content within the app.
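That workaround ends up looking something like this (a simplified Python sketch; the real app was an iOS client, and the URL and injected `fetch_status` callable are stand-ins for an actual HTTP request):

```python
# Simplified sketch of the probing workaround described above.
# `fetch_status` stands in for a real HTTP request to the track's
# stream URL; it returns the response status code.

def is_playable(track_url: str, fetch_status) -> bool:
    # The metadata flags can't be trusted, so the only available signal
    # is the response to an actual fetch attempt. Treat any 4xx as
    # "restricted or otherwise unplayable" and hide the track, even
    # though the 4xx might have some other cause entirely.
    status = fetch_status(track_url)
    return not (400 <= status < 500)

# Stubbed example: a restricted track whose stream endpoint returns 404.
restricted = is_playable("https://example.com/tracks/123/stream",
                         lambda url: 404)   # -> False
```

Note the conflation the API forces on you: a transient 4xx for any other reason gets silently treated as a permissions restriction.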
I can understand Soundcloud's lack of enthusiasm for providing decent third-party integrations, but if your external API is this much of a mess, I'd hate to see what you vend internally. The fact that their permission tokens provide zero granularity is not surprising to me at all.
That is tragic. I'm such a huge fan of SoundCloud - I even pay for a pro account, even though I don't post that much audio, just to give them $$ - but ... sigh. These are not good problems.
And here I thought their Roshi library looked pretty good. Maybe the talent is all on the backend there.
I also recently quit Amazon. I also worked within AWS. However, I absolutely loved it, and had the complete opposite experience to almost everything you are describing. My reasons for leaving had absolutely nothing to do with Amazon, and I would work there again.
My bosses were excellent and cared deeply about my personal and professional development. I never got the impression that I was viewed as a drone. I have nothing but respect for the members of upper management that I met, who came off as smart, driven, and truly passionate about their work.
I had worked at a few other companies before joining Amazon, and what I found most refreshing was that, even when I was an SDEI, my opinion about the direction of the team and the projects we were working on was sought and valued. I had never experienced that before at previous employers, where I was very much a "drone".
However, AWS does promote a blunt culture where direct feedback is encouraged. Having never been encouraged at previous employers to provide thoughts on high-level design and strategic roadmap decisions, the ideas I would present were often suboptimal, and a senior dev would be quick to point out the flaws in my approach. Let me be clear, however, that it was always the IDEA that was attacked and never ME, personally. I found this approach incredibly helpful in my journey to become a better software engineer. I got along incredibly well with my colleagues and at no point did I ever not feel like a respected and valued member of the team.
I am willing to concede that I was fortunate to have very good direct managers during my time at AWS, and while members of other teams around me also reported similar contentment when I talked to them, I did notice a team or two whose direct managers did not seem up to the task. I firmly believe your experience with a company is at least 80% your direct manager, and if I were reporting to one of those managers that I did not respect, I would probably be telling a different story.
This is all to say, I believe you when you say you had a terrible experience, but I wanted to balance your negative anecdote with my positive one.