
Good! If you are wondering what this looks like in practice, I booked 3 flights this year with Ryanair and EVERY single time my tickets (directly purchased from their site) were flagged as "made through a third-party travel agent".

The "verification" workflow is super obtrusive: either pay them to use facial recognition technology or do slower verification (which I assume would be too slow if you saw this last minute). If you missed the email, you'd end up having to pay 55 eur to fix the issue. I was able to complain to customer service but it was definitely incredibly user hostile, intrusive and just ridiculous given that I booked directly via their site.

> Dear AAA this booking, AABBCC, appears to have been made through a third-party travel agent who has no commercial relationship with Ryanair to sell our flights. Therefore, Ryanair has blocked this booking.

> As third-party travel agents often do not provide Ryanair with the correct passenger email address and payment details, we need to verify a passenger's identity before they can manage their booking and check-in online.

> Ryanair needs to carry out this verification process in order to ensure we can comply with safety and security requirements.

> Once a passenger on the bookings has completed Ryanair's verification process, we will provide full access to the booking, including to the ability to make changes to the booking, add additional services, and complete online check-in.

> Express Verification is available at a cost of EUR 0.59c per booking.

> This fee covers the cost of the verification. Ryanair does not benefit commercially from this. There is no charge for Standard Verification.

> Passengers who do not avail of online verification (Express Verification or Standard Verification) to verify their bookings can verify at the Ryanair ticket desk in the airport, however they will be charged an airport check-in fee of up to €/£55.


Ryanair.com tickets were flagged as such? I can't believe this, have used the website in the past two years 10 or so times, never had this.


Here's my setup:

- I use two large 64oz mason jars (best price via Walmart):

https://www.walmart.com/ip/Ball-Wide-Mouth-64oz-Half-Gallon-...

- In the past I used a metal filter from IKEA to filter the coffee by pouring into an empty mason jar:

https://www.ikea.com/us/en/p/oeverst-metal-coffee-filter-sta...

In the past year I switched to a mason jar pouring spout that has a built in filter that's at the right granularity to filter cold brew as you pour it into a glass:

https://www.amazon.com/Tea-Spot-stainless-Comfortable-Pitche...

That saves a bit of effort since the cold brew gets filtered when I pour it.

As far as how I prepare it:

I add a bit shy of 1/2 a cup of ground coffee per 64oz mason jar, mix vigorously and let it sit overnight (less than that and the flavors aren't quite yet there).

I don't proactively filter out the coffee grinds since the built in filter takes care of that. When one jar starts to run low, I start another one.


I use filters like these:

https://www.amazon.com/dp/B0CB628TNX/

Get cheap mason jars locally wherever (Walmart is usually cheap, some grocery stores may have cheap ones): grounds in filter, filter in jar, add water, jar goes in fridge. You can also use them to make cold (or hot) brewed iced tea from loose leaves. There are different sizes of those filters for different sizes of jars. I like the 64oz jars.

Pull the filter out when it’s done to avoid overbrewing/oversteeping. Keeps a few days in the fridge.

Recipes (grounds:water ratio, brew time) abound online. Pick one and try it. I like 10-12 hours with something like 80g of grounds, with this method, for a 64oz jar.

I should get a pour spout lid for the jar, that’d never occurred to me and they look nice.

One downside to filtering at the lid instead of in a basket-type filter is that you have to pour it in another container to stop the brew. Whether that matters depends on consumption patterns and personal preference.


> Pull the filter out when it’s done to avoid overbrewing/oversteeping

I’d never have thought about that! Thanks, appreciate it


Ended up doing a DIY homemade thing with a filter I had, will check the recommendations.

Many thanks! the coffee-water ratio was going to be my next question.


For fun, I ran this against node-glob ( https://github.com/isaacs/node-glob ).

Looks like it exhibits the slower behavior:

  n,elapsed
  1,0.07
  2,0.07
  3,0.07
  4,0.07
  5,0.16
  6,1.43
  7,19.90
  8,240.76
See this gist for the script https://gist.github.com/mixu/e4803da16e42439480eba2b29fa4448...
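
A quick way to read the table above is to look at how much each step multiplies the elapsed time. The sketch below only reuses the numbers from the table; what n controls and the unit of elapsed are not stated in the comment, so the names `TIMINGS` and `growth_ratios` are purely illustrative.

```python
# Timings copied from the table above: n -> elapsed (unit not stated there).
TIMINGS = {
    1: 0.07, 2: 0.07, 3: 0.07, 4: 0.07,
    5: 0.16, 6: 1.43, 7: 19.90, 8: 240.76,
}


def growth_ratios(timings):
    """Return {n: elapsed(n) / elapsed(n - 1)} for consecutive values of n."""
    return {
        n: timings[n] / timings[n - 1]
        for n in sorted(timings)
        if n - 1 in timings
    }


if __name__ == "__main__":
    for n, ratio in growth_ratios(TIMINGS).items():
        print(f"n={n}: x{ratio:.1f} over n={n - 1}")
```

From n=6 onward each increment multiplies the elapsed time by roughly 9 to 14, which is the signature of super-linear (here, roughly exponential) growth rather than a constant per-step cost.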


If you're willing to wait until Electron releases a Chrome 59-based build, I'll be updating https://github.com/mixu/electroshot which handles screenshots and print-to-PDF along with a bunch of other niceties.


Does it support paged media?


I wrote one of these for fun a while back using the following approach:

- Files are indexed by inode and device, files with the same inode + device are considered equal. (My main use case for this was to bundle up JS files.)

- Files are then indexed by size; only files with the same size are compared.

- During comparison, the files are read at block sizes increasing in powers of two, starting with 2k. The blocks are hashed and compared, and if they do not match the comparison is stopped early (often without having to read the full file). If all the hashes are equal, then the files are considered to be equal.

- Hashes are only computed when needed and cached in memory. Since the hash block size increases in powers of two, only a few dozen hashes are needed even for large files (reducing memory usage compared to a fixed hash block size).

link: https://github.com/mixu/file-dedupe
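
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from file-dedupe: the names `LazyHashes`, `same_content` and `find_duplicates` are hypothetical, SHA-1 is an arbitrary choice of hash, and the pairwise clustering inside a size bucket is a simplification.

```python
import hashlib
import os
from collections import defaultdict

FIRST_BLOCK = 2048  # block sizes grow in powers of two, "starting with 2k"


class LazyHashes:
    """Block hashes of one file, computed only when needed and cached."""

    def __init__(self, path):
        self.path = path
        self.hashes = []   # digest of block i, for the blocks read so far
        self.offset = 0    # byte offset of the next unread block
        self.done = False  # True once the end of the file was reached

    def get(self, i):
        """Return the digest of block i, or None if the file ends before it."""
        while len(self.hashes) <= i and not self.done:
            size = FIRST_BLOCK << len(self.hashes)  # 2k, 4k, 8k, ...
            with open(self.path, "rb") as f:
                f.seek(self.offset)
                block = f.read(size)
            if block:
                self.hashes.append(hashlib.sha1(block).digest())
                self.offset += len(block)
            else:
                self.done = True
        return self.hashes[i] if i < len(self.hashes) else None


def same_content(a, b):
    """Compare two LazyHashes block by block, stopping at the first mismatch."""
    i = 0
    while True:
        ha, hb = a.get(i), b.get(i)
        if ha != hb:
            return False  # early exit, often without reading the whole file
        if ha is None:
            return True   # both files ended and every block hash matched
        i += 1


def find_duplicates(paths):
    """Return lists of paths whose contents are considered equal."""
    # Step 1: same device + inode means the same underlying file.
    by_identity = defaultdict(list)
    for p in paths:
        st = os.stat(p)
        by_identity[(st.st_dev, st.st_ino)].append(p)

    # Step 2: only files with the same size are compared.
    by_size = defaultdict(list)
    for names in by_identity.values():
        by_size[os.path.getsize(names[0])].append(names)

    # Step 3: within a size bucket, merge entries whose block hashes all match.
    groups = []
    for bucket in by_size.values():
        clusters = []  # list of (LazyHashes, [paths])
        for names in bucket:
            candidate = LazyHashes(names[0])
            for representative, members in clusters:
                if same_content(representative, candidate):
                    members.extend(names)
                    break
            else:
                clusters.append((candidate, list(names)))
        groups.extend(members for _, members in clusters if len(members) > 1)
    return groups
```

Because the block size doubles each time, a file of N bytes needs only about log2(N / 2048) cached digests, which is where the "few dozen hashes even for large files" figure comes from.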


I wrote mine along similar lines, except without using hashing at all. Files of identical size are compared byte-by-byte instead, until first difference or end of file. As many files as possible at a time, of course, to avoid having to read through files multiple times. This avoids any uncertainty about hash collisions.

To find out how many files of each size you have:

find ~ -type f -printf '%s\n' | sort | uniq -c | sort -n
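
A minimal sketch of the hash-free variant described above, assuming the input is a list of files already known to have the same size. The function name `split_by_content` and the 64 KiB chunk size are illustrative choices, not taken from the commenter's tool. All files in a group are read in lockstep, so each file is read at most once.

```python
def split_by_content(paths, chunk_size=65536):
    """Partition same-size files into groups whose bytes are identical.

    Files are read one chunk at a time, all together. After every chunk a
    group is split by the bytes just read, and any file left alone in its
    group is dropped, so reading stops at the first difference.
    """
    files = [open(p, "rb") for p in paths]
    try:
        groups = [list(range(len(paths)))]  # indices into paths / files
        finished = []
        while groups:
            next_groups = []
            for group in groups:
                # dict keys are compared by full byte equality, so two
                # different chunks can never end up in the same bucket.
                buckets = {}
                for i in group:
                    buckets.setdefault(files[i].read(chunk_size), []).append(i)
                for chunk, members in buckets.items():
                    if len(members) < 2:
                        continue                  # unique so far: no duplicate
                    if chunk == b"":
                        finished.append(members)  # reached end of file together
                    else:
                        next_groups.append(members)
            groups = next_groups
        return [[paths[i] for i in members] for members in finished]
    finally:
        for f in files:
            f.close()
```

One practical limit of this approach is the number of simultaneously open file descriptors, which is why very large size groups may need to be processed in batches.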


A hash is often used for an online algorithm. If you know the hash you know there is a potential for dedupe, and you can do a byte for byte comparison. I suppose you could use size as a prefilter for dedupe. This is if you do dedupe on a file level. Dedupe on block level doesn't care about the content, only the blocks, and it's not unusual to see for instance mp3 files with the same mp3 stream but different metadata. You cannot do the latter without hashing.


The "compare by front first" method sounds quite like a prefix tree.


If you need static export, I wrote a project a while back that converts markdown to static output with Ghost theme support: https://github.com/mixu/ghost-render


Any chance you could be more specific as to what you feel is missing in the book?

Granted, "distributed systems" is an enormous topic that no book can cover fully, but I have tried to cover things like:

- key papers (Lamport; Fischer, Lynch and Paterson; Chandra and Toueg etc.)

- topics relevant to highly successful commercial systems (e.g. 2PC => *SQL systems, Paxos => GFS/Chubby, ZAB => Zookeeper, Dynamo => Riak/Voldemort/Cassandra)

- and recent topics such as CRDTs and the CALM theorem.

Having a sense of how time, consistency and fault tolerance have been explained and handled is (I think) a prerequisite to more advanced topics, but I'd be interested in hearing what parts you'd feel need improvement because some day (~ some years from now) - I will revise the book and it would be nice to have a solid list of issues to revise.


I'm not sure exactly what the grandparent comment meant, but I think I have an idea. I only skimmed the contents so take this with a grain of salt.

Your book is focusing on a pretty narrow part of distributed computing. I would rename it "Managing State in Distributed Systems", or "Distributed Storage Systems". Your examples are Bigtable and Dynamo, which fall in this category.

The book seems to be aimed at sort of a "beginning" audience. But the topics are inappropriate for a beginning audience, and skewed for an expert audience.

Real distributed systems try to be stateless wherever possible. You need "big computer science" to manage state in distributed systems, but most code in a distributed system should not manage state. These techniques should be confined to specialized storage systems.

Here are some examples of real world distributed systems that don't use the described techniques to manage state:

  - clusters of stateless web servers + single master database (99%+ of websites people use)
  - message queue / work queue.  A single machine can productively manage 1,000 - 10,000 stateless workers, depending on the workload.
  - MapReduce
  - Original GFS
  - Napster
  - BitTorrent (tracker and trackerless would be interesting to write about)
  - BitCoin
The title seems to imply a practical bent, but it seems more like a collection of ideas (which are important and interesting, but not really what engineers need to know. IMO the #1 skill for distributed computing is to be competent at BOTH programming a single computer and at system administration).

If I wanted to be harsh, I would say it looks like you read a bunch of stuff and didn't work with it or implement it? At the very least, the ideas don't seem to be put in the context of commonly deployed distributed systems.

People need to understand these simpler, more robust, and more performant techniques, and how to apply them to their specific problem domain, rather than blindly throwing consensus at every problem (which is a disturbing trend I've seen).


It goes even beyond that. A lot of other very important, fundamental topics belong under the umbrella of distributed systems, starting with routing. The Internet is, after all, a giant distributed routing system.

Another topic that's huge all by itself is peer-to-peer networks, and all their associated aspects, such as structured (DHTs like Chord, Cassandra, etc.) vs unstructured (Gnutella, Kazaa, etc.), P2P search, handling churn, handling peers with heterogenous capabilities, peer selection, topology organization, decentralized routing, file-sharing (torrents) vs streaming (PPLive, Spotify), etc.

Other topics (with several overlapping aspects) include:

- Security, such as Sybil attacks, group key management, etc;

- Overlay networks;

- CDNs;

- Ad hoc and mesh networks;

- MMOs and multiplayer games;

- SCADA and industrial control systems;

- Pub/Sub systems and application layer multicast;

- Distributed file systems;

- Load balancing and bandwidth management;

And that's just off the top of my head... I'm sure I'm missing other important topics.


> It goes even beyond that. A lot of other very important, fundamental topics belong under the umbrella of distributed systems, starting with routing. The Internet is, after all, a giant distributed routing system.

Yes! DNS is also fascinating as far as distributed databases and consistency go:

http://pages.cs.wisc.edu/~akella/CS740/S08/740-Papers/MD88.p...


Indeed, but ultimately covering all of those topics would require an incredible amount of time and effort. So I need to pick and choose my battles as some topics are more important or interesting to me than others. :)


Completely understand. But as chubot suggests, the topic of "Distributed Systems" is really broad and something narrower in the title, such as "Distributed Data Systems" may be more apt.


> The title seems to imply a practical bent, but it seems more like a collection of ideas (which are important and interesting, but not really what engineers need to know. IMO the #1 skill for distributed computing is to be competent at BOTH programming a single computer and at system administration).

I think this is something where different authors will emphasize different aspects. My view is that understanding of how to deal with the evolution of state within a system is crucial. Even systems that are not databases per se still have a dependency on how state is managed because you want to be able to reason about how some specific answer to a computation was derived and what guarantees it comes with (from strong consistency to some alternative but hopefully precise definition). I figure there will be disagreement on whether this is important, and that's fine. There are other books.

That does bring up an interesting question: which books on distributed systems do you feel exhibit your preferred approach (free or paid)?

Re: the suggested topics:

Clusters of stateless web servers + single master. This is definitely a common setup, but you need very little if any distributed systems research to implement it.

Queues: I find the larger scale implications of queuing to be rather interesting (specifically, how cascading failures can be caused by an inadequate understanding of interactions between queues) but haven't found a good discussion beyond Google's findings that doing duplicate work often pays off as reduced 95th percentile latency.

MapReduce: There are many good books covering this topic in much more depth and specificity, so I didn't feel like I had that much to add. MR does use the techniques described - beyond job assignment the whole system rests on the DFS which uses block-level replication and some coordination protocol to maintain metadata state.

I kind of assume people have had some exposure to the paradigm at this point and do address MapReduce a bit in the context of the CALM theorem, which notes that a much larger set of relational algebra operations can actually be executed safely without coordination. Another point might be that MapReduce is inefficient in that it provides too much fault tolerance for typical workloads and cluster sizes.

Original GFS: the design has been largely superseded both by newer versions of HDFS (e.g. eliminating the single point of failure in the initial design) and Google's (unpublished?) internal equivalents. BTW, the original GFS relies on Chubby, which uses Paxos internally.

Napster, BitTorrent and BitCoin: peer-to-peer systems definitely deserve a more extensive treatment in a later version of the book. The issues here are different in that trust, efficiency and resiliency are more important and I didn't have the bandwidth to handle them in the book as it stands.

Thanks for your comment, and I hope this doesn't sound like a rebuttal - I just wanted to think through the topics you mentioned one by one.


> Clusters of stateless web servers + single master. This is definitely a common setup, but you need very little if any distributed systems research to implement it.

First, define "stateless"? I would not characterize such a system as stateless at all. Even if you're not using sticky sessions (with cache servers/load balancers talking to each other for failover using fairly involved protocols), there's still state that's ephemeral (sessions) in your application server, as well as the bulk of the persistent state that's provided with essentially "faith based consistency" (consider a typical memcached cluster with client doing consistent hashing -- asynchronously replicated MySQL with failover, etc... -- in case of a failure, neither availability nor consistency are guaranteed).

On the level of protocols design, the whole idea of stateless protocols (REST) vs. stateful ones (sticky load balancers + SOAP, CORBA, RMI, etc...) is by itself a big distributed systems topic.

A web browser talking to a web server is by definition a distributed system. I am typing this as fast as I can, praying not to get the "your link is invalid" error from HN right now -- this is a real example of a distributed system and cache coherence/consistency/atomicity. Here's one paper that deals with just these sorts of questions (in the context of NFS): ftp://ftp.cs.berkeley.edu/ucb/sprite/papers/state.ps

> Re: Chubby and GFS

Original GFS doesn't rely on Chubby; BigTable does, however (for metadata). I believe newer versions (Colossus) by extension rely on Chubby as they rely on BigTable.

F1/Spanner, however, use consensus and transactions far more than others and are very interesting in this sense.

[Edit: more elaboration on distributed systems issues in a "stateless" cluster of web servers].


Couple of suggestions:

1) I'd avoid mentioning CAP and the FLP impossibility result until much further in; this is like starting a discussion of mathematical logic or Computer Science with Godel's Incompleteness Theorem.

I'd definitely cover vector clocks and causality before: you really need this background in order to understand CAP/FLP. You may want to take a look at the approach Vijay Garg takes in his book:

http://www.amazon.com/Elements-Distributed-Computing-Vijay-G...

(E.g., use of vector clocks for proofs)
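
For readers who have not met vector clocks yet, here is a minimal sketch of the idea, assuming clocks are plain dicts mapping a node name to a counter. The function names are illustrative and not taken from Garg's book or from any library.

```python
def tick(clock, node):
    """Return a copy of clock with node's counter incremented (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum of two clocks, used when a message is received."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def leq(a, b):
    """True if every counter in a is <= the matching counter in b."""
    return all(count <= b.get(n, 0) for n, count in a.items())


def happened_before(a, b):
    """a causally precedes b: a <= b element-wise and the clocks differ."""
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    """Neither event causally precedes the other."""
    return not leq(a, b) and not leq(b, a)


if __name__ == "__main__":
    sent = tick({}, "A")                       # A does some work and sends
    local = tick({}, "B")                      # B works independently
    received = tick(merge(local, sent), "B")   # B receives A's message
    print(concurrent(sent, local))             # the two first events are unordered
    print(happened_before(sent, received))     # the send precedes the receive
```

The useful property is that the comparison is only a partial order: two events with incomparable clocks are concurrent, which is the kind of background that makes results like CAP and FLP easier to follow.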

2) Don't over-focus on "webscale" (using the term only semi-ironically here) projects (I say this having contributed to multiple such projects). NFS, AFS, and CodaFS all raised very interesting questions about distributed state and disconnected operations. You may also note how some of the authors on the Plan9 (9fs, 9p, fossil) papers overlap with authors of some of the Google papers. "Mobile sensor networks" have been essentially an excuse to build distributed systems. Chord, Pastry, and Kademlia (the system that BitTorrent uses) are actually far more truly "distributed" than many webscale systems (consistent hashing is a very simple DHT). Finally, don't forget that the Internet itself is a distributed system: the most successful eventually consistent, highly available distributed database is DNS.

I'd actually cover the web-giant infrastructure paper stuff a bit later on. While the Dynamo paper did have some novel contributions (regarding Gossip and anti-entropy), it's not what it's notable for: what's far more interesting in that paper is the way these concepts fit together.

I'd even hold my nose and cover CORBA: it's an interesting example in that it handled everything by the textbook, but failed due to complexity costs it had to incur in order to do that. Far simpler RPC protocols and Web Services prevailed at the end by promising less. JINI may also be interesting to cover -- excellent system, far too ahead of its time -- Java VM and ecosystem were immature (anyone else remember Blackdown JDK on Linux?), marketed incorrectly (driver architecture?) to a market that wasn't ready for it (this is when dot-com startups could suffice on buying a couple of E10Ks or high-end Alphas).
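
To make the aside in point 2 about consistent hashing being a very simple DHT concrete, here is a minimal hash ring sketch. The class name `HashRing`, the use of MD5 to place points and the 100 virtual points per node are all illustrative assumptions, not a description of any particular system mentioned above.

```python
import bisect
import hashlib


class HashRing:
    """Keys and nodes share one hash space; a key belongs to the next node point."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the load
        self._points = []         # sorted positions on the ring
        self._owner = {}          # position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        # First 8 bytes of MD5 as an integer; any well-mixed hash would do.
        return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:
                continue  # extremely unlikely collision: keep the first owner
            self._owner[pos] = node
            bisect.insort(self._points, pos)

    def remove(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._points.remove(pos)

    def lookup(self, key):
        """Return the node owning key: the first point clockwise from its hash."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

The property that matters is that removing a node only reassigns the keys that node owned; every other key keeps its owner, which is what lets clients route requests without any coordination between them.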

3) This is an enormous and highly ambitious project and I commend you for embarking on it (I just re-read http://paulgraham.com/hs.html -- see the part about hard problems and ambition). Nonetheless, if you're serious about this, be sure to get plenty of editors -- there's a lot of room to make subtle mistakes (which I've done all the time). A lot of this can be genuinely confusing -- e.g., partitions aren't just due to networks, but at the same time not every type of a failure or a stop is a partition (I used to cite GC pauses as an example of a partition, but now I'm not so sure).

I'd help, but I've neither the cycles nor the habit of academic rigor; if you're still in touch with any profs from your uni, they could either help themselves or point you to others.


Thanks for your kind words. There are definitely some cases where the topics could be reordered for greater clarity and I'll revisit them in the next iteration of the book based on the things people have pointed out (again, after a hiatus).

One of the challenges is to find the right balance between rigorous exposition on the one hand and keeping a brisk pace on the other (after all, writing for the web is different from writing a textbook). So I am grateful for all the input I've received thus far but it has definitely been challenging to find editors and reviewers - in particular because this is an unpaid effort on my part.


Here's one way:

1) Join the HTML5 trial http://www.youtube.com/html5

2) Open Chrome inspector, select the <video> element

3) In the Chrome console, type $0.playbackRate = 2.0 ($0 refers to the element you've selected in the inspector)

This also works for other sites that use the HTML5 video element, and allows you to go faster than 2x if you prefer.

YouTube's HTML5 interface also has a menu option (gear icon) which has a speed control, but the Chrome inspector trick works with any HTML5 site.


Quite handy - thanks for the tip!


If you find the book helpful - great! I'm glad. I can definitely see a lot of faults in my own work but I don't have infinite time to fix them, which is why the book is on Github ( https://github.com/mixu/singlepageappbook ).

There are things that I would definitely discuss differently if I wrote the book today - since things have developed quite a bit in the past year and a half. I'll probably start working on a second edition late this year to rewrite the parts that I find most annoying in my book. I need to finish and ship a bunch of open source stuff before that though.


Excellent work. This is why I love our internet age. A single person can have a huge impact on many people, and he/she doesn't need to have massive media behind them.


Any chance you could at least outline what you think the biggest changes in the past year and a half have been, for those of us who read the book? What are the things you would have written about differently?


I haven't yet fully formulated what I want to say so this is still just speculation on what the 2nd edition might contain. I can see a lot of room for improvement, especially in the latter chapters. Overall, the way I would frame the problem is much crisper now, and I think I have figured out how to better explain and implement my controller-averse approach, as well as server/client hybrid rendering and routing.

The community is more sophisticated about packaging these days so I would spend more time going beyond the basics and it's obvious what the most popular frameworks are (from ~8 contenders to ~3) so I should bring those in concretely to illustrate points. When I wrote the book, many of the frameworks were pre-1.0 and hadn't figured things out.

I also wrote another (to be released) book just on distributed systems which has helped with my thinking re: backend integration/caching/offline though I'm not sure how that will be directly reflected in this book.

Finally, I'd like to audit my use of indefinite articles, particularly within chapter headings.


+1. I'd be so interested in hearing your answer as well.


Have you considered making it in ePub format? It would be really great!


It's actually available in mobi (Kindle) and epub on the front page, see the side bar at http://singlepageappbook.com/ or direct links: http://singlepageappbook.com/mixu-single-page-apps.mobi and http://singlepageappbook.com/mixu-single-page-apps.epub


(author here) - if you're at YC's startup school today and have questions, come talk to me. I've got a yellow tshirt and look like my Github pic.

