Depending on their needs and options, people use Spark, Hive, Kafka, or shell scripts to ingest data into Hadoop.
If you don't like the NiFi UI but still want to make use of its advantages, you could try https://kylo.io/ , which runs on top of NiFi and has a much simpler UI.
That's another reason for putting your own router behind your ISP's box: as long as I'm not an admin on that box, they can do a lot of shady things. Also, use a DNS server with external forwarders (PiHole is great for that).
The problem is that attack vectors are legion. If they don't get you via DNS, they'll get you through one of the thousands of other attack vectors.
As an aside, I once found out you can replace 'sort | uniq' entirely with an obscure awk command, so long as you don't require the output to be sorted. IIRC it performs about twice as fast.
The awk command prints the first occurrence of each line, in the order they are found in the file. I can imagine that sometimes that might be even better than sorted order.
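The comment doesn't name the command, but presumably it's the well-known `!seen[$0]++` idiom; a quick sketch:

```shell
# Print each line only the first time it is seen, preserving input
# order. `seen` is an associative array; the pattern is true (and the
# default action, print, fires) only when the count was previously 0.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# b
# a
# c

# The classic pipeline, for comparison, returns sorted output:
printf 'b\na\nb\nc\na\n' | sort | uniq
# a
# b
# c
```

Because awk streams the file once and never sorts, it avoids the O(n log n) sort step, which matches the roughly 2x speedup mentioned above.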
If you're already in Windows land, you should consider leveraging PowerShell instead of bash. Pretty much all the same tooling is there, only with more descriptive names, tab completion on everything, typed objects passed through the pipeline instead of text parsing, etc.
ricardo.ch (similar to craigslist) is doing the same in Switzerland. I think this gives users the impression that they have to be responsible for their actions on the website.
It looks like a faster version of HDFS since it's written in C++ (vs Java).
Another important aspect is that it's using SSD + SATA (I suppose), which could be a better option than standard SATA/SSD alone or an LVM cache using SATA + SSD.
Even if it's just a new thing, if it proves to be faster it may be adopted in the Hadoop ecosystem in the future. HDFS has a lot of features, being a mature piece of software, but it falls short on response time.
During non-GC periods, probably true. But having a realtime filesystem service that is prone to stop-the-world GC pauses is a showstopper for many applications.
Also, a C++ implementation is likelier to use far less memory than a Java implementation, assuming the skills of both programmers are roughly equal.
The underlying local filesystem on each node is not truly realtime, so a "realtime distributed file system" is already quite a stretch. Also, the JVM is perfectly fine with worst-case pause times below a few tens of ms (with a properly tuned G1 or CMS GC), which is lower than the worst-case latency induced by network + I/O.
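As an illustration of what "properly tuned" might look like, here is a low-pause G1 invocation; the jar name and the exact values are my own, not from the system under discussion:

```shell
# Illustrative G1 tuning targeting low pause times (hypothetical values):
java -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=20 \
     -XX:+AlwaysPreTouch \
     -Xms8g -Xmx8g \
     -jar fileserver.jar
```

`MaxGCPauseMillis` is a soft target, not a guarantee, and pre-touching plus a fixed heap size (`-Xms` = `-Xmx`) avoids resize and page-fault stalls at runtime.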
As for using less memory: you don't allocate buffers for file data on the JVM heap. You allocate them in native memory, exactly as you would in C++. It is therefore possible to build a JVM-based file system that handles petabytes of data with as little as 100 MB of heap, used mostly for small temporary objects.
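A minimal sketch of that approach using NIO direct buffers (the buffer size here is arbitrary):

```java
import java.nio.ByteBuffer;

public class OffHeapBuffer {
    public static void main(String[] args) {
        // allocateDirect reserves native memory outside the GC-managed
        // heap; only the small ByteBuffer wrapper object lives on the
        // heap, so gigabytes of file data add almost nothing to GC work.
        ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        block.putLong(0, 0xCAFEBABEL);   // write "file data" off-heap
        System.out.println(block.isDirect());                 // true
        System.out.println(block.getLong(0) == 0xCAFEBABEL);  // true
    }
}
```

This is the same pattern many JVM storage systems (and Netty's buffer pools) use to keep large data sets invisible to the collector.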
Also, the code here uses mutexes heavily to synchronize threads and lock out whole objects. Therefore I think these "realtime" claims are quite exaggerated.
You're using the academic version of realtime, not the one that anybody cares about. HDFS's biggest problem is, and has always been, that it's practically impossible to tune it to give anything like reliable performance, mostly because the NameNode is a single point of lag for the entire system. "Worst-case network and I/O" latency is a huge stretch. Network performance is predictably sub-ms if you're using a network designed for modern distributed computing (a real stretch, I know, since almost all HDFS installations run on old-school core-router-tree infrastructure). The I/O operations are incredibly unpredictable, but only for one client at a time. Individual servers with 10-20 ms worst-case performance hiccups are nowhere near as bad for a system as all of your clients hiccuping for even 5 ms at the same time.
HDFS's biggest problem is its SPOF master-slave architecture, not the JVM or GC. In a truly distributed, shared-nothing system, Java GC would not be a problem, because servers can run with no major GC for hours or days, so two servers or clients collecting at the same time is very unlikely. And even if some of them do, GC pauses are much more predictable than I/O pauses, which on a loaded system can take seconds, not milliseconds.
Also, if GC were such a huge problem, exchanges and HFT companies wouldn't use Java for their low-latency stuff, and there definitely are companies that do.
> Also, if GC were such a huge problem, exchanges and HFT companies wouldn't use Java for their low-latency stuff, and there definitely are companies that do.
Sure, and this C++ DFS's memory use is probably huge compared to many hand-crafted assembly or C programs from the 1980s. But who cares? 100 MB or even 1 GB is really tiny for today's server hardware. And the Java runtime itself is only a few MB. What takes the most memory in many Java programs (e.g. IDEs) is code and libraries.
Size can lead to a tremendous difference in performance on modern CPUs, particularly if you can take advantage of L2/L3 instruction and data caches. It still matters, even on modern "big memory" systems where gigabytes of installed RAM are the norm.
Technically correct, but filesystems are mostly about I/O. For example, this Baidu filesystem copies blocks of data into userland memory and transfers them in RPC messages; any system using a proper zero-copy approach would easily beat it, even if coded in Python or JS. Baidu also seems to use threads, locks, and SEDA instead of a more efficient (but much harder to code) thread-per-core async architecture. Thread pools and lock-based synchronization are terrible for latency.
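For contrast, the zero-copy path alluded to here is available even on the JVM via `FileChannel.transferTo`, which can delegate to `sendfile(2)` on Linux; a rough sketch (the class and method names are my own):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class ZeroCopySend {
    // Stream a file into a channel (e.g. a socket) without copying the
    // blocks through userland buffers: transferTo() lets the kernel move
    // the pages directly when the target channel supports it.
    public static long send(Path file, WritableByteChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = in.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested, so loop.
                pos += in.transferTo(pos, size - pos, out);
            }
            return pos;
        }
    }
}
```

Copying block data into RPC message buffers, by contrast, forces at least one kernel-to-userland copy plus serialization work per block, which is exactly the overhead being criticized above.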
The fact that something is written in C++ doesn't automatically make it efficient. And if we're talking milliseconds rather than nanoseconds here, you can do just about everything in Java or C# that you can do in C++, performance-wise.
I think it would be a great exercise to shadow your Project Manager (if you have one) for a while. If you don't have one, maybe just try to virtually manage the project you're working on. You'll find that the PM job is not really what you expected it to be.
I'm a sysadmin and I've tried to simulate this kind of change, and I found there are a lot of bureaucratic tasks which I don't really like :). If you ask me, the proper career path has the "team lead" position as the first step, which brings you closer to a "* manager" feeling.
I can feel that at work, but the PM skill set is something I need in order to improve myself. Maybe I don't need it in my day job, but I've got some extra projects outside of work. I hope I can manage those well; then we may build a studio or something.
Being ready never feels like being ready. I just want to know what I'm missing and how to get it.
This guy says that if the webpage "asks" for another page's credentials, the LastPass plugin will hand them over. Every character/keystroke in specific fields could be captured/logged; here's an example from ... eBay: https://news.ycombinator.com/item?id=12000820
Anyway, this was already fixed and pushed to the users, as the guy mentions in his post.