Hacker News | ciucanu's comments

Windows app down for me also.


Depending on their needs and options, people are using Spark, Hive, Kafka, or shell scripts to ingest data into Hadoop.

If you don't like the NiFi UI but you still want to make use of its advantages, you could try https://kylo.io/ , which runs on top of NiFi and has a much simpler UI.


That's another reason for putting your own router behind your ISP's box. Since I'm not an admin on their box, they can do a lot of shady things with it. Also, use a DNS server with external forwarders (Pi-hole is great for that).


Not only that, but the quality of commercial router security is appalling. See for example

https://www.securityevaluators.com/whitepaper/sohopelessly-b...

That paper needs wider exposure, though sadly it didn’t get much traction here when I submitted it.


This should not be underestimated. The ISP cannot be trusted and DNS poisoning is easily done through their box.


The problem is that the attack vectors are legion. If they don't get you by DNS, they'll get you through one of the thousands of other vectors.


I usually do text processing in Bash, Notepad++ and Excel. Each has its own pros and cons, that's why I usually combine them.

Here are the tools I use in Bash:

grep, tail, head, cat, cut, less, awk, sed, sort, uniq, wc, xargs, watch ...
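For instance, a typical pipeline chaining a few of these (the "log" file and its format are made up here for illustration):

```shell
# Build a small sample "log" so the pipeline below is runnable.
printf '1.2.3.4 GET /a\n5.6.7.8 GET /b\n1.2.3.4 GET /c\n' > sample.log

# Top client IPs: extract column 1, count duplicates, sort by count.
cut -d' ' -f1 sample.log | sort | uniq -c | sort -rn | head -5
# prints "2 1.2.3.4" above "1 5.6.7.8" (uniq -c pads the counts)
```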


As an aside, I once found out you can replace 'sort | uniq' entirely with an obscure awk command, as long as you don't require the output to be sorted. IIRC it performs twice as fast.

  awk '!x[$0]++' file.txt


The awk command prints the first occurrence of each line, in the order they appear in the file. I can imagine that sometimes that might even be better than sorted order.
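A quick side-by-side (with a throwaway input file made up here) showing the ordering difference versus sort -u:

```shell
# Duplicate lines, deliberately out of order.
printf 'b\na\nb\na\n' > input.txt

awk '!x[$0]++' input.txt   # first occurrences, input order: b, a
sort -u input.txt          # unique lines, sorted order:     a, b
```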


sort has a -u option on my linux:

  -u, --unique
      with -c, check for strict ordering; without -c, output only the first of an equal run


If you're already in Windows land, you should consider leveraging PowerShell instead of Bash. Pretty much all the same tooling is there, only with more descriptive names, tab completion on everything, typed objects passed around instead of text parsing, etc.


Ahem... and what is PowerShell Core? (I take exception to your 'if' condition.) As someone on Arch, I enjoy it a lot.


Bash with Notepad++ and Excel? Do you use Wine or WSL?


It's kind of mandatory to use Windows in certain envs.


I wouldn't be surprised to read this kind of news about some of the Eastern European countries in the future.


I wouldn't be surprised to read this kind of news about Western European countries in the future!


East 2nd, East 2nd, at least this time.


I originally downvoted you, but now I think I see where you were going with this.

It is possible in at least one instance.


ricardo.ch (similar to Craigslist) is doing the same in Switzerland. I think this gives users the impression that they have to be responsible for their actions on the website.


More that it weeds out all the scammers hiding behind fake addresses.


Zurich | Big Data Engineer, Data Scientist... | ONSITE - looking for new colleagues | Contact: https://www.linkedin.com/in/ciucanu


It looks like a faster version of HDFS since it's written in C++ (vs Java).

Another important aspect is that it is using SSD + SATA (I suppose), which could be a better option than standard SATA/SSD or an LVM cache using SATA + SSD.

Even if it's just a new thing, if it proves to be faster it may be implemented in the Hadoop ecosystem in the future. HDFS has a lot of features, being a mature piece of software, but it falls short on response time.


"It looks like a faster version of HDFS since it's written in C++ (vs Java)."

This is a non sequitur. The conclusion does not follow from the premise.


During non-GC periods, probably true. But having a realtime filesystem service that is prone to stop-the-world GC pauses is a showstopper for many applications.

Also, a C++ implementation is likelier to use far less memory than a Java implementation, assuming the skills of both programmers are roughly equal.


The underlying local filesystem on each node is not truly realtime, so a "realtime distributed file system" is already quite a stretch. Also, the JVM is perfectly fine with worst-case pause times below a few tens of ms (when using a properly tuned G1 or CMS GC), which is lower than the worst-case latency induced by network + I/O.

As for using less memory: you don't allocate buffers for file data on the JVM heap. You allocate them in native memory, exactly as you'd do in C++. Therefore it is possible to create a JVM-based file system that handles petabytes of data with as little as 100 MB of heap, used mostly for small temporary objects.

Also, the code here is using mutexes a lot to synchronize threads and lock out whole objects. Therefore I think these "realtime" claims are quite exaggerated.


You're using the academic version of realtime, not the one that anybody cares about. HDFS's biggest problem is, and has always been, that it's essentially impossible to tune it to give anything like reliable performance, mostly because the namenode is a single point of lag for the entire system. "Worst-case network and I/O latency" is a huge stretch. Network performance is predictably sub-ms if you're using a network designed for modern distributed computing (a real stretch, I know, since almost all HDFS installations are on old-school core-router-tree infrastructure). The I/O operations are incredibly unpredictable, but only for one client at a time. Having individual servers with 10-20 ms worst-case performance hiccups is nowhere near as bad for a system as all of your clients hiccuping for even 5 ms at the same time.


HDFS's biggest problem is its SPOF master-slave architecture, not the JVM or GC. With a truly distributed, shared-nothing system, Java GC would not be a problem, because servers can now run with no major GC for hours or days. So two servers or clients doing GC at the same time is very unlikely. And even if some of them do, the pauses from GC are much more predictable than the pauses from I/O, which on a loaded system can take seconds, not milliseconds.

Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.


> Also if GC was such a huge problem, exchanges or HFT companies wouldn't use Java for their low latency stuff, and there definitely are companies which do.

Can you name one?


LMAX, the New York Stock Exchange.


Wow, that's neat. Thanks for the pointer!


> As for using less memory - you don't allocate buffers for file data on the JVM heap.

I meant the code size and heap allocations for data structures, not file buffers.

And 100MB is huge compared to many C++ programs. And that's on top of the Java runtime!


Sure, and this C++ DFS's memory use is probably huge compared to many hand-crafted assembly or C programs from the 1980s. But who cares? 100 MB or even 1 GB is really tiny for today's server hardware. And the Java runtime itself is just a few MB. What takes most of the memory in many Java programs (e.g. IDEs) is code and libraries.


Size can lead to a tremendous difference in performance on modern CPUs, particularly if you can take advantage of L2/L3 instruction and data caches. It still matters, even on modern "big memory" systems where gigabytes of installed RAM are the norm.


Technically correct, but filesystems are mostly about I/O. For example, this Baidu filesystem copies blocks of data into userland memory and transfers them in RPC messages; any system using a proper zero-copy approach would easily beat it, even if coded in Python or JS. Baidu also seems to use threads, locks, and SEDA instead of the more efficient (but much harder to code) thread-per-core async architecture. Thread pools and lock-based synchronization are terrible for latency.

The fact that something is in C++ doesn't make it automatically efficient. And in particular, if we're talking about milliseconds rather than nanoseconds, in Java or C# you can do just about everything you can do in C++, performance-wise.


I think it would be a great exercise to shadow your Project Manager (if you have one) for a while. If you don't have one, maybe just try to virtually manage the project you're working on. You'll find that the PM job is not really what you expected it to be.

I'm a sysadmin and I've tried to simulate this kind of change, and I found there are a lot of bureaucratic tasks which I don't really like :). If you ask me, the proper career path has the "team lead" position as the first step, which brings you closer to a "* manager" feeling.

Good luck!!


I can feel that at work, but the PM skill is something I need in order to improve myself. Maybe I don't need that skill in my current job, but I have some extra projects outside of work; I hope I can manage those well, and then we may build a studio or something.

You're never ready until you just start. I just want to know what I'm missing and how to get it.


It should be safe enough, with a few exceptions, one of them being this exploit: https://news.ycombinator.com/item?id=12171547

This guy says that if a webpage "asks" for another page's credentials, the LastPass plugin will hand them over. Every character/keystroke in specific fields can be caught/logged; here's an example from... eBay: https://news.ycombinator.com/item?id=12000820

Anyway, this was already fixed and pushed to users, as the guy mentions in his post.

