I notice that it casts everything to string for MD5 to work. In that case, how does it handle two databases having different types for the same columns? I'm thinking about floats and numerics (decimal places), timestamps (some have timezone support, some don't) and bytes (hex, base64) in particular, but there are definitely others that I'm missing as well.
Great question, this is something we put a lot of effort into.
For both dates and numerics, we render them in a normalized format and round them to the lowest mutual precision. It gets more complicated for timestamps, because when databases reduce the precision of a timestamp, some of them round and some of them truncate. We implemented both, and we apply the same operation (truncate or round) to both timestamps, chosen according to which database has the column with the lower precision.
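To make that concrete, here is a rough sketch of the idea in Python. It is not the tool's actual code (the real normalization would presumably happen in SQL on each database); the function names, the fixed 6-digit output format, and the half-up rounding mode are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP


def normalize_timestamp(ts: datetime, precision: int, rounds: bool) -> str:
    """Reduce ts to `precision` fractional-second digits (0-6).

    `rounds` says whether the lower-precision database rounds (True)
    or truncates (False). Hypothetical helper, not a real API.
    """
    unit = 10 ** (6 - precision)  # microseconds per step at the target precision
    micros = ts.microsecond
    if rounds:
        micros = (micros + unit // 2) // unit * unit
    else:
        micros = micros // unit * unit
    # Go through timedelta so that rounding up to 1,000,000 us carries into seconds.
    ts = ts.replace(microsecond=0) + timedelta(microseconds=micros)
    # Always print 6 digits so both sides produce the same string.
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")


def normalize_number(value: Decimal, scale: int) -> str:
    """Render value with exactly `scale` decimal places (assumed half-up rounding)."""
    quantum = Decimal(1).scaleb(-scale)  # e.g. scale=2 -> Decimal('0.01')
    return format(value.quantize(quantum, rounding=ROUND_HALF_UP), "f")
```

If one side stores microseconds and the other milliseconds, both values would be passed through `normalize_timestamp(ts, 3, rounds)` before hashing, with `rounds` set according to how the millisecond database behaves, so the strings fed to MD5 match.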
We haven't got to bytes and strings yet, but it's on our list, and I imagine we'll use a similar approach.
For now, we print a warning whenever we don't have special handling for the column type. If you see a value mismatch where it shouldn't be, let us know and we'll implement it next.
Location: Malaysia / Singapore
Remote: Yes
Willing to relocate: Depends on location / package
Technologies: Python, SQL (PostgreSQL and MS SQL), Javascript, Typescript, CSS, Qlik Sense, IT Audit
Résumé/CV: Available upon request
Email: alexkoay88 at gmail.com
I've been in multiple roles over the years: software development, ERP support, internal IT auditor, and now a data engineering role. I'm a proficient polyglot who has been programming since junior high (15+ years of experience, started out with C++ and Python).
Looking for managerial / senior / lead / consulting roles.
One thing I've found really helpful about doing double-entry is that I can make sure nothing slips under the radar.
With single-entry (where you just key in expenses for the month), it's easy to forget that cup of coffee you paid for in a rush. It doesn't matter with double-entry since you can always just check if your wallet tallies up.
I usually just classify variances under miscellaneous expenses if it's not a large amount, but sometimes I've found myself somehow "missing" a few hundred bucks which I then would start tracking down.
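The tally check is just arithmetic. A tiny sketch in Python, with made-up numbers and a hypothetical function name, assuming the wallet only ever loses money through expenses:

```python
from decimal import Decimal


def unexplained_variance(opening, recorded_expenses, counted_closing):
    """Cash that left the wallet without a matching expense entry."""
    expected_closing = opening - sum(recorded_expenses, Decimal("0"))
    return expected_closing - counted_closing


# Started the month with 200.00, recorded two expenses, counted 120.00 at the end.
variance = unexplained_variance(
    Decimal("200.00"),
    [Decimal("45.50"), Decimal("30.00")],
    Decimal("120.00"),
)
print(variance)  # the forgotten coffee(s), booked to miscellaneous if small
```

A non-zero result is what I either write off as miscellaneous or go hunting for.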
> I can't see how this fits with the process of switching from open mind to closed mind, as Cleese describes it.
Granted, you will need to be proficient enough in the act before you can be creative with it. However, I would think that musicians have their open time, where they explore themes and ideas, just fiddling with chords and seeing where it goes, i.e. improv, versus their closed time, when they find an interesting theme and want to flesh it out even more, i.e. composing.
> The work gets done in sessions that are better described by the concept of "flow", where if you still want to use Cleese's concepts of "open mind" and "closed mind" then it is best said that you drift frequently perhaps imperceptibly between the two.
My personal experience with drifting frequently between the two is that you hardly get things done. An example I can think of is of programming a particular algorithm. With the "open mind" you'd explore how to express it in interesting ways, but with the "closed mind" you'd have to implement it and move on. If you were to drift between the two frequently, it occurs to me that I would possibly be:
1. spending time exploring (while the deadline is looming)
2. implementing a bad solution (if I did not explore enough)
3. doing no. 2 halfway, getting a flash of brilliance, doing no. 1 to see if it works, then rinsing and repeating
By setting aside time to play with the ideas, and then getting down to actually fleshing the concepts out, I know how much time I have before making a decision. Once that time is up, I make the best decision I can within that time-frame and stick with it. Second-guessing after this point would be counter-productive, as Cleese mentions.