Interesting Stuff - Week 5

Throughout the week, I read a lot of blog posts, articles, etc., that have to do with things that interest me.

This is the “roundup” of the posts that have been most interesting to me this week.

Distributed Computing

Life Beyond Distributed Transactions. An excellent piece about distributed transactions in large-scale systems. As a side note: queue.acm.org is a goldmine if you are interested in papers related to enterprise computing.
How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters. A very interesting post about how Uber has designed its systems.
The Infrastructure Behind Twitter: Scaling Networking, Storage and Provisioning. Similar to the post above, but this time about Twitter. Some interesting takeaways:
- There is no such thing as a “temporary change or workaround”. In most cases, workarounds are technical debt.
- Architect beyond the original specifications and requirements.

Real Time Credit Card Fraud Detection with Apache Spark and Event Streaming. A post showing how to build a real-time solution for credit card fraud detection.
Introduction to Machine Learning with Python. The first part in a series about machine learning.
THE YEAR IN SQL ENGINES. This is not about relational databases; it is a roundup of various SQL engines for data science and big data.
fst: Fast serialization of R data frames. A new R package for serialization of data.
A look back at the year in R and Microsoft. Looking at what happened in 2016 in R and Microsoft (related to machine learning).

Streaming Live Data and the Hadoop Ecosystem. A very interesting presentation about Hadoop and streaming of data in Hadoop.
New in Azure Stream Analytics: Geospatial functions, Custom code and lots more!. Microsoft has just released new features and functionality for Azure Stream Analytics (ASA). I have played around with the Visual Studio tools for ASA, and it rocks!

JSON data in clustered column store indexes. Jovan has written a really nice post about how Clustered Column Store indexes can give you compression and query performance benefits for JSON data stored in SQL Server.
How to determine what causes a particular wait type. A post by Paul Randal from 2014 about how to find out when and why wait types occur.

Finally, two more posts by Bob Dorr about SQL Server and Linux:

SQL Server on Linux: An LLDB Debugging Tale. What Microsoft did in order to be able to debug SQL Server running on Linux.
SQL Server on Linux: Scatter/Gather == Vectored I/O. How scatter/gather I/O is done on Linux.

That’s all for this week. I hope you enjoy what I put together. If you have ideas for what to cover, please comment on this post or ping me.