Nick Fisher's tech blog

JMH

Improving Java IO Performance: Caching Data, When Appropriate

The sample code for this post can be found on GitHub.

For I/O against the filesystem, the biggest bottleneck is usually the operating system, which controls all access to it. Reading and writing through the operating system is much more expensive than working with data already held in memory, and that is the subject of this post: caching.
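As a rough illustration of the idea, here is a minimal sketch of an in-memory cache in front of file reads. The `FileCache` class, its `diskReads` counter, and the `demoDiskReads` helper are hypothetical names made up for this example and are not taken from the post's sample code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration only.
final class FileCache {
    private final Map<Path, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger diskReads = new AtomicInteger();

    // Returns the file contents, going to the filesystem only on the first request.
    // The returned array is shared with the cache, so callers should not modify it.
    byte[] read(Path path) {
        return cache.computeIfAbsent(path.toAbsolutePath().normalize(), p -> {
            diskReads.incrementAndGet();
            try {
                return Files.readAllBytes(p);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Number of times the cache actually had to read from the filesystem.
    int diskReads() {
        return diskReads.get();
    }

    // Writes a temp file, reads it twice through the cache, and reports the disk reads.
    static int demoDiskReads() {
        try {
            Path tmp = Files.createTempFile("cache-sketch", ".txt");
            try {
                Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
                FileCache cache = new FileCache();
                cache.read(tmp);
                cache.read(tmp); // served from memory
                return cache.diskReads();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The "when appropriate" part matters: this sketch never evicts entries and never notices when a file changes on disk, so it only suits small, read-mostly data.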

Improving Java IO Performance: Does Compression Actually Help?

The sample code associated with this blog post can be found on GitHub.

The question “does compression actually help?” is admittedly pretty loaded. The real answer is “sometimes,” and it depends. I will not try to cover every use case, but I will walk through one very specific example where the answer appears to be “probably not.”
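To make the trade-off concrete, here is a minimal sketch that gzips a byte array with the JDK's `GZIPOutputStream` so the compressed size can be compared with the original. The `GzipSketch` class is a hypothetical name for this example. It is not the benchmark from the post, and it measures size only, not the CPU time spent compressing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only.
final class GzipSketch {
    // Compresses the input fully in memory and returns the gzip bytes.
    static byte[] compress(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream above writes the trailer, so the buffer is complete here.
        return buffer.toByteArray();
    }

    // Reverses compress(), returning the original bytes.
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Highly repetitive input shrinks a lot, while input that is already compressed or close to random typically does not. Whether the bytes saved on disk are worth the extra CPU work is the part that needs measuring for each use case.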

Improving Java IO Performance: Appropriately Using Random Access Over Streams

The sample code for this blog post can be found on GitHub.

A flavor of I/O performance optimization that applies specifically to the filesystem is the decision on when to use random access instead of something like a BufferedInputStream. Random access lets you treat a file much like a large array of bytes stored on the filesystem. From the Oracle documentation on the RandomAccessFile class:
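Separately from the documentation quoted in the full post, the array-like access can be sketched as follows. This is a minimal example, assuming a file whose bytes of interest sit at a known offset. The `RandomAccessSketch` class and its `readAt` and `demo` methods are hypothetical names and are not taken from the post's sample code.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class RandomAccessSketch {
    // Jumps straight to `offset` and reads exactly `length` bytes,
    // without reading the bytes that come before the offset.
    static byte[] readAt(Path file, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            byte[] out = new byte[length];
            raf.readFully(out); // throws EOFException if the file is too short
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes 16 ASCII bytes to a temp file and reads 3 bytes starting at offset 10.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("raf-sketch", ".bin");
            try {
                Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
                return new String(readAt(tmp, 10, 3), StandardCharsets.US_ASCII); // "abc"
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a stream, reaching offset 10 means reading or skipping everything in front of it. With `seek`, the read starts at the offset directly, which is where random access tends to pay off for large files with known record positions.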

Improving Java IO Performance: Formatting Costs

The sample code associated with this blog post can be found on GitHub.

Another potential source of I/O bottlenecks, across any medium, is the format you choose for the data in the first place. For example, XML used to be a standard way to send information across the wire or store it in a backend system, but the size overhead of XML compared to JSON is about double (not to mention that formatted XML is somehow harder to read than formatted JSON).
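A quick way to get a feel for the size difference is to serialize the same record both ways and count the bytes. The sketch below does that by hand for a made-up two-field record. The `FormatSizeSketch` class, the `person` element, and the field names are assumptions for illustration, and the string building does no escaping, so it is not a substitute for a real JSON or XML library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class FormatSizeSketch {
    // Compact JSON for a two-field record; no escaping of `name` is performed.
    static String toJson(int id, String name) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
    }

    // The same record as XML with one element per field; no escaping either.
    static String toXml(int id, String name) {
        return "<person><id>" + id + "</id><name>" + name + "</name></person>";
    }

    // Size on the wire when encoded as UTF-8.
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Calling `utf8Size(toXml(1, "Ada"))` and `utf8Size(toJson(1, "Ada"))` shows the XML form is larger because every field name appears twice, once in the opening tag and once in the closing tag. The exact ratio depends on the data, so it is worth measuring against real payloads.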

How to Benchmark Java Code Using JUnit and JMH

You can view the sample code associated with this post on GitHub.

JMH is a lightweight code generator that can benchmark Java code. While many of the performance bottlenecks in today’s world are related to network calls and/or database queries, it’s still a good idea to understand the performance of our code at a lower level. In particular, by automating performance tests on our code, we can usually at least ensure that a refactoring effort did not accidentally make performance worse.