Improving Java IO Performance: Reducing Method Call Overhead

Nov 2018

You can view the sample code associated with this blog post on GitHub.

While buffering yields massive improvements in I/O performance, another key part of tuning Java code in general, and I/O-bound code in particular, is method call overhead. A method that is called far more often than necessary can bog an operation down.

To prove my point, we'll set up benchmarking via JMH like so:

    public static void runBenchmark(Class<?> clazz) throws Exception {
        Options options = new OptionsBuilder()
                .include(clazz.getName() + ".*")
                .mode(Mode.AverageTime)
                .warmupTime(TimeValue.seconds(1))
                .warmupIterations(2)
                .measurementIterations(2)
                .timeUnit(TimeUnit.MILLISECONDS)
                .measurementTime(TimeValue.seconds(1))
                // The OS is the bottleneck, so we should use one
                // thread at a time for accurate results
                .threads(1)
                .forks(1)
                .shouldFailOnError(true)
                .shouldDoGC(true)
                .build();

        new Runner(options).run();
    }

The first benchmark uses the (deprecated) DataInputStream.readLine(), which calls read() on the underlying stream once per byte. Even though a BufferedInputStream has already loaded the data into memory, we still pay for a method call on every single byte:

    @Benchmark
    public void readEachCharacterUnderTheHood() throws Exception {
        try (FileInputStream fileInputStream = new FileInputStream(Utils.smallCsvFilePath);
             BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
             DataInputStream dataInputStream = new DataInputStream(bufferedInputStream)) {
            int count = 0;
            while (dataInputStream.readLine() != null) {
                count++;
            }

            assertEquals(Utils.numberOfNewLines_inSmallCsv, count);
        }
    }

The performance of this method on my machine is:

Benchmark                                              Mode  Cnt  Score   Error  Units
MethodCallOverheadTests.readEachCharacterUnderTheHood  avgt    2  1.560          ms/op
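
To make those per-byte calls visible, here is a minimal sketch that counts them. It is not part of the sample repository: CountingInputStream and DataInputStreamCallCount are hypothetical names introduced for illustration, and the input is a small in-memory byte array rather than the CSV file used in the benchmark.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: counts how often the stream it wraps is asked for data.
class CountingInputStream extends FilterInputStream {
    long singleByteReads = 0;
    long bulkReads = 0;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        singleByteReads++;
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        bulkReads++;
        return super.read(b, off, len);
    }
}

class DataInputStreamCallCount {
    // Returns the number of single-byte read() calls DataInputStream.readLine()
    // makes against the (already buffered) stream directly beneath it.
    @SuppressWarnings("deprecation") // readLine() is deprecated; used only to mirror the benchmark
    static long countSingleByteReads(byte[] data) {
        CountingInputStream counter =
                new CountingInputStream(new BufferedInputStream(new ByteArrayInputStream(data)));
        try (DataInputStream in = new DataInputStream(counter)) {
            while (in.readLine() != null) {
                // discard the line; only the call count matters here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.singleByteReads;
    }

    public static void main(String[] args) {
        byte[] data = "a,b\nc,d\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(data.length + " bytes, "
                + countSingleByteReads(data) + " single-byte read() calls");
    }
}
```

The counter sits between the DataInputStream and the BufferedInputStream, so it sees the calls that buffering alone does not eliminate: at least one read() per byte of input.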

Conversely, BufferedReader keeps its own character buffer: it fills that buffer with bulk reads from the underlying Reader, and readLine() scans the buffer directly instead of calling read() for every character. From the Oracle documentation on the BufferedReader class:

In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.

And:

Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.
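
The "localized" remark in that second quote is about character decoding, which a small sketch can illustrate. LineDecodingDemo is a hypothetical class, not taken from the sample code, and it assumes UTF-8 input held in memory. DataInputStream.readLine() turns each byte into one char, while a BufferedReader over an InputStreamReader decodes with the charset you give it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reads the first line of the same bytes two ways.
class LineDecodingDemo {
    // Each byte becomes one char, so multi-byte UTF-8 sequences are split apart.
    @SuppressWarnings("deprecation") // readLine() is deprecated; shown for comparison only
    static String viaDataInputStream(byte[] utf8) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(utf8))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The InputStreamReader decodes bytes to chars using the charset passed in.
    static String viaBufferedReader(byte[] utf8) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" ends in e-acute, which takes two bytes in UTF-8.
        byte[] utf8 = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.println("DataInputStream chars: " + viaDataInputStream(utf8).length());
        System.out.println("BufferedReader chars:  " + viaBufferedReader(utf8).length());
    }
}
```

For plain ASCII files like a typical CSV the two approaches return the same text, so this matters mainly once the input contains non-ASCII characters.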

So, a benchmark that achieves the same result would look like:

    @Benchmark
    public void faster_usingBufferedReader() throws Exception {
        try (FileReader fileReader = new FileReader(Utils.smallCsvFilePath);
             BufferedReader bufferedReader = new BufferedReader(fileReader)) {
            int count = 0;
            while (bufferedReader.readLine() != null) {
                count++;
            }

            assertEquals(Utils.numberOfNewLines_inSmallCsv, count);
        }
    }
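
To check how BufferedReader actually talks to the Reader beneath it, here is a minimal counting sketch. CountingReader and BufferedReaderCallCount are hypothetical names that are not in the sample repository, and the input is an in-memory string rather than the CSV file.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: counts how often the reader it wraps is asked for data.
class CountingReader extends FilterReader {
    long singleCharReads = 0;
    long bulkReads = 0;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        singleCharReads++;
        return super.read();
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        bulkReads++;
        return super.read(cbuf, off, len);
    }
}

class BufferedReaderCallCount {
    // Returns {singleCharReads, bulkReads} observed beneath a BufferedReader
    // while every line of the text is read.
    static long[] countReads(String text) {
        CountingReader counter = new CountingReader(new StringReader(text));
        try (BufferedReader reader = new BufferedReader(counter)) {
            while (reader.readLine() != null) {
                // discard the line; only the call counts matter here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new long[] {counter.singleCharReads, counter.bulkReads};
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("a,b\n");
        }
        long[] counts = countReads(text.toString());
        System.out.println(text.length() + " chars, " + counts[0]
                + " single-char reads, " + counts[1] + " bulk reads");
    }
}
```

The underlying reader only sees a handful of bulk read(char[], int, int) calls, however many lines the text contains, because readLine() works out of BufferedReader's own buffer.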

When run back to back, the benchmarks on my machine look like:

Benchmark                                              Mode  Cnt  Score   Error  Units
MethodCallOverheadTests.faster_usingBufferedReader     avgt    2  0.700          ms/op
MethodCallOverheadTests.readEachCharacterUnderTheHood  avgt    2  1.560          ms/op

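In other words, the BufferedReader version is roughly 2.2 times as fast (1.560 / 0.700 ≈ 2.23), though with only two measurement iterations and no error estimate the exact ratio should be treated as indicative.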

Nick Fisher is a software engineer in the Pacific Northwest. He focuses on building highly scalable and maintainable backend systems.