Improving Java IO Performance: Reducing Method Call Overhead
Nov 2018
You can view the sample code associated with this blog post on GitHub.
While buffering can bring massive improvements to I/O operations, another key part of tuning Java code in general, and I/O-bound code in particular, is method call overhead. A method that is called far more often than necessary, such as once per byte when a single call could handle a whole chunk, can bog down an entire operation.
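To make the cost concrete, here is a minimal, self-contained sketch (not from the sample repository) that counts how many read calls reach a stream when the same 100,000 bytes are consumed one byte at a time versus in chunks. CountingInputStream, CallCountDemo, and the 8,192-byte chunk size are illustrative assumptions. The sketch counts calls rather than timing them, so it shows how many calls are made, not how long each one takes.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CallCountDemo {

    // Hypothetical wrapper that counts how many read calls reach the stream it wraps.
    static final class CountingInputStream extends FilterInputStream {
        long calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Consumes the stream with one read() call per byte.
    static long perByte(byte[] data) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        while (in.read() != -1) {
            // each iteration is one method call for one byte
        }
        return in.calls;
    }

    // Consumes the stream with one read(byte[], int, int) call per chunk of up to 8192 bytes.
    static long bulk(byte[] data) throws IOException {
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data));
        byte[] chunk = new byte[8192];
        while (in.read(chunk, 0, chunk.length) != -1) {
            // each iteration moves up to chunk.length bytes
        }
        return in.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        System.out.println("per-byte read() calls: " + perByte(data)); // 100001
        System.out.println("bulk read(byte[]) calls: " + bulk(data));  // 14
    }
}
```

The byte-at-a-time loop makes one call per byte plus a final call that returns -1, while the chunked loop needs only 14 calls for the same data.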
To prove my point, we'll set up benchmarking via JMH like so:
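import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

// Runs every @Benchmark method in the given class and reports the average time per operation
public static void runBenchmark(Class<?> clazz) throws Exception {
    Options options = new OptionsBuilder()
            .include(clazz.getName() + ".*")
            .mode(Mode.AverageTime)
            .warmupTime(TimeValue.seconds(1))
            .warmupIterations(2)
            .measurementIterations(2)
            .timeUnit(TimeUnit.MILLISECONDS)
            .measurementTime(TimeValue.seconds(1))
            // The OS is the bottleneck, so we should use one
            // thread at a time for accurate results
            .threads(1)
            .forks(1)
            .shouldFailOnError(true)
            .shouldDoGC(true)
            .build();
    new Runner(options).run();
}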
Our first benchmark uses DataInputStream.readLine(), which calls read() on the underlying stream once for every byte. Even though BufferedInputStream is buffering the data, we still pay for a separate read() call on each byte that has already been loaded into memory:
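The pattern behind this can be sketched as a simplified line reader that, like DataInputStream.readLine(), fetches each byte with its own read() call. This is an approximation under stated assumptions, not the JDK source: PerByteLineReader is a hypothetical name, only '\n' ends a line, and each byte is cast straight to a char with no charset decoding.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class PerByteLineReader {

    // Simplified sketch of a byte-at-a-time readLine(): one in.read() call per byte.
    // Assumption: lines end in '\n' only; '\r' handling is omitted, so this is
    // NOT the JDK's DataInputStream.readLine().
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return line.toString();
            }
            line.append((char) c); // byte cast straight to char, no charset decoding
        }
        // End of stream: return a final unterminated line, or null if nothing is left
        return line.length() > 0 ? line.toString() : null;
    }

    // Counts lines while the data sits in a BufferedInputStream, mirroring the benchmark setup.
    static int countLines(byte[] data) throws IOException {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            int count = 0;
            while (readLine(in) != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] csv = "a,b\n1,2\n3,4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countLines(csv)); // 3
    }
}
```

Even though every byte is already in BufferedInputStream's internal array, the loop still crosses a method boundary once per byte.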
@Benchmark
@SuppressWarnings("deprecation") // DataInputStream.readLine() has been deprecated since JDK 1.1
public void readEachCharacterUnderTheHood() throws Exception {
    try (FileInputStream fileInputStream = new FileInputStream(Utils.smallCsvFilePath);
         BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
         DataInputStream dataInputStream = new DataInputStream(bufferedInputStream)) {
        int count = 0;
        // readLine() pulls each byte through a separate read() call on bufferedInputStream
        while (dataInputStream.readLine() != null) {
            count++;
        }
        assertEquals(Utils.numberOfNewLines_inSmallCsv, count);
    }
}
The performance of this method on my machine is:
Benchmark                                              Mode  Cnt  Score  Error  Units
MethodCallOverheadTests.readEachCharacterUnderTheHood  avgt    2  1.560         ms/op
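Conversely, BufferedReader keeps its own internal character buffer: it fills that buffer with one bulk read from the underlying Reader and then scans it directly, so the underlying stream is not hit with a method call for every character. From the Oracle documentation on the BufferedReader class: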
In general, each read request made of a Reader causes a corresponding read request to be made of the underlying character or byte stream. It is therefore advisable to wrap a BufferedReader around any Reader whose read() operations may be costly, such as FileReaders and InputStreamReaders.
And:
Programs that use DataInputStreams for textual input can be localized by replacing each DataInputStream with an appropriate BufferedReader.
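Following that advice, a DataInputStream used for line reading can be swapped for a BufferedReader wrapped around an InputStreamReader. The sketch below is a hedged example rather than code from the sample project: DataInputStreamMigration is a hypothetical class name, and choosing UTF-8 explicitly is an assumption about the input. The deprecated DataInputStream.readLine() does no charset decoding at all, which is why the documentation ties this replacement to localization.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class DataInputStreamMigration {

    // Line counting after replacing DataInputStream with BufferedReader.
    // Assumption: the input is UTF-8; pass a different Charset if it is not.
    static int countLines(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        // Non-ASCII names written as \\u escapes so the source compiles under any encoding
        byte[] csv = "name,city\nZo\u00eb,Malm\u00f6\nJos\u00e9,S\u00e3o Paulo\n"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countLines(new ByteArrayInputStream(csv))); // 3
    }
}
```

The only structural change is the wrapper chain: the InputStreamReader decodes bytes into characters, and the BufferedReader supplies readLine() on top of its own buffer.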
So, a benchmark that achieves the same result would look like:
@Benchmark
public void faster_usingBufferedReader() throws Exception {
    try (FileReader fileReader = new FileReader(Utils.smallCsvFilePath);
         BufferedReader bufferedReader = new BufferedReader(fileReader)) {
        int count = 0;
        // readLine() scans BufferedReader's internal char buffer, which is refilled in bulk
        while (bufferedReader.readLine() != null) {
            count++;
        }
        assertEquals(Utils.numberOfNewLines_inSmallCsv, count);
    }
}
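BufferedReader's advantage comes from moving characters in bulk. The following is a minimal sketch of that idea, not BufferedReader's actual implementation: it pulls up to 8,192 chars per read(char[], int, int) call and scans the array in a plain loop. ChunkedLineCounter is a hypothetical name, and for simplicity only '\n' is treated as a line terminator, whereas BufferedReader.readLine() also accepts '\r' and "\r\n".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ChunkedLineCounter {

    // Simplified sketch of the bulk-read idea: fetch a whole chunk of chars with one
    // read(char[], int, int) call, then scan the array with no method call per character.
    // Assumption: only '\n' terminates a line, and a final line without a trailing
    // '\n' still counts (matching how BufferedReader.readLine() counts such a line).
    static int countLines(Reader reader) throws IOException {
        char[] buffer = new char[8192];
        int count = 0;
        boolean pendingText = false; // true if chars were seen since the last '\n'
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    count++;
                    pendingText = false;
                } else {
                    pendingText = true;
                }
            }
        }
        return pendingText ? count + 1 : count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("a,b\n1,2\n3,4\n"))); // 3
        System.out.println(countLines(new StringReader("a,b\n1,2")));        // 2
    }
}
```

For a small file, the whole scan typically costs one or two calls into the underlying Reader instead of one call per character.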
When run back to back, the benchmarks on my machine look like:
Benchmark                                              Mode  Cnt  Score  Error  Units
MethodCallOverheadTests.faster_usingBufferedReader     avgt    2  0.700         ms/op
MethodCallOverheadTests.readEachCharacterUnderTheHood  avgt    2  1.560         ms/op
In other words, BufferedReader is roughly 2.2 times as fast on this file (1.560 ms/op vs. 0.700 ms/op).
Nick Fisher is a software engineer in the Pacific Northwest. He focuses on building highly scalable and maintainable backend systems.