"After all, the engineers only needed to refuse to fix anything, and modern industry would grind to a halt." -Michael Lewis


Improving Java IO Performance: Formatting Costs

Nov 2018

The sample code associated with this blog post can be found on GitHub.

Another potential source of I/O bottlenecks, across any medium, is the way you choose to format the data in the first place. For example, XML used to be a standard way to send information across the wire or to store it in a backend system, but an XML payload is often around twice the size of the equivalent JSON (not to mention that formatted XML is somehow harder to read than formatted JSON).
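To make the size overhead concrete, here is a minimal sketch that hand-encodes one hypothetical record (the field names and values are made up for illustration) in both formats and compares their UTF-8 byte counts. For a record this small, XML comes out roughly 1.7 times larger, mostly because every field name is written twice, once in the opening tag and once in the closing tag. The exact ratio depends heavily on the shape of the data, so treat this as an illustration rather than a measurement.

```java
import java.nio.charset.StandardCharsets;

public class PayloadSizeSketch {
    // Hypothetical record with three fields, hand-encoded two ways.
    // Real payloads vary; this only shows where XML's overhead comes from.
    static final String XML =
            "<user><id>42</id><name>Ada</name><active>true</active></user>";
    static final String JSON =
            "{\"id\":42,\"name\":\"Ada\",\"active\":true}";

    // Size on the wire, assuming the payload is sent as UTF-8.
    static int utf8Size(String payload) {
        return payload.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        int xmlBytes = utf8Size(XML);
        int jsonBytes = utf8Size(JSON);
        System.out.println("XML:  " + xmlBytes + " bytes");
        System.out.println("JSON: " + jsonBytes + " bytes");
        System.out.printf("XML/JSON ratio: %.2f%n", (double) xmlBytes / jsonBytes);
    }
}
```

A schema-heavy XML document with namespaces and attributes can push the ratio higher; a JSON document with long, repetitive keys can pull it lower.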

We can compare a couple of different formatting options by pitting the java.text.MessageFormat class against simple string concatenation with +. The benchmarks run under JMH, with a test setup like so:

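    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.options.Options;
    import org.openjdk.jmh.runner.options.OptionsBuilder;
    import org.openjdk.jmh.runner.options.TimeValue;

    public final class Utils {
        // Runs every JMH benchmark declared in the given class
        public static void runBenchmark(Class<?> clazz) throws Exception {
            Options options = new OptionsBuilder()
                    // include() takes a regex: match all benchmarks in this class
                    .include(clazz.getName() + ".*")
                    .mode(Mode.AverageTime)
                    .warmupTime(TimeValue.seconds(1))
                    .warmupIterations(2)
                    .measurementIterations(2)
                    .timeUnit(TimeUnit.MILLISECONDS)
                    .measurementTime(TimeValue.seconds(1))
                    // The OS is the bottleneck, so we should use one
                    // thread at a time for accurate results
                    .threads(1)
                    .forks(1)
                    .shouldFailOnError(true)
                    .shouldDoGC(true)
                    .build();

            new Runner(options).run();
        }
    }

    // ....

    // In FormattingCostsTests, a test method launches the benchmarks
    @Test
    public void launchBenchmark() throws Exception {
        Utils.runBenchmark(this.getClass());
    }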

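We can then compare MessageFormat with the pattern compiled once up front (a single MessageFormat instance reused for every call) against the static MessageFormat.format convenience method, which parses the pattern again on every call: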

    public static int COUNT = 25000;
    public static int NUM = 7;

    @Benchmark
    public void formatUsingMessageFormatter_preCompiled() {
        // The pattern is parsed once; the compiled formatter is then reused
        MessageFormat formatter = new MessageFormat("The square of {0} is {1}\n");
        Integer[] values = new Integer[2];
        values[0] = NUM;
        values[1] = NUM * NUM;
        for (int i = 0; i < COUNT; i++) {
            String s = formatter.format(values);
            System.out.print(s);
        }
    }

    @Benchmark
    public void formatWithoutPrecompiling() {
        String format = "The square of {0} is {1}\n";
        Integer[] values = new Integer[2];
        values[0] = NUM;
        values[1] = NUM * NUM;
        for (int i = 0; i < COUNT; i++) {
            // The static helper builds a new MessageFormat, re-parsing
            // the pattern, on every call
            String s = MessageFormat.format(format, values);
            System.out.print(s);
        }
    }

On my machine, these two benchmarks perform as follows:

    Benchmark                                                     Mode  Cnt    Score   Error  Units
    FormattingCostsTests.formatUsingMessageFormatter_preCompiled  avgt    2  275.921          ms/op
    FormattingCostsTests.formatWithoutPrecompiling                avgt    2  334.822          ms/op
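In the precompiled benchmark the formatter is still created once per benchmark invocation; in an application you would typically hoist it out of the hot method entirely. One caveat: the MessageFormat Javadoc states that message formats are not synchronized and recommends a separate instance per thread, so a single shared static instance is unsafe under concurrency. Below is a minimal sketch, assuming a per-thread cache via ThreadLocal suits your threading model; the class and method names are hypothetical.

```java
import java.text.MessageFormat;

public class SquareMessages {
    private static final String PATTERN = "The square of {0} is {1}\n";

    // MessageFormat is not thread-safe, so each thread gets its own
    // instance; the pattern is parsed once per thread, not once per call.
    private static final ThreadLocal<MessageFormat> FORMATTER =
            ThreadLocal.withInitial(() -> new MessageFormat(PATTERN));

    static String formatCached(int n) {
        return FORMATTER.get().format(new Object[] {n, n * n});
    }

    // For comparison: the static helper re-parses PATTERN on every call.
    static String formatUncached(int n) {
        return MessageFormat.format(PATTERN, n, n * n);
    }

    public static void main(String[] args) {
        System.out.print(formatCached(7));
        System.out.print(formatUncached(7));
    }
}
```

Both methods produce the same text; the only difference is how often the pattern gets parsed. Note that both use the default formatting locale, so large numbers may pick up locale-specific grouping separators.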

Now, we can produce the same output using garden-variety string concatenation, and compare both approaches against a baseline that prints a constant string with no formatting at all:

    @Benchmark
    public void printingWithNoFormattingCosts() {
        for (int i = 0; i < COUNT; i++) {
            System.out.print("The square of 7 is 49\n");
        }
    }

    @Benchmark
    public void formatUsingAddition() {
        for (int i = 0; i < COUNT; i++) {
            String s = "The square of " + NUM + " is " + NUM * NUM + "\n";
            System.out.print(s);
        }
    }

Putting all four benchmarks together, the results on my machine are:

    Benchmark                                                     Mode  Cnt    Score   Error  Units
    FormattingCostsTests.formatUsingAddition                      avgt    2   59.710          ms/op
    FormattingCostsTests.formatUsingMessageFormatter_preCompiled  avgt    2  275.921          ms/op
    FormattingCostsTests.formatWithoutPrecompiling                avgt    2  334.822          ms/op
    FormattingCostsTests.printingWithNoFormattingCosts            avgt    2   57.381          ms/op

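In other words, avoiding MessageFormat in favor of plain concatenation made this benchmark dramatically faster: roughly 4.6 times faster than the precompiled formatter (59.710 vs 275.921 ms/op) and about 5.6 times faster than the non-precompiled one (334.822 ms/op). Concatenation also landed within about 2.3 ms/op of the no-formatting baseline (57.381 ms/op), so the formatting itself adds only about 4% on top of the cost of printing.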
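The concatenation in formatUsingAddition is not free either: javac compiles a + chain into StringBuilder appends on Java 8 and earlier, and into an invokedynamic call backed by StringConcatFactory from Java 9 onward. If you want explicit control over allocation, here is a minimal sketch of the same line built with a reused StringBuilder. The class and method names are hypothetical, and whether this beats + on your JVM is something to measure with the harness above rather than assume.

```java
public class SquareLineBuilder {
    // One builder reused across calls to avoid allocating a fresh buffer
    // each time. Not thread-safe: use one instance per thread.
    private final StringBuilder sb = new StringBuilder(32);

    String line(int n) {
        sb.setLength(0); // clear previous contents but keep the capacity
        return sb.append("The square of ")
                .append(n)
                .append(" is ")
                .append(n * n)
                .append('\n')
                .toString(); // toString() still copies into a new String
    }

    public static void main(String[] args) {
        SquareLineBuilder builder = new SquareLineBuilder();
        for (int i = 0; i < 3; i++) {
            System.out.print(builder.line(7));
        }
    }
}
```

Unlike MessageFormat, StringBuilder.append(int) is locale-independent, which is one reason it avoids the parsing and number-formatting work measured above.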

Nick Fisher is a software engineer in the Pacific Northwest. He focuses on building highly scalable and maintainable backend systems.