Draft of the plan to refactor the preprocessor

Discussion of all aspects of the game engine, including development of new and existing features.

Moderator: Forum Moderators

Post Reply
CrawlCycle
Posts: 27
Joined: November 20th, 2020, 5:07 am

Draft of the plan to refactor the preprocessor

Post by CrawlCycle »

Background
Motivation
  • The preprocessor is a small header and a source file (1801 lines) that contains a large class (1402 lines) with a large method (607 lines). The internal workings of the preprocessor is unclear at a quick glance. The output of the preprocessor is in a format that is incompletely documented. Refactoring the code would make it more understandable.
  • Testing the internal logic of the preprocessor is a good idea. The tests would 1. help to check its correctness and 2. protect future improvements and bug-fixes of the preprocessor against resurrection of old bugs that have been fixed.
  • However, testing the internal logic is impossible without refactoring the preprocessor. The header file describes a small public API (about 6 functions and 1 class). There are not enough windows to monitor and test the internal logic of the preprocessor.
Good things about the current preprocessor
  • It mostly works as intended, except for a few bugs.
  • It has gone through many rounds of optimizations of performance since the beginning of Wesnoth.
  • The public API is small.
Goals of the refactoring in the order of importance
  1. Continuously ensure the new and current preprocessors are equally correct.
  2. Continuously ensure the new preprocessor has the same or better performance.
  3. Make the code more understandable.
  4. Enable testing of the internal of the code.
  5. Enable future improvements of the architecture of the modules for handling WML.
  6. Possibly enable future reuse of the benchmarking methods for other refactoring projects.
Strategy for meeting the goals
Goals A and B
  1. Treat the current preprocessor as the reference implementation.
  2. Perform functional test of the new preprocessor. The input is the .cfg files of the Mainline campaigns and a selection of user-made contents. Ensure the
    output of the new preprocessor is same as the output of the current preprocessor.
  3. Benchmark the new preprocessor against the current preprocessor. The input is the same.
Goals C and D
  1. Draft the new architecture of the preprocessor at this forum.
  2. Refactor the preprocessor according to the new architecture
  3. Add doxygen documentation
  4. Add feature tests, integration tests, and unit tests
Goal E
  1. Draft the new architecture of the WML modules at this forum
  2. Refactor the preprocessor according to the new archiecture
Goal F
  1. Put the benchmark into a test
Plan
Benchmark + Functional test

Command-line options:
  • --run_test=preprocessor_benchmark
    Only run the preprocessor_benchmark test
  • --input_directory
    The test will search for cfg recursively in this directory
  • --time-limit=100
    the benchmark will use at most 100 seconds.
The test:
  • Create a new boost test “benchmark_preprocessor”
  • The test take in extra command-line arguments
    https://www.boost.org/doc/libs/1_56_0/l ... -name.html
  • If the “--input_directory” command-line argument is not provided, the test does nothing.
  • Search for .cfg files in the `input_directory` recursively. Sort the path to the .cfg files.
    The search would use a boost::filesystem::recursive_directory_iterator.
  • Preprocess the .cfg files in a round-robin manner. Record the timing. Stop when we reach the time limit.
  • For the first time that a .cfg file get preprocessed, append the output of the preprocessor to `benchmark_preprocessor_output.log`.
  • The test preprocesses the .cfg files in a round robin manner.
  • The test write the timing to a `benchmark_preprocessor_timing.log` file.
  • `benchmark_preprocessor_timing.log` is a command separated file with the following columns:
    Average timing (in milliseconds), number of times that the .cfg is preprocessed, absolute path to the .cfg
Potential problems:
  • If other benchmarks follows this pattern, we might have name clash between two benchmarks.
  • When the preprocessors of the master branch and refactor branch generates two different outputs, we can't pinpoint which .cfg file causes the difference.
Last edited by CrawlCycle on November 27th, 2020, 10:05 pm, edited 60 times in total.
User avatar
Celtic_Minstrel
Developer
Posts: 2166
Joined: August 3rd, 2012, 11:26 pm
Location: Canada
Contact:

Re: Draft of the plan to refactor the preprocessor

Post by Celtic_Minstrel »

CrawlCycle wrote: November 26th, 2020, 6:55 pm a selection of user-made contents.
Just noting that it makes sense to select add-ons known to stress the preprocessor. I think Ageless Era might be one of the best examples, and possibly Legend of the Invicibles (but not sure on that one, it might be more of a Lua-heavy campaign rather than macro-heavy).
Author of The Black Cross of Aleron campaign and Default++ era.
Former maintainer of Steelhive.
User avatar
Pentarctagon
Project Manager
Posts: 5527
Joined: March 22nd, 2009, 10:50 pm
Location: Earth (occasionally)

Re: Draft of the plan to refactor the preprocessor

Post by Pentarctagon »

I recall Legend of the Invincibles being more lua-heavy. Ageless Era is definitely a good case for performance testing though.
99 little bugs in the code, 99 little bugs
take one down, patch it around
-2,147,483,648 little bugs in the code
User avatar
max_torch
Inactive Developer
Posts: 414
Joined: July 31st, 2011, 5:54 pm

Re: Draft of the plan to refactor the preprocessor

Post by max_torch »

Since readability/understandability is a concern, is there a "coding style guide" that we would like to use to "proofread" the code?
Like such
CrawlCycle
Posts: 27
Joined: November 20th, 2020, 5:07 am

Re: Draft of the plan to refactor the preprocessor

Post by CrawlCycle »

max_torch wrote: November 27th, 2020, 4:39 am Since readability/understandability is a concern, is there a "coding style guide" that we would like to use to "proofread" the code?
Like such
Using an automatic formatter can be a good idea. We are currently using clang-format. The .clang-format file of Wesnoth is at here. Other aspects of the code can also contribute to readability. For example, a huge file (11,000 lines as described in this question.) or a gigantic god object is unlikely to be readable even with good formatting.
User avatar
Celtic_Minstrel
Developer
Posts: 2166
Joined: August 3rd, 2012, 11:26 pm
Location: Canada
Contact:

Re: Draft of the plan to refactor the preprocessor

Post by Celtic_Minstrel »

Don't actually just go and format the code with clang-format. Although there's a rough style guide and possibly even a clang-format definition file for it, there are too many places that just don't follow the style guide (mostly older code). Using clang-format will generally require some manual fixing, too.
Author of The Black Cross of Aleron campaign and Default++ era.
Former maintainer of Steelhive.
User avatar
octalot
General Code Maintainer
Posts: 783
Joined: July 17th, 2010, 7:40 pm
Location: Austria

Re: Draft of the plan to refactor the preprocessor

Post by octalot »

The whole "the serial workflow over one repository" section seems to be working around problems that source-control tools have already fixed. As Iris said on IRC yesterday, using branches means that you don't need a shim.
CrawlCycle
Posts: 27
Joined: November 20th, 2020, 5:07 am

Re: Draft of the plan to refactor the preprocessor

Post by CrawlCycle »

Thanks a lot for reading the long plan and providing feedback.

I agree that the shim is unnecessary since the old and new preprocessor have the same public API and only a few source files #include the preprocessor. There is no need for a compatibility layer in this case. I have deleted the steps regarding the shim.

I want Github to benchmark the new preprocessor in the refactor branch against a clone of the preprocessor in the master branch.

If the new preprocessor and the original preprocessor don't coexist in the refactor branch, Github has to:
  • build the refactor branch
  • switch branch (to keep the modified source lists)
  • build the master branch
  • benchmark and compare
Currently, building a branch takes 45 minutes to 1 hour. If the master and refactor branches are not too different, the second build takes 5-10 minutes.
The developer would also want to run the benchmark locally before pushing a commit. That means building two branches locally.
I was thinking about reducing the build time by stripping down the build system (source lists) to the bare minimum required to build the preprocessor.

Building two branch means the master and refactor branches both build "benchmark.cpp" and "functional_test.cpp". Benchmark.cpp reports the timing for each branch, "functional_test.cpp" writes the output of the preprocessor to a file for each branch. After that the developer or Github runs a script to compare the timing and output of the preprocessors in the two branches. The ".cfg" files still need to stay the same for both the refactor and master branch. I was thinking about putting the ".cfg" files into another repository and let github download it.

I think I get part of what you mean now: [p-clone] is not necessary. However, comparing the clone of the preprocessor ([p-clone]) in the refactor branch with the new preprocessor ([p-new]) means the developer don't have to switch branch and rebuild when they run the benchmark locally.

If we directly compare the preprocessor in the master branch ([p-master]) with the new preprocessor in the refactor branch ([p-new]), then the developer need to do the following right after they commit locally:
  • Run benchmark
  • Stash change
  • Switch branch
  • Build and run benchmark again
  • Compare the two benchmarks by running a script...
If they want to run the benchmark before they commit, they have to stash their change. All these steps required to run the benchmark locally seems tedious ...

I still don't quite get: "The whole "the serial workflow over one repository" section seems to be working around problems that source-control tools have already fixed." Do you mind to explain how to make this even simpler? Thanks!
Last edited by CrawlCycle on November 27th, 2020, 6:26 pm, edited 1 time in total.
User avatar
Celtic_Minstrel
Developer
Posts: 2166
Joined: August 3rd, 2012, 11:26 pm
Location: Canada
Contact:

Re: Draft of the plan to refactor the preprocessor

Post by Celtic_Minstrel »

With respect to local testing, it's possible to check out two branches simultaneously. I don't remember exactly how to set that up off the top of my head, but previously I had the master and 1.12 branches checked out at the same time. It's the "git-worktree" command that octalot mentioned on IRC.
Author of The Black Cross of Aleron campaign and Default++ era.
Former maintainer of Steelhive.
Post Reply