The WLB data used for the segmentation experiments reported in Read et al. (2012) was created from the WeSearch Data Collection (http://moin.delph-in.net/WeSearch). The original data retains some (normalised) markup that is considered relevant for parsing; we stripped this markup for these experiments. To produce the A and B versions of the unsegmented text, we ran the following script. It inserts blank lines at paragraph breaks only (version A), or at paragraph breaks and also after blockquotes, headings and list items (version B):

    scripts/make_wdc.sh
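The following minimal Python sketch illustrates how versions A and B differ. It is not scripts/make_wdc.sh and does not read the WDC markup. It assumes that the markup has already been stripped and that each block carries a hypothetical kind label (`paragraph`, `blockquote`, `heading` or `list_item`). The names `render` and `SEPARATE_AFTER` are illustrative only.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (kind, text) with markup already stripped.
# The real WDC markup and the real script are not reproduced here.
Block = Tuple[str, str]

# Block kinds after which a blank line is inserted, per output version.
SEPARATE_AFTER = {
    "A": {"paragraph"},
    "B": {"paragraph", "blockquote", "heading", "list_item"},
}


def render(blocks: Iterable[Block], version: str) -> str:
    """Join stripped blocks, adding a blank line after each block whose
    kind is a boundary for the given version ("A" or "B")."""
    boundaries = SEPARATE_AFTER[version]
    out: List[str] = []
    for kind, text in blocks:
        out.append(text)
        if kind in boundaries:
            out.append("")  # blank line marks a segment boundary
    # Remove the trailing blank line(s) so the output ends with text.
    while out and out[-1] == "":
        out.pop()
    return "\n".join(out)
```

Consider a heading followed by a paragraph and two list items. Version A keeps the heading and the list items attached to neighbouring text. Version B separates every one of these blocks with a blank line, so its unsegmented text contains more explicit boundaries.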