The WeScience data used for the segmentation experiments reported in
Read et al. (2012) was derived from the data of the WeScience project
(http://moin.delph-in.net/WeScience).

The original data retains some (normalised) markup that is considered
relevant for parsing. We stripped this markup for these experiments:

cat txt/ws0* txt/ws1{0,1,2,3} |scripts/makeWSgold.pl > segmented.txt

To produce the A and B versions of the unsegmented text, we re-ran the
original WeScience preprocessing scripts (available in SVN), altered to retain
paragraph breaks; the result is pre-AB.txt. We then stripped markup as above,
but added blank lines either at paragraph breaks only (version A), or at
paragraph breaks and after blockquotes, headings and list items (version B).

cat pre-AB.txt |scripts/makeA.pl > A/unsegmented.txt
cat pre-AB.txt |scripts/makeB.pl > B/unsegmented.txt
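
Since version B adds blank lines everywhere version A does, plus some further
positions, B/unsegmented.txt would be expected to contain at least as many
blank lines as A/unsegmented.txt. The following is a minimal sanity-check
sketch under that assumption. The helpers count_blank and check_ab are
hypothetical names and are not part of the distributed scripts; the sketch
also assumes that a blank line is a completely empty line.

```shell
# Hypothetical helpers, not part of the WeScience scripts.

# Print the number of completely empty lines in the file given as $1.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_blank() {
    grep -c '^$' "$1" || true
}

# Compare blank-line counts of version A ($1) and version B ($2).
# Returns 0 if B has at least as many blank lines as A, 1 otherwise.
check_ab() {
    a=$(count_blank "$1")
    b=$(count_blank "$2")
    if [ "$b" -ge "$a" ]; then
        echo "ok: A has $a blank lines, B has $b"
    else
        echo "unexpected: A has $a blank lines, B has only $b" >&2
        return 1
    fi
}
```

After generating both files, the check could be run as:
check_ab A/unsegmented.txt B/unsegmented.txt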