I tried to import a large XML dump from Wikipedia and failed. The automatic import feature of MediaWiki failed and generated numerous error messages. After Googling around for a long time, I came up with this solution:
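1. Remove the <restrictions>...</restrictions> tags from your XML file. Some version inconsistencies appear to cause this problem.
To remove the tags, run this command in a shell. sed prints to standard output and leaves the input file untouched, so redirect the result to a new file (test_clean.txt is just an example name):
sed 's/<restrictions>.*<\/restrictions>//g' test.txt > test_clean.txt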
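To check that the substitution does what is expected before running it on a multi-gigabyte dump, it can be tried on a tiny stand-in file first. This is only a sketch: sample.xml and its contents are made up for illustration, and it assumes each <restrictions> element sits on a single line, as in the dumps I have seen.

```shell
# Build a small stand-in file (hypothetical content, not a real dump).
printf '%s\n' \
  '<page>' \
  '  <title>Test</title>' \
  '  <restrictions>edit=sysop</restrictions>' \
  '</page>' > sample.xml

# Same substitution as above, written to a new file.
sed 's/<restrictions>.*<\/restrictions>//g' sample.xml > sample_clean.xml

# Count the lines that still contain the tag; grep exits non-zero when
# the count is 0, hence the "|| true".
grep -c '<restrictions>' sample_clean.xml || true
```

If the count is not zero, the tag probably spans several lines in your file and the pattern needs adjusting.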
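2. For large XML files, make the following change in includes/parser/Preprocessor_DOM.php. The value 1<<19 (524288) is libxml's LIBXML_PARSEHUGE option, which lifts the parser's built-in size limits.
Around line 107 (the exact line may differ between MediaWiki versions), change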
$result = $dom->loadXML( $xml );
to
$result = $dom->loadXML( $xml, 1<<19 );
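Because the line number is not reliable across versions, the same edit can be made by matching the statement itself. The following is a sketch only: patch_preprocessor is a hypothetical helper of mine, not something shipped with MediaWiki, and it assumes the statement appears with exactly the spacing shown above. It is demonstrated here on a stand-in file.

```shell
# patch_preprocessor FILE: hypothetical helper, not part of MediaWiki.
# Rewrites the loadXML call in place and keeps the original as FILE.bak.
patch_preprocessor() {
  sed -i.bak 's/\$dom->loadXML( \$xml );/$dom->loadXML( $xml, 1<<19 );/' "$1"
}

# Demonstration on a stand-in file; in a real installation, point the
# helper at includes/parser/Preprocessor_DOM.php instead.
printf '%s\n' '<?php' '$result = $dom->loadXML( $xml );' > Preprocessor_sample.php
patch_preprocessor Preprocessor_sample.php

# Show the patched line for a visual check.
grep -n 'loadXML' Preprocessor_sample.php
```

If grep still shows the old call, the spacing in your copy differs and it is safer to edit the file by hand.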
3. Install the ParserFunctions extension in your wiki. It is needed to evaluate the conditional logic (such as #if and #switch) used in most of the templates.
4. Run importDump.php from the command line on the server (the script is in the wiki's maintenance directory):
php importDump.php <filename.xml>
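Before starting a long import it can help to know roughly how many pages the file holds. A rough sketch, assuming each <page> opening tag is on its own line as in the standard dump layout; count_pages and import_sample.xml are names made up for this example.

```shell
# count_pages FILE: hypothetical helper; counts lines with a <page> opening tag.
count_pages() {
  grep -c '<page>' "$1"
}

# Tiny stand-in file with two pages (made-up content).
printf '%s\n' \
  '<mediawiki>' \
  '<page>' '<title>A</title>' '</page>' \
  '<page>' '<title>B</title>' '</page>' \
  '</mediawiki>' > import_sample.xml

count_pages import_sample.xml

# With a real dump, run the import from the wiki's maintenance directory:
#   php importDump.php /path/to/filename.xml
```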
The problem with this is that there are simply too many templates (several GB, apparently, according to the Wikipedia dumps website) and most of them are useless for any given project. The real problem is therefore isolating and exporting the relevant ones in the first place.
One practical way to solve this problem (a rather random heuristic, admittedly) is to export a handful of long, general Wikipedia articles that have a large contributor base and will hopefully cover many of the most popular and useful templates.
For example, for my translation project, I made a list of such “hub” pages:
Mathematics
America
Biology
Physics
Religion
Human
Language
Farsi
Persian
Literature
Book
Novel
Poem
Poetry
Translation
Linguistics
English.
Applying the steps described above to the XML file exported for this list, one can hopefully process and import a good many useful templates.
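The export itself can also be scripted. The sketch below writes the hub list to a file and defines a helper that posts it to Wikipedia's Special:Export page with the option to include templates. Treat it as an assumption-laden example: export_pages, hub_pages.txt and hub_export.xml are names I made up, the parameters (pages, templates, curonly, action=submit) are the ones I understand Special:Export to accept, and the server may throttle or reject scripted requests.

```shell
# One title per line: the hub pages listed above.
cat > hub_pages.txt <<'EOF'
Mathematics
America
Biology
Physics
Religion
Human
Language
Farsi
Persian
Literature
Book
Novel
Poem
Poetry
Translation
Linguistics
English
EOF

# export_pages TITLES_FILE OUTPUT_FILE: hypothetical helper.
# Sends the titles as the "pages" field, asks for current revisions only
# and for the templates the pages use, and saves the XML to OUTPUT_FILE.
export_pages() {
  curl -sS \
    --data-urlencode "pages@$1" \
    -d 'templates=1' -d 'curonly=1' -d 'action=submit' \
    'https://en.wikipedia.org/w/index.php?title=Special:Export' \
    -o "$2"
}

# Uncomment to fetch (needs network access):
# export_pages hub_pages.txt hub_export.xml
```

The resulting file can then go through steps 1 to 4 like any other dump.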
