diff --git a/ningxia/README.md b/ningxia/README.md
new file mode 100644
index 0000000..7daeaa2
--- /dev/null
+++ b/ningxia/README.md
@@ -0,0 +1,46 @@
+## Ningxia scraping
+
+Zip of full article dump: [articles-ningxia.zip](./articles-ningxia.zip).
+
+There are, once again, files that are likely just links to PDFs:
+
+```console
+.rw-r--r-- 264 tlater 9 Apr 22:20 ./2016-08-17_738.txt
+.rw-r--r-- 180 tlater 9 Apr 22:20 ./2016-08-17_739.txt
+.rw-r--r-- 201 tlater 9 Apr 22:19 ./2017-03-16_676.txt
+.rw-r--r-- 394 tlater 9 Apr 22:19 ./2017-04-13_666.txt
+.rw-r--r-- 326 tlater 9 Apr 22:19 ./2017-04-21_662.txt
+.rw-r--r-- 204 tlater 9 Apr 22:19 ./2017-05-16_655.txt
+.rw-r--r-- 316 tlater 9 Apr 22:19 ./2017-06-19_645.txt
+.rw-r--r-- 187 tlater 9 Apr 22:18 ./2017-09-15_607.txt
+.rw-r--r-- 171 tlater 9 Apr 22:18 ./2018-03-08_551.txt
+.rw-r--r-- 174 tlater 9 Apr 22:17 ./2018-05-25_517.txt
+.rw-r--r-- 143 tlater 9 Apr 22:17 ./2018-06-08_512.txt
+.rw-r--r-- 216 tlater 9 Apr 22:17 ./2018-07-13_504.txt
+.rw-r--r-- 131 tlater 9 Apr 22:17 ./2018-08-10_479.txt
+.rw-r--r-- 198 tlater 9 Apr 22:16 ./2018-12-20_385.txt
+.rw-r--r-- 300 tlater 9 Apr 22:15 ./2019-02-15_359.txt
+.rw-r--r-- 241 tlater 9 Apr 22:15 ./2019-04-17_331.txt
+.rw-r--r-- 209 tlater 9 Apr 22:15 ./2019-05-21_309.txt
+.rw-r--r-- 264 tlater 9 Apr 22:15 ./2019-06-11_306.txt
+.rw-r--r-- 325 tlater 9 Apr 22:15 ./2019-06-11_307.txt
+.rw-r--r-- 306 tlater 9 Apr 22:15 ./2019-07-22_286.txt
+.rw-r--r-- 131 tlater 9 Apr 22:14 ./2019-09-05_266.txt
+.rw-r--r-- 264 tlater 9 Apr 22:14 ./2019-09-09_265.txt
+.rw-r--r-- 177 tlater 9 Apr 22:14 ./2019-11-19_231.txt
+.rw-r--r-- 203 tlater 9 Apr 22:13 ./2020-02-01_158.txt
+.rw-r--r-- 204 tlater 9 Apr 22:13 ./2020-03-01_151.txt
+.rw-r--r-- 158 tlater 9 Apr 22:12 ./2020-04-01_125.txt
+.rw-r--r-- 131 tlater 9 Apr 22:13 ./2020-04-01_126.txt
+.rw-r--r-- 182 tlater 9 Apr 22:13 ./2020-04-01_127.txt
+.rw-r--r-- 176 tlater 9 Apr 22:12 ./2020-04-17_95.txt
+.rw-r--r-- 398 tlater 9 Apr 22:12 ./2020-04-17_96.txt
+.rw-r--r-- 174 tlater 9 Apr 22:12 ./2020-05-12_72.txt
+.rw-r--r-- 151 tlater 9 Apr 22:12 ./2020-06-04_63.txt
+.rw-r--r-- 137 tlater 9 Apr 22:12 ./2020-06-10_59.txt
+.rw-r--r-- 161 tlater 9 Apr 22:11 ./2020-07-10_46.txt
+.rw-r--r-- 206 tlater 9 Apr 22:11 ./2020-07-17_41.txt
+.rw-r--r-- 189 tlater 9 Apr 22:11 ./2020-09-04_33.txt
+.rw-r--r-- 156 tlater 9 Apr 22:11 ./2020-09-07_30.txt
+.rw-r--r-- 201 tlater 9 Apr 22:11 ./2020-10-01_15.txt
+```
diff --git a/ningxia/articles-ningxia.zip b/ningxia/articles-ningxia.zip
new file mode 100644
index 0000000..b84f95a
Binary files /dev/null and b/ningxia/articles-ningxia.zip differ
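
A listing like the one in the README could be reproduced by filtering the extracted dump by file size. The sketch below is one possible way to do that, not necessarily how the listing above was produced. The 400-byte cutoff is an assumption taken from the listing (the largest file shown is 398 bytes), and `find_small_articles` is a hypothetical helper name. A small size only suggests that a file holds a bare PDF link; the contents still need to be checked by hand.

```python
from pathlib import Path

# Assumed cutoff: every file in the README listing is smaller than 400 bytes.
SIZE_THRESHOLD_BYTES = 400


def find_small_articles(directory, threshold=SIZE_THRESHOLD_BYTES):
    """Return (file name, size in bytes) for each .txt file below the threshold.

    Results are sorted by file name, which also sorts them by date because
    the names start with an ISO-style date.
    """
    small = []
    for path in sorted(Path(directory).glob("*.txt")):
        size = path.stat().st_size
        if size < threshold:
            small.append((path.name, size))
    return small


if __name__ == "__main__":
    # Run from inside the directory where the zip was extracted.
    for name, size in find_small_articles("."):
        print(f"{size:>5} {name}")
```

Running the script inside the extracted `articles-ningxia` dump prints one line per suspiciously small file, which can then be compared against the list in the README.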