Add Readme

main
Tristan Daniël Maat 2022-04-09 17:43:47 +01:00
parent 60d7eec53f
commit 9030da9a0c
Signed by: tlater
GPG Key ID: 49670FD774E43268
1 changed file with 23 additions and 0 deletions

Readme.md Normal file

@@ -0,0 +1,23 @@
# Province article scraping
A couple of scripts to scrape article text from various provinces for
a text analysis university course.
We need:
Qinghai
: pages 14-75
Ningxia
: pages 11-42
Shanxi
: pages 2-18
Xinjiang
: pages 10-20
The websites all have subtle differences, so each province simply
gets its own folder of scripts (the scripts are simple enough that
there's no need for deduplication or anything complex). Written in
Python/JS where necessary, for educational purposes.
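
As a rough illustration of how the page ranges listed above could drive
a per-province script, here is a minimal sketch. It is not taken from
the actual scripts: `PAGE_RANGES`, `pages_for`, `page_urls` and the URL
template are hypothetical names, and the ranges are assumed to be
inclusive on both ends.

```python
# Page ranges copied from the list above.
# Assumption: both the first and the last page are included.
PAGE_RANGES = {
    "Qinghai": (14, 75),
    "Ningxia": (11, 42),
    "Shanxi": (2, 18),
    "Xinjiang": (10, 20),
}


def pages_for(province):
    """Return the page numbers to scrape for one province."""
    first, last = PAGE_RANGES[province]
    return range(first, last + 1)


def page_urls(province, url_template):
    """Build one listing URL per page.

    `url_template` is a placeholder with a `{page}` field; every real
    site has its own URL scheme, which is why each province gets its
    own folder of scripts.
    """
    return [url_template.format(page=page) for page in pages_for(province)]


if __name__ == "__main__":
    for province in PAGE_RANGES:
        print(province, len(pages_for(province)), "pages")
```

Keeping the ranges in one small table like this is only one option;
since the sites differ, hard-coding them in each folder's script works
just as well.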