Wikipedia:WikiProject TypoScan

From Wikipedia, the free encyclopedia
TypoScan
Developer(s): TypoScan Team
Written in: C#/SQL backend
Operating system: (Plugin for AutoWikiBrowser)
Available in: English
Type: Spell checkers
License: GPL

WikiProject TypoScan maintains a list of all articles with known typos in an SQL database. It combines the typo-finding and typo-fixing power of RegExTypoFix with the automation and framework of AutoWikiBrowser to assign blocks of typos to users to fix.

It's crazy ambitious, and that's why it's going to work. - mboverload@

What is it?

At its core, TypoScan is a list of tens of thousands of articles that contain typos according to RegExTypoFix, a sister project to TypoScan. This list is stored in a database and assigned to users in 100-article blocks through AutoWikiBrowser. These blocks do not imply any kind of obligation: if you skip some of the articles, they are just put back in the pool. People are free to grab 100 articles and only do 10 of them. The progress is tracked either way. Cool!

Usage Guide

Any user with AWB can use TypoScan. It's a plugin!

First, load the plugin from the top-line menu: Plugins --> Load --> TypoScan.dll.

In AWB, select "TypoScan" from Make list and click "Make list"; you will be given 100 articles. Turn on RegExTypoFix and select the option to skip if no typos are fixed. It is probably also worth enabling general fixes, auto tag, and any similar custom settings of your own. These can then be applied to articles alongside the typo fixes, improving them further at the same time.

Every 25 pages, or when you close AWB, your results will AUTOMAGICALLY be uploaded to the TypoScan point tracking server at toollabs:awb/typoscan. You can manually upload your progress by going to Plugins > TypoScan plugin > Upload.
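The upload rule above (every 25 pages, on close, or on demand) can be sketched roughly as follows. This is an illustration in Python, not the plugin's actual C# code; the `ProgressTracker` class, its method names, and the `upload` callable are all hypothetical.

```python
class ProgressTracker:
    """Collects finished article IDs and uploads them in batches (hypothetical sketch)."""

    def __init__(self, upload, batch_size=25):
        self.upload = upload          # assumed: any callable that accepts a list of article IDs
        self.batch_size = batch_size  # 25 pages, as described in the usage guide
        self.pending = []

    def mark_finished(self, article_id):
        """Record one processed article and upload once a full batch is pending."""
        self.pending.append(article_id)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Upload whatever is pending; stands in for the manual upload and the upload on close."""
        if self.pending:
            self.upload(list(self.pending))
            self.pending.clear()


uploads = []                                  # stands in for the tracking server
tracker = ProgressTracker(uploads.append)
for article_id in range(1, 31):               # process 30 articles
    tracker.mark_finished(article_id)
tracker.flush()                               # what closing AWB would trigger
```

In this run the first 25 articles go up automatically as one batch and the remaining 5 are sent by the final `flush()`.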

Don't worry if you check out a list of articles and cannot process them all. If AWB does not report within 3 hours about their status, the database server will automatically put them back in the pool. There are tens of thousands of articles to go through, so don't worry about it! Bear in mind that if you make a list and leave it for a reasonable period of time, it's probably worth clearing your ListMaker and taking a new list.
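As a rough illustration of the check-out and timeout behaviour described above, here is a minimal in-memory sketch in Python. The real backend is an SQL database; the `ArticlePool` class and its method names are hypothetical, and the 3-hour timeout is simply the figure quoted in this guide.

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(hours=3)   # figure taken from the usage guide
BLOCK_SIZE = 100               # articles handed out per list


class ArticlePool:
    """In-memory stand-in for the TypoScan database (hypothetical sketch)."""

    def __init__(self, titles):
        # title -> time it was checked out, or None if it is still in the pool
        self.checked_out = {title: None for title in titles}
        self.finished = set()

    def check_out(self, now, count=BLOCK_SIZE):
        """Hand out up to `count` articles that are unclaimed or whose claim has expired."""
        block = []
        for title, stamp in self.checked_out.items():
            if title in self.finished:
                continue
            if stamp is None or now - stamp > TIMEOUT:
                block.append(title)
                if len(block) == count:
                    break
        for title in block:
            self.checked_out[title] = now   # timestamp the new claim
        return block

    def check_in(self, titles):
        """Mark articles as finished so they are never handed out again."""
        self.finished.update(titles)


pool = ArticlePool([f"Article {i}" for i in range(250)])
start = datetime(2012, 4, 1, 12, 0)

first = pool.check_out(start)                        # one user grabs 100 articles
pool.check_in(first[:10])                            # ...but only finishes 10 of them
second = pool.check_out(start + timedelta(hours=1))  # another user gets the next 100
third = pool.check_out(start + timedelta(hours=4))   # the first user's claim has expired
```

Here `third` starts with the 90 articles the first user never reported on, because their claim is more than 3 hours old, and is then topped up with articles nobody has claimed yet.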

Please note that only articles added via the Make List "TypoScan" function will be recorded back to the database. If you have to close AWB and have saved the article list, the saved list is not linked in with the TypoScan plugin. Please just clear the list and start with a new list.

Roadmap

TypoScan Roadmap

  • Done | Do the initial scan of the Wikipedia database extract with DBScanner against the list of typos in RegExTypoFix
  • Done | Produce a basic SQL backend to track progress
  • Done | Add initial list of articles with typos to MySQL database
    • Done | Build a small application to take the article list from the database scan and add it to the database (takes ~4 seconds to populate nearly 70k articles into a local database on a reasonable spec server)
  • Done | Work with the AWB developers on integration
    • Done | IListProvider AWB plugin written/modified to parse the XML output generated from the database in PHP, giving a list of articles to work on (known as a workload)
    • Give the user 100 articles to process per workload (more than one workload can be collected before processing; the articles are just appended to the list)
  • Done | Produce a check in/check out/timeout system to track what has and hasn't been typo fixed
    • Each article is timestamped ("checked out") in the database when a list containing it is requested
    • If the timestamp is more than 2 hours old and the article is not marked as finished, it will be pulled for another user
  • Done | Find a way for the client to upload the status to the server (check in/finished)
    • Articles (article IDs) can be posted back to the script and marked as finished
  • Done | Decide when to write the status: at intervals of time, after a number of edits, or at the end of the program
    • Can be done on demand by using the Plugins menu
    • rev 3169: Automatically done when the program is closing, if there are articles to be submitted
    • rev 3170: Automatically done every 25 finished articles
  • Not done | Build a small application to "sync" a new article list from a new dump with the database
    • List of articles to be removed and added (can be done based on ListComparer and the already written application)
    • Periodically it's probably worth clearing the database
  • Not done | Integrate false-positive reporting with AWB. Then use this reporting to find regular expressions that don't produce good results.
    • Typo stats have been suggested for this reason
  • Done | Expand plugin and DB for other projects
  • Done | Log whether edited or ignored/skipped
    • The reason is also logged to the database. Stats are included to show statistics
  • Not done | Add, as a plugin expansion, a way for DBScanner to add straight to the TypoScan DB
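
The roadmap mentions that the plugin parses XML output from the server to obtain a workload. The actual XML schema is not given on this page, so the sketch below assumes a purely hypothetical layout (a `workload` root with `article` elements carrying an `id` attribute) just to show the idea, using Python's standard `xml.etree.ElementTree` module rather than the plugin's own C# code.

```python
import xml.etree.ElementTree as ET

# Hypothetical workload document; the real TypoScan XML format may differ.
SAMPLE = """<workload>
  <article id="101">Example article</article>
  <article id="102">Another article</article>
</workload>"""


def parse_workload(xml_text):
    """Return (article_id, title) pairs from a workload document."""
    root = ET.fromstring(xml_text)
    return [(int(node.get("id")), node.text.strip())
            for node in root.iter("article")]


workload = parse_workload(SAMPLE)
for article_id, title in workload:
    print(article_id, title)
```

Keeping the article ID alongside the title matters because, per the roadmap, finished articles are posted back to the server by ID.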

Recurring tasks

Latest dump: April 2012

  • Download latest database
  • Rescan latest database
  • Compare results to last scan
    • List of articles to be removed
    • List of articles to be added
  • Merge results into SQL database
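
The compare step above boils down to two set differences between the previous scan and the latest one. A minimal sketch in Python, assuming each scan is just a collection of article titles (the `compare_scans` function name is hypothetical):

```python
def compare_scans(previous, latest):
    """Return (to_remove, to_add) given the article titles from two scans."""
    previous, latest = set(previous), set(latest)
    to_remove = sorted(previous - latest)   # no longer flagged in the new dump
    to_add = sorted(latest - previous)      # newly flagged in the new dump
    return to_remove, to_add


to_remove, to_add = compare_scans(
    ["Alpha", "Beta", "Gamma"],   # last scan
    ["Beta", "Gamma", "Delta"],   # latest scan
)
print("remove:", to_remove)
print("add:", to_add)
```

Articles present in both scans are left alone, so any progress already recorded for them in the database is not disturbed by the merge.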

Participants

Please feel free to add yourself here, and to indicate any areas of particular interest.

  1. Reedy (WikiProject Lead / Lead Developer)
  2. MaxSem (Developer)
  3. mboverload@ (Founder / Wikipedia presence manager / Public Relations)
  4. ·Add§hore·
  5. Dspradau
  6. Harryboyles
  7. Closedmouth
  8. Rjwilmsi
  9. Brenont
  10. ThaddeusB (talk · contribs) (fixing typos for now, possibly project development in the future) 07:35, 30 November 2008 (UTC)
  11. Bwilkins (talk · contribs) 23:03, 1 December 2008 (UTC)
  12. Robert Skyhawk (talk · contribs) 00:11, 8 December 2008 (UTC)
  13. Dillard421 (talk · contribs)
  14. Arbitrarily0 (talk · contribs)
  15. Shirulashem (talk · contribs)
  16. gracefool (talk · contribs) 10:56, 23 June 2009 (UTC)
  17. Yotcmdr (talk)
  18. ZsinjTalk
  19. Aphrodite4497 (talk · contribs)
  20. Marek69 (talk · contribs)
  21. Darkwind (talk · contribs)
  22. Allmightyduck (talk · contribs)
  23. bender235 (talk · contribs)
  24. Afaber012 (talk · contribs)
  25. GoingBatty
  26. ChrisGualtieri
  27. Tito Dutta
  28. Inks.LWC (talk · contribs)
  29. Mjs1991 (talk · contribs)
  30. Tuvok[T@lk/Improve]
  31. Breno (talk · contribs)
  32. AnthonyW90 (talk · contribs) (interested in fixing typos, in particular taking on articles with multiple typos)
  33. Tentinator (talk · contribs)
  34. Jamesmcmahon0 (talk · contribs)
  35. T24boo (talk · contribs)
  36. Clarkcj12 (talk · contribs)
  37. Faizan (talk · contribs)
  38. SecretName101 (talk) 22:46, 28 December 2014 (UTC)
  39. Creativecreatr (talk · contribs) (I am interested in working on fixing typos) 09:40, 26 May 2020 (UTC)
  40. Lakun.patra (talk · contribs)
  41. Bop34 (talk · contribs)
  42. Snoozebug (talk · contribs) (interested in working on fixing typos, especially articles with multiple) 15:29, 28 November 2023 (UTC)