TTMServer performance and coverage issues
Open, Medium, Public, 1 Estimated Story Points

Assigned To

None

Authored By

	Nikerabbit
	Jun 3 2015, 11:25 AM

Description

The latest fixes to TTMServer, made some months ago, are not enough. During the translation rally at translatewiki.net, translation memory was using too much CPU time. In addition, there have been reports and observations that suggestions are not found, for example when translating the tech news, which has many repeating parts.

During the Lyon hackathon I spoke with David Chan, who suggested replacing the current FuzzyLikeThis query with a check of some n-grams from the beginning and end of the strings. Those need to be stored separately at indexing time unless there is a way to instruct ES to do it for us. In any case, short strings of one to three words need special attention.
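
To make the n-gram idea concrete, here is a minimal Python sketch of what could be stored at indexing time. It is not the Translate implementation: the names `edge_ngrams` and `is_candidate`, the gram length of 3 and the count of 4 are all assumptions chosen for illustration. Very short strings are kept whole, as one possible way of giving them special treatment.

```python
def edge_ngrams(text: str, n: int = 3, count: int = 4) -> dict[str, list[str]]:
    """Return character n-grams from the start and the end of a string.

    Hypothetical helper: n and count are illustrative, not tuned values.
    """
    # Normalize case and whitespace so trivial differences do not matter.
    normalized = " ".join(text.lower().split())
    if len(normalized) <= n:
        # Strings too short to split are stored as a single gram.
        return {"prefix": [normalized], "suffix": [normalized]}
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    return {"prefix": grams[:count], "suffix": grams[-count:]}


def is_candidate(query: str, stored: str) -> bool:
    """Cheap pre-filter: require shared grams at both ends of the strings."""
    q, s = edge_ngrams(query), edge_ngrams(stored)
    shares_prefix = bool(set(q["prefix"]) & set(s["prefix"]))
    shares_suffix = bool(set(q["suffix"]) & set(s["suffix"]))
    return shares_prefix and shares_suffix
```

Only strings passing `is_candidate` would then need their full contents fetched for scoring. Requiring overlap at both ends is a deliberately strict choice; a real query would probably need a softer match.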

It seems that the current performance bottleneck is fetching too many string contents for comparison and scoring, not the scoring itself.

Details

	Subject	Repo	Branch	Lines +/-
	Use Filtered query instead of post_filter for TTMServer suggestion.	mediawiki/extensions/Translate	master	+9 -4

Related Objects

Status	Subtype	Assigned	Task
Resolved		None	T195760 Language Annual Plan 2018-2019
Resolved		Nikerabbit	T204818 Language tools maintenance intervention: Improve the quality of translations for Translate extension
Open		None	T101236 TTMServer performance and coverage issues
Declined		dcausse	T177774 Refactor Elastic TTM Server implementation to allow experimenting new queries without breaking production usage
Resolved	BUG REPORT	Nikerabbit	T249906 Translation memory suggestion page sources are sometimes duplicated
Resolved		Nikerabbit	T264730 Explain the current translation memory architecture
Resolved		Nikerabbit	T267030 Create development and testing environment for translation memory
Open		None	T267031 Exports and imports for translation memory
Open		None	T267032 Benchmarking script for translation memory

Event Timeline

There are a very large number of changes, so older changes are hidden.

We should probably let users know that starting next week, thanks to @Phoenix303, they should get faster translation suggestions and that they should report any weirdness.

Phoenix303 mentioned this in rMEXT6db2ffca54d4: Updated mediawiki/extensions Project: mediawiki/extensions/Translate….Jun 25 2015, 2:16 PM

https://gerrit.wikimedia.org/r/219388 (branch master): WMF-deploy-2015-06-30_(1.26wmf12)

• gpaumier moved this task from To Triage to Announce in next Tech/News on the User-notice board.Jun 25 2015, 3:10 PM

• gpaumier moved this task from Announce in next Tech/News to In current Tech/News draft on the User-notice board.Jun 25 2015, 3:58 PM

• gpaumier moved this task from In current Tech/News draft to Recently announced in Tech/News on the User-notice board.Jun 26 2015, 9:19 PM

Legoktm removed a subscriber: • Forrestbot.Jun 29 2015, 5:49 PM

• gpaumier moved this task from Recently announced in Tech/News to Already announced/Archive on the User-notice board.Jul 2 2015, 8:02 PM

Arrbee removed Nikerabbit as the assignee of this task.Jul 21 2015, 9:27 PM

Arrbee removed a project: LE-Sprint-88.

I do not experience a notable speedup with TM suggestions.

At least when translating the weekly tech newsletter in MW, invariant or nearly invariant strings longer than, say, 5 characters are never found in TM. Maybe this is a separate issue that has to be investigated on its own.

In T113711#1727543, @dcausse wrote:

I think one of the problems with this function is that it uses very slow Elasticsearch functionality:

fuzzy like this: deprecated and will be removed in Elasticsearch 2.0 due to performance issues

function score on all docs: the Levenshtein distance is applied to all docs returned by the fuzzy like this query. We could optimize this part by running the function score inside the rescore phase, which would let us compute the distance on a limited number of docs thanks to the rescore window.

Concerning the fuzzy like this query, I would suggest investigating another approach based on character n-grams.
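
The two suggestions above can be sketched together in plain Python. This is an in-memory illustration only, not Elasticsearch code and not the TTMServer implementation: a cheap character n-gram overlap ranks all strings, and the more expensive Levenshtein distance is computed only for the top `window` candidates, mimicking a rescore window. The function names, the window of 50 and the cutoff of 0.75 are assumptions.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Set of character n-grams; short strings yield themselves."""
    if len(text) <= n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def suggest(query: str, memory: list[str], window: int = 50,
            cutoff: float = 0.75) -> list[tuple[float, str]]:
    """Rank by n-gram overlap, then rescore only the top `window` strings."""
    query_grams = char_ngrams(query)
    # Phase 1: cheap ranking over everything.
    ranked = sorted(memory,
                    key=lambda s: len(query_grams & char_ngrams(s)),
                    reverse=True)
    # Phase 2: expensive edit distance on a bounded number of candidates.
    results = []
    for stored in ranked[:window]:
        longest = max(len(query), len(stored)) or 1
        similarity = 1 - levenshtein(query, stored) / longest
        if similarity >= cutoff:
            results.append((similarity, stored))
    return sorted(results, reverse=True)
```

The point of the design is that the number of edit-distance computations is bounded by `window` regardless of how large the memory grows, at the cost of missing matches that the cheap first phase ranks too low.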

• Elitre subscribed.Oct 26 2015, 12:20 PM

I'm having problems with the December issue of the VE newsletter. I was able to get the translation memory just once, but then fixing something on the source code and then going back to translate isn't bringing back the TM for me :/ Anything I could do?

In T101236#1894652, @Elitre wrote:

I'm having problems with the December issue of the VE newsletter. I was able to get the translation memory just once, but then fixing something on the source code and then going back to translate isn't bringing back the TM for me :/ Anything I could do?

On what kind of translation units were you looking for suggestions? I only see very long units which change every time, or very short recurring items (mostly the headers), for which TM currently works for me.

Nikerabbit moved this task from translation aids to TTMServer on the MediaWiki-extensions-Translate board.Jan 22 2016, 3:12 PM

Even the headers were failing for me.

Translations:VisualEditor/Newsletter/2016/February/38/it AFAICT is the same as in the previous newsletter, and it isn't suggested by the system ATM.

Yes, it's identical: https://meta.wikimedia.org/?diff=15373013&oldid=15166609
The unit is rather long and the "Loading..." for translation memory suggestions systematically times out after 10 seconds for me.

Amire80 lowered the priority of this task from High to Medium.Mar 30 2016, 9:09 AM

Amire80 edited projects, added Language-Engineering April-June 2016; removed WMF-deploy-2015-06-30_(1.26wmf12), Patch-For-Review.

Amire80 moved this task from Backlog to Translate on the Language-Engineering April-June 2016 board.

For clarification, the normal priority is relative to other tasks in Language-Engineering April-June 2016 and does not mean this task would suddenly not be important.

Nikerabbit added a parent task: T124423: Rewrite Fuzzy Like query for Translate to use with ES > 2.Apr 13 2016, 6:54 AM

Nikerabbit mentioned this in T132076: TTMServer should support multi-dc configuration.Apr 20 2016, 2:21 PM

Nikerabbit mentioned this in T124423: Rewrite Fuzzy Like query for Translate to use with ES > 2.May 12 2016, 8:35 AM

EBernhardson removed a parent task: T124423: Rewrite Fuzzy Like query for Translate to use with ES > 2.May 25 2016, 5:32 PM

EBernhardson subscribed.Jun 13 2016, 4:25 PM

Restricted Application added a project: Discovery-Search. · View Herald TranscriptJun 13 2016, 4:25 PM

Arrbee mentioned this in Language-Team.Sep 26 2016, 9:30 AM

This doesn't contain actionable work for Discovery-Search so I have removed that tag; let us know if you need input on this task, and we will happily provide it.

I think this is a bigger issue for e.g. Tech News than one would first assume.

We have a couple of items that are specifically designed to be exactly the same every week, to make it easier for the translators – they are more complicated the first time you translate them, because you might need to figure out how to best represent dates in your language, but it will save time and effort in the long run. Or so goes the theory. But if they don't, then you have complicated items where you either have to go back to another issue and copy and paste, remember what to do with the code or be familiar enough with what it's doing that you realize you can simply remove it and exchange it for dates in normal text.

Examples:

<translate><!--T:20-->
You can join the next meeting with the VisualEditor team. During the meeting, you can tell developers which bugs you think are the most important. The meeting will be on [<tvar|time>http://www.timeanddate.com/worldclock/fixedtime.html?hour=20&min=00&sec=0&day=14&month=02&year=2017</> {{#time:<tvar|defaultformat>j xg</>|<tvar|date4>2017-02-14</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}} at 20:00 (UTC)]. See [[<tvar|link>mw:VisualEditor/Weekly triage meetings</>|how to join]].</translate>

<translate><!--T:41-->
The [[<tvar|version>mw:MediaWiki 1.29/wmf.12</>|new version]] of MediaWiki will be on test wikis and MediaWiki.org from {{#time:<tvar|defaultformat>j xg</>|<tvar|date1>2017-02-14</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}}. It will be on non-Wikipedia wikis and some Wikipedias from {{#time:<tvar|defaultformat>j xg</>|<tvar|date2>2017-02-15</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}}. It will be on all wikis from {{#time:<tvar|defaultformat>j xg</>|<tvar|date3>2017-02-16</>|<tvar|format_language_code>{{CURRENTCONTENTLANGUAGE}}</>}} ([[<tvar|calendar>mw:MediaWiki 1.29/Roadmap</>|calendar]]).</translate>

I translate the tech news almost every week, and it's quite frustrating when you can't get the correct suggestions from the translation memory. Instead you have to go to the previous tech news and copy and paste the correct content from there. In my opinion this should get higher priority, as it affects many users globally and makes translating time-consuming.

dcausse added a project: Epic.Oct 9 2017, 2:16 PM

dcausse created subtask T177774: Refactor Elastic TTM Server implementation to allow experimenting new queries without breaking production usage .Oct 9 2017, 2:20 PM

There are a bunch of reports again that TTMServer doesn't work. From the API response for a frequently appearing long paragraph I can see it is spending a lot of time in TTMServer (21.37 seconds) without returning any results. Assuming nothing changed in the Elasticsearch cluster, it looks like we have crossed some kind of threshold.

@Nikerabbit What are the long-term implications of this, if we have indeed crossed said threshold? What can we expect?

(If possible to say, I mean, I do understand that "some kind of threshold" isn't very exact.)

So, the only recent change in Translate is c55a8ee. I don't see how it could cause this, but maybe @dcausse could know that, or whether there have been any changes in the Elasticsearch cluster that could cause TTMServer to work poorly.

If this is not caused by any external changes, it means that the algorithm has stopped working (what I called the threshold) for some reason such as:

The amount of data has increased to the extent that the search is now taking too long and timing out.
The amount of data has increased to the extent that the algorithm, which only loads a subset of it, incorrectly decides based on the first subset not to load a second, larger subset.
The amount of data has increased to the extent that the algorithm loads more data (but still only a subset) and fails to find matches because it loads a poorly selected and/or too small subset of the data.

I won't be able to debug this extensively until August. Once we understand the issue, we can attempt some small tweaks, if possible. If larger changes are required, I expect those could be started in FYQ2 at the earliest by me/Language team. Maybe earlier if we get help and/or TTMServer stops working completely and this task is re-prioritized.

dcausse added a project: Discovery-Search.Jul 3 2018, 7:56 AM

• EBjune moved this task from needs triage to Up Next on the Discovery-Search board.Jul 5 2018, 5:18 PM

I pulled latency numbers from the apiaction logs. Overall it doesn't look like performance has changed in any noticeable way in the last 90 days on the wmf prod cluster. The y axis here is milliseconds for all lines except n_requests, where it is an absolute count per day:

translation aids latency percentiles, Apr 6 - Jul 5 2018 (386×726 px, 72 KB)

Generated using the following HQL. This is the time for all translationaids requests, but at a glance it seems like the time spent is almost entirely in ttmserver.

SELECT date(concat_ws('-', YEAR, MONTH, DAY)) AS date,
       count(1) AS n_requests,
       percentile_approx(timespentbackend, array(0.5, 0.75, 0.95, 0.99, 0.999)) AS percentiles
FROM wmf_raw.apiaction
WHERE YEAR = '2018'
  AND params['action'] = 'translationaids'
  AND (params['prop'] IS NULL
       OR params['prop'] LIKE '%ttmserver%')
GROUP BY YEAR,
         MONTH,
         DAY

I'm noting for documentation purposes that multiple people are again complaining about translation memory not working when translating tech news.

debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.Sep 13 2018, 5:16 PM

Liuxinyu970226 awarded a token.Sep 14 2018, 1:13 PM

Liuxinyu970226 unsubscribed.

Pginer-WMF added a parent task: T204818: Language tools maintenance intervention: Improve the quality of translations for Translate extension.Sep 19 2018, 11:53 AM

EBernhardson edited projects, added Discovery-Search; removed Discovery-Search (Current work).Oct 2 2018, 5:41 PM

Pginer-WMF added a project: Language-Team (Language-2018-October-December).Dec 10 2018, 11:40 AM

Pginer-WMF edited projects, added Language-Team (Language-2019-January-March); removed Language-Team (Language-2018-October-December).Dec 27 2018, 11:13 AM

debt moved this task from Up Next to making others happy on the Discovery-Search board.Jan 29 2019, 6:44 PM

Pginer-WMF edited projects, added Language-Team (Language-2019-April-June); removed Language-Team (Language-2019-January-March).Mar 29 2019, 10:02 PM

Pginer-WMF edited projects, added Language-Team (Language-2019-July-September); removed Language-Team (Language-2019-April-June).Jul 9 2019, 1:48 PM

Nikerabbit mentioned this in T228834: Translation memory is slow to load .Jul 24 2019, 8:18 AM

Quiddity subscribed.Sep 5 2019, 9:10 PM

Pginer-WMF edited projects, added Language-Team (Language-2020-January-March); removed Language-Team (Language-2019-July-September).Dec 11 2019, 2:01 PM

Nikerabbit added a subtask: T249906: Translation memory suggestion page sources are sometimes duplicated.May 26 2020, 7:42 AM

Nikerabbit closed subtask T249906: Translation memory suggestion page sources are sometimes duplicated as Resolved.Jun 4 2020, 7:04 AM

If this is still needed, feel free to reopen.

CBogen closed this task as Declined.Aug 6 2020, 7:18 PM

abi_ mentioned this in T255886: Translation suggestions sometimes just won't load.Aug 18 2020, 11:07 AM

This task is part of our annual goals. The plan is to take a deep look at how this could either be improved incrementally or re-architected to solve the performance issues.

Nikerabbit merged a task: T255886: Translation suggestions sometimes just won't load.Aug 24 2020, 7:58 AM

Nikerabbit added subscribers: AlexisJazz, Tacsipacsi, Amire80 and 2 others.

CBogen moved this task from making others happy to watching / waiting on the Discovery-Search board.Aug 27 2020, 8:51 PM

In T101236#3037332, @Johan wrote:

I think this is a bigger issue for e.g. Tech News than one would first assume.

We have a couple of items that are specifically designed to be exactly the same every week, to make it easier for the translators – they are more complicated the first time you translate them, because you might need to figure out how to best represent dates in your language, but it will save time and effort in the long run. Or so goes the theory. But if they don't, then you have complicated items where you either have to go back to another issue and copy and paste

I made https://meta.wikimedia.org/wiki/Template:SALT to simplify this. You enter {{subst:SALT}} as the translation and it'll substitute the contents of that section from the issue of the previous week, or the week before that, or fallback content from the subpages of the template.

It's a workaround, but strangely faster than using the translation memory even when the translation memory DOES work. With the translation memory, you:

Have to wait for suggestions (no waiting for SALT)
Check if the top suggestion isn't a moronic one because the suggestions are served in random order (SALT will never insert outdated crap)
Go over to the right, click or tap the suggestion and be frustrated that the suggestion isn't loaded if you only managed to hit the box but not the text (SALT requires no selection of anything)
Go back to the left to publish the translation (never had to leave the left for SALT!)

This would be even better if a single button gadget could insert "{{subst:SALT}}", but I can't seem to figure this one out because the Translate extension seems to generate the form entirely with JS.

This would be even better if a single button gadget could insert "{{subst:SALT}}", but I can't seem to figure this one out because the Translate extension seems to generate the form entirely with JS.

https://gerrit.wikimedia.org/g/mediawiki/extensions/Translate/+/335a7dc01152d96627b3dce1d2104f129b82825d/hooks.txt#120 may help.

Pginer-WMF edited projects, added Language-Team (Language-2020-October-December); removed Language-Team (Language-2020-January-March).Oct 6 2020, 10:50 AM

Nikerabbit added a subtask: T264730: Explain the current translation memory architecture.Oct 6 2020, 10:56 AM

Nikerabbit closed subtask T264730: Explain the current translation memory architecture as Resolved.Oct 29 2020, 2:01 PM

Nikerabbit added a subtask: T267030: Create development and testing environment for translation memory.Nov 2 2020, 3:04 PM

Nikerabbit added a subtask: T267031: Exports and imports for translation memory.Nov 2 2020, 3:07 PM

Nikerabbit added a subtask: T267032: Benchmarking script for translation memory.Nov 2 2020, 3:10 PM

• Rileych added a project: Technical-Debt.Nov 23 2020, 3:27 PM

AlexisJazz mentioned this in T268274: Enable translating from any already translated language in Extension:Translate.Nov 23 2020, 3:47 PM

abi_ closed subtask T267030: Create development and testing environment for translation memory as Resolved.Nov 26 2020, 12:16 PM

For the first time in years, I got a translation memory suggestion for the “Latest tech news from…” unit. Has any work been done recently?
Maybe T308676: Elasticsearch 7.10.2 rollout plan?

In T101236#8226890, @Pols12 wrote:

For the first time in years, I got a translation memory suggestion for the “Latest tech news from…” unit. Has any work been done recently?
Maybe T308676: Elasticsearch 7.10.2 rollout plan?

Indeed, we did upgrade the cluster recently, but we did not expect any changes to translation memory. We haven't seen a noticeable performance improvement overall, but perhaps this upgrade had a very positive impact on translation memory performance (maybe WAND?).

Nikerabbit merged a task: T323856: Tux does not always show 100% matches that should be in translation memory.Dec 11 2022, 8:12 AM

Nikerabbit added a subscriber: Al12si.

Interesting feature request from last merged task:

Suggestions should not be weighted only by number of times used but also by date; recent translations should be given more weight.
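
One possible reading of that request, as a minimal Python sketch: multiply the usage count by an exponential recency decay. Nothing here comes from the Translate extension; the function name `suggestion_weight`, the multiplicative form and the 180-day half-life are assumptions made purely for illustration.

```python
from datetime import datetime, timedelta, timezone


def suggestion_weight(uses: int, last_used: datetime, now: datetime,
                      half_life_days: float = 180.0) -> float:
    """Weight a suggestion by how often and how recently it was used.

    Hypothetical formula: the weight halves every `half_life_days`.
    """
    # Clamp negative ages (clock skew) to zero.
    age_days = max((now - last_used).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)
    return uses * recency


now = datetime(2022, 12, 11, tzinfo=timezone.utc)
old_popular = suggestion_weight(4, now - timedelta(days=180), now)
new_rare = suggestion_weight(3, now, now)
```

With these example values, a translation used three times today outranks one used four times half a year ago. How quickly old translations should fade is a product decision the half-life parameter would have to encode.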

	F23374883: translation aids latency percentiles, Apr 6 - Jul 5 2018
	Jul 5 2018, 10:30 PM
