
Welcome emails: reserve control group
Closed, ResolvedPublic

Description

We want to reserve a control group of users in the welcome emails experiment who will not receive emails. This will allow us to see whether sending the emails has an impact on key newcomer metrics, like activation, retention, and productivity.

To do this, we should refrain from exporting a random 20% of the users that we would otherwise export. They will be users who have email addresses and have opted in (all the criteria listed in T303780), and therefore will be statistically the same as those who do get emails. In order to do the analysis, we'll need to know exactly which users are in this control group. An open question for the engineers is how to keep track of that.

Event Timeline

Change 773204 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Add mailing list question for eswiki

https://gerrit.wikimedia.org/r/773204

Change 775951 had a related patch set uploaded (by Kosta Harlan; author: Kosta Harlan):

[operations/mediawiki-config@master] GrowthExperiments: Start mailing list campaign on eswiki

https://gerrit.wikimedia.org/r/775951

@MMiller_WMF @nettrom_WMF we are able to define a percentage of users who are assigned to welcome survey question groups. I've made two identical groups with the mailing list, and 80% of users will be assigned to the one called T303240_mailinglist while the other 20% are assigned to T303240_mailinglist_control. The experiment group is then saved in the user's welcomesurvey-responses user property, so we will be able to query that later. Does that sound OK?

For clarity: I've put a block on change 775951 (https://gerrit.wikimedia.org/r/775951) until we are ready to actually start the campaign.

@MMiller_WMF just checking in on the group assignment question above.

In code review, @Urbanecm_WMF pointed out that the practical impact of the patch is:

  1. 80% of newly registered users get the mailinglist question, while 20% of users are in control
  2. eswiki is in the group where 80% of users get Growth features (T289786), while 20% are in control

That means that there would be some percentage of users who get the welcome survey with the mailing list question but do not get the homepage.

If it is important for the mailinglist experiment that, of the users who get the mailinglist, 100% also get the homepage, then the easiest thing to do would be to have eswiki become one of the wikis where homepage is the default experience for 100% of users. Otherwise, it should be straightforward in data analysis to pick out which users got the mailinglist welcome survey and also got the homepage.

@kostajh -- thanks for checking in. The issue that @Urbanecm_WMF pointed out seems like it would be an issue whether or not we have a control group for the emails -- even without a control group, there would be people who don't have the homepage who would then get emails encouraging them to go to their homepage and get started. What experience would those people encounter if they click on the link we created in T304805: Welcome emails: track homepage visits?

And a question about your control group implementation. It sounds like the plan is to randomize who has the option on their welcome survey of opting-in, right? As opposed to giving everyone the option to opt-in and then randomizing which users we actually export to be sent emails?

If we don't change eswiki to a wiki where 100% of new users get Growth features, then there will be some subset of users in T304805 who get the welcome survey but then do not have Special:Homepage enabled and are not directed there after completing the survey.

The plan is that there are two groups, T303240_mailinglist and T303240_mailinglist_control. Both groups have the mailing list checkbox item. 80% of users who create an account are placed in the T303240_mailinglist welcome survey group. 20% of users who create an account are placed in T303240_mailinglist_control group. The script we use to export will only export users in the T303240_mailinglist (80%) group. So the 20% of users who are in T303240_mailinglist_control will have seen the mailing list checkbox, but will not be included in exported data. For data analysis purposes, Morten will be able to look at the welcome survey response user property to look at behavior of users in both groups. Does that sound ok?
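A minimal sketch of the two-group plan described above, for illustration only. The group names and the `mailinglist` / `_group` keys come from this task; the hash-based bucketing and the function names `assign_group` and `should_export` are assumptions, not how GrowthExperiments actually assigns users or how the real export script is written.

```python
import hashlib

# Group names and percentages as described in this task.
GROUPS = [
    ("T303240_mailinglist", 80),
    ("T303240_mailinglist_control", 20),
]


def assign_group(user_id: int) -> str:
    """Hypothetical assignment: hash the user id into a bucket 0..99.

    The real extension may randomize differently; this only illustrates
    a stable 80/20 split.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    upper = 0
    for name, percentage in GROUPS:
        upper += percentage
        if bucket < upper:
            return name
    raise ValueError("group percentages must sum to 100")


def should_export(survey_response: dict) -> bool:
    """Export only opted-in users in the treatment (80%) group."""
    return (
        survey_response.get("_group") == "T303240_mailinglist"
        and survey_response.get("mailinglist") is True
    )
```

Users in `T303240_mailinglist_control` still see the checkbox and keep their stored response, so they can be compared with the treatment group during analysis.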

Thanks for the clarifications, @kostajh.

For the people who get the welcome survey but don't have Special:Homepage enabled, it sounds like if they open their welcome email and click on the link we're creating in T304805, they won't have access to the homepage, and instead they'll see this:

image.png (782×2 px, 604 KB)

If that's what would happen, we definitely don't want any users to experience it. As you said, this would not happen if we take the wiki to 100% for all accounts. I understand why that is the easiest and simplest solution. Just to understand the full range of options, are any of these also appealing solutions?

  • Keep any of the "Growth control group" people out of both the mailinglist and mailinglist_control groups, so that they are not receiving emails, nor are they in the control group. Downside: shrinks the number of people we are emailing.
  • Let the link we generate in T304805 turn the homepage on. Downside: mixes people into the email experiment who had a different experience than the others, of not having gotten brought to the Growth features immediately after account creation.

Yes, the plan with the two groups (T303240_mailinglist and T303240_mailinglist_control) sounds good. Thank you for explaining.

Change 773204 merged by jenkins-bot:

[operations/mediawiki-config@master] GrowthExperiments: Add mailing list question for eswiki

https://gerrit.wikimedia.org/r/773204

Mentioned in SAL (#wikimedia-operations) [2022-04-06T07:55:58Z] <kharlan@deploy1002> Synchronized wmf-config: Config: [[gerrit:773204|GrowthExperiments: Add mailing list question for eswiki (T303240 T305015)]] (duration: 00m 56s)

  • Keep any of the "Growth control group" people out of both the mailinglist and mailinglist_control groups, so that they are not receiving emails, nor are they in the control group. Downside: shrinks the number of people we are emailing.

Right. I think the export script would need to be modified to check that the user is in the Growth features group when exporting the emails.

  • Let the link we generate in T304805 turn the homepage on. Downside: mixes people into the email experiment who had a different experience than the others, of not having gotten brought to the Growth features immediately after account creation.

Not so straightforward to do, technically, but we do have a longstanding task/request to do that work T269847: Automatically enable the Homepage when a logged-in user visits [[Special:Homepage]]. So maybe it is time for us to tackle that one, and then we don't need to worry about whether the user was in the Growth features control or experiment group when they get the welcome email.

@kostajh and I talked today, and we devised a new solution, which is the one we want to implement in this task.

  • Current state in eswiki: 100% of accounts get Welcome Survey. 80% get Growth features.
  • Desired state for this experiment:
    • 80% of accounts get Welcome Survey. That same 80% get Growth features.
    • Inside the 80% of those who get the Welcome Survey and Growth features, 80% are in the treatment group for emails, and 20% are in the control group.

In other words, only those in the treatment group for Growth features should get the welcome survey when they register (and therefore only those people will end up with the opt-in checkbox). This solves the issue of people getting emails to go to their homepage when they don't have it turned on. The only negative side effect is that the overall Growth experiment in eswiki is tweaked a little vs. other wikis, in that eswiki's control group didn't see the Welcome Survey. I think this is likely a minimal side effect.

MShilova_WMF changed the task status from Open to In Progress. Apr 7 2022, 6:04 PM
MShilova_WMF triaged this task as High priority.

@Tgr will work on this while I'm away.

Change 778998 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Skip welcome surveys for users in the no-homepage control group

https://gerrit.wikimedia.org/r/778998

Change 778998 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Skip welcome surveys for users in the no-homepage control group

https://gerrit.wikimedia.org/r/778998

Tgr changed the task status from In Progress to Open. Apr 11 2022, 9:36 PM
Tgr moved this task from Code Review to QA on the Growth-Team (Sprint 0 (Growth Team)) board.

I was asked to take a look at this and I'm not sure I understand why we have to align the groups. The configuration that @kostajh proposed in T305015#7831373 allows us to determine whether users should get the email based on their Welcome Survey response. We can also determine if users have access to the Homepage by checking the growthexperiments-homepage-enable user preference. I expect we'll be using a SQL query to export the dataset, which means that excluding those who don't have the Homepage enabled can be done by adding that to the query.
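As a rough illustration of the query-based filtering suggested above, here is a sketch against an in-memory SQLite stand-in for MediaWiki's `user_properties` table (`up_user`, `up_property`, `up_value`). The property names come from this task; the sample rows, the assumption that an enabled preference is stored as `'1'`, and the helper name `export_user_ids` are hypothetical. The real export is a PHP maintenance script, so this is not its implementation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_properties "
    "(up_user INTEGER, up_property TEXT, up_value TEXT)"
)

treatment = json.dumps({"mailinglist": True, "_group": "T303240_mailinglist"})
control = json.dumps({"mailinglist": True, "_group": "T303240_mailinglist_control"})
no_optin = json.dumps({"mailinglist": False, "_group": "T303240_mailinglist"})

conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        (1, "welcomesurvey-responses", treatment),
        (1, "growthexperiments-homepage-enable", "1"),
        (2, "welcomesurvey-responses", treatment),  # no Homepage preference
        (3, "welcomesurvey-responses", control),
        (3, "growthexperiments-homepage-enable", "1"),
        (4, "welcomesurvey-responses", no_optin),
        (4, "growthexperiments-homepage-enable", "1"),
    ],
)

# Join survey responses to the Homepage preference so that users without
# the Homepage enabled never reach the export.
QUERY = """
SELECT survey.up_user, survey.up_value
FROM user_properties AS survey
JOIN user_properties AS homepage
  ON homepage.up_user = survey.up_user
 AND homepage.up_property = 'growthexperiments-homepage-enable'
 AND homepage.up_value = '1'
WHERE survey.up_property = 'welcomesurvey-responses'
ORDER BY survey.up_user
"""


def export_user_ids(connection: sqlite3.Connection) -> list:
    """Return ids of opted-in treatment-group users with the Homepage enabled."""
    user_ids = []
    for user_id, raw_response in connection.execute(QUERY):
        response = json.loads(raw_response)
        if (
            response.get("_group") == "T303240_mailinglist"
            and response.get("mailinglist") is True
        ):
            user_ids.append(user_id)
    return user_ids
```

In this sample data only user 1 satisfies all three conditions: opted in, email treatment group, Homepage enabled.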

If we keep giving the Welcome Survey to 100% of users, it also means we can compare across groups (email yes/no and Homepage yes/no) to learn more about what the effect of that question on the Welcome Survey is, as well as what the effect of receiving the email is.

@nettrom_WMF -- thanks for looking at this. You're right -- we don't have to align the groups. I suppose we came up with that as an alternative to having to modify the export script to take into account whether the user has homepage enabled. But the way you are proposing sounds good to me, too, if that works for @Tgr. To sum up, here's what I think we're saying. Is this right?

  • Current state in eswiki: 100% of accounts get Welcome Survey. 80% get Growth features.
  • Desired state for this experiment:
    • 100% of accounts get Welcome Survey. 80% get Growth features. This is no change.
    • Independently, 80% of new accounts are in the treatment group for emails, and 20% are in the control group.
    • Only opted-in people who are in the treatment group for emails AND the treatment group for the Growth features are exported for emails.

This means that no one will get an email encouraging them to go to the homepage unless they already have the homepage available. It also means that some people will choose the opt-in checkbox for emails but get no email. We are okay with that.
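To make the proportions in this summary concrete, here is a small worked calculation. It assumes the two 80/20 assignments are independent, as stated above; the labels and the function name `cell_shares` are only for illustration. Opt-in rates are not known here, so the shares are fractions of all new accounts before the opt-in filter.

```python
from fractions import Fraction

EMAIL_TREATMENT = Fraction(80, 100)   # share in the email treatment group
GROWTH_TREATMENT = Fraction(80, 100)  # share that gets Growth features


def cell_shares() -> dict:
    """Share of new accounts in each (email group, Growth group) cell."""
    email = {
        "email_treatment": EMAIL_TREATMENT,
        "email_control": 1 - EMAIL_TREATMENT,
    }
    growth = {
        "growth_treatment": GROWTH_TREATMENT,
        "growth_control": 1 - GROWTH_TREATMENT,
    }
    return {
        (e_label, g_label): e_share * g_share
        for e_label, e_share in email.items()
        for g_label, g_share in growth.items()
    }


shares = cell_shares()
# Only this cell is eligible for export (and only its opted-in members).
exportable = shares[("email_treatment", "growth_treatment")]
print(f"eligible for export before opt-in: {float(exportable):.0%}")
```

Under these assumptions 64% of new accounts are eligible for export before opt-in, 16% are in the email control group with Growth features, and the remaining 20% have no Growth features and are never exported.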

Yes, this is exactly what I was suggesting! Thanks for summarizing it, I'll try to remember to do something like that next time so it's easier to get an overview.

Tgr changed the task status from Open to In Progress. Apr 18 2022, 11:29 PM
Tgr moved this task from QA to In Progress on the Growth-Team (Sprint 0 (Growth Team)) board.

Change 783908 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Revert "Skip welcome surveys for users in the no-homepage control group"

https://gerrit.wikimedia.org/r/783908

Change 784303 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Email list export: Exclude users for whom Growth features are disabled

https://gerrit.wikimedia.org/r/784303

Change 783908 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Revert "Skip welcome surveys for users in the no-homepage control group"

https://gerrit.wikimedia.org/r/783908

@Tgr: do you have access to MariaDB's JSON functions in the query builder in MediaWiki? I saw the comment in https://gerrit.wikimedia.org/r/784303 about not being able to query the timestamps. Newer versions of MariaDB make that a lot easier since they can process the JSON payload for you. I use that to only extract responses rendered or submitted in a specific month in this notebook, see the get_responses function in "Functions to Get and Process Survey Responses". For example:

CAST(json_value(up_value, '$._submit_date') AS CHAR CHARACTER SET utf8)
  REGEXP "^{month}" -- month of submission matches

Where month is added in through Python and is in "YYYYmm" format (e.g. 202203). Thought this might be helpful in case it's available and you'd like to skip doing it all in PHP.
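For comparison, the same month filter can be sketched outside the database. This assumes, as the REGEXP above implies, that `_submit_date` is stored as a value starting with `YYYYmm`; the function name `submitted_in_month` and the sample timestamp are hypothetical.

```python
import json


def submitted_in_month(up_value: str, month: str) -> bool:
    """Return True if the survey response was submitted in `month` (YYYYmm).

    `up_value` is the raw JSON string stored in the user property.
    Responses without a `_submit_date` never match.
    """
    response = json.loads(up_value)
    submit_date = response.get("_submit_date")
    if submit_date is None:
        return False
    # str() guards against the date being stored as a JSON number.
    return str(submit_date).startswith(month)


# Hypothetical sample value, for illustration only.
sample = json.dumps({"mailinglist": True, "_submit_date": "20220315120000"})
print(submitted_in_month(sample, "202203"))
```

Filtering in the database, as in the SQL snippet, avoids transferring rows that will be discarded, so this client-side version is mainly useful where the JSON functions are not available.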

Change 783916 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@wmf/1.39.0-wmf.8] Revert "Skip welcome surveys for users in the no-homepage control group"

https://gerrit.wikimedia.org/r/783916

I thought if we deploy the revert now, the original patch won't reach production, but I miscounted :( It's actually in production between April 14-21, so during that time some users did not get the welcome survey. Sorry about that, I hope it won't complicate the analysis too much.

@nettrom_WMF thanks! I think the main problem here is not being able to filter by an index, and we can work around that by iterating via user_registration instead of user_id, and at that point we won't really need the survey submission date anymore (technically, the two could be different, but I don't think there's any disadvantage to using the registration date instead).
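The idea of iterating by registration date in batches can be sketched as follows. This is not the code from the actual patch: it uses SQLite as a stand-in, assumes a `user` table with `user_id` and `user_registration` (a sortable timestamp string), and the function name and batch handling are hypothetical.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_users_by_registration(
    conn: sqlite3.Connection, start: str, batch_size: int
) -> Iterator[Tuple[int, str]]:
    """Yield (user_id, user_registration) for users registered at or after
    `start`, fetching `batch_size` rows per query.

    Each batch resumes from the last (registration, id) pair seen, so users
    sharing a registration timestamp are neither skipped nor repeated.
    """
    last_ts, last_id = start, 0
    while True:
        rows = conn.execute(
            """
            SELECT user_id, user_registration
            FROM "user"
            WHERE user_registration > ?
               OR (user_registration = ? AND user_id > ?)
            ORDER BY user_registration, user_id
            LIMIT ?
            """,
            (last_ts, last_ts, last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id, last_ts = rows[-1]
```

Because the starting point is a registration timestamp, the export can begin at the campaign start date instead of scanning from the first user id.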

Change 783916 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@wmf/1.39.0-wmf.8] Revert "Skip welcome surveys for users in the no-homepage control group"

https://gerrit.wikimedia.org/r/783916

Mentioned in SAL (#wikimedia-operations) [2022-04-19T20:49:00Z] <urbanecm@deploy1002> Synchronized php-1.39.0-wmf.8/extensions/GrowthExperiments/: e152df0: Revert "Skip welcome surveys for users in the no-homepage control group" (T305015) (duration: 00m 55s)

Change 784700 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[mediawiki/extensions/GrowthExperiments@master] Email list export: Speed up

https://gerrit.wikimedia.org/r/784700

Change 784730 had a related patch set uploaded (by Gergő Tisza; author: Gergő Tisza):

[operations/mediawiki-config@master] [beta] Enable Growth campaigns on all beta wikis

https://gerrit.wikimedia.org/r/784730

Tgr changed the task status from In Progress to Open. Apr 20 2022, 4:35 PM
Tgr moved this task from In Progress to Code Review on the Growth-Team (Sprint 0 (Growth Team)) board.

Change 784303 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Email list export: Exclude users for whom Growth features are disabled

https://gerrit.wikimedia.org/r/784303

Change 784700 merged by jenkins-bot:

[mediawiki/extensions/GrowthExperiments@master] Email list export: Speed up

https://gerrit.wikimedia.org/r/784700

Etonkovidova subscribed.

Re-checked on beta that users are assigned to different groups. A user will have
"mailinglist":true,"_group":"T303240_mailinglist",
or
"mailinglist":true,"_group":"T303240_mailinglist_control"

ExportWelcomeSurveyMailingListData.php will correctly export users only in "T303240_mailinglist".

Checked in some new accounts in eswiki wmf.10 - the WelcomeSurvey is displayed with the check box for opt-in.