Is Your Writing Tool Training AI on Your Unpublished Research?
Every modern writing tool now has an AI feature. Autocomplete, error fixing, rewriting, summarising. The features are genuinely useful. The question almost nobody asks before turning them on is the one that matters most for a researcher: is your writing tool training AI on your unpublished research? Your half-finished manuscript, your novel method, your unsubmitted grant proposal. If a vendor is feeding that into model training, your competitive edge and your confidentiality obligations are both quietly at risk. This article covers that under-asked question, why it matters specifically for academics, and how to check a vendor before it is too late.
Why this is different for academics
For most users, “the tool learned from my text” is mildly uncomfortable. For a researcher it can be a real problem.
Consider what is actually in your drafts:
- An unpublished result that competitors would love to see early.
- A methodology you have not yet protected or published.
- Confidential collaborator data, interview transcripts, or human-subjects material covered by ethics approval.
- A grant proposal whose ideas you would prefer not to leak before the funding decision.
Academic value is often front-loaded into the unpublished phase. The whole point of an embargo, a pre-registration, or a careful submission timeline is that the timing of publication stays a decision you control. A model trained on your draft erodes that control in ways you cannot reverse. You cannot un-train a model, and you cannot audit what it absorbed.
There is also a plain confidentiality angle. If your manuscript contains personal data, handing it to a model-training pipeline may itself be a processing activity you never authorised, which brings GDPR into the picture.
How vendors actually use your data (and how they describe it)
The tricky part is the language. Vendors rarely say “we train on your documents.” They say things that sound reassuring while leaving the door open. Learn to read the gap between the marketing and the terms.
| What they say | What it might actually mean |
|---|---|
| “We use your data to improve our services.” | Broad enough to include model training. |
| “We may use content to develop new features.” | Same, dressed differently. |
| “Your data is not shared with third parties.” | Says nothing about their own training. |
| “AI features are powered by [third-party model].” | Your text may leave for an external provider. |
| “We do not train on your data.” | The clear commitment you actually want. |
The pattern to watch is vagueness about purpose. A vendor that wanted to train on your text but also wanted you comfortable would write exactly the soft phrasing in the left column. A vendor that does not train on your text can say so in one unambiguous sentence. The absence of that sentence is itself information.
The other thing to check is whether AI features route your text to a third-party model provider. Even if your editor does not train on your data itself, sending your manuscript to an external API can mean another company processes it under its own terms, and possibly outside the EU. That folds the question back into data residency and Schrems II, which we cover in “Schrems II and your academic software stack.”
Why “we anonymise it first” is not the reassurance it sounds like
Some vendors soften the training question by saying they only use “anonymised” or “aggregated” content. Be sceptical. Anonymising free text is much harder than anonymising a spreadsheet. A methods section, a distinctive turn of phrase, an unusual combination of variables: any of these can re-identify a document or its author even with names stripped out. And for training purposes the linguistic content is precisely what gets absorbed, so the parts that make your draft yours are the parts the model keeps. “We anonymise before training” is better than nothing, but it is not the same as “we do not train on your content,” and you should not let the first stand in for the second.
There is also the matter of permanence. If a vendor changes its mind, it can stop training on new data tomorrow. It cannot remove your draft from a model already trained on it. That asymmetry is why the commitment you want is a forward-looking, unconditional “we do not and will not,” not a present-tense “we currently don’t.”
How inscrive handles it
inscrive’s position is short and deliberately unambiguous: inscrive never uses your documents or data to train AI models. Full stop. That is not buried in a sub-clause or reserved for paying customers. It is the baseline.
inscrive does offer AI assistance, on the Pro tier, in a specific and contained form: it suggests fixes for LaTeX compile errors. When your build fails on a stray brace or a missing package, the AI proposes a correction. That feature exists to save you the tedium of decoding cryptic LaTeX error logs. It is not a pipeline that harvests your prose to improve a model. Your research stays your research.
A few things follow from that design:
- Your unpublished work is not training material. There is no path by which your manuscript becomes part of a model other people query.
- Your data stays in the EU. Projects are hosted by Hetzner in Germany and Finland, in ISO 27001-certified data centres, with no third-country transfers. So the “where does my text go” question has a clean answer.
- It is backed contractually. For institutions there is a signed DPA, and there is an independent inspection and audit report behind the security claims. The no-training commitment is part of a broader, documented compliance posture, not a slogan. For the plain-English version of the DPA, see why your LaTeX tool needs a signed DPA.
Worth saying plainly: inscrive is freemium, and the no-AI-training commitment plus EU data residency apply on the Free plan (€0, up to 10 active projects, unlimited collaborators) just as they do on Pro and Organizations. You do not buy your way into privacy here. The AI compile-fix assistance is the Pro feature; the promise not to train on your work is universal.
How to check any vendor in five minutes
You do not need a law degree. You need to read the right two documents and ask the right questions.
- Search the privacy policy and terms for the words “train,” “training,” “machine learning,” and “improve our services.” Read every hit.
- Look for an explicit negative. “We do not use your content to train AI models” is the sentence you want. If it is missing, treat training as possible.
- Check the AI feature’s data flow. Does your text go to a third-party model provider? Where is that provider hosted?
- Find the opt-out, if any. Some tools train by default and let you opt out. Default-on is a meaningful signal about priorities.
- Ask directly. Email support: “Do you use customer documents to train or fine-tune AI models, now or in future?” A crisp “no” in writing is worth keeping.
If a tool cannot give you a clean “no,” you have learned what you needed to know. For a fuller buyer’s checklist on the broader data questions, our compliance overview and the GDPR page go deeper.
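The first step of the checklist, searching a policy for training-related wording, is easy to automate once you have saved the policy as plain text. The following is a minimal sketch, not a vetted compliance tool: the function name `find_training_mentions`, the `context` parameter, and the sample sentence are all hypothetical, and the term list simply mirrors the four phrases suggested above. It surfaces passages to read; it does not judge what they mean.

```python
import re

# Phrases taken from the checklist above; extend the list as needed.
TERMS = ["train", "training", "machine learning", "improve our services"]


def find_training_mentions(policy_text, terms=TERMS, context=60):
    """Return (term, snippet) pairs for every hit, in document order."""
    # Longest term first, so "training" is reported as "training" rather
    # than as "train". The leading \b avoids hits inside unrelated words
    # such as "constraints"; "trained" and "trains" still match "train".
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + ")", re.IGNORECASE)

    hits = []
    for match in pattern.finditer(policy_text):
        start = max(0, match.start() - context)
        end = min(len(policy_text), match.end() + context)
        # Collapse whitespace so each snippet prints on one line.
        snippet = " ".join(policy_text[start:end].split())
        hits.append((match.group(0).lower(), snippet))
    return hits


if __name__ == "__main__":
    # Hypothetical policy sentence, for illustration only.
    sample = (
        "We may use content to improve our services, including "
        "training machine learning models. Your data is never sold."
    )
    for term, snippet in find_training_mentions(sample):
        print(f"[{term}] ...{snippet}...")
```

An empty result is not a clean bill of health. It only means none of these phrases appear, so the explicit “we do not train” sentence still has to be found by reading, or obtained in writing from the vendor.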
The takeaway
AI writing features are useful, and you do not have to swear them off. You do have to know whether the convenience is paid for with your unpublished research. The vendors worth trusting say “we do not train on your data” in one plain sentence and keep your text in a jurisdiction you can point to on a map. inscrive’s answer is exactly that: no training on your work, ever, with everything hosted in the EU. The rest is reading the policy closely enough to catch the soft phrasing that hopes you won’t.
inscrive.io never trains AI on your documents, keeps your research in EU data centres, and still gives you AI help where it counts: fixing LaTeX compile errors. Start writing for free, and read the GDPR page for the details.




