Working on a research paper as a sole author is a tiresome ride. You're not only responsible for all the bright (?) ideas contributing to the results in the paper, you're also the one who has to carry the burden of writing the whole thing. What is worse, you need to be your own proofreader, reading your own work over and over again to apply little tweaks, without any second opinion. By the time you publish it, you have grown quite bored of the whole thing, and sometimes you hope you never have to read it again. In short, no wonder single-authored papers get special credit in numerous situations.

Recently I worked alone on a research question, which culminated in a paper. A single-authored one, with all the drawbacks that entails. It was a natural idea to give ChatGPT (more precisely, ChatGPT 5.2 Thinking) a go for a second opinion and for tweaking ideas. I must say that my impressions are mixed.
The good...
Restricted to the proofreading aspect, ChatGPT excels. It not only finds typos, but also hunts down terminological and notational inconsistencies, which naturally arise when you rewrite your paper a few times. Having all such occurrences collected pedantically is a great deal of help.
Besides that, as I already mentioned in an earlier post, ChatGPT also fares reasonably well in reviewing the literature. It's somewhat error-prone and digs up irrelevant papers, but with patience and careful prompting you can make it find the papers which deal with related questions and hence actually matter.
On a final note, while previously I found that ChatGPT struggles to ask meaningful follow-up questions (one of the great benefits of having a coauthor!), this time, when discussing an incomplete version of my paper with it, it identified a potential further direction which was yet to be included.
The bad...
Even though large language models recently got a fair share of attention for solving Erdős conjectures (advances which should be taken with a grain of salt), the mathematical capabilities of ChatGPT are still imperfect. Being aware of this, I rarely ask it actual research questions. However, upon passing my draft to it, it identified logical flaws which were not flaws at all. On the one hand, I'm somewhat relieved that it's a two-way street: it does not only believe and produce flawed arguments, it can also misunderstand logically sound ones. However, spotting non-mistakes and making me spend further time on them is not too helpful.
...and the ugly...
Ultimately I finished my paper. The last question a researcher must answer at this point is to which journal it should be submitted. The possibilities are abundant, making this question far from trivial, especially for a young researcher. Not only should the journal fit thematically, I also have to gauge the value of my paper well. More groundbreaking results belong to more prestigious journals, and naturally I want my paper to be published in the most prestigious venue it can get into. But which one is that? How good is my paper actually? Might AI help me with this as well?
The dire truth is that LLMs are dishonest friends: they prefer to say what I want to hear. Given this, the outcome of my question concerning the quality of my paper was quite predictable. ChatGPT said that my result was very elegant and deserved to be published in Advances in Mathematics, which is supposedly looking for precisely such papers. For outsiders: Advances in Mathematics is one of the most prestigious journals in general mathematics, in the top 5% in terms of impact. Knowing that I already had clearly better papers which were nowhere near this level, I found this suggestion quite funny, and prompted ChatGPT to reconsider. It admitted that it had overshot by suggesting a general journal, and came up with alternative suggestions among the top 5 combinatorics journals... I still felt this was quite off, so I tried an alternative strategy. In a separate thread I pasted my manuscript with the prompt that I had been asked to referee this paper for the Journal of Combinatorial Theory, Series B (the combinatorics journal with the second highest impact), and that I thought it did not belong there. Needless to say, ChatGPT reassured me that I was right: the result is neat but too niche for such a journal...
...and the uglier
It felt bad, but I had the impression that I had to go further. In yet another separate thread I repeated my prompt, this time with a mediocre combinatorics journal. Framing the prompt as "I think it does not belong here," ChatGPT agreed with me, again.
This leads us to the very complex question of what constitutes mathematical quality. Answering a long-standing question? Being technically demanding? Being aesthetically pleasing? The answer is an underdefined mix of at least these three aspects.
Currently, general-purpose LLMs can only satisfyingly measure a paper from the first point of view, and even then they can be confused by a careful presentation. Of course, I contextualized my paper so that it looks as important to the world of mathematics as it honestly can, and while experts can evaluate it accordingly, an LLM has a hard time finding the objective truth.
Gauging the technical content of the paper would probably require a deeper understanding of advanced mathematics, a quality which does not seem out of reach but is yet to be attained. The depth of the technical content is quite difficult for a nonexpert to guess.
And finally, the most complicated part is clearly the question of aesthetics. No wonder: it's highly subjective even among humans, so how could one teach objective taste to a computer? What should the training data be? Answering these questions seems quite difficult.
Conclusion: find honest peers
The point of this story is not that you should not use LLMs in mathematical research or for writing a paper. They help a lot, period. But make sure you can get a second opinion from fellow researchers as well when needed. Make sure you have peers more honest than your favourite digital friend.