We have all heard the term "big data" too many times, but few people realize just how big "big" has become as the Internet has developed. Setting aside the few companies that merely borrow the term, many companies that truly own big data handle numbers that ordinary people may never encounter in their lifetimes: WeChat Moments users upload 1 billion images each day, Alipay's daily transaction value has peaked at more than RMB 20 billion, and Jingdong (JD.com) uploads millions of new product information images every day...
These numbers are great news for artificial intelligence algorithms that need data for training. They also mean that the importance of data for artificial intelligence is rising in step with the development of computing power and algorithms. But how do we filter out, from such vast amounts of data, the part that is really useful to us? How can we analyze these data to make decisions that benefit us? This is what data scientists do.
In this issue, we invited Pan Yi, chief scientist of iPIN. He received his Ph.D. in science from Sun Yat-sen University at the end of 2004. He conducted research on data mining and artificial intelligence at the Hong Kong University of Science and Technology (HKUST) from February 2005 to August 2007, and at Hewlett-Packard Laboratories in the United States from August 2007 to September 2009. In October 2009, he joined Sun Yat-sen University through its Hundred Talents Program and worked in the Department of Computer Science. Since 2014, he has been the chief scientist of iPIN.
In 2005, Dr. Pan took part in KDDCup, the international data mining competition organized by the Association for Computing Machinery (ACM) and the most important annual competition in data mining. The task that year was the classification of search engine queries, and his team won first place in all three categories (accuracy, performance, and innovation of the query classification algorithm). He has been granted two US patents and has published more than 20 papers in leading academic conferences and journals in related fields, including Artificial Intelligence, IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Information Systems, AAAI, IJCAI, ACM SIGKDD, UAI, ICDM and so on. He is a reviewer or program committee member for several journals and conferences, including IEEE Transactions on Knowledge and Data Engineering, IEEE/ACM..., AAAI, IJCAI, ICDM, WSDM, CIKM, ECML, ACML, BMWT, AAIM, PRICAI, WI, WINE and so on.
After you graduated and went to the Hong Kong University of Science and Technology, what were your core research interests?
First, I would like to thank my doctoral advisors, Professor Li Lei and Professor Yao Zhengan. Although data mining and machine learning were not originally their fields, I am very grateful for their tolerance and practical guidance. My doctoral research was on kernel-based machine learning algorithms. Later, at the Hong Kong University of Science and Technology, under the guidance of Professor Yang Qiang, I applied kernel methods to case-based reasoning.
At the same time, we also worked on search engine query classification and, in cooperation with NEC, studied semi-supervised sequence relational learning algorithms and applied them to indoor wireless positioning. The query classification problem came from the needs of large search engine companies such as Google, Baidu, Yahoo, and Microsoft; the goal was to improve the accuracy of ad placement and the quality of search results. Looking back on my time at HKUST, I would like to thank Professor Yang Qiang for his guidance and help. During that period I trained and improved my research skills in data mining and machine learning, such as choosing directions, identifying research problems, and writing papers.
You later went to HP Labs. How did you get that opportunity?
Fortunately, at HKUST we took part in the 2005 ACM KDDCup competition and won first place in all three categories. That had a great impact on my research direction and results, and as a result I was later given the opportunity to work at HP Labs.
What was your core research direction at HP Labs?
At HP Labs I took part in a research project called the Chameleon project, which was essentially about personalized recommendation algorithms. In the PC era, one in every five personal computers sold worldwide was made by HP, and in the United States the share was even higher, about one in four. Much like on today's mobile Internet, as long as the user gave permission, HP could collect various kinds of behavioral data on the PC and then provide personalized recommendation services, thereby improving the user experience. At that time, recommender system algorithms relied mainly on user rating data: users rated a product or service after consuming it, and the recommender needed those ratings to work effectively. During the Chameleon project, we found that most user behavior came with no rating at all. This is quite reasonable, since many people do not bother to rate what they buy or experience. So I raised the question of how to make recommendations when users give no ratings. I then proposed the One-Class Collaborative Filtering (OCCF) algorithm, which was published at ICDM'08. Afterwards, to address computational efficiency, I proposed a faster OCCF algorithm, which was accepted at KDD'09.
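The idea of recommending without ratings can be illustrated with a minimal sketch. This is not the published OCCF algorithm: it assumes one simple variant in which every unobserved user-item pair is treated as a weak negative example with a small uniform weight, and it fits a weighted matrix factorization by stochastic gradient descent. The function names (`train_occf`, `score`, `recommend`), the hyperparameter values, and the toy data are all hypothetical.

```python
import random


def train_occf(interactions, n_users, n_items, n_factors=4,
               neg_weight=0.1, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit user/item factors from one-class (implicit) feedback.

    interactions: iterable of (user, item) pairs that were observed.
    Observed pairs have target 1 and weight 1; every other pair is
    treated as a weak negative (target 0, weight neg_weight). The
    uniform negative weight is an assumption of this sketch.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_items)]
    positives = set(interactions)
    cells = [(u, i) for u in range(n_users) for i in range(n_items)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for u, i in cells:
            if (u, i) in positives:
                target, weight = 1.0, 1.0
            else:
                target, weight = 0.0, neg_weight
            pred = sum(a * b for a, b in zip(U[u], V[i]))
            err = weight * (target - pred)
            # Gradient step on the weighted squared error with L2 penalty.
            for f in range(n_factors):
                uf, vf = U[u][f], V[i][f]
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def score(U, V, user, item):
    """Predicted preference of `user` for `item` (dot product of factors)."""
    return sum(a * b for a, b in zip(U[user], V[item]))


def recommend(U, V, interactions, user, k=2):
    """Return the k highest-scoring items the user has not interacted with."""
    seen = {i for (u, i) in interactions if u == user}
    candidates = [i for i in range(len(V)) if i not in seen]
    candidates.sort(key=lambda i: score(U, V, user, i), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    # Toy data: (user, item) pairs with no ratings, only "this happened".
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 3), (2, 4), (3, 3)]
    U, V = train_occf(data, n_users=4, n_items=5)
    print(recommend(U, V, data, user=0))
```

The small `neg_weight` encodes the intuition that a missing interaction is only weak evidence of dislike: the user may simply never have seen the item. How that weight is chosen, per user or per item for example, is left out of this sketch.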
Working at HP Labs further strengthened my ability and confidence to enter relatively unfamiliar research areas, including my skills in solving applied problems, mathematical fundamentals, algorithm analysis, and engineering.
Why did you choose to go to a university after returning from HP Labs?
One reason is my own personality: I like to do research independently and autonomously, and I do not like dealing with complicated interpersonal relationships. Another is family and location. There is also the relaxed and free academic atmosphere of Sun Yat-sen University, its good research conditions and environment, and its very good students.
How has your experience of teaching at Sun Yat-sen University influenced and helped your work as iPIN's chief scientist?
After returning to Sun Yat-sen University, and building on my experience at HKUST and HP Labs, my laboratory's research focused on collaborative filtering, information detection, and natural language processing. Over the past few years I have accumulated further research experience in these fields. At the same time, I have also gained some experience in selecting and developing students. I still remember that when I had just entered graduate school, my advisor, Professor Yao Zhengan, told us: "There are no bad students, only bad teachers."
At the time I thought Professor Yao was bold to say so, but he really did help me a great deal, even though his doctoral students all had different research directions. Since becoming a teacher myself, I have always kept his words in mind and held myself to them. Fittingly, my own research is on personalization algorithms, so I try to teach students according to their aptitude. This has trained my ability to develop people.
At Sun Yat-sen University, I mainly taught courses such as databases, data mining, and information retrieval. Drawing on my research, I tried to add some new content to these classes every year. I wanted students to learn more advanced material, and doing so also helped me organize my own research.
When someone as academic in style as Yan Shuicheng talked about his recent move from academia to industry, he said he felt uneasy. You entered industry in 2014 as the CDO of a startup. Where did the determination come from?
In fact, I felt the same way. But, partly because of the experience I described earlier, I like solving practical problems and am oriented toward them. In addition, in the current environment, the real difficulty in universities is that it is industry that holds the real big data and the more realistic application problems.
When did you first feel that you had truly earned the title of chief scientist in industry? Was there a landmark event?
To be honest, I don't think I am qualified for this title yet. Perhaps this question is better answered by my partners, or at some point in the future. Thank you!
What is different between academia and industry, and what is the same?
My current areas are data mining, machine learning, and natural language processing, all of which rely heavily on large-scale data. In terms of the problems posed, industry is more practical and direct, while academia pays more attention to basic research.
In terms of problem solving, industry pays more attention to the balance between the cost and the effectiveness of a solution, while academia is more concerned with algorithmic innovation.
From a young scientist in academia to the chief scientist of a startup, what difficulties have you encountered as you grew?
In fact, it is not necessarily about the process of growth. It is more about how to adapt to the change of roles.
The first thing is to have good partners: you need broadly consistent views on character and values, and at the same time complementary roles. In addition, a university's main job is to cultivate talent and send it out into society, whereas at a company like iPIN, building a team is a key task that requires both selecting and developing talent. At the same time, as mentioned above, companies are very cost-conscious (money, manpower, and time). Your task is no longer just to publish papers. It is more important to consider whether your solution can actually be deployed.
What does a real company require of a chief scientist?
I do not think I meet the bar yet. I think that, first of all, one needs sufficient academic depth and breadth. At the same time, one must keep learning and understanding practical application problems and be able to turn them into machine learning problems, then propose and screen a variety of solutions. One also needs to follow trends in both academia and industry, have insight into where research and technology are heading, form one's own judgment, and plan ahead.
As a researcher in big data, how do you choose which industry direction to pursue?
First of all, I would like to briefly introduce some of iPIN's products: Perfect Volunteer, haoHR, and Compass.
Perfect Volunteer is a college application tool that tailors application plans for college entrance examination candidates and lets them understand employment prospects in advance. By analyzing data on 40 million past college students with an exclusive database and innovative algorithms, it helps candidates choose suitable universities and majors more scientifically and efficiently.
haoHR is a product that intelligently matches suitable resumes to positions and relieves HR staff of resume screening. It uses semantic analysis to interpret job requirements and build talent profiles, helping HR find, in a short time, more candidates whose experience is similar to the job description. This simplifies resume search and screening and lets HR spend more time on more valuable work.
Compass is a product that automatically matches opportunities and builds career plans based on the user's work experience. It uses artificial intelligence semantic analysis to interpret job seekers' past experience in depth, helping them find more and better job opportunities comprehensively, accurately, and quickly. It also uses big data to analyze hundreds of millions of career paths and market trends, helping job seekers make timely decisions about their professional direction.
Those are the three products. Although they differ greatly in form, at their core they all analyze and mine workplace talent data. In 2013, we set out to use talent data to build the first Chinese economic map. After its preliminary completion in 2014, we went on to develop related products such as the ones above.
Please give a concrete example. At what level does data mining really create value? That is, how are the data collected, aggregated, and structured, and how do they pass through machine learning, natural language processing, complex data analysis, predictive models, large-scale computing, visualization, data applications, and other steps to become valuable to end users?
Let me talk specifically about Perfect Volunteer. It was the most widely used college entrance examination application tool in 2015. It tailors application plans and provides, in advance, detailed employment information for over 100,000 majors at more than 2,500 universities across the country. Many users call it a "magic tool" for college applications. Perfect Volunteer was created by the team of scientists at the artificial intelligence company iPIN. It follows a "golden rule" for filling in applications, in five steps: "admission probability prediction, personal preference screening, personality and career matching, employment prospect analysis, application strategy selection". These steps reflect the real needs of users, which we identified by analyzing our user research. With them, Perfect Volunteer can help candidates and their parents choose their applications more scientifically and reasonably, and truly navigate toward their dreams.
Is there a general methodology, or do you have some experience of your own?
First of all, you should be familiar with the field or have similar experience, have the relevant data, have a market, and face only a moderate degree of competition (both educating a market and competing in one come at a great cost). You must also learn to adapt to circumstances, because in the actual process of starting a company, chance factors will force you to make changes.
Student question: How do you come up with an algorithm? When I read papers I often find very clever algorithms. Besides their application scenarios, I would like to know how an algorithm is born.
In fact, this can be a long process. We often say that we take a small step while standing on the shoulders of giants. First we have to climb onto the giants' shoulders; only then does the small step mean anything. So we need to understand the past work in one field (or several), who the major and influential researchers are, and what the top conferences and journals are. With the Internet so well developed, finding this out should not be too difficult. There is a large body of existing books and literature that we need to read, absorb, and digest.
This process involves choosing directions and discovering research points (or problems). You can train this ability by reading a large number of papers. You should not only learn from the strengths of the literature but also get used to questioning it: which assumptions are unsuitable, and which algorithms have room for improvement?
In addition, along the way you should formulate your own specific research problems in light of practical applications. Once you have found the problem (that is, once you have climbed onto the giants' shoulders), you are ready to take that small step. Usually we consider the process of designing an algorithm to solve the problem and verifying its effectiveness to be relatively easy. Of course, many details are involved here, and I will not go into them. If you are interested, you can refer to Eamonn Keogh's 2012 KDD tutorial on how to do data mining research.
Student question: Data mining is not yet a well-defined discipline. If I choose this direction, which courses are required? Which courses should I take?
Required (prerequisites include programming, data structures and algorithms, computer organization, computer networks, etc.): databases, probability and statistics, machine learning, and pattern recognition.
Optional: GPU/parallel computing, data warehousing, data visualization, deep learning, business intelligence (BI), community intelligence (CI), and courses for specific application areas such as information retrieval, NLP, speech, and images.